Use an Intelligent SSD to Accelerate Machine Learning | SNIA

Abstract

As accelerators, including the latest GPU architectures and TPUs, significantly shrink kernel computation time, the bottleneck in machine learning (ML) applications is shifting to data movement and preparation. Although an in-storage/near-data processing model that places intelligence closer to data storage can fundamentally eliminate the need to move data, offloading all compute kernels to these in-storage processors is unrealistic given their limited capabilities and the cost budget for building the device. It is therefore essential to investigate the best principles for using intelligent storage devices with popular ML applications. This project presents ML-SSD, an intelligent storage device that (1) adjusts data resolutions and (2) shuffles datasets for ML applications. The proposed SSD architecture requires only extensions to the NVMe protocol and minor modifications to the kernel driver and system library. By providing a few operators that help reduce the data size, ML-SSD can reduce the data input time by 36%. With the shuffle operation in the NVMe protocol stack, ML-SSD can further reduce the total amount of time by 24%.
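To make the two operators concrete, the following is a minimal host-side sketch in Python of what "adjusting resolution" and "shuffling" mean for a dataset read. It is only an illustration: the function names (`reduce_resolution`, `shuffled_order`, `read_batch`), the use of average pooling, and the seeded permutation are assumptions made here, not the ML-SSD firmware, its NVMe command extensions, or its actual algorithms.

```python
import random


def reduce_resolution(image, factor):
    """Average-pool a 2D grayscale image (a list of rows) by an integer factor.

    Hypothetical stand-in for a device-side resolution operator: the result
    has factor*factor fewer pixels, so less data crosses the storage interface.
    """
    height, width = len(image), len(image[0])
    if factor < 1 or height % factor or width % factor:
        raise ValueError("factor must be positive and divide both image dimensions")
    out = []
    for r in range(0, height, factor):
        row = []
        for c in range(0, width, factor):
            block = [image[r + dr][c + dc]
                     for dr in range(factor) for dc in range(factor)]
            row.append(sum(block) // len(block))  # integer mean of the block
        out.append(row)
    return out


def shuffled_order(num_samples, seed):
    """Return a reproducible random permutation of sample indices."""
    order = list(range(num_samples))
    random.Random(seed).shuffle(order)
    return order


def read_batch(dataset, factor, seed):
    """Emulate one read that returns samples shuffled and at reduced resolution."""
    return [reduce_resolution(dataset[i], factor)
            for i in shuffled_order(len(dataset), seed)]
```

In this sketch the host receives samples that are already permuted and downsized, which is the kind of work the abstract proposes moving into the storage device so that the host neither reads full-resolution data nor reshuffles it.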