Why does NVMe need to evolve for efficient storage access from GPUs?
Abstract
Since its introduction in 2011, the NVMe storage protocol has allowed CPUs to handle more data at lower latency. Its multiple-queue design has significantly improved the CPU's ability to manage parallel IO while improving CPU utilization. More recently, the growing relevance of GPUs in AI training and inference has driven innovations that enable NVMe storage access directly from GPUs. In this presentation we discuss the challenges of doing this efficiently. We compare how CPUs and GPUs execute code and how they access IO, and we illustrate key bottlenecks in the current NVMe IO protocol that must be addressed to improve storage access from GPUs. The NVMe protocol was designed with latency-sensitive CPUs in mind, whereas GPUs are high-compute, parallel execution engines that are more latency tolerant. We conclude with a call to action identifying areas of improvement in the NVMe standard to address this next set of challenges that GPUs pose for storage.