
With the increased business value that AI-enabled applications can unlock, there is a need to support Gen AI models at varying degrees of scale - from foundation model training in data centers to inference deployment on edge and mobile devices. Flash storage, and PCIe/NVMe storage in particular, can play an important role in enabling this thanks to its density and cost benefits. Enabling NVMe offload for Gen AI requires a combination of careful ML model design and effective deployment on a memory-flash storage tier. Using inference with the Microsoft DeepSpeed library as an example, we highlight the benefits of NVMe offload and call out specific optimizations and improvements that NVMe storage can target to deliver improved LLM inference metrics.
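
To make the idea concrete, the sketch below shows what NVMe offload can look like with DeepSpeed: a ZeRO stage-3 configuration that offloads model parameters to a local NVMe drive so a model larger than GPU memory can still run inference. This is a minimal sketch, not material from the talk; the model name, the /local_nvme mount point, and the buffer and aio tuning values are illustrative assumptions.

import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "facebook/opt-1.3b"   # placeholder model for illustration
NVME_PATH = "/local_nvme"          # assumed local NVMe mount point

# ZeRO stage-3 config that offloads parameters to NVMe (ZeRO-Inference style).
ds_config = {
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "offload_param": {
            "device": "nvme",
            "nvme_path": NVME_PATH,
            "pin_memory": True,
            "buffer_count": 5,
            "buffer_size": 1e8,
        },
    },
    # Async I/O knobs that govern how parameter tensors are read from NVMe;
    # values here are illustrative, not tuned recommendations.
    "aio": {
        "block_size": 1048576,
        "queue_depth": 8,
        "thread_count": 1,
        "single_submit": False,
        "overlap_events": True,
    },
    "train_micro_batch_size_per_gpu": 1,  # required by DeepSpeed even for inference-only use
}

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

# Wrap the model in the DeepSpeed engine; parameters are partitioned and
# paged in from NVMe on demand during the forward pass.
engine, _, _, _ = deepspeed.initialize(model=model, config=ds_config)
engine.eval()

inputs = tokenizer("NVMe offload lets large models run on small GPUs because",
                   return_tensors="pt").to(engine.device)
with torch.no_grad():
    outputs = engine.module.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

In a setup like this, inference throughput becomes sensitive to NVMe read bandwidth and the async I/O settings, which is where the storage-side optimizations discussed in the talk come into play.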

Bonus Content
Off
Presentation Type
Presentation
Learning Objectives

Recognize the need to democratize training and inference at scale
Understand what enabling NVMe offload of LLMs requires
Be aware of opportunities for NVMe flash to enable improved LLM inference performance

Start Date/Time
End Date/Time
YouTube Video ID
fA363YzYEV0
Zoom Meeting Completed
Off
Main Speaker / Moderator
Room Location
Cypress