SNIA Developer Conference September 15-17, 2025 | Santa Clara, CA
Flash storage devices are essential components inside AI compute nodes and, of course, in the external storage tiers as well. Storage is, however, the slowest component in the AI data path when compared to compute and memory. Training and inference workloads impose different requirements on storage devices inside the AI data center. AI is driving the adoption of next-generation interfaces for storage devices in the AI data center. To meet the performance requirements of AI workloads, optimizations in the following areas need to be considered for flash storage: NAND media, the SSD controller, host interfaces, and form factors.
The rapid advancement of AI is significantly increasing demands on compute, memory and the storage infrastructure. As NVMe storage evolves to meet these needs, it is experiencing a bifurcation in requirements. On one end, workloads such as model training, checkpointing, and key-value (KV) cache tiering are driving the need for line-rate saturating SSDs with near-GPU and HPC attachment. On the other end, the rise of multi-stage inference, synthetic data generation, and post-training optimization is fueling demand for dense, high-capacity disaggregated storage solutions, effectively displacing traditional rotating media in the nearline tier of the datacenter. This paper explores the architectural considerations across both ends of this spectrum, including Gen6 performance, indirection unit (IU) selection, power monitoring for energy efficiency, liquid-cooled thermal design, and strategies for enabling high capacity through form factor and packaging choices. We demonstrate how thoughtful design decisions can unlock the full potential of storage systems in addressing the evolving challenges of AI workloads.
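To make the "line-rate saturating" requirement concrete, the back-of-the-envelope sketch below estimates what a PCIe Gen6 x4 SSD has to sustain at several I/O sizes; the 85% protocol-efficiency figure and the I/O sizes are illustrative assumptions, not measurements or specifications for any particular device.

    # Back-of-the-envelope sizing for a line-rate-saturating PCIe Gen6 x4 SSD.
    # Illustrative assumptions only: Gen6 raw signaling is 64 GT/s per lane,
    # and we assume ~85% of raw bandwidth survives FLIT/protocol overhead.
    LANES = 4
    GTS_PER_LANE = 64e9                              # raw transfers per second, per lane
    RAW_BYTES_PER_SEC = GTS_PER_LANE * LANES / 8     # one bit per transfer per lane
    EFFICIENCY = 0.85                                # assumed protocol efficiency
    usable = RAW_BYTES_PER_SEC * EFFICIENCY

    for io_size in (512, 4096, 16384):
        iops = usable / io_size
        print(f"{io_size:>6} B I/O: {usable/1e9:5.1f} GB/s usable -> "
              f"{iops/1e6:5.1f} M IOPS to saturate the link")

Under these assumptions, saturating the link with 4 KiB transfers already implies several million IOPS, which is what motivates the controller and IU design choices discussed above.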
Generative AI models, such as Stable Diffusion, have revolutionized the field of AI by enabling the generation of images from textual prompts. These models impose significant computational and storage demands in HPC environments. The I/O workload generated during image generation is a critical factor affecting overall performance and scalability. This paper presents a detailed analysis of the I/O workload generated by Stable Diffusion when accessing storage devices, specifically NVMe-oF drives. The study explores various I/O patterns, including read and write operations, and examines how latency, throughput, bandwidth, LBA mappings, and write amplification factor (WAF) influence the performance of generative AI workloads. We show how the I/O pattern affects the WAF of the SSD when multiple user requests are served concurrently. Using containerized Stable Diffusion deployed in an FDP (Flexible Data Placement) enabled environment as a case study, we investigate how different storage configurations affect the efficiency of image generation and reduce the WAF for individual and concurrent user requests. We have developed a tool that provides insights into I/O activity on storage devices. It provides a graphical view of the logical block address (LBA) mapping of I/O hits, block sizes, and a granular view of data access patterns. This enables in-depth I/O analysis, helps identify performance bottlenecks, uncovers latency patterns, and supports optimization across the hardware and software stack.
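As a rough illustration of the WAF bookkeeping behind such a study, the sketch below computes WAF from before/after snapshots of host-write and NAND-write counters; the counter values are hypothetical, and on a real drive the host figure would come from the SMART Data Units Written field while NAND writes would come from a vendor or OCP log page.

    # Minimal WAF bookkeeping sketch (hypothetical counter values).
    def waf(nand_bytes_written, host_bytes_written):
        """Write Amplification Factor = media (NAND) writes / host writes."""
        return nand_bytes_written / host_bytes_written

    # Snapshot counters before and after an image-generation run.
    before       = {"host": 1_200_000_000_000, "nand": 1_500_000_000_000}
    after_single = {"host": 1_210_000_000_000, "nand": 1_512_000_000_000}  # one user
    after_multi  = {"host": 1_250_000_000_000, "nand": 1_580_000_000_000}  # concurrent users

    print("WAF, single user :", waf(after_single["nand"] - before["nand"],
                                    after_single["host"] - before["host"]))
    print("WAF, concurrent  :", waf(after_multi["nand"] - before["nand"],
                                    after_multi["host"] - before["host"]))

With these made-up numbers the concurrent case shows a higher WAF (1.6 vs. 1.2), which is the kind of effect data placement features such as FDP aim to reduce.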
Current SSD devices are mostly built with a 4 KiB transaction unit, or an even larger one for bigger drive capacities. But what if your workload demands high IOPS at smaller granularities? We will take a deep dive into our GNN testing using NVIDIA BaM and the modifications that we made to test transactions smaller than 4 KiB. We will also discuss how this workload is a good example of the need for Storage Next.
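The following sketch, using assumed numbers, shows why sub-4 KiB transactions matter for workloads such as GNN feature gathers: with 512 B features and a 4 KiB transaction unit, most of each read is wasted, and delivering the same useful bandwidth at native 512 B granularity requires roughly eight times the IOPS.

    # Illustrative arithmetic for small-granularity GNN feature gathers.
    LINK_BW       = 14e9    # assumed drive bandwidth, bytes/s (hypothetical)
    feature_bytes = 512     # hypothetical per-node feature size
    io_unit       = 4096    # conventional SSD transaction unit

    reads_per_sec = LINK_BW / io_unit
    useful_bw_4k  = reads_per_sec * feature_bytes
    print(f"4 KiB reads : {reads_per_sec/1e6:.2f} M IOPS, "
          f"useful bandwidth {useful_bw_4k/1e9:.2f} GB/s "
          f"({feature_bytes/io_unit:.1%} of bytes moved are useful)")

    # With native 512 B transactions every byte moved is useful, but the drive
    # must now sustain 8x the IOPS to deliver the same link bandwidth.
    print(f"512 B reads : {LINK_BW/feature_bytes/1e6:.2f} M IOPS needed to saturate the link")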
The rate of change in the structure and capabilities of applications has never been as high as in the last year. There's a huge shift from stockpiling data cheaply to leveraging data to create insight with GenAI and to capitalize on business leads with predictive AI. Excitement and opinions about where storage matters run rampant. Thankfully, we can "follow the data" to pinpoint whether storage performance is critical in the compute node or only in the back end, discern the relative importance of bandwidth and latency, and determine whether the volume and granularity of accesses are suitable for a GPU and what the range of access granularities is. Walking through recent developments in AI apps and their implications will lead to insights that are likely to surprise the audience. There are new opportunities to create innovative solutions to these challenges. The architectures of NAND devices and their controllers may adjust to curtail ballooning power with more efficient data transfers and error checking. IOPS optimizations that will be broadly mandatory in the future may be pulled in to benefit some applications now. New hardware/software codesigns may lead to protocol changes and trade-offs in which computing agents and data structures are best suited to accomplish new goals. Novel software interfaces and infrastructure enable movement, access, and management of data tailored to the specific needs of each application. Come join a fun, refreshing, provocative, and interactive session on the storage implications of this new generation of AI applications!
How do we assess the performance of the AI network and storage infrastructure that is critical to the successful deployment of today's complex AI training and inferencing engines? And is it possible to do this without provisioning racks of GPUs and the associated Capex? This presentation discusses methodologies and considerations for performing such assessments. We look at different topologies, host- and network-side considerations, and metrics. The performance aspects of NICs/SmartNICs, storage offload processing, switches, and interconnects are examined. Benchmarking of AI collective communications over RoCE transport is considered, along with the overall impact on training convergence time and network utilization. The operational aspects of commercial networks include proxies, encapsulations, connection scale, and encryption. We discuss their impact on AI training and inferencing.
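As one example of the kind of metric involved, collective benchmarking tools in the style of nccl-tests typically convert a measured all-reduce time into algorithm bandwidth and bus bandwidth; the sketch below uses that convention, with the message size, completion time, and rank count being hypothetical values.

    # Convert a measured all-reduce completion time into algorithm and bus
    # bandwidth (hypothetical inputs; the 2*(n-1)/n factor is the standard
    # normalization for ring all-reduce traffic).
    def allreduce_bandwidth(message_bytes, seconds, n_ranks):
        algbw = message_bytes / seconds                  # bytes/s delivered per rank
        busbw = algbw * 2 * (n_ranks - 1) / n_ranks      # per-link traffic normalization
        return algbw, busbw

    algbw, busbw = allreduce_bandwidth(1 << 30, 0.012, 16)  # 1 GiB in 12 ms across 16 ranks
    print(f"algbw {algbw/1e9:.1f} GB/s, busbw {busbw/1e9:.1f} GB/s")

Bus bandwidth is the figure usually compared against link rate when judging how well a RoCE fabric is being utilized by collectives.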
The extreme growth of modern AI-model training datasets, as well as the explosion of Gen-AI data output, are both fueling unprecedented levels of data-storage capacity growth in datacenters. Such rapid growth in mass capacity is demanding evolutionary steps in foundational storage technologies to enable higher areal density, optimized data-access interface methodologies, and a highly efficient power/cooling infrastructure. We will explore these evolutionary technologies and take a sneak peek at the future of mass data storage in AI datacenters.