SNIA Developer Conference September 15-17, 2025 | Santa Clara, CA

Beyond Throughput: Benchmarking Storage for the Complex I/O Patterns of AI with MLPerf Storage and DLIO

Abstract

Training state-of-the-art AI models, including LLMs, creates unprecedented demands on storage systems that go far beyond simple throughput. The I/O patterns in these workloads—characterized by heavy metadata operations, multi-threaded asynchronous I/O, random access, and complex data formats—present a significant bottleneck that traditional benchmarks fail to capture. This disconnect leads to inefficient storage design and procurement for critical AI infrastructure. To address this challenge, the MLPerf Storage Working Group has been developing a comprehensive benchmark suite to realistically model these complex I/O behaviors. In this session, we present a deep dive into this effort, focusing on the DLIO benchmark. We will detail the technical lessons learned from our benchmark development and previous submission cycles, including the critical I/O access patterns we identified in training pipeline stages such as data loading and model checkpointing. Attendees will leave with actionable insights on how to better design and configure storage hardware and software stacks to support AI workloads. We will share our analysis of I/O behavior that directly informs system architecture and demonstrate how to leverage our open-source tools to identify and resolve storage bottlenecks in your own AI environments.
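To make the access pattern described above concrete, the following is a minimal, illustrative Python sketch (not part of the MLPerf Storage or DLIO code; the directory names, file counts, sizes, and thread count are arbitrary assumptions) of the kind of I/O a training pipeline generates: a metadata-heavy dataset of many small files, shuffled multi-threaded reads that mimic a data loader, and a periodic large sequential checkpoint write.

```python
import os
import random
import time
from concurrent.futures import ThreadPoolExecutor

DATA_DIR = "dataset"          # hypothetical dataset directory
CKPT_DIR = "checkpoints"      # hypothetical checkpoint directory
NUM_FILES = 1000              # arbitrary: many small sample files
SAMPLE_SIZE = 256 * 1024      # arbitrary: 256 KiB per sample
CKPT_SIZE = 64 * 1024 * 1024  # arbitrary: 64 MiB checkpoint
NUM_WORKERS = 8               # concurrent reader threads, like a data loader

def generate_dataset():
    """Create many small files -- the metadata-heavy part of the workload."""
    os.makedirs(DATA_DIR, exist_ok=True)
    payload = os.urandom(SAMPLE_SIZE)
    for i in range(NUM_FILES):
        with open(os.path.join(DATA_DIR, f"sample_{i:06d}.bin"), "wb") as f:
            f.write(payload)

def read_sample(path):
    """One shuffled-order sample read: stat, open, read, close."""
    size = os.stat(path).st_size          # metadata operation
    with open(path, "rb") as f:           # another metadata operation
        return len(f.read(size))          # data transfer

def train_epoch(epoch):
    """Random-order, multi-threaded reads followed by a checkpoint write."""
    paths = [os.path.join(DATA_DIR, p) for p in os.listdir(DATA_DIR)]
    random.shuffle(paths)                 # shuffled access defeats readahead
    t0 = time.time()
    with ThreadPoolExecutor(max_workers=NUM_WORKERS) as pool:
        total = sum(pool.map(read_sample, paths))
    read_s = time.time() - t0

    os.makedirs(CKPT_DIR, exist_ok=True)
    t0 = time.time()
    with open(os.path.join(CKPT_DIR, f"ckpt_{epoch}.bin"), "wb") as f:
        f.write(os.urandom(CKPT_SIZE))    # large sequential write burst
        f.flush()
        os.fsync(f.fileno())              # checkpoints are typically durable
    ckpt_s = time.time() - t0
    print(f"epoch {epoch}: read {total} B in {read_s:.2f}s, "
          f"checkpoint in {ckpt_s:.2f}s")

if __name__ == "__main__":
    generate_dataset()
    for epoch in range(3):
        train_epoch(epoch)
```

Even this toy version shows why throughput alone is a poor proxy: the read phase is dominated by per-file metadata operations and small random reads, while the checkpoint phase is a bursty sequential write, and a storage system can do well on one while failing the other.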

Learning Objectives

Identify I/O Bottlenecks: Attendees will be able to identify the specific I/O bottlenecks in AI training workloads that traditional storage benchmarks overlook, such as metadata contention and asynchronous I/O patterns (a minimal profiling sketch follows this list).

Evaluate Storage for AI: Attendees will learn how to use the DLIO benchmark to realistically evaluate and compare storage system performance for AI and LLM workloads.

Optimize System Design: Attendees will understand how to apply insights from MLPerf Storage results to better design and configure storage hardware and software stacks for AI training clusters.
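As a rough illustration of the first objective, one simple way to surface metadata contention, independent of DLIO itself, is to time metadata operations separately from data reads. This is a simplified sketch, not the working group's methodology, and it assumes a directory of sample files such as the one generated above.

```python
import os
import time

def profile_io(data_dir):
    """Split wall time between metadata operations (stat/open) and data
    reads so a metadata-bound storage system stands out clearly."""
    meta_s = 0.0
    read_s = 0.0
    total_bytes = 0
    for name in sorted(os.listdir(data_dir)):
        path = os.path.join(data_dir, name)

        t0 = time.perf_counter()
        size = os.stat(path).st_size       # metadata: stat
        f = open(path, "rb")               # metadata: open
        meta_s += time.perf_counter() - t0

        t0 = time.perf_counter()
        total_bytes += len(f.read(size))   # data transfer
        read_s += time.perf_counter() - t0
        f.close()

    mib = total_bytes / (1024 * 1024)
    print(f"metadata time: {meta_s:.3f}s  read time: {read_s:.3f}s  "
          f"data throughput: {mib / read_s:.1f} MiB/s over {mib:.1f} MiB")

if __name__ == "__main__":
    profile_io("dataset")  # hypothetical directory of small sample files
```

If metadata time rivals or exceeds read time, raw bandwidth numbers from a traditional benchmark will not predict training performance on that system; this is the class of behavior the DLIO benchmark is designed to model at realistic scale.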