SNIA Developer Conference September 15-17, 2025 | Santa Clara, CA
As Citigroup Inc. analysts note, "Enterprise data is expected to continue to grow at over 40% CAGR as AI becomes an incremental driver for data creation, storage, and data management."
Today's AI ecosystem requires fundamental shifts in the requirements of every datacenter infrastructure component. The predominant AI infrastructure strategy currently tends to focus on the most dramatically impactful infrastructure components, namely GPUs, CPUs, and Memory.
Unfortunately, this leaves a major gap in the detailed understanding of the various AI Storage Infrastructure usage models with regard to the TCO-optimized storage tiers and their requirements from a Capacity, Workloads, and Performance perspective.
Our presentation aims to inform and empower the audience of this session with key information and strategies on the AI ecosystem's impact on Mass Storage HDD Workloads, Performance, Capacity, Interface-Type, and the key advanced features that could reshape the future of HDDs and their role in datacenter TCO optimization.
Describe today's overall AI ecosystem strategy.
Define the current major AI infrastructure components and the role they play in the ecosystem.
Discuss the various types of AI usage models and their impact on the various storage tiers.
Understand the AI impact on the HDD Mass Storage tier and how it must change and adapt.
Learn about the upcoming AI workload optimizations impacting HDD storage.
The recent AI explosion is reshaping storage architectures in data centers, where GPU servers increasingly need to access vast amounts of data on network-attached disaggregated storage servers for more scalability and cost-effectiveness. However, conventional CPU-centric servers encounter critical performance and scalability challenges. First, the software mechanisms required to access remote storage over the network consume considerable CPU resources. Second, the datapath between GPU and storage nodes is not optimized, failing to fully leverage the high-speed network and interconnect bandwidth. To address such challenges, DPUs can offer acceleration of infrastructure functions, including networking, storage, and peer-to-peer communications with GPUs.
In this talk, AMD and MangoBoost will present the following:
• A tutorial on the trends in AI, such as Large Language Models (LLMs), larger datasets, and storage-optimized AI frameworks, which drive demand for high-speed storage systems for GPUs.
• An overview of AMD’s GPU systems.
• A discussion on how DPUs can improve GPU system efficiency, specifically in accessing storage servers.
• Case studies of modern LLM AI workloads on an AMD MI300X GPU server using open-source AMD ROCm software, where the MangoBoost DPU with GPU-storage-boost technology fully accelerates Ethernet-based storage server communication directly with the GPUs via NVMe-over-TCP and peer-to-peer transfers, resulting in reduced CPU utilization and improved performance and scalability (a host-side NVMe-over-TCP sketch follows below for reference).
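For context on what the DPU is offloading, the following is a minimal sketch of the conventional, CPU-driven path a host would use to attach a remote namespace over Ethernet with the standard Linux nvme-cli tooling. The target address, subsystem NQN, and module invocation are illustrative placeholders; this is not MangoBoost's GPU-storage-boost implementation, only the baseline path it replaces.

```python
# Minimal sketch: attaching a remote NVMe namespace over TCP from the host CPU,
# using the standard Linux nvme-cli utility. All addresses and names below are
# hypothetical placeholders; the DPU-offloaded path discussed in this talk moves
# this connection handling and the data movement off the host CPU.
import subprocess

TARGET_ADDR = "192.0.2.10"                        # placeholder storage-server IP
TARGET_PORT = "4420"                              # conventional NVMe/TCP service port
SUBSYS_NQN = "nqn.2025-01.io.example:ai-dataset"  # placeholder subsystem NQN


def run(cmd: list) -> str:
    """Run a command and return its stdout, raising on failure."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


def attach_remote_namespace() -> None:
    # Load the NVMe/TCP initiator module (no-op if already loaded).
    run(["modprobe", "nvme-tcp"])

    # Discover subsystems exported by the target (optional but informative).
    print(run(["nvme", "discover", "-t", "tcp", "-a", TARGET_ADDR, "-s", TARGET_PORT]))

    # Connect to the subsystem; a new /dev/nvmeXnY block device appears on success.
    run(["nvme", "connect", "-t", "tcp",
         "-a", TARGET_ADDR, "-s", TARGET_PORT, "-n", SUBSYS_NQN])

    # List NVMe devices so the newly attached namespace can be identified.
    print(run(["nvme", "list"]))


if __name__ == "__main__":
    attach_remote_namespace()
```

On a conventional host, every I/O to that namespace then traverses the kernel NVMe/TCP initiator on the host CPU; the DPU-based approach presented in this talk terminates the storage protocol on the DPU and moves data peer-to-peer with the GPUs instead.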
Join us for an in-depth exploration of how Azure Blob Storage (Azure's object storage service) has innovated and scaled to meet the demands of supercomputer AI training efforts. Using OpenAI as a case study, we will dive into the internal architecture and new capabilities that enable blob storage to deliver tens of terabits per second (Tbps) of consistent throughput across a multi-exabyte (EiB) scale hierarchical namespace. Attendees will gain insights into workload patterns, best practices, and the implementation strategies that make such high performance possible. We will also cover the API and tooling improvements that allow storage users to tap into this performance seamlessly for AI training and HPC workloads.
This session is presented by the engineers who build Blob Storage at Microsoft and is a must-attend for developers and architects looking to understand and leverage the latest advancements in object storage technology to support AI and other high-scale workloads.
- Understanding the architecture and improvements enabling Blob Storage's scalability.
- Best practices for managing large-scale AI training workloads.
- Detailed insights into the "scaled accounts" feature and its implementation.
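As a client-side illustration of the throughput practices described above, the sketch below splits a large blob into byte ranges and downloads them in parallel with the azure-storage-blob Python SDK. The account URL, container, blob name, range size, and worker count are hypothetical placeholders, and the sketch shows only a simple reading pattern, not Azure's internal architecture or a tuned training data loader.

```python
# Minimal sketch: parallel ranged reads of a large blob, a common pattern for
# driving high aggregate read throughput from object storage during AI training.
# Account URL, container, blob name, range size, and concurrency are placeholders.
from concurrent.futures import ThreadPoolExecutor
from azure.storage.blob import BlobClient
from azure.identity import DefaultAzureCredential

ACCOUNT_URL = "https://<account>.blob.core.windows.net"  # placeholder account
CONTAINER = "training-data"                              # placeholder container
BLOB_NAME = "shards/shard-000.bin"                       # placeholder blob
RANGE_BYTES = 64 * 1024 * 1024                           # 64 MiB per request (illustrative)
WORKERS = 16                                             # parallel range readers (illustrative)


def read_range(client: BlobClient, offset: int, length: int) -> bytes:
    """Download one byte range of the blob."""
    return client.download_blob(offset=offset, length=length).readall()


def read_blob_in_parallel() -> int:
    client = BlobClient(
        account_url=ACCOUNT_URL,
        container_name=CONTAINER,
        blob_name=BLOB_NAME,
        credential=DefaultAzureCredential(),
    )
    size = client.get_blob_properties().size

    # Issue many concurrent range requests; keeping requests in flight lets the
    # service spread the load across its backend partitions.
    offsets = range(0, size, RANGE_BYTES)
    with ThreadPoolExecutor(max_workers=WORKERS) as pool:
        chunks = pool.map(
            lambda off: read_range(client, off, min(RANGE_BYTES, size - off)),
            offsets,
        )
        return sum(len(chunk) for chunk in chunks)


if __name__ == "__main__":
    print(f"read {read_blob_in_parallel()} bytes")
```

A similar effect can be had with the SDK's built-in max_concurrency option on download_blob or with tools such as AzCopy; the point is simply that sustained high throughput comes from keeping many parallel requests in flight rather than from any single large transfer.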
The SNIA TC AI Taskforce is working on a paper on AI workloads and the storage requirements for those workloads. This presentation will be an introduction to the key AI workloads with a description of how they use the data transports and storage systems. This is intended to be a foundational level presentation that will give participants a basic working knowledge of the subject.
HDDs have been the traditional hardware infrastructure for object stores like S3, Google Cloud, and Azure Blob in data lakes. But as AI solution deployment transitions to production scale in organizations (Meta's Tectonic-Shift platform being a good example), it begins to impose demands on the data storage ingestion pipeline that have not been seen before. With Deep Learning Recommendation Model (DLRM) training as an AI use case, we first introduce the challenges object stores can expect to face as AI deployments scale. These include the growth in the scale of available data, the growth of faster training GPUs, and the growth in AI/ML ops deployment. We then explain how flash storage is well positioned to meet the bandwidth and power needs of these systems. We will share key observations from storage trace analysis of a few MLPerf DLRM preprocessing and training captures. We will conclude with a call to action for more work on standardizing benchmarks that characterize data ingestion performance and power efficiency.
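To make the bandwidth pressure concrete, here is a small back-of-envelope sketch of the sustained read bandwidth a DLRM-style training cluster demands from its ingestion tier, and the device counts that bandwidth implies for HDD versus flash. Every number in it is a hypothetical placeholder chosen only to show the arithmetic, not a measurement from the MLPerf traces discussed in this talk.

```python
# Back-of-envelope sketch: sustained ingest bandwidth needed to keep a
# DLRM-style training cluster fed, and the device counts it implies.
# All figures are hypothetical placeholders for illustration only.
import math

SAMPLES_PER_SEC_PER_GPU = 300_000   # placeholder DLRM training throughput per GPU
BYTES_PER_SAMPLE = 500              # placeholder encoded sample size (dense + sparse features)
NUM_GPUS = 128                      # placeholder cluster size

HDD_SUSTAINED_MBPS = 250            # placeholder large-block sequential HDD throughput
SSD_SUSTAINED_MBPS = 6_000          # placeholder NVMe flash throughput


def required_ingest_gbps() -> float:
    """Sustained read bandwidth (GB/s) needed so GPUs never stall on input data."""
    bytes_per_sec = SAMPLES_PER_SEC_PER_GPU * BYTES_PER_SAMPLE * NUM_GPUS
    return bytes_per_sec / 1e9


def devices_needed(device_mbps: float, ingest_gbps: float) -> int:
    """Devices required to sustain the ingest bandwidth (bandwidth-only view)."""
    return math.ceil(ingest_gbps * 1e3 / device_mbps)


if __name__ == "__main__":
    need = required_ingest_gbps()
    print(f"required ingest bandwidth: {need:.1f} GB/s")
    print(f"HDDs needed (bandwidth only): {devices_needed(HDD_SUSTAINED_MBPS, need)}")
    print(f"SSDs needed (bandwidth only): {devices_needed(SSD_SUSTAINED_MBPS, need)}")
```

The arithmetic illustrates the shift the abstract describes: once GPU throughput and cluster size grow, the ingestion tier is increasingly sized by delivered bandwidth (and by power per delivered GB/s) rather than by capacity alone, which is where flash gains its advantage.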