Assessing AI Storage Communication Performance At Scale

Submitted by Anonymous (not verified) on

How do we assess the performance of the AI network and storage infrastructure that is critical to the successful deployment of today's complex AI training and inferencing engines? And is it possible to do this without provisioning racks of expensive GPUs? This presentation discusses methodologies and considerations in performing such assessments. We look at different topologies, host-side and network-side considerations, and metrics. The performance aspects of NICs/SmartNICs, storage offload processing, switches, and interconnects are examined.

Advancing the AI Factory Sustainability

ChatGPT marked a watershed moment for AI, triggering a tectonic shift in IT infrastructure and a race to make extraordinary, lasting commitments to the AI Factory. Governments and enterprises alike are making enormous capital and people investments so as not to be left behind by the AI boom. Corporate boardrooms are evaluating purposeful infrastructure plans. What is the best architectural decision: retrofitting, building from scratch, or adopting a wait-and-see approach? This fork in the road has given some infrastructure decision makers pause and has left others in decision paralysis.

Simulating CXL.mem for Fun and Profit

CXL.mem enables hosts to expand their memories beyond individual servers and access memory regions using load and store instructions. In addition, CXL.mem enables memory sharing among its endpoints. Realizing memory sharing requires extending the coherency management protocol beyond individual hosts. Hosts and devices need to track the state of each memory region using individual finite state machines.

Highly Scalable, Masterless, Distributed Filesystem at Rubrik

Rubrik is a cybersecurity company protecting mission-critical data for thousands of customers across the globe, including banks, hospitals, and government agencies. SDFS is the filesystem that powers the data path and makes this possible. In this talk, we will discuss the challenges of building a masterless distributed filesystem that supports data resilience, strong data integrity, and high performance, and that can run across a wide spectrum of hardware configurations, including cloud platforms.

Cloud Storage Considerations for Retrieval Augmented Generation (RAG) in AI Applications

Data enhances foundational LLMs (e.g., GPT-4, Mistral Large, and Llama 2) for context-aware outputs. In this session, we'll cover using unstructured, multi-modal data (e.g., PDFs, images, or videos) in retrieval augmented generation (RAG) systems and learn how cloud object storage can be an ideal file system for LLM-based applications that transform and use domain-specific data, store user context, and much more.

Disrupting the GPU Hegemony: Can Smart Memory and Storage Redefine AI Infrastructure?

AI infrastructure is dominated by GPUs — but should it be? As foundational model inference scales, performance bottlenecks are shifting away from compute and toward memory and I/O. HBM sits underutilized, KVCache explodes, and model transfer times dominate pipeline latency. Meanwhile, compression, CXL fabrics, computational memory, and SmartNIC-enabled storage are emerging as powerful levers to close the tokens-per-second-per-watt gap.

Rethinking Storage for the AI/ML Era: Disaggregation Powered with FDP

Generative AI models, such as Stable Diffusion, have revolutionized the field of AI by enabling the generation of images from textual prompts. These models impose significant computational and storage demands in HPC environments. The I/O workload generated during image generation is a critical factor affecting overall performance and scalability. This paper presents a detailed analysis of the I/O workload generated by Stable Diffusion when accessing storage devices, specifically NVMe-oF drives.

AI Driven Mass-Storage Evolution

The extreme growth in modern AI-model training datasets and the explosion of Gen-AI data output are both fueling unprecedented levels of data-storage capacity growth in datacenters. Such rapid growth in mass capacity demands evolutionary steps in foundational storage technologies to enable higher areal density, optimized data-access interface methodologies, and highly efficient power/cooling infrastructure. We will explore these evolutionary technologies and take a sneak peek at the future of mass data storage in AI datacenters.

SMR and HAMR Advancing HDD Areal Density

This talk reflects on 18 years of SMR evolution, covering physical layouts, filesystems, garbage collection algorithms, device drivers, and simulators. The talk will also discuss how SMR disks integrate with data storage solutions like RAID and deduplication, including real-world use cases of SMR disks by hyperscalers.

We will also discuss how SMR and HAMR technology interact in the context of AI workloads to provide intriguing new possibilities for HDDs.

OFA Sunfish: New Applications for Distributed Storage with SNIA Swordfish®

The OpenFabrics Alliance (OFA), together with its partners DMTF, SNIA, and the CXL Consortium, is continuing development of Sunfish, an open-source composable computing system framework, to provide a unified set of tools to control and monitor both computing resources and multiple network fabric types. The Sunfish workgroup has demonstrated management of disaggregated memory systems based on CXL and of compute accelerators such as GPUs. We are now expanding our scope to encompass the management of scalable fabric-attached storage resources with SNIA Swordfish®.