SNIA Developer

Data-Intensive Inference Done Better: Scaling Models and RAG in Limited Memory with SSD Offload

San Tomas + Lawrence

Wed Sep 17 | 1:30pm

Abstract

Enterprises are rushing to adopt AI inference solutions with RAG to solve business problems, but enthusiasm for the technology's potential is outpacing infrastructure readiness. Because of the cost of memory, using more complex models and bigger RAG data sets quickly becomes prohibitively expensive or even impossible. Using open-source software components and high-performance NVMe SSDs, we explore two different but related approaches for solving these challenges and unlocking new levels of scale: offloading model weights to storage using DeepSpeed, and offloading RAG data to storage using DiskANN. By combining these, we can (a) run more complex models on GPUs that previously could not host them, and (b) achieve greater cost efficiency when using large amounts of RAG data. We'll talk through the approach, share benchmarking results, and show a demo of how the solution works in an example use case.
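As a rough illustration of the first approach, the sketch below builds a DeepSpeed-style configuration that places ZeRO stage 3 parameters on an NVMe device. It is not taken from the session: the helper name `build_nvme_offload_config`, the mount point `/mnt/nvme0`, and every numeric value are assumptions chosen for illustration, and the settings the presenters used are not stated in the abstract.

```python
import json


def build_nvme_offload_config(nvme_path: str, micro_batch_size: int = 1) -> dict:
    """Return a DeepSpeed-style config dict with parameters offloaded to NVMe.

    Hypothetical helper for illustration only. The buffer and queue values
    below are placeholders, not tuned or recommended settings.
    """
    return {
        "train_micro_batch_size_per_gpu": micro_batch_size,
        "fp16": {"enabled": True},
        "zero_optimization": {
            "stage": 3,  # ZeRO stage 3 partitions parameters, which enables offload
            "offload_param": {
                "device": "nvme",        # keep parameters on SSD instead of DRAM
                "nvme_path": nvme_path,  # assumed mount point of the NVMe SSD
                "pin_memory": True,
                "buffer_count": 5,
                "buffer_size": 100_000_000,
            },
        },
        # Asynchronous I/O settings used for reads from the NVMe device.
        "aio": {
            "block_size": 1_048_576,
            "queue_depth": 8,
            "thread_count": 1,
            "single_submit": False,
            "overlap_events": True,
        },
    }


if __name__ == "__main__":
    # Print the config as JSON so it could be saved to a file and inspected.
    print(json.dumps(build_nvme_offload_config("/mnt/nvme0"), indent=2))
```

A dictionary like this would typically be passed to DeepSpeed when the model engine is created. Parameters are then read from the SSD as each layer is needed, so runtime depends heavily on SSD read throughput.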

Learning Objectives

The opportunities and challenges associated with AI inference with RAG

The solution stack that enables offload of significant amounts of AI data from memory to SSD

The impact of SSD offload on DRAM usage, QPS, index time, and recall

The results of the SSD offload approach in an example use case (traffic video)
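Two of the metrics named above, recall and QPS, have simple standard definitions. The sketch below shows how they are commonly computed for a vector search benchmark. The function names and sample data are assumptions for illustration and do not come from the session's own benchmark code.

```python
from typing import Sequence


def recall_at_k(
    retrieved: Sequence[Sequence[int]],
    ground_truth: Sequence[Sequence[int]],
    k: int,
) -> float:
    """Fraction of the true top-k neighbors found in the returned top-k,
    averaged over all queries."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = 0
    total = 0
    for got, truth in zip(retrieved, ground_truth):
        truth_k = set(truth[:k])
        hits += len(truth_k.intersection(got[:k]))
        total += len(truth_k)
    return hits / total if total else 0.0


def queries_per_second(num_queries: int, elapsed_seconds: float) -> float:
    """Throughput: completed queries divided by wall-clock time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return num_queries / elapsed_seconds


if __name__ == "__main__":
    # Two queries: the first misses one true neighbor, the second finds all three.
    approx = [[1, 2, 3], [4, 5, 6]]
    exact = [[1, 2, 9], [4, 5, 6]]
    print(recall_at_k(approx, exact, k=3))
    print(queries_per_second(100, 2.0))
```

Recall is measured against an exact (brute-force) search over the same data, so it shows how much accuracy an approximate, disk-resident index gives up in exchange for lower DRAM usage.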

Ace Stryker

Director of Market Development,

Solidigm


Related Sessions

Emerging Technologies

The Information Bank: A Proposal for the Next Class of Storage Solving 50 Year Old Problems Managing Unstructured Data

Computers are pretty stupid. They have no idea what they are storing. Surprisingly, this is the root cause of hundreds of computing problems costing the world billions of dollars every year. We will show that just the "Top 10" issues should justify action.

As a simple example, a contract is more than a Word document. What kind of contract is it? Who and what is it for? Has everyone approved it? Where did you get the template file from? Does an outside counsel have to look at it? Did it go to the customer? Wouldn't you like the contract to come get you just before it expires? And more! This is just one of thousands of unstructured data processes people struggle with every day.

To attack this challenge, we suggest a two-pronged approach. First, we define unstructured data in a way that matches how end users create, reference, communicate, locate, and share information. Second, we propose the creation of a new class of storage specifically designed to understand this new definition and to handle the dynamics of that information. We will present an event-based architecture built on both information and technology event types that drive the information's lifecycle.

We believe this new class of storage has significant benefits and capabilities and could be a fertile area for innovation across the industry. However, these benefits can only be realized through cooperation in the creation of definitions, requirements, protocols, and standards from members of our industry association. Let's get started!

Bruce Thompson

CEO,

In2Bank

Emerging Technologies

Media Objectives for Next-Generation Video Games

The consumer video game software industry is on the cusp of its 50th year, and its revenues continue to grow steadily, in recent years outpacing even those of motion pictures. The role of data storage media is uniquely central to the video game industry, as it is ultimately the canvas upon which game developers paint. Despite the convenience of downloadable content via the Internet, strong demand continues for games on physical storage media, especially from major publishers.

Games from major publishers also require more storage than ever, not only for their distribution, but also for logging data about game play, from high score lists to saved session information and sharing playback videos. But the growth in the industry’s distribution media has slowed, from more than 1000 times over three generations of semiconductor ROM cartridges to hardly more than 150 times over five generations of optical discs. So what’s next?

This session reprises a popular 2019 SDC presentation surveying the significant roles and evolving types of digital storage media that enable this interactive digital art form, updated to include the latest generation of video game systems.

John Carlsen

Principal,

Syncopated Systems

Emerging Technologies

OCP Storage Project Update

This is an update on the activities in the OCP Storage Project.

Ross Stenfort

Hardware Systems Engineer,

Emerging Technologies

Chiplets, UCIe, Persistent Memory, and Heterogeneous Integration: The Processor Chip of the Future!

Chiplets have become a near-overnight success with today’s rapid-fire data center conversion to AI. But today’s integration of HBM DRAM with multiple SOC chiplets is only the very beginning of a larger trend in which multiple incompatible technologies will adopt heterogeneous integration to connect new memory technologies with advanced logic chips, providing both significant energy savings and vastly improved performance at a reduced price point. In this presentation, analysts Tom Coughlin and Jim Handy will explain how memory technologies like MRAM, ReRAM, FRAM, and even PCM will eventually displace the DRAM HBM stacks used with xPUs, on-chip NOR flash and SRAM, and even NAND flash in many applications. They will explain how DRAM’s refresh mechanism and the energy-hogging writes of NAND and NOR flash will give way to much cooler memories that will be easier to integrate within the processor’s package, how processor die sizes will dramatically shrink as new memory technologies replace on-chip NOR and SRAM, and how the UCIe interface will allow these memories to compete to bring down overall costs. They will also show how the approach will reduce not only the purchase price per teraflop but also the energy cost per teraflop.

SNIA Developer Conference September 28-30, 2026

SDC 2026 is brought to you by SNIA. SNIA is an industry association committed to its mission of worldwide leadership developing and promoting architectures, standards, education and vendor-neutral collaboration.