SNIA Developer Conference September 15-17, 2025 | Santa Clara, CA
Salon V
Mon Sep 18 | 1:30pm
Emerging deep learning/machine learning and cloud-native applications at data center scale demand terabytes of data flowing across the storage/memory hierarchy, straining interconnect bandwidth and component capacities. The industry has responded with a wide range of solutions, such as process node shrinks, higher-capacity devices, new tiers, innovative form factors, new interconnect technologies and fabrics, new types of compute architectures, and new algorithms that creatively leverage storage/memory tiering.
New paradigms such as computational storage and memory accelerator offloads are under intense exploration to process data where it resides and so reduce the movement of exponentially growing data. At the same time, progress has hit the proverbial wall: practical hurdles limit scalability at every level of the memory hierarchy. On-die SRAM scaling seems to have stalled completely going from 5nm to 3nm, limiting processor IPC (instructions per cycle). Growth in main memory bandwidth per processor core has slowed dramatically compared to the growth of compute "FLOPs". New memory tiers like CXL memory dramatically increase capacity per core, but at the expense of latency and the need for all-new infrastructure. QLC SSDs provide terabytes of capacity in a single device, but are limited by endurance and overprovisioning requirements. Staying within established power, thermal and cost budgets at each level of the hierarchy and at the system envelope level is critical to ease new technology introductions.
To address these challenges, data center customers, component manufacturers and researchers alike are investigating, or have already implemented, several innovations at various levels of the hierarchy, such as lossless compression, to increase capacity, enhance effective bandwidth and stay within cost and power budgets. Compression requires more than an algorithmic implementation: compaction, management and software compatibility are critical considerations for wide deployment at scale.
One size does not fit all: choices need to be made between various industry-standard and proprietary algorithms, operating at varying granularities: cache line, page or file. CXL memory-semantic SSDs are emerging; there, compression technology requires integration with CXL.io and CXL.mem semantics, and dynamic capacity has to be addressed. Offload accelerators are now available within several platform ingredients, but choices need to be made carefully between processor-integrated accelerators, cores on SmartNICs ("DPUs", "IPUs"), IP/firmware integrated into SSD and CXL controllers/switches, "AFUs" (Accelerator Functional Units) on board specialized FPGAs, and purely software offloads.
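The granularity trade-off mentioned above can be illustrated with a minimal sketch. It is not taken from the panel: it assumes zlib as a stand-in for whatever algorithm a real device or accelerator would use, 64-byte and 4096-byte chunks as stand-ins for cache-line and page granularity, and synthetic, highly repetitive data. The helper name `compressed_size` is hypothetical.

```python
import zlib

def compressed_size(data: bytes, chunk_size: int) -> int:
    """Total size after compressing each chunk_size-byte chunk independently."""
    return sum(
        len(zlib.compress(data[i:i + chunk_size]))
        for i in range(0, len(data), chunk_size)
    )

# Synthetic sample data (an assumption, not a real workload).
record = b"sensor_id=0042,temp=21.5,status=OK;"
data = record * 4096

# Assumed granularities: cache line, page, and the whole "file".
granularities = {"cache line": 64, "page": 4096, "file": len(data)}

sizes = {name: compressed_size(data, size) for name, size in granularities.items()}
for name, size in sizes.items():
    print(f"{name:>10}: {len(data)} -> {size} bytes "
          f"(ratio {len(data) / size:.1f}x)")
```

On data like this, smaller chunks give the compressor less history to exploit and pay a fixed per-chunk overhead, so the ratio improves as the granularity grows. Finer granularity, on the other hand, allows small random reads without decompressing a large block, which is one reason the choice depends on the tier.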
In this panel session, we will explore the need, opportunities, challenges and implications of emerging data compression techniques and accelerators associated with storage and memory technologies through the diverse viewpoints of ecosystem participants, including an SoC architect, technologists in the storage/memory device and controller space, an academic researcher in the storage and systems domain, and a hardware IP provider. We will simulate the type of discussion that typically takes place between technologists, architects and end customers to meet design and TCO requirements, as well as requirements to integrate into existing kernel and application software stacks. Attendees will have an opportunity to ask questions of the panel and share their collective industry/research insights.
The consumer video game software industry is on the cusp of its 50th year, and its revenues continue to grow steadily, in recent years outpacing even those of motion pictures. The role of data storage media is uniquely central to the video game industry, as it is ultimately the canvas upon which game developers paint. Despite the convenience of downloadable content via the Internet, strong demand continues for games on physical storage media, especially from major publishers.
Games from major publishers also require more storage than ever, not only for their distribution, but also for logging data about gameplay, from high score lists to saved session information and shared playback videos. But the growth in capacity of the industry’s distribution media has slowed, from more than 1000 times over three generations of semiconductor ROM cartridges to hardly more than 150 times over five generations of optical discs. So what’s next?
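To put the slowdown in per-generation terms, the figures above can be converted to an average growth factor per generation. This is a rough sketch that assumes the total growth is spread evenly (geometrically) across the stated number of generations; the function name is hypothetical, and the inputs are the rounded figures quoted above rather than exact capacities. Under that assumption, cartridges grew by roughly 10x per generation and optical discs by roughly 2.7x.

```python
def per_generation_factor(total_growth: float, generations: int) -> float:
    """Geometric-mean growth per generation, assuming evenly spread growth."""
    return total_growth ** (1.0 / generations)

# Rounded figures quoted in the abstract.
cartridge = per_generation_factor(1000, 3)  # semiconductor ROM cartridges
optical = per_generation_factor(150, 5)     # optical discs

print(f"ROM cartridges: ~{cartridge:.1f}x per generation")
print(f"Optical discs:  ~{optical:.1f}x per generation")
```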
This session reprises a popular 2019 SDC presentation surveying the significant roles and evolving types of digital storage media that enable this interactive digital art form, updated to include the latest generation of video game systems.
This is an update on the activities in the OCP Storage Project.
Enterprises are rushing to adopt AI inference solutions with RAG to solve business problems, but enthusiasm for the technology's potential is outpacing infrastructure readiness. Using more complex models and bigger RAG data sets quickly becomes prohibitively expensive, or even impossible, because of the cost of memory. Using open-source software components and high-performance NVMe SSDs, we explore two different but related approaches for solving these challenges and unlocking new levels of scale: offloading model weights to storage using DeepSpeed, and offloading RAG data to storage using DiskANN. By combining these, we can (a) run more complex models on GPUs than was previously possible, and (b) achieve greater cost efficiency when using large amounts of RAG data. We'll talk through the approach, share benchmarking results, and show a demo of how the solution works in an example use case.
Chiplets have become a near-overnight success with today’s rapid-fire data center conversion to AI. But today’s integration of HBM DRAM with multiple SoC chiplets is only the very beginning of a larger trend in which multiple incompatible technologies will adopt heterogeneous integration to connect new memory technologies with advanced logic chips, providing both significant energy savings and vastly improved performance at a reduced price point. In this presentation, analysts Tom Coughlin and Jim Handy will explain how memory technologies like MRAM, ReRAM, FRAM, and even PCM will eventually displace the DRAM HBM stacks used with xPUs, on-chip NOR flash and SRAM, and even NAND flash in many applications. They will explain how DRAM’s refresh mechanism and NAND and NOR flash’s energy-hogging writes will give way to much cooler memories that will be easier to integrate within the processor’s package, how processor die sizes will dramatically shrink through the use of new memory technologies to replace on-chip NOR and SRAM, and how the UCIe interface will allow these memories to compete to bring down overall costs. They will also show how the approach will reduce not only the purchase price per teraflop, but also the energy cost per teraflop.