SNIA Developer Conference September 15-17, 2025 | Santa Clara, CA
Rapidly increasing data sizes, the high cost of data movement, and the advent of fast NVMe-over-Fabrics based flash enclosures have led to the exploration of computation near flash for more efficient and economical storage solutions. Ordered key-value stores, commonly implemented as library code that runs inside application processes, such as LevelDB and RocksDB, are one of many storage functions that can potentially benefit from offloaded processing. Traditional host-managed key-value stores often exhibit long processing delays when their background worker threads cannot sort data as fast as a foreground writer application can write it, due to the large amount of data that moves between the host and storage during the sort. Offloading key-value store computation to storage allows those data-intensive background tasks to be deferred and performed asynchronously on storage rather than on the host, which better hides background-work latency and prevents it from blocking foreground writes. Offloaded key-value stores are also interesting because the key-value interface itself provides sufficient knowledge of the data without requiring external metadata, leaving room for building more types of indexes, such as secondary indexes and histograms.

In this talk, we present KV-CSD, a research collaboration between SK hynix and Los Alamos National Lab (LANL) that explores the lab's next-generation performance-tier storage designs. A KV-CSD is a key-value-based computational storage device consisting of a ZNS NVMe SSD and a System-on-a-Chip (SoC) that implements an ordered key-value store atop the SSD. It supports insertion, deletion, histogram generation, point/range queries over primary keys, and point/range queries over user-defined secondary index keys, a capability often missing from today's popular software key-value stores. We show why computational storage in the form of a hardware-accelerated key-value store is particularly interesting to LANL's simulation-based science workflows, how it fits into LANL's overall storage infrastructure designs, and how we implement KV-CSD to address the bottlenecks scientists experience when high volumes of small data records previously written by a massively parallel simulation are subsequently read back for interactive data analytics with potentially very selective queries against multiple data dimensions.
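To picture the interface the abstract describes (ordered primary keys plus point/range queries over user-defined secondary indexes), here is a minimal in-memory sketch in Python. It is our own illustration, not the KV-CSD API: names like `put`, `range_query`, and `secondary_range_query`, and the particle/energy example, are all hypothetical.

```python
import bisect

class ToyOrderedKV:
    """Toy model of an ordered key-value interface with secondary indexes.
    Purely illustrative: a KV-CSD implements this on an SoC atop a ZNS
    NVMe SSD, and every name here is hypothetical, not the device's API."""

    def __init__(self):
        self._keys = []       # sorted primary keys
        self._values = {}     # primary key -> record
        self._indexes = {}    # index name -> sorted [(secondary key, primary key)]

    def put(self, key, record, secondary=None):
        # Insertion-only sketch; updates and deletes would also have to
        # repair the secondary indexes.
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = record
        for name, skey in (secondary or {}).items():
            bisect.insort(self._indexes.setdefault(name, []), (skey, key))

    def get(self, key):
        # Point query over the primary key.
        return self._values.get(key)

    def range_query(self, lo, hi):
        # Range query over primary keys, inclusive on both ends.
        i = bisect.bisect_left(self._keys, lo)
        j = bisect.bisect_right(self._keys, hi)
        for k in self._keys[i:j]:
            yield k, self._values[k]

    def secondary_range_query(self, name, lo, hi):
        # Range query over a user-defined secondary index.
        idx = self._indexes.get(name, [])
        i = bisect.bisect_left(idx, (lo,))
        while i < len(idx) and idx[i][0] <= hi:
            skey, k = idx[i]
            yield skey, k, self._values[k]
            i += 1

# Example: simulation records keyed by particle ID, with a selective
# analytics query over a secondary "energy" dimension.
store = ToyOrderedKV()
for pid, energy in [("p0001", 3.2), ("p0002", 11.5), ("p0003", 9.7)]:
    store.put(pid, {"energy": energy}, secondary={"energy": energy})
print(list(store.secondary_range_query("energy", 9.0, 12.0)))
# -> [(9.7, 'p0003', {'energy': 9.7}), (11.5, 'p0002', {'energy': 11.5})]
```

On the device, the same sorted structures and index maintenance run on the SoC rather than on the host, which is what lets the selective multi-dimension queries avoid moving bulk data to the host.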
This is an update on the activities in the OCP Storage Project.
Enterprises are rushing to adopt AI inference solutions with retrieval-augmented generation (RAG) to solve business problems, but enthusiasm for the technology's potential is outpacing infrastructure readiness. Because of the cost of memory, it quickly becomes prohibitively expensive, or even impossible, to use more complex models and bigger RAG data sets. Using open-source software components and high-performance NVMe SSDs, we explore two different but related approaches for solving these challenges and unlocking new levels of scale: offloading model weights to storage using DeepSpeed, and offloading RAG data to storage using DiskANN. By combining these, we can (a) run more complex models on GPUs than was previously possible and (b) use large amounts of RAG data more cost-efficiently. We'll talk through the approach, share benchmarking results, and show a demo of how the solution works in an example use case.
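To make the weight-offloading half of this concrete, here is a minimal sketch of what NVMe parameter offload can look like with DeepSpeed's ZeRO stage-3 (ZeRO-Inference) configuration. The model name, the `/local_nvme` path, and the tuning values are placeholders and may well differ from the talk's actual setup; the DiskANN index for the RAG data would be built and queried separately.

```python
# Hedged sketch: DeepSpeed ZeRO stage-3 parameter offload to NVMe.
# MODEL and "/local_nvme" are placeholders, not the talk's configuration.
import deepspeed
import torch
from transformers import AutoModelForCausalLM

MODEL = "meta-llama/Llama-2-70b-hf"        # placeholder model id

ds_config = {
    "train_micro_batch_size_per_gpu": 1,   # required by DeepSpeed; unused for inference
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,                        # partition parameters (ZeRO-3)
        "offload_param": {
            "device": "nvme",              # spill weights to flash instead of DRAM/HBM
            "nvme_path": "/local_nvme",    # mount point of a fast NVMe SSD
            "pin_memory": True,
        },
    },
    "aio": {                               # async I/O tuning for SSD reads
        "block_size": 1048576,
        "queue_depth": 8,
        "single_submit": False,
        "overlap_events": True,
    },
}

# A real deployment of a model this large would load it under
# deepspeed.zero.Init to avoid materializing all weights in host DRAM.
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.float16)
engine, *_ = deepspeed.initialize(model=model, config=ds_config)
engine.module.eval()   # the engine streams layer weights from NVMe as they execute
```

The design point is that the SSD holds the bulk of the parameters and the GPU only ever sees the layers currently executing, trading some I/O latency for a much smaller memory footprint.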
Drawing from recent surveys of the end-user members of the HPC-AI Leadership Organization (HALO), Addison Snell of Intersect360 Research will present the trends, needs, and "satisfaction gaps" for buyers of HPC and AI technologies. The talk will focus primarily on the Storage and Networking modules of the survey, with highlights from other modules (e.g., processors, facilities, cloud) as appropriate. Addison will also provide market context for the total AI and accelerated computing market at the data center level, showing the growth of hyperscale AI, AI-focused clouds, and national sovereign AI data centers relative to the HPC-AI and enterprise segments, whose influence is diminishing in a booming market.
Chiplets have become a near-overnight success with today's rapid-fire data center conversion to AI. But today's integration of HBM DRAM with multiple SoC chiplets is only the very beginning of a larger trend in which multiple incompatible technologies will adopt heterogeneous integration, connecting new memory technologies with advanced logic chips to provide both significant energy savings and vastly improved performance at a reduced price point. In this presentation, analysts Tom Coughlin and Jim Handy will explain how memory technologies like MRAM, ReRAM, FRAM, and even PCM will eventually displace the HBM DRAM stacks used with xPUs, on-chip NOR flash and SRAM, and even NAND flash in many applications. They will explain how DRAM's refresh mechanism and NAND and NOR flash's energy-hogging writes will give way to much cooler memories that are easier to integrate within the processor's package, how processor die sizes will shrink dramatically as new memory technologies replace on-chip NOR and SRAM, and how the UCIe interface will allow these memories to compete to bring down overall costs. They will also show how this approach will not only reduce the purchase price per teraflop but also improve the energy cost per teraflop.