SNIA Developer Conference September 15-17, 2025 | Santa Clara, CA
Lafayette
Mon Sep 15 | 1:30pm
DRAM constitutes roughly half of the cost of an individual server, over $100B is spent on it annually, and prices are only rising. The cost of DRAM is a major reason why modern computing tends to be so expensive. For such a costly resource, the expectation would be that all provisioned memory is truly needed and used effectively. In many business environments, however, this is not the case: studies from major cloud providers have shown that DRAM utilization regularly drops to 50% or below. MEXT is tackling the memory utilization challenge. MEXT continually offloads underutilized memory pages to Flash and leverages AI to predict which pages in Flash should be preloaded back into DRAM. This keeps application performance intact within a far smaller DRAM footprint, yielding lower computing costs. The MEXT AI engine was inspired by modern AI techniques based on neural networks, but instead of using these techniques to predict words or natural language patterns (as ChatGPT does), it predicts sequences of future memory page accesses.
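As a rough intuition for the prediction step, the toy C sketch below maintains a first-order transition table over page IDs and prefetches the most frequently observed successor of the current page. It is a hypothetical illustration only: the page IDs, synthetic trace, and table-based predictor are assumptions for this example and do not represent MEXT's neural-network engine.

/* Toy sketch of history-based page-access prediction. Assumption: a simple
 * first-order transition table stands in for the real sequence model. */
#include <stdint.h>
#include <stdio.h>

#define NPAGES 16   /* tiny page-ID space for the example */

static uint32_t transitions[NPAGES][NPAGES]; /* count: page i followed by page j */
static int last_page = -1;

/* Record an observed access so future predictions can learn from it. */
static void observe(int page)
{
    if (last_page >= 0)
        transitions[last_page][page]++;
    last_page = page;
}

/* Return the most likely next page after `page`, or -1 if none seen yet. */
static int predict_next(int page)
{
    int best = -1;
    uint32_t best_count = 0;
    for (int j = 0; j < NPAGES; j++) {
        if (transitions[page][j] > best_count) {
            best_count = transitions[page][j];
            best = j;
        }
    }
    return best;
}

int main(void)
{
    int trace[] = { 1, 4, 7, 1, 4, 7, 1, 4 };  /* synthetic access trace */
    for (size_t i = 0; i < sizeof(trace) / sizeof(trace[0]); i++)
        observe(trace[i]);

    /* A predicted page would be preloaded from Flash back into DRAM
     * before the application touches it. */
    printf("after page 4, prefetch page %d\n", predict_next(4));
    return 0;
}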
This presentation examines the transformative role of Compute Express Link (CXL) technology in addressing the critical challenges of memory scaling within modern computing architectures. As data-intensive applications in graph analytics, machine learning, and recommendation engines continue to demand exponentially increasing memory resources, traditional memory architectures face significant limitations in scalability and efficiency. The presentation provides a comprehensive analysis of current memory architecture challenges, including CPU memory channel limitations and resource utilization inefficiencies, before exploring how CXL's innovative protocol stack offers solutions through its three key components: CXL.io, CXL.cache, and CXL.mem.
We present real data points from an experiment that uses a distributed in-memory database as the workload, running on a clustered platform with CXL as a memory expander, complemented by Samsung Cognos. The experiment shows a 5x capacity increase while meeting the targeted performance/SLA.
These findings suggest that CXL represents a fundamental shift in memory architecture design, offering a viable pathway for meeting the escalating memory demands of next-generation computing applications while maintaining efficiency and cost-effectiveness.
This presentation will explore the benefits of CXL memory pooling and tiering for storage devices. The session will also examine CXL-based storage applications which can be deployed in data centers.
AI infrastructure is dominated by GPUs — but should it be? As foundational model inference scales, performance bottlenecks are shifting away from compute and toward memory and I/O. HBM sits underutilized, KVCache explodes, and model transfer times dominate pipeline latency. Meanwhile, compression, CXL fabrics, computational memory, and SmartNIC-enabled storage are emerging as powerful levers to close the tokens-per-second-per-watt gap. This panel assembles voices from across the AI hardware and software stack to ask the hard question: Can memory and storage innovation disrupt the GPU-centric status quo — or is AI destined to remain homogeneous?
You’ll hear from voices across the stack; potential panelists include a computational HBM vendor (Numem), an AI accelerator startup (Recogni), a compression IP company (MaxLinear), a foundational model provider (Zyphra), and a cloud-scale storage architect (Solidigm). Together, they’ll explore:
- Why decode-heavy inference is choking accelerators, even with massive FLOPs
- Whether inline decompression and memory tiering can fix HBM underutilization
- How model developers should (or shouldn’t) design for memory-aware inference
- Whether chiplet and UCIe-based systems can reset the balance of power in AI
Expect live debate, real benchmark data, and cross-layer perspectives on a topic that will define AI system economics in the coming decade. If you care about performance-per-watt, memory bottlenecks, or building sustainable AI infrastructure, don’t miss this conversation.
Join SDXI TWG chair Shyam Iyer and Editor/Contributor William Moyes to learn what’s new in this SNIA standard for memory-to-memory data movement and acceleration. Learn the key differences from v1.0, how SDXI v1.1 improves extensibility and openness, and the exciting new features added in v1.1. This talk will also briefly discuss software ecosystem enablement and opportunities to engage with the TWG.
Various stages in the RAG pipeline of AI inference involve processing large amounts of data. Specifically, preparing data to create vector embeddings and inserting them into a vector DB requires a large amount of transient memory. Furthermore, the search phase of a RAG pipeline, depending on the size of the index trees, the number of parallel queries, and so on, also increases memory consumption. We observe that peak memory consumption depends on the load the RAG pipeline is under, such as whether vectors are being inserted or updated and other transient, dynamic behaviors. Thus, we find that provisioning local memory to cover peak consumption is inefficient.
To improve the efficiency of the RAG pipeline under the scenarios described, we propose using CXL-based memory to meet the high-memory demands while reducing statically provisioned local memory. Specifically, we explore two approaches: 1) CXL memory pooling: provisioning memory based on dynamic and transient needs to reduce locally attached memory costs; 2) CXL memory tiering: using cheaper, larger-capacity memory to reduce locally attached memory costs. We explore the current state of open-source infrastructure to support both solutions, and show that they can yield significant DRAM cost savings for a minimal tradeoff in performance. Additionally, we comment on potential gaps in open-source infrastructure and discuss ideas to bridge these gaps going forward.
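As one hedged illustration of the tiering approach, the sketch below assumes the CXL expander is exposed by Linux as a CPU-less NUMA node and uses libnuma to place a large, colder allocation (for example, vector index shards) on that node instead of locally attached DRAM. The node ID and buffer size are assumptions for the example, not part of the proposed solution.

/* Minimal tiering sketch. Assumption: the CXL memory expander appears to
 * Linux as a CPU-less NUMA node, here assumed to be node 2.
 * Build: gcc cxl_tier_alloc.c -lnuma -o cxl_tier_alloc */
#include <numa.h>
#include <stdio.h>
#include <string.h>

#define CXL_NODE 2            /* hypothetical node ID for the CXL expander */
#define BUF_SIZE (1UL << 30)  /* 1 GiB working buffer */

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "libnuma not available on this system\n");
        return 1;
    }
    if (CXL_NODE > numa_max_node()) {
        fprintf(stderr, "node %d not present; check your CXL topology\n", CXL_NODE);
        return 1;
    }

    /* Bind the large, colder allocation to the CXL node so local DRAM
     * only needs to be provisioned for the hot working set. */
    void *buf = numa_alloc_onnode(BUF_SIZE, CXL_NODE);
    if (!buf) {
        fprintf(stderr, "numa_alloc_onnode failed\n");
        return 1;
    }

    memset(buf, 0, BUF_SIZE);   /* touch pages so they are actually placed */
    printf("1 GiB placed on NUMA node %d (CXL tier)\n", CXL_NODE);

    numa_free(buf, BUF_SIZE);
    return 0;
}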
SDXI is an emerging standard for a memory data movement and acceleration interface. NVMe is an industry-leading storage access protocol. Memory transfers are integral to storage access, including NVMe: data is transferred by DMA from host memory to device memory or from device memory to host memory. With SDXI as the data mover, data movement is standardized and new transformation (compute) capabilities are enabled. Transparent memory data movement within and across storage nodes remains an active area of optimization for NVM subsystems. Leveraging SDXI as an industry-standard technology for memory data movement and transformation within and across storage nodes is prudent and necessary for storage OEMs. The SNIA SDXI + CS subgroup will present on standardizing data movement within NVMe, leveraging SDXI transformations to manipulate data in flight, and an example flow for transparent data movement across storage nodes.
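To make the descriptor-based data-movement idea concrete, the sketch below defines a simplified, hypothetical copy descriptor with an in-flight transform opcode and a stub submit routine. The field names, layout, and opcodes are assumptions for illustration only and do not reproduce the normative SDXI descriptor format.

/* Illustrative sketch only: a simplified, hypothetical descriptor in the
 * spirit of a descriptor-based data mover. Not the SDXI wire format. */
#include <stdint.h>
#include <stdio.h>

enum dm_opcode {
    DM_OP_COPY = 0,       /* plain memory-to-memory copy */
    DM_OP_COPY_TRANSFORM  /* copy with an in-flight transformation */
};

struct dm_descriptor {
    uint64_t src_addr;    /* source buffer (host or device memory) */
    uint64_t dst_addr;    /* destination buffer (host or device memory) */
    uint32_t length;      /* bytes to move */
    uint16_t opcode;      /* enum dm_opcode */
    uint16_t flags;       /* completion / interrupt hints */
};

/* Stand-in for ringing a doorbell on a real data-mover function. */
static void submit(const struct dm_descriptor *d)
{
    printf("op %u: move %u bytes 0x%llx -> 0x%llx\n",
           (unsigned)d->opcode, (unsigned)d->length,
           (unsigned long long)d->src_addr,
           (unsigned long long)d->dst_addr);
}

int main(void)
{
    struct dm_descriptor d = {
        .src_addr = 0x100000,   /* e.g. host buffer backing an NVMe write */
        .dst_addr = 0x200000,   /* e.g. memory on the target storage node */
        .length   = 4096,
        .opcode   = DM_OP_COPY_TRANSFORM,
        .flags    = 0,
    };
    submit(&d);
    return 0;
}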