SNIA Developer Conference September 15-17, 2025 | Santa Clara, CA
- Learn about the architecture of CXL memory in Windows
- Learn about the CXL memory usages possible on Windows
- Learn about the CXL memory-related APIs on Windows
This presentation will explore the benefits of CXL memory pooling and tiering for storage devices. The session will also examine CXL-based storage applications that can be deployed in data centers.
AI infrastructure is dominated by GPUs—but should it be? As foundational model inference scales, performance bottlenecks are shifting away from compute and toward memory and I/O. HBM sits underutilized, KVCache explodes, and model transfer times dominate pipeline latency. Meanwhile, compression, CXL fabrics, computational memory, and SmartNIC-enabled storage are emerging as powerful levers to close the tokens-per-second-per-watt gap. This panel assembles voices from across the AI hardware and software stack to ask the hard question: can memory and storage innovation disrupt the GPU-centric status quo—or is AI destined to remain homogeneous? You'll hear from a computational HBM vendor (Numem), an AI accelerator startup (Recogni), a compression IP company (MaxLinear), a foundational model provider (Zyphra), and a cloud-scale storage architect (Solidigm). Together, they'll explore:
- Why decode-heavy inference is choking accelerators—even with massive FLOPs
- Whether inline decompression and memory tiering can fix HBM underutilization
- How model developers should (or shouldn't) design for memory-aware inference
- Whether chiplet and UCIe-based systems can reset the balance of power in AI
Expect live debate, real benchmark data, and cross-layer perspectives on a topic that will define AI system economics in the coming decade. If you care about performance-per-watt, memory bottlenecks, or building sustainable AI infrastructure—don't miss this conversation.
Join SDXI TWG chair Shyam Iyer and Editor/Contributor William Moyes to learn what's new in this SNIA standard for memory-to-memory data movement and acceleration. Learn the key differences from v1.0, how SDXI v1.1 improves extensibility and openness, and the exciting new features added in v1.1. This talk will also briefly discuss software ecosystem enablement and opportunities to engage with the TWG.
Various stages in the RAG pipeline of AI inference process large amounts of data. Specifically, preparing data to create vector embeddings and inserting them into a vector DB consumes a large amount of transient memory. The search phase of a RAG pipeline, depending on the size of the index trees, the number of parallel queries, and similar factors, also increases memory consumption. We observe that peak memory consumption depends on the load the RAG pipeline is under: whether vectors are being inserted or updated, and other such transient, dynamic behaviors. Provisioning local memory to meet this peak is therefore inefficient. To improve the efficiency of the RAG pipeline under these scenarios, we propose using CXL-based memory to meet the peak memory demand while reducing statically provisioned local memory. Specifically, we explore two approaches:
1. CXL memory pooling: provisioning memory based on dynamic and transient needs to reduce locally attached memory costs.
2. CXL memory tiering: using cheaper, larger-capacity memory to reduce locally attached memory costs.
We survey the current state of open-source infrastructure supporting both solutions, and show that they can yield significant DRAM cost savings for a minimal tradeoff in performance. Additionally, we identify gaps in the open-source infrastructure and discuss ideas to bridge them going forward.
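To make the tiering approach concrete, the following is a minimal C sketch of placing a transient RAG buffer on CXL-backed capacity via libnuma, under the assumption that the CXL expander is surfaced to Linux as a CPU-less NUMA node; the node ID 1 is hypothetical and should be verified with numactl -H on the target system.

```c
/* Minimal sketch: place a transient RAG buffer (e.g., embedding
 * staging space) on a CXL-backed NUMA node instead of local DRAM.
 * Assumes the CXL expander appears as CPU-less NUMA node 1
 * (hypothetical node ID); build with -lnuma.
 */
#include <numa.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CXL_NODE 1  /* hypothetical: confirm with `numactl -H` */

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA support not available\n");
        return EXIT_FAILURE;
    }

    size_t len = 1UL << 30;  /* 1 GiB transient embedding buffer */

    /* Bind the allocation to the CXL node; pages fault in on that
     * node, keeping local DRAM free for latency-critical data. */
    void *buf = numa_alloc_onnode(len, CXL_NODE);
    if (!buf) {
        perror("numa_alloc_onnode");
        return EXIT_FAILURE;
    }

    memset(buf, 0, len);  /* touch pages to commit them */
    /* ... build vector embeddings / index structures in buf ... */

    numa_free(buf, len);
    return EXIT_SUCCESS;
}
```

The same mechanism extends to pooling: if a fabric manager hot-adds pooled CXL capacity as an additional node, allocations can be steered there only while the pipeline is under insert or update load.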
SDXI is an emerging standard for a memory data movement and acceleration interface. NVMe is an industry-leading storage access protocol. Memory transfers are integral to storage access, including NVMe: data is transferred by DMA from host memory to device memory or from device memory to host memory. With SDXI as the data mover, data movement is standardized and new transformation (compute) capabilities are enabled. Transparent memory data movement within and across storage nodes remains an active area of optimization for NVM subsystems. Leveraging SDXI as an industry-standard technology within and across storage nodes for memory data movement and transformation is prudent and necessary for storage OEMs. The SNIA SDXI + CS subgroup will present on standardizing data movement within NVMe, leveraging SDXI transformations to manipulate data in flight, and an example flow for transparent data movement across storage nodes.
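For a sense of what such a standardized data-mover interface looks like, below is an illustrative, simplified copy descriptor in the spirit of SDXI's 64-byte descriptors. The field names, widths, and layout are hypothetical simplifications for exposition, not the normative format; consult the SNIA SDXI specification for the real descriptor layout.

```c
/* Illustrative sketch only: an SDXI-style memory-to-memory copy
 * descriptor. Field names and layout are hypothetical; the SNIA
 * SDXI specification defines the normative 64-byte format.
 */
#include <stdint.h>

struct sdxi_like_copy_desc {
    uint16_t opcode;        /* operation, e.g., a DMA-base copy (hypothetical encoding) */
    uint16_t flags;         /* e.g., fence / interrupt-on-completion bits */
    uint32_t xfer_size;     /* bytes to move */
    uint64_t src_addr;      /* source (address-space handle + offset in real SDXI) */
    uint64_t dst_addr;      /* destination */
    uint64_t compl_ptr;     /* completion-status write-back location */
    uint8_t  reserved[32];  /* pad to SDXI's 64-byte descriptor size */
};

_Static_assert(sizeof(struct sdxi_like_copy_desc) == 64,
               "descriptor should match SDXI's 64-byte size");

/* A producer (driver or user-space library) fills a descriptor,
 * places it in a ring shared with the SDXI function, and rings a
 * doorbell; the data mover performs the copy and writes the
 * completion record, with no CPU memcpy involved. */
```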
In this presentation, we will describe the architecture of CXL memory in Windows and the support that will be available. We will cover the possible usages of CXL memory, the RAS workflows, and the developer interfaces available for using CXL memory.
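As a rough sketch of one allocation path available today, assuming Windows exposes CXL-attached capacity as its own NUMA node (the node number below is hypothetical), the existing Win32 NUMA-aware APIs apply; the session covers the actual Windows CXL developer interfaces.

```c
/* Sketch using the documented Win32 NUMA API, under the assumption
 * that CXL memory is surfaced as a distinct NUMA node; node 1 is
 * hypothetical and must be discovered on the target system.
 */
#include <windows.h>
#include <stdio.h>

int main(void)
{
    const DWORD cxlNode = 1;                 /* hypothetical CXL node */
    const SIZE_T len = 256ULL * 1024 * 1024; /* 256 MiB capacity-tier buffer */

    /* Reserve and commit pages with a preferred NUMA node. */
    void *buf = VirtualAllocExNuma(GetCurrentProcess(), NULL, len,
                                   MEM_RESERVE | MEM_COMMIT,
                                   PAGE_READWRITE, cxlNode);
    if (!buf) {
        fprintf(stderr, "VirtualAllocExNuma failed: %lu\n", GetLastError());
        return 1;
    }

    /* ... use buf as capacity-tier memory for cold or bulk data ... */

    VirtualFree(buf, 0, MEM_RELEASE);
    return 0;
}
```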