This talk explores fabric-attached, shared, disaggregated, and multi-tier memory architectures designed to meet the rapidly scaling memory demands of XPUs.
We present intelligent memory fabrics powered by smart memory devices capable of in-situ data transformation and reduction (compression, deduplication, and quantization), combined with data protection and dynamic tiering. These capabilities enable AI-native data services such as distributed KV caches for long-context LLM serving, embedding stores for recommender systems, and vector databases for RAG.
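To make one of these reduction steps concrete, the following is a minimal sketch of the kind of in-situ quantization a smart memory device could apply to a KV-cache block before it crosses the fabric. All names, shapes, and the symmetric int8 scheme here are illustrative assumptions, not the talk's actual implementation.

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor int8 quantization: one fp scale, values in [-127, 127].
    (Illustrative scheme; a real device might quantize per-channel or per-block.)"""
    scale = float(np.abs(x).max()) / 127.0 or 1.0  # avoid divide-by-zero on all-zero input
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Reconstruct an fp32 approximation on the consuming XPU."""
    return q.astype(np.float32) * scale

# Toy fp32 "KV-cache block" (hypothetical shape; real caches are per-layer/per-head).
kv = np.random.randn(64, 128).astype(np.float32)
q, scale = quantize_int8(kv)

# Payload shrinks 4x (fp32 -> int8), cutting bytes moved across the fabric.
print(kv.nbytes // q.nbytes)  # prints 4
```

The point of doing this in-situ, on the memory device, is that only the 1-byte-per-element representation ever traverses the AI fabric; the fp32 view is rematerialized at the consumer.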
Our approach addresses the core challenges of XPU far memory by hiding latency, minimizing data movement across the AI fabric, reducing energy consumption, and improving overall system efficiency. We share practical insights from deploying disaggregated, tiered memory in heterogeneous architectures to accelerate AI data pipelines and deliver meaningful performance gains. Finally, we outline an architectural direction aligned with work being done by the SNIA Computational Storage and Smart Data Accelerator working groups to drive scalable adoption of smart memory and storage.