Magic Memory
Disrupting the GPU Hegemony: Can Smart Memory and Storage Redefine AI Infrastructure?
AI infrastructure is dominated by GPUs, but should it be? As foundation model inference scales, performance bottlenecks are shifting away from compute and toward memory and I/O. HBM sits underutilized, KV cache footprints explode with context length and batch size, and model transfer times dominate pipeline latency. Meanwhile, compression, CXL fabrics, computational memory, and SmartNIC-enabled storage are emerging as powerful levers for closing the tokens-per-second-per-watt gap. This panel assembles voices from across the AI hardware and software stack to ask the hard question: can memory and storage innovation disrupt the GPU-centric status quo, or is AI infrastructure destined to remain homogeneous?
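To make the KV cache claim concrete, here is a minimal sizing sketch in Python. The model shape (80 layers, 8 grouped-query KV heads, head dimension 128, fp16) is an assumed Llama-70B-class configuration chosen for illustration, not a figure from the panelists:

    def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                       seq_len: int, batch: int, bytes_per_elem: int = 2) -> int:
        """Bytes needed to hold keys and values for one full context."""
        # Factor of 2 covers the separate K and V tensors in each layer.
        return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_elem

    # Assumed 70B-class shape: 32k-token context, 8 concurrent requests, fp16.
    gib = kv_cache_bytes(layers=80, kv_heads=8, head_dim=128,
                         seq_len=32_768, batch=8) / 2**30
    print(f"KV cache: {gib:.1f} GiB")  # ~80 GiB, roughly one H100's entire HBM

Eight moderately long requests already consume an accelerator's worth of HBM before a single weight is loaded, which is why tiering the cache out to CXL or storage is on the table.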
You'll hear from a computational HBM vendor (Numem), an AI accelerator startup (Recogni), a compression IP company (MaxLinear), a foundation model provider (Zyphra), and a cloud-scale storage architect (Solidigm). Together, they'll explore:

- Why decode-heavy inference is choking accelerators, even with massive FLOPs (see the sketch below)
- Whether inline decompression and memory tiering can fix HBM underutilization
- How model developers should (or shouldn't) design for memory-aware inference
- Whether chiplet- and UCIe-based systems can reset the balance of power in AI

Expect live debate, real benchmark data, and cross-layer perspectives on a topic that will define AI system economics in the coming decade. If you care about performance-per-watt, memory bottlenecks, or building sustainable AI infrastructure, don't miss this conversation.
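As background for the first question above, here is a minimal roofline-style estimate of decode throughput, again in Python. The hardware figures (70B fp16 parameters, 3.35 TB/s HBM bandwidth, 989 Tflop/s fp16 peak) are assumed H100-class numbers for illustration, not benchmark data from the panel:

    # During decode, each generated token reads every weight once, so per-request
    # throughput is capped by memory bandwidth long before the FLOPs run out.
    params = 70e9                 # assumed parameter count
    bytes_per_param = 2           # fp16 weights
    hbm_bw = 3.35e12              # bytes/s, assumed HBM3 bandwidth
    peak_flops = 989e12           # assumed fp16 dense peak, FLOP/s

    bytes_per_token = params * bytes_per_param
    flops_per_token = 2 * params  # ~2 FLOPs per parameter per token

    print(f"bandwidth-bound: {hbm_bw / bytes_per_token:.0f} tok/s")      # ~24
    print(f"compute-bound:   {peak_flops / flops_per_token:,.0f} tok/s")  # ~7,064

At batch size 1 the gap is nearly 300x: the arithmetic units idle while HBM streams weights, which is exactly the opening the memory and storage approaches on this panel aim to exploit.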