
SNIA Developer Conference September 15-17, 2025 | Santa Clara, CA

Eshcar Hillel
Principal Research Scientist

Pliops

Disaggregated KV Storage: A New Tier for Efficient Scalable LLM Inference


As generative AI models continue to grow in size and complexity, the infrastructure costs of inference—particularly GPU memory and power consumption—have become a limiting factor. This session presents a disaggregated key-value (KV) storage architecture designed to offload KV-cache tensors efficiently, reducing GPU compute pressure while maintaining low-latency, high-throughput inference.
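The abstract stops at the architectural idea, so as a rough illustration of what "offloading KV-cache tensors" can mean in practice, here is a minimal sketch. It is not the talk's implementation: the class, the key layout, and the kv_store client with put(key, bytes) / get(key) calls are all assumed for illustration.

```python
import io
import torch

class KVCacheOffloader:
    """Hypothetical sketch: per-layer key/value tensors for an inactive
    sequence are serialized to a disaggregated KV store, then pulled back
    into GPU memory when the sequence is scheduled again."""

    def __init__(self, kv_store):
        # kv_store is an assumed client exposing put(key, bytes) and
        # get(key) -> bytes, e.g. for a remote NVMe-backed KV service.
        self.kv_store = kv_store

    def offload(self, seq_id: str, layer: int,
                k: torch.Tensor, v: torch.Tensor) -> None:
        # Stack K and V, move them off the GPU, and serialize; releasing
        # this HBM is what relieves pressure on the active batch.
        blob = torch.stack([k, v]).cpu()
        buf = io.BytesIO()
        torch.save(blob, buf)
        self.kv_store.put(f"{seq_id}/{layer}", buf.getvalue())

    def restore(self, seq_id: str, layer: int, device: str = "cuda"):
        # Fetch and deserialize, then copy back to the GPU so decoding can
        # resume without recomputing the prefill for this sequence.
        buf = io.BytesIO(self.kv_store.get(f"{seq_id}/{layer}"))
        k, v = torch.load(buf).to(device)
        return k, v
```

In a real system the serialization and transfer would be pipelined and block-granular rather than per-layer blobs, but the sketch captures the core trade: trading a network/storage round trip for reclaimed GPU memory.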
