Disaggregated KV Storage: A New Tier for Efficient, Scalable LLM Inference
As generative AI models continue to grow in size and complexity, the infrastructure costs of inference, particularly GPU memory and power consumption, have become a limiting factor. This session presents a disaggregated key-value (KV) storage architecture designed to offload KV-cache tensors efficiently, reducing pressure on GPU memory and compute while maintaining low-latency, high-throughput inference.
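To make the offloading idea concrete, below is a minimal sketch of a prefix-keyed external KV store in PyTorch. It is an illustration only; the names (ExternalKVStore, prefix_key, put/get) are hypothetical placeholders and do not reflect the API of the system presented in this session.

import hashlib
from typing import Optional

import torch


def prefix_key(token_ids: list[int], layer: int) -> str:
    """Content-address a KV block by the token prefix it covers and its layer."""
    digest = hashlib.sha256(repr(token_ids).encode("utf-8")).hexdigest()
    return f"{digest}:{layer}"


class ExternalKVStore:
    """Stand-in for a shared, disaggregated KV tier (here just a host-memory dict)."""

    def __init__(self) -> None:
        self._blocks: dict[str, torch.Tensor] = {}

    def put(self, key: str, kv_block: torch.Tensor) -> None:
        # Offload: move the block off the GPU so its HBM can be reclaimed.
        self._blocks[key] = kv_block.detach().to("cpu")

    def get(self, key: str, device: str) -> Optional[torch.Tensor]:
        # On a prefix hit, reload the cached KV instead of recomputing its prefill.
        block = self._blocks.get(key)
        return None if block is None else block.to(device)


if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    store = ExternalKVStore()
    tokens = [101, 7592, 2088]                               # toy prompt prefix
    kv = torch.randn(2, 8, len(tokens), 64, device=device)   # (K/V, heads, seq, head_dim)
    store.put(prefix_key(tokens, layer=0), kv)
    reused = store.get(prefix_key(tokens, layer=0), device)
    assert reused is not None and reused.shape == kv.shape

The key point is that a KV block computed once during prefill can be stored outside the GPU and restored for any later request sharing the same prefix, trading a data transfer for repeated prefill computation.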
We introduce the first end-to-end, shared-storage system for KV-cache offloading. It integrates with production-scale orchestration frameworks such as Dynamo and Production Stack, enabling scalable deployment across distributed GPU clusters. We provide both theoretical analysis and empirical evaluation, comparing our approach against state-of-the-art inference engines such as vLLM.
Our benchmarks demonstrate 5–8× higher request throughput and 5–7× lower prefill latency compared to baseline systems. Experiments cover a range of GPU types and LLMs, including DeepSeek-V3, and simulate diverse use cases such as multi-turn conversations, long-context generation, and agentic workloads.
Traditional block and file storage systems are not optimized for the fine-grained, high-frequency access patterns of LLM workloads. Our stateless external KV store, by contrast, enables direct GPU-initiated I/O and overlaps compute with data access, improving efficiency at the infrastructure level.
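As a rough illustration of the compute/transfer overlap, the hypothetical sketch below prefetches the next KV block on a side CUDA stream while the current block is consumed. It uses a pinned-host-memory stand-in rather than the direct GPU-initiated I/O path discussed in the session, and the attention step is a placeholder.

import torch


def run_layers(q: torch.Tensor, host_kv_blocks: list[torch.Tensor]) -> torch.Tensor:
    """Apply a stand-in per-block op while prefetching the next KV block."""
    device = q.device
    if device.type != "cuda":
        # CPU fallback: no streams, just run sequentially.
        for kv in host_kv_blocks:
            q = q + kv.mean(dim=0, keepdim=True)
        return q

    copy_stream = torch.cuda.Stream(device=device)
    with torch.cuda.stream(copy_stream):
        next_kv = host_kv_blocks[0].to(device, non_blocking=True)  # prefetch block 0

    for i in range(len(host_kv_blocks)):
        # Make the compute stream wait until block i has landed on the GPU.
        torch.cuda.current_stream(device).wait_stream(copy_stream)
        kv = next_kv
        if i + 1 < len(host_kv_blocks):
            # Overlap: start copying block i+1 while block i is being consumed.
            with torch.cuda.stream(copy_stream):
                next_kv = host_kv_blocks[i + 1].to(device, non_blocking=True)
        # Stand-in for the real attention kernel over (q, kv). Production code
        # would also call kv.record_stream(...) to guard against allocator reuse.
        q = q + kv.mean(dim=0, keepdim=True)
    return q


if __name__ == "__main__":
    use_cuda = torch.cuda.is_available()
    dev = "cuda" if use_cuda else "cpu"
    blocks = [torch.randn(16, 64).pin_memory() if use_cuda else torch.randn(16, 64)
              for _ in range(4)]
    out = run_layers(torch.randn(1, 64, device=dev), blocks)
    print(out.shape)  # torch.Size([1, 64])

Keeping the copy engine busy while compute kernels run is what lets data movement hide behind computation instead of extending the critical path.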
This session will provide technical insights into system design, performance characteristics, and practical deployment lessons. It is intended for engineers, system architects, and infrastructure practitioners seeking storage-centric approaches to improving the efficiency and elasticity of LLM inference at scale.