Scaling RAG with NVMe: DiskANN's Hybrid Approach to Vector Database Indexing
Is it still realistic to rely solely on DRAM for vector index storage when Large Language Models are driving petabyte-scale growth? Traditional in-memory indexing strategies quickly exhaust host memory as vector collections expand.
DiskANN is a graph-based approximate nearest neighbor (ANN) search algorithm developed by Microsoft Research, designed to offload most of the search index to NVMe SSDs. It keeps a compact, compressed representation of the vectors in memory to guide the search, while the graph and full-precision vectors reside on SSD, enabling billion-scale ANN search on a single node without significant performance degradation.
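To make the hybrid idea concrete, here is a minimal conceptual sketch, not DiskANN's actual API or graph traversal: a quantized copy of the vectors in RAM produces cheap approximate distances, and only a short candidate list is read back from disk for exact re-ranking. The file name, float16 "compression" (a stand-in for product quantization), and all sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n = 64, 10_000

full = rng.standard_normal((n, dim)).astype(np.float32)
full.tofile("vectors.f32")                      # full-precision vectors on "SSD"

# In-memory proxy: heavily quantized copy (stand-in for PQ-compressed codes)
compressed = full.astype(np.float16)

def search(query, k=10, rerank=100):
    # Stage 1 (RAM): approximate distances against the compressed vectors
    approx = np.linalg.norm(compressed - query.astype(np.float16), axis=1)
    candidates = np.argpartition(approx, rerank)[:rerank]

    # Stage 2 (disk): fetch only the candidates' full vectors and re-rank
    on_disk = np.memmap("vectors.f32", dtype=np.float32, mode="r",
                        shape=(n, dim))
    exact = np.linalg.norm(on_disk[candidates] - query, axis=1)
    return candidates[np.argsort(exact)[:k]]

q = rng.standard_normal(dim).astype(np.float32)
print(search(q, k=5))
```

The key property this sketch shares with DiskANN is that disk I/O scales with the candidate list (here 100 reads), not with the collection size, which is why SSD-resident indexes can stay within interactive latency budgets.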
This session explores whether offloading parts of the index to NVMe devices—using DiskANN—can reduce the memory footprint without sacrificing query performance. We’ll first examine current vector indexing approaches and sizing challenges, then introduce DiskANN’s hybrid memory-and-SSD architecture. Through real-world use cases, we’ll compare latency, throughput, and resource utilization between pure in-memory indexes and DiskANN-backed indexes.
Attendees will leave with a critical understanding of when and how NVMe-augmented indexing makes sense, plus insights into tuning SSD parameters to sustain high service levels as vector databases continue to scale.