DiskANN: Storage-Aware Provider Architecture for Vector Search

Winchester

Tue Sep 29 | 3:35pm

Abstract

Vector search systems have traditionally treated storage primarily as a persistence layer for vector datasets and index recovery, while relying on DRAM-resident structures for active query execution. As vector datasets continue to scale beyond practical memory limits, modern NVMe storage provides an opportunity to rethink this model by enabling graph-based approximate nearest neighbor (ANN) indexes to actively participate in the online search path rather than serving only as cold storage or index reload media. With its combination of low latency, high parallelism, density, persistence, and cost efficiency, NVMe has become an increasingly viable medium for scalable storage-aware vector search architectures.

This session explores the evolution of DiskANN toward a storage-aware provider architecture for vector search. Instead of tightly coupling vector, graph, and metadata management to a monolithic ANN implementation, the architecture enables indexing and search logic to interact with provider interfaces for vectors, edge lists, quantization data, and metadata services while preserving storage-aware query behavior and NVMe optimization principles.
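As a rough illustration of the provider idea described above, the following Python sketch separates search logic from storage access. It is a minimal sketch under stated assumptions, not DiskANN's actual API: the names `VectorProvider`, `NeighborProvider`, `InMemoryVectors`, `InMemoryNeighbors`, and `greedy_search` are hypothetical, the in-memory classes stand in for what could be NVMe-backed providers, and quantization and metadata providers are omitted for brevity.

```python
import math
from typing import Dict, List, Protocol, Sequence


class VectorProvider(Protocol):
    """Hypothetical interface: returns the full-precision vector for a node."""
    def get_vector(self, node_id: int) -> Sequence[float]: ...


class NeighborProvider(Protocol):
    """Hypothetical interface: returns the out-edges (edge list) of a node."""
    def get_neighbors(self, node_id: int) -> List[int]: ...


class InMemoryVectors:
    """Stand-in backend; an NVMe-backed provider could expose the same method."""
    def __init__(self, vectors: Dict[int, Sequence[float]]) -> None:
        self._vectors = vectors

    def get_vector(self, node_id: int) -> Sequence[float]:
        return self._vectors[node_id]


class InMemoryNeighbors:
    """Stand-in backend for graph edge lists."""
    def __init__(self, edges: Dict[int, List[int]]) -> None:
        self._edges = edges

    def get_neighbors(self, node_id: int) -> List[int]:
        return self._edges.get(node_id, [])


def greedy_search(
    query: Sequence[float],
    start: int,
    vectors: VectorProvider,
    neighbors: NeighborProvider,
    k: int = 1,
    beam_width: int = 4,
) -> List[int]:
    """Best-first graph search that touches storage only through providers."""
    def dist(node_id: int) -> float:
        return math.dist(query, vectors.get_vector(node_id))

    beam = [(dist(start), start)]  # kept sorted by distance to the query
    seen = {start}
    expanded = set()
    while True:
        # Pick the closest node in the beam that has not been expanded yet.
        frontier = [node for _, node in beam if node not in expanded]
        if not frontier:
            break
        node = frontier[0]
        expanded.add(node)
        for nb in neighbors.get_neighbors(node):
            if nb not in seen:
                seen.add(nb)
                beam.append((dist(nb), nb))
        # Keep only the beam_width closest candidates seen so far.
        beam = sorted(beam)[:beam_width]
    return [node for _, node in beam[:k]]
```

Because `greedy_search` depends only on the two interfaces, swapping the in-memory classes for a backend that reads vectors and edge lists from a different storage tier would not require changes to the search loop in this sketch.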

Particular focus is placed on the continued relevance of NVMe storage for vector search workloads, including graph locality, random-read amplification, tiered storage placement, and cost-efficient scaling beyond DRAM-only architectures. The session also discusses how provider-based integration enables research and production convergence while simplifying adoption across emerging AI and vector database environments.

A proof-of-concept deployment will demonstrate the provider-based architecture operating across multiple storage backends and storage tiers while leveraging NVMe as a foundational component of scalable vector infrastructure. The session will include example indexing workflows, search execution, and integration patterns illustrating how DiskANN can evolve beyond standalone SSD-resident libraries into a flexible storage-aware vector search framework.

Attendees will leave with a practical understanding of storage-aware ANN architecture design, tradeoffs involved in NVMe-based vector indexing, and integration patterns for scalable vector search across modern storage and AI platforms.

Alessandro Goncalves

Storage Solutions Architect

Solidigm

Harsha Vardhan Simhadri

Partner Researcher, Microsoft Azure

Microsoft

Data aware Open Context Engine for AI Agents

Enabling Ultra-High-Scale RAG with Cost-Efficient, All-in-Storage ANNS with High-Capacity SSDs

PageANN: Scalable I/O-Efficient Disk-Based Approximate Nearest Neighbor Search with Disk Page-Aligned Graph

Compressing the LLM Context Window

LPDDR Memory: A New CPU Memory Choice for AI Inference

Storage for AI 104 - A Continuing Intro to Storage for Inference