SNIA Developer Conference September 15-17, 2025 | Santa Clara, CA
Rapidly increasing data sizes, the high cost of data movement, and the advent of fast, NVMe-over-Fabrics based flash enclosures have led to the exploration of computation near flash for more efficient and economical storage solutions. Ordered key-value stores such as LevelDB and RocksDB, commonly developed as software libraries that run inside application processes, are one of many storage functions that can potentially benefit from offloaded processing. Traditional host-managed key-value stores often exhibit long processing delays when their background worker threads cannot sort data as fast as a foreground writer application can write it, due to the large amount of data moved between the host and storage during the sort. Offloading key-value store computation to storage is interesting because it allows those data-intensive background tasks to be deferred and performed asynchronously on storage rather than on the host; this better hides background-work latency and prevents it from blocking foreground writes. Offloaded key-value stores are also interesting because the key-value interface itself provides sufficient knowledge of the data without requiring external metadata, leaving room for building additional kinds of indexes such as secondary indexes and histograms. In this talk, we present KV-CSD, a research collaboration between SK hynix and Los Alamos National Laboratory (LANL) that explores the lab's next-generation performance-tier storage designs. A KV-CSD is a key-value based computational storage device consisting of a ZNS NVMe SSD and a System-on-a-Chip (SoC) that implements an ordered key-value store atop the SSD. It supports insertion, deletion, histogram generation, point/range queries over primary keys, and point/range queries over user-defined secondary index keys, a capability that is often missing from today's popular software key-value stores. We show why computational storage in the form of a hardware-accelerated key-value store is particularly interesting to LANL's simulation-based science workflows, how it fits into LANL's overall storage infrastructure designs, and how we implement KV-CSD to address the bottlenecks scientists experience when high volumes of small data records, previously written by a massively parallel simulation, are later read for interactive data analytics with potentially very selective queries against multiple data dimensions.
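To make the interface concrete, below is a minimal, in-memory sketch of an ordered key-value store with a user-defined secondary index. The names and signatures (OrderedKVStore, put, get, range, secondary_range) are hypothetical illustrations, not the actual KV-CSD API; the device itself implements comparable functionality in hardware and firmware atop a ZNS SSD.

```python
import bisect

class OrderedKVStore:
    """Toy in-memory model of an ordered KV store with an optional
    user-defined secondary index (illustrative only, not KV-CSD's API)."""

    def __init__(self, secondary_key=None):
        self._keys = []               # sorted primary keys
        self._data = {}               # primary key -> value
        self._secondary_key = secondary_key
        self._sec_index = []          # sorted (secondary key, primary key) pairs

    def put(self, key, value):
        if key in self._data:
            if self._secondary_key:
                # drop the stale secondary-index entry for the old value
                old = (self._secondary_key(self._data[key]), key)
                del self._sec_index[bisect.bisect_left(self._sec_index, old)]
        else:
            bisect.insort(self._keys, key)
        self._data[key] = value
        if self._secondary_key:
            bisect.insort(self._sec_index, (self._secondary_key(value), key))

    def get(self, key):
        """Point query over the primary key."""
        return self._data.get(key)

    def range(self, lo, hi):
        """Range query over primary keys: yield (key, value) for lo <= key <= hi."""
        i = bisect.bisect_left(self._keys, lo)
        j = bisect.bisect_right(self._keys, hi)
        for k in self._keys[i:j]:
            yield k, self._data[k]

    def secondary_range(self, lo, hi):
        """Range query over the user-defined secondary index."""
        i = bisect.bisect_left(self._sec_index, (lo,))
        for sec, k in self._sec_index[i:]:
            if sec > hi:
                break
            yield k, self._data[k]

# Hypothetical usage: index simulation records by a data dimension ("energy")
# so a very selective analytics query does not scan the whole dataset.
store = OrderedKVStore(secondary_key=lambda v: v["energy"])
store.put(b"particle-0001", {"energy": 4.2, "x": 0.1})
store.put(b"particle-0002", {"energy": 1.7, "x": 0.9})
hits = list(store.secondary_range(1.0, 2.0))   # -> [(b"particle-0002", ...)]
```

The secondary index simply maps a value-derived key back to the primary key, which is what allows selective queries against additional data dimensions without a full scan; in an offloaded store this index maintenance happens on the device rather than on the host.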
Compute, memory, storage, and connectivity demands are forcing the industry to adapt as it meets the expanding needs of cloud, edge, enterprise, 5G, and high-performance computing. UCIe (Universal Chiplet Interconnect Express) is an open industry standard founded by leaders in semiconductors, packaging, IP, foundries, and cloud services to address customer requests for more customizable package-level integration. The organization is also fostering an open chiplet ecosystem by offering high-bandwidth, low-latency, power-efficient, and cost-effective on-package connectivity between chiplets. The UCIe 1.0 specification provides a fully defined stack that comprehends plug-and-play interoperability of chiplets on a package, similar to the seamless interplay on a board with off-package interconnect standards such as PCI Express®, Universal Serial Bus (USB), and Compute Express Link™ (CXL™). This presentation explores the industry demand that brought about the UCIe specification and shares how end users can mix and match chiplet components from a multi-vendor ecosystem for System-on-Chip (SoC) construction, including customized SoCs.
For the past three decades, PCI-SIG® has delivered a succession of industry-leading PCI Express® (PCIe®) specifications that remain ahead of the increasing demand for a high-bandwidth, low-latency interconnect for compute-intensive systems in diverse market segments, including data centers, Artificial Intelligence and Machine Learning (AI/ML), high-performance computing (HPC), and storage applications. In early 2022, PCI-SIG released the PCIe 6.0 specification to members, doubling the data rate of the PCIe 5.0 specification to 64 GT/s (up to 256 GB/s for a x16 configuration). To achieve high data transfer rates with low latency, PCIe 6.0 technology adds innovative new features such as Pulse Amplitude Modulation with 4 levels (PAM4) signaling, low-latency Forward Error Correction (FEC), and Flit-based encoding. PCIe 6.0 technology is an optimal solution for Artificial Intelligence and Machine Learning applications, which often require high-bandwidth, low-latency transport channels. This presentation will explore the benefits of PCIe 6.0 architecture for storage and AI/ML workloads and its impact on next-generation cloud data centers. Attendees will also learn about potential AI/ML use cases for PCIe 6.0 technology. Finally, the presentation will provide a preview of what is coming next for PCIe specifications.
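As a rough back-of-the-envelope check on those headline numbers (raw signaling rate only, ignoring FLIT framing, FEC, and other protocol overhead), the following sketch shows how the x16 figures fall out; the 256 GB/s number quoted above is the bidirectional aggregate.

```python
# Rough PCIe 6.0 bandwidth arithmetic (raw signaling rate, before
# FLIT/FEC and other protocol overhead).
GT_PER_S = 64            # 64 GT/s per lane; PAM4 carries 2 bits per UI,
                         # so the channel runs at 32 GBaud
LANES = 16               # x16 configuration

gbits_per_direction = GT_PER_S * LANES           # 1024 Gb/s
gbytes_per_direction = gbits_per_direction / 8   # 128 GB/s each way
aggregate = 2 * gbytes_per_direction             # 256 GB/s bidirectional

print(f"{gbytes_per_direction:.0f} GB/s per direction, "
      f"{aggregate:.0f} GB/s aggregate for x16")
```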
Modern AI systems usually require diverse data processing and feature engineering at tremendous scale and employ heavy, complex deep learning models that require expensive accelerators or GPUs. This leads to the typical design of running data processing and AI training on two separate platforms, which causes severe data movement issues and creates big challenges for efficient AI solutions. One purpose of AI democratization is to converge the software and hardware infrastructure and unify data processing and training on the same cluster, where a high-performance, scalable data platform is a foundational component. In this session, we will introduce the motivations and challenges of AI democratization, then propose a data platform architecture for end-to-end (E2E) AI systems from both software and hardware infrastructure perspectives. It includes a distributed compute and storage platform, parallel data processing, and a connector to deep learning training frameworks. We will also showcase how this data platform improves the pipeline efficiency of democratized AI solutions on a commodity CPU cluster for several recommender system workloads, such as DLRM, DIEN, and WnD, with orders-of-magnitude performance speedups.
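As a loose, hypothetical sketch of the connector idea (the abstract does not prescribe a specific framework; all function names below are illustrative stand-ins), preprocessed partitions can be streamed straight into a training loop as batches, so feature engineering and training share the same cluster rather than shuttling data between two platforms.

```python
from multiprocessing import Pool

def transform(record):
    # Placeholder feature transform; real pipelines would do feature
    # engineering such as hashing categorical features for DLRM-style models.
    return record * 2

def preprocess(partition):
    # Per-partition stage standing in for parallel data processing.
    return [transform(r) for r in partition]

def batches(partitions, workers=4, batch_size=4):
    # "Connector": stream preprocessed partitions to the training loop as
    # batches, so data processing and training run on the same cluster.
    with Pool(workers) as pool:
        for processed in pool.imap(preprocess, partitions):
            for i in range(0, len(processed), batch_size):
                yield processed[i:i + batch_size]

def train_step(batch):
    # Stand-in for a deep learning framework's training step.
    return sum(batch)

if __name__ == "__main__":
    partitions = [list(range(p * 10, (p + 1) * 10)) for p in range(4)]
    for batch in batches(partitions):
        train_step(batch)
```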
AI/ML is not new, but innovations in ML model development have made it possible to process data at unprecedented speeds. Data scientists have used standard POSIX file systems for years, but as scale and performance requirements have grown, many face new storage challenges. Samsung has been working with customers on new ways of approaching storage issues with object storage designed for use with AI/ML. Hear how software and hardware are evolving to allow unprecedented performance and scale of storage for machine learning.
We present RAINBLOCK, a public blockchain that achieves high transaction throughput without modifying the proof-of-work consensus. The chief insight behind RAINBLOCK is that while consensus controls the rate at which new blocks are added to the blockchain, the number of transactions in each block is limited by I/O bottlenecks. Public blockchains like Ethereum keep the number of transactions in each block low so that all participating servers (miners) have enough time to process a block before the next block is created. By removing the I/O bottlenecks in transaction processing, RAINBLOCK allows miners to process more transactions in the same amount of time. RAINBLOCK makes two novel contributions: the RAINBLOCK architecture that removes I/O from the critical path of processing transactions (txs), and the distributed, multi-versioned DSM-TREE data structure that stores the system state efficiently. We evaluate RAINBLOCK using workloads based on public Ethereum traces (including smart contracts). We show that a single RAINBLOCK miner processes 27.4K txs per second (27× higher than a single Ethereum miner). In a geo-distributed setting with four regions spread across three continents, RAINBLOCK miners process 20K txs per second.
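To illustrate just the "multi-versioned" aspect in isolation (a toy, single-node model; the DSM-TREE described here is a distributed data structure, and this is not its actual design), each update below creates a new version of the state, and readers can query any earlier version without blocking writers.

```python
class MultiVersionedStore:
    """Toy multi-versioned key-value state: every put creates a new version,
    and reads can be served as of any past version (illustrative only)."""

    def __init__(self):
        self.version = 0
        self._history = {}   # key -> list of (version, value), ascending

    def put(self, key, value):
        self.version += 1
        self._history.setdefault(key, []).append((self.version, value))
        return self.version

    def get(self, key, at_version=None):
        """Read the latest value at or before `at_version` (default: newest)."""
        if at_version is None:
            at_version = self.version
        for v, value in reversed(self._history.get(key, [])):
            if v <= at_version:
                return value
        return None

# Example: transactions can execute against a fixed snapshot of the state
# while newer versions continue to be appended.
state = MultiVersionedStore()
snap = state.put("balance:alice", 100)
state.put("balance:alice", 70)
assert state.get("balance:alice", at_version=snap) == 100
assert state.get("balance:alice") == 70
```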
The next generation of automobiles is adopting PCIe for in-vehicle data communications, and the JEDEC Automotive SSD enables a high-performance, high-reliability solution for this shared, centralized storage. Features such as SR-IOV highlight the requirements of these computers on wheels, with multiple SoC functions for vehicle control, sensors, communications, entertainment, and artificial intelligence.