SNIA Developer Conference September 15-17, 2025 | Santa Clara, CA
Lokesh Jain is a Senior Software Engineer at Cloudera. He is an early developer of Apache Ozone (an object store) and the Apache Ratis project (a Raft consensus protocol implementation) and has been contributing to them for the past 4 years. He holds committer and PMC privileges for Apache Hadoop, Apache Ozone, and Apache Ratis. He holds an M.Sc. (Hons.) in Mathematics and a B.E. (Hons.) in Computer Science from BITS Pilani. Lokesh has experience with distributed computing, data replication, data pipelines, and storage systems.
Large-scale data analytics, machine learning, and big data applications often require storing massive amounts of data. For cost-effective high bandwidth, many data centers use tiered storage, with warmer tiers built from flash or persistent memory modules and cooler tiers provisioned with high-density rotational drives. While research communities and industry have increasingly demonstrated ultra-fast data insertion and retrieval rates on warm storage, complex queries with predicates on multiple columns still tend to experience excessive delays when unordered, unindexed (or only lightly indexed) data written in log-structured formats for high write bandwidth is subsequently read for ad-hoc, row-level analysis. Queries run slowly because, absent a full set of indexes on all columns, an entire dataset may have to be scanned. In the worst case, significant delays occur even when data is read from warm storage; a user sees even higher delays when data must first be streamed from cool storage before analysis can take place.

In this presentation, we present C2, a research collaboration between Seagate and Los Alamos National Lab (LANL) for the lab's next-generation campaign storage. Campaign is a scalable cool storage tier at LANL, managed by MarFS, that currently provides 60 PB of space for longer-term data storage. Cost-effective data protection is achieved through multi-level erasure coding at both the node level and the rack level. To keep users from always having to read back all data for complex queries, C2 enables direct data analytics at the storage layer by leveraging Seagate Kinetic Drives to asynchronously build indexes at the per-drive level after data lands on the drives. These asynchronously constructed indexes cover all data columns and are read at query time by the drives themselves, drastically reducing the amount of data that must be sent back to the querying client for result aggregation.

Combining computational storage technologies with erasure-coding-based data protection schemes for rapid data analytics over cool storage presents unique challenges: individual drives may not see complete data records and may not deliver the performance required by high-level data insertion, access, and protection workflows. We discuss these challenges in the talk, share our designs, and report early results.
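The abstract above describes an architecture rather than code, but the query path it outlines can be illustrated with a short sketch. The Python snippet below is illustrative only; names such as DriveShard, build_indexes, and run_query are hypothetical and are not the actual C2 or Kinetic Drive API. It models per-drive column indexes built after data lands, with each drive filtering rows through its local indexes so the client only aggregates matches.

```python
# Illustrative sketch only: hypothetical names, not the actual C2 / Kinetic Drive API.
from dataclasses import dataclass, field
from typing import Any

@dataclass
class DriveShard:
    """One drive's slice of the dataset plus asynchronously built column indexes."""
    rows: list[dict[str, Any]]
    indexes: dict[str, dict[Any, list[int]]] = field(default_factory=dict)

    def build_indexes(self) -> None:
        # In C2, index construction happens asynchronously on the drive after data lands.
        for row_id, row in enumerate(self.rows):
            for column, value in row.items():
                self.indexes.setdefault(column, {}).setdefault(value, []).append(row_id)

    def query(self, predicates: dict[str, Any]) -> list[dict[str, Any]]:
        # Intersect per-column postings lists instead of scanning every row.
        candidate_ids = None
        for column, value in predicates.items():
            ids = set(self.indexes.get(column, {}).get(value, []))
            candidate_ids = ids if candidate_ids is None else candidate_ids & ids
        return [self.rows[i] for i in sorted(candidate_ids or set())]


def run_query(drives: list[DriveShard], predicates: dict[str, Any]) -> list[dict[str, Any]]:
    # Client-side aggregation: only rows that matched on a drive cross the network.
    results: list[dict[str, Any]] = []
    for drive in drives:
        results.extend(drive.query(predicates))
    return results


if __name__ == "__main__":
    drive = DriveShard(rows=[
        {"particle_id": 1, "energy": 7, "rank": 0},
        {"particle_id": 2, "energy": 9, "rank": 0},
    ])
    drive.build_indexes()
    print(run_query([drive], {"energy": 9, "rank": 0}))
```

In the real system the index build would run on the drive's own compute, and, as noted above, erasure coding can leave a drive holding only fragments of records rather than whole rows, which is part of what makes the design challenging.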
Rapidly increasing data sizes, the high cost of data movement, and the advent of fast, NVMe-over-Fabrics based flash enclosures have led to the exploration of computation near flash for more efficient and economical storage solutions. Ordered key-value stores such as LevelDB and RocksDB, commonly developed as software libraries that run inside application processes, are one of many storage functions that can potentially benefit from offloaded processing. Traditional host-managed key-value stores often exhibit long processing delays when their background worker threads cannot sort data as fast as a foreground writer application can write it, due to the large amount of data movement between the host and storage during the sort. Offloading key-value store computation to storage is interesting because it allows these data-intensive background tasks to be deferred and performed asynchronously on storage rather than on the host, which better hides background-work latency and prevents it from blocking foreground writes. Offloaded key-value stores are also interesting because the key-value interface itself provides sufficient knowledge of the data without requiring external metadata, leaving room for building additional kinds of indexes such as secondary indexes and histograms.

In this talk, we present KV-CSD, a research collaboration between SK hynix and Los Alamos National Lab (LANL) that explores the lab's next-generation performance-tier storage designs. A KV-CSD is a key-value based computational storage device consisting of a ZNS NVMe SSD and a System-on-a-Chip (SoC) that implements an ordered key-value store atop the SSD. It supports insertion, deletion, histogram generation, point/range queries over primary keys, and point/range queries over user-defined secondary index keys, a capability often missing from today's popular software key-value stores. We show why computational storage in the form of a hardware-accelerated key-value store is particularly interesting to LANL's simulation-based science workflows, how it fits into LANL's overall storage infrastructure designs, and how we implement KV-CSD to address the bottlenecks scientists experience when high volumes of small data records previously written by a massively parallel simulation are subsequently read for interactive data analytics with potentially very selective queries against multiple data dimensions.
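To make the KV-CSD operation set concrete, here is a minimal host-side sketch of an ordered key-value store in Python. It is illustrative only, not the SoC firmware or the device's actual interface; names such as OrderedKVStore and secondary_range are hypothetical. It demonstrates the operations listed above: insertion, deletion, point/range queries over primary keys, range queries over a user-defined secondary index, and a coarse histogram over the secondary keys.

```python
# Illustrative sketch only: hypothetical interface, not the KV-CSD device API.
import bisect

class OrderedKVStore:
    """Ordered KV store with a primary key and one user-defined secondary index."""

    def __init__(self, secondary_key=None):
        self._keys = []          # sorted primary keys
        self._values = {}        # primary key -> record
        self._secondary_key = secondary_key
        self._secondary = []     # sorted (secondary key, primary key) pairs

    def put(self, key, record):
        if key in self._values:
            self.delete(key)     # keep both indexes consistent on overwrite
        bisect.insort(self._keys, key)
        self._values[key] = record
        if self._secondary_key:
            bisect.insort(self._secondary, (self._secondary_key(record), key))

    def delete(self, key):
        record = self._values.pop(key, None)
        if record is None:
            return
        self._keys.remove(key)
        if self._secondary_key:
            self._secondary.remove((self._secondary_key(record), key))

    def get(self, key):
        """Point query over the primary key."""
        return self._values.get(key)

    def range(self, lo, hi):
        """Range query over primary keys, inclusive on both ends."""
        i = bisect.bisect_left(self._keys, lo)
        j = bisect.bisect_right(self._keys, hi)
        return [(k, self._values[k]) for k in self._keys[i:j]]

    def secondary_range(self, lo, hi):
        """Range query over the user-defined secondary index keys, inclusive."""
        out = []
        for sk, pk in self._secondary[bisect.bisect_left(self._secondary, (lo,)):]:
            if sk > hi:
                break
            out.append((pk, self._values[pk]))
        return out

    def histogram(self, bucket_width):
        """Coarse histogram over secondary keys, bucketed by the given width."""
        counts = {}
        for sk, _ in self._secondary:
            bucket = sk // bucket_width
            counts[bucket] = counts.get(bucket, 0) + 1
        return counts


if __name__ == "__main__":
    store = OrderedKVStore(secondary_key=lambda rec: rec["energy"])
    store.put("particle-001", {"energy": 7.5, "rank": 0})
    store.put("particle-002", {"energy": 9.1, "rank": 3})
    print(store.range("particle-001", "particle-002"))   # primary-key range query
    print(store.secondary_range(8.0, 10.0))              # secondary-index range query
    print(store.histogram(2.0))                          # histogram over secondary keys
```

On the device, the same operations are implemented by the SoC atop the ZNS SSD, so the sorting and index maintenance shown here as in-memory bookkeeping are performed on storage, asynchronously to the writing application.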