Democratizing AI Storage: Can We Achieve Object-over-RDMA to Generic JBOF Without Proprietary Transport Lock-In?

Winchester

Tue Sep 29 | 12:30pm

Abstract

As enterprise AI workloads transition to localized on-premises data centers, the demand for scalable, high-throughput storage has exposed the latency limitations of standard HTTP/TCP object stores. To keep accelerator pipelines fully saturated, zero-copy data transfer is essential. Current industry proposals attempt to solve this by mapping the object storage API directly onto remote direct memory access (RDMA) networks; however, these prevailing architectures rely heavily on Dynamically Connected Transport. While Dynamically Connected Transport successfully mitigates Queue Pair state memory exhaustion in massive clusters, it inherently restricts deployments to proprietary networking hardware and forces the central Object Server to remain a persistent bottleneck on the data path. How can infrastructure architects achieve massive, line-rate Object-over-RDMA throughput to standard JBOF (Just a Bunch of Flash) targets over generic, vendor-agnostic network infrastructure?

This session introduces a novel, software-defined architectural paradigm that decouples the object metadata path from the physical data transfer path, drawing heavy structural inspiration from the pNFS block layout mechanism. We will demonstrate exactly how a modified client can query an Object Metadata Server to receive a pNFS-style layout array that maps logical objects directly to namespaces and physical block addresses on NVMe-oF over RDMA JBOF devices. Armed with this geometry, the client can use generic Reliable Connection transport to issue raw NVMe-over-RDMA read and write commands directly to standard JBOF targets. Moreover, client-side caching of layouts and Reliable Connections can further reduce the load on the Object Metadata Server and improve scaling. Finally, we plan to share performance metrics from a proof-of-concept implementation designed to achieve near-linear scale-out as JBOF targets are added. Through this, we intend to demonstrate the architecture's ability to aggregate throughput to the point of continuously saturating an 800Gbps network link for modern GPU workloads.
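To make the layout idea concrete, the following is a minimal sketch of how a client might turn a byte range of an object into per-target block reads. It is an illustration under stated assumptions, not the session's actual schema: the `Extent` and `BlockRead` types, the `plan_reads` function, the 4096-byte block size, and the target names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List

BLOCK_SIZE = 4096  # assumed logical block size in bytes (hypothetical)


@dataclass(frozen=True)
class Extent:
    """One entry of a hypothetical layout array returned by the metadata server."""
    object_offset: int  # byte offset within the object where this extent begins
    length: int         # extent length in bytes (assumed a multiple of BLOCK_SIZE)
    target: str         # hypothetical JBOF target identifier
    nsid: int           # NVMe namespace ID on that target
    start_lba: int      # first logical block address of the extent


@dataclass(frozen=True)
class BlockRead:
    """A block-aligned read the client would issue to one target."""
    target: str
    nsid: int
    slba: int     # starting LBA
    nblocks: int  # block count (the NVMe NLB field is zero-based: encode nblocks - 1)
    skip: int     # bytes to discard at the front of the first block
    take: int     # payload bytes wanted from this read


def plan_reads(layout: List[Extent], offset: int, length: int) -> List[BlockRead]:
    """Map the object byte range [offset, offset + length) onto block reads."""
    reads = []
    end = offset + length
    for ext in layout:
        # Intersect the requested range with this extent.
        lo = max(offset, ext.object_offset)
        hi = min(end, ext.object_offset + ext.length)
        if lo >= hi:
            continue  # no overlap with this extent
        rel_lo = lo - ext.object_offset
        rel_hi = hi - ext.object_offset
        first_block = rel_lo // BLOCK_SIZE
        last_block = (rel_hi - 1) // BLOCK_SIZE
        reads.append(BlockRead(
            target=ext.target,
            nsid=ext.nsid,
            slba=ext.start_lba + first_block,
            nblocks=last_block - first_block + 1,
            skip=rel_lo % BLOCK_SIZE,
            take=hi - lo,
        ))
    return reads


# Usage: a 16 KiB object striped across two hypothetical targets.
layout = [
    Extent(object_offset=0, length=8192, target="jbof-a", nsid=1, start_lba=1000),
    Extent(object_offset=8192, length=8192, target="jbof-b", nsid=2, start_lba=500),
]
reads = plan_reads(layout, offset=4000, length=5000)
```

In this sketch a request that straddles two extents yields one read per target, so the reads can be issued in parallel over separate Reliable Connections without involving the metadata server again.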

Attendees will dive deep into the specific data flow architecture, analyzing the schema required to translate RESTful object requests into precise block offsets. We will examine the technical challenges of bypassing the metadata server during sustained data transfer, maintaining layout consistency across distributed nodes, and enabling standard, non-compute-heavy flash arrays to serve as direct AI data targets.
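One way to reason about bypassing the metadata server while keeping layouts consistent is a client-side cache with server-driven recall, loosely analogous to layout recall in pNFS. The sketch below is an assumption-laden illustration only: the `LayoutCache` class, the generation counter, and the `recall` semantics are hypothetical and are not taken from the proof-of-concept.

```python
from typing import Callable, Dict, List, Tuple

# A cached layout is (generation, extents); both fields are hypothetical.
Layout = Tuple[int, List[str]]


class LayoutCache:
    """Caches layouts per object key so repeated I/O skips the metadata server."""

    def __init__(self, fetch: Callable[[str], Layout]):
        self._fetch = fetch                   # stands in for a metadata server query
        self._entries: Dict[str, Layout] = {}
        self.fetches = 0                      # number of metadata server round trips

    def get(self, key: str) -> Layout:
        """Return the cached layout, fetching it only on a miss."""
        if key not in self._entries:
            self._entries[key] = self._fetch(key)
            self.fetches += 1
        return self._entries[key]

    def recall(self, key: str, generation: int) -> None:
        """Drop the cached layout if it is at or below the recalled generation."""
        cached = self._entries.get(key)
        if cached is not None and cached[0] <= generation:
            del self._entries[key]


# Usage with an in-memory stand-in for the metadata server.
server: Dict[str, Layout] = {"bucket/model.bin": (1, ["extent-a"])}
cache = LayoutCache(lambda key: server[key])

first = cache.get("bucket/model.bin")
cache.get("bucket/model.bin")                 # served from cache, no round trip
fetches_before_recall = cache.fetches

server["bucket/model.bin"] = (2, ["extent-b"])  # object relocated on the server
cache.recall("bucket/model.bin", 1)             # server recalls generation 1
second = cache.get("bucket/model.bin")          # refetches the new layout
```

The design choice illustrated here is that sustained transfers touch only the cache, while the metadata server is contacted on a miss or after a recall; a real system would also need to fence in-flight I/O issued against a recalled layout.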