Toward GPU‑Agnostic Storage: RDMA-Accelerated Data Movement Across Heterogeneous GPU Systems

Abstract

AI/ML and data-intensive workloads are increasingly constrained by CPU-mediated data movement between storage and GPUs. While GPU-direct approaches reduce overhead, they are largely vendor-specific, limiting portability across heterogeneous environments.

This session presents a practical approach to GPU-agnostic storage using RDMA-enabled data paths to move data directly between disaggregated storage and GPU memory across NVIDIA, AMD, and Intel platforms. We focus on designing storage as a first-class participant in GPU data pipelines, eliminating host staging buffers to reduce latency, CPU overhead, and data copies.

We cover key architectural considerations: NFS over RDMA and emerging S3 over RDMA models; GPU memory registration and addressability; and secure, efficient exposure of GPU buffers in a vendor-neutral framework. On the client side, we compare NVIDIA cuFile (the GPUDirect Storage API), AMD ROCm, and Intel oneAPI/Level Zero, and present patterns for building portable abstractions.
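As a rough illustration of what a portable abstraction might look like, the sketch below defines a small vendor-neutral interface for registering a GPU buffer and handing it to a storage read. Every name in it (`GpuBackend`, `RegisteredBuffer`, `FakeBackend`, `read_into_gpu`, the `rkey` field) is hypothetical and chosen for illustration only; none of it is the cuFile, ROCm, or Level Zero API, and the in-memory backend merely stands in for real hardware so the example runs anywhere.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class RegisteredBuffer:
    """Hypothetical handle for a GPU buffer registered for RDMA access."""
    vendor: str
    address: int   # device address of the buffer (illustrative)
    length: int    # size in bytes
    rkey: int      # key a storage target would use to address the buffer


class GpuBackend(ABC):
    """Hypothetical vendor-neutral interface.

    A real backend would wrap the vendor's own registration calls;
    those calls are deliberately not shown here.
    """

    @abstractmethod
    def register(self, address: int, length: int) -> RegisteredBuffer:
        ...

    @abstractmethod
    def deregister(self, buf: RegisteredBuffer) -> None:
        ...


class FakeBackend(GpuBackend):
    """In-memory stand-in so the sketch runs without GPU or RDMA hardware."""

    def __init__(self, vendor: str) -> None:
        self.vendor = vendor
        self._next_rkey = 1
        self._live = {}  # rkey -> RegisteredBuffer

    def register(self, address: int, length: int) -> RegisteredBuffer:
        if length <= 0:
            raise ValueError("length must be positive")
        buf = RegisteredBuffer(self.vendor, address, length, self._next_rkey)
        self._live[buf.rkey] = buf
        self._next_rkey += 1
        return buf

    def deregister(self, buf: RegisteredBuffer) -> None:
        self._live.pop(buf.rkey, None)

    def live_count(self) -> int:
        return len(self._live)


def read_into_gpu(backend: GpuBackend, address: int, length: int, fetch):
    """Register the buffer, let `fetch` fill it, then always deregister.

    `fetch` is a caller-supplied callable standing in for the storage
    client (for example an NFS-over-RDMA read); it receives the handle.
    """
    buf = backend.register(address, length)
    try:
        return fetch(buf)
    finally:
        backend.deregister(buf)


if __name__ == "__main__":
    for vendor in ("nvidia", "amd", "intel"):
        backend = FakeBackend(vendor)
        # The lambda pretends the storage target wrote `length` bytes.
        n = read_into_gpu(backend, 0x1000, 4096, lambda b: b.length)
        print(vendor, n, backend.live_count())
```

The point of the design is that calling code depends only on the interface, so vendor-specific registration and pinning costs stay inside each backend, and the `try`/`finally` keeps registrations from leaking when a read fails.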

We also examine performance tradeoffs across vendors, including memory pinning overheads, DMA behavior, and impacts on throughput, latency, and GPU utilization.