SNIA Developer Conference September 15-17, 2025 | Santa Clara, CA
SDXI is a standard for a memory-to-memory data mover and acceleration interface that is extensible, forward-compatible, and independent of I/O interconnect technology. Among other features, SDXI standardizes an interface and architecture that can be abstracted or virtualized with a well-defined capability to quiesce, suspend, and resume the architectural state of a per-address-space data mover.
Computational Storage is a SNIA standard that defines architectures for offloading the host or reducing data movement. Compute resources residing on a storage device or very near a storage device perform computations on data instead of the host. Computational Storage reduces bandwidth and power requirements of the storage fabric and frees the host for other purposes.
While reducing data movement and offloading the host remain the goals, there are times when data needs to move to the compute resources or results need to move back to the host. The SNIA SDXI+CS Subgroup is a new working group exploring the architectural possibilities of combining SDXI with Computational Storage. This presentation will share our current thinking and architectural explorations around unifying Computational Storage and SDXI, as well as the use cases for SDXI-based Computational Storage devices.
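To make the data-mover interface described above concrete, the sketch below shows how a memory-to-memory copy might be submitted through a per-address-space descriptor ring and doorbell, in the spirit of SDXI. The structure layout, field names, opcode encoding, and helper function are illustrative assumptions, not the normative SDXI descriptor format.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative 64-byte copy descriptor for a memory-to-memory data mover.
 * Field names and layout are assumptions, not the normative SDXI encoding. */
struct dma_copy_desc {
    uint16_t opcode;          /* e.g. COPY */
    uint16_t flags;           /* e.g. request a completion write */
    uint32_t length;          /* bytes to move */
    uint64_t src_addr;        /* source address in the submitting address space */
    uint64_t dst_addr;        /* destination address in the submitting address space */
    uint64_t completion_addr; /* where the mover writes completion status */
    uint8_t  reserved[32];    /* pad to 64 bytes */
};

/* Hypothetical per-address-space context: a descriptor ring plus a doorbell
 * the data mover observes.  Such a context is the unit that could be
 * quiesced, suspended, and resumed. */
struct dma_context {
    struct dma_copy_desc *ring;
    uint32_t ring_entries;
    uint32_t tail;
    volatile uint64_t *doorbell;
};

/* Enqueue one copy descriptor and ring the doorbell. */
static void submit_copy(struct dma_context *ctx, uint64_t src, uint64_t dst,
                        uint32_t len, uint64_t completion)
{
    struct dma_copy_desc *d = &ctx->ring[ctx->tail % ctx->ring_entries];

    memset(d, 0, sizeof(*d));
    d->opcode = 1;              /* COPY, illustrative encoding */
    d->length = len;
    d->src_addr = src;
    d->dst_addr = dst;
    d->completion_addr = completion;

    ctx->tail++;
    *ctx->doorbell = ctx->tail; /* tell the mover new work is available */
}
```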
Large-scale simulations at Los Alamos can produce petabytes of data per timestep, yet the scientific focus often lies in narrow regions of interest—like a wildfire’s leading edge. Traditional HPC tools read entire datasets to extract these key features, resulting in significant inefficiencies in time, energy, and resource usage. To address this, Los Alamos—in collaboration with Hammerspace and SK hynix—is leveraging computational storage to process data closer to its source, enabling selective access to high-value information. This talk introduces a new pushdown architecture for pNFS, built on open-source tools such as Presto, DuckDB, Substrait, and Apache Arrow. It distributes complex query execution across application and storage layers, significantly reducing data movement, easing the load on downstream analytics systems, and allowing large-scale analysis on modest platforms—even laptops—while accelerating time to insight.
An important enabler of this design is the ability for pNFS clients to identify which pNFS data server holds a given file, allowing queries to be routed correctly. Complementing this is a recent Linux kernel enhancement that transparently localizes remote pNFS reads when a file is detected to be local. Together, these capabilities enable efficient query offload without exposing application code to internal filesystem structures—preserving abstraction boundaries and enforcing standard POSIX permissions. We demonstrate this architecture in a real-world scientific visualization pipeline, modified to leverage pushdown to query large-scale simulation data stored in Parquet, a popular columnar format the lab is adopting for its future workloads. We conclude with key performance results and future opportunities.
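As a rough illustration of the selective access that pushdown provides, the snippet below uses the DuckDB C API to run a projected, filtered query directly against a Parquet file, so only the needed columns and row groups are read. The file name, column names, and predicate are hypothetical; in the actual pNFS pushdown path, such a plan is split (via Substrait) between the application layer and the data servers rather than executed on a single node.

```c
#include <stdio.h>
#include "duckdb.h"

int main(void)
{
    duckdb_database db;
    duckdb_connection con;
    duckdb_result res;

    /* In-memory DuckDB instance; the Parquet file below is a hypothetical
     * simulation output, queried with projection and filter pushdown so
     * only the relevant columns and row groups are read. */
    if (duckdb_open(NULL, &db) == DuckDBError)
        return 1;
    if (duckdb_connect(db, &con) == DuckDBError)
        return 1;

    const char *sql =
        "SELECT x, y, temperature "
        "FROM read_parquet('timestep_0042.parquet') "
        "WHERE temperature > 800.0";   /* e.g. cells near a fire front */

    if (duckdb_query(con, sql, &res) == DuckDBError) {
        fprintf(stderr, "query failed: %s\n", duckdb_result_error(&res));
    } else {
        printf("matching rows: %llu\n",
               (unsigned long long)duckdb_row_count(&res));
    }

    duckdb_destroy_result(&res);
    duckdb_disconnect(&con);
    duckdb_close(&db);
    return 0;
}
```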
While a host has been able to address NVMe device memory using the Controller Memory Buffer (CMB) and Persistent Memory Region (PMR), that memory has never been addressable by NVMe commands. NVMe introduced the Subsystem Local Memory I/O Command Set (SLM), which allows NVMe device memory to be addressed by NVMe commands; however, that memory cannot be addressed by the host using host memory addresses. A new technical proposal is being developed by NVM Express that would allow SLM to be assigned to a host memory address range. We will describe the architecture of this new NVMe feature and discuss the benefits and use cases that host-addressable SLM enables.
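For intuition on what "host addressable" means, the sketch below shows the existing, generic way a host reaches BAR-mapped NVMe device memory (as with a CMB) from user space on Linux: map the PCIe resource and use ordinary loads and stores. The sysfs path, BAR index, and window size are placeholders, and the actual mechanism for host-addressable SLM is whatever the NVMe technical proposal defines, not this sketch.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    /* Placeholder device and BAR; a real deployment would discover these. */
    const char *bar = "/sys/bus/pci/devices/0000:3b:00.0/resource2";
    size_t map_len = 1 << 20;  /* 1 MiB window, placeholder */

    int fd = open(bar, O_RDWR | O_SYNC);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    void *mem = mmap(NULL, map_len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (mem == MAP_FAILED) {
        perror("mmap");
        close(fd);
        return 1;
    }

    /* Device memory is now reachable through host memory addresses; with
     * host-addressable SLM, NVMe commands could also operate on the same
     * region assigned to a host address range. */
    memcpy(mem, "hello, device memory", 21);

    munmap(mem, map_len);
    close(fd);
    return 0;
}
```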
Storage systems leverage data transformations such as compression, checksums, and erasure coding. These transformations are necessary to save capacity and protect against data loss, but they are both memory-bandwidth- and CPU-intensive. This leads to a large disparity between the performance of the storage software layers and that of the storage devices backing the data, a gap that only continues to grow as NVMe devices provide increasing bandwidth with each new PCIe generation. Computational storage devices (accelerators) provide a path forward by offloading these resource-intensive transformations to hardware designed to accelerate them.
However, integrating these devices into storage system software stacks has been a challenge: each accelerator has its own custom API that must be integrated directly into the storage software, which makes it difficult to support different accelerators and to maintain custom code for each. This challenge has been solved by the Data Processing Unit Services Module (DPUSM), a kernel module that provides a uniform API for storage software stacks to communicate with any accelerator. The storage software layers program against the DPUSM API, and accelerator vendors write device-specific code behind that same API. This separation allows accelerators to integrate seamlessly with storage system software. This talk will highlight how the DPUSM is being leveraged with the Zettabyte File System (ZFS) through the ZFS Interface for Accelerators (Z.I.A.). ZFS can now use different accelerators for data transformations, which can lead to a 16x speedup.
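To illustrate the pattern of a uniform accelerator API, the sketch below shows a provider-style interface: the storage stack calls a fixed table of entry points, and each vendor registers an implementation for its device. The structure, function names, and registration call are hypothetical illustrations of this separation, not the DPUSM's actual definitions.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical provider interface in the style of a uniform accelerator
 * API: the storage stack (e.g. ZFS via Z.I.A.) calls these entry points,
 * and each vendor implements them for its device.  Names and signatures
 * are illustrative, not the actual DPUSM definitions. */
struct accel_provider_ops {
    void *(*alloc)(size_t size);                          /* device-side buffer */
    void  (*free_buf)(void *handle);
    int   (*copy_from_host)(void *handle, const void *src, size_t len);
    int   (*copy_to_host)(void *dst, const void *handle, size_t len);
    int   (*compress)(void *dst_handle, const void *src_handle,
                      size_t len, int level);
    int   (*checksum)(const void *handle, size_t len, void *out);
};

/* Stand-in for the module's registration entry point. */
static int accel_register_provider(const char *name,
                                   const struct accel_provider_ops *ops)
{
    printf("registered accelerator provider '%s'\n", name);
    (void)ops;
    return 0;
}

/* A vendor module would fill in the table and register it at load time. */
static const struct accel_provider_ops example_ops = {
    .alloc    = NULL,   /* a real vendor would point these at device routines */
    .compress = NULL,
};

int main(void)
{
    return accel_register_provider("example-offload-card", &example_ops);
}
```

The benefit of this shape is that the storage software only ever sees the ops table, so swapping accelerators or falling back to software paths requires no changes above the API boundary.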