SNIA Developer Conference September 15-17, 2025 | Santa Clara, CA
* Understanding the differences between object and file-based workloads
* Learning how Google has approached bridging the two workload types
* Learning the technical details of Google's approach to accelerating object storage
In 2018, the developers of RGW, the open-source S3 and Swift object storage endpoint in Ceph, began work on “zipper,” a new, pluggable storage provider API, inspired in part by the loadable backend storage drivers of NFS-Ganesha, which several of those developers also maintain. Zipper drivers can present entirely new storage backends and mappings, or customize existing data paths and core server workflow through stackable filter drivers implementing a subset of the same API.

In addition to Ceph cluster support, Zipper drivers exist for two kinds of filesystem export (posixdriver and s3gw [https://s3gw.tech]) and a SQL data model (dbstore), as well as prototype backends for DAOS (https://daos.io/) and the former Seagate Motr storage clusters. In-progress applications of Zipper filters include D4N, a distributed, materialized S3 object cache (https://www.ugurkaynar.com/publications/kariz-fast21.pdf).

In 2019, RGW added initial support for dynamic customization of S3 requests and other workflows through Lua scripting. Both the Zipper and Lua efforts have expanded in scope over time, and will soon yield the first in-tree, non-Ceph RADOS storage backend for Ceph Object Storage (RGW Standalone). The Ceph object development team wishes to grow awareness of Ceph RGW as an extensible, open-source object storage server platform with strong S3 fidelity and a vibrant developer community, and to invite new participants to join our ranks!
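The stackable-filter idea can be illustrated with a minimal conceptual sketch. The real Zipper/SAL interface is a C++ API inside RGW; every class and method name below is invented for illustration only, with a toy in-memory backend standing in for a terminal driver such as RADOS or dbstore:

```python
# Conceptual model of a pluggable storage provider API with stackable
# filters, in the spirit of RGW's Zipper. All names are illustrative;
# the actual SAL/Zipper interface is a C++ API inside RGW.

class Driver:
    """Base storage provider interface: a backend implements the full API."""
    def get_object(self, bucket, key):
        raise NotImplementedError

class MemoryDriver(Driver):
    """Toy terminal backend (stands in for RADOS, dbstore, etc.)."""
    def __init__(self):
        self.store = {}
    def put_object(self, bucket, key, data):
        self.store[(bucket, key)] = data
    def get_object(self, bucket, key):
        return self.store[(bucket, key)]

class Filter(Driver):
    """A filter wraps the next driver and overrides a subset of the API;
    everything it does not override falls through to the next layer."""
    def __init__(self, next_driver):
        self.next = next_driver
    def get_object(self, bucket, key):
        return self.next.get_object(bucket, key)

class CacheFilter(Filter):
    """D4N-style read cache: serve hits locally, fall through on miss."""
    def __init__(self, next_driver):
        super().__init__(next_driver)
        self.cache = {}
    def get_object(self, bucket, key):
        if (bucket, key) not in self.cache:
            self.cache[(bucket, key)] = self.next.get_object(bucket, key)
        return self.cache[(bucket, key)]

# Stack a cache filter over a terminal backend driver.
backend = MemoryDriver()
backend.put_object("b", "k", b"hello")
stack = CacheFilter(backend)
assert stack.get_object("b", "k") == b"hello"
```

The point of the pattern is that a filter only needs to implement the calls it cares about, so features like caching can be layered over any terminal backend.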
The Cloud Storage Acceleration Layer (CSAL), an open-source host-based Flash Translation Layer (FTL) within the Storage Performance Development Kit (SPDK), has redefined cloud storage by transforming random write workloads into sequential patterns, optimizing performance and endurance for high-density NAND SSDs. This proposal introduces an enhanced CSAL framework integrating core scaling with RAID5F, a novel RAID implementation that eliminates the read-modify-write overhead and write hole problem inherent in traditional RAID5. By leveraging multi-core architectures to dynamically distribute CSAL's write-shaping and data placement tasks, our approach achieves near-linear scalability across CPU cores while maintaining low-latency I/O. RAID5F, built on full-stripe writes and optimized parity computation, ensures data integrity without compromising performance, even under mixed workloads. Paired with high-capacity QLC drives, this configuration combines high performance with cost efficiency. Preliminary simulations and results demonstrate up to 2x throughput gains and a 30% reduction in write amplification compared to baseline CSAL deployments. This work positions CSAL with RAID5F as a cornerstone for next-generation cloud storage, delivering efficiency and reliability for hyperscale environments.
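The full-stripe-write idea behind RAID5F can be sketched with a toy model. This assumes plain XOR parity over the data chunks of a stripe; the chunk sizes and function names are illustrative, not SPDK's implementation:

```python
# Toy illustration of full-stripe writes with XOR parity, the idea
# behind RAID5F: because every write covers a complete stripe, parity
# is computed from data already in hand -- no read-modify-write cycle,
# and no window in which data and parity can disagree on disk (the
# classic RAID5 "write hole").

def xor_parity(chunks):
    """XOR all chunks together to produce the parity chunk."""
    parity = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, b in enumerate(chunk):
            parity[i] ^= b
    return bytes(parity)

def full_stripe_write(stripe_chunks):
    """Return the chunks to persist: all data plus freshly computed parity.
    No existing data or parity needs to be read back first."""
    return list(stripe_chunks) + [xor_parity(stripe_chunks)]

def rebuild_chunk(surviving_chunks):
    """Any single lost chunk (data or parity) is the XOR of the rest."""
    return xor_parity(surviving_chunks)

data = [b"\x01\x02", b"\x04\x08", b"\x10\x20"]
stripe = full_stripe_write(data)          # 3 data chunks + 1 parity chunk
lost = stripe.pop(1)                      # simulate losing one chunk
assert rebuild_chunk(stripe) == lost      # recovered via XOR of survivors
```

Contrast this with a partial-stripe RAID5 update, which must first read the old data and old parity to recompute parity, doubling the I/O and opening the write-hole window.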
As cloud adoption accelerates, organizations are increasingly managing data estates that span petabytes and exabytes. At this scale, traditional tools fall short. Modern data management in the cloud must go beyond storage alone and embrace granular visibility, governance, and optimization. This session explores how cloud platforms are evolving to meet these demands with scalable, intelligent solutions. We’ll discuss how modern architectures support massive-scale data environments while keeping performance, cost, and compliance in balance. A key innovation is advanced telemetry, offering deep insight into data estates through rich analytics and visualizations. Users can now explore detailed breakdowns of storage usage by region and service, and track trends over time. Enhanced data discovery features also allow organizations to identify unused, orphaned, or sensitive data and address it appropriately.
Data enhances foundational LLMs (e.g. GPT-4, Mistral Large, and Llama 2) for context-aware outputs. In this session, we'll cover using unstructured, multi-modal data (e.g. PDFs, images, or videos) in retrieval-augmented generation (RAG) systems and learn how cloud object storage can be an ideal file system for LLM-based applications that transform and use domain-specific data, store user context, and much more.
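The retrieval step of a RAG pipeline can be sketched with a toy in-memory example. Here a dictionary stands in for document chunks fetched from an object storage bucket, and a bag-of-words vector stands in for a learned embedding model; all names and the similarity scheme are illustrative only:

```python
# Toy sketch of the retrieval step in a RAG pipeline. A real system
# would pull document chunks from object storage and embed them with a
# learned model; here a dict stands in for the bucket and word counts
# stand in for embeddings.
import math
from collections import Counter

def embed(text):
    """Hypothetical stand-in for an embedding model: word-count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# "Objects" that would normally live under keys in a storage bucket.
chunks = {
    "docs/raid.txt": "full stripe writes avoid read modify write",
    "docs/cache.txt": "a distributed cache serves hot objects",
}

def retrieve(query, k=1):
    """Rank stored chunks by similarity to the query; return the top k keys."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda key: cosine(q, embed(chunks[key])),
                    reverse=True)
    return ranked[:k]

# The top-ranked chunk would be prepended to the LLM prompt as context.
print(retrieve("how do stripe writes work"))
```

The retrieved text is what grounds the model's answer in domain-specific data rather than its pretraining alone.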
Cloud object storage systems have been built to satisfy simple storage workloads where traditional POSIX semantics are sacrificed for simplicity and scalability. With AI and analytics workloads migrating towards hyperscale cloud computing, object storage users are increasingly requesting file-oriented access to their data.

In this presentation, we will discuss how Google Cloud Storage has approached bridging the gap between object-oriented and file-oriented storage to provide the optimal foundation for a scalable and performant data lakehouse, one validated for both GPUs and TPUs at >60,000 accelerator scale with both the PyTorch and JAX frameworks. Google Cloud Storage has recently launched several features that bring file-oriented capabilities to object storage at cloud scale:

* Hierarchical Namespace provides directory-style bucket structure and access patterns, including atomic rename of folders and faster I/O ramping.
* Managed Folders allow access control grants to be managed at the folder/prefix level.
* Rapid Storage brings stateful, handle-based filesystem protocols to object storage, with sub-millisecond latencies for random reads and writes and low-latency appends.
* Cloud Storage FUSE implements the FUSE filesystem interface, allowing file-oriented clients to view Cloud Storage buckets as a filesystem.
* A gRPC endpoint for Cloud Storage enables new features, including binary transfers, HTTP/2 bidirectional streaming, and dynamic routing.

We will present the technical details of each of these features and how they work together to bring file-like performance and semantics to Cloud Storage.
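Why a hierarchical namespace makes folder rename atomic can be seen from a toy model (illustrative only, not Cloud Storage internals): in a flat namespace, "renaming a folder" means rewriting every object key sharing the prefix, while a real directory tree renames by moving a single entry.

```python
# Toy contrast between a flat object namespace and a hierarchical one.
# Illustrative only -- not how Cloud Storage is implemented internally.

# Flat namespace: folders are just key prefixes, so a folder "rename"
# must touch every object under the prefix (one operation per object,
# and a crash midway leaves a half-renamed folder).
flat = {"logs/a": b"1", "logs/b": b"2", "data/x": b"3"}

def flat_rename(store, old_prefix, new_prefix):
    ops = 0
    for key in [k for k in store if k.startswith(old_prefix)]:
        store[new_prefix + key[len(old_prefix):]] = store.pop(key)
        ops += 1
    return ops  # grows with the number of objects in the folder

# Hierarchical namespace: a folder is a real node in a tree, so
# renaming it is a single metadata update regardless of its contents.
tree = {"logs": {"a": b"1", "b": b"2"}, "data": {"x": b"3"}}

def tree_rename(root, old_name, new_name):
    root[new_name] = root.pop(old_name)  # one atomic pointer move
    return 1

assert flat_rename(flat, "logs/", "archive/") == 2
assert tree_rename(tree, "logs", "archive") == 1
```

The single-entry move is what makes the operation both atomic and independent of folder size.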
CDMI 3.0 is the third major revision of the Cloud Data Management Interface, which provides a standard for discovery and declarative data management of any URI-accessible data resource, such as LUNs, files, objects, tables, streams, and graphs. Version 3 of the standard reorganizes the specification around data resource protocol "exports" and data resource declarative "metadata", and adds new support for "rels", which describe graph relationships between data resources. This reorganization highlights common use cases for using CDMI to discover and manage multi-protocol access to stored data.

This presentation will provide an overview of the changes and additions in CDMI 3.0, and will cover how CDMI is used to solve the following common data management challenges:

1. As a data access client, how can I discover which protocols can be used to access a data resource?
2. As a data management client, how can I control which protocols can be used to access a data resource?
3. As a data management client, how can I specify desired storage management behaviours for a data resource?
4. As a data management client, how can I efficiently package data resources for portable transfer between systems?

The presentation will conclude with a discussion of the CDMI 3.0 release timeline and a call for participation for interested attendees to get involved in the standardization process.
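The first challenge, discovering which protocols can access a resource, amounts to reading the resource's "exports" metadata. The sketch below parses an illustrative stand-in for a CDMI response body; the JSON shape is modeled on CDMI's exports concept, but the exact CDMI 3.0 schema, field names, and export identifiers shown here are assumptions, and a real client would GET the resource URI with a CDMI content type rather than use a hardcoded string:

```python
# Hedged sketch: discovering access protocols from a CDMI-style
# "exports" structure. The response body below is an illustrative
# example, not a verbatim CDMI 3.0 payload.
import json

response_body = """
{
  "objectName": "dataset1/",
  "exports": {
    "OCCI/NFSv4": {"identifier": "/export/dataset1"},
    "Network/WebHTTP": {"identifier": "https://example.com/dataset1"}
  }
}
"""

resource = json.loads(response_body)

# Each key under "exports" names a protocol through which the data
# resource can be accessed; the value carries protocol-specific details.
protocols = sorted(resource["exports"])
print(protocols)
```

The same structure supports the second challenge in reverse: a data management client would add or remove entries under "exports" to control which protocols are offered.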