SNIA Developer Conference September 15-17, 2025 | Santa Clara, CA
AWS S3 storage is widely deployed to store everything from customer data and server logs to software repositories. Poorly secured S3 buckets have resulted in many publicized data breaches. The cloud service provider's shared responsibility model places responsibility on customers for protecting the confidentiality, availability and integrity of their data. Thales CipherTrust Transparent Encryption Cloud Object Storage (CTE COS) for S3 secures S3 objects by enabling advanced encryption along with dual-endpoint access controls. Access controls are enforced both at the client host running the AWS S3 application and at the AWS S3 server end. The encryption offered by CTE COS for S3 is independent of AWS's S3 server-side encryption. See Figure 1. Encryption and access controls are completely transparent to applications, and AWS S3 administrative procedures remain unchanged after the software agent is deployed. Continuously enforced encryption policies protect against unauthorized access even when AWS is misconfigured. Data access to 'protected' S3 buckets is tracked through detailed audit logs. CTE's granular, least-privilege user access policies protect sensitive data in S3 buckets from external attacks and from misuse by other privileged users. CTE security administrators can frame client host policies to allow or deny actions involving ACLs, such as reading, writing, enumerating and deleting S3 buckets or even individual objects in an S3 bucket. In addition, client policies can specify the users and applications permitted to access protected AWS S3 buckets. AWS S3 server-side access controls can also be simultaneously and transparently enabled with custom AWS IAM policies and roles. S3 bucket data accesses are only allowed from hosts configured with CipherTrust Transparent Encryption. Cloud access controls and their management can therefore be offloaded to client hosts, with additional control points for permitting specific local identities and applications.
The dual-endpoint access controls and encryption of CTE COS for S3 therefore prevent S3 data breaches caused by unauthorized access, even in the presence of misconfigured buckets and rogue insiders. CTE COS for S3 is FIPS 140-2 Level certified and is part of the CipherTrust Data Security Platform.
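The client host policies described above (allow or deny by user, application, action and bucket or object) can be illustrated with a minimal sketch. This is not the CTE policy format or engine; the `Rule` fields, the first-match-wins ordering and the default-deny behaviour are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Rule:
    """Hypothetical client-host rule; field names are illustrative only."""
    effect: str           # "allow" or "deny"
    users: frozenset      # local user names, or {"*"} for any user
    apps: frozenset       # application names, or {"*"} for any application
    actions: frozenset    # e.g. {"read", "write", "list", "delete"}
    resource: str         # glob pattern matched against "bucket/key"


def _matches(values, item):
    """A set matches if it contains the item or the wildcard."""
    return "*" in values or item in values


def evaluate(rules, user, app, action, resource):
    """Return True if the request is allowed.

    Assumed semantics: rules are checked in order, the first rule that
    matches decides, and a request that matches no rule is denied.
    """
    for rule in rules:
        if (_matches(rule.users, user)
                and _matches(rule.apps, app)
                and _matches(rule.actions, action)
                and fnmatchcase(resource, rule.resource)):
            return rule.effect == "allow"
    return False


# Example policy: nobody may delete under finance/, and only the
# backup agent running as alice may read or list those objects.
POLICY = [
    Rule("deny", frozenset({"*"}), frozenset({"*"}),
         frozenset({"delete"}), "finance/*"),
    Rule("allow", frozenset({"alice"}), frozenset({"backup-agent"}),
         frozenset({"read", "list"}), "finance/*"),
]
```

Because unmatched requests are denied, a request from an unlisted user or application is rejected at the client host before it ever reaches the S3 endpoint, which is the least-privilege behaviour the abstract describes.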
As cloud adoption accelerates, organizations are increasingly managing data estates that span petabytes and exabytes. At this scale, traditional tools fall short. Modern data management in the cloud must go beyond storage alone and embrace granular visibility, governance, and optimization. This session explores how cloud platforms are evolving to meet these demands with scalable, intelligent solutions. We’ll discuss how modern architectures support massive-scale data environments while keeping performance, cost, and compliance in balance. A key innovation is advanced telemetry, which offers deep insights into data estates through rich analytics and visualizations. Users can now explore detailed breakdowns of storage usage by region and service, and track trends over time. Enhanced data discovery features also allow organizations to identify unused, orphaned, or sensitive data and handle each appropriately.
In 2018, the developers of RGW, the open source S3 and Swift object storage endpoint in Ceph, began work on “Zipper,” a new, pluggable storage provider API, inspired in part by the loadable backend storage drivers of NFS-Ganesha, which several of those developers also maintain. Zipper drivers can present entirely new storage backends and mappings, or customize existing data paths and core server workflow through stackable filter drivers that implement a subset of the same API. In addition to Ceph cluster support, Zipper drivers exist to support two kinds of filesystem export (posixdriver and s3gw [https://s3gw.tech]), a SQL data model (dbstore), as well as prototype backends for DAOS (https://daos.io/) and former Seagate MoTR storage clusters. In-progress applications of Zipper filters include D4N, a distributed, materialized S3 object cache (https://www.ugurkaynar.com/publications/kariz-fast21.pdf).
In 2019, RGW added initial support for dynamic customization of S3 requests and other workflows through Lua scripting. Both the Zipper and Lua efforts have expanded in scope over time, and will soon yield the first in-tree, non-Ceph RADOS storage backend for Ceph Object Storage (RGW Standalone). The Ceph object development team wishes to grow awareness of Ceph RGW as an extensible, open-source object storage server platform with strong S3 fidelity and a vibrant developer community, and invites new participants to join our ranks!
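The idea of stackable filter drivers that implement only a subset of a backend API can be sketched in a few lines. This is a conceptual illustration in Python, not the Zipper API (which is C++ inside RGW); the `Driver`, `FilterDriver`, `AuditFilter` and `ReadCacheFilter` names and their two-method interface are hypothetical.

```python
class Driver:
    """Hypothetical storage backend interface (not the real Zipper API)."""

    def get_object(self, bucket, key):
        raise NotImplementedError

    def put_object(self, bucket, key, data):
        raise NotImplementedError


class MemoryDriver(Driver):
    """Terminal backend that keeps objects in a dictionary."""

    def __init__(self):
        self._objects = {}

    def get_object(self, bucket, key):
        return self._objects[(bucket, key)]

    def put_object(self, bucket, key, data):
        self._objects[(bucket, key)] = bytes(data)


class FilterDriver(Driver):
    """Pass-through base: forwards every call to the next driver."""

    def __init__(self, next_driver):
        self.next = next_driver

    def get_object(self, bucket, key):
        return self.next.get_object(bucket, key)

    def put_object(self, bucket, key, data):
        return self.next.put_object(bucket, key, data)


class AuditFilter(FilterDriver):
    """Overrides only put_object; reads pass through untouched."""

    def __init__(self, next_driver):
        super().__init__(next_driver)
        self.log = []

    def put_object(self, bucket, key, data):
        self.log.append(("put", bucket, key))
        return self.next.put_object(bucket, key, data)


class ReadCacheFilter(FilterDriver):
    """Serves repeated reads from memory and invalidates on write."""

    def __init__(self, next_driver):
        super().__init__(next_driver)
        self._cache = {}
        self.hits = 0

    def get_object(self, bucket, key):
        if (bucket, key) in self._cache:
            self.hits += 1
            return self._cache[(bucket, key)]
        data = self.next.get_object(bucket, key)
        self._cache[(bucket, key)] = data
        return data

    def put_object(self, bucket, key, data):
        self._cache.pop((bucket, key), None)
        return self.next.put_object(bucket, key, data)


# Filters stack in any order on top of a terminal backend.
stack = ReadCacheFilter(AuditFilter(MemoryDriver()))
```

Since each filter inherits pass-through behaviour for everything it does not override, a new filter only needs to implement the calls it actually changes.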
The Cloud Storage Acceleration Layer (CSAL), an open-source host-based Flash Translation Layer (FTL) within the Storage Performance Development Kit (SPDK), has redefined cloud storage by transforming random write workloads into sequential patterns, optimizing performance and endurance for high-density NAND SSDs. This proposal introduces an enhanced CSAL framework that integrates core scaling with RAID5F, a novel RAID implementation that eliminates the read-modify-write overhead and write hole problem inherent in traditional RAID5. By leveraging multi-core architectures to dynamically distribute CSAL’s write-shaping and data placement tasks, our approach achieves near-linear scalability across CPU cores while maintaining low-latency I/O. RAID5F, built on full-stripe writes and optimized parity computation, ensures data integrity without compromising performance, even under mixed workloads. Paired with high-capacity QLC drives, this new configuration exceeds baseline performance while remaining cost-efficient. Preliminary simulations and results demonstrate up to 2x throughput gains and a 30% reduction in write amplification compared to baseline CSAL deployments. This work positions CSAL with RAID5F as a cornerstone for next-generation cloud storage, delivering efficiency and reliability for hyperscale environments.
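The reason full-stripe writes avoid read-modify-write can be seen in a small sketch of single-parity striping. This is a generic XOR-parity illustration, not SPDK or RAID5F code; the function names are hypothetical and the chunk layout (all data chunks followed by one parity chunk) is an assumption.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def write_full_stripe(data_chunks):
    """Return the chunks to write for one stripe: the data plus one parity.

    Parity is computed only from the new data held in memory, so nothing
    has to be read back from the drives before the write is issued.
    """
    if len({len(chunk) for chunk in data_chunks}) != 1:
        raise ValueError("all chunks in a stripe must have the same length")
    return list(data_chunks) + [xor_blocks(data_chunks)]


def rebuild_chunk(stripe, lost_index):
    """Recover one missing chunk by XOR-ing all surviving chunks."""
    survivors = [chunk for i, chunk in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)


# Three data chunks become four chunks on four drives.
stripe = write_full_stripe([b"AAAA", b"BBBB", b"CCCC"])
recovered = rebuild_chunk(stripe, 1)
```

A partial-stripe update in traditional RAID5 would instead need the old data and old parity to be read first, which is the overhead that a sequential, log-structured writer such as CSAL can sidestep by always filling whole stripes.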
Data enhances foundational LLMs (e.g., GPT-4, Mistral Large and Llama 2) for context-aware outputs. In this session, we'll cover using unstructured, multi-modal data (e.g., PDFs, images or videos) in retrieval augmented generation (RAG) systems and learn how cloud object storage can be an ideal file system for LLM-based applications that transform and use domain-specific data, store user context and much more.
Cloud object storage systems have been built to satisfy simple storage workloads where traditional POSIX semantics are sacrificed for simplicity and scalability. With AI and analytics workloads migrating towards hyperscale cloud computing, object storage users are increasingly requesting file-oriented access to their data. In this presentation, we will discuss how Google Cloud Storage has approached bridging the gap between object-oriented and file-oriented storage to provide the optimal foundation for a scalable and performant data lakehouse, one that has been validated for both GPUs and TPUs at a scale of more than 60,000 accelerators, within both the PyTorch and JAX frameworks. Google Cloud Storage has recently launched several features to bring file-oriented capabilities to object storage at cloud scale:
* Hierarchical Namespace for directory-style bucket structure and access patterns, including atomic rename of folders and faster I/O ramping.
* Managed Folders for managing access control grants at the folder/prefix level.
* Rapid Storage brings stateful handle-based filesystem protocols to object storage, sub-millisecond latencies for random reads and writes, and low-latency appends.
* Cloud Storage FUSE implements the FUSE filesystem interface, allowing file-oriented clients to view Cloud Storage buckets as a filesystem.
* A gRPC endpoint for Cloud Storage enables new features, including binary transfers, HTTP/2 bidirectional streaming, and dynamic routing.
We will present the technical details of each of these features and how they work together to bring file-like performance and semantics to Cloud Storage.
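Why a hierarchical namespace makes folder rename cheap and atomic can be shown with a toy model. The sketch below does not reflect Google Cloud Storage's implementation or client API; `FlatBucket` and `HierarchicalBucket` are hypothetical classes, and the hierarchical model is simplified to a single folder level.

```python
class FlatBucket:
    """Flat namespace: a 'folder' is only a shared prefix in object names."""

    def __init__(self):
        self.objects = {}  # full object name -> data

    def put(self, name, data):
        self.objects[name] = data

    def rename_prefix(self, old, new):
        """Rewrite every matching object name; returns objects touched.

        Each object is moved individually, so a reader can observe a
        half-renamed folder while the loop is still running.
        """
        moved = [name for name in self.objects if name.startswith(old)]
        for name in moved:
            self.objects[new + name[len(old):]] = self.objects.pop(name)
        return len(moved)


class HierarchicalBucket:
    """Hierarchical namespace: folders are first-class entries."""

    def __init__(self):
        self.folders = {}  # folder name -> {object name -> data}

    def put(self, folder, name, data):
        self.folders.setdefault(folder, {})[name] = data

    def rename_folder(self, old, new):
        """Re-point one folder entry; returns metadata entries touched."""
        self.folders[new] = self.folders.pop(old)
        return 1

    def names(self):
        return sorted(f"{folder}/{name}"
                      for folder, objs in self.folders.items()
                      for name in objs)
```

In the flat model the cost of a rename grows with the number of objects under the prefix, whereas in the hierarchical model it is a single metadata update regardless of folder size, which matters for workloads such as checkpointing that publish results by renaming a directory.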
CDMI 3.0 is the third major revision of the Cloud Data Management Interface, which provides a standard for discovery and declarative data management of any URI-accessible data resource, such as LUNs, files, objects, tables, streams, and graphs. Version 3 of the standard reorganizes the specification around data resource protocol "exports" and data resource declarative "metadata", and adds new support for "rels", which describe graph relationships between data resources. This reorganization highlights common use cases for using CDMI to discover and manage multi-protocol access to stored data.
This presentation will provide an overview of the changes and additions to CDMI 3.0, and will cover how CDMI is used to solve the following common data management challenges:
1. As a data access client, how can I discover which protocols can be used to access a data resource?
2. As a data management client, how can I control which protocols can be used to access a data resource?
3. As a data management client, how can I specify desired storage management behaviours for a data resource?
4. As a data management client, how can I efficiently transfer and package data resources for portable transfer between systems?
The presentation will conclude with a discussion of the CDMI 3.0 release timeline and a call for participation for interested attendees to get involved in the standardization process.
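The first two challenges (discovering and controlling access protocols) can be pictured with a small sketch that works on a JSON description of a data resource. The document layout here is an assumption made for illustration: the sample resource, the protocol names and the shape of the "exports" object are hypothetical and are not taken from the CDMI 3.0 specification.

```python
import json

# Hypothetical resource description with one entry per exported protocol.
SAMPLE_RESOURCE = """
{
  "objectName": "dataset1/",
  "exports": {
    "nfs": {"path": "/exports/dataset1"},
    "s3": {"bucket": "dataset1"}
  },
  "metadata": {}
}
"""


def discover_protocols(resource_json):
    """Data access client: list the protocols a resource is exported over."""
    doc = json.loads(resource_json)
    return sorted(doc.get("exports", {}))


def restrict_protocols(resource_json, allowed):
    """Data management client: build an update body that keeps only the
    allowed exports. Declaring the desired state, rather than issuing
    per-protocol commands, is the assumption made in this sketch."""
    doc = json.loads(resource_json)
    kept = {proto: cfg
            for proto, cfg in doc.get("exports", {}).items()
            if proto in allowed}
    return json.dumps({"exports": kept}, sort_keys=True)


protocols = discover_protocols(SAMPLE_RESOURCE)
update_body = restrict_protocols(SAMPLE_RESOURCE, {"s3"})
```

A real client would fetch the resource description from a CDMI server and send the update body back over HTTP; those calls are omitted here because the exact endpoints and media types should come from the specification itself.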