In file systems, large sequential writes perform better than small random writes, which is why many storage systems adopt a log-structured design. Cloud object storage favors large objects in much the same way: providers throttle PUT and GET request rates, so uploading many small objects takes significantly longer than uploading a single large object of the same aggregate size, and each PUT also carries a per-request cost. At Netflix, a large number of media assets and their associated metadata are generated and pushed to the cloud. Most of these files range from tens of bytes to tens of kilobytes and are stored as small objects. In this talk, we propose a strategy to compact these small objects into larger blobs before uploading them to the cloud. We will discuss the policies for selecting which small objects to compact together, and how to index the objects within a blob. We will also cover how cloud storage operations such as reads and deletes are implemented for compacted objects. Finally, we will showcase the potential impact of this strategy on Netflix assets in terms of cost and performance.
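To make the idea concrete, here is a minimal sketch of compaction with an in-blob index, assuming S3 via boto3 as the object store; the bucket name, blob layout, and sidecar index format are illustrative assumptions, not the implementation the talk describes. Reads are served with a ranged GET so only the needed bytes are fetched.

```python
import json
import boto3  # assumed client; any object store with range reads would work

s3 = boto3.client("s3")
BUCKET = "example-bucket"  # hypothetical bucket name

def compact_and_upload(objects: dict[str, bytes], blob_key: str) -> dict:
    """Pack many small objects into one blob: one PUT instead of len(objects)."""
    index, parts, offset = {}, [], 0
    for key, data in objects.items():
        index[key] = {"offset": offset, "length": len(data)}
        parts.append(data)
        offset += len(data)
    s3.put_object(Bucket=BUCKET, Key=blob_key, Body=b"".join(parts))
    # The index could live in a blob footer or a database; a sidecar
    # object keeps this sketch simple.
    s3.put_object(Bucket=BUCKET, Key=blob_key + ".index",
                  Body=json.dumps(index).encode())
    return index

def read_object(blob_key: str, index: dict, key: str) -> bytes:
    """Serve a single-object read with a byte-range GET on the blob."""
    entry = index[key]
    start = entry["offset"]
    end = start + entry["length"] - 1  # HTTP ranges are inclusive
    resp = s3.get_object(Bucket=BUCKET, Key=blob_key,
                         Range=f"bytes={start}-{end}")
    return resp["Body"].read()
```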
- Per-request throttling and costs in cloud storage
- Machine learning techniques for selecting relevant objects to compact
- Cloud storage operations (reads, writes, deletes, updates) on compacted objects; see the sketch after this list
- Impact of the proposal on Netflix assets
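One plausible shape for the delete path on compacted objects, building on the sketch above, is a logical delete: tombstone the index entry, then reclaim space later by rewriting the blob without dead entries. The function names are hypothetical and only illustrate the pattern.

```python
def delete_object(index: dict, key: str) -> None:
    """Logical delete: tombstone the index entry; blob bytes remain until GC."""
    index[key]["deleted"] = True

def garbage_collect(blob_key: str, index: dict) -> dict:
    """Reclaim space by rewriting the blob with only the live objects."""
    live = {key: read_object(blob_key, index, key)
            for key, entry in index.items() if not entry.get("deleted")}
    return compact_and_upload(live, blob_key)  # helpers from the earlier sketch
```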