SNIA Developer Conference September 15-17, 2025 | Santa Clara, CA
The Grand Unified File Index (GUFI) is a toolset built upon established technologies that helps manage massive amounts of filesystem metadata, a task that becomes increasingly important as the amount of data needing to be managed grows. GUFI provides powerful query capabilities via SQL that far surpass those of standard POSIX tools such as find, ls, and stat. GUFI returns results much faster than POSIX tools because it runs in parallel rather than single-threaded. GUFI allows both filesystem administrators and users to query the same index without loss of security.
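To see why SQL outclasses find/ls/stat for this kind of question, consider a query that filters and sorts on multiple metadata fields at once. The sketch below uses a toy in-memory SQLite table; GUFI's real per-directory database schema differs, and the table and column names here are simplified assumptions, not GUFI's actual layout.

```python
import sqlite3

# Toy stand-in for a GUFI-style index database. GUFI stores filesystem
# metadata in per-directory SQLite databases; the schema below is a
# simplified assumption for illustration only.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE entries (name TEXT, size INTEGER, mtime INTEGER, uid INTEGER)")
db.executemany("INSERT INTO entries VALUES (?, ?, ?, ?)", [
    ("report.pdf",   5_242_880, 1700000000, 1001),
    ("notes.txt",          512, 1700100000, 1001),
    ("core.dump", 104_857_600, 1690000000, 1002),
])

# The kind of question that is awkward to compose with find/ls/stat but
# trivial in SQL: files over 1 MiB owned by uid 1002, largest first.
rows = db.execute(
    "SELECT name, size FROM entries"
    " WHERE size > 1048576 AND uid = 1002"
    " ORDER BY size DESC"
).fetchall()
print(rows)  # → [('core.dump', 104857600)]
```

Expressing the same filter with POSIX tools would require chaining find predicates with stat calls and a sort, one process per refinement; in SQL it is a single declarative statement the engine can optimize.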
All of this capability is available for any filesystem accessible via the POSIX filesystem interface (or that can be mapped to it), removing the requirement that users be experts in filesystem-specific tools. In addition to general improvements to the codebase, recent developments have introduced a series of features that, when combined, make GUFI a Swiss Army knife for data (in addition to metadata) processing while maintaining need-to-know: exposing GUFI trees as single tables via the SQLite virtual table mechanism, allowing SQLAlchemy to access GUFI trees; the ability to run arbitrary commands that return results as individual values or as whole tables; and the addition of SQLite-native vector embedding generation and tables, enabling nearest-neighbor searches of user data. These additions have turned GUFI from a command-line-only tool into a full-stack tool whose powerful features are also usable from fully featured front-end GUIs. LA-UR-25-24971
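The single-table abstraction is the key to the SQLAlchemy integration: a whole tree of per-directory databases behaves like one queryable table. GUFI implements this natively in C against SQLite's virtual table (vtab) API; the sketch below only illustrates the resulting abstraction using plain ATTACH and a UNION ALL view, with made-up file names and schema.

```python
import os
import sqlite3
import tempfile

# Build two per-directory index databases as stand-ins for a small tree.
# The schema and contents are illustrative, not GUFI's real layout.
tmp = tempfile.mkdtemp()
for dbname, rows in [("dir_a.db", [("a.txt", 100)]), ("dir_b.db", [("b.txt", 200)])]:
    con = sqlite3.connect(os.path.join(tmp, dbname))
    con.execute("CREATE TABLE entries (name TEXT, size INTEGER)")
    con.executemany("INSERT INTO entries VALUES (?, ?)", rows)
    con.commit()
    con.close()

# Present both databases as one table. GUFI's virtual table does this
# transparently for an entire tree; ATTACH + a view mimics the effect
# for two databases.
db = sqlite3.connect(":memory:")
db.execute("ATTACH ? AS a", (os.path.join(tmp, "dir_a.db"),))
db.execute("ATTACH ? AS b", (os.path.join(tmp, "dir_b.db"),))
db.execute("CREATE TEMP VIEW tree AS"
           " SELECT * FROM a.entries UNION ALL SELECT * FROM b.entries")

combined = db.execute("SELECT name, size FROM tree ORDER BY name").fetchall()
print(combined)  # → [('a.txt', 100), ('b.txt', 200)]
```

Once the tree looks like one table, any SQL-speaking client, including an ORM such as SQLAlchemy, can query it without knowing how the index is physically partitioned.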
Learn about the general improvements to the GUFI toolset that have been made in the past few years. Learn about GUFI's new data processing capabilities, including the ability to perform AI operations. Understand how GUFI remains user-friendly despite the many features that have been added.
Rubrik is a cybersecurity company protecting mission-critical data for thousands of customers across the globe, including banks, hospitals, and government agencies. SDFS is the filesystem that powers the data path and makes this possible. In this talk, we will discuss the challenges of building a masterless distributed filesystem with support for data resilience, strong data integrity, and high performance that can run across a wide spectrum of hardware configurations, including cloud platforms. We will discuss the high-level architecture of our FUSE-based filesystem, how we leverage erasure coding to maintain data resilience, and how checksum schemes maintain strong data integrity without sacrificing performance. We will also cover the challenges of continuously monitoring and maintaining the health of the filesystem in terms of data resilience, data integrity, and load balance. Further, we will go over how we expand and shrink filesystem resources online. We will also discuss the need for, and challenge of, providing priority natively in our filesystem to support a variety of workloads and background operations with varying SLA requirements. Finally, we will touch on the benefits and challenges of supporting encryption, compression, and deduplication natively in the filesystem.
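The abstract does not spell out SDFS's erasure-coding scheme, so the sketch below only illustrates the core idea with the simplest possible code, a single XOR parity block. Production systems typically use Reed-Solomon codes that tolerate multiple simultaneous failures, but the reconstruction principle is the same: any lost block is recomputed from the surviving blocks.

```python
# Minimal illustration of erasure coding with single XOR parity.
# This is not SDFS's actual scheme; it shows only the core idea of
# reconstructing a lost data block from the survivors plus parity.
def xor_blocks(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

data = [b"blockA__", b"blockB__", b"blockC__"]  # equal-sized data blocks

# Encode: parity is the XOR of all data blocks.
parity = data[0]
for blk in data[1:]:
    parity = xor_blocks(parity, blk)

# Simulate losing block 1, then rebuild it from the survivors + parity,
# because A ^ C ^ (A ^ B ^ C) == B.
survivors = [data[0], data[2], parity]
recovered = survivors[0]
for blk in survivors[1:]:
    recovered = xor_blocks(recovered, blk)

print(recovered)  # → b'blockB__'
```

The checksum side of the story is complementary: erasure coding recovers data that is lost, while per-block checksums detect data that is silently corrupted, so the filesystem knows which block to treat as lost in the first place.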
GoogleFS introduced the architectural separation of metadata and data, but its reliance on a single active master imposed fundamental limitations on scalability, redundancy, and availability. This talk presents a modern metadata architecture, exemplified by SaunaFS, that eliminates the single-leader model by distributing metadata across multiple concurrent, multi-threaded servers. Metadata is stored in a sharded, ACID-compliant transactional database (e.g., FoundationDB), enabling horizontal scalability, fault tolerance through redundant metadata replicas, reduced memory footprint, and consistent performance under load. The result is a distributed file system architecture capable of exabyte-scale operation in a single namespace while preserving POSIX semantics and supporting workloads with billions of small files.
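Why does an ordered, transactional key-value store suit filesystem metadata so well? Because a directory listing becomes a key-range scan, and the store can shard key ranges across servers transparently. SaunaFS's actual key schema is not given in the abstract, so the layout below is an assumption for illustration, using a sorted in-memory list to stand in for the ordered store.

```python
import bisect

# A sorted list of (key, value) pairs stands in for an ordered KV store
# such as FoundationDB, which shards key ranges across servers itself.
store = []

def put(key: bytes, value: bytes) -> None:
    bisect.insort(store, (key, value))

def dentry_key(parent_inode: int, name: str) -> bytes:
    """Illustrative key layout (an assumption, not SaunaFS's schema):
    dentry/<parent inode, big-endian>/<name>. Big-endian encoding keeps
    all entries of one directory adjacent in key order."""
    return b"dentry/" + parent_inode.to_bytes(8, "big") + b"/" + name.encode()

put(dentry_key(42, "b.txt"), b"inode=101")
put(dentry_key(42, "a.txt"), b"inode=100")
put(dentry_key(7, "other"), b"inode=50")

# Listing directory 42 is a range scan over its key prefix; no single
# metadata leader needs to hold the whole namespace in memory.
prefix = b"dentry/" + (42).to_bytes(8, "big") + b"/"
listing = [k.rsplit(b"/", 1)[-1].decode() for k, _ in store if k.startswith(prefix)]
print(listing)  # → ['a.txt', 'b.txt']
```

Because the store is ACID and sharded, a rename that touches two directories can be a single multi-key transaction executed by any of the concurrent metadata servers, which is what removes the single-leader bottleneck.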
The performance of network file protocols is a critical factor in the efficiency of the AI and Machine Learning pipeline. This presentation provides a detailed comparative analysis of the two leading protocols, Server Message Block (SMB) and Network File System (NFS), specifically for demanding AI workloads. We evaluate the advanced capabilities of both protocols, comparing SMB3 with SMB Direct and Multichannel against NFS with RDMA and multistream TCP configurations. The industry-standard MLPerf Storage benchmark is used to simulate realistic AI data access patterns, providing a robust foundation for our comparison. The core of this research focuses on quantifying the performance differences and identifying the operational and configuration overhead associated with each technology.
The Samba file server is evolving beyond traditional TCP-based transport. This talk introduces the latest advancements in Samba's networking stack, including full support for SMB over QUIC, offering secure, firewall-friendly file sharing using modern internet protocols. We’ll also explore the ongoing development of SMB over SMB-Direct (RDMA), aimed at delivering low-latency, high-throughput file access for data center and high-performance environments. Join us for a deep dive into these transport innovations, their architecture, current status, and what's next for Samba’s high-performance networking roadmap.
Samba is evolving to meet the demands of modern enterprise IT. The latest advancements bring critical SMB3 capabilities that boost scalability, reliability, and cloud readiness. With features like SMB over QUIC, Transparent Failover, and SMB3 Directory Leases now arriving, Samba is positioning itself as a robust solution for secure, high-performance file services across data centers and hybrid cloud environments. Learn how these enhancements can future-proof your infrastructure, without vendor lock-in.