SNIA Developer Conference September 15-17, 2025 | Santa Clara, CA
The Grand Unified File Index (GUFI) is a toolset built upon established technologies that helps manage massive amounts of filesystem metadata, a task that becomes increasingly important as the amount of data needing to be managed grows. GUFI provides powerful query capabilities via SQL that far surpass those of standard POSIX tools such as find, ls, and stat. GUFI returns results much faster than POSIX tools because it runs in parallel rather than single-threaded. GUFI allows both filesystem administrators and users to query the same index without loss of security.
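To see why SQL outclasses find/ls/stat for this kind of question, consider a query that filters and sorts on multiple metadata fields at once. The sketch below uses a toy in-memory SQLite table; GUFI's real per-directory database schema differs, and the table and column names here are simplified assumptions, not GUFI's actual layout.

```python
import sqlite3

# Toy stand-in for a GUFI-style index database. GUFI stores filesystem
# metadata in per-directory SQLite databases; the schema below is a
# simplified assumption for illustration only.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE entries (name TEXT, size INTEGER, mtime INTEGER, uid INTEGER)")
db.executemany("INSERT INTO entries VALUES (?, ?, ?, ?)", [
    ("report.pdf",   5_242_880, 1700000000, 1001),
    ("notes.txt",          512, 1700100000, 1001),
    ("core.dump", 104_857_600, 1690000000, 1002),
])

# The kind of question that is awkward to compose with find/ls/stat but
# trivial in SQL: files over 1 MiB owned by uid 1002, largest first.
rows = db.execute(
    "SELECT name, size FROM entries"
    " WHERE size > 1048576 AND uid = 1002"
    " ORDER BY size DESC"
).fetchall()
print(rows)  # → [('core.dump', 104857600)]
```

Expressing the same filter with POSIX tools would require chaining find predicates with stat calls and a sort, one process per refinement; in SQL it is a single declarative statement the engine can optimize.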
All of this capability is available for any filesystem accessible via the POSIX filesystem interface (or that can be mapped to it), removing the requirement that users be experts in filesystem-specific tools. In addition to general improvements to the codebase, recent developments have introduced a series of features that, when combined, make GUFI a Swiss Army knife for data (in addition to metadata) processing while maintaining need-to-know: exposing GUFI trees as single tables via the SQLite virtual table mechanism, allowing SQLAlchemy to access GUFI trees; the ability to run arbitrary commands that return results as individual values or as whole tables; and the addition of SQLite-native vector embedding generation and tables, enabling nearest-neighbor searches of user data. These additions have turned GUFI from a command-line-only tool into a full-stack tool whose powerful features are also usable from fully featured front-end GUIs. LA-UR-25-24971
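The single-table abstraction is the key to the SQLAlchemy integration: a whole tree of per-directory databases behaves like one queryable table. GUFI implements this natively in C against SQLite's virtual table (vtab) API; the sketch below only illustrates the resulting abstraction using plain ATTACH and a UNION ALL view, with made-up file names and schema.

```python
import os
import sqlite3
import tempfile

# Build two per-directory index databases as stand-ins for a small tree.
# The schema and contents are illustrative, not GUFI's real layout.
tmp = tempfile.mkdtemp()
for dbname, rows in [("dir_a.db", [("a.txt", 100)]), ("dir_b.db", [("b.txt", 200)])]:
    con = sqlite3.connect(os.path.join(tmp, dbname))
    con.execute("CREATE TABLE entries (name TEXT, size INTEGER)")
    con.executemany("INSERT INTO entries VALUES (?, ?)", rows)
    con.commit()
    con.close()

# Present both databases as one table. GUFI's virtual table does this
# transparently for an entire tree; ATTACH + a view mimics the effect
# for two databases.
db = sqlite3.connect(":memory:")
db.execute("ATTACH ? AS a", (os.path.join(tmp, "dir_a.db"),))
db.execute("ATTACH ? AS b", (os.path.join(tmp, "dir_b.db"),))
db.execute("CREATE TEMP VIEW tree AS"
           " SELECT * FROM a.entries UNION ALL SELECT * FROM b.entries")

combined = db.execute("SELECT name, size FROM tree ORDER BY name").fetchall()
print(combined)  # → [('a.txt', 100), ('b.txt', 200)]
```

Once the tree looks like one table, any SQL-speaking client, including an ORM such as SQLAlchemy, can query it without knowing how the index is physically partitioned.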
Learn about the general improvements to the GUFI toolset that have been made in the past few years. Learn about GUFI's new data processing capabilities, including the ability to perform AI operations. Understand how GUFI remains user-friendly despite the many features that have been added.
Rubrik is a cybersecurity company protecting mission-critical data for thousands of customers across the globe, including banks, hospitals, and government agencies. SDFS is the filesystem that powers the data path and makes this possible. In this talk, we will discuss the challenges of building a masterless distributed filesystem with support for data resilience, strong data integrity, and high performance that can run across a wide spectrum of hardware configurations, including cloud platforms. We will discuss the high-level architecture of our FUSE-based filesystem, how we leverage erasure coding to maintain data resilience, and how checksum schemes maintain strong data integrity without sacrificing performance. We will also cover the challenges of continuously monitoring and maintaining the health of the filesystem in terms of data resilience, data integrity, and load balance. Further, we will go over how we expand and shrink filesystem resources online. We will also discuss the need for, and challenge of, providing priority natively in our filesystem to support a variety of workloads and background operations with varying SLA requirements. Finally, we will touch on the benefits and challenges of supporting encryption, compression, and deduplication natively in the filesystem.
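The abstract does not spell out SDFS's erasure-coding scheme, so the sketch below only illustrates the core idea with the simplest possible code, a single XOR parity block. Production systems typically use Reed-Solomon codes that tolerate multiple simultaneous failures, but the reconstruction principle is the same: any lost block is recomputed from the surviving blocks.

```python
# Minimal illustration of erasure coding with single XOR parity.
# This is not SDFS's actual scheme; it shows only the core idea of
# reconstructing a lost data block from the survivors plus parity.
def xor_blocks(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

data = [b"blockA__", b"blockB__", b"blockC__"]  # equal-sized data blocks

# Encode: parity is the XOR of all data blocks.
parity = data[0]
for blk in data[1:]:
    parity = xor_blocks(parity, blk)

# Simulate losing block 1, then rebuild it from the survivors + parity,
# because A ^ C ^ (A ^ B ^ C) == B.
survivors = [data[0], data[2], parity]
recovered = survivors[0]
for blk in survivors[1:]:
    recovered = xor_blocks(recovered, blk)

print(recovered)  # → b'blockB__'
```

The checksum side of the story is complementary: erasure coding recovers data that is lost, while per-block checksums detect data that is silently corrupted, so the filesystem knows which block to treat as lost in the first place.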
GoogleFS introduced the architectural separation of metadata and data, but its reliance on a single active master imposed fundamental limitations on scalability, redundancy, and availability. This talk presents a modern metadata architecture, exemplified by SaunaFS, that eliminates the single-leader model by distributing metadata across multiple concurrent, multi-threaded servers. Metadata is stored in a sharded, ACID-compliant transactional database (e.g., FoundationDB), enabling horizontal scalability, fault tolerance through redundant metadata replicas, reduced memory footprint, and consistent performance under load. The result is a distributed file system architecture capable of exabyte-scale operation in a single namespace while preserving POSIX semantics and supporting workloads with billions of small files.
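Why does an ordered, transactional key-value store suit filesystem metadata so well? Because a directory listing becomes a key-range scan, and the store can shard key ranges across servers transparently. SaunaFS's actual key schema is not given in the abstract, so the layout below is an assumption for illustration, using a sorted in-memory list to stand in for the ordered store.

```python
import bisect

# A sorted list of (key, value) pairs stands in for an ordered KV store
# such as FoundationDB, which shards key ranges across servers itself.
store = []

def put(key: bytes, value: bytes) -> None:
    bisect.insort(store, (key, value))

def dentry_key(parent_inode: int, name: str) -> bytes:
    """Illustrative key layout (an assumption, not SaunaFS's schema):
    dentry/<parent inode, big-endian>/<name>. Big-endian encoding keeps
    all entries of one directory adjacent in key order."""
    return b"dentry/" + parent_inode.to_bytes(8, "big") + b"/" + name.encode()

put(dentry_key(42, "b.txt"), b"inode=101")
put(dentry_key(42, "a.txt"), b"inode=100")
put(dentry_key(7, "other"), b"inode=50")

# Listing directory 42 is a range scan over its key prefix; no single
# metadata leader needs to hold the whole namespace in memory.
prefix = b"dentry/" + (42).to_bytes(8, "big") + b"/"
listing = [k.rsplit(b"/", 1)[-1].decode() for k, _ in store if k.startswith(prefix)]
print(listing)  # → ['a.txt', 'b.txt']
```

Because the store is ACID and sharded, a rename that touches two directories can be a single multi-key transaction executed by any of the concurrent metadata servers, which is what removes the single-leader bottleneck.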
The performance of network file protocols is a critical factor in the efficiency of the AI and Machine Learning pipeline. This presentation provides a detailed comparative analysis of the two leading protocols, Server Message Block (SMB) and Network File System (NFS), specifically for demanding AI workloads. We evaluate the advanced capabilities of both protocols, comparing SMB3 with SMB Direct and Multichannel against NFS with RDMA and multistream TCP configurations. The industry-standard MLPerf Storage benchmark is used to simulate realistic AI data access patterns, providing a robust foundation for our comparison. The core of this research focuses on quantifying the performance differences and identifying the operational and configuration overhead associated with each technology.
The Samba file server is evolving beyond traditional TCP-based transport. This talk introduces the latest advancements in Samba's networking stack, including full support for SMB over QUIC, offering secure, firewall-friendly file sharing using modern internet protocols. We’ll also explore the ongoing development of SMB over SMB-Direct (RDMA), aimed at delivering low-latency, high-throughput file access for data center and high-performance environments. Join us for a deep dive into these transport innovations, their architecture, current status, and what's next for Samba’s high-performance networking roadmap.
Samba is evolving to meet the demands of modern enterprise IT. The latest advancements bring critical SMB3 capabilities that boost scalability, reliability, and cloud readiness. With features like SMB over QUIC, Transparent Failover, and SMB3 Directory Leases now arriving, Samba is positioning itself as a robust solution for secure, high-performance file services across data centers and hybrid cloud environments. Learn how these enhancements can future-proof your infrastructure, without vendor lock-in.