The Grand Unified File Index (GUFI) is a toolset built on established technologies for managing massive amounts of filesystem metadata, a task that grows more important as the volume of data under management grows. GUFI provides SQL query capabilities that far surpass those of standard POSIX tools such as find, ls, and stat. Because GUFI runs queries in parallel rather than in a single thread, it returns results much faster than those tools. Filesystem administrators and users can query the same index without loss of security. All of this capability is available on any filesystem that is accessible via the POSIX filesystem interface (or can be mapped to it), so users do not need to be experts in filesystem-specific tools.
In addition to general improvements to the codebase, recent developments have introduced a series of features that, when combined, make GUFI a Swiss Army knife for processing data as well as metadata while maintaining need-to-know: GUFI trees are exposed as single tables via the SQLite virtual table mechanism, which allows SQLAlchemy to access them; arbitrary commands can be run to return results both as individual values and as whole tables; and SQLite-native vector embedding generation and tables allow nearest-neighbor searches of user data. These additions have turned GUFI from a command-line-only tool into a full-stack tool whose features are also usable from fully featured front-end GUIs.
LA-UR-25-24971