Abstract
This talk describes the design, implementation, and evaluation of the metadata index search component of a highly scalable, cost-efficient file system. Our system maintains traditional file system interfaces (e.g., POSIX) because many enterprise and consumer applications depend on them. However, storing hundreds of millions or billions of files in such a file system makes it difficult for a user to keep track of files and their status. Hierarchical naming helps up to a point, but it does not solve the whole problem of managing files, which can easily be "lost." Large file systems therefore require a search facility: searching by a combination of file name and metadata makes files easier to find. A POSIX file system already stores metadata such as file owner, group, creation date, change date, and size. Here, we focus on the facilities that we have built to maintain file metadata indices and to service file metadata search queries. Our metadata search subsystem uses open-source components, an OS-level file system notification system, and an index partitioning and distribution mechanism that allows for fast searches over billions of files. In a typical installation, typical queries, including those that touch all file indices, return results within a reasonable delay. We also discuss future work.
Learning Objectives
Metadata Index Search for a Scalable POSIX File System
Use of Open Source Software in a Prototype System
Overview of a Scalable Distributed POSIX File System
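
As a rough illustration of the idea in the abstract (indexing POSIX metadata, partitioning the index, and answering a query that touches every partition), here is a minimal Python sketch. It is not the system described in the talk: the partition count, the CRC32-based partitioning, the in-memory dictionaries, and the function names `partition_for`, `build_index`, and `search` are all assumptions made for illustration. A real deployment would update the index from file system notifications rather than by walking the tree.

```python
import os
import zlib
from fnmatch import fnmatchcase

NUM_PARTITIONS = 4  # hypothetical partition count, chosen for illustration


def partition_for(path):
    """Map a path to a partition using a stable hash (CRC32 is an assumption)."""
    return zlib.crc32(path.encode("utf-8")) % NUM_PARTITIONS


def build_index(root):
    """Walk `root` and record stat() metadata, one dict per partition."""
    partitions = [{} for _ in range(NUM_PARTITIONS)]
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            partitions[partition_for(path)][path] = {
                "name": name,
                "uid": st.st_uid,      # file owner
                "gid": st.st_gid,      # group
                "size": st.st_size,    # size in bytes
                "mtime": st.st_mtime,  # last data modification
                "ctime": st.st_ctime,  # last status change on POSIX systems
            }
    return partitions


def search(partitions, name_glob="*", min_size=0, uid=None):
    """Query every partition and merge the matching paths."""
    hits = []
    for part in partitions:
        for path, meta in part.items():
            if (fnmatchcase(meta["name"], name_glob)
                    and meta["size"] >= min_size
                    and (uid is None or meta["uid"] == uid)):
                hits.append(path)
    return sorted(hits)
```

A call such as `search(build_index("/some/dir"), "*.log", min_size=1024)` would combine a file-name pattern with a size filter. In this sketch the partitions are scanned one after another in a single process; distributing them across nodes and querying them in parallel is the part this example leaves out.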