MarFS: Near-POSIX Access to Object-Storage

Library Content Type:
Publish Date: 
Monday, September 19, 2016
Event Name: 
Focus Areas:

Many computing sites need long-term retention of mostly-cold data, often referred to as “data lakes”. The main function of this storage-tier is capacity, but non-trivial bandwidth/access requirements may also exist. For many years, tape was the most economical solution. However, data sets have grown larger more quickly than tape bandwidth has improved, such that disk is now becoming more economically feasible for this storage-tier. MarFS is a Near-POSIX File System, storing metadata across multiple POSIX file systems for scalable parallel access, while storing data in industry-standard erasure-protected cloud-style object stores for space-efficient reliability and massive parallelism. Our presentation will include: Cost-modeling of disk versus tape for campaign storage, Challenges of presenting object-storage through POSIX file semantics, Scalable Parallel metadata operations and bandwidth, Scaling metadata structures to handle trillions of files with billions of files/directory, Alternative technologies for data/metadata, and Structure of the MarFS solution.