Abstract
Over the past decade, distributed file systems based on a scale-out architecture that enables managing massive amounts of storage space (petabytes) have become commonplace. In this talk, I will first provide an overview of OSS systems (such as HDFS and KFS) in this space. I will then describe how these systems have evolved to take advantage of increasing network bandwidth in data center settings to improve application performance as well as storage efficiency. I will talk about these aspects by highlighting two novel features, multi-writer atomic append and (time-permitting) distributed erasure coding. These capabilities have been implemented in KFS and deployed in production settings to run analytic workloads.