Snapshotting Scale-out Storage Pitfalls and Solutions | SNIA

Abstract

The hardest requirement to support is consistency, which often implies not only the storage system’s own internal consistency (which is mandatory) but application-level consistency as well. Distributed clocks are never perfectly synchronized, so temporal inversions are inevitable. While most I/O sequences are order-indifferent, we cannot allow inconsistent snapshots that reference user data contradicting the user’s perception of ordering. Further, just as a photo snapshot does not require its subjects to freeze, a distributed snapshot operation must not require cluster-wide freezes. It must execute concurrently with updates and eventually (but not immediately) result in a persistent snapshot accessible for reading and cloning. This presentation will survey distributed snapshotting and will explain and illustrate the ACID properties of the operation along with their concrete interpretations and implementations. We will describe MapReduce algorithms to snapshot a versioned, eventually consistent object cluster. Lastly, the benefits of atomic cluster-wide snapshots for distributed storage clusters will be reviewed. True cluster-wide snapshots enable capabilities and storage services that had been feared lost when storage systems scaled out beyond the transactional consistency of medium-size clusters.

Learning Objectives

Distributed copy-on-write snapshots: a lost art?
Usage of client-defined timestamps to support causal consistency
MapReduce programming model vs. cluster-wide snapshotting: a perfect fit
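
The abstract's idea of snapshotting a versioned object cluster with a MapReduce-style pass, cut at a client-defined timestamp, can be illustrated with a small sketch. This is not the presenters' algorithm; it is a minimal illustration under stated assumptions: every object version is immutable and carries a client-defined timestamp, each node can list its local versions, and all versions at or before the cut are already visible on some node. The names `Version`, `map_node`, `reduce_versions`, and `snapshot` are hypothetical.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Version(NamedTuple):
    name: str        # object name
    client_ts: int   # client-defined timestamp (assumed integer, e.g. microseconds)
    version_id: str  # identifier of the immutable payload (hypothetical)


def map_node(versions: Iterable[Version], snapshot_ts: int) -> List[Tuple[str, Version]]:
    """Map step, run independently per node: emit versions at or before the cut."""
    return [(v.name, v) for v in versions if v.client_ts <= snapshot_ts]


def reduce_versions(pairs: Iterable[Tuple[str, Version]]) -> Dict[str, Version]:
    """Reduce step: for each object name, keep the newest emitted version."""
    newest: Dict[str, Version] = {}
    for name, v in pairs:
        cur = newest.get(name)
        # Ties on timestamp are broken by version_id so the result is deterministic.
        if cur is None or (v.client_ts, v.version_id) > (cur.client_ts, cur.version_id):
            newest[name] = v
    return newest


def snapshot(cluster: Dict[str, List[Version]], snapshot_ts: int) -> Dict[str, str]:
    """Build a snapshot manifest (object name -> version id) without freezing writers."""
    pairs: List[Tuple[str, Version]] = []
    for node_versions in cluster.values():
        pairs.extend(map_node(node_versions, snapshot_ts))
    return {name: v.version_id for name, v in reduce_versions(pairs).items()}


# Toy cluster: versions of the same object are spread across two nodes.
cluster = {
    "node-a": [Version("photo.jpg", 100, "v1"), Version("photo.jpg", 250, "v3")],
    "node-b": [Version("photo.jpg", 180, "v2"), Version("notes.txt", 120, "n1")],
}
manifest = snapshot(cluster, snapshot_ts=200)
print(manifest)
```

Because versions are never modified in place, a write stamped later than the cut (here `v3` at 250) is simply ignored by the map step, which is why this sketch needs no cluster-wide freeze. A real system would also have to handle versions that carry a timestamp at or before the cut but arrive after the map step has run, which this sketch does not attempt.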