Fast Resilvering in High-Capacity All-Flash Systems

Publish Date: Thursday, September 26, 2019

All-flash (NVMe/SAS SSD) drive capacities keep increasing, and the trend will most likely continue in the coming years. Higher capacities make it harder to resilver data when a device fails: the bigger the capacity, the larger the blast radius and the longer the rebuild takes. However, SSDs also have much higher read/write bandwidth, and that bandwidth grows along with capacity.
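The capacity-versus-bandwidth trade-off can be made concrete with a back-of-envelope estimate. This is a minimal sketch under stated assumptions, not a model from the talk: it assumes the rebuild is bounded by the sustained write bandwidth of the drives receiving the rebuilt data and ignores read and reconstruction costs. The function name `rebuild_hours` and its parameters are hypothetical, and any figures plugged in (such as a 30 TB drive at 2 GB/s) are illustrative only.

```python
def rebuild_hours(capacity_tb: float, write_gb_per_s: float,
                  target_drives: int = 1, used_fraction: float = 1.0) -> float:
    """Lower-bound rebuild time (hours), assuming writes are the bottleneck.

    capacity_tb:    raw capacity of the failed drive, in decimal TB (hypothetical input).
    write_gb_per_s: sustained write bandwidth of ONE target drive, in decimal GB/s.
    target_drives:  number of drives absorbing rebuilt writes in parallel.
    used_fraction:  share of the failed drive holding live data (1.0 = rebuild everything).
    """
    bytes_to_rebuild = capacity_tb * 1e12 * used_fraction
    aggregate_bytes_per_s = write_gb_per_s * 1e9 * target_drives
    return bytes_to_rebuild / aggregate_bytes_per_s / 3600


if __name__ == "__main__":
    # Illustrative numbers only: one spare drive vs. writes spread over ten drives.
    print(f"{rebuild_hours(30, 2.0):.2f} h to a single spare")
    print(f"{rebuild_hours(30, 2.0, target_drives=10):.2f} h spread over 10 drives")
```

With a fixed per-drive bandwidth, doubling capacity doubles the estimate, while spreading the writes over more drives or rebuilding only live data divides it, which is why the design below leans on the bandwidth of every drive in the group and on in-use-only resilvering.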
This presentation describes a resilvering design for parity RAID (single, double, or triple parity) and erasure-coded groups that builds on the strengths of these drives to complete resilvering in an acceptable time. The scheme uses both the read and the write bandwidth of all drives in the RAID group to shorten the rebuild. Deep integration of the RAID layer with the filesystem limits resilvering to in-use data only. The design is geared to generate sequential read and write patterns. Additionally, fine-grained, filesystem-driven control helps minimize the impact on client I/O latency while resilvering is in progress.
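The ideas in the paragraph above (rebuild only in-use data, keep access sequential, spread rebuilt writes across the group) can be illustrated with a toy model. This is a minimal sketch, not the design from the talk: it assumes single (XOR) parity, a declustered layout in which each stripe spans fewer drives than the group so that spare space exists on drives outside the stripe, and an in-use stripe set handed down by the filesystem. `Layout`, `make_stripe`, `xor_chunks`, and `resilver` are hypothetical names; double/triple parity, erasure codes, and I/O throttling are not modeled.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical layout: stripe id -> {drive id: chunk bytes}.
# Each stripe spans fewer drives than the group, and the XOR of all
# chunks in a stripe is zero (single parity).
Layout = Dict[int, Dict[int, bytes]]


def xor_chunks(chunks: List[bytes]) -> bytes:
    """XOR equal-length chunks; with single parity this yields the missing chunk."""
    out = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, b in enumerate(chunk):
            out[i] ^= b
    return bytes(out)


def make_stripe(drives: List[int], data: List[bytes]) -> Dict[int, bytes]:
    """Build one stripe: data chunks on the first drives, XOR parity on the last."""
    return dict(zip(drives, data + [xor_chunks(data)]))


def resilver(layout: Layout, n_drives: int, failed: int,
             in_use: Set[int]) -> List[Tuple[int, int]]:
    """Rebuild chunks lost on `failed`, visiting only stripes listed in `in_use`.

    Returns (stripe, target_drive) placements in the order they were written.
    """
    writes_per_drive = {d: 0 for d in range(n_drives) if d != failed}
    placements: List[Tuple[int, int]] = []
    # Ascending stripe order keeps reads and writes roughly sequential.
    for stripe in sorted(in_use):
        members = layout[stripe]
        if failed not in members:
            continue  # this stripe lost nothing
        survivors = [chunk for d, chunk in members.items() if d != failed]
        rebuilt = xor_chunks(survivors)
        # Only drives outside the stripe are eligible, so the stripe keeps
        # its fault tolerance; the least-written one spreads the write load.
        candidates = [d for d in writes_per_drive if d not in members]
        if not candidates:
            raise RuntimeError(f"no spare drive outside stripe {stripe}")
        target = min(candidates, key=lambda d: (writes_per_drive[d], d))
        writes_per_drive[target] += 1
        del members[failed]
        members[target] = rebuilt
        placements.append((stripe, target))
    return placements
```

Unallocated stripes are skipped even when they touched the failed drive, because in this model they hold no live data. A real implementation would also batch and rate-limit the loop so client I/O keeps priority; that part is left out here.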
