Abstract
Consumer-grade flash-based SSDs deliver much better performance than disks, but their well-known vagaries make it a daunting challenge to design a cost-effective, production-grade all-flash array that scales to petabytes. To overcome this challenge, the Purity architecture developed by Pure Storage implements global inline deduplication and achieves consistently low latency in highly available all-flash arrays that are in production today. This discussion will cover the techniques that make it possible to perform the hundreds of thousands of metadata insertions and updates per second that this requires. The architecture uses an insert-only database to enable parallelism both within and across controllers, along with a variety of tactical mechanisms that keep metadata at a size consistent with constantly changing application demands. The discussion will further explain how Pure Storage's data structures and algorithms maintain performance, data integrity, and availability in the presence of broad classes of multiple failures of varying severity.
Learning Objectives
How global data structures deliver logical integrity without sacrificing recovery performance
How to use multi-core processors effectively to enhance the performance of SSD-based arrays
How to scale a flash-based array to the petabyte range without sacrificing performance or data integrity