Deep Compression at Inline Speed for All-Flash Array

Author(s)/Presenter(s):
Library Content Type:
Publish Date: 
Wednesday, September 23, 2020
Event Name: 
Event Track:
Abstract: 

The rapid improvement of overall $/Gbyte has driven the high performance All-Flash Array to be increasingly adopted in both enterprises and cloud datacenters. Besides the raw NAND density scaling with continued semiconductor process improvement, data reduction techniques have and will play a crucial role in further reducing the overall effective cost of All-Flash Array. One of the key data reduction techniques is compression. Compression can be performed both inline and offline. In fact, the best All-Flash Arrays often do both: fast inline compression at a lower compression ratio, and slower, opportunistic offline deep compression at significantly higher compression ratio. However, with the rapid growth of both capacity and sustained throughput due to the consolidation of workloads on a shared All-Flash Array platform, a growing percentage of the data never gets the opportunity for deep compression. There is a deceptively simple solution: Inline Deep Compression with the additional benefits of reduced flash wear and networking load. The challenge, however, is the prohibitive amount of CPU cycles required. Deep compression often requires 10x or more CPU cycles than typical fast inline compression. Even worse, the challenge will continue to grow: CPU performance scaling has slowed down significantly (breakdown of Dennard scaling), but the performance of All-Flash Array has been growing at a far greater pace. In this talk, I will explain how we can meet this challenge with a domain-specific hardware design. The hardware platform is a FPGA-based PCIe card that is programmable. It can sustain 5+Gbyte/s of deep compression throughput with low latency for even small data block sizes (TByte/s BW and <0ns of latency) and the almost unlimited parallelism available on a modern mid-range FPGA device. The hardware compression algorithm is trained with a vast amount of data available to our systems. Our benchmarks show it can match or outperform some of the best software compressors available in the market without taxing the CPU.

Learning Objectives

Hardware Architecture for Inline Deep Compression,Design of Hardware Deep Compression Engine,Inline and offline compression of All-Flash Array