Abstract
The fingerprinting algorithm is the foundation of data deduplication, and because of its heavy computation it is a prominent CPU hotspot. In this talk, we analyze the nature of the data deduplication feature in the ZFS file system and present a method to increase deduplication efficiency using multiple-buffer hashing optimization. Multiple-buffer hashing has limitations in data deduplication applications, such as memory bandwidth constraints on multi-core systems and lower performance under light workloads. We design a multiple-buffer hashing framework for ZFS data deduplication that resolves these limitations. As a result, the framework improves ZFS data deduplication throughput by 2.5x. The framework is general and can readily benefit other data deduplication applications.
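To make the light-workload limitation concrete, here is a minimal sketch of a batching front end for fingerprint-based deduplication. It is an illustration under stated assumptions, not the framework presented in the talk: the class name `BatchingFingerprinter`, the `lanes` parameter, and the toy dedup table are hypothetical, SHA-256 (one of the checksums ZFS can use for deduplication) stands in for the fingerprint, and the "multi-buffer" step is only simulated with `hashlib`, whereas a real implementation would advance all lanes together in SIMD registers. Blocks are queued until a full batch is available; a partial batch is flushed through the ordinary single-buffer path, since a mostly idle batch would waste lanes.

```python
import hashlib
from typing import Dict, List


class BatchingFingerprinter:
    """Hypothetical multi-buffer style front end for deduplication."""

    def __init__(self, lanes: int = 4) -> None:
        self.lanes = lanes                 # assumed number of parallel hash lanes
        self.pending: List[bytes] = []     # blocks waiting for a full batch
        self.ddt: Dict[bytes, int] = {}    # toy dedup table: fingerprint -> ref count
        self.batched = 0                   # blocks hashed via the batch path
        self.single = 0                    # blocks hashed via the single-buffer path

    def _record(self, digest: bytes) -> bool:
        """Add a fingerprint to the table; return True if it was already present."""
        dup = digest in self.ddt
        self.ddt[digest] = self.ddt.get(digest, 0) + 1
        return dup

    def submit(self, block: bytes) -> List[bool]:
        """Queue a block; fingerprint a whole batch once `lanes` blocks are queued."""
        self.pending.append(block)
        if len(self.pending) < self.lanes:
            return []  # not enough blocks yet to fill every lane
        batch, self.pending = self.pending, []
        # Simulated multi-buffer step: each lane is hashed here with hashlib,
        # standing in for a SIMD routine that processes all lanes at once.
        digests = [hashlib.sha256(b).digest() for b in batch]
        self.batched += len(batch)
        return [self._record(d) for d in digests]

    def flush(self) -> List[bool]:
        """Light workload: hash leftover blocks one at a time instead of batching."""
        results = [self._record(hashlib.sha256(b).digest()) for b in self.pending]
        self.single += len(self.pending)
        self.pending = []
        return results


fp = BatchingFingerprinter(lanes=4)
blocks = [b"A" * 4096, b"B" * 4096, b"A" * 4096, b"C" * 4096, b"B" * 4096]
flags: List[bool] = []
for blk in blocks:
    flags += fp.submit(blk)
flags += fp.flush()
print(flags)  # [False, False, True, False, True]
```

The first four blocks fill one batch; the fifth arrives alone and is hashed on the single-buffer path at flush time. A production design would also need a time-based flush and some control over how many cores run batches concurrently, given the memory bandwidth limit noted above; both are omitted here.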
Learning Objectives
Analyzing the nature of the data deduplication feature in the ZFS file system.
Understanding how multiple-buffer hashing works.
Analyzing the usage limitations of multiple-buffer hashing for data deduplication applications.
Designing a general multiple-buffer hashing framework for data deduplication applications.
Demonstrating the multiple-buffer hashing framework as a ZFS data deduplication solution, along with its performance.