Reducing the amount of stored data significantly lowers the total cost of ownership of a distributed storage system. Deduplication, which removes redundant data, is one of the most promising ways to save storage capacity. In practice, however, traditional deduplication methods designed for local storage systems are not suitable for distributed storage systems, for two reasons. First, the I/O overhead of additional data and metadata processing can severely degrade performance, and the deduplication ratio is not high enough because data is distributed across multiple nodes. Second, the scale-out nature of such systems makes it hard to design efficient metadata management for deduplicated data alongside the legacy metadata management. To address these challenges, in this talk we propose a global deduplication method with a multi-tiered storage design and a self-contained metadata structure. Tiering with a deduplication-aware replacement policy improves deduplication efficiency by identifying the more important chunks, those with a high deduplication ratio. In addition, the self-contained metadata structure provides compatibility with existing storage features such as recovery and snapshots. As a result, the proposed tiering-based global deduplication reduces I/O traffic and saves storage cost for a distributed storage system.
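
To make the idea of global deduplication concrete, here is a minimal Python sketch of a content-addressed chunk pool shared by all objects. It is an illustration only, not the design presented in the talk: the `ChunkStore` class, the fixed-size chunking, the `CHUNK_SIZE` value, and the use of SHA-256 fingerprints are all assumptions made for the example, and tiering and replacement policy are left out.

```python
import hashlib
from collections import defaultdict

CHUNK_SIZE = 4  # hypothetical, tiny on purpose so the example is easy to follow


class ChunkStore:
    """Toy content-addressed chunk pool shared by all objects (assumed design)."""

    def __init__(self):
        self.chunks = {}                  # fingerprint -> chunk bytes
        self.refcount = defaultdict(int)  # fingerprint -> number of references
        self.objects = {}                 # object name -> ordered fingerprints

    def write(self, name, data):
        # Simplification: overwriting an existing name does not release old references.
        fingerprints = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            fp = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(fp, chunk)  # each unique chunk is stored once
            self.refcount[fp] += 1
            fingerprints.append(fp)
        self.objects[name] = fingerprints

    def read(self, name):
        # Rebuild the object from its ordered list of chunk fingerprints.
        return b"".join(self.chunks[fp] for fp in self.objects[name])

    def dedup_ratio(self):
        # Logical bytes written divided by physical bytes actually stored.
        logical = sum(len(self.chunks[fp]) * n for fp, n in self.refcount.items())
        physical = sum(len(c) for c in self.chunks.values())
        return logical / physical


store = ChunkStore()
store.write("a", b"AAAABBBBCCCC")
store.write("b", b"AAAABBBBDDDD")
print(len(store.chunks), store.dedup_ratio())
```

In this sketch the two objects share two of their three chunks, so only four unique chunks are stored. The per-chunk reference counts are the kind of signal a deduplication-aware replacement policy could plausibly use to decide which chunks are worth deduplicating, although the policy described in the talk may work differently.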

Presentation Type
Presentation
Learning Objectives
  • Deduplication
  • Tiering
  • Distributed Storage System
YouTube Video ID
sbP8SeTEA3U