Abstract
Deduplication can be accomplished in different ways in a file system. This tutorial focuses on block-level deduplication. While conceptually simple, an implementation can be quite complex, as it must address multiple issues:
• Scalability: what to do when the lookup table no longer fits in memory.
• Performance: the impact of table lookups, and of writes that become dependent on reads.
• Space accounting: space may now be shared between files and between file systems.
• Administrative model: keeping the model simple.
We will discuss these issues in detail. This tutorial will also cover expanding the notion of deduplication beyond the storage device to include in-memory and over-the-wire deduplication.
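To make the block-level idea concrete, here is a minimal in-memory sketch. It is an illustration under stated assumptions, not how any particular file system implements deduplication: the class name DedupStore, its methods, the 4096-byte block size, and the choice of SHA-256 as the block fingerprint are all hypothetical. The dictionary stands in for the lookup table, and the reference counts show why space accounting becomes harder once blocks are shared between files.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, chosen only for illustration


class DedupStore:
    """Toy block store that keeps one copy of each unique block (hypothetical)."""

    def __init__(self, block_size=BLOCK_SIZE):
        self.block_size = block_size
        self.blocks = {}    # lookup table: digest -> block contents
        self.refcount = {}  # digest -> number of file references to the block
        self.files = {}     # file name -> ordered list of block digests

    def write_file(self, name, data):
        # Overwriting a file first releases its old block references.
        if name in self.files:
            self.delete_file(name)
        digests = []
        for off in range(0, len(data), self.block_size):
            block = data[off:off + self.block_size]
            digest = hashlib.sha256(block).digest()
            # Every write needs a table lookup before it can complete.
            if digest not in self.blocks:
                self.blocks[digest] = block
                self.refcount[digest] = 0
            self.refcount[digest] += 1
            digests.append(digest)
        self.files[name] = digests

    def read_file(self, name):
        return b"".join(self.blocks[d] for d in self.files[name])

    def delete_file(self, name):
        # A block is freed only when its last reference goes away.
        for d in self.files.pop(name):
            self.refcount[d] -= 1
            if self.refcount[d] == 0:
                del self.refcount[d]
                del self.blocks[d]

    def logical_bytes(self):
        # Space as seen by the files, counting shared blocks once per reference.
        return sum(len(self.blocks[d]) for ds in self.files.values() for d in ds)

    def physical_bytes(self):
        # Space actually consumed, counting each unique block once.
        return sum(len(b) for b in self.blocks.values())
```

With a 4-byte block size, writing b"AAAABBBBAAAA" as one file and b"BBBBCCCC" as another stores only three unique blocks, so logical_bytes() and physical_bytes() diverge. Deleting the first file frees only the block that no other file references, which is the sense in which space is no longer owned by a single file.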
Learning Objectives
• Handling scalability in deduplication.
• Understanding performance issues associated with deduplication.
• Dealing with space accounting under deduplication.