Abstract
Data in memory may be in a modified state relative to its on-disk copy. Also, unlike the on-disk copy, the in-memory data might not be checksummed, replicated, or backed up every time it is modified. The data must therefore be checksummed before it is mirrored, so that corruption introduced by the network can be detected. However, checksumming the data in the application has its own overheads. First, the application must then handle networking functionality such as retransmission and congestion itself. Second, if it delays validation of the mirrored data, it may be difficult to recover the correct state of the system.
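To make the application-side burden concrete, here is a minimal C sketch of the conventional approach. It assumes a hypothetical per-chunk header (`struct mirror_hdr`, `mirror_hdr_init` and `mirror_verify` are illustrative names, not taken from any product) and uses CRC-32 purely as an example checksum; the abstract does not say which checksum the conventional approach uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (IEEE 802.3, reflected polynomial 0xEDB88320).
 * Slow but dependency-free; real code would use a table or hardware CRC. */
uint32_t crc32_ieee(const uint8_t *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Hypothetical header the application prepends to every mirrored chunk. */
struct mirror_hdr {
    uint64_t offset;   /* where the chunk lives in the mirrored memory region */
    uint32_t len;      /* payload length in bytes */
    uint32_t crc;      /* CRC-32 of the payload, computed before sending */
};

/* Sender side: describe and checksum one chunk before it is mirrored. */
void mirror_hdr_init(struct mirror_hdr *h, uint64_t offset,
                     const uint8_t *payload, uint32_t len)
{
    assert(h != NULL && (payload != NULL || len == 0));
    h->offset = offset;
    h->len = len;
    h->crc = crc32_ieee(payload, len);
}

/* Receiver side: 0 means the chunk may be applied; -1 means it was corrupted
 * and the application itself must request a resend. */
int mirror_verify(const struct mirror_hdr *h, const uint8_t *payload)
{
    return crc32_ieee(payload, h->len) == h->crc ? 0 : -1;
}
```

Note what the sketch leaves out: when `mirror_verify` fails, the application still needs its own NACK messages, resend timers and ordering rules, which is exactly the duplicated transport work described above.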
Providing mirrored-data integrity as a transport-protocol function leads to a more modular design and better performance. We propose a novel approach that uses TCP with MD5 signatures to handle the network-integrity overhead, so that the application can focus on its primary task. We discuss the evaluation and a use case of this approach (NVM mirroring in Data Domain HA) to demonstrate its advantages over the conventional approach of checksumming in the application.
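On Linux, the TCP MD5 signature option (RFC 2385) is enabled per socket with `setsockopt(fd, IPPROTO_TCP, TCP_MD5SIG, ...)`, passing a `struct tcp_md5sig` that binds a shared key to a peer address. The following is a minimal sketch, assuming IPv4 peers and a pre-shared key; the helper names `build_md5sig` and `enable_tcp_md5` are hypothetical, and error handling is reduced to return codes.

```c
#define _GNU_SOURCE            /* exposes struct tcp_md5sig in <netinet/tcp.h> */
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>       /* TCP_MD5SIG, TCP_MD5SIG_MAXKEYLEN */
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Fill in a struct tcp_md5sig binding `key` to the IPv4 address `peer_ipv4`.
 * Returns 0 on success, -1 on a bad address or key length.
 * A zero-length key would tell the kernel to delete the key, so it is rejected. */
int build_md5sig(struct tcp_md5sig *sig, const char *peer_ipv4, const char *key)
{
    assert(sig != NULL && peer_ipv4 != NULL && key != NULL);

    size_t keylen = strlen(key);
    if (keylen == 0 || keylen > TCP_MD5SIG_MAXKEYLEN)
        return -1;

    memset(sig, 0, sizeof(*sig));
    struct sockaddr_in *peer = (struct sockaddr_in *)&sig->tcpm_addr;
    peer->sin_family = AF_INET;        /* the kernel matches on address, not port */
    if (inet_pton(AF_INET, peer_ipv4, &peer->sin_addr) != 1)
        return -1;

    sig->tcpm_keylen = (uint16_t)keylen;
    memcpy(sig->tcpm_key, key, keylen);
    return 0;
}

/* Install the key on socket `fd`: before connect() on the client, and on the
 * listening socket on the server. Returns setsockopt()'s result, or -1. */
int enable_tcp_md5(int fd, const char *peer_ipv4, const char *key)
{
    struct tcp_md5sig sig;
    if (build_md5sig(&sig, peer_ipv4, key) != 0)
        return -1;
    return setsockopt(fd, IPPROTO_TCP, TCP_MD5SIG, &sig, sizeof(sig));
}
```

For example, a client would call `enable_tcp_md5(fd, "192.0.2.10", "shared-secret")` on a fresh `socket(AF_INET, SOCK_STREAM, 0)` and then `connect()`; the server does the same on its listening socket so accepted connections inherit the key. The kernel then signs every outgoing segment and drops incoming segments whose digest is missing or wrong, so ordinary TCP retransmission recovers corrupted data without any application involvement.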
Learning Objectives
Designing efficient data mirroring in backup and recovery systems, where reliability is paramount
Linux kernel TCP know-how for using the TCP MD5 signature option
Analysis of the conventional approach vs. the TCP MD5 option
Use case: TCP MD5 option for NVM mirroring in Data Domain HA