Drive Health Monitor (DHM) for Drives On-Prem (or core data center) and Cloud

Author(s)/Presenter(s):
Library Content Type:
Publish Date: 
Wednesday, September 29, 2021
Event Name: 
Event Track:
Abstract: 

This paper reviews the analysis of drive wear-out on E-series array systems running at different customers’ sites and uses that data to give more specific guidance on thresholds for preemptive drive removal. Motivation of DHM: customers with old-vintage or refurbished replacement drives may experience a data loss event and continue to see a high drive failure rate of more than 5% AFR (Annual Failure Rate). Storage vendors expect to see high fallout rates across the hard drive population as drives age. High utilization and drive age are likely to sustain, and possibly increase, the drive failure rate.
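As a point of reference for the 5% figure, AFR is commonly computed as failures per drive-year of operation. The following is a minimal sketch under that assumption; the function name and the sample fleet numbers are hypothetical and are not taken from the paper's data.

```python
def annualized_failure_rate(failures: int, drive_count: int, days_observed: float) -> float:
    """Return AFR as a percentage: failures per drive-year of operation.

    Assumes every drive was in service for the whole observation window.
    """
    if drive_count <= 0 or days_observed <= 0:
        raise ValueError("drive_count and days_observed must be positive")
    drive_years = drive_count * days_observed / 365.0
    return 100.0 * failures / drive_years


# Hypothetical fleet: 600 drives observed for 90 days with 9 failures.
afr = annualized_failure_rate(failures=9, drive_count=600, days_observed=90)
exceeds_threshold = afr > 5.0  # the >5% AFR level mentioned in the abstract
print(f"AFR = {afr:.2f}%, above 5%: {exceeds_threshold}")
```

Normalizing by drive-years rather than by raw drive count lets short observation windows be compared against an annual threshold.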

The resulting failures include:

  • Outages: loss of access
  • Performance impact
  • Possible data loss

This paper will:

  • Show how DHM deploys a proactive (manual + automatic) drive removal process to predict bad drives early and reduce unexpected failures
  • Show how the advanced drive monitoring algorithm, based on drive error metrics and scoring (offline monitoring), works
  • Explain what DHM's three-pronged approach to analyzing drive health is (drive statistics, errors observed by the controller, and SPFA thresholds) and why DHM uses it
  • Describe in detail the design of DHM via a block diagram and how the thresholds are determined so that DHM can predict bad drives early and reduce unexpected failures
  • Share plots and scoring tables of historical data for different drives monitored by DHM, and share the expected lifetime for the Recommended Drive Removal
