Abstract
Enterprise applications use huge amounts of data, which demands large-scale deployments of distributed storage subsystems in data centers. Storage virtualization functionality adds further complexity to storage management. The latest storage systems support performance data collection at high frequencies, at intervals of seconds or minutes, which makes more data available for performance analysis. The challenge with storage subsystems is finding performance bottlenecks, identifying their root cause, and resolving them with a short turnaround time. Performance bottlenecks span a wide variety of issues, such as inaccessible disks, I/O errors, port masking, volume errors, and network congestion. Machine Learning models (Multivariate Regression, Time Series Analysis, and VAR models) help to proactively find performance anomalies and bottlenecks and to recover from them intelligently. The performance metrics used to build the Machine Learning models include I/O rate R/W (read and write), data rate R/W, response time R/W, cache hit R/W, average data block size R/W, port data rate R/W, port-local node queue time, port protocol errors, and port congestion index. When a performance bottleneck is detected, either the storage system is updated with a corrective resolution or an alert message is sent to the storage administrator. We will share our experiences of using Machine Learning models for performance anomaly detection.
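
To make the regression-based detection idea concrete, the following is a minimal sketch and not the models described in this abstract. It assumes a single predictor (I/O rate) and a single response (response time), fits an ordinary least-squares line on a baseline window, and flags samples whose residual exceeds a threshold. The function names, the sample values, and the threshold `k` are all hypothetical, chosen only for illustration.

```python
from statistics import mean, pstdev


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def find_anomalies(io_rate, response_ms, baseline_n, k=3.0):
    """Return indices of samples whose response time deviates from the
    baseline fit by more than k baseline-residual standard deviations."""
    bx, by = io_rate[:baseline_n], response_ms[:baseline_n]
    slope, intercept = fit_line(bx, by)
    # Spread of the residuals on the baseline window sets the threshold.
    base_resid = [yi - (slope * xi + intercept) for xi, yi in zip(bx, by)]
    sigma = pstdev(base_resid)
    return [
        i
        for i, (xi, yi) in enumerate(zip(io_rate, response_ms))
        if abs(yi - (slope * xi + intercept)) > k * sigma
    ]


# Hypothetical samples: I/O rate (ops/s) and response time (ms).
# The first six samples form the assumed healthy baseline window.
io_rate = [100, 200, 300, 400, 500, 600, 300, 400]
response_ms = [2.1, 3.0, 3.9, 5.1, 6.0, 6.9, 12.0, 5.0]

anomalies = find_anomalies(io_rate, response_ms, baseline_n=6)
print(anomalies)  # -> [6]
```

In this sketch the sample at index 6 is flagged because its response time is far above what the baseline fit predicts for that I/O rate. A flagged index would then feed the corrective action or the administrator alert mentioned above. A multivariate or VAR model would replace the single-predictor fit with one over several of the listed metrics.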