Abstract
In a modern data center with thousands of servers, thousands of switches and storage devices, and millions of cables, failures can arise anywhere in the compute, network, or storage layer. The infrastructure provides multiple sources of huge volumes of data: time series data of events, alarms, statistics, IPC, system-wide data structures, traces, and logs. This data is gathered in different formats and at different rates by different subsystems. Given this heterogeneous data representation, the ability to ingest and blend the data to discover hidden correlations and patterns is important. A robust data architecture and machine learning techniques are required to predict impending functional or performance issues and to propose actions that can mitigate an unwanted situation before it happens. This presentation will outline the challenges and present machine-learning-based solutions for understanding long-term trends, assessing the risk of failures, and suggesting appropriate actions.
Learning Objectives
Challenges of the data center environment
Analytics
Machine Learning