Platform Performance Analysis for I/O-intensive Applications

Publish Date: Wednesday, September 23, 2020

High-performance storage applications running on Intel® Xeon® processors make heavy use of the platform's I/O capabilities and I/O-accelerating features by interfacing with NVMe devices. Such I/O-intensive applications may suffer from performance issues, which broadly fall into three domains:

- I/O device bound: performance is limited by device capabilities
- core bound: performance is limited by algorithmic or microarchitectural code issues
- uncore bound: performance is limited by non-optimal interactions between devices and the CPU

This talk focuses on the last case. In Intel architectures, the term “core” covers the execution units and private caches; the rest of the processor is referred to as the “uncore”, which includes the on-die interconnect, shared cache, cross-socket links, integrated memory and I/O controllers, etc. Activity on the I/O path in the uncore cannot be monitored with traditional core-centric analyses, yet it hides pitfalls that only an uncore-centric view can expose. Intel servers provide such a view through thousands of uncore performance monitoring events, which can be collected by the performance monitoring units (PMUs) associated with uncore IP blocks. However, using raw counters for performance analysis requires deep knowledge of the hardware and is very challenging.

In this talk we will discuss platform-level activities induced by I/O traffic on Intel® Xeon® Scalable processors and summarize practices for achieving the best performance in storage applications. We will give an overview of the telemetry points along the I/O traffic path and present an uncore-specific performance analysis methodology, currently under development, that reveals platform-level inefficiencies, including poor utilization of Intel® Data Direct I/O Technology (Intel® DDIO).
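As a rough illustration of where such uncore PMUs surface on a running system, the sketch below groups PMU names by uncore block type. It is not part of the talk's methodology. It assumes a Linux host whose perf subsystem lists PMUs under /sys/bus/event_source/devices with an "uncore_" prefix (for example uncore_iio_0 or uncore_cha_0); the exact names depend on the kernel version and CPU model, and the helper names group_uncore_pmus and list_uncore_pmus are hypothetical.

```python
import os
import re
from collections import defaultdict


def group_uncore_pmus(names):
    """Group PMU names such as 'uncore_iio_3' by block type ('iio').

    Names without the 'uncore_' prefix (e.g. 'cpu') are ignored.
    """
    groups = defaultdict(list)
    for name in names:
        # Block type, optionally followed by a numeric instance suffix.
        match = re.fullmatch(r"uncore_(.+?)(?:_(\d+))?", name)
        if match:
            groups[match.group(1)].append(name)
    return {block: sorted(pmus) for block, pmus in groups.items()}


def list_uncore_pmus(sysfs_dir="/sys/bus/event_source/devices"):
    """Return uncore PMUs found under sysfs_dir, or {} if it is absent.

    The default path is an assumption about a Linux host with perf support.
    """
    if not os.path.isdir(sysfs_dir):
        return {}
    return group_uncore_pmus(os.listdir(sysfs_dir))


if __name__ == "__main__":
    for block, pmus in sorted(list_uncore_pmus().items()):
        print(f"{block}: {len(pmus)} PMU(s)")
```

On a system that exposes them, the block types listed this way give a first hint of which uncore IP blocks (such as integrated I/O or memory controllers) offer counters; which events to collect from them still requires the hardware knowledge the abstract describes.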

Learning Objectives

- Uncore-centric performance analysis methodology for I/O-intensive applications running on Intel server architectures
- Hardware operations induced by PCIe traffic and the hardware-level observability available for them
- Practices for achieving the best I/O performance on Intel server architectures