Beyond GPUs: NVMe SSD Power and Voltage Telemetry for AI Datacenters | SNIA

Abstract

AI workloads are driving unprecedented power density in modern data centers, with some racks now exceeding 100 kW and power margins tightening at every level of the stack. While GPUs dominate most power discussions, NVMe SSDs are a meaningful, and often under-instrumented, contributor to rack-level energy consumption and a valuable indicator of power-delivery integrity. Historically, SSD power behavior has been inferred from static estimates rather than directly observed under real workloads, limiting developers' ability to plan, validate, optimize, and debug power behavior at scale.

Future SSDs address this visibility gap with standardized power monitoring telemetry, which enables per-drive power measurement: one-second average power, cumulative energy consumption, host-configurable power thresholds, and persistent on-drive power logs. A companion capability adds rail-level voltage monitoring telemetry, including threshold-triggered under- and over-voltage events and persistent voltage history. Together these expose power-delivery and platform-level integrity issues that aggregate power measurements alone cannot reveal.
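As a rough illustration of how these quantities relate, the sketch below integrates one-second average power readings into cumulative energy and flags rising-edge crossings of a host-chosen threshold. The `PowerSample` record, the function names, and the sample values are all hypothetical and used only for illustration. They do not represent the drive's actual log format or any standardized interface.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class PowerSample:
    """Hypothetical record: one one-second average power reading."""
    second: int        # seconds since the start of the trace
    avg_watts: float   # average power over that one-second window


def cumulative_energy_joules(samples: Iterable[PowerSample]) -> float:
    """Sum energy over the trace: average watts over 1 s equals joules."""
    return sum(s.avg_watts * 1.0 for s in samples)


def threshold_crossings(samples: Iterable[PowerSample],
                        threshold_watts: float) -> List[int]:
    """Return the seconds at which power rises above the threshold."""
    crossings: List[int] = []
    above = False
    for s in samples:
        now_above = s.avg_watts > threshold_watts
        if now_above and not above:  # report rising edges only
            crossings.append(s.second)
        above = now_above
    return crossings


# Illustrative values only, not measured data.
trace = [
    PowerSample(0, 8.0),
    PowerSample(1, 12.5),
    PowerSample(2, 14.0),
    PowerSample(3, 9.0),
    PowerSample(4, 13.0),
]
energy_j = cumulative_energy_joules(trace)
events = threshold_crossings(trace, threshold_watts=12.0)
```

Reporting only rising edges keeps a sustained excursion from generating one event per second, which is one plausible way a host might consume threshold notifications.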

This session presents developer-focused case studies using power monitoring telemetry and voltage monitoring telemetry to analyze SSD behavior under AI workloads. A power telemetry example compares measured SSD power against traditional estimates, highlighting significant discrepancies and their implications for power planning. A voltage telemetry scenario demonstrates how per-rail voltage monitoring can pinpoint power anomalies, such as rail droops and transient events, that are difficult to diagnose from system-level measurements alone. Attendees will leave with practical techniques for using standardized SSD telemetry not just to observe power behavior but to actively optimize it, using configurable thresholds and real-time measurements as a closed-loop feedback mechanism for power management in high-density AI deployments.
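The closed-loop idea can be sketched in a few lines, under stated assumptions. The example below computes how far a measured value deviates from a static estimate, then applies a simple hysteresis rule that steps a throttle level up or down as measured power crosses a high or low threshold. The function names, the notion of a numeric "throttle level", and all numbers are hypothetical. The session's actual control mechanism is not described in the abstract.

```python
from typing import List


def estimate_error_pct(measured_watts: float, estimated_watts: float) -> float:
    """Signed deviation of measured power from a static estimate, in percent."""
    return (measured_watts - estimated_watts) / estimated_watts * 100.0


def next_throttle_level(measured_watts: float, level: int,
                        high_watts: float, low_watts: float,
                        max_level: int) -> int:
    """Hysteresis step: throttle more above high, less below low."""
    if measured_watts > high_watts and level < max_level:
        return level + 1
    if measured_watts < low_watts and level > 0:
        return level - 1
    return level  # inside the dead band: hold the current level


def run_loop(readings: List[float], high_watts: float, low_watts: float,
             max_level: int) -> List[int]:
    """Apply the hysteresis rule to each reading, starting unthrottled."""
    level = 0
    history: List[int] = []
    for watts in readings:
        level = next_throttle_level(watts, level, high_watts, low_watts,
                                    max_level)
        history.append(level)
    return history


# Illustrative values only, not measured data.
levels = run_loop([10.0, 15.0, 16.0, 12.0, 8.0, 7.0],
                  high_watts=14.0, low_watts=9.0, max_level=2)
error = estimate_error_pct(measured_watts=9.0, estimated_watts=12.0)
```

Separating the high and low thresholds leaves a dead band between them, so the loop does not oscillate when power hovers near a single limit.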

Jacob Schmier

Senior Firmware Engineer