Cutting-edge innovations in solid-state drive (SSD) power efficiency and liquid cooling are designed to mitigate AI workload bottlenecks and reduce Total Cost of Ownership (TCO) in data centers. In the “Unlocking Sustainable Data Centers: Optimizing SSD Power Efficiency and Liquid Cooling for AI Workloads” webinar, experts from the SNIA SSD Special Interest Group discussed how and why power efficiency matters, NVMe® power management and power states, and power scheduling and optimization examples. Below is a recap of the audience Q&A, with insights into how to move toward lower TCO and greener data centers without sacrificing IOPS.

Q: What kind of switching times between power states are being considered?

A: That is a tricky question because it depends on the specific power state and what the SSD has to do to enter and exit it. For idle states, hardware components are typically put to sleep, and it takes time to bring all of those components back up, so those transitions tend to fall in the millisecond range. Other power state transitions can complete within microseconds.
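A host does not have to guess at these latencies: each NVMe power state descriptor in the Identify Controller data advertises its entry latency (ENLAT) and exit latency (EXLAT) in microseconds. Below is a minimal sketch, not from the webinar, assuming Linux, a controller node at /dev/nvme0, and root privileges, that prints those latencies for every supported power state.

```c
/* Minimal sketch (assumptions: Linux, /dev/nvme0, root): read the
 * Identify Controller data and print each power state's advertised
 * entry/exit latency. Error handling is kept minimal for brevity. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/nvme_ioctl.h>

int main(void)
{
    uint8_t id[4096];                       /* Identify Controller data */
    struct nvme_admin_cmd cmd;

    int fd = open("/dev/nvme0", O_RDONLY);  /* controller character device */
    if (fd < 0) { perror("open"); return 1; }

    memset(&cmd, 0, sizeof(cmd));
    cmd.opcode   = 0x06;                    /* Identify */
    cmd.addr     = (uint64_t)(uintptr_t)id;
    cmd.data_len = sizeof(id);
    cmd.cdw10    = 1;                       /* CNS 01h: Identify Controller */
    if (ioctl(fd, NVME_IOCTL_ADMIN_CMD, &cmd) != 0) { perror("ioctl"); return 1; }

    int npss = id[263];                     /* NPSS: highest supported power state */
    for (int ps = 0; ps <= npss; ps++) {
        const uint8_t *psd = id + 2048 + 32 * ps;  /* 32-byte power state descriptor */
        uint32_t enlat, exlat;
        memcpy(&enlat, psd + 4, 4);         /* ENLAT: entry latency, microseconds */
        memcpy(&exlat, psd + 8, 4);         /* EXLAT: exit latency, microseconds */
        printf("PS%d: entry %u us, exit %u us%s\n", ps, enlat, exlat,
               (psd[3] & 0x02) ? " (non-operational)" : "");  /* NOPS flag */
    }
    close(fd);
    return 0;
}
```

Idle (non-operational) states typically report exit latencies orders of magnitude longer than transitions between operational states, which is exactly the millisecond-versus-microsecond split described above.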

Q: What type of usage is assumed for TP4199 with system and rack data at the data center level?

A: In the webinar, we discuss a few usage models, and there are actually a lot of ways to think about how power measurements could be used. If you work in validation or are familiar with SSD power, you can essentially reverse engineer what is happening on the drive. You can take that information and use it to debug issues on the fly or to optimize host software. Even better, you can compare SSD vendors and ask, “Vendor X reports this, Vendor Y reports that, why is there a discrepancy?” The idea is to create a competitive environment that challenges vendors to improve power efficiency. It also enables progress toward validating sustainability goals at the rack level. Compute customers are really stepping up their sustainability initiatives, and to meet those goals they need visibility into each individual hardware component.
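As a loose illustration of the rack-level idea (this is not TP4199 itself, and both helper functions below are hypothetical placeholders), standardized per-drive telemetry makes roll-ups simple: sample the power of every SSD in the rack, sum it, and normalize by delivered IOPS to get an efficiency figure you can compare across vendors or racks.

```c
/* Hypothetical sketch: roll per-SSD power samples up to a rack-level
 * efficiency figure (IOPS per watt). read_drive_power_mw() stands in
 * for whatever standardized telemetry read (e.g., a TP4199-style
 * mechanism) the platform provides; it is not a real API. */
#include <stdio.h>

#define NUM_DRIVES 24

/* Placeholder: would return drive i's measured power draw in milliwatts. */
static unsigned read_drive_power_mw(int i) { return 8500 + 100 * i; }

/* Placeholder: would return IOPS delivered by drive i over the window. */
static double read_drive_iops(int i) { (void)i; return 500000.0; }

int main(void)
{
    double total_w = 0.0, total_iops = 0.0;
    for (int i = 0; i < NUM_DRIVES; i++) {
        total_w    += read_drive_power_mw(i) / 1000.0; /* mW -> W */
        total_iops += read_drive_iops(i);
    }
    printf("rack SSD power: %.1f W, efficiency: %.0f IOPS/W\n",
           total_w, total_iops / total_w);
    return 0;
}
```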

Q: You spoke about how NVMe® Technical Proposal (TP) 4199 unlocks precise, standardized, scalable power telemetry. Where can we get more information on the TP and the other NVMe topics you discussed?

A: The NVM Express Consortium posts all ratified material and NVMe specifications on its website at nvmexpress.org/compliance. You can also join the NVMe Technical Work Group. And there is a deeper dive on TP4199 in a 2025 Open Compute Project Storage Tech Talk[1].

Q: The information presented today is focused on data centers, but AI is moving to the edge. Is there a place people can go for more information on the same type of content, but for the edge?

A: The folks we’re talking to on the edge are really interested in immersion cooling, which is interesting because, as we mentioned, the hyperscalers’ push for density per rack is all focused on direct liquid cooling. Immersion turns out to be compelling on the edge, where there are opportunities like heat reuse. Edge used to mean small deployments with one or two servers, but now people increasingly mean on-premises: a regional data center or telco data center with a few megawatts might still count as edge. For the SSD, the liquid cooling example we showed is an extreme case; people aren’t building 600-kilowatt racks on the edge, they’re building 20 or 30 kilowatt racks. That’s probably the major difference: edge today looks more like the enterprise/traditional data center, with extreme power limits on the rack, where some of these NVMe power states become even more critical.
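To make that last point concrete, NVMe already gives the host a standard knob for this: the Power Management feature (Feature ID 02h) lets software explicitly request a lower power state on a power-constrained rack. Below is a minimal sketch, assuming Linux, a controller node at /dev/nvme0, root privileges, and a drive that supports power state 2; production code would first read the power state descriptors to pick a state that fits the rack’s budget.

```c
/* Minimal sketch (assumptions: Linux, /dev/nvme0, root, PS2 supported):
 * request a lower NVMe power state with Set Features, Feature ID 02h
 * (Power Management). */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/nvme_ioctl.h>

int main(void)
{
    int fd = open("/dev/nvme0", O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    struct nvme_admin_cmd cmd;
    memset(&cmd, 0, sizeof(cmd));
    cmd.opcode = 0x09;      /* Set Features */
    cmd.cdw10  = 0x02;      /* FID 02h: Power Management */
    cmd.cdw11  = 2;         /* PS (bits 4:0): request power state 2 */

    if (ioctl(fd, NVME_IOCTL_ADMIN_CMD, &cmd) != 0) {
        perror("NVME_IOCTL_ADMIN_CMD");
        return 1;
    }
    printf("requested power state 2 (lower max power than PS0)\n");
    close(fd);
    return 0;
}
```

The same request can be issued from the command line with nvme-cli’s set-feature command.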

Q: What are your thoughts on balancing serviceability with cooling performance for direct-to-chip cooling, for example when using thermal interface material (TIM)?

A: The interesting thing about the E1.S liquid-cooled cold plate solution we discussed is that the designers made the E1.S hot-pluggable with the cold plates, so the SSD is still serviceable in the liquid cooling system, which is pretty unique. We always want SSDs to be serviceable, so any failed drive can be replaced. We always thought there would be no serviceability in immersion, but it turns out you can just pull the drive out of the tank, let the fluid drain, and swap it. This is in production, which is pretty surprising. That said, liquid cooling and this higher power density make everything far more complicated; it is a lot harder than air-cooled designs.

Thanks to our moderator, Cameron Brett, SSD SIG Co-Chair from KIOXIA, and our presenters, Jonmichael Hands, SSD SIG Co-Chair from Solidigm, and Nicole Ross from Samsung. SNIA has an extensive Educational Library of SSD materials; type SSD in the search bar for webinars, conference presentations, and white papers. The SNIA SSD Special Interest Group has been focused on Total Cost of Ownership (TCO) for SSDs, with a TCO Model of Storage White Paper and a TCO Calculator. Stay tuned for a new TCO of Computational Storage, coming out soon. And if you have questions about the webinar, or any of SNIA’s work, send an email to askcms@snia.org. Thanks!

[1] TP4199 details presented by Dan Hubbard (Micron) at the 2025 OCP Storage Tech Talks, at the 3 hour 17 minute mark: https://www.youtube.com/watch?v=ppPGAngXX7c