SNIA Developer Conference September 15-17, 2025 | Santa Clara, CA

Assessing AI Storage Communication Performance At Scale

Abstract

How do we assess the performance of the AI network and storage infrastructure that is critical to successfully deploying today's complex AI training and inference engines? And is it possible to do this without provisioning racks of expensive GPUs? This presentation discusses methodologies and considerations for performing such assessments. We look at different topologies, host-side and network-side considerations, and metrics. We examine the performance aspects of NICs/SmartNICs, storage offload processing, switches, and interconnects. Benchmarking of AI collective communications over RoCE transport is considered, along with its overall impact on training convergence time and network utilization. Operational aspects of commercial networks include proxies, encapsulation, connection scale, and encryption; we discuss their impact on AI training and inference.

Learning Objectives

Performance considerations in AI fabric and storage deployment
Optimization metrics for AI training and inference
Role of the various AI infrastructure elements that impact performance