AI Server Clusters – Networking and Storage

Grand Mesa F

Wed Apr 29 | 9:50am

Abstract

AI Server Clusters use multiple networks/fabrics (e.g., Scale-Up, Scale-Out, Front End/Access) to support the range of communication requirements exhibited by AI workloads. This session provides an overview of these networks and fabrics, along with a survey of the low-level networking protocols and technologies involved (e.g., Spectrum-X, Ultra Ethernet, NVLink, UALink, ESUN, and SUE-T). It then discusses storage networking for AI Server Clusters, including the applicability of specific storage networking protocols to these networks/fabrics and the role of RDMA functionality.