Fabric Attached Memory – Hardware and Software Architecture
HPC architectures increasingly handle workloads whose working data set cannot easily be partitioned or is too large to fit into node-local memory. We have defined a system architecture and a software stack that allow such data sets to be held in fabric-attached memory (FAM) accessible to all compute nodes across a Slingshot-connected HPC cluster, providing a new approach to handling large data sets.