Modeling & Simulation
Simulating CXL.mem for Fun and Profit
CXL.mem lets hosts expand their memory beyond a single server and access remote memory regions with ordinary load and store instructions. In addition, CXL.mem enables memory sharing among its endpoints. Realizing memory sharing requires extending the coherency management protocol beyond individual hosts: hosts and devices each track the state of every shared memory region with a per-region finite state machine. This lets a device change the state that a specific host holds for a region when needed, a mechanism referred to as back-invalidation snooping.
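The per-region state tracking and device-initiated invalidation described above can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the CXL specification's state set or message flow: the three states, the `Host` and `Device` classes, and the `access` method are all hypothetical names, and a conflicting copy is always invalidated rather than downgraded.

```python
from enum import Enum


class State(Enum):
    # Simplified, hypothetical state set for illustration only.
    INVALID = "I"
    SHARED = "S"
    EXCLUSIVE = "E"


class Host:
    def __init__(self, name):
        self.name = name
        self.lines = {}  # region id -> State (host-side state machine)

    def state(self, region):
        return self.lines.get(region, State.INVALID)

    def on_back_invalidate(self, region):
        # Device-initiated transition: the host drops its local copy.
        self.lines[region] = State.INVALID


class Device:
    def __init__(self):
        self.holders = {}  # region id -> set of hosts holding a valid copy
        self.snoops_sent = 0

    def access(self, host, region, write):
        holders = self.holders.setdefault(region, set())
        for other in list(holders):
            if other is host:
                continue
            # Conflict: the requester wants to write, or another host
            # holds the region exclusively. Back-invalidate that host.
            if write or other.state(region) is State.EXCLUSIVE:
                other.on_back_invalidate(region)
                holders.discard(other)
                self.snoops_sent += 1
        holders.add(host)
        host.lines[region] = State.EXCLUSIVE if write else State.SHARED


h0, h1 = Host("h0"), Host("h1")
dev = Device()
dev.access(h0, 0x1000, write=False)  # h0: SHARED
dev.access(h1, 0x1000, write=False)  # h0 and h1: SHARED, no snoop needed
dev.access(h1, 0x1000, write=True)   # h0 is back-invalidated, h1: EXCLUSIVE
```

In this sketch the device-side directory (`holders`) decides when a snoop is required, while each host keeps its own per-region state, mirroring the split between host and device state machines.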
Our goal is to democratize the exploration of large-scale CXL.mem deployments through simulation. In this talk, we describe the design and implementation of our packet-level CXL.mem simulator, which models hosts, CXL switches, and CXL endpoints. We implemented the simulator on top of SimPy, a Python-based discrete-event simulation framework. Users can use the simulator APIs to (i) construct CXL topologies by connecting hosts, switches, and endpoint devices, (ii) define the control and data flow of applications, and (iii) submit applications to specific hosts for execution. The simulator can also be used to explore new CXL switch architectures.
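To make the three steps concrete (build a topology, define a flow, submit it to a host), here is a minimal packet-level discrete-event sketch. It does not reproduce the simulator's API: `Simulator`, `Node`, `connect`, and `submit` are hypothetical names, the per-hop latencies are made-up values, and only the one-way request path is modeled with no queueing or contention. It uses only the standard library so it runs without SimPy installed; in a SimPy version, the hand-written event loop would be replaced by `simpy.Environment` and generator-based processes.

```python
import heapq
import itertools


class Simulator:
    """Tiny discrete-event kernel: a time-ordered queue of callbacks."""

    def __init__(self):
        self.now = 0.0
        self._queue = []
        self._seq = itertools.count()  # tie-breaker for equal timestamps

    def schedule(self, delay, callback, *args):
        heapq.heappush(
            self._queue, (self.now + delay, next(self._seq), callback, args)
        )

    def run(self):
        while self._queue:
            self.now, _, callback, args = heapq.heappop(self._queue)
            callback(*args)


class Node:
    """A host, switch, or endpoint with a fixed (hypothetical) latency."""

    def __init__(self, sim, name, latency_ns):
        self.sim, self.name, self.latency_ns = sim, name, latency_ns
        self.next_hop = None

    def connect(self, other):
        self.next_hop = other
        return other  # allows chaining: a.connect(b).connect(c)

    def receive(self, packet):
        packet["path"].append(self.name)
        self.sim.schedule(self.latency_ns, self._forward, packet)

    def _forward(self, packet):
        if self.next_hop is None:
            packet["done_at"] = self.sim.now  # reached the endpoint
        else:
            self.next_hop.receive(packet)


def submit(host, op):
    """Inject one memory request at the given host."""
    packet = {"op": op, "path": [], "done_at": None}
    host.receive(packet)
    return packet


sim = Simulator()
host = Node(sim, "host0", latency_ns=10)
switch = Node(sim, "switch0", latency_ns=70)
mem = Node(sim, "mem0", latency_ns=150)
host.connect(switch).connect(mem)  # (i) construct the topology

load = submit(host, "load")        # (ii) + (iii) define and submit a flow
sim.run()
print(load["path"], load["done_at"])
```

Swapping the `switch` node for a different class is where alternative switch architectures, such as queueing or arbitration policies, would be modeled in a sketch like this.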
Next, we show how our simulator helps explore a variety of use cases. First, we characterize the overheads that CXL coherency introduces when realizing memory sharing, namely access, serialization, snooping, and eviction delays. Second, we discuss how the simulator helps analyze the performance of existing task schedulers and memory allocators. Finally, we describe how it enabled the design of a new CXL-aware task scheduler based on our simulation-driven insights.
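As a rough illustration of how the four delay components might be accounted for on a single access, the sketch below sums them in a simple additive model. The additive form, the assumption that snoops to multiple sharers are issued in parallel, the `estimate` function, and every numeric value are hypothetical and are not results from the talk. The only fixed relationship is that serialization delay equals payload size divided by link bandwidth (1 GB/s moves 1 byte per nanosecond).

```python
from dataclasses import dataclass


@dataclass
class AccessCost:
    access_ns: float         # device media access time
    serialization_ns: float  # time to put the payload on the link
    snoop_ns: float          # back-invalidation round trip, if any
    eviction_ns: float       # writeback of an evicted line, if any

    @property
    def total_ns(self):
        return (self.access_ns + self.serialization_ns
                + self.snoop_ns + self.eviction_ns)


def estimate(payload_bytes, link_gb_per_s, access_ns,
             sharers_to_snoop, snoop_rtt_ns, evicts, eviction_ns):
    # bytes / (GB/s) yields nanoseconds.
    serialization_ns = payload_bytes / link_gb_per_s
    # Assumption: snoops go out in parallel, so one round trip is paid
    # whenever at least one sharer must be invalidated.
    snoop_ns = snoop_rtt_ns if sharers_to_snoop > 0 else 0.0
    return AccessCost(
        access_ns=access_ns,
        serialization_ns=serialization_ns,
        snoop_ns=snoop_ns,
        eviction_ns=eviction_ns if evicts else 0.0,
    )


# Hypothetical numbers: a 64-byte write that must invalidate two sharers.
cost = estimate(payload_bytes=64, link_gb_per_s=32.0, access_ns=150.0,
                sharers_to_snoop=2, snoop_rtt_ns=200.0,
                evicts=False, eviction_ns=300.0)
print(cost, cost.total_ns)
```

A breakdown of this kind makes it easy to see which component dominates for a given sharing pattern, which is the sort of signal a CXL-aware scheduler could act on.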