The latest versions of the CXL standard allow system architects to consider future systems using secure disaggregated and distributed shared memory. With the immediate concerns of increasing DRAM costs, system designers are preparing to deploy CXL devices that reuse already-purchased DRAM DIMMs in servers supporting newer versions of DDR, expanding system memory capacity without incurring additional DRAM costs. This presentation explores another means to reduce the cost of memory using CXL: a CXL Type 3 device variant, sometimes referred to as CXL-SSD or CMM-H, which combines DRAM and NAND storage yet offers the host byte-addressable access to tens of TBs of memory.
For a CXL hybrid media device with DRAM and SSD, the CXL device designer needs to maintain an internal balance between the CXL device’s PCIe interface, internal DDR bandwidth, and SSD page size. The CXL device uses the DRAM to cache SSD pages and mitigate SSD latency for host CPU CXL.mem requests, but the CXL device still maintains the contract between the host physical address (HPA) and device physical address (DPA) regardless of where the SSD page resides. CXL devices with hybrid memory require a host CPU and its operating system to configure host physical address spaces an order of magnitude larger than current systems support, and designers need to find new means to reduce the metadata needed to support such large physical address spaces.
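The caching behavior described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the design of any actual device: the class name `HybridDevice`, the 4 KiB page size, the LRU replacement policy, and the single linear HPA-to-DPA offset are all hypothetical choices made for illustration. It shows that a host read returns the same data for a given HPA whether the backing page is resident in DRAM or has been evicted to the SSD.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed SSD page size in bytes (hypothetical)


class HybridDevice:
    """Toy model: device DRAM acts as an LRU cache of SSD pages."""

    def __init__(self, dram_pages, hpa_base=0):
        self.dram_pages = dram_pages  # number of pages the DRAM cache holds
        self.hpa_base = hpa_base      # assumed start of the host address window
        self.cache = OrderedDict()    # page number -> bytearray, in LRU order
        self.ssd = {}                 # page number -> bytes (sparse backing store)
        self.hits = 0
        self.misses = 0

    def hpa_to_dpa(self, hpa):
        # Assumed fixed linear mapping; it never depends on page residency.
        return hpa - self.hpa_base

    def _page(self, page_no):
        if page_no in self.cache:
            self.cache.move_to_end(page_no)  # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1
            if len(self.cache) >= self.dram_pages:
                # Evict the least recently used page. For simplicity the
                # victim is always written back (no dirty tracking).
                victim, data = self.cache.popitem(last=False)
                self.ssd[victim] = bytes(data)
            stored = self.ssd.get(page_no, bytes(PAGE_SIZE))
            self.cache[page_no] = bytearray(stored)
        return self.cache[page_no]

    def read(self, hpa):
        page_no, offset = divmod(self.hpa_to_dpa(hpa), PAGE_SIZE)
        return self._page(page_no)[offset]

    def write(self, hpa, value):
        page_no, offset = divmod(self.hpa_to_dpa(hpa), PAGE_SIZE)
        self._page(page_no)[offset] = value


if __name__ == "__main__":
    base = 0x10000000
    dev = HybridDevice(dram_pages=2, hpa_base=base)
    dev.write(base + 5, 7)              # page 0 enters the DRAM cache
    dev.write(base + PAGE_SIZE, 1)      # page 1 enters the DRAM cache
    dev.write(base + 2 * PAGE_SIZE, 2)  # page 2 evicts page 0 to the SSD
    print(dev.read(base + 5), dev.hits, dev.misses)
```

In this model a miss only changes the latency of the access, never the value returned. A real device would additionally have to size its page-tracking metadata for tens of TBs of capacity, which this sketch does not attempt.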
Numerous workloads benefit from lower-overhead access to byte-addressable memory over IO, but some memory-latency-sensitive applications require techniques to reduce access latency to data stored on a CXL device. Some CXL host CPUs provide hardware support to mitigate memory latency to CXL. For a CXL DRAM-SSD hybrid media device, an application could also provide information to the CXL device, for example by sending notifications that prompt the device to pre-stage data in its DRAM cache before the application accesses the corresponding DPAs. An application that provides pre-staging hints or access pattern hints allows a CXL hybrid media device to use its DRAM for caching SSD pages more efficiently.
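The effect of a pre-staging hint can be sketched with a small simulation. Everything here is an assumption made for illustration: the `HintedCache` class, its `prestage` method, the 4 KiB page size, and the 64-byte access stride are hypothetical and do not describe the interface of any real device or driver. The sketch counts demand misses, meaning accesses that would have to wait on the SSD, for a sequential scan with and without a hint issued beforehand.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed SSD page size in bytes (hypothetical)


class HintedCache:
    """Toy DRAM cache that tracks which SSD pages are resident."""

    def __init__(self, capacity_pages):
        self.capacity = capacity_pages
        self.resident = OrderedDict()  # page number -> None, in LRU order
        self.demand_misses = 0         # accesses that found the page absent

    def _install(self, page_no):
        if page_no in self.resident:
            self.resident.move_to_end(page_no)  # refresh LRU position
            return
        if len(self.resident) >= self.capacity:
            self.resident.popitem(last=False)   # evict least recently used
        self.resident[page_no] = None

    def prestage(self, dpa, length):
        """Hypothetical hint: bring [dpa, dpa + length) into DRAM early."""
        first = dpa // PAGE_SIZE
        last = (dpa + length - 1) // PAGE_SIZE
        for page_no in range(first, last + 1):
            self._install(page_no)

    def access(self, dpa):
        page_no = dpa // PAGE_SIZE
        if page_no not in self.resident:
            self.demand_misses += 1  # the host would stall on the SSD here
        self._install(page_no)


def run(hinted):
    """Scan four pages in 64-byte steps and return the demand-miss count."""
    cache = HintedCache(capacity_pages=8)
    if hinted:
        cache.prestage(0, 4 * PAGE_SIZE)
    for dpa in range(0, 4 * PAGE_SIZE, 64):
        cache.access(dpa)
    return cache.demand_misses


if __name__ == "__main__":
    print("demand misses without hint:", run(hinted=False))
    print("demand misses with hint:", run(hinted=True))
```

The model ignores timing entirely: it assumes the hint arrives early enough for the SSD reads to complete before the first access, which is the condition a real application would have to meet for the hint to help.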
CXL hybrid memory devices may have microsecond latencies upon missing in the DRAM cache. In the future, however, with the rise of systems using scale-up Ethernet, higher-latency memory access may not be unique to CXL hybrid devices or CXL memory pools. Past and current academic research has evaluated different use cases and solutions for host CPUs to tolerate longer-latency memory accesses.
This presentation uses design choices and workload characterization data gathered from XCENA’s CXL 3.x device, MX1—the first ASIC implementation of a CXL 3.x hybrid media CXL device—to illustrate key points.