

## Accelerate Everything.

Successfully Deploying Persistent Memory and Acceleration via Compute Express Link!

Stephen Bates, Chief Technology Officer, Flash Memory Summit 2019







## Beyond NVDIMM: Future Interfaces for Persistent Memory

Stephen Bates, Microsemi





## Persistent Memory (PM)





- Low Latency
- Memory Semantics
- Storage Features

© 2017 SNIA Persistent Memory Summit. All Rights Reserved.












## Coming Soon to a Cinema Near You!



### Gen-Z

A New Fabric

Featuring: optional coherency, NVMe support, scale

Coming in 2020

### CCIX

The ARMpire Strikes Back

Featuring: off the CPU bus, accelerator support, cache coherency, scale?

Coming Soon??

### OpenCAPI

The Return of the Big Blue

Featuring: off the CPU bus, accelerator support, cache coherency

Now Showing in Select Cinemas

### **BUT WHAT CAN I SEE TODAY???**

© 2018 SNIA Persistent Memory Summit. All Rights Reserved.






Compute Express Link










# Intel, Arm and AMD (Last Week!)

- All the CPU vendors I care about are now CXL members.
- The same cannot be said for OpenCAPI, CCIX or Gen-Z.
- Remember, coherent buses MUST come directly out of the CPU!



## CXL Protocols

The CXL transaction layer consists of three dynamically multiplexed sub-protocols on a single link:

- CXL.io: discovery, configuration, register access, interrupts, etc.
- CXL.cache: device access to processor memory.
- CXL.memory: processor access to device-attached memory.

### CXL - Dynamically Multiplexed IO, Cache and Memory







Let's break that down. Three protocols on one physical layer:

- **CXL.io**: This is PCIe Gen 5.0, so all PCIe services will just work!
  - DMA
  - Interrupts (MSI/MSI-X)
  - SR-IOV, ACS, ATS etc. for virtualization
  - NVM Express!!!??? We will come back to this.
- **CXL.mem**: This is the protocol by which the host CPU accesses (persistent) memory on the CXL device.
- **CXL.cache**: This is the protocol by which the CXL device accesses host memory (useful for accelerators, not covered here today).
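To make the split between the three sub-protocols concrete, here is a toy Python model that decides which one a request would travel on, based only on who initiates it and what it targets. It is a simplified sketch, not the flit-level arbitration in the CXL specification; the `SubProtocol` enum, the `classify` function and the request-kind strings are all hypothetical names chosen for illustration.

```python
from enum import Enum


class SubProtocol(Enum):
    IO = "CXL.io"
    CACHE = "CXL.cache"
    MEM = "CXL.mem"


# Hypothetical request categories, chosen for illustration only.
IO_KINDS = {"discovery", "config", "register", "interrupt", "dma"}


def classify(initiator: str, target: str, kind: str) -> SubProtocol:
    """Return the sub-protocol a request would use in this simplified model.

    initiator and target are "host" or "device"; kind is "memory_access"
    or one of IO_KINDS.
    """
    if kind in IO_KINDS:
        # Everything PCIe already does (including non-coherent DMA) stays on CXL.io.
        return SubProtocol.IO
    if kind != "memory_access":
        raise ValueError(f"unknown request kind: {kind!r}")
    if (initiator, target) == ("host", "device"):
        return SubProtocol.MEM    # CPU loads/stores to device-attached (persistent) memory
    if (initiator, target) == ("device", "host"):
        return SubProtocol.CACHE  # device coherently accesses (and may cache) host memory
    raise ValueError(f"no sub-protocol for {initiator} -> {target}")


if __name__ == "__main__":
    print(classify("host", "device", "memory_access").value)
    print(classify("device", "host", "memory_access").value)
    print(classify("device", "host", "dma").value)
```

For a persistent-memory device, only the first and last cases matter: CXL.mem for the CPU's loads and stores, and CXL.io for everything a PCIe driver does.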










### **Memory Buffers**

Usages:

- Memory BW expansion
- Memory capacity expansion
- Storage Class Memory

Protocols:

- CXL.io
- CXL.mem


Let's break that down. Consider the right-most model:

- Essentially an NVDIMM, but no longer constrained by the physical and electrical requirements of DDR and DIMMs.
- Since the form factors are PCIe ones, we have more options around the shape, power and heat of these solutions.
- CXL.io allows for discovery, configuration and management (we can write a PCIe driver for these devices).
- We can put a DMA engine on the Memory Buffer and program that via PCIe to do data movement for us.
- No longer consuming DIMM slots or channels. Save all that capacity and bandwidth for standard DRAM.





### DDR NVDIMM vs CXL NVDIMM

| Attribute         | DDR               | CXL                             | Comment                                                                                                                       |
|-------------------|-------------------|---------------------------------|-------------------------------------------------------------------------------------------------------------------------------|
| Form-factor       | DIMM              | Many                            | CXL has many form-factor options.                                                                                             |
| DMA               | No                | Yes                             | CXL allows placement of a DMA engine on the device.<br>It can be programmed via a PCIe driver.                                |
| HW Virtualization | No                | SR-IOV                          | DDR NVDIMMs can only be virtualized in software, which impacts performance.                                                   |
| Management        | SMBus and<br>MMIO | SMBus and<br>MMIO and<br>CXL.io | If we adopt NVMe for CXL devices, we can use the NVMe Management Interface (NVMe-MI).                                         |
| Latency           | Very Low          | Low                             | Until we get hardware, it is hard to get comparative<br>numbers for NVDIMM vs CXL.mem to the same<br>memory types (e.g. 3DXP). |
| Throughput        | 19.2 GB/s         | 64 GB/s                         | NVDIMM is 64 bits @ 2400 MT/s per channel. CXL is (up to) 16 lanes of PCIe Gen 5 (32 GT/s per lane) in each direction.        |
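The throughput row follows from simple arithmetic: a DDR channel moves 8 bytes per transfer, while each PCIe Gen 5 lane moves one bit per transfer at 32 GT/s. The sketch below reproduces both numbers; the function names are illustrative, and it ignores protocol and packet overheads, so treat the results as peak link rates, not measured throughput.

```python
def ddr_channel_gbs(bus_width_bits: int, mt_per_s: int) -> float:
    """Peak DDR channel bandwidth in GB/s: bytes per transfer times transfers per second."""
    return bus_width_bits / 8 * mt_per_s * 1e6 / 1e9


def pcie_link_gbs(lanes: int, gt_per_s: float,
                  payload_bits: int = 128, line_bits: int = 130) -> float:
    """Peak PCIe bandwidth in GB/s for one direction of a link.

    Each lane moves one bit per transfer; 128b/130b line encoding (used by
    PCIe Gen 3 through Gen 5) leaves payload_bits / line_bits of that for data.
    Packet and protocol overheads are ignored.
    """
    return lanes * gt_per_s * payload_bits / line_bits / 8


if __name__ == "__main__":
    print(f"DDR4-2400, 64-bit channel: {ddr_channel_gbs(64, 2400):.1f} GB/s")
    print(f"PCIe Gen 5 x16, raw:       {pcie_link_gbs(16, 32, 1, 1):.1f} GB/s per direction")
    print(f"PCIe Gen 5 x16, 128b/130b: {pcie_link_gbs(16, 32):.1f} GB/s per direction")
```

This prints 19.2, 64.0 and 63.0 GB/s. Note that the DDR figure is shared between reads and writes, while the CXL figure applies to each direction of the link.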





### Linux Support for CXL

- (Persistent) memory discovery will be done via ACPI. This can include the Heterogeneous Memory Attribute Table (HMAT) to describe properties of the memory.
- The discovered memory will be added to the physical memory pool.
- To some extent, we can control how this memory is used, and by whom, via the numactl framework.
- \*If\* the CXL device has a DMA engine and accelerator(s), these can be programmed via a PCIe driver (perhaps NVM Express).
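Before using numactl, you need to know which NUMA node the new memory landed on. The sketch below assumes the platform onlines CXL-attached memory as a NUMA node with memory but no local CPUs, which is a common way such memory appears but is a platform assumption, not a guarantee (a CPU-less node is a hint, not proof that it is CXL). It reads the standard sysfs files `/sys/devices/system/node/nodeN/cpulist` and `.../meminfo`; the helper names are hypothetical.

```python
from __future__ import annotations

import re
from pathlib import Path

NODE_ROOT = "/sys/devices/system/node"


def node_mem_kb_if_cpuless(cpulist: str, meminfo: str) -> int:
    """Return MemTotal in kB for a node with no CPUs, or 0 otherwise.

    cpulist is the contents of the node's cpulist file (empty for a CPU-less
    node); meminfo is the node's meminfo file ("Node N MemTotal:  <n> kB" ...).
    """
    if cpulist.strip():
        return 0  # node has CPUs, e.g. ordinary socket-local DRAM
    match = re.search(r"MemTotal:\s+(\d+)\s+kB", meminfo)
    return int(match.group(1)) if match else 0


def memory_only_nodes(root: str = NODE_ROOT) -> dict[int, int]:
    """Map node id -> MemTotal (kB) for every CPU-less NUMA node that has memory."""
    nodes = {}
    for node_dir in Path(root).glob("node[0-9]*"):
        kb = node_mem_kb_if_cpuless(
            (node_dir / "cpulist").read_text(),
            (node_dir / "meminfo").read_text(),
        )
        if kb > 0:
            nodes[int(node_dir.name[len("node"):])] = kb
    return dict(sorted(nodes.items()))


if __name__ == "__main__":
    for node, kb in memory_only_nodes().items():
        print(f"node{node}: {kb // 1024} MiB, no local CPUs")
```

A node found this way could then be targeted with `numactl --membind=<node> <command>`, or used as a soft preference with `numactl --preferred=<node> <command>`.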









#### CXL-based NVDIMM:

- Use all the DIMM slots for DRAM, not NVDIMM.
- The NVM can be managed by a controller chip if needed.
- A lot more flexibility on form factor, power etc. than DDR-based NVDIMM.
- A DMA engine on CXL device could assist with data movement.
- Can also be used just to expand volatile memory capacity.







Best-In-Class Storage and Analytic Acceleration delivered via an NVMe-based Computational Storage Processor.

CONFIDENTIAL – EIDETICOM COPYRIGHT 2019





## CXL features for 2.0:

- Improved throughput and latency (PCIe Gen6).
- Switching via enhanced PCIe switches.
- Memory pooling (allowing multiple hosts to connect to a pool of (persistent) memory).
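The benefit of pooling is that capacity can be handed to whichever host needs it and later returned, instead of being stranded inside one server. The toy model below shows only that accounting; the `MemoryPool` class and its methods are hypothetical and are not a CXL or Linux API. In a real system this bookkeeping would live in platform firmware or management software.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MemoryPool:
    """Hypothetical bookkeeping for one pooled memory device shared by several hosts."""
    capacity_gib: int
    allocations: dict[str, int] = field(default_factory=dict)

    @property
    def free_gib(self) -> int:
        return self.capacity_gib - sum(self.allocations.values())

    def assign(self, host: str, gib: int) -> None:
        """Give `host` another `gib` GiB of the pool, if that much is free."""
        if gib <= 0:
            raise ValueError("allocation size must be positive")
        if gib > self.free_gib:
            raise MemoryError(f"only {self.free_gib} GiB free in pool")
        self.allocations[host] = self.allocations.get(host, 0) + gib

    def release(self, host: str) -> int:
        """Return everything `host` holds to the pool; report how much was freed."""
        return self.allocations.pop(host, 0)


if __name__ == "__main__":
    pool = MemoryPool(capacity_gib=1024)
    pool.assign("host-a", 256)
    pool.assign("host-b", 512)
    print(pool.free_gib)           # 256
    print(pool.release("host-a"))  # 256
    print(pool.free_gib)           # 512
```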



### **Conclusions**

- CXL may finally be bringing some clarity to the "Star Wars" of open, coherent buses.
- Minimal software changes needed to deploy (persistent) memory on CXL.
- Adding acceleration and remote PM are both possible.
- We all get a pony!



Eideticom HQ 3553 31st NW, Calgary, AB, Canada T2L 2K7

Eideticom (Bay Area) 168 South Park, San Francisco, CA 94107 USA

www.eideticom.com

Contact: sales@eideticom.com