Abstract
In most architectures, data is copied from far memory or storage to a local memory for processing. However, when the data size is on the order of hundreds of gigabytes, it is more efficient to move static compute functions near the data than to copy the bulky data near the compute. This concept of moving the compute near the memory, or building a compute engine inside the memory device, is called “Processing in Memory” or “Computational Memory”. This presentation dives deeper into this concept using the example of Kestral, an Intel Optane™ PMem-based accelerator that attaches up to two terabytes of memory to a single PCIe Gen 4.0 interface.