

# Planning for the Next Decade of NVM Programming

Andy Rudoff, Principal Engineer, Intel Corporation

Member: SNIA NVM Programming TWG

The Story So Far:

In the beginning the Universe was created. This has made a lot of people very angry and been widely regarded as a bad move.

- Douglas Adams, The Restaurant at the End of the Universe


2015 Storage Developer Conference. © Intel Corporation. All Rights Reserved.

Table 1. Comparison of data storage technologies. (Data drawn from public sources and HP internal research.)

|                                  | Memristor          | PCM                              | STT-RAM          | DRAM               | Flash                            | HD                               |
|----------------------------------|--------------------|----------------------------------|------------------|--------------------|----------------------------------|----------------------------------|
| Chip area per bit (F²)           | 4                  | 8–16                             | 14–64            | 6–8                | 4–8                              | n/a                              |
| Energy per bit (pJ) <sup>2</sup> | 0.1–3              | 2–100                            | 0.1–1            | 2–4                | 10<sup>1</sup>–10<sup>4</sup>    | 10<sup>6</sup>–10<sup>7</sup>    |
| Read time (ns)                   | <10                | 20–70                            | 10–30            | 10–50              | 25,000                           | 5–8 × 10<sup>6</sup>             |
| Write time (ns)                  | 20–30              | 50–500                           | 13–95            | 10–50              | 200,000                          | 5–8 × 10<sup>6</sup>             |
| Retention                        | >10 years          | <10 years                        | Weeks            | < second           | ~10 years                        | ~10 years                        |
| Endurance (cycles)               | ~10<sup>12</sup>   | 10<sup>7</sup>–10<sup>8</sup>    | 10<sup>15</sup>  | >10<sup>17</sup>   | 10<sup>3</sup>–10<sup>6</sup>    | 10<sup>15</sup> ?                |
| 3D capability                    | Yes                | No                               | No               | No                 | Yes                              | n/a                              |

Source: <u>http://www8.hp.com/hpnext/posts/beyond-dram-and-flash-part-2-new-memory-technology-data-deluge</u>



# **Moving the Focus to SW Latency**



App to SSD IO Read Latency (QD=1, 4KB)


#### **Memory or Storage?**




# **Persistence:**

#### Storage

- Block I/O only
- Sync or Async
- DMA master
- High Capacity?
- Drive-serviceability?
- NAND, NVMe, PCIe

#### pmem

- Byte addressable
- Sync (probably)
- DMA slave
- Growing Capacity?
- NVDIMM
- Not NAND, NVMe
- PCIe?



# pmem: The New Tier

- **Byte-addressable memory, but persistent**
- Must be reasonable to stall a CPU waiting for a load to finish
  - So, not NAND NVM based
- Can do small I/O
  - DIMMs are 64B cache line accessible
- Can DMA to it
  - Receive data from network directly to persistence!
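The "byte-addressable, but persistent" idea above can be sketched with an ordinary memory-mapped file. This is only an illustration under stated assumptions: a temporary file stands in for a pmem-backed region, the helper names (`create_region`, `store_bytes`, `load_bytes`, `demo`) are hypothetical, and durability comes from `mmap.flush()` (an `msync` on POSIX systems) rather than from the CPU cache flushes a real pmem library would use.

```python
import mmap
import os
import tempfile

REGION_SIZE = 4096  # one page; an arbitrary size chosen for this sketch


def create_region(path, size=REGION_SIZE):
    """Create a zero-filled file that stands in for a pmem region."""
    with open(path, "wb") as f:
        f.truncate(size)


def store_bytes(path, offset, data):
    """Update a small byte range in place through a memory mapping."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as region:
            # A store into the mapping: no block-sized write() call is issued here.
            region[offset:offset + len(data)] = data
            # Push the update to the backing file (msync on POSIX systems).
            region.flush()


def load_bytes(path, offset, length):
    """Read a small byte range back through a fresh, read-only mapping."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as region:
            return bytes(region[offset:offset + length])


def demo():
    """Store five bytes at offset 64, then load them from a new mapping."""
    path = os.path.join(tempfile.mkdtemp(), "pmem_standin.bin")
    create_region(path)
    store_bytes(path, 64, b"hello")
    return load_bytes(path, 64, 5)
```

Calling `demo()` updates five bytes without rewriting a whole block from the application's point of view, which is the small-I/O property the slide describes. On a regular file the kernel still performs page-sized I/O underneath, so the sketch shows the programming style, not pmem performance.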






# **Defining the NVM Programming Model**






#### **Recent Announcements**

- Intel® 3D XPoint<sup>™</sup> Technology

#### **The Intel DIMM**













## **The Memory Timeline**










# (Announced) State of SW Ecosystem

- Detecting pmem
  - BIOS creates ACPI-style information for OS
  - Defined in ACPI 6.0
- Linux support upstream
  - Exposing pmem as block storage
    - Generic NVDIMM driver for Linux released
  - Exposing pmem for direct access
    - Linux DAX upstream
  - Naming pmem areas
    - Linux ext4+DAX support upstream
  - KVM changes upstream
- Support in other operating systems emerging
  - Neal's talk yesterday
    - Storage Class Memory Support in the Windows Operating System
  - Heavy OSV involvement in TWG

#### The Next Decade...






### **Transparency Levels**






# **One Transparent Example:** *pmem Paging*



[Figure: NVDIMM]



# Paging from the OS Page Cache



# **Attributes of Paging**

(and why everyone avoids it)

- Major page faults
  - Block I/O (page I/O) on demand
  - Context switch there and back again
  - Latency of block stack
- Available memory looks much larger
  - But penalty of fault is significant
- Page-in must pick a victim
  - Based on a simplistic R/M (referenced/modified) metric
  - Can surprise an application
- Many enterprise apps opt out
  - Managing page cache themselves
  - Using intimate data knowledge for paging decisions
- Interesting example: Java GC
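To make the victim-selection point above concrete, here is a minimal sketch of a not-recently-used style policy. It assumes "R/M" refers to per-page referenced and modified bits, which the slide does not spell out, and it is not the algorithm of any particular OS. The `Page` type and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Page:
    number: int
    referenced: bool  # R bit: touched since the bits were last cleared
    modified: bool    # M bit: dirty, must be written back before reuse


def nru_class(page):
    """Rank a page from 0 (best victim) to 3 (worst victim) using only R and M."""
    return 2 * int(page.referenced) + int(page.modified)


def pick_victim(pages):
    """Return the resident page with the lowest R/M class (first one on ties)."""
    return min(pages, key=nru_class)


resident = [
    Page(number=0, referenced=True, modified=True),
    Page(number=1, referenced=False, modified=True),
    Page(number=2, referenced=True, modified=False),
    Page(number=3, referenced=False, modified=False),
]
victim = pick_victim(resident)
```

Here `victim` is page 3, the only page that is neither referenced nor modified. The policy sees two bits per page and nothing about what the application will need next, which is one way a page-in can surprise an application and why some applications manage their own caches instead.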

# Paging to pmem



# When Will pmem Paging be Cost Effective?

- When pmem costs less than (or close to) DRAM
- When pmem performance approaches DRAM
- When pmem capacity becomes significant
- When pmem is as reliable as memory
  - Probably needs to exceed memory reliability, because it is persistent
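The performance condition above can be explored with a back-of-the-envelope average access time: a fraction of accesses take a major fault and the rest are served from DRAM. This sketch is not from the talk. The DRAM and Flash figures are taken from the read-time row of Table 1, the pmem page-in cost is a purely hypothetical placeholder, and software overhead (context switches, block stack) is ignored.

```python
def effective_access_ns(fault_rate, memory_ns, fault_ns):
    """Average access time when a fraction of accesses take a major fault."""
    if not 0.0 <= fault_rate <= 1.0:
        raise ValueError("fault_rate must be between 0 and 1")
    return (1.0 - fault_rate) * memory_ns + fault_rate * fault_ns


DRAM_NS = 50             # upper end of the DRAM read time in Table 1
FLASH_FAULT_NS = 25_000  # Flash read time in Table 1, software overhead ignored
PMEM_FAULT_NS = 1_000    # hypothetical pmem page-in cost, an assumption for this sketch


def compare(fault_rate):
    """Average access time in ns for paging to Flash versus paging to pmem."""
    return {
        "flash": effective_access_ns(fault_rate, DRAM_NS, FLASH_FAULT_NS),
        "pmem": effective_access_ns(fault_rate, DRAM_NS, PMEM_FAULT_NS),
    }
```

With these inputs, `compare(0.01)` gives roughly 300 ns for Flash and roughly 60 ns for pmem, so even a 1% fault rate dominates the average when the fault target is slow. Changing `PMEM_FAULT_NS` shows how sensitive the result is to the assumed pmem cost.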



# Not just for pmem...

- High-bandwidth memory
- NUMA localities
- Different NVM technologies



# **Extending into User Space**





#### NVM Library: pmem.io 64-bit Linux Alpha Release




# **Replication Challenge of pmem**



# **RDMA to pmem**





# **Non-Transparent pmem Use Cases**

- Volatile caching
  - Due to capacity, relative simplicity
- In-memory database
- Storage appliance write cache
  - Also for large structures like dedup tables
  - Leverage RDMA capability
- Large, byte-addressable data structures
  - Example: HBase hash table
- HPC
  - Example: checkpoint
  - Example: distributed versioned object store

# **Prediction: The Sweet Spots**



# **Prediction: The Big Challenge**




## Summary...





# **Summary**

- Building a SW ecosystem for pmem
  - Won't overcome Enterprise time-to-adoption, but...
  - Linux support upstream
  - Other operating systems progressing
- Cost versus Benefit Challenge
  - Cost of Emerging NVM
  - Cost of application complexity
  - Fall back to transparency at various levels
- What you can do to prepare
  - Learn NVM programming model
  - Map use cases to pmem
  - Contribute to libraries, SW ecosystem



# **Links to More Information**

- **SNIA NVM Programming Model** 
  - http://www.snia.org/forums/sssi/nvmp
- Intel® Architecture Instruction Set Extensions Programming Reference
  - <u>https://software.intel.com/en-us/intel-isa-extensions</u>
- Open Source NVM Library work
  - http://pmem.io
- Linux kernel support & instructions
  - https://github.com/01org/prd
- ACPI 6.0 NFIT definition (used by BIOS to expose NVDIMMs to OS)
  - http://www.uefi.org/sites/default/files/resources/ACPI\_6.0.pdf
- Open specs providing NVDIMM implementation examples, layout, BIOS calls:
  - http://pmem.io/documents/
- Google group for pmem programming discussion:
  - http://groups.google.com/group/pmem