

# **NAND Controller Reliability Challenges**

Hanan Weingarten

February 27, 2018

2018 Toshiba Memory America, Inc.

#### Agenda

- Introduction to NAND and 3D technology
- Reliability challenges
- Summary





### Introduction to NAND and 3D technology (1)

- NAND devices store information as charge in a transistor cell
- Transistor cells are chained together in a bit-line
  - Cell sensing:
    - Apply a threshold voltage  $V_{\mathsf{T}}$  to a transistor cell and check for conductance
    - Other cells on bit line are set to pass state
- Block:
  - Many bit-lines are packed together to form a block
- Word-line / Page:
  - Corresponding transistors on block bit-lines form a word-line / page





### Introduction to NAND and 3D technology (2)

- More than a single bit may be stored in a cell
  - One of several charge levels is programmed into the cell
    - E.g., TLC (3 bits per cell) utilizes 8 charge levels
    - Programming is an analog process
      - Accuracy is a function of
        - Technology
        - Programming time
        - ...

Leading Innovation >>>

- Distribution of cell $V_T$ values across a page
  - Read errors occur when lobes intersect
  - $V_T$  threshold positioning is critical for reliability
  - Number of  $V_T$  thresholds determines read speed
- QLC (4 bits per cell) emerging:
  - Very dense application & low cost
  - Mostly read application
  - Higher probability of read errors
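
The link between charge levels and read thresholds can be illustrated with a short sketch. It assumes a reflected binary Gray code maps charge levels to bits, which the slides do not specify; real devices may use a different mapping, and the function names are hypothetical. A page read needs one $V_T$ threshold wherever that page's bit changes between adjacent levels.

```python
def gray(level: int) -> int:
    """Reflected binary Gray code: adjacent levels differ in exactly one bit."""
    return level ^ (level >> 1)


def thresholds_per_page(bits_per_cell: int) -> list[int]:
    """Return the number of read thresholds needed for each bit position.

    Index 0 is the least significant bit. A threshold is needed between two
    adjacent charge levels whenever the bit of that page flips.
    """
    levels = 1 << bits_per_cell
    codes = [gray(level) for level in range(levels)]
    counts = [0] * bits_per_cell
    for lower, upper in zip(codes, codes[1:]):
        flipped_bit = (lower ^ upper).bit_length() - 1  # exactly one bit differs
        counts[flipped_bit] += 1
    return counts


if __name__ == "__main__":
    for level in range(8):
        print(level, format(gray(level), "03b"))
    print("TLC thresholds per page:", thresholds_per_page(3))
    print("QLC thresholds per page:", thresholds_per_page(4))
```

Under this assumed mapping the total is always one threshold fewer than the number of levels (7 for TLC, 15 for QLC), which is why QLC leaves less margin between lobes.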





### Introduction to NAND and 3D technology (3)

- Previous technology: NAND strings on planar (2D) silicon structures
- New technology: NAND strings on (3D) silicon structures

Why 3D BiCS FLASH vs. 2D NAND floating-gate technology?

- Higher Capacity
  - Vertically stacked cell structure enables higher capacity in the same footprint
- Higher Endurance
  - Charge trap cell & memory hole structure increase endurance
- Higher Performance
  - Faster programming speed with 1-shot program called "Full Sequence"
  - Triple pages can be programmed simultaneously with fewer steps
- Higher Power Efficiency
  - Triple pages can be programmed with almost the same power consumption as a single page program





### Introduction to NAND and 3D technology (4)

#### NAND organization:

- Block:
  - Made of many bit-lines
- Word-line:
  - A set of all cells corresponding to one row across block bit-lines
- Page:
  - The set of bits corresponding to the same bit level in a word-line
- Atomic programming operation: page / Word-line programming
  - In 3D-TLC, atomic word-line programming (1-shot) enables faster programming
- Atomic read operation: Page read
- Atomic erase operation: Full block erase
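
The three atomic operations can be sketched as a toy model. This is an illustration only: the class name `NandBlock`, its methods, and the page count are hypothetical, and no real NAND command set is implied. The point is that a programmed page cannot be rewritten until the whole block is erased.

```python
class NandBlock:
    """Toy block: page-granular program/read, block-granular erase."""

    def __init__(self, pages_per_block: int = 4):
        # None marks an erased (unprogrammed) page.
        self.pages = [None] * pages_per_block

    def program(self, page: int, data: bytes) -> None:
        """Program one page; fails if the page was already programmed."""
        if self.pages[page] is not None:
            raise ValueError("page already programmed; erase the block first")
        self.pages[page] = data

    def read(self, page: int):
        """Read one page; returns None for an erased page."""
        return self.pages[page]

    def erase(self) -> None:
        """Erase every page in the block at once."""
        self.pages = [None] * len(self.pages)


if __name__ == "__main__":
    block = NandBlock()
    block.program(0, b"hello")
    print(block.read(0))
    block.erase()
    print(block.read(0))
```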



[Figure: NAND block schematic with labeled bit-lines and word-lines]

### Introduction to NAND and 3D technology (5)

#### Memory management:

- Atomic erase of full block requires:
  - Flash Translation Layer (FTL)
  - Garbage collection
  - Over provisioning (OP)
  - Result:
    - Write amplification (WA)
      - One host page write => several page writes on the NAND
      - Function of OP
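
The dependence of write amplification on over-provisioning can be explored with a small simulation. This is a sketch under stated assumptions, not a model of any real controller: it assumes a page-mapped FTL, greedy garbage collection (the victim is the full block with the fewest valid pages), uniformly random host overwrites, and OP defined as (physical - logical) / logical capacity. All names and sizes are hypothetical.

```python
import random


def simulate_write_amplification(op, num_blocks=64, pages_per_block=32,
                                 host_writes=20_000, seed=0):
    """Return NAND page writes divided by host page writes."""
    rng = random.Random(seed)
    logical_pages = int(num_blocks * pages_per_block / (1.0 + op))
    valid = [set() for _ in range(num_blocks)]  # valid logical pages per block
    used = [0] * num_blocks                     # programmed pages per block
    where = {}                                  # FTL map: logical page -> block
    free = list(range(1, num_blocks))           # erased blocks
    state = {"active": 0, "nand_writes": 0}

    def program(lpn):
        # Open a fresh block when the active one is full.
        if used[state["active"]] == pages_per_block:
            state["active"] = free.pop()
        block = state["active"]
        old = where.get(lpn)
        if old is not None:
            valid[old].discard(lpn)             # invalidate the stale copy
        valid[block].add(lpn)
        used[block] += 1
        where[lpn] = block
        state["nand_writes"] += 1

    def garbage_collect():
        full = [b for b in range(num_blocks)
                if b != state["active"] and used[b] == pages_per_block]
        victim = min(full, key=lambda b: len(valid[b]))
        for lpn in list(valid[victim]):         # relocate still-valid pages
            program(lpn)
        used[victim] = 0                        # block erase
        free.append(victim)

    for _ in range(host_writes):
        while len(free) < 2:                    # keep one spare block for GC
            garbage_collect()
        program(rng.randrange(logical_pages))
    return state["nand_writes"] / host_writes


if __name__ == "__main__":
    for op in (0.1, 0.25, 0.5):
        print(f"OP={op:.2f}  WA={simulate_write_amplification(op):.2f}")
```

Every relocated page counts as a NAND write without a matching host write, so less spare capacity means fuller victim blocks and more relocation per reclaimed block.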





### Introduction to NAND and 3D technology (6)

#### NAND controller roles depending on application:

- Block / Segment access application:
  - Flash Translation Layer (FTL):
    - Translate host logical address to physical address on NAND
    - Memory Management / Garbage collection
    - Wear leveling

- ...

- Other applications (e.g. Open Channel)
  - No FTL
  - Optional: Bad block management

• ...

- All applications:
  - Error free NAND
    - All reliability issues handled by controller
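
As one illustration of the wear-leveling role listed above, a minimal sketch follows. It assumes the simplest possible policy, always handing out the free block with the lowest erase count; the class `WearLevelingAllocator` and its methods are hypothetical and do not describe any particular controller.

```python
import heapq


class WearLevelingAllocator:
    """Hands out the free block with the fewest erases (min-heap on erase count)."""

    def __init__(self, num_blocks: int):
        self._free = [(0, block) for block in range(num_blocks)]
        heapq.heapify(self._free)

    def allocate(self):
        """Return (block, erase_count) for the least-worn free block."""
        erase_count, block = heapq.heappop(self._free)
        return block, erase_count

    def release(self, block: int, erase_count: int) -> None:
        """Erase a block and return it to the free pool with its count bumped."""
        heapq.heappush(self._free, (erase_count + 1, block))


if __name__ == "__main__":
    allocator = WearLevelingAllocator(num_blocks=4)
    wear = {}
    for _ in range(10):
        block, count = allocator.allocate()
        allocator.release(block, count)
        wear[block] = count + 1
    print(wear)
```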



### Introduction to NAND and 3D technology (7)

- NAND controller requirements & Challenges:
  - High throughputs
  - Low latency and high IOPS
  - Low power (embedded applications)
  - Low gate-count (embedded applications)
  - NAND controller must be adapted for all types of stresses to meet above requirements

#### NAND controller tools:

- Powerful and unique ECC with hard / soft decoding, low gate count / power, RAID capabilities
- Unique DSP, utilizing machine learning to optimize NAND trim parameters with minimal overheads
- Specialized management and FTL to enable performance and reliability





# **Reliability Challenges (1)**

- NAND reliability deteriorates under different types of stresses
- Program/Erase (P/E) cycling stress:
  - NAND reliability deteriorates with Program/Erase (P/E) cycles
    - Growing damage to NAND channel
    - Damage is significantly more noticeable when additional stress types are applied
    - Reliability is reflected in read Bit Error Rates (BER) as function of P/E cycles





# **Reliability Challenges (1)**

#### Retention (1):

- A major source for reliability deterioration
- Much more significant following P/E cycles
- Distribution of cell $V_T$ values across a page exhibits significant changes:
  - Increased charge loss
    - Due to channel damage during P/E cycles
  - Lobes widen:
    - Increased overlap raises error probability
    - Significant shift in  $V_T$  read thresholds
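
The effect of lobe widening and shifting on read errors can be sketched numerically. The sketch assumes two equally likely Gaussian lobes with equal width, which is a simplification, and every voltage below is an illustrative number rather than measured data. The function names are hypothetical.

```python
import math


def q_function(x: float) -> float:
    """Upper tail probability of a standard normal, P(Z > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def overlap_error(mu_low, mu_high, sigma, threshold):
    """Misread probability for two equally likely Gaussian lobes."""
    low_read_as_high = q_function((threshold - mu_low) / sigma)
    high_read_as_low = q_function((mu_high - threshold) / sigma)
    return 0.5 * (low_read_as_high + high_read_as_low)


if __name__ == "__main__":
    # Fresh cells: narrow lobes, threshold midway between them.
    fresh = overlap_error(1.0, 2.0, 0.10, threshold=1.5)
    # After retention (assumed): upper lobe lost charge and both lobes widened.
    aged_default = overlap_error(1.0, 1.8, 0.15, threshold=1.5)
    aged_adapted = overlap_error(1.0, 1.8, 0.15, threshold=1.4)
    print(f"fresh, default threshold: {fresh:.2e}")
    print(f"aged, default threshold:  {aged_default:.2e}")
    print(f"aged, adapted threshold:  {aged_adapted:.2e}")
```

With equal-width lobes the best threshold sits midway between the lobe means, so a shifted lobe moves the best threshold with it; widening raises the error floor even at that threshold.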





# **Reliability Challenges (2)**

#### Retention (2)

- Retention loss is accelerated in testing through oven bake
- Significant effect on $V_T$ threshold adaptation
- NAND Controller must adapt  $V_T$  s:
  - Automatically
  - With marginal overheads
- NAND Controller must have improved ECC
  - Support higher BER following retention
  - Support "soft"-decoding
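
One simple way to adapt a read threshold automatically is a valley search, sketched below. This is an assumed technique for illustration, not the method described in the talk: the controller sweeps the threshold between two adjacent lobes, counts how many cells fall in each step, and picks the step with the fewest cells. The NAND read is replaced by a Gaussian stand-in, and all names and numbers are hypothetical.

```python
import math


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a Gaussian."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def cells_below(threshold, lobes, cells_per_lobe):
    """Stand-in for a page read: expected number of cells with V_T below threshold."""
    return sum(cells_per_lobe * normal_cdf(threshold, mu, sigma)
               for mu, sigma in lobes)


def valley_search(read_count, start, stop, step):
    """Sweep thresholds from start to stop; return the centre of the emptiest bin.

    The sweep window must lie between two adjacent lobes, otherwise the
    emptiest bin would be found in the outer tails.
    """
    n_bins = round((stop - start) / step)
    best_center, best_count = None, float("inf")
    previous = read_count(start)
    for i in range(n_bins):
        upper = start + (i + 1) * step
        current = read_count(upper)
        if current - previous < best_count:
            best_center, best_count = upper - step / 2.0, current - previous
        previous = current
    return best_center


if __name__ == "__main__":
    aged_lobes = [(1.0, 0.15), (1.8, 0.15)]  # assumed post-retention lobes
    read = lambda t: cells_below(t, aged_lobes, cells_per_lobe=10_000)
    print("adapted threshold:", valley_search(read, 1.0, 1.8, 0.02))
```

Each sweep step costs one extra page read, so a full sweep is expensive; that cost is the overhead a practical controller has to keep marginal.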







# **Reliability Challenges (3)**

#### Read Disturb (RD) stress:

- Each read behaves as a weak programming operation
- Following RD
  - Significant $V_T$ shift (opposite direction to retention)
  - Increased BER
- Similar NAND controller requirements for a different regime



# **Reliability Challenges (4)**

#### Cross temperature:

- Temperature changes between programming and reading can affect NAND  $V_T$  distribution
- Some applications may have very large temperature differences
  - E.g. Vehicle industry: -40°C to 105°C
- NAND Controller must adapt to new distributions automatically
  - Reading with default  $V_T$  threshold will lead to read errors





#### Other failure modes:

- Sudden die failures
- Sudden block failure
  - Block failures detected late during read operations
    - Not "bad blocks" detected during programming / erase
- Special failures may require incorporating RAID-like solutions at the NAND die / block levels
  - Allow data recovery
  - Last line of defense
    - Slow recovery method
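
A RAID-like recovery can be sketched with single XOR parity across dies. This assumes the simplest scheme, one parity page per stripe that tolerates the loss of one die; the slides do not say which scheme is used, and the page contents and names are made up for illustration.

```python
from functools import reduce


def xor_pages(pages):
    """Bytewise XOR of equally sized pages."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*pages))


if __name__ == "__main__":
    # One page from each of three data dies, plus a parity page on a fourth.
    data_pages = [b"die0data", b"die1data", b"die2data"]
    parity = xor_pages(data_pages)

    # Die 1 fails: rebuild its page from the surviving pages and the parity.
    survivors = [data_pages[0], data_pages[2], parity]
    recovered = xor_pages(survivors)
    print(recovered)
```

Recovery has to read every surviving page in the stripe, which is why this path is slow and kept as a last line of defense.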





# Summary (1)

#### NAND controllers do more than just FTL

- Enabling reliability is one of the main tasks
- Similar to communication systems, but with special requirements that set NAND apart
  - E.g. NAND read penalty

#### NAND controllers use many methods to improve reliability:

- NAND trim parameter optimization
  - E.g.  $V_T$  threshold optimization
- Unique ECC to support
  - Hard / soft Decoding
  - Low gate-count / power
  - Low latency / high throughputs / IOPS
  - RAID capabilities


- Machine learning approach
  - Lower overheads, improve accuracy and performance



# Summary (2)

#### Next Generation Challenges

- QLC enabled by BiCS 3D
  - Significantly tighter distributions
  - Small $V_T$ estimation errors will lead to large increases in read errors
  - New tradeoffs between reliability and performance
    - Program speeds vs. reliability
    - Read time / gate count / memory vs. reliability




