Tuning DDR4 for Power and Performance

Mike Micheletti
Product Manager
Teledyne LeCroy
Agenda

- Introduction – DDR4 Technology
- Expanded role of MRS
- Power Features Examined
- Reliability Features Examined
- Performance Features Examined
DDR4 Goals & Motivations

- Spec development started in 2005; Official JEDEC release Aug 2012
- **2x Bandwidth**
  - Up to 3.2 Gbps (per pin)
- **Evolutionary Path**
  - Single Ended Signaling
  - Same clocking
- **Lower Cost**
  - 8 Bit prefetch, same core frequency
- **Power Savings**
  - 30-40% power saving (vs DDR3L)
    - tCAL, LP-ASR, etc.
- **Improved Reliability**
  - C/A parity, CRC, MPR readout, etc…

Analysts: 50% market penetration by 2015/2016
New DDR4 Features Categorized

- Gear Down Mode
- Internal Vref DQ
- DQ Training with MPR
- Per DRAM Addressability
- 2133 to 3200 MT/s signaling
- Bank Groups
- Fine Granularity Refresh
- Self Refresh Abort
- Reliability (RAS)
- Performance
- Signalling
- Test
- Power
- TCSR
- TCAR
- CS to CMD Latency (TCAL)
- VDDQ Term
- Max Power Saving Mode
- 0.5KB Page size
- DBI
- 3DS
- Write CRC
- CA Parity
- Multipurpose Register (MPR) Readout
## DDR4 Compared to DDR3

<table>
<thead>
<tr>
<th>Spec Items</th>
<th>DDR3</th>
<th>DDR4</th>
</tr>
</thead>
<tbody>
<tr>
<td>Density / Speed</td>
<td>512Mb~8Gb</td>
<td>2Gb~16Gb</td>
</tr>
<tr>
<td></td>
<td>1.6~2.1Gbps</td>
<td>1.6~3.2Gbps</td>
</tr>
<tr>
<td>Voltage (VDD/VDDQ/VPP)</td>
<td>1.5V/1.5V/NA</td>
<td>1.2V/1.2V/2.5V</td>
</tr>
<tr>
<td></td>
<td>(1.35V/1.35V/NA)</td>
<td></td>
</tr>
<tr>
<td>Vref</td>
<td>External Vref (VDD/2)</td>
<td>Internal Vref (Req. training)</td>
</tr>
<tr>
<td>Data IO</td>
<td>CTT (34 ohm)</td>
<td>POD (34 ohm)</td>
</tr>
<tr>
<td>CMD/ADDR IO</td>
<td>CTT</td>
<td>CTT</td>
</tr>
<tr>
<td>Strobe</td>
<td>Bi-dir / differential</td>
<td>Bi-dir / differential</td>
</tr>
<tr>
<td># of banks</td>
<td>8 banks</td>
<td>16 banks (4 BG)</td>
</tr>
<tr>
<td>Core architecture</td>
<td>1KB / 1KB / 2KB</td>
<td>512B / 1KB / 2KB</td>
</tr>
<tr>
<td></td>
<td>8 bits</td>
<td>8 bits</td>
</tr>
<tr>
<td>Added functions</td>
<td>RESET/ZQ/Dynamic ODT</td>
<td>+ CRC/DBI/Multi preamble</td>
</tr>
<tr>
<td>Physical</td>
<td>78 / 96 BGA</td>
<td>78 / 96 BGA</td>
</tr>
</tbody>
</table>
## DDR4: Command Encoding

<table>
<thead>
<tr>
<th>/CS</th>
<th>BGn, BAn</th>
<th>/ACT</th>
<th>A17</th>
<th>A16 /RAS</th>
<th>A15 /CAS</th>
<th>A14 /WE</th>
<th>A13</th>
<th>A12</th>
<th>A11</th>
<th>A10</th>
<th>A9–0</th>
<th>Command</th>
</tr>
</thead>
<tbody>
<tr>
<td>H</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>Deselect (No operation)</td>
</tr>
<tr>
<td>L</td>
<td>bank</td>
<td>L</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>Active (activate): open a row</td>
</tr>
<tr>
<td>L</td>
<td>x</td>
<td>H</td>
<td>x</td>
<td>H</td>
<td>H</td>
<td>H</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>No operation</td>
</tr>
<tr>
<td>L</td>
<td>x</td>
<td>H</td>
<td>x</td>
<td>H</td>
<td>H</td>
<td>L</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>ZQ Calibration</td>
</tr>
<tr>
<td>L</td>
<td>bank</td>
<td>H</td>
<td>x</td>
<td>H</td>
<td>L</td>
<td>H</td>
<td>x</td>
<td>BC</td>
<td>x</td>
<td>AP</td>
<td></td>
<td>Read (BC=burst chop)</td>
</tr>
<tr>
<td>L</td>
<td>bank</td>
<td>H</td>
<td>x</td>
<td>H</td>
<td>L</td>
<td>L</td>
<td>x</td>
<td>BC</td>
<td>x</td>
<td>AP</td>
<td></td>
<td>Write (AP=auto-precharge)</td>
</tr>
<tr>
<td>L</td>
<td>x</td>
<td>H</td>
<td>x</td>
<td>L</td>
<td>H</td>
<td>H</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>(Unassigned, reserved)</td>
</tr>
<tr>
<td>L</td>
<td>x</td>
<td>H</td>
<td>x</td>
<td>L</td>
<td>H</td>
<td>L</td>
<td>x</td>
<td>x</td>
<td>x</td>
<td>H</td>
<td>x</td>
<td>Precharge all banks</td>
</tr>
<tr>
<td>L</td>
<td>bank</td>
<td>H</td>
<td>x</td>
<td>L</td>
<td>H</td>
<td>L</td>
<td>x</td>
<td>x</td>
<td>x</td>
<td>L</td>
<td>x</td>
<td>Precharge one bank</td>
</tr>
<tr>
<td>L</td>
<td>x</td>
<td>H</td>
<td>x</td>
<td>L</td>
<td>L</td>
<td>H</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>Refresh</td>
</tr>
<tr>
<td>L</td>
<td>register</td>
<td>H</td>
<td>0</td>
<td>L</td>
<td>L</td>
<td>L</td>
<td>0</td>
<td></td>
<td></td>
<td></td>
<td>data</td>
<td>Mode register set (MR0–MR6)</td>
</tr>
</tbody>
</table>
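
The encoding table can be read as a small decision procedure. The following is a minimal sketch, not an analyzer implementation: the function name and the 1 = H / 0 = L convention are illustrative assumptions, and it assumes A10 is the bit that selects all-bank versus single-bank precharge.

```python
def decode_ddr4_command(cs_n: int, act_n: int, ras_n: int, cas_n: int,
                        we_n: int, a10: int = 0) -> str:
    """Name the command sampled on one clock edge (1 = H, 0 = L)."""
    if cs_n == 1:
        return "Deselect"
    if act_n == 0:
        # With /ACT low, A16..A14 carry row-address bits, not a command code.
        return "Activate"
    code = (ras_n, cas_n, we_n)  # A16 /RAS, A15 /CAS, A14 /WE
    if code == (0, 1, 0):
        # Assumption: A10 high means all banks, A10 low means one bank.
        return "Precharge all banks" if a10 == 1 else "Precharge one bank"
    names = {
        (1, 1, 1): "No operation",
        (1, 1, 0): "ZQ Calibration",
        (1, 0, 1): "Read",
        (1, 0, 0): "Write",
        (0, 1, 1): "Reserved",
        (0, 0, 1): "Refresh",
        (0, 0, 0): "Mode register set",
    }
    return names[code]


if __name__ == "__main__":
    print(decode_ddr4_command(cs_n=0, act_n=1, ras_n=1, cas_n=0, we_n=1))
```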

8/16/2013
Testing DDR4 Protocol

- Fast, Easy Connection & Setup
  - No Calibration needed
- Comprehensive Bus Analyzer for DDR3 & DDR4
  - Traditional Waveform & State Listings
- Real-Time JEDEC Error Triggering
  - Detects over 65 JEDEC bus & timing violations
New MRS Commands  (MR4 – MR6)

- New Features enabled with MRS:
  - Auto-Self Refresh / Low Power Auto Self Refresh
  - CRC and C/A Parity Error Check
  - Host Tx / Rx Training Pattern
  - Per DRAM addressability (PDA)
  - Internal DQ Vref per DRAM
  - Gear-down mode (for CMD/CTRL/ADDR)
  - Dynamic ODT
  - CAL mode
## DDR4 Mode Register Set (MRS) Overview

<table>
<thead>
<tr>
<th>Register</th>
<th>Function</th>
</tr>
</thead>
<tbody>
<tr>
<td>MR0</td>
<td>Write Mode Register 0</td>
</tr>
<tr>
<td>MR1</td>
<td>Write Mode Register 1</td>
</tr>
<tr>
<td>MR2</td>
<td>Write Mode Register 2</td>
</tr>
<tr>
<td>MR3</td>
<td>Write Mode Register 3</td>
</tr>
<tr>
<td>MR4</td>
<td>Write Mode Register 4</td>
</tr>
<tr>
<td>MR5</td>
<td>Write Mode Register 5</td>
</tr>
<tr>
<td>MR6</td>
<td>Write Mode Register 6</td>
</tr>
<tr>
<td>MR7</td>
<td>Write Mode Register 7</td>
</tr>
</tbody>
</table>

**MPR Read Format**

- **DLL always Enabled**
- **MPR Read Format**
- **CRC Clear & Parity Error Status**
Key Design Challenge: DQ Training with MPR

- DDR4 allows custom patterns for DQ training
  - Host issues an MRS command to MR3 with A2 = 1 to initiate DQ Training
  - READ BA[1:0] defines the MPR Location (pattern)
Performance Features: DQ Training Sequence

[Diagram showing clock, timing violation, DDR command, command value, CKE, chip select, address, BG, BA, reset, alert, parity, and a section labeled [MRS] with options:

- MR Select: MR3
- MPR page Selection: Page 0
- MPR operation: Dataflow from/to MPR
- Geardown Mode: 1/2 Rate
- Per DRAM Addressability: Disable
- Temperature sensor readout: Disable
- Fine Granularity Refresh Mode: Normal (Fixed 1x)
- Write CMD Latency when CRC and DM are enabled: 4nCK
- MPR Read Format: Serial]
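
The option list in the [MRS] decode corresponds to fields packed into the MR3 address bits. The sketch below shows how such a decode might be done. The bit positions and encodings are assumptions based on the JEDEC MR3 definition and should be verified against the standard before use; the function name is illustrative.

```python
def decode_mr3(addr: int) -> dict:
    """Decode MR3 address bits A12..A0 into named fields (assumed layout)."""
    def bits(hi: int, lo: int) -> int:
        # Extract the inclusive bit range A[hi:lo] from the address value.
        return (addr >> lo) & ((1 << (hi - lo + 1)) - 1)

    refresh = {0: "Normal (Fixed 1x)", 1: "Fixed 2x", 2: "Fixed 4x",
               5: "On-the-fly 2x", 6: "On-the-fly 4x"}
    write_latency = {0: "4nCK", 1: "5nCK", 2: "6nCK"}
    read_format = {0: "Serial", 1: "Parallel", 2: "Staggered"}

    return {
        "MPR page Selection": f"Page {bits(1, 0)}",                       # A1:A0
        "MPR operation": "Dataflow from/to MPR" if bits(2, 2) else "Normal",  # A2
        "Geardown Mode": "1/4 Rate" if bits(3, 3) else "1/2 Rate",        # A3
        "Per DRAM Addressability": "Enable" if bits(4, 4) else "Disable",  # A4
        "Temperature sensor readout": "Enable" if bits(5, 5) else "Disable",  # A5
        "Fine Granularity Refresh Mode": refresh.get(bits(8, 6), "Reserved"),  # A8:A6
        "Write CMD Latency (CRC and DM enabled)":
            write_latency.get(bits(10, 9), "Reserved"),                   # A10:A9
        "MPR Read Format": read_format.get(bits(12, 11), "Reserved"),     # A12:A11
    }


if __name__ == "__main__":
    # Only A2 set: enter MPR dataflow mode with every other field at default.
    for name, value in decode_mr3(0b100).items():
        print(f"{name}: {value}")
```

With only A2 set, this layout yields the same settings as the decode listed in the diagram (Page 0, dataflow from/to MPR, 1/2 rate geardown, serial read format).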
READ MPR0 (default pattern) Location 0

- Back-to-Back Read from MPR is allowed with tCCD=4 nCK for seamless operation
DDR4: Power Features

- Reduced Voltages (1.2V)
- VDDQ Termination (POD)
- External Vpp
- Data Bus Inversion (DBI)
- 0.5KB Page size
- Temperature controlled Refresh (SR)
- Low Power Auto-self Refresh (LP ASR)
- CS to CMD Latency (tCAL)
- Max Power Saving Mode (MPSM)
Power Features Examined

- Reduced Vdd (Voltage)
  - DDR4 Standard (1.2V)
  - DDR4L (1.05V?)

![Graph showing power features comparison between different DDR types](Image)
Power Features Examined

- VDDQ Termination
  - DDR3 utilizes center tap termination
  - DDR4 utilizes VDDQ termination
    - “Pseudo open drain” signaling
    - Reduces IO current draw
  - DBI: minimize number of zeroes
    - Increase % of bits transmitted as “1”
    - Improves Performance & Signal integrity
      - Lower “Simultaneous switching output” (SSO) noise
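
As an illustration of how DBI keeps zeroes off a VDDQ-terminated bus, here is a minimal per-byte sketch. It assumes the DDR4 rule that a byte lane is inverted when more than four of its eight bits are 0, with the DBI signal driven low to flag the inversion; the function names are illustrative.

```python
def dbi_encode(byte: int) -> tuple[int, int]:
    """Return (data_on_bus, dbi_n) for one byte lane."""
    byte &= 0xFF
    zeros = 8 - bin(byte).count("1")
    if zeros > 4:
        # Too many zeroes: send the inverted byte and flag it with DBI low.
        return (~byte) & 0xFF, 0
    return byte, 1


def dbi_decode(data_on_bus: int, dbi_n: int) -> int:
    """Recover the original byte from the bus value and the DBI flag."""
    return (~data_on_bus) & 0xFF if dbi_n == 0 else data_on_bus & 0xFF


if __name__ == "__main__":
    for value in (0x00, 0x0F, 0x01, 0xFF):
        data, dbi_n = dbi_encode(value)
        print(f"in=0x{value:02X} bus=0x{data:02X} dbi_n={dbi_n}")
```

Under this rule, no more than four of the nine lines (eight data plus DBI) are ever driven low for a given byte.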
Power Features Examined

- External Vpp for Word-line Voltage
  - DDR3 utilizes an on-die voltage pump to generate the higher word-line voltage
  - DDR4 utilizes a separate Vpp voltage rail
    - Externally supplied Vpp @ 2.5V enables a more energy-efficient memory system
    - Reduces power draw & die area
Command Address Latency (CAL)

- Command and Address receivers disabled (MR4)
- CS# used to wakeup the receivers
- CMD and ADDR sent after a delay of tCAL (latency 3 clocks at 2.1GT/s)
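
To put the tCAL delay in absolute terms, the clock count can be converted to nanoseconds. This is a small worked sketch; it assumes "2.1GT/s" means the 2133 MT/s speed bin and that the clock runs at half the transfer rate. The function name is illustrative.

```python
def tcal_ns(tcal_clocks: int, data_rate_mts: float) -> float:
    """Convert a tCAL value in clocks to nanoseconds."""
    clock_mhz = data_rate_mts / 2.0   # double data rate: two transfers per clock
    tck_ns = 1000.0 / clock_mhz       # clock period in ns
    return tcal_clocks * tck_ns


if __name__ == "__main__":
    # Three clocks at 2133 MT/s comes to roughly 2.8 ns of added latency.
    print(f"{tcal_ns(3, 2133):.2f} ns")
```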

Power savings:
- 23% for $I_{dd}$
- 10% for $I_{dd0}$
- 13% TDP (dual rank DIMMs)
Command Address Latency (CAL)

- Switching Ranks adds CAL Latency
- CAL mode introduces more latency in multi-threaded IO
Command Address Latency (CAL)

- CAL mode is better for sequential IO operations
  - Only impacts DRAM when exiting from IDLE
Power Savings: Server DDR4 vs. DDR3 (Heavy Utilization)

DDR4 results based on Intel projected values for IDD.
DDR3x results based on supplier provided Idd values.
DDR4: Features

- Reliability
  - CRC on Writes
  - MPR Error Log
  - Command / Address Parity check
CMD / ADDR Parity Checking

- When enabled – SDRAM verifies parity before executing the command
  - Command and the address lines only
- Additional delay (parity latency) for tMRD & tMOD (4 to 6 CLKs)
  - PL ranges from 4nCK to 6nCK, depending on clock rate
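
The parity check itself can be sketched in a few lines. This assumes even parity (the covered command/address bits plus the PAR bit contain an even number of 1s); which exact signals are covered is left to the caller, and the function names are illustrative.

```python
def ca_parity_bit(covered_signals: list[int]) -> int:
    """PAR value the host drives so that the total count of 1s is even."""
    return sum(covered_signals) % 2


def parity_ok(covered_signals: list[int], par: int) -> bool:
    """DRAM-side check: True when the command may execute."""
    return (sum(covered_signals) + par) % 2 == 0


if __name__ == "__main__":
    signals = [1, 0, 1, 1]            # sampled command/address levels
    par = ca_parity_bit(signals)
    print(parity_ok(signals, par))    # a clean transfer passes
    signals[1] ^= 1                   # single-bit error on the bus
    print(parity_ok(signals, par))    # the flipped bit is detected
```

As with any single parity bit, an even number of flipped bits goes undetected.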

![Diagram showing command and address timing with parity checking](image-url)
CMD / ADDR Parity Error Detection

Controller sees ALERT held LOW for more than 48 nCK

4 CLKs
DDR4: Features

- **Performance**
  - Signaling 1066 MHz to 1.6 GHz (2133 to 3200 MT/s)
  - Training - Preamble training; Internal DQ Vref
  - Gear down mode - For speeds above 2666 MT/s
    - CMD/CTRL/ADDR sent at 2T timing

- **Bank Groups**
Bank Groups:

- DDR4 has similar latency but higher data rates
  - So more requests must be kept in-flight to realize higher bandwidth
- DDR4 supports 16 banks divided into 4 bank groups
  - 4 Bank Groups at x4 & x8
  - 2 Bank Groups at x16

Separate IO gating structures allow faster Write-to-Read turnaround between BG
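
The "more requests in flight" point follows from Little's law: data in flight equals bandwidth times latency. The sketch below is a rough illustration only; the 50 ns latency, 64-byte request size and 64-bit channel width are hypothetical values chosen for the example, not figures from this presentation.

```python
def requests_in_flight(data_rate_mts: float, bus_bytes: int = 8,
                       latency_ns: float = 50.0,
                       request_bytes: int = 64) -> float:
    """Requests that must be outstanding to keep the bus fully busy."""
    bandwidth_bytes_per_ns = data_rate_mts * 1e6 * bus_bytes / 1e9
    return bandwidth_bytes_per_ns * latency_ns / request_bytes


if __name__ == "__main__":
    # Same assumed latency, doubled data rate: twice as many requests needed.
    for rate in (1600, 3200):
        print(rate, "MT/s ->", round(requests_in_flight(rate), 1), "requests")
```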
Bank Group RRD_L, CCD_L, WTR_L Violations

Bank Groups require higher latency between ACTIVATE commands to the same BG

<table>
<thead>
<tr>
<th>Parameter (nCK) / Speed bin (MT/s)</th>
<th>1600</th>
<th>1866</th>
<th>2133</th>
<th>2400</th>
</tr>
</thead>
<tbody>
<tr>
<td>nRRDS</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>x4</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
</tr>
<tr>
<td>x8</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
</tr>
<tr>
<td>x16</td>
<td>5</td>
<td>5</td>
<td>6</td>
<td>7</td>
</tr>
<tr>
<td>nRRDL</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>x4</td>
<td>5</td>
<td>5</td>
<td>6</td>
<td>6</td>
</tr>
<tr>
<td>x8</td>
<td>5</td>
<td>5</td>
<td>6</td>
<td>6</td>
</tr>
<tr>
<td>x16</td>
<td>6</td>
<td>6</td>
<td>7</td>
<td>8</td>
</tr>
<tr>
<td>tCCD_S</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>4</td>
</tr>
<tr>
<td>tCCD_L</td>
<td>5</td>
<td>5</td>
<td>6</td>
<td>6</td>
</tr>
<tr>
<td>tWTR_S</td>
<td>2</td>
<td>3</td>
<td>3</td>
<td>3</td>
</tr>
<tr>
<td>tWTR_L</td>
<td>6</td>
<td>7</td>
<td>8</td>
<td>9</td>
</tr>
</tbody>
</table>
tRRD_L Violation Check
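
A check of this kind can be sketched as a scan over ACTIVATE commands. This is a simplified illustration, not the analyzer's method: it assumes a single rank, a trace already reduced to (clock cycle, bank group) pairs, and the x4/x8 values at 2133 MT/s from the table above (4 and 6 clocks). Names are illustrative.

```python
def find_trrd_violations(activates, trrd_s: int = 4, trrd_l: int = 6):
    """Return (previous_cycle, cycle, rule) for each ACTIVATE spacing violation.

    activates: time-ordered list of (cycle, bank_group) tuples.
    """
    violations = []
    last_any = None       # most recent ACTIVATE, any bank group
    last_by_bg = {}       # most recent ACTIVATE cycle per bank group
    for cycle, bg in activates:
        if bg in last_by_bg and cycle - last_by_bg[bg] < trrd_l:
            # Same bank group: the longer tRRD_L spacing applies.
            violations.append((last_by_bg[bg], cycle, "tRRD_L"))
        elif (last_any is not None and last_any[1] != bg
              and cycle - last_any[0] < trrd_s):
            # Different bank group: the shorter tRRD_S spacing applies.
            violations.append((last_any[0], cycle, "tRRD_S"))
        last_any = (cycle, bg)
        last_by_bg[bg] = cycle
    return violations


if __name__ == "__main__":
    trace = [(0, 0), (4, 1), (9, 1), (20, 2), (22, 3)]
    print(find_trrd_violations(trace))
```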
Row Hammer

Aggressive row activations can corrupt adjacent rows

- A bank of memory is loaded with valid data (green bits)
- If one row is repeatedly activated without a regular refresh, the crosstalk with the rows directly above and below deteriorates the data in the neighboring rows. These rows are called “victim rows”.
- Once the rows are sufficiently deteriorated, errors appear. Additional activations of neighboring rows increase the number of errors.
- When data has been compromised, even a refresh cannot recover the data. The information is lost permanently.
### Row Usage Report

![Cycle Report](image)

<table>
<thead>
<tr>
<th>Row Address</th>
<th>Min Per Cycle</th>
<th>Max Per Cycle</th>
<th>Avg Per Cycle</th>
<th>Exceed Count</th>
</tr>
</thead>
<tbody>
<tr>
<td>R0 BG2 BA2 0x6B13</td>
<td>241</td>
<td>433</td>
<td>337</td>
<td>0</td>
</tr>
<tr>
<td>R1 BG2 BA0 0x6B13</td>
<td>254</td>
<td>312</td>
<td>283</td>
<td>0</td>
</tr>
<tr>
<td>R1 BG1 BA1 0x6B3A</td>
<td>654</td>
<td>736</td>
<td>695</td>
<td>0</td>
</tr>
<tr>
<td>R1 BG0 BA3 0x6B13</td>
<td>44</td>
<td>88</td>
<td>66</td>
<td>0</td>
</tr>
<tr>
<td>R1 BG0 BA2 0x6B13</td>
<td>45</td>
<td>95</td>
<td>70</td>
<td>0</td>
</tr>
<tr>
<td>R0 BG3 BA3 0x6B3A</td>
<td>84</td>
<td>88</td>
<td>86</td>
<td>0</td>
</tr>
<tr>
<td>R0 BG3 BA2 0x6B...</td>
<td>122</td>
<td>448</td>
<td>285</td>
<td>0</td>
</tr>
<tr>
<td>R0 BG3 BA1 0x6B...</td>
<td>696</td>
<td>744</td>
<td>720</td>
<td>0</td>
</tr>
<tr>
<td>R0 BG3 BA0 0x6B13</td>
<td>1238</td>
<td>4818</td>
<td>3028</td>
<td>2</td>
</tr>
<tr>
<td>R0 BG2 BA1 0x6B...</td>
<td>86</td>
<td>128</td>
<td>107</td>
<td>0</td>
</tr>
<tr>
<td>R0 BG2 BA0 0x6B...</td>
<td>45</td>
<td>128</td>
<td>86.5</td>
<td>0</td>
</tr>
<tr>
<td>R0 BG1 BA3 0x6B18</td>
<td>86</td>
<td>128</td>
<td>107</td>
<td>0</td>
</tr>
<tr>
<td>R1 BG3 BA3 0x6B13</td>
<td>24</td>
<td>64</td>
<td>44</td>
<td>0</td>
</tr>
</tbody>
</table>
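
A report with these columns can be approximated by counting ACTIVATE commands per row within each refresh cycle. The following is a minimal sketch under stated assumptions: the input is a list of (refresh cycle index, row identifier) pairs, the threshold is a hypothetical user-chosen limit, and cycles in which a row was never activated are not counted. It is not the analyzer's actual algorithm.

```python
from collections import Counter, defaultdict


def row_usage_report(activations, threshold: int) -> dict:
    """Map row -> (min, max, avg, exceed_count) of activations per cycle."""
    per_row = defaultdict(Counter)
    for cycle, row in activations:
        per_row[row][cycle] += 1      # one more ACTIVATE for this row/cycle

    report = {}
    for row, counts in per_row.items():
        values = list(counts.values())
        exceed = sum(1 for v in values if v > threshold)
        report[row] = (min(values), max(values),
                       sum(values) / len(values), exceed)
    return report


if __name__ == "__main__":
    # Hypothetical trace: row "A" is activated 3 times, then 5 times.
    trace = [(0, "A")] * 3 + [(1, "A")] * 5 + [(0, "B")]
    for row, stats in row_usage_report(trace, threshold=4).items():
        print(row, stats)
```

Rows with a non-zero exceed count are the candidates to examine for row hammer exposure.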
## DDR4 Features: Payback & Pitfalls

<table>
<thead>
<tr>
<th>Feature</th>
<th>Server</th>
<th>Workstation</th>
<th>Mobile</th>
</tr>
</thead>
<tbody>
<tr>
<td>0.5KB Page size</td>
<td>✔️</td>
<td>☐</td>
<td>☐</td>
</tr>
<tr>
<td>Temperature controlled Refresh (SR)</td>
<td>✔️</td>
<td>✔️</td>
<td>✔️</td>
</tr>
<tr>
<td>Low Power Auto-self Refresh (LP ASR)</td>
<td>✔️</td>
<td>✔️</td>
<td>✔️</td>
</tr>
<tr>
<td>CS to CMD Latency (tCAL)</td>
<td>☐</td>
<td>☐</td>
<td>✔️</td>
</tr>
<tr>
<td>Data Bus Inversion (DBI)</td>
<td>✔️</td>
<td>✔️</td>
<td>✔️</td>
</tr>
<tr>
<td>Training</td>
<td>✔️</td>
<td>✔️</td>
<td>✔️</td>
</tr>
<tr>
<td>Bank Groups</td>
<td>✔️</td>
<td>✔️</td>
<td>✔️</td>
</tr>
<tr>
<td>Gear down mode</td>
<td>✔️</td>
<td>✔️</td>
<td>✔️</td>
</tr>
<tr>
<td>CRC on Writes</td>
<td>✔️</td>
<td>☐</td>
<td>☐</td>
</tr>
<tr>
<td>Command / Address Parity check</td>
<td>☐</td>
<td>☐</td>
<td>☐</td>
</tr>
</tbody>
</table>
Questions?
Thank You!

Email: Mike.Micheletti@teledynelecroy.com
Web Site: http://www.TeledyneLecroy.com/
Bank Group Analysis

Bank State View extrapolates READ / WRITE density by Bank Group