Netronome Memory Ecosystem
Network Traffic Is Growing Exponentially
and Is Composed of Rich Multi-Media Traffic

<table>
<thead>
<tr>
<th>Traffic Category</th>
<th>Share</th>
<th>Description</th>
<th>Examples</th>
</tr>
</thead>
<tbody>
<tr>
<td>Real-Time Entertainment</td>
<td>58.6%</td>
<td>Live streaming or buffered audio and video distribution</td>
<td>Netflix, Hulu, YouTube, Pandora, Grooveshark, Last.fm</td>
</tr>
<tr>
<td>Web Browsing</td>
<td>12.7%</td>
<td>Web browsing protocols, specific websites</td>
<td>HTTP, WAP browsing, SaaS Business Apps</td>
</tr>
<tr>
<td>File Sharing</td>
<td>12.0%</td>
<td>File distribution by peer-to-peer or newsgroup methods</td>
<td>BitTorrent, eDonkey, Gnutella, Newsgroups</td>
</tr>
<tr>
<td>Marketplaces</td>
<td>4.0%</td>
<td>Marketplaces where customers make purchases and downloads</td>
<td>Android Marketplace, iTunes, Windows Update</td>
</tr>
<tr>
<td>Communications</td>
<td>3.0%</td>
<td>Email, chat, voice and video communication services</td>
<td>Skype, WhatsApp, iMessage, FaceTime, MSN Messenger</td>
</tr>
<tr>
<td>Outside of Top Five</td>
<td>9.7%</td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Source: Sandvine Global Internet Phenomena Report - 2H 2012
Network Flow Processing
A Memory Intensive and Latency Sensitive Endeavor

• A “flow” is a unidirectional sequence of packets that all share a common set of packet header fields.

• Stateful flow processing requires significant memory capacity and bandwidth, for example to maintain state information across all packets in a flow. This is in addition to the memory needs of conventional stateless processing.

• At 400 Gbps, minimum-size traffic arrives at roughly 600 million packets per second (64-byte frames plus 20 bytes of preamble and inter-frame gap, i.e., 672 bits per packet). This leaves a processing budget of only 1.68 nanoseconds per packet, which is why network operations, and NFP processing in particular, are latency sensitive.

• See, for example:  
  • “The Strange World of Networking Memory,” David Chapman, GSI Technology, MEPTEC Roadmaps for Multi Die Integration, November 14, 2012;  
  • “Bandwidth Engine® 2 – Macro MSR820 Supports up to 400 GE,” Michael J. Miller, MoSys, Linley Tech Data Center Conference, February 6, 2013; and  
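The flow definition and the 400 Gbps per-packet budget above can be made concrete with a minimal Python sketch. It assumes a flow is keyed by the IPv4 5-tuple and that per-flow state is just packet and byte counters; `FlowKey`, `FlowState` and `FlowTable` are hypothetical names for illustration, not Netronome APIs. The rate calculation assumes 64-byte minimum frames plus 20 bytes of preamble/SFD and inter-frame gap.

```python
from dataclasses import dataclass

# Ethernet wire overhead per frame: 8-byte preamble/SFD + 12-byte inter-frame gap.
MIN_FRAME_BYTES = 64
WIRE_OVERHEAD_BYTES = 20


def packet_rate_pps(line_rate_bps: float, frame_bytes: int = MIN_FRAME_BYTES) -> float:
    """Packets per second at full line rate for a given frame size."""
    bits_per_packet = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return line_rate_bps / bits_per_packet


def budget_ns(line_rate_bps: float, frame_bytes: int = MIN_FRAME_BYTES) -> float:
    """Time available to process one packet, in nanoseconds."""
    return 1e9 / packet_rate_pps(line_rate_bps, frame_bytes)


@dataclass(frozen=True)
class FlowKey:
    """Hypothetical flow key: the IPv4 5-tuple. Direction matters, so
    A->B and B->A are two different (uni-directional) flows."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int


@dataclass
class FlowState:
    """Hypothetical per-flow state carried across all packets of a flow."""
    packets: int = 0
    octets: int = 0


class FlowTable:
    """Minimal stateful flow table: one read-modify-write per packet."""

    def __init__(self) -> None:
        self._flows: dict[FlowKey, FlowState] = {}

    def update(self, key: FlowKey, length: int) -> FlowState:
        # Look up (or create) the flow's state and update it; in hardware this
        # per-packet access is what drives memory capacity and bandwidth.
        state = self._flows.setdefault(key, FlowState())
        state.packets += 1
        state.octets += length
        return state

    def __len__(self) -> int:
        return len(self._flows)


if __name__ == "__main__":
    print(f"{packet_rate_pps(400e9) / 1e6:.1f} Mpps, {budget_ns(400e9):.2f} ns per packet")
    table = FlowTable()
    fwd = FlowKey("10.0.0.1", "10.0.0.2", 40000, 443, 6)
    rev = FlowKey("10.0.0.2", "10.0.0.1", 443, 40000, 6)
    table.update(fwd, 1500)
    table.update(fwd, 64)
    table.update(rev, 64)
    print(f"{len(table)} flows")
```

Running it prints `595.2 Mpps, 1.68 ns per packet` and `2 flows`: the reverse direction of the same TCP connection is counted as a separate flow because flows are uni-directional.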
NFP-62XX Architecture & A Use Case

- **ARM11** with 256K L2 cache
- **Network I/O:** 72x10GE, 18x40GE, 7x100GE; 7x100G Interlaken (ILKN) and Interlaken Look-Aside (ILKN-LA)
- **Host I/O:** 4x8 PCIe Gen3
- **Accelerators:** bulk crypto; CAM, atomic, queue and hash; traffic manager
- **Adaptive memory controller** (DDR3-2133)
- **Proximity memory:** 24MB and 8MB
- **96 NPP** (packet processing cores)
- **120 FPC** (flow processing cores)
- **>10 Tbps internal bandwidth**
- **CPU virtualization**
- **PCIe I/O virtualization**
- **Multicore x86 flow processor**
## External Memory-Based SKUs in the NFP-32XX- and IXP-Based Acceleration Card Product Line

<table>
<thead>
<tr>
<th>SKU</th>
<th>NFE-3240-20F-CB-10</th>
<th>NFE-3240-20F-CA-10</th>
<th>NFE-3240-6C-CA-10</th>
<th>NFE-3240-1N-CA-10</th>
<th>NFE-3240-1N-CC-10</th>
<th>NFE-3240-20F-DC-00</th>
<th>NFE-3240-6C-DC-00</th>
<th>NFE-i8000 (EOL)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Clock</td>
<td>1.2 GHz</td>
<td>1.1 GHz</td>
<td>1.1 GHz</td>
<td>1.2 GHz</td>
<td>1.2 GHz</td>
<td>1.0 GHz</td>
<td>1.0 GHz</td>
<td>1.4 GHz</td>
</tr>
<tr>
<td>TCAM</td>
<td>36 Mb</td>
<td>36 Mb</td>
<td>36 Mb</td>
<td>36 Mb</td>
<td>Algorithmic (SW)</td>
<td>Algorithmic (SW)</td>
<td>Algorithmic (SW)</td>
<td>9 Mb</td>
</tr>
<tr>
<td>QDR2 SRAM</td>
<td>32 MB</td>
<td>32 MB</td>
<td>32 MB</td>
<td>32 MB</td>
<td>--</td>
<td>--</td>
<td>--</td>
<td>40 MB</td>
</tr>
<tr>
<td>DDR3 DRAM</td>
<td>8 GB</td>
<td>4 GB</td>
<td>4 GB</td>
<td>4 GB</td>
<td>4 GB</td>
<td>4 GB</td>
<td>4 GB</td>
<td>768 MB (RDRAM)</td>
</tr>
<tr>
<td>Crypto and PKI Acceleration</td>
<td>Yes</td>
<td>Yes</td>
<td>Yes</td>
<td>Yes</td>
<td>Yes</td>
<td>No</td>
<td>No</td>
<td>Yes</td>
</tr>
<tr>
<td>PCIe Form Factor</td>
<td>Full length, full profile</td>
<td>Full length, full profile</td>
<td>Full length, full profile</td>
<td>Full length, full profile</td>
<td>Full length, full profile</td>
<td>Half length, full profile</td>
<td>Half length, full profile</td>
<td>Full length, full profile</td>
</tr>
</tbody>
</table>
NFP-Based PCIe Acceleration Cards
## NFP Memory Allocation Trend and 2.5D Opportunity

<table>
<thead>
<tr>
<th></th>
<th>NFP-3200</th>
<th>NFP-6200</th>
<th>Hypothetical NexGen NFP</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Year of Introduction</strong></td>
<td>2010</td>
<td>2014</td>
<td>2016-2017</td>
</tr>
<tr>
<td><strong>Ethernet Standards</strong></td>
<td>1 and 10 Gbps</td>
<td>1, 10, 40 and 100 Gbps</td>
<td>400 Gbps</td>
</tr>
<tr>
<td><strong>Flow Processing Line Rate</strong></td>
<td>20 Gbps</td>
<td>200 Gbps</td>
<td>800 Gbps – 1Tbps</td>
</tr>
<tr>
<td><strong>Process Node</strong></td>
<td>65nm</td>
<td>22nm</td>
<td>14nm</td>
</tr>
<tr>
<td><strong>Frequency</strong></td>
<td>1.2 GHz – 1.4 GHz</td>
<td>1.2 GHz</td>
<td>1.2 GHz – 2.4 GHz</td>
</tr>
<tr>
<td><strong>Package</strong></td>
<td>40mm by 40mm BGA</td>
<td>45mm by 45mm BGA</td>
<td>2.5D BGA</td>
</tr>
<tr>
<td><strong>On-Die Memory</strong></td>
<td>2 MByte</td>
<td>30 MByte</td>
<td>&gt; 60 MByte Hierarchical Memory</td>
</tr>
<tr>
<td><strong>On 2.5D Interposer Memory</strong></td>
<td>N/A</td>
<td>N/A</td>
<td>800+ Gbps 2-4 GByte Low Latency</td>
</tr>
<tr>
<td><strong>External Memory</strong></td>
<td>&lt;= 8 GByte – DDR3</td>
<td>&lt;= 24 GByte – DDR3</td>
<td>&lt;= 48 GByte – DDR4</td>
</tr>
</tbody>
</table>
NFP-6200 Hierarchical Processing Memories

Proximity memories
- Process data locally
- Deliver data to Cores without blocking

Memory hierarchy
- Core Register Files
- Core Local Memory
- Cluster Local Scratch
- Cluster Target Memory
- Internal MU
- DDR3-backed External MU

Per-level details
- **Local Memory** – 4kB; visible to the FPC
- **Cluster Local Scratch** – 50 cycles; 64kB; tables, locks; visible to the cluster's FPCs
- **Cluster Target Memory** – 110 cycles; 256kB; tables, locks, rings, 256B packet, packet metadata; visible to the system
- **Internal Memory Unit** – 150-250 cycles; 4MB SRAM; HLR tables, locks, statistics, packet bodies; visible to the system
- **External Memory Unit** – 150-750 cycles; 2MB cache, 1MB SRAM, 4GB DDR3; LLR tables, packet bodies, queues, rings; visible to the system

Diagram labels: Flow Processing Core, Push Transfer RF

High-Performance, Latency-Tolerant Architecture with Large On-Die Memory
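Why a deep hierarchy calls for a latency-tolerant core can be sketched with a simple round-robin multithreading model: each thread computes for C cycles, then waits L cycles on memory while other threads run. The latencies below are the upper ends of the ranges on this slide; the 50-cycle compute burst and the thread counts are hypothetical parameters, not NFP specifications.

```python
import math

# Upper-end access latencies (cycles) from the NFP-6200 memory slide.
LATENCY_CYCLES = {
    "Local Memory": 0,
    "Cluster Local Scratch": 50,
    "Cluster Target Memory": 110,
    "Internal Memory Unit": 250,
    "External Memory Unit": 750,
}


def threads_to_hide(latency: int, compute_cycles: int) -> int:
    """Threads needed so the core never idles: while one thread waits `latency`
    cycles, the others supply compute. Round-robin model: ceil((C + L) / C)."""
    if compute_cycles <= 0:
        raise ValueError("compute_cycles must be positive")
    return math.ceil((compute_cycles + latency) / compute_cycles)


def utilization(threads: int, latency: int, compute_cycles: int) -> float:
    """Fraction of cycles spent computing with `threads` threads in the same model."""
    return min(1.0, threads * compute_cycles / (compute_cycles + latency))


if __name__ == "__main__":
    COMPUTE = 50  # hypothetical compute cycles between memory accesses
    for level, lat in LATENCY_CYCLES.items():
        print(f"{level}: {threads_to_hide(lat, COMPUTE)} threads")
    print(f"8 threads, external only: {utilization(8, 750, COMPUTE):.0%}")
```

With the hypothetical 50-cycle compute burst, hiding a 110-cycle Cluster Target Memory access needs 4 threads, while a 750-cycle External Memory Unit access needs 16. With 8 threads, a core whose accesses all went to external memory would sit idle half the time in this model, which is why keeping hot data in the closer proximity memories matters.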
Current Generation NFP Memory Hierarchy

- **Local Memory** – 0 cycle latency
- **Cluster Local Scratch** – 20-50 cycle latency; 2048 Gbps
- **Cluster Target Memory** – 50-100 cycle latency; 4096 Gbps
- **Internal Memory Unit** – 150-250 cycle latency; 1280 Gbps
- **External Memory Unit** – 150-500 cycle latency; 400 Gbps

**On-Die SRAM**
- 30 MByte

**External DDR3-Based Memory**; 400 Gbps

**On-PCB DRAM**
- 24 GByte
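One back-of-envelope way to read these bandwidth figures is to convert them into bytes of memory traffic available per minimum-size packet at the NFP-6200's 200 Gbps flow processing line rate. The sketch assumes the slide's numbers are aggregate peak bandwidths shared by all packets, and it ignores contention and efficiency losses.

```python
# Aggregate peak bandwidth per level in Gbps, from the memory hierarchy slide.
BANDWIDTH_GBPS = {
    "Cluster Local Scratch": 2048,
    "Cluster Target Memory": 4096,
    "Internal Memory Unit": 1280,
    "External Memory Unit": 400,
}

# A minimum-size packet occupies 64 + 20 = 84 bytes of wire time
# (frame plus preamble/SFD and inter-frame gap).
WIRE_BYTES_PER_MIN_PACKET = 84


def bytes_per_packet(memory_gbps: float, line_rate_gbps: float) -> float:
    """Memory bytes available per minimum-size packet at full line rate.

    pps = line_rate / (84 * 8), so bytes/packet = (memory_bw / 8) / pps
        = (memory_bw / line_rate) * 84.
    """
    return memory_gbps / line_rate_gbps * WIRE_BYTES_PER_MIN_PACKET


if __name__ == "__main__":
    for level, bw in BANDWIDTH_GBPS.items():
        print(f"{level}: {bytes_per_packet(bw, 200):.0f} B/packet at 200 Gbps")
```

At 200 Gbps this gives about 168 bytes per packet for the External Memory Unit, 538 for the Internal Memory Unit, 860 for Cluster Local Scratch and 1,720 for Cluster Target Memory. At a hypothetical 800 Gbps line rate each figure shrinks by a factor of four (for example, 42 bytes per packet over a 400 Gbps external interface).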
"A 22nm High Performance Flow Processor for 200Gb/s Software-Defined Networking."

Gavin Stark, Netronome CTO  
Sakir Sezer, Netronome Fellow  

Hot Chips  
August 25-27, 2013  
Stanford Memorial Auditorium  
Palo Alto, CA
2.5D Integration

- 1 CMOS Buffer Die
- RAM Die or Die Stack
- Processor Die
- Si Substrate
- Package Substrate
- PC Board
NexGen NFP Memory Hierarchy and Allocation

- **Next Generation Hierarchical On-Die Memory**
  - SRAM (and/or embedded DRAM)

- **2.5D On-Interposer Memory**
  - High Bandwidth (wide I/O or high speed serial)
  - Relatively Low Latency

- **External DDR4**
  - On-PCB DRAM
  - High Capacity (e.g., HMC)
  - Standard DRAM latency
NFP and 2.5D Package Technology

• 2.5D NFP Objectives

  • Reduce on-die memory area by off-loading memory into a 2.5D package;
  • Optimize package size by minimizing memory related ball count;
  • Provide bandwidth and latency that complements on-die and on-PCB memory;
  • Minimize power by employing low power direct (bump-to-bump) memory interfaces;
  • Leverage industry standard technologies and components;
  • Leverage ability to mix process node die (e.g., 14nm NFP and higher node memory die);
  • Meet networking market reliability and cost requirements; and
  • Enable a range of NFP product SKUs of varying memory capacity.
Conservative 2.5D Interconnect Estimates

• Constraints:
  • Maximum package size of 50mm by 50mm
  • Silicon interposer area limited by reticle area (approximately 26mm by 32mm)

• Laminate interposer supports 5,000+ interconnects

• Silicon interposer supports 90,000+ interconnects
NexGen NFP Reliability Issues

• Thermal Coupling
  • NFP – Tjmax of 125°C; Tj continuous of 110°C
  • DRAM – Tjmax of 110°C; Tj continuous of 100°C
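  • NFP power of 50W+ can create a thermal coupling issue, since a DRAM die sharing the package with an NFP running at up to 110°C continuous must itself stay at or below 100°C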

• Soft Error Rate (SER)
  • Typical networking SER target of ≤1000 FITs
  • All on-die NFP memories employ ECC
  • Therefore, 2.5D on-interposer memory must also support ECC.
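A FIT (failure in time) is one failure per 10^9 device-hours, so a 1000-FIT target can be translated into a mean time between failures. The fleet size below is a hypothetical parameter chosen only to show how FIT rates add across devices.

```python
# 1 FIT = 1 failure per 10^9 device-hours.
HOURS_PER_FIT_UNIT = 1e9
HOURS_PER_YEAR = 24 * 365.25


def mtbf_hours(fit: float) -> float:
    """Mean time between failures, in hours, for a given FIT rate."""
    if fit <= 0:
        raise ValueError("fit must be positive")
    return HOURS_PER_FIT_UNIT / fit


def fleet_mtbf_hours(fit_per_device: float, devices: int) -> float:
    """FIT rates of independent devices add, so fleet MTBF shrinks linearly."""
    return mtbf_hours(fit_per_device * devices)


if __name__ == "__main__":
    one = mtbf_hours(1000)
    print(f"1 device: {one:.0f} h (~{one / HOURS_PER_YEAR:.0f} years)")
    print(f"10,000 devices: {fleet_mtbf_hours(1000, 10_000):.0f} h")  # hypothetical fleet
```

A single device at 1000 FITs averages one soft-error failure every 10^6 hours (about 114 years), but a hypothetical fleet of 10,000 such devices sees one roughly every 100 hours. Any unprotected on-interposer memory would add its own FITs to this budget.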
Case Study Impressions

• Hybrid Memory Cube
  • Serdes interface favorable for reducing package ball count;
  • Density favorable for reducing board area, although package size is 31mm by 31mm;
  • Max. aggregate bandwidth of 320 GB/s;
  • Latency an issue;
  • Is there an opportunity for a low latency controller and/or architecture?
MCM of HMC and NFP

- Laminate interposer
- Size of approximately 70mm by 90mm
- Potentially reduces board complexity
Case Study Impressions (continued)

- High Bandwidth Memory
  - JEDEC standard not yet released
  - Interception point and industry support?
  - DRAM-like latency?
Strawman 2.5D NFP Memory Opportunity

• High bandwidth I/O supporting direct (bump-to-bump) interconnect;
• High capacity and density: 2-4 GByte plus the NFP in a package no larger than 50mm by 50mm;
• Low latency, multi-bank configuration;
• Bumped known-good die (KGD) or bumped KGD stack;
• Zero SER Impact; and
• Fabless supply chain compatible
Oceans of Packet Data

- 5B New Devices Per Year
- $32B Cloud Services
- $10B Private Internets
- $8.2B Security Equipment
- $14.8B Network Infrastructure

TURNED INTO

Inspect, Classify, Structure, Secure

- 50Gbps Security Processing
- 30MB Optimized On-Chip Memory
- 100 Dedicated Accelerators
- 96 Packet Processing Cores
- 120 Flow Processing Cores
- 720Gbps Processor I/O

Billions of Manageable Flows

- Cyber Security: NGFW, SSL Inspection, IPS
- SW-Defined Networking: OPENFLOW X/R, LOAD BALANCING, SERVICE INSERTION
- Server Virtualization

“Network trends favor Netronome.”
Bob Wheeler, The Linley Group

“The NFP-6xxx transcends network processing categories.”
Microprocessor Report

$2B Silicon TAM
Sources: Infonetics, The Linley Group, Netronome 2012
Conclusion and Call to Action

• The high bandwidth and low latency requirements of next generation 800+ Gbps network flow processing have created a new opportunity for high performance memories.

• Such high performance memories should be suitable for incorporation into a 2.5D package, thereby enabling a new layer of memory hierarchy.

• Netronome welcomes a renewed focus, input and collaboration from the memory community to fulfill these requirements and enable the next generation of networking hardware.