

# QorIQ T4240 DPAA Deep Dive

AMF-NET-T1282

**Charlie Li** 



Freescale, the Freescale logo, AltiVec, C-5, CodeTEST, CodeWarrior, ColdFire, ColdFire+, C-Ware, the Energy Efficient Solutions logo, Kinetis, mobileGT, PEG, PowerQUICC, Processor Expert, QorIQ, Qorivva, SafeAssure, the SafeAssure logo, StarCore, Symphony and VortiQa are trademarks of Freescale Semiconductor, Inc., Reg. U.S. Pat. & Tm. Off. Airfast, BeeKit, BeeStack, CoreNet, Flexis, Layerscape, MagniV, MXC, Platform in a Package, QorIQ Qonverge, QUICC Engine, Ready Play, SMARTMOS, Tower, TurboLink, Vybrid and Xtrinsic are trademarks of Freescale Semiconductor, Inc. All other product or service names are the property of their respective owners. © 2013 Freescale Semiconductor, Inc.





# **Agenda**

- DPAA Overview
- BMan Enablement
- FMan Enablement
- QMan Enablement
  - QMan Building Blocks
  - QMan Functions
  - QMan Scheduling
  - Order Restoration, Order Preservation and Atomicity
  - Congestion Management and Avoidance
- Accelerator Overview
- Conclusion





### Three Generations of Acceleration

- Communication Processor Module (CPM)
  - Started with M683xx
  - PQI, PQII, and some PQIII
  - Ethernet, ATM and TDM
- Quicc Engine (QE)
  - Started with MPC8360
  - PQII Pro, some PQIII and some QorIQ
  - Ethernet, ATM and TDM
- Data Path Acceleration Architecture (DPAA)
  - Started with P4080 (DPAA 1.0); T4240 (DPAA 1.x)
  - Ethernet



# Enhancing Core Performance with Data Path Acceleration Architecture



| Hardware Accelerator   | Capability                                        |
|------------------------|---------------------------------------------------|
| FMan (Frame Manager)   | 50 Gbps aggregate; Parse, Classify, Distribute    |
| BMan (Buffer Manager)  | 64 buffer pools                                   |
| QMan (Queue Manager)   | Up to 2<sup>24</sup> queues                       |
| RMan (RapidIO Manager) | Seamless mapping of sRIO to DPAA                  |
| SEC (Security)         | 40 Gbps: IPSec, SSL; Public Key 25K/s 1024b RSA   |
| PME (Pattern Matching) | 10 Gbps aggregate                                 |
| DCE (Data Compression) | 20 Gbps aggregate                                 |


Saving CPU Cycles for higher value work





# Data Path Acceleration Architecture Philosophy

- DPAA is designed to balance the performance of multiple CPUs and accelerators with seamless integration
  - ANY packet to ANY core to ANY accelerator or network interface **efficiently** WITHOUT locks or semaphores.
- "Infrastructure" components
  - Queue Manager (QMan)
  - Buffer Manager (BMan)
- "Accelerator" Components
  - Cores
  - Frame Manager (FMan)
  - RapidIO Message Manager (RMan)
  - Cryptographic accelerator (SEC)
  - Pattern matching engine (PME)
  - Decompression/Compression Engine (DCE)
  - DCB (Data Center Bridging)
- CoreNet
  - Provides the interconnect between the cores and the DPAA infrastructure as well as access to memory.







# **DPAA Terminology**

- Buffer: Unit of contiguous memory, allocated by software
- Frame: Buffer(s) that hold a data element (generally a packet)
  - Frames can be single buffers or multiple buffers (scatter/gather lists)
    - A "simple frame" has one delimited data element
    - A "multi buffer frame" has two or more data elements
- Frame Descriptor (FD): Proxy structure used to represent frames
- Frame Queue
  - FIFO of related Frame Descriptors (e.g. a TCP session)
  - The basic queuing structure supported by QMan
- Frame Queue Descriptor (FQD): Structure used to manage Frame Queues







# **DPAA** Building Block: Frame Descriptor (FD)







# **DPAA Interaction: Compound Frame**

Compound frames allow related data to be passed in a single unit to the DPAA accelerators.







# **DPAA** Building Block: Frame Queue Descriptor

### **FQD Selected Field Description:**

- FQD\_LINK: Link to the next FQD in a queue of FQDs, used for Work Queues
- ORPRWS: ORP Restoration Window Size
- OA: ORP Auto Advance NESN Window Size
- ODP\_SEQ: ODP Sequence Number
- ORP\_NESN: ORP Next Expected Sequence Number.
- ORP\_EA\_HPTR, ORP\_EA\_TPTR: ORP Early Arrival Head and Tail Pointer
- PFDR\_HPTR, PFDR\_TPTR: PFDR Head and Tail Pointer
- CONTEXT\_A, CONTEXT\_B: Frame Queue Context A and B
- STATE: FQ State
- DEST\_WQ: Destination Work Queue
- ICS\_SURP: Intra-Class Scheduling Surplus or Deficit.
- IS: Intra-Class Scheduling Surplus or Deficit identifier
- ICS\_CRED: Intra-Class Scheduling Credit
- CONG\_ID: Congestion Group ID

- RA[1-2]\_SFDR\_PTR: SFDR Pointer for Recently Arrived frame # 1 and 2
- TD\_MANT, TD\_EXP: Tail Drop threshold Mantissa and Exponent
- C: FQD in external memory or in cache (QMan 1.1)
- X: XON or XOFF for flow control command (QMan 1.1)
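The TD\_MANT and TD\_EXP fields together encode a tail drop threshold. A minimal sketch of one plausible encoding follows, assuming the threshold equals `mant * 2**exp` with an 8-bit mantissa and a 5-bit exponent. The formula, the field widths and the helper names are assumptions for illustration and are not stated in the slides.

```python
# Sketch only: the mant * 2**exp formula and both field widths are assumptions.
TD_MANT_BITS = 8   # assumed mantissa width
TD_EXP_BITS = 5    # assumed exponent width


def encode_taildrop(threshold):
    """Return (mant, exp) for the largest encodable value <= threshold."""
    if threshold < 0:
        raise ValueError("threshold must be non-negative")
    mant, exp = threshold, 0
    # Shift right until the mantissa fits; low-order bits are truncated.
    while mant >= (1 << TD_MANT_BITS):
        mant >>= 1
        exp += 1
    if exp >= (1 << TD_EXP_BITS):
        raise ValueError("threshold too large to encode")
    return mant, exp


def decode_taildrop(mant, exp):
    """Recover the threshold represented by a (mant, exp) pair."""
    return mant << exp


print(encode_taildrop(1000), decode_taildrop(250, 2))
```

Because the mantissa is truncated, a requested threshold is rounded down to the nearest representable value, so large thresholds have coarser granularity than small ones.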







# Software Portal FQD Context\_A Usage

- AE: Frame Annotation Stash Exclusive.
- DE: Frame Data Stash Exclusive
- CE: FQ Context Stash Exclusive
  - 0: Stash transaction issued as DIRECT0. PAMU translates this to LDEC.
  - 1: Stash transaction issued as DIRECT1. PAMU translates this to LDECPE/LDECFE.
- AS: Frame Annotation Stashing Size
- DS: Frame Data Stashing Size
- CS: FQ Context Stashing Size
  - Each size field gives the number of 64-byte coherency granules (0, 1, 2, or 3) to be stashed.
- ADDR: FQ Context Address
  - Address of the first 64-byte coherency granule containing the FQ context information to be stashed.







# Life of an Ingress Packet

- FMan receives packets
  - allocates internal buffers
  - retrieves data from MAC
- BMI
  - acquires a buffer from BMan
  - uses DMA to store data in it
- Parse+classify+keygen select a queue and policer profile
- Policer "colors" and optionally discards frame
- QMan applies active queue management and enqueues frame
- Frame is enqueued to one of a pool of cores
- An available core dequeues the FD for processing












# **Buffer Manager Functional Blocks**

- Standardized command interface to SW and HW
  - Up to 66 software portals: resolves any multicore race scenario
  - Up to 6 HW portals per HW block: simplified commands for HW accelerators
  - Up to 64 separate pools of free buffers
- BMan keeps a small per-pool stockpile of buffer pointers in internal memory
  - Stockpile of 64 buffer pointers per pool; maximum of 2G buffer pointers
  - Absorbs bursts of acquire/release commands without external memory access
  - Minimizes memory accesses for buffer pool management
- Pools (buffer pointers) overflow into DRAM
- LIFO buffer allocation policy
  - A released buffer is immediately used for receiving new data, using cache lines previously allocated
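The stockpile and LIFO behaviour described above can be illustrated with a small software model. This is a sketch under stated assumptions, not BMan's actual implementation: the `BufferPool` class, the one-pointer-at-a-time spill policy and the `dram_accesses` counter are all hypothetical and exist only to show why short acquire/release bursts avoid external memory.

```python
STOCKPILE_SIZE = 64  # per-pool stockpile depth quoted in the slide


class BufferPool:
    """Toy model: on-chip LIFO stockpile that overflows into DRAM."""

    def __init__(self):
        self.stockpile = []      # models internal memory, newest pointer last
        self.dram = []           # models the overflow area in external memory
        self.dram_accesses = 0   # hypothetical counter for illustration

    def release(self, addr):
        if len(self.stockpile) == STOCKPILE_SIZE:
            # Assumed policy: spill the oldest pointer to DRAM.
            self.dram.append(self.stockpile.pop(0))
            self.dram_accesses += 1
        self.stockpile.append(addr)

    def acquire(self):
        if not self.stockpile:
            if not self.dram:
                return None      # pool depleted
            self.stockpile.append(self.dram.pop())
            self.dram_accesses += 1
        # LIFO: the most recently released buffer is handed out first.
        return self.stockpile.pop()


pool = BufferPool()
pool.release(0x1000)
pool.release(0x2000)
print(hex(pool.acquire()), pool.dram_accesses)
```

In this model the most recently released buffer is reused first, which matches the slide's point that its cache lines are likely still allocated.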







# **BMan SW Portal Components**

### Core



- Software portals have 2 components
  - Management commands:
    - Command Registers (BCSPi\_CR): acquire 1-8 buffers OR query availability
    - Response Registers (BCSPi\_RR0 / RR1): buffer address OR Buffer Pool Availability and Depletion state
  - Buffer Release:
    - Release Command Ring (RCR) Entry (BCSPi\_RCRj): Circular FIFO
- Interrupts can be used to signal availability of space (in RCR) and that pools are depleted and require replenishment (RCR Interrupt Threshold Register)





### **BMan Command and Response**

- BMan Command Type
  - BMan command registers (BCSPi\_CR, RR and RCR) are 64B long.
  - Command Verb (1B) + Buffer Pool ID (1B)
    - Bit 0: Valid bit
    - Bit 1-3: Command/response type. Valid encodings are:
      - 001 = Acquire buffers (Acquire)
      - 010 = Release buffers to the pool identified in byte field 1 (Release)
      - 011 = Release each buffer to the pool identified in the byte field immediately preceding its buffer field (Release)
      - 100 = Query buffer pool state, depletion and availability
      - 110 = Invalid command (Response)
      - 111 = Stockpile ECC error (Response)
    - Bit 4-7: Number of buffers associated with the command type, maximum 8
      - 0h = zero buffers, 1h = one buffer, ..., 8h = eight buffers
    - Returns up to eight 48-bit buffer addresses

| 0        | 1      | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
|----------|--------|---|---|---|---|---|---|---|---|----|----|----|----|----|----|
| Verb x18 | BPID 1 | - | - | - | - | - | - | - | - | -  | -  | -  | -  | -  | -  |
| -        | -      | - | - | - | - | - | - | - | - | -  | -  | -  | -  | -  | -  |
| -        | -      | - | - | - | - | - | - | - | - | -  | -  | -  | -  | -  | -  |
| -        | -      | - | - | - | - | - | - | - | - | -  | -  | -  | -  | -  | -  |
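The verb byte layout above can be checked with a short decoder. This sketch assumes Power Architecture bit numbering, where bit 0 is the most significant bit of the byte; under that assumption the example verb x18 decodes to an acquire of eight buffers. The function names are illustrative and are not part of any driver API.

```python
# Command/response type encodings listed in the slide (bits 1-3).
COMMAND_TYPES = {
    0b001: "acquire",
    0b010: "release (single pool)",
    0b011: "release (per-buffer pool)",
    0b100: "query",
    0b110: "invalid command",
    0b111: "stockpile ECC error",
}


def decode_verb(verb):
    """Split a verb byte into (valid, type name, buffer count).

    Assumes bit 0 is the MSB, as in Power Architecture numbering.
    """
    valid = (verb >> 7) & 0x1
    kind = (verb >> 4) & 0x7
    count = verb & 0xF
    return valid, COMMAND_TYPES.get(kind, "reserved"), count


def encode_verb(valid, kind, count):
    """Build a verb byte; count is limited to 8 buffers per command."""
    if not 0 <= count <= 8:
        raise ValueError("count must be between 0 and 8")
    return (valid << 7) | (kind << 4) | count


print(decode_verb(0x18))
```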










# **New Frame Manager (FMan) Features**

- FMan combines the Ethernet network interfaces with packet distribution logic to provide intelligent distribution and queuing decisions for incoming traffic at line rate.
- FMan key new features for QorIQ T4 processors:
  - Six 1G/2.5G multirate Ethernet MACs (mEMACs) per Frame Manager
  - Two 10G multirate Ethernet MACs (mEMACs) per Frame Manager
  - QMan interface: supports passing priority-based flow control messages from the Ethernet MAC to QMan
  - Complies with IEEE 802.3az (Energy Efficient Ethernet) and IEEE 802.1Qbb, in addition to IEEE Std 802.3®, IEEE 802.3u, IEEE 802.3x, IEEE 802.3z, IEEE 802.3ac, IEEE 802.3ab, and IEEE 1588 v2 (clock synchronization over Ethernet)
  - Port Virtualization: Virtual Storage profile (SPID) selection after classification or distribution function evaluation.
  - Rx port multicast support.
  - Egress Shaping.
  - Offline port: able to copy the frame into new buffers and enqueue back to the QMan.







# In Modular Architecture Processing Pipeline






# **FMAN Ports Types**

- Ethernet receive (Rx) and transmit (Tx)
  - 1 Gbps/2.5 Gbps/10 Gbps
  - In FMan\_v3, some ports can be configured as HiGig.
  - Jumbo frames of up to 9.6 KB (add "fsl\_fm\_max\_frm=9600" to the U-Boot bootargs)
- Offline (O/H)
  - FMan\_v3: 3.75 Mpps (vs 1.5 Mpps in the P series)
  - Able to dequeue and enqueue from/to a QMan queue. The FMan applies a Parse Classify Distribute (PCD) flow and (if configured to do so) enqueues the frame back to a QMan queue. In FMan\_v3 the FMan is able to copy the frame into new buffers and enqueue it back to the QMan.
  - Use case: IP fragmentation and reassembly
- Host command
  - Able to dequeue host commands from a QMan queue. The FMan executes the host command (such as a table update) and enqueues a response to the QMan. Host commands require a dedicated PortID (one of the O/H ports).
  - The registers for Offline and Host commands are named O/H port registers.





# 1) Parser

- Performs parsing of common L2/L3/L4 headers, including tunneled protocols
- Can be augmented by the user to parse other standard protocols
- Can also parse proprietary, user-defined headers at any layer:
  - Self-describing, using standard fields such as proprietary Ethertype, Protocol ID, Next Header, etc.
  - Non-Self-Describing through configuration.
- Parse results, including proprietary fields, can be used by the classifier, and/or software.
- Soft parse can modify any field in parse results








# Flexible Parsing of User Defined Fields (UDF)

The parser stores all parsing results in the Parse Array located inside the parser.

It is accessible through the Rx External Buffer Margins Register (FMBM\_REBM(BSM)).





# 2) KeyGen – Key Generator

- 256 Classification Plans
  - Indicates which fields of a parsed packet are of interest for key generation
- 32 Key Generation schemes
  - Direct Method
    - Used for port based or post coarse classification
  - Indirect Method
    - Based on the parsed protocol stack of the frame and the source port
    - The presence (or absence) of valid headers can direct the scheme used







# 3) Classifier

- Up to three level tree search at line rate
  - Up to 256 bytes per tree level
  - Up to 512 bytes of total key data for the three levels
- Each table is an ordered list of entries, returning the first match
  - Keys for the initial table are generated from parse results/KeyGen
  - Keys for subsequent tables are generated from parse and previous lookup results
  - Keys can be generated from any fields in the frame, including proprietary UDF
  - Each entry is individually maskable.
- Each table entry has an action description
  - Queue ID, next table, hash and distribute, or Drop.
- Output of classifier
  - Single queue
  - Set of queues and a KeyGen scheme if distributing
  - Policing Profile
- IPv4 Header Manipulation Use case
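The "ordered list of entries, returning the first match" with per-entry masks can be sketched in software as follows. This only illustrates the lookup semantics; the `Entry` type, the action strings and the example prefixes are hypothetical, and the hardware performs the search at line rate rather than with a loop.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    value: int    # key bits to match
    mask: int     # 1-bits are compared, 0-bits are ignored
    action: str   # e.g. a queue ID, next table, or drop (illustrative strings)


def classify(table, key, miss_action="drop"):
    """Return the action of the first entry whose masked value matches."""
    for entry in table:
        if (key & entry.mask) == (entry.value & entry.mask):
            return entry.action
    return miss_action


# Hypothetical table keyed on an IPv4 destination address.
table = [
    Entry(0x0A010000, 0xFFFF0000, "enqueue to FQ 0x100"),   # 10.1.0.0/16
    Entry(0x0A000000, 0xFF000000, "next table"),            # 10.0.0.0/8
]

print(classify(table, 0x0A010203))  # 10.1.2.3 hits the first entry
```

Because the table is ordered, the more specific entry must precede the broader one; swapping the two entries would send all 10.x.x.x traffic to the second action.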







# Case Study: Spread Control and User Plane Traffic







# Case Study: High Level Mapping to DPAA







# Case Study: Configure FMan with FMC Tool

rootfs/etc/fmc/config/8c-128fq-p.xml

```
<!-- Distribution "ipv4eth0": spread over 128 frame queues starting at 0x3800 -->
<distribution name="ipv4eth0">
    <queue count="128" base="0x3800"/>
    <key>
        <fieldref name="ipv4.src"/>
        <fieldref name="ipv4.dst"/>
        <fieldref name="ipv4.tos"/>
    </key>
</distribution>
<!-- Distribution "ipv4eth1": same key, next 128 frame queues starting at 0x3880 -->
<distribution name="ipv4eth1">
    <queue count="128" base="0x3880"/>
    <key>
        <fieldref name="ipv4.src"/>
        <fieldref name="ipv4.dst"/>
        <fieldref name="ipv4.tos"/>
    </key>
</distribution>
```

| Bits 0-7                   | Bits 8-15 | Bits 16-31                                        |
|----------------------------|-----------|---------------------------------------------------|
| Version, HdrLen            | TOS       | Length                                            |
| Identification (bits 0-15) |           | Flags (bits 16-18), Fragment Offset (bits 19-31)  |
| TTL                        | Protocol  | Checksum                                          |
| Source Address             |           |                                                   |
| Destination Address        |           |                                                   |
| Options + Padding          |           |                                                   |
| Data                       |           |                                                   |
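The effect of a distribution such as `ipv4eth0` can be sketched as: build a key from the selected header fields, hash it, and pick one of `count` frame queues starting at `base`. The sketch below uses CRC-32 from `zlib` purely as a stand-in; the hash actually used by KeyGen is not given in the slides, and the function name is hypothetical.

```python
import ipaddress
import zlib


def select_fqid(src, dst, tos, base=0x3800, count=128):
    """Map an IPv4 (src, dst, tos) tuple to an FQID in [base, base + count).

    zlib.crc32 is a stand-in hash, not the hardware's algorithm.
    """
    key = (
        ipaddress.IPv4Address(src).packed
        + ipaddress.IPv4Address(dst).packed
        + bytes([tos])
    )
    return base + (zlib.crc32(key) % count)


fqid = select_fqid("10.0.0.1", "10.0.0.2", 0)
print(hex(fqid))
```

Whatever hash is used, the important property is that packets with identical key fields always land on the same frame queue, so per-flow ordering can be maintained downstream.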





# **Virtual Storage Profiles**

- Virtual storage profiles are not supported in P1023, P4080, P3041, P5020, P5040 and P2041.
  - In these devices the storage profile is configurable only on a per-port basis (named 'hardware port storage profile').
- The virtual storage profile mechanism allows the virtualization of the buffer pool selection for frame storage from the physical hardware ports.
  - The virtual storage profile ID (SPID) is selected as a result of the classification on the frame headers.
  - The same Storage Profile ID (SPID) value resulting from classification on different physical ports may yield different storage profile selections.










# **New Features for T4240 Queue Manager (QMan)**

- T4xxx has a total of 50 Software Portals (SPs), an increase from the 10 SPs found in the P class processors.
- Supports Customer Edge Egress Traffic Management (CEETM) that provides hierarchical class based scheduling and traffic shaping:
  - Available as an alternate to FQ/WQ scheduling mode on the egress side of specific direct connect portals
  - Enhanced class based scheduling supporting 16 class queues per channel
  - Token bucket based dual rate shaping representing Committed Rate (CR) and Excess Rate (ER)
  - Congestion avoidance mechanism equivalent to that provided by FQ congestion groups
- A total of 48 algorithmic sequencers are provided, allowing multiple enqueue/dequeue operations to execute simultaneously.
- Supports up to 295M enqueue/dequeue operations per second.





# **Queuing Structure**

### Frame Descriptor (FD)

- The basic queue element that describes a frame
- Usually a single IO packet will use a single frame
- Other scenarios: commands with no buffer

### Frame Queue Descriptor (FQD)

- A linked list of FDs
- Usually a frame queue is associated with a flow or interface
- Enqueue operation must include the target FQ as a parameter
- Dequeue operation may use FQ as a parameter for operation
- Head of frame queue can be associated to ODP

#### Work Queue Structure

- Linked list of FQDs
- Holds flows of the same priority and designation
- Dequeue operation may use WQ as a parameter for operation

#### Channel

- Set of eight WQs served by a single type of entity
- Dequeue from channel can be configured to be:
  - strict priority
  - round robin (Simple, Weighted or Deficit)
- Dequeue may use channel as a parameter for operation







### **QMan Communicates with Portals**

- Portals are the interface between QMan and the accelerators that use them
  - Direct Connect Portals (DCPs) have direct-connect signals to Dedicated Channels
    - Dedicated Channels are always serviced by a single entity, e.g. FMan, etc.
  - Software Portals use CoreNet as the physical interconnect to the processor core.
    - Each Software Portal serves a Dedicated Channel, and optionally services Pool Channels.
    - Software and QMan interact by "reading" and "writing" data across CoreNet
- Each channel consists of 8 WQs, and thus there are 8 possible priorities.
- QMan contains a total of 110 channels for T4240
  - 16 Dedicated Channels (Sub Portals) per Frame Manager's Direct Connect Portal
  - 1 for SEC (8 SPs), 1 for PME, 1 for DCE, 1 for RMan
  - Up to 50 CoreNet dedicated channels for software portals
  - 15 for CoreNet pool channels which are shared by all software portals












# **Queue Management**

- QMan provides a way to inter-connect DPAA components
  - Cores (including IPC)
  - Hardware offload accelerators
  - Network interfaces (Frame Manager)
- Queue management
  - High performance interfaces ("portals") for enqueue/dequeue
  - Internal buffering of queue/frame data to enhance performance
- Congestion avoidance and management
  - RED/WRED
  - Tail drop for single queues and aggregates of queues
  - Congestion notification for "loss-less" flow control
- Load spreading across processing engines (cores, HW accelerators)
  - Order restoration
  - Order preservation/atomicity
- Delivery to cache/HW accelerators of per queue context information with the data (Frames)
  - This is an important offload for software using hardware accelerators







# **QMan Software Portal Components**



- Enqueue: EQCR
- Dequeue: Command registers + DQRR
- Messages: MR (e.g. enqueue rejections)
- Management commands: command/response registers
- Interrupts can be used to signal availability of data or space (in EQCR)
- Rings provide finite size FIFOs
  - Up to 16 entries for DQRR, 8 entries for EQCR and MR
- Portal components are implemented inside QMan to reduce access latency
  - Unlike traditional BD rings which are in "memory" and "registers"
- QMan can "push" (stash) DQRR entries across CoreNet into the appropriate core's cache
- PI and CI are the basic mechanisms used with rings but other forms of notification of data availability and data consumption are supported
- When these other mechanisms are used QMan maintains PI/CI
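The producer index (PI) and consumer index (CI) mechanism mentioned above can be illustrated with a small fixed-size ring, sized here like the 8-entry EQCR. This is a software sketch only: the `Ring` class is hypothetical, it uses free-running counters for simplicity, and the real portal rings live inside QMan and also support other notification mechanisms.

```python
class Ring:
    """Finite FIFO tracked by a producer index (PI) and a consumer index (CI)."""

    def __init__(self, size=8):
        self.size = size
        self.slots = [None] * size
        self.pi = 0   # next slot the producer will fill
        self.ci = 0   # next slot the consumer will read

    def used(self):
        return self.pi - self.ci

    def produce(self, item):
        """Add an entry; returns False when the ring is full."""
        if self.used() == self.size:
            return False
        self.slots[self.pi % self.size] = item
        self.pi += 1
        return True

    def consume(self):
        """Remove the oldest entry; returns None when the ring is empty."""
        if self.pi == self.ci:
            return None
        item = self.slots[self.ci % self.size]
        self.ci += 1
        return item


eqcr = Ring(size=8)
accepted = [eqcr.produce(n) for n in range(9)]
print(accepted.count(True), eqcr.consume())
```

The ninth enqueue is refused until the consumer advances CI, which is the situation the "space available" interrupt is meant to signal.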





# **QMan Cache Warming**

- In addition to stashing DQRR entries into cache, QMan's software portals can also "warm" a core's (L1 or L2) cache with frame and queue related data
  - Actual frame data for single buffer frames
  - Scatter gather list for multi-buffer frames
  - Frame "annotations"
    - Data between Address and Offset at start of frame
    - Used to pass additional information about the frame which is not "frame data" e.g. FM parse results
  - Data referenced by FQ Context
- Stashing options can be configured on a per FQ basis
- Cache warming is performed at the time the frame is dequeued (i.e., when the DQRR entry is created)










#### **Core Load Distribution**

- Static flow-based distribution (Dedicated Channel)
  - Set of WQs with different FQs directed statically to different cores
  - Distribution of frames (selection of FQ) is based on hash keys, ensuring that packets from the same traffic flow will always go to the same cores
  - Static, not dynamic: work is assigned to the cores in a fixed manner and does not react to core load
- Adaptive load balancing (Pool Channel)
  - Load spread the packets (or the Frame Queue) to the cores based on actual core availability/readiness
  - QMan provides two mechanisms to deal with out of order packets:
    - Order preservation: ensure that related packets are processed in order (and typically one at a time); can also provide "atomicity" – atomic access to data
    - Order restoration: allows frames to be processed out of order and then restores their order later on before they are transmitted







#### **Dequeue Modes**

- QMan supports 2 modes on software portals:
  - Push Mode:
    - QMan continues to push entries into the DQRR in an attempt to keep it "full"
    - QMan provides 2 command registers
      - One register is "static" and QMan repeatedly executes this command
      - One register is "volatile" and QMan executes that command a limited number of times
    - Push mode is "just like" a BD ring
  - Pull Mode:
    - QMan provides a single command register
    - Software must issue a new command for each dequeue operation
- Push mode is the most common mode
- Pull mode offers more control to applications





#### **Dequeue Scheduling**

- Class Scheduler schedules WQ
  - A Class scheduler per channel
  - Two levels of scheduling:
    - Use Weighted Interleaved Round Robin to schedule within the medium priority queues (2 to 4) and low priority queues (5 to 7)
    - Strict priority of all 8 WQs with programmable elevation (CS\_ELEV) of the low priority tier over the medium priority tier
  - Maintains active FQ state transitions
    - Keeps track of the last RR winner, selection counters, elevation counter
- Intra-Class scheduling schedules FQ
  - Schedules a frame queue within a work queue
  - Uses Modified Deficit Round Robin with ICS\_CRED (15-bit credit) and ICS\_SURP (16-bit surplus)
  - On the first dequeue, surplus = surplus + credit
  - Dequeue 1-3 frames
  - Subtract the frame length(s) in bytes from the surplus
  - If surplus > 0, dequeue 1-3 more frames
  - If surplus <= 0, reschedule the Frame Queue
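The intra-class steps above can be sketched as one scheduling turn for a single frame queue. This is a simplified model, not QMan's implementation: the function name and the `burst` parameter (how many frames are taken per step, 1 to 3 in the slide) are assumptions, and frames are represented only by their lengths in bytes.

```python
from collections import deque


def service_frame_queue(frames, credit, surplus, burst=1):
    """Run one turn of the credit/surplus scheme for one FQ.

    frames  -- deque of frame lengths in bytes (head at the left)
    credit  -- per-turn credit added on the first dequeue
    surplus -- surplus (or deficit, if negative) carried from earlier turns
    Returns (lengths dequeued this turn, surplus to carry forward).
    """
    surplus += credit              # first dequeue: surplus = surplus + credit
    sent = []
    while frames:
        for _ in range(burst):     # dequeue a small burst of frames
            if not frames:
                break
            length = frames.popleft()
            surplus -= length      # charge the frame length against the surplus
            sent.append(length)
        if surplus <= 0:
            break                  # out of credit: the FQ is rescheduled
    return sent, surplus


fq = deque([100, 100, 100, 100])
first_turn = service_frame_queue(fq, credit=150, surplus=0)
second_turn = service_frame_queue(fq, credit=150, surplus=first_turn[1])
print(first_turn, second_turn)
```

The deficit left by overshooting in one turn is paid back in the next, which is how the scheme keeps long-run bandwidth proportional to each queue's credit.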



[Figure: SW Portal Dequeue Dispatcher, showing SDQCR, the active FQ priority compare across the High 0, High 1, Elevated Low, Medium and Low tiers, and three round-robin (RR) stages.]





#### Static Dequeue: Who Is on First?















#### Volatile Dequeue: Who Is on First?



[Figure: volatile dequeue in push mode. VDQCR fields shown: Number of Frames, FQID = FQ6002.]







#### **Customer Edge Egress Traffic Management (CEETM)**

- QMan 1.2 (i.e. QorIQ T42xx) supports egress traffic management by providing hierarchical class-based scheduling and traffic shaping.
- On specific QMan Direct Connect Portals (DCPs), each sub-portal that supports CEETM can be configured to use either the <u>regular FQ/WQ scheduling mode</u> OR <u>CEETM scheduling mode</u>.
  - A given sub-portal can switch between FQ/WQ and CEETM scheduling mode.
  - CEETM supports up to 8 logical network interfaces (LNI) that can each be mapped to a DCP sub-portal, whereas a DCP can support up to 16 sub-portals.
  - CEETM is supported on only a subset of the DCP portals, not on all DCP portals.
  - A single instance of the CEETM is associated with a single DCP and therefore a single egress I/O module.
- CEETM maintains the following functionality, equivalent to that of the regular FQ/WQ scheduling mode:
  - Congestion management capabilities including WRED
  - Dequeued frame context (Context\_A and Context\_B)
  - Priority or traffic class flow control
- FQIDs xF00000 to xFFFFFF are reserved for CEETM as Logical FQIDs (LFQIDs).
  - xF00000 to xF00FFF (4K) for DCP portal 0 (i.e. FMAN0)
  - xF10000 to xF10FFF (4K) for DCP portal 1 (i.e. FMAN1)
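The two LFQID ranges listed above suggest a regular layout: a 4K block per DCP, with blocks spaced x10000 apart. The sketch below decodes an LFQID on that basis. The stride for DCP portals beyond 0 and 1 is an extrapolation from the two ranges shown, and the function name is hypothetical.

```python
LFQID_FIRST = 0xF00000      # start of the range reserved for CEETM
LFQID_LAST = 0xFFFFFF       # end of the reserved range
DCP_STRIDE = 0x10000        # assumed spacing between DCP blocks
LFQIDS_PER_DCP = 0x1000     # 4K logical FQIDs per DCP


def lfqid_to_dcp(fqid):
    """Return (dcp_portal, index_within_portal) for a CEETM logical FQID."""
    if not LFQID_FIRST <= fqid <= LFQID_LAST:
        raise ValueError("not in the CEETM LFQID range")
    dcp, index = divmod(fqid - LFQID_FIRST, DCP_STRIDE)
    if index >= LFQIDS_PER_DCP:
        raise ValueError("outside the 4K block of this DCP")
    return dcp, index


print(lfqid_to_dcp(0xF10FFF))
```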





### **CEETM Scheduling Hierarchy (QMAN 1.2)**

#### Logics

- Green denotes logic units and signal paths that relate to the request and fulfillment of Committed Rate (CR) packet transmission opportunities.
- Yellow denotes the same for Excess Rate (ER).
- Black denotes logic units and signal paths that are used for unshaped opportunities or that operate consistently whether used for CR or ER opportunities.

#### Scheduler

- Channel Scheduler: channels are selected to send frames from Class Queues.
- Class scheduler: frames are selected from Class Queues. Class 0 has highest priority.

#### Algorithm

- Strict Priority (SP)
- Weighted Scheduling
- Shaped Aware Fair Scheduling (SAFS)
- Weighted Bandwidth Fair Scheduling (WBFS)







### Weighted Bandwidth Fair Scheduling (WBFS)

- Weighted Bandwidth Fair Scheduling (WBFS) is used to schedule packets from queues within a priority group (A or B group) such that each gets a "fair" amount of bandwidth made available to that priority group.
- The premises for fairness in this algorithm are:
  - available bandwidth is divided and offered equally to all classes.
  - offered bandwidth in excess of a class's demand is to be re-offered equally to classes with unmet demand.

|                                     | Initial Distribution | First Redistribution | Second Redistribution | Remaining |
|-------------------------------------|----------------------|----------------------|-----------------------|-----------|
| BW available                        | 10G                  | 1.5G                 | .2G                   | 0G        |
| Number of classes with unmet demand | 5                    | 3                    | 2                     |           |
| Bandwidth offered to each class     | 2G                   | .5G                  | .1G                   |           |

|                       | Demand | Initial: Offered & Retained | Unmet Demand | First Redistribution: Offered & Retained | Unmet Demand | Second Redistribution: Offered & Retained | Total BW Attained |
|-----------------------|--------|-----------------------------|--------------|------------------------------------------|--------------|-------------------------------------------|-------------------|
| Class 0               | .5G    | .5G                         | 0            |                                          |              |                                           | .5G               |
| Class 1               | 2G     | 2G                          | 0            |                                          |              |                                           | 2G                |
| Class 2               | 2.3G   | 2G                          | .3G          | .3G                                      | 0            |                                           | 2.3G              |
| Class 3               | 3G     | 2G                          | 1G           | .5G                                      | .5G          | .1G                                       | 2.6G              |
| Class 4               | 4G     | 2G                          | 2G           | .5G                                      | 1.5G         | .1G                                       | 2.6G              |
| **Total Consumption** | 11.8G  | 8.5G                        |              | 1.3G                                     |              | .2G                                       | 10G               |
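The redistribution rounds in the table can be reproduced with a short iterative allocation: offer the remaining bandwidth equally to every class with unmet demand, let each class keep only what it needs, and repeat with whatever is left. This is a sketch of the two stated fairness premises with equal weights, not the CEETM hardware algorithm; the function name is hypothetical and bandwidth is expressed in Mbps so the arithmetic stays in integers.

```python
def wbfs_allocate(available, demands):
    """Equal-share allocation with re-offer of unused bandwidth.

    available -- total bandwidth to distribute (integer units, e.g. Mbps)
    demands   -- list of per-class demands in the same units
    Returns the bandwidth attained by each class.
    """
    granted = [0] * len(demands)
    while available > 0:
        unmet = [i for i, d in enumerate(demands) if granted[i] < d]
        if not unmet:
            break                       # every demand is satisfied
        share = available // len(unmet)
        if share == 0:
            break                       # remainder too small to split further
        for i in unmet:
            take = min(share, demands[i] - granted[i])
            granted[i] += take
            available -= take           # unused share stays available
    return granted


# Classes 0-4 from the table, in Mbps, sharing 10G.
print(wbfs_allocate(10_000, [500, 2_000, 2_300, 3_000, 4_000]))
```

With the table's inputs the loop runs three rounds (2G, then .5G, then .1G offered per class), matching the initial distribution and the two redistributions.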










#### **Addressing Ordering Requirements**

- There are two basic approaches to addressing this requirement:
  - Order restoration
    - Take note of the correct order (or sequence) of packets before processing starts and restore the packets to that order before they are transmitted
  - Order preservation
    - Ensure that related packets are processed in order (and typically one at a time)
    - Order preservation can also provide "atomicity": atomic access to data used in processing the frame
- QMan requires that related frames (which must be transmitted in order) be placed on the same frame queue for both of these approaches
  - This does not mean that only related frames are placed on a given FQ
  - Many sets of related frames can be placed on an FQ
  - Frame Manager is responsible for achieving this







#### **QMan Order Restoration**

- QMan's order restoration support has two components:
- Order Definition Point (ODP)
  - A point at which a relative sequence is defined for each frame that passes it
  - The ODP ID is associated with the FQID
  - Assigns a monotonically increasing 14-bit sequence number to a series of frames
  - QMan supports a single ODP at the head of a queue
  - An ODP can be placed anywhere in the system, i.e. SW can be an ODP
- Order Restoration Point (ORP)
  - A point at which the relative sequence associated with a single ODP is restored
  - Allows frame insertion into the flow (single sequence number with more/last indication)
- Behavior highlights
  - Configurable number of "in-flight" packets per ORP
  - Selection of the ORP is part of the enqueue command, but the queue tail is not associated with the ORP, i.e. enqueues to a single destination queue can respect many ORPs
  - Note: Frames are not "marked"








#### **Order Restoration Configuration**

- Treatment of a FD is determined by:
  - ORP Restoration Window Size (ORPRWS)
  - ORP Auto Advance NESN Window Toggle (OA)
  - ODP Sequence Number (SEQ)
  - ORP Next Expected Sequence Number (NESN)
  - ORP Acceptable Late Arrival Window Size (OLWS)
  - ORP Early Arrival Head Sequence Number (EA\_HSEQ)
  - ORP Early Arrival Tail Sequence Number (EA\_TSEQ)
  - ORP Early Arrival Head Pointer (EA\_HPTR)
  - ORP Early Arrival Tail Pointer (EA\_TPTR)
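
How SEQ, NESN, and the two window sizes interact can be sketched as a classification of each arriving sequence number. This is an assumption-laden illustration: the window sizes are treated as plain frame counts, the boundary conditions are guesses, and the handling of frames outside both windows (including the effect of OA) is deliberately left out. The function names are hypothetical.

```python
SEQ_BITS = 14
SEQ_MOD = 1 << SEQ_BITS
HALF = SEQ_MOD // 2


def signed_distance(seq, nesn):
    """Distance from NESN to SEQ in modulo-2^14 arithmetic, as a signed value."""
    d = (seq - nesn) % SEQ_MOD
    return d - SEQ_MOD if d >= HALF else d


def classify(seq, nesn, restoration_window, late_window):
    """Classify an arriving sequence number relative to NESN."""
    d = signed_distance(seq, nesn)
    if d == 0:
        return "in-order"   # the frame the ORP is waiting for
    if 0 < d < restoration_window:
        return "early"      # ahead of NESN, inside the restoration window
    if -late_window <= d < 0:
        return "late"       # behind NESN, inside the late arrival window
    return "outside"        # outside both windows


examples = {
    "expected": classify(4, nesn=4, restoration_window=32, late_window=32),
    "ahead": classify(7, nesn=4, restoration_window=32, late_window=32),
    "behind": classify(2, nesn=4, restoration_window=32, late_window=32),
    "wrapped": classify(0, nesn=16383, restoration_window=32, late_window=32),
    "far": classify(1000, nesn=4, restoration_window=32, late_window=32),
}
```

The "wrapped" case shows why the comparison has to be modular: sequence number 0 arrives one position ahead of NESN 16383.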





Figure: ORP-A example. Labels shown: the FQD fields ORPRWS, OA, ODP SEQ, ORP\_NESN, OLWS, ORP\_EA\_HSEQ, ORP\_EA\_TSEQ, ORP\_EA\_HPTR, and ORP\_EA\_TPTR; frames FD A1, FD A2, and FD A3, with FD A4 expected next (NESN, e.g. #4); frames FD A7, FD A8, and FD A9; sequence number range -8191 to 8191.








#### **Congestion Management and Avoidance**

Both FM and QMan are involved in supporting congestion management and avoidance

- QMan provides the following support
  - Congestion management (loss-less flow control, threshold/tail drop)
  - Congestion avoidance (RED/WRED)
- Congestion Groups (CG) define granularity
  - Every frame queue has a tail drop threshold and is assigned to a congestion group
  - Congestion calculations are done on groups of queues
  - Congestion avoidance/management calculations configured in
- **Congestion Group Details** 
  - 256 Congestion Groups
  - Time-aware weighted average queue depth of all queues in the group
  - 3 color configurable WRED curves
  - Enqueued packet may be rejected (discarded) due to WRED policy
  - Instantaneous CG depth +/- hysteresis value can initiate congestion state messages to enqueue sources
    - Lossless flow control on interfaces supporting PAUSE semantics
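
The interaction between the instantaneous group depth, the threshold, and the hysteresis value can be sketched as a small state machine. The exact hardware semantics are not given here, so the entry condition (depth above the threshold), the exit condition (depth below threshold minus hysteresis), and the class and attribute names are assumptions.

```python
class CongestionGroup:
    """Tracks an instantaneous byte count shared by a group of queues."""

    def __init__(self, threshold, hysteresis, tail_drop=True):
        self.threshold = threshold
        self.hysteresis = hysteresis
        self.tail_drop = tail_drop
        self.byte_count = 0
        self.congested = False
        self.notifications = []  # state change messages for enqueue sources

    def _update_state(self):
        if not self.congested and self.byte_count > self.threshold:
            self.congested = True
            self.notifications.append("enter")
        elif self.congested and self.byte_count < self.threshold - self.hysteresis:
            self.congested = False
            self.notifications.append("exit")

    def enqueue(self, nbytes):
        """Return True if the frame is accepted, False if it is tail dropped."""
        if self.congested and self.tail_drop:
            return False
        self.byte_count += nbytes
        self._update_state()
        return True

    def dequeue(self, nbytes):
        self.byte_count = max(0, self.byte_count - nbytes)
        self._update_state()


cg = CongestionGroup(threshold=1000, hysteresis=200)
results = [cg.enqueue(600), cg.enqueue(600), cg.enqueue(100)]
cg.dequeue(300)  # depth 900: still above threshold - hysteresis
still_congested = cg.congested
cg.dequeue(200)  # depth 700: below 800, congestion state exits
```

A source that supports PAUSE semantics could react to the "enter" and "exit" notifications instead of relying on tail drop, which is what makes the flow control lossless.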





Figure: WRED curve (axis labels: Discard Probability, Aggregate).



### **Setup Congestion Group Record**

- On enqueue:
  - The color for the frame being enqueued is used to select a probability curve
  - The frame may be selected for random discard



| Mgmt Cmd fields |
|-----------------|
| Cmd Verb        |
| WE_MASK         |
| WR_PARM_G       |
| WR_PARM_Y       |
| WR_PARM_R       |
| WR_EN_G         |
| WR_EN_Y         |
| WR_EN_R         |

Congestion Group Record fields:

- WRED Green Enable
- WRED Green Parameters
- WRED Yellow Enable
- WRED Yellow Parameters
- WRED Red Enable
- WRED Red Parameters
- CSCN Enable
- CSCN Target
- Congestion State Tail Drop Enable
- Congestion State Threshold
- Congestion State
- Group Instantaneous Byte Count
- TimeStamp

Enqueue (ENQ) and dequeue (DEQ) operations update the average queue length.

| Bits  | Field |
|-------|-------|
| 0-7   | MA    |
| 8-12  | Mn    |
| 13-19 | SA    |
| 20-25 | Sn    |
| 26-31 | Pn    |
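
The 32-bit field layout above (MA, Mn, SA, Sn, Pn), which appears to describe the WRED parameter words, can be packed and unpacked as follows. The sketch assumes Power Architecture bit numbering, where bit 0 is the most significant bit; the function names are hypothetical and the meaning of each field is not interpreted here.

```python
def pack_wr_parm(ma, mn, sa, sn, pn):
    """Pack the five fields into one 32-bit word (bit 0 = MSB)."""
    assert 0 <= ma < (1 << 8) and 0 <= mn < (1 << 5)
    assert 0 <= sa < (1 << 7) and 0 <= sn < (1 << 6) and 0 <= pn < (1 << 6)
    return (ma << 24) | (mn << 19) | (sa << 12) | (sn << 6) | pn


def unpack_wr_parm(word):
    """Split a 32-bit word into its MA, Mn, SA, Sn, and Pn fields."""
    return {
        "MA": (word >> 24) & 0xFF,  # bits 0-7
        "Mn": (word >> 19) & 0x1F,  # bits 8-12
        "SA": (word >> 12) & 0x7F,  # bits 13-19
        "Sn": (word >> 6) & 0x3F,   # bits 20-25
        "Pn": word & 0x3F,          # bits 26-31
    }


word = pack_wr_parm(ma=0x12, mn=3, sa=0x45, sn=0x2A, pn=0x15)
fields = unpack_wr_parm(word)
```

The field widths (8 + 5 + 7 + 6 + 6) add up to exactly 32 bits, so the round trip is lossless.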










### Security (SEC) 5.0 Overview



Supports protocol processing for the following:

- **IPSec**
- 802.1ae (MACSEC)
- SSL/TLS/DTLS
- 3GPP RLC
- LTE PDCP
- SRTP
- 802.11i (Wi-Fi)
- 802.16e (WiMax)

- Public Key Hardware Accelerators (PKHA): ~25K RSA ops/sec (1024b)
  - RSA and Diffie-Hellman (to 4096b)
  - Elliptic curve cryptography (1023b)
- Data Encryption Standard Accelerators (DESA): ~15 Gbps
  - DES, 3DES (2K, 3K)
  - ECB, CBC, OFB modes
- Advanced Encryption Standard Accelerators (AESA): ~40 Gbps
  - Key lengths of 128, 192, and 256 bits
  - ECB, CBC, CTR, CCM, GCM, CMAC, OFB, CFB, and XTS
- ARC Four Hardware Accelerators (AFHA): ~7.5 Gbps
  - Compatible with the RC4 algorithm
- Message Digest Hardware Accelerators (MDHA): ~40 Gbps
  - SHA-1, SHA-2 256-, 384-, and 512-bit digests
  - MD5 128-bit digest
  - HMAC with all algorithms
- Kasumi/F8 Hardware Accelerators (KFHA): ~9 Gbps
  - F8, F9 as required for 3GPP
  - A5/3 for GSM and EDGE
  - GEA-3 for GPRS
- Snow 3G Hardware Accelerators (STHA): ~12 Gbps
  - Implements Snow 3.0
- ZUC Hardware Accelerators (ZHA): ~14 Gbps
  - Implements 128-EEA3 & 128-EIA3
- CRC Unit: ~40 Gbps
  - Standard and user-defined polynomials
- Random Number Generator, random IV generation





## Pattern Matching Engine (PME) 2.X Overview

- Regex support plus significant extensions:
  - Patterns can be split into 256 sets each of which can contain 16 subsets
  - 32K patterns of up to 128B length
  - 9.6 Gbps raw performance
- Combined hash/NFA technology
  - No "explosion" in number of patterns due to wildcards
  - Low system memory utilization
  - Fast pattern database compiles and incremental updates
- Matching across "work units"
  - Finds patterns in streamed data
- Pipeline of processing
  - PME offers a pipeline of filtering, matching, and behavior-based engines for a complete pattern matching solution







### RapidIO Message Manager (RMan) Overview

- Many queues allow multiple inbound/outbound queues per core
  - Hardware queue management via QorIQ Data Path Acceleration Architecture (DPAA)
- Supports all messaging-style transaction types
  - Type 11 Messaging
  - Type 10 Doorbells
  - Type 9 Data Streaming
- Enables low overhead direct core-to-core communication

Device-to-Device Transport





# Decompression/Compression Engine (DCE)

#### Overview

- Deflate
  - As specified in RFC 1951
- GZIP
  - As specified in RFC1952
- Zlib
  - As specified in RFC1950
  - Interoperable with the zlib 1.2.5 compression library
- Encoding
  - Supports Base64 encoding and decoding (RFC 4648)
- Operates at up to 600 MHz
  - 10Gbps Compress
  - 10Gbps Decompress
  - 20Gbps Aggregate
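
The three container formats listed above can be reproduced in software with Python's standard `zlib` module, which is useful when checking interoperability with data produced or consumed by a hardware engine. This sketch does not drive the DCE; the payload and the helper name `compress_as` are assumptions.

```python
import base64
import zlib


def compress_as(data, wbits):
    """Compress with the container format selected by the zlib wbits value."""
    compressor = zlib.compressobj(9, zlib.DEFLATED, wbits)
    return compressor.compress(data) + compressor.flush()


payload = b"DPAA frame payload " * 100

raw_deflate = compress_as(payload, -15)  # RFC 1951: raw DEFLATE, no header
zlib_stream = compress_as(payload, 15)   # RFC 1950: zlib header and Adler-32
gzip_stream = compress_as(payload, 31)   # RFC 1952: gzip header and CRC-32

# Each format needs the matching wbits value on the decompress side.
restored = [
    zlib.decompress(raw_deflate, -15),
    zlib.decompress(zlib_stream, 15),
    zlib.decompress(gzip_stream, 31),
]

# RFC 4648 Base64 round trip of the compressed data.
encoded = base64.b64encode(gzip_stream)
decoded = base64.b64decode(encoded)
```

All three streams carry the same DEFLATE-compressed data; they differ only in the header and trailer wrapped around it.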







#### Data Center Bridging (DCB) Overview

- QMan 1.2 (e.g. QorIQ T42xx) supports Data Center Bridging (DCB).
- DCB refers to a series of inter-related IEEE specifications collectively designed to enhance Ethernet LAN traffic prioritization and congestion management.
- DCB can be used between data center network nodes for:
  - LAN/network traffic
  - Storage Area Network (SAN) traffic (e.g. Fibre Channel (loss sensitive))
  - IPC traffic (e.g. InfiniBand (low latency))
- The DPAA is compliant with the following DCB specifications (traffic management related):
  - IEEE Std. 802.1Qbb: Priority-based flow control (PFC)
    - To avoid frame loss, PFC Pause frames can be sent autonomously by hardware.
  - IEEE Std. 802.1Qaz: Enhanced transmission selection (ETS)
    - Supports weighted bandwidth fairness.
  - IEEE Std. 802.1Qau: Quantized Congestion Notification (QCN)
    - End-to-end congestion control mechanism.










#### Conclusion

- DPAA is about seamless integration of multiple CPUs and accelerators for high-speed interfaces.
- The Data Path Acceleration Architecture includes:
  - Queue Manager
  - Buffer Manager
  - Frame Manager
  - Hardware accelerators such as SEC, PME, DCE, and RMan
  - Power Architecture Cores
- Seamless integration of these components addresses multicore requirements:
  - Load spreading
  - Packet ordering
  - Device virtualization
  - Inter-core communication
  - HW buffer management





