#### COBHAM

The most important thing we build is trust



#### GR740 Quad-Processor LEON4FT System-on-Chip Overview

Cobham Gaisler Date: 2018-September-27

Presenter:





- Highlights of LEON4FT vs LEON3FT
- Development board
- Features summary and architecture overview
- How to use GR740 / New features
- Key Performances
- GR740 vs. UT699/UT699E/UT700, GR740 vs. GR712RC
- Conclusion
- Please see <a href="https://www.gaisler.com/GR740">https://www.gaisler.com/GR740</a> for additional documentation, including results of benchmarking and validation.





Is there a later version?

Please see <u>https://www.gaisler.com/GR740</u> for the latest version of this overview presentation.



# Highlights of LEON3FT vs LEON4FT

High Level Architectural Improvements

#### **GR712RC LEON3FT**

- Dual core
- Shared AMBA AHB (32 bit) for CPUs, Memory, Debug, etc.

#### **GR740 LEON4FT**

- Quad Core
- Wider (128 bit) CPU/Memory bus
- Dedicated Debug bus
- Addition of L2-Cache
- Hardware Memory Scrubber
- Improved partitioning
- Integrated SpW router (8 port)
- Performance counters
- Supports AMP & SMP
- Improved Support for boot over PCI/RMAP



#### **Development Board and Development SW**

- Development board with GR740 device.
  - 6U CPCI format (Double Eurocard).
  - Boxed version for bench top development.
  - http://www.gaisler.com/gr-cpci-gr740
  - 256 MiB SDRAM, 8 MiB NOR Flash
  - 2x Ethernet, 8x SpaceWire, PCI, UART, CAN, 1553, PROM/IO, GPIO and debug interfaces available
- Operating systems
  - RTEMS
  - WindRiver VxWorks
  - Linux 3.10+
- Other OSs/environments already ported to LEON include:
  - Bare C, ThreadX, PikeOS, XtratuM, ..
- Supported by GRMON2 debug monitor
- Board user manual available at: <u>http://gaisler.com/gr-cpci-gr740</u>







Core components

- System-on-chip
  - 4 x LEON4 fault-tolerant CPUs with L1 cache, MMU and FPU
  - 2 MiB Level-2 cache
  - 96/48-bit SDRAM controller with EDAC and scrubber
  - 8/16-bit PROM/IO controller with EDAC
  - 5 x Timer, 5 x IRQ controller
  - On-chip AHB bus infrastructure
  - IOMMU for peripheral DMA
  - PLLs for clock generation
  - Communication interfaces





Block diagram

#### Architecture block diagram (simplified)





Interfaces

- Interfaces
  - 8-port SpaceWire router with on-chip LVDS
  - 2 x 1Gbit/100Mbit Ethernet MAC (MII/GMII)
  - PCI master/target with DMA, 33 MHz
  - Dual-redundant CAN
  - MIL-STD-1553B interface (bus A/B)
  - 2 x UART
  - 16 x GPIO
- Debug interfaces (for GRMON connection)
  - Ethernet EDCL (using either of the two MACs above)
  - JTAG
  - SpaceWire RMAP (using separate GRSPW2 for debug only)



Interface restrictions

- Some functions have been multiplexed onto the same pins to fit into package pin count
- PCI or the second Ethernet interface (not both) can be enabled, and only when SDRAM is in 48-bit mode.
  - Configured "hard" via bootstrap signals.
  - Selection of
    - (1) 96-bit SDRAM + 1xETH
    - (2) 48-bit SDRAM + 2xETH
    - (3) 48-bit SDRAM + 1xETH + PCI
- CAN, 1553, UART and SpW debug are shared with the PROM top address bits and with the part of the 16-bit PROM data bus that is unused in 8-bit mode.
  - Configurable pin-by-pin between PROM or peripheral function.
  - Pins that are not used for either function can be used as additional GPIO.
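
The SDRAM/Ethernet/PCI selection above can be captured as a small lookup, which may help when checking a board configuration early in a design. This is a minimal sketch: the tuple encoding and the function name `is_valid_config` are illustrative assumptions and do not reflect the actual bootstrap signal format.

```python
# The three bootstrap-selectable pin configurations listed above.
# Tuple layout (SDRAM width in bits, Ethernet MACs, PCI enabled) is an
# illustrative encoding, not a register or bootstrap-pin format.
ALLOWED_CONFIGS = {
    (96, 1, False),  # (1) 96-bit SDRAM + 1xETH
    (48, 2, False),  # (2) 48-bit SDRAM + 2xETH
    (48, 1, True),   # (3) 48-bit SDRAM + 1xETH + PCI
}


def is_valid_config(sdram_bits: int, eth_macs: int, pci: bool) -> bool:
    """Return True if the combination is one of the three listed options."""
    return (sdram_bits, eth_macs, pci) in ALLOWED_CONFIGS


if __name__ == "__main__":
    for cfg in [(96, 1, False), (96, 2, False), (48, 2, True), (48, 1, True)]:
        print(cfg, "->", "ok" if is_valid_config(*cfg) else "not available")
```

Combinations such as 96-bit SDRAM with two Ethernet MACs, or both PCI and the second Ethernet MAC, are rejected because they are not among the listed options.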



Block Diagram

- Bus topology of five AMBA AHB buses: Processor, Memory, Master IO, Slave IO and Debug. Low-speed peripherals via APB
- Debug AHB bus and corresponding core are gated-off in flight.





LEON4FT and GRFPU

#### LEON4FT – IEEE 1754 SPARC V8 compliant 32-bit processor

- 7-stage pipeline, multiprocessor support
- 128-bit AHB bus interface
- Compare-and-swap (CASA) instruction support (from SPARCv9)

GRFPU

- High-performance FPU integrated into LEON4 pipeline
- Hardware DIV and SQRT
- Floating-point controller (FPC) decouples FP operations from pipeline, allowing CPU and FPU to work in parallel





Caches

- Level-1 cache
  - Separate L1 integrated into each LEON4 core
  - Multi-set with configurable LRU/LRR/RND policy
  - Write-through operation
  - Bus snooping with physical tags to maintain coherency
- Level-2 cache
  - Designed as a bridge in the bus topology
  - Highly configurable in caching behaviour
  - Supports copy-back operation
  - Locked ways, allowing part or whole to be used as on-chip RAM
  - Can be partitioned based on bus master indexes





Memory Subsystem

- Memory controller
  - PC100 SDRAM with 64/32 data bits and 32/16 check bits
  - Full width or half width selected via bootstrap signals
  - Powerful interleaved 16/32+8bit ECC giving 32 or 16 checkbits
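
The 96/48-bit figures quoted for the SDRAM controller follow from the data and check bit widths above. A minimal sketch of the arithmetic (the function name is hypothetical):

```python
def sdram_bus_width(data_bits: int) -> tuple:
    """Return (check_bits, total_bits) for a 64- or 32-bit data width."""
    if data_bits not in (64, 32):
        raise ValueError("only 64- and 32-bit data widths are described")
    check_bits = data_bits // 2  # 32 check bits for 64 data bits, 16 for 32
    return check_bits, data_bits + check_bits


if __name__ == "__main__":
    for width in (64, 32):
        check, total = sdram_bus_width(width)
        print(f"{width} data + {check} check = {total}-bit interface")
```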

Scrubber

- Fast initialization of memory and checkbits
- Background scrubbing
- Error reporting to CPU and statistics collection
- Memory error handling (memory controller, scrubber, CPU)
  - Rapid regeneration of contents after SEFI
  - Graceful degradation of failed byte lane
  - Example code available for RTEMS
- Boot memory provided via PROM/IO interface (same controller as UT699, GR712RC)





I/O Interfaces

- Large number of I/O interfaces
  - SpaceWire router
  - PCI master/target with DMA
  - Gbit Ethernet
  - MIL-STD-1553B
  - CAN 2.0B
  - UART, SPI, GPIO
- Debug interfaces
  - Ethernet
  - SpaceWire (RMAP)
  - JTAG





I/O interfaces – connected through IOMMU

#### IOMMU

- Connects all DMA capable I/O masters through one interface to the Processor bus
- Performs pre-fetching and read/write combining
- Provides address translation and access restriction
- Uses separate page tables from processor
- Masters can be placed in groups where each group has its own set of page tables
- Master traffic can also be routed directly to Memory bus, bypassing Level-2 cache.
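
The group and page-table idea above can be illustrated with a toy model. This is a conceptual sketch only: the class `ToyIommu`, the master names and the 4 KiB page size are assumptions for illustration and do not describe the actual IOMMU register interface or table format.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


class ToyIommu:
    """Toy model: each group of DMA masters has its own page table."""

    def __init__(self):
        self.group_of = {}     # master name -> group id
        self.page_tables = {}  # group id -> {virtual page: physical page}

    def assign(self, master, group):
        """Place a master in a group (creating the group's table if needed)."""
        self.group_of[master] = group
        self.page_tables.setdefault(group, {})

    def map_page(self, group, virt_page, phys_page):
        """Allow a group to access one page and define its translation."""
        self.page_tables.setdefault(group, {})[virt_page] = phys_page

    def translate(self, master, address):
        """Translate a DMA address or raise PermissionError if unmapped."""
        table = self.page_tables[self.group_of[master]]
        page, offset = divmod(address, PAGE_SIZE)
        if page not in table:
            raise PermissionError(f"{master}: access to {address:#x} denied")
        return table[page] * PAGE_SIZE + offset


if __name__ == "__main__":
    iommu = ToyIommu()
    iommu.assign("eth0", 0)
    iommu.assign("spw", 1)
    iommu.map_page(0, 0x10, 0x40)
    print(hex(iommu.translate("eth0", 0x10004)))
```

In this model a master in group 1 that issues the same address is rejected, because only group 0 has a mapping for that page.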





SpaceWire Router

- SpaceWire router
  - Four internal AMBA ports, compatible with GRSPW2
  - Eight external SpaceWire ports
  - Same IP core as used in GR718 SpaceWire router ASIC device
  - SpaceWire link speed: 300 Mbit/s





PCI master(initiator)/target with DMA

- Provides PCI master/target interface
  - Provided by GRPCI2 core (vs. GRPCI for UT699/UT699E/UT700)
  - 32-bit interface supporting 33 MHz operation
  - Not fully compliant to PCI 2.3 due to lack of suitable pads and pin multiplexing.
  - Target has three configurable PCI BARs. BAR0 and BAR1 default to prefetchable 128 MiB BARs and BAR2 defaults to a nonprefetchable 8 MiB BAR.
- Pins shared with SDRAM interface: If PCI is enabled then the data width of the SDRAM interface is reduced to 32 bits. Pins are also shared with second Gigabit Ethernet interface.





Gigabit Ethernet

- Gigabit Ethernet interfaces
  - 2x Ethernet interfaces
  - Supports 10/100/1000 Mbit in both full- and half-duplex
  - DMA engine for both receiver and transmitter
  - Internal buffer allows core to buffer complete packet
  - Supports MII and GMII interface to external transceiver
  - Supports scatter/gather IO and IPv4 checksum offloading
  - Provides Ethernet Debug communication link
  - EDCL can also be connected to Debug bus
- Pins of second Ethernet interface are shared with SDRAM interface: If second Ethernet interface is to be used then the data width of the SDRAM interface is reduced to 32 bits. Pins are also shared with PCI interface (second Ethernet interface only).



MIL-STD-1553B, CAN 2.0B, UART, SPI, GPIO

- MIL-STD-1553B controller provides BM/BC/RT functionality with dual redundant buses.
  - Has internal DMA engine.
- CAN 2.0B controller with internal DMA engine
- Two 8-bit UARTs with 16 byte FIFOs
- SPI master/slave controller
  - Configurable word length (3-32 bits)
- Two general purpose I/O ports





Debug interfaces and Debug bus

- Debug bus
  - Debug support unit
  - PCI trace buffer
  - AHB trace buffer, monitoring Master IO bus
  - APB bridge allows direct access to performance counters
- Debug links
  - JTAG Debug Communication Link
    - Bandwidth: 500 kb/s
  - RMAP target
    - Bandwidth 20 Mb/s
  - Ethernet Debug links
    - Bandwidth: >100 Mb/s
    - Can optionally be connected to Master IO bus
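
To get a feel for the debug link bandwidths above, the sketch below estimates an idealised image upload time per link. It assumes that "kb/s" and "Mb/s" mean decimal kilobits and megabits per second and ignores protocol overhead. The Ethernet figure is a lower bound (">100 Mb/s"), so its result is an upper bound on the time. The 8 MiB image size is an arbitrary example.

```python
# Nominal debug-link bandwidths from the list above, in bits per second.
LINK_BANDWIDTH_BPS = {
    "JTAG": 500e3,
    "SpaceWire RMAP": 20e6,
    "Ethernet (EDCL)": 100e6,  # stated as ">100 Mb/s"
}


def upload_seconds(image_bytes: int, link: str) -> float:
    """Idealised transfer time: payload bits divided by link bandwidth."""
    return image_bytes * 8 / LINK_BANDWIDTH_BPS[link]


if __name__ == "__main__":
    image = 8 * 1024 * 1024  # example image size (assumption)
    for name in LINK_BANDWIDTH_BPS:
        print(f"{name}: {upload_seconds(image, name):.2f} s")
```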





Improved Debug Support

- Debug support improved compared to earlier LEON devices
  - High-speed debug interfaces
  - Non-intrusive debugging through dedicated Debug bus
  - AHB trace buffer with filtering
  - Instruction trace buffer with filtering can be read during execution
  - Hardware data watchpoints, Data area monitoring
- Improved profiling support with support for filtering
  - I/D cache/TLB miss/hold
  - Data write buffer hold, Branch prediction miss
  - Total/Integer/FP instruction count
  - Total execution count
  - L2 accesses, misses
  - AHB bus statistics
  - Interrupt time stamping





Resource partitioning

- Resource partitioning allows running separated software instances
  - The architecture has been designed to support SMP, AMP and mixtures of the two (example: three CPUs running Linux or VxWorks SMP and one running RTEMS)
  - The L2 cache can be set to 1 way/CPU mode. Cache has fence registers that can be used to protect software.
  - IRQs can be masked/routed separately to each CPU
  - The register interfaces of the I/O peripherals are located in separate 4 KiB pages, so that the MMU can be used to keep user-level software from accessing the "wrong" peripheral
  - IOMMU allows placing DMA peripherals into groups and offers modes with protection and address translation
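
Placing each peripheral's registers in its own 4 KiB page matters because an MMU grants or denies access per page. The sketch below checks whether two register addresses fall in the same page. The addresses in the example are made-up values for illustration, not the GR740 memory map.

```python
PAGE_SIZE = 4096  # 4 KiB MMU page


def page_of(address: int) -> int:
    """Index of the 4 KiB page that contains `address`."""
    return address // PAGE_SIZE


def share_page(addr_a: int, addr_b: int) -> bool:
    """True if both addresses lie in the same page and so cannot be
    given different access rights by the MMU."""
    return page_of(addr_a) == page_of(addr_b)


if __name__ == "__main__":
    uart_regs = 0xFF900000  # hypothetical register base
    can_regs = 0xFF901000   # hypothetical register base, one page later
    print("separable by MMU:", not share_page(uart_regs, can_regs))
```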





#### How to use GR740

Taking advantage of the four LEON4FT

- Advantage: More processing power, more functions on one chip
- Design goal of maximum average performance has a cost in jitter/predictability
- Linux/VxWorks/eCos have SMP support.
  - Developers hesitant to trust SMP kernel
  - RTEMS SMP development ongoing
- UP instances of RTEMS/VxWorks/eCos/Bare-C/Other can be used by linking images to separate memory areas
  - Booting multiple images is supported by MKPROM2
  - May need static MMU tables to enforce (space) separation
  - Developer needs to assign HW resources
  - Apart from the added setup work, nothing is new:
    - More functions on one chip
    - Cost is added jitter






PROM-less / SpW applications

- PROM-less booting possible via SpaceWire
  - Connect via RMAP
  - Configure main memory controller
  - Use HW memory scrubber to initialize memory
  - Enable L2 cache
  - Upload software
  - Assign processor start address(es)
  - Start processor(s)
- SpaceWire router, with eight external ports, is fully functional without processor intervention.
- Device can also act as a software/processor-free bridge between SpaceWire and PCI/SPI/1553 etc.
  - IOMMU can be used to restrict RMAP access.
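
The PROM-less boot steps above are order-sensitive (memory must be configured and initialized before software is uploaded). The sketch below only records the steps in that order. `RecordingLink` and `promless_boot` are hypothetical names, and no real RMAP library or GR740 register access is implied.

```python
class RecordingLink:
    """Hypothetical stand-in for an RMAP connection; it only records steps."""

    def __init__(self):
        self.log = []

    def step(self, description):
        self.log.append(description)


def promless_boot(link, image_name, start_addresses):
    """Issue the PROM-less boot steps in the order listed above."""
    link.step("connect via RMAP")
    link.step("configure main memory controller")
    link.step("initialize memory with HW scrubber")
    link.step("enable L2 cache")
    link.step(f"upload {image_name}")
    # One start address per processor that is to be started.
    for cpu, addr in enumerate(start_addresses):
        link.step(f"set cpu{cpu} start address {addr:#010x}")
    for cpu in range(len(start_addresses)):
        link.step(f"start cpu{cpu}")
    return link.log


if __name__ == "__main__":
    for line in promless_boot(RecordingLink(), "app.img", [0x0, 0x01000000]):
        print(line)
```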






Clock gating

- Clock gating is controlled via clock gating unit
  - Automatic clock gating of processor cores that are in idle mode
  - Separate gating of floating-point units. FPU is gated-off when it is disabled.
  - Clock gating unit also controls clock and reset for the following peripherals:
    - Ethernet controllers
    - SpaceWire router
    - PCI target/initiator with DMA unit
    - MIL-STD-1553B controller
    - CAN 2.0B controller
    - UARTs
    - SPI controller
    - PROM/IO memory controller
- Debug bus is gated-off when DSU is disabled.



#### GR740 New Features



Summary of (some of the) new features

- Features in GR740 not found in most present day LEON/LEON-MP architectures:
  - Quad-core LEON4FT
  - L2 cache with locking
  - Wide AMBA buses
  - Improved support for partitioning
    - IOMMU
    - Per-processor timers and interrupt controllers
  - Improved debug support (#links, filters, performance counters)
  - Improved support for AMP (address mapping, number of cores)
  - Boot options (PROM, RMAP, PCI)
  - Interrupt time stamping
  - Hardware memory scrubber





## Key Performances

Clock frequencies

- System clock (CPUs, L2 cache, on-chip buses)
  - Nominal frequency is 250 MHz, generated by PLL from external 50 MHz clock (STA and prod. test)
  - Full temperature range (-40 to +125 °C Tj) with margins for aging and clock jitter
  - 4 CPUs x 250 MHz x 1.7 DMIPS/MHz = 1700 DMIPS
- Memory clock
  - 100 MHz supported internally and achieved on evaluation board (using commercial SDRAMs and external clock buffer).
  - Achievable clock frequency on space-grade board will depend on I/O timing and clocking scheme.
  - Some mitigation techniques have been implemented to support high-load scenarios (2T command signalling, duplicated CS# lines)
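
The headline figures above can be reproduced with simple arithmetic. The sketch below recomputes the aggregate DMIPS number and, as an additional estimate not stated on the slide, a theoretical peak SDRAM data rate. It assumes one data transfer per memory clock and ignores refresh and command overhead. Function names are illustrative.

```python
def aggregate_dmips(cores: int, mhz: float, dmips_per_mhz: float) -> float:
    """Aggregate DMIPS, assuming perfect scaling across cores."""
    return cores * mhz * dmips_per_mhz


def peak_sdram_mbytes_per_s(data_bits: int, clock_mhz: float) -> float:
    """Theoretical peak data rate in MB/s, one transfer per clock (assumption)."""
    return data_bits / 8 * clock_mhz


if __name__ == "__main__":
    print(round(aggregate_dmips(4, 250, 1.7)), "DMIPS aggregate")
    print(round(aggregate_dmips(1, 250, 1.7)), "DMIPS per core")
    print(peak_sdram_mbytes_per_s(64, 100), "MB/s peak, 64 data bits at 100 MHz")
```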

#### GR740 vs. Existing Cobham Processors



|                       | Aeroflex Colorado Springs/Gaisler |                       |              |              |                                          |
|-----------------------|-----------------------------------|-----------------------|--------------|--------------|------------------------------------------|
| Processor             | Dual LEON3FT                      | LEON3FT               | LEON3FT      | LEON3FT      | Quad LEON4FT                             |
| Identifier            | GR712RC                           | UT699                 | UT699E       | UT700        | GR740                                    |
| Foundry               | Tower                             | TSMC                  | TSMC         | TSMC         | ST                                       |
| Clock frequency (MHz) | 100                               | 66                    | 100          | 166          | 250                                      |
| DMIPS/core            | 140                               | 92                    | 140          | 233          | 425                                      |
| Cache I/D (KiB)       | 16/16                             | 8/8                   | 16/16        | 16/16        | 16/16                                    |
| MMU                   | Yes                               | Yes                   | Yes          | Yes          | Yes                                      |
| SpaceWire             | Up to 6 x 200 Mb/s, DMA/RMAP      | 2 x DMA, 2 x DMA/RMAP | 4 x DMA/RMAP | 4 x DMA/RMAP | 4 x DMA/RMAP, router with 8 x SpW ports  |
| CAN                   | 2                                 | 2                     | 2            | 2            | 2                                        |
| PCI                   | No                                | 1                     | 1            | 1            | 1                                        |
| 1553                  | 1                                 | No                    | No           | 1            | 1                                        |
| Eval board            | Available                         | Available             | LEAP         | LEAP         | Available                                |



- LEON4 in GR740 improves performance (1.7 DMIPS/MHz vs. 1.4 DMIPS/MHz).
- Maximum frequency increase: > 250 MHz for GR740
- A quad-processor system provides additional performance. The theoretical speed-up is four, but in practice it is lower because of the shared bus and software synchronization requirements.
- UT\* has 10/100 Mbit Ethernet. GR740 has 10/100/1000 Mbit.
- UT699/UT699E lack MIL-STD-1553B. Present in GR740 and UT700.
- GR740 provides four AMBA ports and eight SpaceWire ports with a router. UT\* has four SpaceWire interfaces.
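
One common way to reason about why the quad-core speed-up stays below four is Amdahl's law, which is used here as a stand-in model and is not taken from the slides. It accounts only for the fraction of work that cannot run in parallel. Bus contention and cache effects are not modelled, and the parallel fractions in the example are arbitrary.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: speed-up when `parallel_fraction` of the work scales."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be within [0, 1]")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    for fraction in (1.0, 0.95, 0.9, 0.5):
        print(f"parallel {fraction:.2f}: {amdahl_speedup(fraction, 4):.2f}x on 4 cores")
```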







#### GR740 vs. GR712RC

- LEON4 performance improvement over LEON3FT
- 250 MHz GR740 vs 100 MHz GR712RC
- Quad-core system with Level-2 cache vs. dual-core system with shared memory controller.
- Level-2 cache reduces impact of shared memory.
- GR712RC has shared resources for memory controller, timer unit. GR740 improves HW support for partitioning by mapping addresses on 4k boundaries and including additional HW units.
- Timing / interference analysis possible for dual-core GR712RC system as demonstrated by CNES. Shared L2 cache more difficult to analyze but this is mitigated by inclusion of performance counters to count accesses to shared resources and L2 partitioning.





#### Conclusion

- GR740 is immediately supported by Cobham Gaisler software packages and development tools.
- Latest news: <u>http://www.gaisler.com/GR740</u>

