RADEON Instinct MI8 Data Sheet

Page Count: 3

Open Ecosystem For Machine Intelligence

DEEP LEARNING INFERENCE ACCELERATOR WITH 8.2 TFLOPS SINGLE
PRECISION COMPUTE PERFORMANCE IN EFFICIENT SCALE-OUT DESIGN¹
Datacenter designers deploying Machine Learning and AI
systems today require systems capable of running more
complex workloads that are massively parallel in nature, while
continuing to improve system efficiencies. Improvements in
the capabilities of accelerators over the last decade, along with
the advancement in software, are providing designers with
the option to build more efficient heterogeneous computing
systems to help them meet today’s challenges.
The Radeon Instinct™ MI8 server accelerator is a highly
efficient, cost-effective inference and HPC solution delivering
8.2 TFLOPS of peak performance with 4GB of ultrafast HBM1
memory. It is well suited to deep learning inference
applications, in which many new, smaller input data sets
are run at half or single precision against trained neural
networks to discover new knowledge.¹ The MI8 accelerator
is also an open solution for general purpose HPC
systems deployed in sectors such as Financial, Energy, Life
Science, Automotive, Academic (Research & Teaching) and
Government Labs.

AMD’s Radeon Instinct™ MI8 accelerator
delivers superior half precision performance
with 4GB of high-bandwidth HBM1 memory
in an efficient, passively cooled, scale-out
accelerator form factor.¹

Deep Learning Inference: Peak FP16 (TFLOPS)¹
Radeon Instinct MI8: 8.2
Nvidia Tesla P40: 0.19 (MI8 up to 42x)
Nvidia Tesla P4: 0.09 (MI8 up to 90x)
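
The "up to" multipliers in the chart above can be checked against the peak FP16 figures quoted in this data sheet and in footnote 1. The following is a minimal sketch; the dictionary and function names are illustrative and not part of any AMD tool. The computed ratios come out slightly above the quoted multipliers, which suggests the sheet rounds down.

```python
# Peak FP16 throughput in TFLOPS, as quoted in this data sheet (footnote 1).
PEAK_FP16_TFLOPS = {
    "Radeon Instinct MI8": 8.2,
    "Nvidia Tesla P40": 0.19,
    "Nvidia Tesla P4": 0.09,
}


def speedup_over(baseline: str, target: str = "Radeon Instinct MI8") -> float:
    """Ratio of the target's peak FP16 figure to the baseline's (hypothetical helper)."""
    return PEAK_FP16_TFLOPS[target] / PEAK_FP16_TFLOPS[baseline]


# Print the ratio of the MI8 figure to each competing card's figure.
for card in ("Nvidia Tesla P40", "Nvidia Tesla P4"):
    print(f"MI8 vs {card}: {speedup_over(card):.1f}x")
```

These are ratios of peak theoretical figures only; they say nothing about measured inference throughput.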

Highlights
• 8.2 TFLOPS FP16 or FP32 Performance¹
• Up To 47 GFLOPS Per Watt FP16 or FP32 Performance²
• 4GB HBM1 on 512-bit Memory Interface
• Passively Cooled Server Accelerator
• Large BAR Support for Multi GPU Peer to Peer
• ROCm Open Platform for HPC-Class Rack Scale
• Optimized MIOpen Libraries for Deep Learning
• MxGPU SR-IOV Hardware Virtualization

Key Features
GPU Architecture: AMD “Fiji”
Stream Processors: 4,096
Performance:
  Half-Precision (FP16): 8.2 TFLOPS
  Single-Precision (FP32): 8.2 TFLOPS
  Double-Precision (FP64): 512 GFLOPS
GPU Memory: 4GB HBM1
Memory Bandwidth: Up to 512 GB/s
Bus Interface: PCIe® Gen 3 x16
MxGPU Capability: Yes
Board Form Factor: Full-Height, Dual-Slot
Length: 6”
Thermal Solution: Passively Cooled
Standard Max Power: 175W TDP
Warranty: Three Year Limited³
OS Support: Linux® 64-bit
ROCm Software Platform: Yes
Programming Environment: ISO C++, OpenCL™, CUDA (via AMD’s HIP conversion tool)
and Python⁴ (via Anaconda’s NUMBA)

For more information, visit: Radeon.com/Instinct

OUTSTANDING SINGLE AND HALF PRECISION PERFORMANCE
The Radeon Instinct MI8 accelerator, based on AMD’s 3rd generation “Fiji” architecture with
improved data-parallel processing and ultra-fast HBM1 memory, delivers 8.2 TFLOPS of peak
performance with up to 512 GB/s of memory bandwidth in a single, passively cooled GPU card.¹
Combined with AMD’s ROCm open software platform, the MI8 accelerator targets
cost-sensitive system deployments for Machine Intelligence, Deep Learning and HPC workloads,
where performance and efficiency are key system requirements.
3RD GENERATION “FIJI” ARCHITECTURE
The Radeon Instinct™ MI8 server accelerator
is based on the “Fiji” architecture, which is
built on AMD’s 3rd Generation Graphics
Core Next (GCN) and packs 64 compute
units (CU) with 64 stream processors per
CU, delivering 8.2 TFLOPS of FP16 or FP32
compute performance in a single GPU card.¹

MxGPU SR-IOV HARDWARE VIRTUALIZATION
Designed with support for AMD’s MxGPU SR-IOV
hardware virtualization technology for
optimized datacenter cycle utilization, the
Radeon Instinct MI8 provides a virtualization
solution with dedicated user GPU resources,
data security and version control, a
cost-effective licensing model with no additional
hardware licensing fees, and a simplified
native driver model ensuring operating
system and application compatibility.

PASSIVELY COOLED
The Radeon Instinct MI8 is a
passively cooled accelerator designed
for large-scale server deployments.
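
The peak compute figure for the “Fiji” architecture follows the equation given in footnote 1: engine clock, times compute units, times stream processors per CU, times FLOPS per clock. The sketch below applies that equation to the figures in this data sheet (64 CUs, 64 stream processors per CU, 2 FLOPS per clock for FP32). The data sheet does not state the engine clock, so the clock computed here is only the value implied by the quoted 8.2 TFLOPS, not a published specification. The function names are illustrative. Because the sheet lists the same 8.2 TFLOPS for FP16 and FP32, the sketch uses the FP32 rate for both.

```python
# Figures taken from this data sheet.
COMPUTE_UNITS = 64             # compute units per GPU
STREAM_PROCESSORS_PER_CU = 64  # stream processors in each CU
FLOPS_PER_CLOCK_FP32 = 2       # FLOPS per clock per stream processor (footnote 1)


def peak_tflops(engine_clock_ghz: float,
                compute_units: int = COMPUTE_UNITS,
                sp_per_cu: int = STREAM_PROCESSORS_PER_CU,
                flops_per_clock: int = FLOPS_PER_CLOCK_FP32) -> float:
    """Peak TFLOPS = clock (GHz) x CUs x SPs per CU x FLOPS per clock / 1000."""
    return engine_clock_ghz * compute_units * sp_per_cu * flops_per_clock / 1000.0


def implied_clock_ghz(tflops: float,
                      compute_units: int = COMPUTE_UNITS,
                      sp_per_cu: int = STREAM_PROCESSORS_PER_CU,
                      flops_per_clock: int = FLOPS_PER_CLOCK_FP32) -> float:
    """Invert the equation: the clock that would yield the given peak TFLOPS."""
    return tflops * 1000.0 / (compute_units * sp_per_cu * flops_per_clock)


# 64 CUs x 64 stream processors matches the 4,096 stream processors listed.
print("Stream processors:", COMPUTE_UNITS * STREAM_PROCESSORS_PER_CU)
# Assumption: this clock is derived from the quoted 8.2 TFLOPS, not stated by AMD here.
print(f"Implied engine clock: {implied_clock_ghz(8.2):.3f} GHz")
print(f"Peak at 1.0 GHz: {peak_tflops(1.0):.3f} TFLOPS")
```

Under the same assumptions, the listed 512 GFLOPS of FP64 is roughly one sixteenth of the FP32 figure.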

HBM1: ULTRAFAST MEMORY BANDWIDTH
The MI8 carries 4GB of ultrafast HBM1 GPU memory,
delivering up to 512 GB/s of memory
bandwidth. HBM1 is a modern type of
memory design with low power consumption
and ultra-wide communication lanes.

ROCm OPEN SOFTWARE PLATFORM
AMD’s ROCm platform provides a scalable,
fully open source software platform
optimized for large-scale heterogeneous
system deployments with an open source
headless Linux driver, HCC compiler, rich
runtime based on HSA, tools and libraries.

For more information, visit:
Radeon.com/Instinct
ROCm.github.io

SUPERIOR FP16 PERFORMANCE FOR INFERENCE²
Deep Learning Inference: Peak FP16 GFLOPS/Watt
Radeon Instinct MI8: 47
Nvidia Tesla P40: 0.76 (MI8 up to 60x)
Nvidia Tesla P4: 1.2 (MI8 up to 38x)
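
The GFLOPS/Watt figures follow the calculation described in footnote 2: peak TFLOPS divided by the card's TDP in watts, then multiplied by 1,000. Below is a minimal sketch using only the TFLOPS and TDP values quoted in the footnotes; the `CARDS` table and the function name are illustrative.

```python
# (peak FP16 TFLOPS, TDP in watts) as quoted in footnotes 1 and 2 of this data sheet.
CARDS = {
    "Radeon Instinct MI8": (8.2, 175),
    "Nvidia Tesla P40": (0.19, 250),
    "Nvidia Tesla P4": (0.09, 75),
}


def gflops_per_watt(tflops: float, tdp_watts: float) -> float:
    """Footnote 2 calculation: TFLOPS / TDP (W) x 1,000."""
    return tflops / tdp_watts * 1000.0


# Print each card's efficiency using the quoted peak figures.
for name, (tflops, tdp) in CARDS.items():
    print(f"{name}: {gflops_per_watt(tflops, tdp):.2f} GFLOPS/W")
```

The MI8 result rounds to the 47 GFLOPS/Watt shown in the chart. As with the TFLOPS comparison, these are peak theoretical values divided by rated TDP, not measured power efficiency.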

©2017 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD arrow logo, Radeon, and combinations thereof, are trademarks of Advanced Micro Devices, Inc.
All other product names are for reference only and may be trademarks of their respective owners. “Fiji” is an internal architecture code name and not a product name.

FOOTNOTES
1. Measurements conducted by AMD Performance Labs as of June 2, 2017 on the Radeon Instinct™ MI8 “Fiji” architecture based accelerator. Results
are estimates only and may vary. Performance may vary based on use of latest drivers. PC/system manufacturers may vary configurations yielding
different results. The results calculated for MI8 resulted in 8.2 TFLOPS peak half precision (FP16) performance and 8.2 TFLOPS peak single precision
(FP32) floating-point performance.
AMD TFLOPS calculations conducted with the following equation: FLOPS calculations are performed by taking the engine clock from the highest
DPM state and multiplying it by xx CUs per GPU. Then, multiplying that number by xx stream processors, which exist in each CU. Then, that number is
multiplied by 2 FLOPS per clock for FP32. To calculate TFLOPS for FP16, 4 FLOPS per clock were used.
Measurements on the Nvidia Tesla P40 resulted in 0.19 TFLOPS peak half precision (FP16) floating-point performance with a 250W TDP GPU
card, from an external source.
Sources:
https://devblogs.nvidia.com/parallelforall/mixed-precision-programming-cuda-8/
http://images.nvidia.com/content/pdf/tesla/184427-Tesla-P40-Datasheet-NV-Final-Letter-Web.pdf
Measurements on the Nvidia Tesla P4 resulted in 0.09 TFLOPS peak half precision (FP16) floating-point performance with a 75W TDP GPU card, from an
external source.
Sources:
https://devblogs.nvidia.com/parallelforall/mixed-precision-programming-cuda-8/
http://images.nvidia.com/content/pdf/tesla/184457-Tesla-P4-Datasheet-NV-Final-Letter-Web.pdf
AMD has not independently tested or verified external and/or third party results/data and bears no responsibility for any errors or omissions therein.
RIF-1
2. Measurements conducted by AMD Performance Labs as of June 2, 2017 on the Radeon Instinct™ MI8 “Fiji” architecture based accelerator. Results
are estimates only and may vary. Performance may vary based on use of latest drivers. PC/system manufacturers may vary configurations yielding
different results. The results calculated for Radeon Instinct MI8 resulted in 47 GFLOPS/watt peak half precision (FP16) performance and 47 GFLOPS/
watt peak single precision (FP32) floating-point performance.
AMD GFLOPS per watt calculations conducted with the following equation: FLOPS calculations are performed by taking the engine clock from the
highest DPM state and multiplying it by xx CUs per GPU. Then, multiplying that number by xx stream processors, which exist in each CU. Then, that
number is multiplied by 2 FLOPS per clock for FP32. To calculate TFLOPS for FP16, 4 FLOPS per clock were used.
Once the TFLOPS are calculated, the number is divided by the 175W TDP and multiplied by 1,000.
Measurements on the Nvidia Tesla P40, based on 0.19 TFLOPS peak FP16 with a 250W TDP GPU card, result in 0.76 GFLOPS/watt peak half precision
(FP16) performance.
Sources for Nvidia Tesla P40 FP16 TFLOPS number:
https://devblogs.nvidia.com/parallelforall/mixed-precision-programming-cuda-8/
http://images.nvidia.com/content/pdf/tesla/184427-Tesla-P40-Datasheet-NV-Final-Letter-Web.pdf
Measurements on the Nvidia Tesla P4, based on 0.09 TFLOPS peak FP16 with a 75W TDP GPU card, result in 1.2 GFLOPS/watt peak half precision
(FP16) performance.
Sources for Nvidia Tesla P4 FP16 TFLOPS number:
https://devblogs.nvidia.com/parallelforall/mixed-precision-programming-cuda-8/
http://images.nvidia.com/content/pdf/tesla/184457-Tesla-P4-Datasheet-NV-Final-Letter-Web.pdf
AMD has not independently tested or verified external and/or third party results/data and bears no responsibility for any errors or omissions therein.
RIF-2
3. The Radeon Instinct GPU accelerator products come with a three-year limited warranty. Please visit www.AMD.com/warranty for details on the
specific graphics products purchased. Toll-free phone service is available in the U.S. and Canada only; email access is global.
4. Support for Python is planned, but still under development.



Source Exif Data:
File Type                       : PDF
File Type Extension             : pdf
MIME Type                       : application/pdf
PDF Version                     : 1.4
Linearized                      : Yes
Language                        : en-GB
Tagged PDF                      : Yes
XMP Toolkit                     : Adobe XMP Core 5.6-c137 79.159768, 2016/08/11-13:24:42
Create Date                     : 2017:06:20 12:12:10+01:00
Metadata Date                   : 2017:06:20 12:12:11+01:00
Modify Date                     : 2017:06:20 12:12:11+01:00
Creator Tool                    : Adobe InDesign CC 2017 (Macintosh)
Instance ID                     : uuid:9afe36e6-5dcf-7546-90fc-61453a351d07
Original Document ID            : xmp.did:7f3322eb-6965-4d92-9475-0b9f16020ad1
Document ID                     : xmp.id:2e681d27-db39-4a2d-a35c-7e3eebf2e3aa
Rendition Class                 : proof:pdf
History Action                  : converted
History Parameters              : from application/x-indesign to application/pdf
History Software Agent          : Adobe InDesign CC 2017 (Macintosh)
History Changed                 : /
History When                    : 2017:06:20 12:12:10+01:00
Derived From Instance ID        : xmp.iid:22fa54e4-336c-4049-934c-93f38ed197cd
Derived From Document ID        : xmp.did:7f3322eb-6965-4d92-9475-0b9f16020ad1
Derived From Original Document ID: xmp.did:7f3322eb-6965-4d92-9475-0b9f16020ad1
Derived From Rendition Class    : default
Format                          : application/pdf
Producer                        : Adobe PDF Library 15.0
Trapped                         : False
Page Count                      : 3
Creator                         : Adobe InDesign CC 2017 (Macintosh)