RADEON Instinct MI25 Data Sheet
The Next Era of Compute and Machine Intelligence

THE WORLD’S FASTEST TRAINING ACCELERATOR FOR MACHINE INTELLIGENCE AND DEEP LEARNING [1]

The Radeon Instinct™ MI25 accelerator, designed with the most advanced Next-Gen “Vega” GPU architecture, is the ultimate training accelerator for large scale machine intelligence and deep learning, along with being an optimized open compute workhorse for single-precision HPC-class system workloads.

The MI25 delivers leadership in FP16 and FP32 performance in a passively cooled single GPU server card with 24.6 TFLOPS of FP16 and 12.3 TFLOPS of FP32 peak performance through its 64 compute units with 4,096 stream processors. [1]

The Radeon Instinct MI25’s powerful compute engine and advanced memory architecture, combined with AMD’s ROCm open software platform and ecosystem, provide a powerful, flexible heterogeneous compute solution that allows datacenter designers to meet the challenges of a new era of compute and Machine Intelligence.

AMD’s Radeon Instinct™ MI25, combined with the ROCm open software platform and MIOpen libraries, delivers superior performance per watt for deep learning training deployments in the datacenter. [2]

Superior Performance Per Watt (Peak FP16 GFLOPS/Watt)
Nvidia Tesla P100-SXM2: 71
Nvidia Tesla P100-16: 75
Radeon Instinct MI25: 82

Highlights
• Industry Leading Performance for Deep Learning [1]
• Next-Gen “Vega” Architecture
• Advanced Memory Engine
• Large BAR Support for Multi-GPU Peer to Peer
• ROCm Open Software Platform for Rack Scale
• Optimized MIOpen Libraries for Deep Learning
• MxGPU Hardware Virtualization

Key Features
GPU Architecture: AMD “Vega 10”
Stream Processors: 4,096
Performance:
    Half-Precision (FP16): 24.6 TFLOPS
    Single-Precision (FP32): 12.3 TFLOPS
    Double-Precision (FP64): 768 GFLOPS
GPU Memory: 16GB HBM2
Memory Bandwidth: Up to 484 GB/s
ECC: Yes [3]
Bus Interface: PCIe® Gen 3 x16
MxGPU Capability: Yes
Board Form Factor: Full-Height, Dual-Slot
Length: 10.5”
Thermal Solution: Passively Cooled
Standard Max Power: 300W TDP
Warranty: Three Year Limited [4]
OS Support: Linux® 64-bit
ROCm Software Platform: Yes
Programming Environment: ISO C++, OpenCL™, CUDA (via AMD’s HIP conversion tool) and Python [5] (via Anaconda’s NUMBA)

Note: “Vega 10” is an internal codename only.

For more information, visit: Radeon.com/Instinct

NEXT-GEN “VEGA” ARCHITECTURE WITH THE WORLD’S MOST ADVANCED MEMORY ARCHITECTURE

The Radeon Instinct MI25 accelerator ushers in a new era of compute for the datacenter. Its Next-Gen “Vega” architecture delivers superior compute performance through a powerful parallel compute engine and a Next-Gen programmable geometry pipeline that improves processing efficiency while delivering 2x peak throughput-per-clock over previous Radeon architectures. [6] The Radeon Instinct MI25 provides increased performance density while decreasing energy consumption per operation, making it the perfect solution for today’s demanding workloads in the datacenter.

NEXT-GEN “VEGA” ARCHITECTURE
World’s most advanced GPU compute engine and memory architecture built with a cutting-edge 14nm FinFET process, purpose-built to handle big data sets and a diverse range of computational workloads.

PASSIVELY COOLED
The Radeon Instinct MI25 is a passively cooled accelerator designed for large-scale server deployments.

REMOTE MANAGEABILITY CAPABILITIES
The Radeon Instinct MI25 accelerator has advanced out-of-band manageability circuitry for simplified GPU monitoring in large scale systems. The MI25’s manageability capabilities provide accessibility via I2C, regardless of what state the GPU is in, providing advanced monitoring of a range of static and dynamic GPU information using PMCI compliant data structures, including board part detail, serial numbers, GPU temperature, power and other information.

HBM2: ULTRA-HIGH MEMORY BANDWIDTH
Combined with AMD’s state-of-the-art Infinity Memory Engine with a newly designed High Bandwidth Cache (HBC) and controller, the MI25 GPU has 16GB of the latest HBM2 ECC [3] GPU memory with 484 GB/s of memory bandwidth.

MxGPU SR-IOV HARDWARE VIRTUALIZATION
Designed to support AMD’s MxGPU SR-IOV hardware virtualization technology, the Radeon Instinct MI25 provides a VDI solution with dedicated user GPU resources, data security and version control. It also offers a cost-effective licensing model with no additional hardware licensing fees, and a simplified native driver model that ensures operating system and application compatibility.

For more information, visit: Radeon.com/Instinct and ROCm.github.io

ROCm OPEN SOFTWARE PLATFORM
AMD’s ROCm platform provides a scalable, fully open source software platform optimized for large-scale heterogeneous system deployments with an open source headless Linux driver, HCC compiler, rich runtime based on HSA, tools and libraries.

LEADERSHIP IN FP16 & FP32 PERFORMANCE [1]

Up to 1.3x More FP16 TFLOPS (Peak FP16 TFLOPS)
Nvidia Tesla P100-16: 18.7
Radeon Instinct MI25: 24.6

Up to 1.3x More FP32 TFLOPS (Peak FP32 TFLOPS)
Nvidia Tesla P100-16: 9.3
Radeon Instinct MI25: 12.3

©2017 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD arrow logo, Radeon, and combinations thereof, are trademarks of Advanced Micro Devices, Inc. All other product names are for reference only and may be trademarks of their respective owners. “Vega” and “Vega 10” are internal architecture code names and not products.

FOOTNOTES

1. Measurements conducted by AMD Performance Labs as of June 2, 2017 on the Radeon Instinct™ MI25 “Vega” architecture based accelerator. Results are estimates only and may vary. Performance may vary based on use of latest drivers. PC/system manufacturers may vary configurations yielding different results. The calculations for the Radeon Instinct MI25 resulted in 24.6 TFLOPS peak half precision (FP16) and 12.3 TFLOPS peak single precision (FP32) floating-point performance. AMD TFLOPS calculations are conducted with the following equation: FLOPS calculations are performed by taking the engine clock from the highest DPM state and multiplying it by xx CUs per GPU. That number is then multiplied by the xx stream processors in each CU, and the result is multiplied by 2 FLOPS per clock for FP32. To calculate TFLOPS for FP16, 4 FLOPS per clock were used. The FP64 TFLOPS rate is calculated using a 1/16th rate. External results on the Nvidia Tesla P100-16 (16GB card) GPU Accelerator resulted in 18.7 TFLOPS peak half precision (FP16) and 9.3 TFLOPS peak single precision (FP32) floating-point performance. Results found at: https://images.nvidia.com/content/tesla/pdf/nvidia-tesla-p100-PCIe-datasheet.pdf. External results on the Nvidia Tesla P100-SXM2 GPU Accelerator resulted in 21.2 TFLOPS peak half precision (FP16) and 10.6 TFLOPS peak single precision (FP32) floating-point performance. Results found at: http://www.nvidia.com/object/tesla-p100.html. AMD has not independently tested or verified external/third party results/data and bears no responsibility for any errors or omissions therein. RIV-1

2. Measurements conducted by AMD Performance Labs as of June 2, 2017 on the Radeon Instinct™ MI25 “Vega” architecture based accelerator. Results are estimates only and may vary. Performance may vary based on use of latest drivers. PC/system manufacturers may vary configurations yielding different results. The calculations for the Radeon Instinct MI25 resulted in 82 GFLOPS/watt peak half precision (FP16) or 41 GFLOPS/watt peak single precision (FP32) floating-point performance. AMD GFLOPS per watt calculations are conducted with the following equation: FLOPS calculations are performed by taking the engine clock from the highest DPM state and multiplying it by xx CUs per GPU. That number is then multiplied by the xx stream processors in each CU, and the result is multiplied by 2 FLOPS per clock for FP32. To calculate TFLOPS for FP16, 4 FLOPS per clock were used. The FP64 TFLOPS rate is calculated using a 1/16th rate. Once the TFLOPS are calculated, the number is divided by the xxx watts TDP power and multiplied by 1,000 to determine the GFLOPS per watt. Calculations conducted by AMD Performance Labs as of June 2, 2017 on the Nvidia Tesla P100-16 (16GB card) GPU Accelerator to determine GFLOPS/watt by dividing TFLOPS results by 250 watts TDP resulted in 75 GFLOPS per watt peak half precision (FP16) and 37 GFLOPS per watt peak single precision (FP32) floating-point performance. Sources: https://images.nvidia.com/content/tesla/pdf/nvidia-tesla-p100-PCIe-datasheet.pdf. Calculations conducted by AMD Performance Labs as of June 2, 2017 on the Nvidia Tesla P100-SXM2 GPU Accelerator to determine GFLOPS/watt by dividing TFLOPS results by 300 watts TDP resulted in 71 GFLOPS per watt peak half precision (FP16) and 35 GFLOPS per watt peak single precision (FP32) floating-point performance. Sources: http://www.nvidia.com/object/tesla-p100.html. AMD has not independently tested or verified external/third party results/data and bears no responsibility for any errors or omissions therein. RIV-4

3. ECC support is limited to the HBM2 memory and ECC protection is not provided for internal GPU structures.

4. The Radeon Instinct GPU accelerator products come with a three year limited warranty. Please visit www.AMD.com/warranty for details on the specific graphics products purchased. Toll-free phone service available in the U.S. and Canada only, email access is global.

5. Support for Python is planned, but still under development.

6. Data based on AMD Engineering design of Vega. Radeon R9 Fury X has 4 geometry engines and a peak of 4 polygons per clock. Vega is designed to handle up to 11 polygons per clock with 4 geometry engines. This represents an increase of 2.6x. VG-3
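
The arithmetic described in footnotes 1 and 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not AMD's calculation tooling. The footnotes leave the engine clock and per-CU counts as "xx", so the sketch back-derives an implied clock from the published 12.3 TFLOPS FP32 figure and takes the stream processors per CU as 4,096 / 64. The function names and the implied clock value are hypothetical and do not come from the data sheet.

```python
# Sketch of the peak-FLOPS and GFLOPS-per-watt arithmetic from footnotes 1 and 2.
# ASSUMPTION: the engine clock is not stated in the data sheet; it is
# back-derived below from the published FP32 figure, not a quoted spec.

COMPUTE_UNITS = 64                                # stated in the data sheet
STREAM_PROCESSORS = 4096                          # stated in the data sheet
SP_PER_CU = STREAM_PROCESSORS // COMPUTE_UNITS    # derived value, not quoted


def peak_tflops(clock_ghz, cus, sp_per_cu, flops_per_clock):
    """Engine clock x CUs x stream processors per CU x FLOPS per clock."""
    return clock_ghz * cus * sp_per_cu * flops_per_clock / 1000.0


def implied_clock_ghz(tflops, cus, sp_per_cu, flops_per_clock):
    """Invert the formula to recover the clock implied by a published figure."""
    return tflops * 1000.0 / (cus * sp_per_cu * flops_per_clock)


def gflops_per_watt(tflops, tdp_watts):
    """Divide TFLOPS by TDP and multiply by 1,000, as in footnote 2."""
    return tflops / tdp_watts * 1000.0


# Hypothetical clock implied by 12.3 TFLOPS FP32 at 2 FLOPS per clock.
IMPLIED_CLOCK_GHZ = implied_clock_ghz(12.3, COMPUTE_UNITS, SP_PER_CU, 2)

fp32 = peak_tflops(IMPLIED_CLOCK_GHZ, COMPUTE_UNITS, SP_PER_CU, 2)
fp16 = peak_tflops(IMPLIED_CLOCK_GHZ, COMPUTE_UNITS, SP_PER_CU, 4)
fp64 = fp32 / 16  # footnote 1: FP64 is calculated at a 1/16th rate

print(f"implied clock: {IMPLIED_CLOCK_GHZ:.3f} GHz")
print(f"FP16 {fp16:.1f} TFLOPS, FP32 {fp32:.1f} TFLOPS, FP64 {fp64 * 1000:.2f} GFLOPS")

# (name, peak FP16 TFLOPS, peak FP32 TFLOPS, TDP in watts), all from the footnotes.
cards = [
    ("Nvidia Tesla P100-SXM2", 21.2, 10.6, 300),
    ("Nvidia Tesla P100-16", 18.7, 9.3, 250),
    ("Radeon Instinct MI25", 24.6, 12.3, 300),
]
for name, fp16_tflops, fp32_tflops, tdp in cards:
    print(
        f"{name}: {gflops_per_watt(fp16_tflops, tdp):.1f} FP16 GFLOPS/W, "
        f"{gflops_per_watt(fp32_tflops, tdp):.1f} FP32 GFLOPS/W"
    )
```

Rounded to whole numbers, the per-watt results match the 71, 75 and 82 GFLOPS/watt FP16 figures quoted in footnote 2. The 1/16th rate gives roughly 768.75 GFLOPS for FP64, which the Key Features table lists as 768 GFLOPS.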