AMD Accelerated Parallel Processing OpenCL Programming Guide
Page Count: 286
- AMD Accelerated Parallel Processing OpenCL™
- Preface
- Contents
- Chapter 1 OpenCL Architecture and AMD Accelerated Parallel Processing
- 1.1 Software Overview
- 1.2 Hardware Overview for Southern Islands Devices
- 1.3 Hardware Overview for Evergreen and Northern Islands Devices
- 1.4 The AMD Accelerated Parallel Processing Implementation of OpenCL
- Figure 1.5 AMD Accelerated Parallel Processing Software Ecosystem
- Figure 1.6 Simplified Mapping of OpenCL onto AMD Accelerated Parallel Processing for Evergreen and Northern Islands Devices
- Figure 1.7 Work-Item Grouping Into Work-Groups and Wavefronts
- 1.4.1 Work-Item Processing
- 1.4.2 Flow Control
- 1.4.3 Work-Item Creation
- 1.5 Memory Architecture and Access
- 1.6 Communication Between Host and GPU in a Compute Device
- 1.7 GPU Compute Device Scheduling
- 1.8 Terminology
- 1.9 Programming Model
- 1.10 Example Programs
- Chapter 2 Building and Running OpenCL Programs
- Chapter 3 Debugging OpenCL
- Chapter 4 OpenCL Performance and Optimization
- 4.1 AMD APP Profiler
- 4.2 AMD APP KernelAnalyzer
- 4.3 Analyzing Processor Kernels
- 4.4 Estimating Performance
- 4.5 OpenCL Memory Objects
- 4.6 OpenCL Data Transfer Optimization
- 4.7 Using Multiple OpenCL Devices
- Chapter 5 OpenCL Performance and Optimization for Southern Islands Devices
- 5.1 Global Memory Optimization
- 5.2 Local Memory (LDS) Optimization
- 5.3 Constant Memory Optimization
- 5.4 OpenCL Memory Resources: Capacity and Performance
- 5.5 Using LDS or L1 Cache
- 5.6 NDRange and Execution Range Optimization
- 5.7 Instruction Selection Optimizations
- 5.8 Additional Performance Guidance
- 5.9 Specific Guidelines for Southern Islands GPUs
- Chapter 6 OpenCL Performance and Optimization for Evergreen and Northern Islands Devices
- 6.1 Global Memory Optimization
- 6.2 Local Memory (LDS) Optimization
- 6.3 Constant Memory Optimization
- 6.4 OpenCL Memory Resources: Capacity and Performance
- 6.5 Using LDS or L1 Cache
- 6.6 NDRange and Execution Range Optimization
- 6.7 Using Multiple OpenCL Devices
- 6.8 Instruction Selection Optimizations
- 6.9 Clause Boundaries
- 6.10 Additional Performance Guidance
- Chapter 7 OpenCL Static C++ Programming Language
- Appendix A OpenCL Optional Extensions
- A.1 Extension Name Convention
- A.2 Querying Extensions for a Platform
- A.3 Querying Extensions for a Device
- A.4 Using Extensions in Kernel Programs
- A.5 Getting Extension Function Pointers
- A.6 List of Supported Extensions that are Khronos-Approved
- A.7 cl_ext Extensions
- A.8 AMD Vendor-Specific Extensions
- A.8.1 cl_amd_fp64
- A.8.2 cl_amd_vec3
- A.8.3 cl_amd_device_persistent_memory
- A.8.4 cl_amd_device_attribute_query
- A.8.5 cl_amd_device_profiling_timer_offset
- A.8.6 cl_amd_device_topology
- A.8.7 cl_amd_device_board_name
- A.8.8 cl_amd_compile_options
- A.8.9 cl_amd_offline_devices
- A.8.10 cl_amd_event_callback
- A.8.11 cl_amd_popcnt
- A.8.12 cl_amd_media_ops
- A.8.13 cl_amd_media_ops2
- A.8.14 cl_amd_printf
- A.9 cl_amd_predefined_macros
- A.10 Supported Functions for cl_amd_fp64 / cl_khr_fp64
- A.11 Extension Support by Device
- Appendix B The OpenCL Installable Client Driver (ICD)
- Appendix C Compute Kernel
- Appendix D Device Parameters
- Table D.1 Parameters for 7xxx Devices
- Table D.2 Parameters for 68xx and 69xx Devices
- Table D.3 Parameters for 65xx, 66xx, and 67xx Devices
- Table D.4 Parameters for 64xx Devices
- Table D.5 Parameters for Zacate and Ontario Devices
- Table D.6 Parameters for 56xx, 57xx, 58xx, Eyefinity6, and 59xx Devices
- Table D.7 Parameters for Exxx, Cxx, 54xx, and 55xx Devices
- Appendix E OpenCL Binary Image Format (BIF) v2.0
- Appendix F Open Decode API Tutorial
- Appendix G OpenCL-OpenGL Interoperability
- Index