Opencl22 Reference Guide
User Manual: Pdf
Open the PDF directly: View PDF
.
Page Count: 24
| Download | |
| Open PDF In Browser | View PDF |
OpenCL 2.2 Reference Guide
Page 1
OpenCL API Reference
The OpenCL Platform Layer
The OpenCL platform layer implements platform-specific
features that allow applications to query OpenCL devices,
device configuration information, and to create OpenCL
contexts using one or more devices. Items in blue apply only
when the appropriate extension is enabled (see Extensions on
page 21 of this reference guide).
Querying platform info & devices [4.1-2] [9.16.9]
cl_int clGetPlatformIDs (cl_uint num_entries,
cl_platform_id *platforms, cl_uint *num_platforms)
cl_int clIcdGetPlatformIDsKHR (cl_uint num_entries,
cl_platform_id * platfoms, cl_uint *num_platforms)
cl_int clGetPlatformInfo (cl_platform_id platform,
cl_platform_info param_name,
size_t param_value_size, void *param_value,
size_t *param_value_size_ret)
param_name: CL_PLATFORM_{PROFILE, VERSION},
CL_PLATFORM_{NAME, VENDOR, EXTENSIONS},
CL_PLATFORM_HOST_TIMER_RESOLUTION,
CL_PLATFORM_ICD_SUFFIX_KHR [Table 4.1]
cl_int clGetDeviceIDs (cl_platform_id platform,
cl_device_type device_type, cl_uint num_entries,
cl_device_id *devices, cl_uint *num_devices)
device_type: [Table 4.2]
CL_DEVICE_TYPE_{ACCELERATOR, ALL, CPU},
CL_DEVICE_TYPE_{CUSTOM, DEFAULT, GPU}
cl_int clGetDeviceInfo (cl_device_id device,
cl_device_info param_name,
size_t param_value_size, void *param_value,
size_t *param_value_size_ret)
param_name: [Table 4.3]
CL_DEVICE_ADDRESS_BITS, CL_DEVICE_AVAILABLE,
CL_DEVICE_BUILT_IN_KERNELS,
CL_DEVICE_COMPILER_AVAILABLE,
CL_DEVICE_{DOUBLE, HALF, SINGLE}_FP_CONFIG,
CL_DEVICE_ENDIAN_LITTLE,
CL_DEVICE_EXTENSIONS,
CL_DEVICE_ERROR_CORRECTION_SUPPORT,
CL_DEVICE_EXECUTION_CAPABILITIES,
CL_DEVICE_GLOBAL_MEM_CACHE_{SIZE, TYPE},
CL_DEVICE_GLOBAL_MEM_{CACHELINE_SIZE, SIZE},
CL_DEVICE_GLOBAL_VARIABLE_PREFERRED_TOTAL_SIZE,
CL_DEVICE_IL_VERSION,
CL_DEVICE_IMAGE_MAX_{ARRAY, BUFFER}_SIZE,
CL_DEVICE_IMAGE_SUPPORT,
CL_DEVICE_IMAGE2D_MAX_{WIDTH, HEIGHT},
CL_DEVICE_IMAGE3D_MAX_{WIDTH, HEIGHT,
DEPTH},
CL_DEVICE_IMAGE_BASE_ADDRESS_ALIGNMENT,
CL_DEVICE_IMAGE_PITCH_ALIGNMENT,
CL_DEVICE_LINKER_AVAILABLE,
CL_DEVICE_LOCAL_MEM_{TYPE, SIZE},
CL_DEVICE_MAX_{CLOCK_FREQUENCY, PIPE_ARGS},
CL_DEVICE_MAX_{COMPUTE_UNITS, SAMPLERS},
CL_DEVICE_MAX_CONSTANT_{ARGS, BUFFER_SIZE},
CL_DEVICE_MAX_GLOBAL_VARIABLE_SIZE,
CL_DEVICE_MAX_{MEM_ALLOC, PARAMETER}_SIZE,
CL_DEVICE_MAX_NUM_SUB_GROUPS,
CL_DEVICE_MAX_ON_DEVICE_{QUEUES, EVENTS},
CL_DEVICE_MAX_{READ, WRITE}_IMAGE_ARGS,
CL_DEVICE_MAX_READ_WRITE_IMAGE_ARGS,
CL_DEVICE_MAX_SUB_GROUPS,
CL_DEVICE_MAX_WORK_GROUP_SIZE,
CL_DEVICE_MAX_WORK_ITEM_{DIMENSIONS, SIZES},
CL_DEVICE_MEM_BASE_ADDR_ALIGN,
CL_DEVICE_NAME,
CL_DEVICE_NATIVE_VECTOR_WIDTH_{CHAR, INT, DOUBLE, HALF, LONG, SHORT, FLOAT),
CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT,
©2017 Khronos Group - Rev. 0817
[n.n.n] and purple text: sections and text in the OpenCL API 2.2 Spec.
[n.n.n] and green text: sections and text in the OpenCL C++ 2.2 Spec.
[n.n.n] and brown text: sections and text in the OpenCL C 2.0 Spec.
[n.n.n] and blue text: sections and text in the OpenCL Extension 2.2 Spec.
Section and table references are to the OpenCL API 2.2 specification.
CL_DEVICE_{OPENCL_C_VERSION, PARENT_DEVICE},
CL_DEVICE_PARTITION_AFFINITY_DOMAIN,
CL_DEVICE_PARTITION_MAX_SUB_DEVICES,
CL_DEVICE_PARTITION_{PROPERTIES, TYPE},
CL_DEVICE_PIPE_MAX_ACTIVE_RESERVATIONS,
CL_DEVICE_PIPE_MAX_PACKET_SIZE,
CL_DEVICE_{PLATFORM, PRINTF_BUFFER_SIZE},
CL_DEVICE_PREFERRED_Y_ATOMIC_ALIGNMENT
(where Y may be LOCAL, GLOBAL, PLATFORM),
CL_DEVICE_PREFERRED_VECTOR_WIDTH_Z
(where Z may be CHAR, INT, DOUBLE, HALF, LONG,
SHORT, FLOAT),
CL_DEVICE_PREFERRED_INTEROP_USER_SYNC,
CL_DEVICE_PROFILE,
CL_DEVICE_PROFILING_TIMER_RESOLUTION,
CL_DEVICE_SPIR_VERSIONS,
CL_DEVICE_SUB_GROUP_INDEPENDENT_FORWARD_PROGRESS
CL_DEVICE_QUEUE_ON_{DEVICE, HOST}_PROPERTIES,
CL_DEVICE_QUEUE_ON_DEVICE_MAX_SIZE,
CL_DEVICE_QUEUE_ON_DEVICE_PREFERRED_SIZE,
CL_DEVICE_{REFERENCE_COUNT, VENDOR_ID},
CL_DEVICE_SVM_CAPABILITIES,
CL_DEVICE_TERMINATE_CAPABILITY_KHR,
CL_DEVICE_{TYPE, VENDOR},
CL_DEVICE_VENDOR_ID,
CL_{DEVICE, DRIVER}_VERSION,
CL_DEVICE_MAX_NAMED_BARRIER_COUNT_KHR
cl_int clGetDeviceAndHostTimer (cl_device_id device,
cl_ulong *device_timestamp,
cl_ulong *host_timestamp)
cl_int clGetHostTimer (cl_device_id device,
cl_ulong *host_timestamp)
Partitioning a device [4.3]
cl_int clCreateSubDevices (cl_device_id in_device,
const cl_device_partition_property *properties,
cl_uint num_devices, cl_device_id *out_devices,
cl_uint *num_devices_ret)
properties: [Table 4.4] CL_DEVICE_PARTITION_EQUALLY,
CL_DEVICE_PARTITION_BY_COUNTS,
CL_DEVICE_PARTITION_BY_AFFINITY_DOMAIN
The OpenCL Runtime
API calls that manage OpenCL objects such as commandqueues, memory objects, program objects, kernel objects
for __kernel functions in a program and calls that allow you to
enqueue commands to a command-queue such as executing a
kernel, reading, or writing a memory object.
Command queues [5.1]
cl_command_queue
clCreateCommandQueueWithProperties (
cl_context context, cl_device_id device,
const cl_command_queue_properties *properties,
cl_int *errcode_ret)
*properties: Points to a zero-terminated list of properties
and their values: [Table 5.1] CL_QUEUE_SIZE,
CL_QUEUE_PROPERTIES (bitfield which may be
set to an OR of CL_QUEUE_* where * may
be: OUT_OF_ORDER_EXEC_MODE_ENABLE,
PROFILING_ENABLE, ON_DEVICE[_DEFAULT]),
CL_QUEUE_THROTTLE_{HIGH, MED, LOW}_KHR
(requires the cl_khr_throttle_hint extension),
CL_QUEUE_PRIORITY_KHR (bitfield which may be
one of CL_QUEUE_PRIORITY_HIGH_KHR,
CL_QUEUE_PRIORITY_MED_KHR,
CL_QUEUE_PRIORITY_LOW_KHR
(requires the cl_khr_priority_hints extension))
cl_int clRetainDevice (cl_device_id device)
cl_int clReleaseDevice (cl_device_id device)
Contexts [4.4]
cl_context clCreateContext (
const cl_context_properties *properties,
cl_uint num_devices, const cl_device_id *devices,
void (CL_CALLBACK*pfn_notify)
(const char *errinfo, const void *private_info,
size_t cb, void *user_data),
void *user_data, cl_int *errcode_ret)
properties: [Table 4.5]
NULL or CL_CONTEXT_PLATFORM,
CL_CONTEXT_INTEROP_USER_SYNC,
CL_CONTEXT_{D3D10, D3D11}_DEVICE_KHR,
CL_CONTEXT_ADAPTER_{D3D9, D3D9EX}_KHR,
CL_CONTEXT_ADAPTER_DXVA_KHR,
CL_CONTEXT_MEMORY_INITIALIZE_KHR,
CL_CONTEXT_TERMINATE_KHR,
CL_GL_CONTEXT_KHR, CL_CGL_SHAREGROUP_KHR,
CL_{EGL, GLX}_DISPLAY_KHR, CL_WGL_HDC_KHR
cl_context clCreateContextFromType (
const cl_context_properties *properties,
cl_device_type device_type,
void (CL_CALLBACK *pfn_notify)
(const char *errinfo, const void *private_info,
size_t cb, void *user_data),
void *user_data, cl_int *errcode_ret)
properties: See clCreateContext
device_type: See clGetDeviceIDs
cl_int clRetainContext (cl_context context)
cl_int clReleaseContext (cl_context context)
cl_int clGetContextInfo (cl_context context,
cl_context_info param_name,
size_t param_value_size, void *param_value,
size_t *param_value_size_ret)
param_name:
CL_CONTEXT_X where X may be REFERENCE_COUNT,
DEVICES, NUM_DEVICES, PROPERTIES,
D3D10_PREFER_SHARED_RESOURCES_KHR,
D3D11_PREFER_SHARED_RESOURCES_KHR [Table 4.6]
cl_int clTerminateContextKHR (cl_context context)
Get CL extension function pointers [9.2]
void* clGetExtensionFunctionAddressForPlatform (
cl_platform_id platform, const char *funcname)
cl_int clSetDefaultDeviceCommandQueue (
cl_context context, cl_device_id device,
cl_command_queue command_queue)
cl_int clRetainCommandQueue (
cl_command_queue command_queue)
cl_int clReleaseCommandQueue (
cl_command_queue command_queue)
cl_int clGetCommandQueueInfo (
cl_command_queue command_queue,
cl_command_queue_info param_name,
size_t param_value_size, void *param_value,
size_t *param_value_size_ret)
param_name: [Table 5.2]
CL_QUEUE_CONTEXT,
CL_QUEUE_DEVICE[_DEFAULT], CL_QUEUE_SIZE,
CL_QUEUE_REFERENCE_COUNT,
CL_QUEUE_PROPERTIES
www.khronos.org/opencl
OpenCL API
OpenCLTM (Open Computing Language) is a multi-vendor
open standard for general-purpose parallel programming of
heterogeneous systems that include CPUs, GPUs, and other
processors. OpenCL provides a uniform programming environment
for software developers to write efficient, portable code for highperformance compute servers, desktop computer systems, and
handheld devices.
Specification documents and online reference are available at
www.khronos.org/opencl.
Page 2
Buffer Objects
Elements of buffer objects are stored sequentially and accessed using a pointer by a kernel executing
on a device.
OpenCL API
Create buffer objects [5.2.1]
cl_mem clCreateBuffer (
cl_context context, cl_mem_flags flags, size_t size,
void *host_ptr, cl_int *errcode_ret)
flags: [Table 5.3] CL_MEM_READ_WRITE, CL_MEM_{WRITE, READ}_ONLY,
CL_MEM_HOST_NO_ACCESS, CL_MEM_HOST_{READ, WRITE}_ONLY,
CL_MEM_{USE, ALLOC, COPY}_HOST_PTR
cl_mem clCreateSubBuffer (
cl_mem buffer, cl_mem_flags flags, cl_buffer_create_type buffer_create_type,
const void *buffer_create_info, cl_int *errcode_ret)
flags: See clCreateBuffer
buffer_create_type: CL_BUFFER_CREATE_TYPE_REGION
Read, write, copy, & fill buffer objects [5.2.2-3]
cl_int clEnqueueReadBuffer (
cl_command_queue command_queue, cl_mem buffer, cl_bool blocking_read,
size_t offset, size_t size, void *ptr, cl_uint num_events_in_wait_list,
const cl_event *event_wait_list, cl_event *event)
cl_int clEnqueueReadBufferRect (
cl_command_queue command_queue, cl_mem buffer, cl_bool blocking_read,
const size_t *buffer_origin, const size_t *host_origin, const size_t *region,
size_t buffer_row_pitch, size_t buffer_slice_pitch, size_t host_row_pitch,
size_t host_slice_pitch, void *ptr, cl_uint num_events_in_wait_list,
const cl_event *event_wait_list, cl_event *event)
cl_int clEnqueueWriteBuffer (
cl_command_queue command_queue, cl_mem buffer, cl_bool blocking_write,
size_t offset, size_t size, const void *ptr, cl_uint num_events_in_wait_list,
const cl_event *event_wait_list, cl_event *event)
cl_int clEnqueueWriteBufferRect (
cl_command_queue command_queue, cl_mem buffer, cl_bool blocking_write,
const size_t *buffer_origin, const size_t *host_origin, const size_t *region,
size_t buffer_row_pitch, size_t buffer_slice_pitch, size_t host_row_pitch,
size_t host_slice_pitch, const void *ptr, cl_uint num_events_in_wait_list,
const cl_event *event_wait_list, cl_event *event)
cl_int clEnqueueFillBuffer (
cl_command_queue command_queue, cl_mem buffer, const void *pattern,
size_t pattern_size, size_t offset, size_t size, cl_uint num_events_in_wait_list,
const cl_event *event_wait_list, cl_event *event)
Image Formats [5.3.1.1]
Supported combinations of image_channel_order and image_channel_data_type.
Built-in support [Table 5.8]
CL_R (read or write): CL_[HALF_]FLOAT, CL_UNORM_INT{8,16}, CL_SNORM_INT{8,16},
CL_SIGNED_INT{8,16,32}, CL_UNSIGNED_INT{8,16,32}
CL_DEPTH (read or write): CL_FLOAT, CL_UNORM_INT16
CL_DEPTH_STENCIL (read only): CL_FLOAT, CL_UNORM_INT24
(Requires the extension cl_khr_gl_depth_images)
CL_RG (read or write): CL_[HALF_]FLOAT, CL_UNORM_INT{8,16}, CL_SNORM_INT{8,16},
CL_SIGNED_INT{8,16,32}, CL_UNSIGNED_INT{8,16,32}
CL_RGBA (read or write): CL_[HALF_]FLOAT, CL_UNORM_INT{8,16}, CL_UNORM_INT_101010_2,
CL_SNORM_INT{8,16}, CL_SIGNED_INT{8,16,32}, CL_UNSIGNED_INT{8,16,32}
CL_BGRA (read or write): CL_UNORM_INT8
CL_sRGBA (read only): CL_UNORM_INT8
(Requires the extension cl_khr_srgb_image_writes)
Supported image channel order values [Table 5.6]
CL_R, CL_A (read and write): CL_[HALF_]FLOAT, CL_UNORM_INT{8,16},
CL_SIGNED_INT{8,16,32}, CL_UNSIGNED_INT{8,16,32}, CL_SNORM_INT{8,16}
CL_INTENSITY: CL_[HALF_]FLOAT, CL_UNORM_INT{8,16}, CL_SNORM_INT{8|16}
CL_DEPTH_STENCIL: Only used if extension cl_khr_gl_depth_images is enabled and
channel data type = CL_UNORM_INT24 or CL_FLOAT
CL_LUMINANCE: CL_UNORM_INT{8,16}, CL_[HALF_]FLOAT, CL_SNORM_INT{8,16}
CL_RG, CL_RA: CL_[HALF_]FLOAT, CL_UNORM_INT{8,16}, CL_SIGNED_INT{8,16, 32} ,
CL_UNSIGNED_INT{8,16,32}, CL_SNORM_INT{8,16}
CL_RGB: CL_UNORM_SHORT_{555,565} , CL_UNORM_INT_101010
CL_ARGB: CL_UNORM_INT8, CL_SIGNED_INT8,
CL_UNSIGNED_INT8, CL_SNORM_INT8
CL_BGRA: CL_{SIGNED, UNSIGNED}_INT8, CL_SNORM_INT8
©2017 Khronos Group - Rev. 0817
OpenCL 2.2 Reference Guide
cl_int clEnqueueCopyBuffer (
cl_command_queue command_queue, cl_mem src_buffer, cl_mem dst_buffer,
size_t src_offset, size_t dst_offset, size_t size, cl_uint num_events_in_wait_list,
const cl_event *event_wait_list, cl_event *event)
cl_int clEnqueueCopyBufferRect (
cl_command_queue command_queue, cl_mem src_buffer, cl_mem dst_buffer,
const size_t *src_origin, const size_t *dst_origin, const size_t *region,
size_t src_row_pitch, size_t src_slice_pitch, size_t dst_row_pitch,
size_t dst_slice_pitch, cl_uint num_events_in_wait_list,
const cl_event *event_wait_list, cl_event *event)
Map buffer objects [5.2.4]
void * clEnqueueMapBuffer (
cl_command_queue command_queue, cl_mem buffer, cl_bool blocking_map,
cl_map_flags map_flags, size_t offset, size_t size,
cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event,
cl_int *errcode_ret)
map_flags: CL_MAP_{READ, WRITE}, CL_MAP_WRITE_INVALIDATE_REGION
Image Objects
Items in blue apply when the appropriate extension is enabled.
Create image objects [5.3.1]
cl_mem clCreateImage (
cl_context context, cl_mem_flags flags, const cl_image_format *image_format,
const cl_image_desc *image_desc, void *host_ptr, cl_int *errcode_ret)
flags: See clCreateBuffer
Query list of supported image formats [5.3.2]
cl_int clGetSupportedImageFormats (
cl_context context, cl_mem_flags flags, cl_mem_object_type image_type,
cl_uint num_entries, cl_image_format *image_formats,
cl_uint *num_image_formats)
flags: See clCreateBuffer
image_type: CL_MEM_OBJECT_IMAGE{1D, 2D, 3D},
CL_MEM_OBJECT_IMAGE1D_BUFFER, CL_MEM_OBJECT_IMAGE{1D, 2D}_ARRAY
Read, write, copy, & fill image objects [5.3.3-4]
cl_int clEnqueueReadImage (
cl_command_queue command_queue, cl_mem image, cl_bool blocking_read,
const size_t *origin, const size_t *region, size_t row_pitch, size_t slice_pitch,
void *ptr, cl_uint num_events_in_wait_list, const cl_event *event_wait_list,
cl_event *event)
cl_int clEnqueueWriteImage (
cl_command_queue command_queue, cl_mem image, cl_bool blocking_write,
const size_t *origin, const size_t *region, size_t input_row_pitch,
size_t input_slice_pitch, const void *ptr, cl_uint num_events_in_wait_list,
const cl_event *event_wait_list, cl_event *event)
cl_int clEnqueueFillImage (
cl_command_queue command_queue, cl_mem image, const void *fill_color,
const size_t *origin, const size_t *region,cl_uint num_events_in_wait_list,
const cl_event *event_wait_list, cl_event *event)
cl_int clEnqueueCopyImage (
cl_command_queue command_queue, cl_mem src_image, cl_mem dst_image,
const size_t *src_origin, const size_t *dst_origin, const size_t *region,
cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event)
Copy between image & buffer objects [5.3.5]
cl_int clEnqueueCopyImageToBuffer (
cl_command_queue command_queue, cl_mem src_image, cl_mem dst_buffer,
const size_t *src_origin, const size_t *region, size_t dst_offset,
cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event)
cl_int clEnqueueCopyBufferToImage (
cl_command_queue command_queue, cl_mem src_buffer, cl_mem dst_image,
size_t src_offset, const size_t *dst_origin, const size_t *region,
cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event)
Map and unmap image objects [5.3.6]
void * clEnqueueMapImage (
cl_command_queue command_queue, cl_mem image, cl_bool blocking_map,
cl_map_flags map_flags, const size_t *origin, const size_t *region,
size_t *image_row_pitch, size_t *image_slice_pitch, cl_uint num_events_in_wait_list,
const cl_event *event_wait_list, cl_event *event, cl_int *errcode_ret)
map_flags: CL_MAP_{READ, WRITE}, CL_MAP_WRITE_INVALIDATE_REGION
Query image objects [5.3.7]
cl_int clGetImageInfo (
cl_mem image, cl_image_info param_name, size_t param_value_size,
void *param_value, size_t *param_value_size_ret)
param_name: [Table 5.10] CL_IMAGE_FORMAT, CL_IMAGE_{ARRAY, ELEMENT}_SIZE,
CL_IMAGE_{ROW, SLICE}_PITCH, CL_IMAGE_{HEIGHT, WIDTH, DEPTH},
CL_IMAGE_NUM_{SAMPLES, MIP_LEVELS},
CL_IMAGE_DX9_MEDIA_PLANE_KHR,
CL_IMAGE_{D3D10, D3D11}_SUBRESOURCE_KHR
www.khronos.org/opencl
OpenCL 2.2 Reference Guide
Pipes
Page 3
Memory Objects
A memory object is a handle to a reference counted region
of global memory. Includes buffer objects, image objects,
and pipe objects. Items in blue apply when the appropriate
extension is enabled.
Create pipe objects [5.4.1]
Memory objects [5.5.1, 5.5.2]
cl_mem clCreatePipe (cl_context context,
cl_mem_flags flags, cl_uint pipe_packet_size,
cl_uint pipe_max_packets,
const cl_pipe_properties *properties,
cl_int *errcode_ret)
flags: 0 or CL_MEM_{READ, WRITE}_ONLY,
CL_MEM_{READ_WRITE, HOST_NO_ACCESS}
Pipe object queries [5.4.2]
cl_int clRetainMemObject (cl_mem memobj)
cl_int clReleaseMemObject (cl_mem memobj)
cl_int clSetMemObjectDestructorCallback (
cl_mem memobj, void (CL_CALLBACK *pfn_notify)
(cl_mem memobj, void *user_data),
void *user_data)
param_name:
cl_int clEnqueueUnmapMemObject (
cl_command_queue command_queue,
cl_mem memobj, void *mapped_ptr,
cl_uint num_events_in_wait_list,
const cl_event *event_wait_list, cl_event *event)
Shared Virtual Memory [5.6]
Items in blue require the cl_khr_mipmap_image extension.
cl_int clGetPipeInfo (cl_mem pipe,
cl_pipe_info param_name, size_t param_value_size,
void *param_value, size_t *param_value_size_ret)
CL_PIPE_PACKET_SIZE, CL_PIPE_MAX_PACKETS
Shared Virtual Memory (SVM) allows the host and kernels
executing on devices to directly share complex, pointercontaining data structures such as trees and linked lists.
SVM sharing granularity
void* clSVMAlloc (
cl_context context, cl_svm_mem_flags flags,
size_t size, cl_uint alignment)
flags: [Table 5.14]
CL_MEM_READ_WRITE,
CL_MEM_{WRITE, READ}_ONLY,
CL_MEM_SVM_FINE_GRAIN_BUFFER,
CL_MEM_SVM_ATOMICS
void clSVMFree (cl_context context, void *svm_pointer)
Enqueuing SVM operations
cl_int clEnqueueSVMFree (
cl_command_queue command_queue,
cl_uint num_svm_pointers, void *sym_pointers[],
void (CL_CALLBACK*pfn_free_func)(
cl_command_queue command_queue,
cl_uint num_svm_pointers,
void *sym_pointers[], void *user_data),
void *user_data, cl_uint num_events_in_wait_list,
const cl_event *event_wait_list, cl_event *event)
cl_int clEnqueueSVMMemcpy (
cl_command_queue command_queue,
cl_bool blocking_copy, void *dst_ptr,
const void *src_ptr, size_t size,
cl_uint num_events_in_wait_list,
const cl_event *event_wait_list, cl_event *event)
cl_int clEnqueueSVMMemFill (
cl_command_queue command_queue,
void *svm_ptr, const void *pattern,
size_t pattern_size, size_t size,
cl_uint num_events_in_wait_list,
const cl_event *event_wait_list, cl_event *event)
cl_int clEnqueueSVMMap (
cl_command_queue command_queue,
cl_bool blocking_map, cl_map_flags map_flags,
void *svm_ptr, size_t size,
cl_uint num_events_in_wait_list,
const cl_event *event_wait_list, cl_event *event)
cl_int clEnqueueSVMUnmap (
cl_command_queue command_queue,
void *svm_ptr, cl_uint num_events_in_wait_list,
const cl_event *event_wait_list, cl_event *event)
cl_int clEnqueueSVMMigrateMem (
cl_command_queue command_queue,
cl_uint num_svm_pointers, const void **svm_pointers,
const size_t *sizes, cl_mem_migration_flags flags,
cl_uint num_events_in_wait_list,
const cl_event *event_wait_list, cl_event *event)
Sampler Objects [5.7]
cl_sampler
clCreateSamplerWithProperties (cl_context context,
const cl_sampler_properties *sampler_properties,
cl_int *errcode_ret)
sampler_properties: [Table 5.15]
CL_SAMPLER_NORMALIZED_COORDS,
CL_SAMPLER_{ADDRESSING, FILTER}_MODE,
CL_SAMPLER_MIP_FILTER_MODE,
CL_SAMPLER_LOD_{MIN, MAX}
cl_int clRetainSampler (cl_sampler sampler)
cl_int clReleaseSampler (cl_sampler sampler)
cl_int clGetSamplerInfo (cl_sampler sampler,
cl_sampler_info param_name,
size_t param_value_size, void *param_value,
size_t *param_value_size_ret)
param_name: CL_SAMPLER_REFERENCE_COUNT,
CL_SAMPLER_{CONTEXT, FILTER_MODE},
CL_SAMPLER_ADDRESSING_MODE,
CL_SAMPLER_NORMALIZED_COORDS [Table 5.16]
Program Objects
An OpenCL program consists of a set of kernels that are
identified as functions declared with the __kernel qualifier in
the program source.
Create program objects [5.8.1]
cl_program clCreateProgramWithSource (
cl_context context, cl_uint count, const char **strings,
const size_t *lengths, cl_int *errcode_ret)
cl_program clCreateProgramWithIL (cl_context context,
const void *il, size_t length, cl_int *errcode_ret)
cl_program clCreateProgramWithBinary (
cl_context context, cl_uint num_devices,
const cl_device_id *device_list, const size_t *lengths,
const unsigned char **binaries,
cl_int *binary_status, cl_int *errcode_ret)
cl_program clCreateProgramWithBuiltInKernels (
cl_context context, cl_uint num_devices,
const cl_device_id *device_list,
const char *kernel_names, cl_int *errcode_ret)
Retain and release program objects [5.8.2]
cl_int clRetainProgram (cl_program program)
cl_int clReleaseProgram (cl_program program)
cl_int clSetProgramReleaseCallback
(cl_program program, void (CL_CALLBACK*pfn_notify)
(cl_program prog, void *user_data),
void *user_data)
Set SPIR-V specialization constants [5.8.3]
cl_int clSetProgramSpecializationConstant (
cl_program program, cl_uint spec_id, size_t spec_size,
const void *spec_value)
Building program executables [5.8.4]
Flush and Finish [5.15]
cl_int clFlush (cl_command_queue command_queue)
cl_int clFinish (cl_command_queue command_queue)
©2017 Khronos Group - Rev. 0817
cl_int clBuildProgram (cl_program program,
cl_uint num_devices, const cl_device_id *device_list,
const char *options, void (CL_CALLBACK*pfn_notify)
(cl_program program, void *user_data),
void *user_data)
Migrate memory objects [5.5.4]
cl_int clEnqueueMigrateMemObjects (
cl_command_queue command_queue,
cl_uint num_mem_objects,
const cl_mem *mem_objects,
cl_mem_migration_flags flags,
cl_uint num_events_in_wait_list,
const cl_event *event_wait_list, cl_event *event)
flags: CL_MIGRATE_MEM_OBJECT_HOST,
CL_MIGRATE_MEM_OBJECT_CONTENT_UNDEFINED
Query memory object [5.5.5]
cl_int clGetMemObjectInfo (cl_mem memobj,
cl_mem_info param_name, size_t param_value_size,
void *param_value, size_t *param_value_size_ret)
param_name: [Table 5.13]
CL_MEM_{TYPE, FLAGS, SIZE, HOST_PTR},
CL_MEM_CONTEXT, CL_MEM_OFFSET,
CL_MEM_{MAP, REFERENCE}_COUNT,
CL_MEM_ASSOCIATED_MEMOBJECT,
CL_MEM_USES_SVM_ POINTER,
CL_MEM_{D3D10, D3D11}_RESOURCE_KHR,
CL_MEM_DX9_MEDIA_ADAPTER_TYPE_KHR,
CL_MEM_DX9_MEDIA_SURFACE_INFO_KHR
Sampler declaration fields [6.13.14.1]
The sampler can be passed as an argument to the kernel using
clSetKernelArg, or declared in the outermost scope of kernel
functions, or it can be a constant variable of type sampler_t
declared in the program source.
const sampler_t =
| |
normalized-mode:
CLK_NORMALIZED_COORDS_{TRUE, FALSE}
address-mode:
CLK_ADDRESS_X, where X may be NONE, REPEAT,
CLAMP, CLAMP_TO_EDGE, MIRRORED_REPEAT
filter-mode: CLK_FILTER_NEAREST, CLK_FILTER_LINEAR
Separate compilation and linking [5.8.5]
cl_int clCompileProgram (cl_program program,
cl_uint num_devices, const cl_device_id *device_list,
const char *options, cl_uint num_input_headers,
const cl_program *input_headers,
const char **header_include_names,
void (CL_CALLBACK*pfn_notify)
(cl_program program, void *user_data),
void *user_data)
cl_program clLinkProgram (cl_context context,
cl_uint num_devices, const cl_device_id *device_list,
const char *options, cl_uint num_input_programs,
const cl_program *input_programs,
void (CL_CALLBACK*pfn_notify)
(cl_program program, void *user_data),
void *user_data, cl_int *errcode_ret)
Unload the OpenCL compiler [5.8.8]
cl_int clUnloadPlatformCompiler (
cl_platform_id platform)
Query program objects [5.8.9]
cl_int clGetProgramInfo (cl_program program,
cl_program_info param_name,
size_t param_value_size, void *param_value,
size_t *param_value_size_ret)
param_name: [Table 5.17]
CL_PROGRAM_{IL, REFERENCE_COUNT},
CL_PROGRAM_{CONTEXT, NUM_DEVICES, DEVICES},
CL_PROGRAM_{SOURCE, BINARY_SIZES, BINARIES},
CL_PROGRAM_{NUM_KERNELS, KERNEL_NAMES},
CL_PROGRAM_SCOPE_GLOBAL_{C,D}TORS_PRESENT
cl_int clGetProgramBuildInfo (
cl_program program, cl_device_id device,
cl_program_build_info param_name,
size_t param_value_size, void *param_value,
size_t *param_value_size_ret)
param_name: [Table 5.18]
CL_PROGRAM_BINARY_TYPE,
CL_PROGRAM_BUILD_{STATUS, OPTIONS, LOG},
CL_PROGRAM_BUILD_GLOBAL_VARIABLE_TOTAL_SIZE
(Continued on next page >)
www.khronos.org/opencl
OpenCL API
A pipe is a memory object that stores data organized as a FIFO.
Pipe objects can only be accessed using built-in functions that
read from and write to a pipe. Pipe objects are not accessible
from the host.
Page 4
OpenCL 2.2 Reference Guide
Program Objects (continued)
Compiler options [5.8.6]
Preprocessor:
OpenCL API
(-D processed in order for clBuildProgram or
clCompileProgram)
-D name
-D name=definition
Math intrinsics:
-cl-single-precision-constant
-cl-denorms-are-zero
-cl-fp32-correctly-rounded-divide-sqrt
-I dir
Kernel Objects
A kernel object encapsulates the specific __kernel function and
the argument values to be used when executing it.
Create kernel objects [5.9.1]
cl_kernel clCreateKernel (cl_program program,
const char *kernel_name, cl_int *errcode_ret)
cl_int clCreateKernelsInProgram (cl_program program,
cl_uint num_kernels, cl_kernel *kernels,
cl_uint *num_kernels_ret)
Optimization options:
-cl-opt-disable
-cl-mad-enable
-cl-no-signed-zeros
-cl-finite-math-only
-cl-unsafe-math-optimizations -cl-fast-relaxed-math
-cl-uniform-work-group-size
Warning request/suppress:
-w
-Werror
Control OpenCL C and C++ language version:
-cl-std=CL1.1 OpenCL 1.1 specification
-cl-std=CL1.2 OpenCL 1.2 specification
-cl-std=CL2.0 OpenCL 2.0 specification
-cl-std=C++
OpenCL C++ specification
Query kernel argument information:
-cl-kernel-arg-info
cl_kernel clCloneKernel (cl_kernel source_kernel,
cl_int *errcode_ret)
cl_int clGetKernelInfo (cl_kernel kernel,
cl_kernel_info param_name,
size_t param_value_size, void *param_value,
size_t *param_value_size_ret)
param_name: [Table 5.20]
CL_KERNEL_{FUNCTION_NAME, NUM_ARGS},
CL_KERNEL_REFERENCE_COUNT,
CL_KERNEL_{ATTRIBUTES, CONTEXT, PROGRAM}
cl_int clSetKernelExecInfo (cl_kernel kernel,
cl_kernel_exec_info param_name,
size_t param_value_size, const void *param_value)
param_name: CL_KERNEL_EXEC_INFO_SVM_PTRS,
CL_KERNEL_EXEC_INFO_SVM_FINE_GRAIN_SYSTEM
cl_int clGetKernelWorkGroupInfo (cl_kernel kernel,
cl_device_id device,
cl_kernel_work_group_info param_name,
size_t param_value_size, void *param_value,
size_t *param_value_size_ret)
param_name: CL_KERNEL_GLOBAL_WORK_SIZE,
CL_KERNEL_[COMPILE_]WORK_GROUP_SIZE,
CL_KERNEL_{COMPILE, MAX}_NUM_SUB_GROUPS,
CL_KERNEL_{LOCAL, PRIVATE}_MEM_SIZE,
CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE
cl_int clGetKernelArgInfo (cl_kernel kernel,
cl_uint arg_indx, cl_kernel_arg_info param_name,
size_t param_value_size, void *param_value,
size_t *param_value_size_ret)
param_name: [Table 5.23] CL_KERNEL_ARG_NAME,
CL_KERNEL_ARG_{ACCESS, ADDRESS}_QUALIFIER,
CL_KERNEL_ARG_TYPE_{NAME, QUALIFIER}
Event Objects
cl_int clRetainEvent (cl_event event)
cl_int clRetainKernel (cl_kernel kernel)
cl_int clReleaseKernel (cl_kernel kernel)
Kernel arguments and queries [5.9.2-4]
cl_int clSetKernelArg (cl_kernel kernel,
cl_uint arg_index, size_t arg_size,
const void *arg_value)
cl_int clSetKernelArgSVMPointer (cl_kernel kernel,
cl_uint arg_index, const void *arg_value)
Event objects can be used to refer to a kernel execution
command, and read, write, map, and copy commands on
memory objects or user events.
Event objects [5.11]
cl_event clCreateUserEvent (
cl_context context, cl_int *errcode_ret)
cl_int clSetUserEventStatus (
cl_event event, cl_int execution_status)
cl_int clWaitForEvents (cl_uint num_events,
const cl_event *event_list)
cl_int clGetEventInfo (cl_event event,
cl_event_info param_name, size_t param_value_size,
void *param_value, size_t *param_value_size_ret)
param_name: [Table 5.24]
CL_EVENT_COMMAND_{QUEUE, TYPE},
CL_EVENT_{CONTEXT, REFERENCE_COUNT},
CL_EVENT_COMMAND_EXECUTION_STATUS
cl_int clReleaseEvent (cl_event event)
cl_int clSetEventCallback (cl_event event,
cl_int command_exec_callback_type,
void (CL_CALLBACK *pfn_event_notify)
(cl_event event, cl_int event_command_exec_status,
void *user_data), void *user_data)
Markers, barriers, & waiting for events [5.12]
cl_int clEnqueueMarkerWithWaitList (
cl_command_queue command_queue,
cl_uint num_events_in_wait_list,
const cl_event *event_wait_list, cl_event *event)
cl_int clEnqueueBarrierWithWaitList (
cl_command_queue command_queue,
cl_uint num_events_in_wait_list,
const cl_event *event_wait_list, cl_event *event)
Memory Model: SVM [3.3.3]
OpenCL extends the global memory region into host memory through a
shared virtual memory (SVM) mechanism. Three types of SVM in OpenCL:
Debugging options:
-g Generate additional errors for built-in functions
that allow you to enqueue commands on a device
SPIR binary options:
Requires the cl_khr_spir extension.
-x spir
-spir-std=x
Indicate that binary is in SPIR format
x is SPIR spec version, e.g.: 1.2
Double and half-precision floating-point in C++:
-cl-fp16-enable Enable full half data type support
cl_khr_fp16 macro.
-cl-fp64-enable Enable full half data type support
cl_khr_fp64 macro.
Linker options [5.8.7]
Library linking options:
-create-library
-enable-link-options
Program linking options:
-cl-denorms-are-zero
-cl-no-signed-zeroes
-cl-finite-math-only
-cl-fast-relaxed-math
-cl-unsafe-math-optimizations
cl_int clGetKernelSubGroupInfo (
cl_kernel kernel, cl_device_id device,
cl_kernel_sub_group_info param_name,
size_t input_value_size, const void *input_value,
size_t param_value_size, void *param_value,
size_t *param_value_size_ret)
param_name: [Table 5.22]
CL_KERNEL_LOCAL_SIZE_FOR_SUB_GROUP_COUNT,
CL_KERNEL_MAX_SUB_GROUP_SIZE_FOR_NDRANGE,
CL_KERNEL_SUB_GROUP_COUNT_FOR_NDRANGE
Execute kernels [5.10]
cl_int clEnqueueNDRangeKernel (
cl_command_queue command_queue,
cl_kernel kernel, cl_uint work_dim,
const size_t *global_work_offset,
const size_t *global_work_size,
const size_t *local_work_size,
cl_uint num_events_in_wait_list,
const cl_event *event_wait_list, cl_event *event)
cl_int clEnqueueNativeKernel (
cl_command_queue command_queue,
void (CL_CALLBACK *user_func)(void *), void *args,
size_t cb_args, cl_uint num_mem_objects,
const cl_mem *mem_list, const void **args_mem_loc,
cl_uint num_events_in_wait_list,
const cl_event *event_wait_list, cl_event *event)
Profiling operations [5.14]
cl_int clGetEventProfilingInfo (cl_event event,
cl_profiling_info param_name,
size_t param_value_size, void *param_value,
size_t *param_value_size_ret)
param_name: [Table 5.25]
CL_PROFILING_COMMAND_{COMPLETE, QUEUED},
CL_PROFILING_COMMAND_{SUBMIT, START, END}
•
Coarse-Grained buffer SVM: (Required) Sharing at the granularity of regions of OpenCL buffer memory objects.
•
Fine-Grained buffer SVM: (Optional) Sharing occurs at the granularity of individual loads/stores into bytes within
OpenCL buffer memory objects.
•
Fine-Grained system SVM: Sharing occurs at the granularity of individual loads/stores into bytes occurring
anywhere within the host memory.
Summary of SVM options in OpenCL [3.3.3, Table 3-2]
SVM
Granularity of sharing
Memory allocation
Mechanisms to enforce consistency
Explicit updates between host and device?
Host synchronization points on the same
or between devices.
Host synchronization points between
devices
Synchronization points plus atomics (if
supported)
Synchronization points plus atomics (if
supported)
Yes, through Map and Unmap commands.
Non-SVM buffers
OpenCL Memory objects (buffer)
clCreateBuffer
Coarse-Grained buffer SVM
OpenCL Memory objects (buffer)
clSVMAlloc
Fine Grained buffer SVM
Bytes within OpenCL Memory objects (buffer)
clSVMAlloc
Fine-Grained system SVM
Bytes within Host memory (system)
Host memory allocation
mechanisms (e.g. malloc)
©2017 Khronos Group - Rev. 0817
Yes, through Map and Unmap commands.
No
No
www.khronos.org/opencl
OpenCL 2.2 Reference Guide
OpenCL C++ Language
OpenCL C++ Language Reference
Supported Data Types [3.1]
Header
cl_* types have exactly the same size as their host counterparts
defined in file. Half types require cl_khr_fp16.
Double types require that cl_khr_fp64 be enabled and that
CL_DEVICE_DOUBLE_FP_CONFIG is not zero.
Built-in scalar data types
API Type
-cl_char
cl_uchar
cl_short
cl_ushort
cl_int
cl_uint
cl_long
cl_ulong
cl_float
cl_double
cl_half
void
Description
true (1) or false (0)
8-bit signed
8-bit unsigned
16-bit signed
16-bit unsigned
32-bit signed
32-bit unsigned
64-bit signed
64-bit unsigned
32-bit float
64-bit IEEE 754
16-bit float (storage only)
empty set of values
Built-in vector data types
n is 2, 3, 4, 8, or 16. The halfn vector data type is required to be
supported as a data storage format.
OpenCL Type
booln
[u]charn
[u]shortn
[u]intn
[u]longn
floatn
doublen
halfn
API Type
Description
cl_[u]charn
cl_ [u]shortn
cl_ [u]intn
cl_ [u]longn
cl_floatn
cl_doublen
cl_ halfn
8-bit [un]signed
16-bit [un]signed
32-bit [un]signed
64-bit [un]signed
32-bit float
64-bit float
16-bit float
Other types [3.7.1, 3.8.1]
Section and table references are to the OpenCL 2.2 C++ Language specification.
half wrapper [3.6.1]
Header OpenCL C++ implements a wrapper
class for the built-in half data type. The class methods perform
implicit vload_half and vstore_half operations from Vector Data
Load and Store Functions section.
Constructs an object with a
fp16(const half &r) noexcept;
half built-in type.
Constructs an object with a
fp16(const float &r) noexcept;
float built-in type.
fp16(const double &r) noexcept; Constructs an object with a
double built-in type.
ndrange [3.13.6]
Header The ndrange type is used to
represent the size of the enqueued workload with a dimension
from 1 to 3.
struct ndrange {
explicit ndrange(size_t global_work_size) noexcept;
ndrange(size_t global_work_size, size_t local_work_size
noexcept;
ndrange(size_t global_work_offset, size_t global_work_size,
size_t local_work_size) noexcept;
template
ndrange(const size_t (&global_work_size)[N]) noexcept;
template
ndrange(const size_t (&global_work_size)[N],
const size_t (&global_work_size)[N]) noexcept;
template
ndrange(const size_t (&global_work_offset)[N],
const size_t (&global_work_size)[N],
const size_t (&global_work_size)[N]) noexcept;
};
Example
#include
#include
using namespace cl;
kernel void foo(device_queue q) {
q.enqueue_kernel(cl::enqueue_policy::no_wait, cl::ndrange( 1 ),
[](){ uint tid = get_global_id(0); } );
}
Header
Image and sampler types require CL_DEVICE_IMAGE_SUPPORT
is CL_TRUE. See header for pipe type. See
header for device_queue type.
Preprocessor Directives & Macros [2.7]
Type in OpenCL C++
cl::sampler
cl::image[1d, 2d, 3d]
cl::image1d_[buffer, array]
cl::image2d_ms
cl::image2d_array[_ms]
cl::image2d_depth[_ms]
cl::image2d_array_depth[_ms]
cl::pipe
cl::device_queue
__FILE__
Current source file
__LINE__
Integer line number
__OPENCL_CPP_VERSION__
Integer version number, e.g: 100
__func__
Current function name
API type for application
cl_sampler
cl_image
cl_pipe
cl_queue
Qualifiers and Optional Attributes
Function Qualifier [2.6.1]
__kernel, kernel
Type and Variable Attributes [2.8]
#pragma OPENCL FP_CONTRACT on-off-switch
on-off-switch: ON, OFF, or DEFAULT
#pragma OPENCL EXTENSION extensionname : behavior
#pragma OPENCL EXTENSION all : behavior
Conversions and Reinterpretation
[[cl::required_num_sub_groups(X)]]
The number of sub-groups that must be generated by a kernel
launch.
[[cl::vec_type_hint()]]
A hint to the compiler as a representation of the computational
width of the kernel.
[[cl::aligned(X)]]
[[cl::aligned]]
Specifies a minimum alignment (in bytes) for variables of the
Kernel Parameter Attribute [2.8.4]
specified type.
[[cl::max_size(n)]]
[[cl::packed]]
Specifies that each member of the structure or union is placed to The value of the attribute specifies the maximum size in bytes of
the corresponding memory object.
minimize the memory required.
Kernel Function Attributes [2.8.3]
Loop Attributes [2.8.5]
[[cl::required_work_group_size(X, Y, Z)]]
The work-group size that must be used as the local_work_size
argument to clEnqueueNDRangeKernel.
[[cl::ivdep(len)]]
[[cl::ivdep]]
A hint to indicate that the compiler may assume there are
no memory dependencies across loop iterations in order to
autovectorize consecutive iterations of loop.
[[cl::work_group_size_hint(X, Y, Z)]]
A hint to the compiler to specify the value most likely
to be specified by the local_work_size argument to
clEnqueueNDRangeKernel.
©2017 Khronos Group - Rev. 0817
OpenCL C++ and C++ 14
The OpenCL C++ programming language is based on the
ISO/IEC JTC1 SC22 WG21 N3690 language (a.k.a. C++14)
specification with specific restrictions and exceptions.
Section numbers denoted here with § refer to the C++ 14
specification.
• Implicit conversions for pointer types follow the rules
described in the C++ 14 specification.
• Conversions between integer types follow the conversion
rules specified in the C++14 specification except for
specific out-of-range behavior and saturated conversions.
• The preprocessing directives defined by the C++14
specification are supported.
• Macro names defined by the C++14 specification but not
currently supported by OpenCL are reserved for future
use.
• OpenCL C++ standard library implements modified version
of the C++ 14 numeric limits library.
• OpenCL C++ implements the following parts of the C++ 14
iterator library: Primitives, iterator operations, predefined
iterators, and range access.
• The OpenCL C++ kernel language doesn’t support variadic
functions and variable length arrays.
• OpenCL C++ library implements most of the C++14 tuples
except for allocator related traits (§ 20.4.2.8).
• OpenCL C++ supports type traits defined in the C++ 14
specification with additions and changes to the following:
°° UnaryTypeTraits (§ 3.15.1)
°° BinaryTypeTraits (§ 3.15.2)
°° TransformationTraits (§ 3.15.3)
• OpenCL C++ standard library implements most C++ 14
tuples excluding allocator related traits.
• C++14 features not supported by OpenCL C++:
°° the dynamic_cast operator (§ 5.2.7)
°° type identification (§ 5.2.8)
°° recursive function calls (§ 5.2.2, item 9) unless they are a
compile-time constant expression
°° non-placement new and delete operators (§ 5.3.4, 5.3.5)
°° goto statement (§ 6.6)
°° register and thread_local storage qualifiers (§ 7.1.1)
°° virtual function qualifier (§ 7.1.2)
°° function pointers (§ 8.3.5, 8.5.3) unless they are a
compile-time constant expression
°° virtual functions and abstract classes (§ 10.3, 10.4)
°° exception handling (§ 15)
°° the C++ standard library (§ 17 … 30)
°° asm declaration (§ 7.4)
°° no implicit lambda to function pointer conversion (§ 5.1.2)
[[cl::unroll_hint(n)]]
[[cl::unroll_hint]]
Used to specify that a loop (for, while, and do loops) can be
unrolled.
Header
Conversion types [3.2]
Conversions are available for the scalar types bool, char,
uchar, short, ushort, int, uint, long, ulong, half (if cl_khr_fp16
extension is enabled), float, double (if cl_khr_fp64 is enabled),
and derived vector types.
template
T convert_cast(U const& arg);
template
T convert_cast(T const& arg);
// and more...
Rounding modes [3.2.3]
::rte
::rtp
to nearest even
toward + infinity
::rtz toward zero
::rtn toward - infinity
Reinterpreting types [3.3]
Header
Supported data types except bool and void may be
reinterpreted as another data type of the same size using the
as_type function for scalar and vector data types.
template
T as_type(U const& arg);
www.khronos.org/opencl
OpenCL C++ Language
OpenCL Type
bool
char
unsigned char, uchar
short
unsigned short, ushort
int
unsigned int, uint
long
unsigned long, ulong
float
double
half
void
Page 5
Page 6
OpenCL C++ Language
OpenCL 2.2 Reference Guide
Vector Component Addressing [2.1.2.3]
Vector Addressing Equivalences
Vector Components
float4 v;
0
v.x, v.r,
v.s0
v.x, v.r,
v.s0
v.x, v.r,
v.s0
1
v.y, v.g,
v.s1
v.y, v.g,
v.s1
v.y, v.g,
v.s1
float8 v;
v.s0
v.s1
v.s2
float16 v;
v.s0
v.s1
v.s2
float2 v;
float3 v;
2
3
4
5
7
v.s3
v.s4
v.s5
v.s6 v.s7
v.s3
v.s4
v.s5
v.s6 v.s7
8
9
10
11
12
13
14
15
v.z, v.b,
v.s2
v.z, v.b, v.w, v.a,
v.s2
v.s3
Address Spaces Library
Header
Explicit address space storage classes [3.4.2]
global class
Can only be used to declare variables at program scope, with
static specifier, extern specifier, or passed as a kernel argument.
OpenCL C++ Language
6
local class
Can only be used to declare variables at kernel function scope,
program scope, with static keyword, extern specifier, or passed
as a kernel argument.
priv class
Cannot be used to declare variables in the program scope, with
static specifier, or extern specifier.
constant class
Can only be used to declare variables at program scope, with
static specifier, extern specifier, or passed as a kernel argument.
Explicit address space pointer classes [3.4.3]
v.sb, v.sc, v.sd, v.se, v.sf,
v.s8 v.s9 v.sa,
v.sA v.sB v.sC v.sD v.sE v.sF
bool operator!=(nullptr_t, Q_ptr global_ptr & x) noexcept;
bool operator<(const Q_ptr &x, nullptr_t) noexcept;
bool operator<(nullptr_t, const Q_ptr & x) noexcept;
bool operator>(const Q_ptr &x, nullptr_t) noexcept;
bool operator>(nullptr_t, const Q_ptr & x) noexcept;
bool operator<=(const Q_ptr &x, nullptr_t) noexcept;
bool operator<=(nullptr_t, const Q_ptr & x) noexcept;
bool operator>=(const Q_ptr &x, nullptr_t) noexcept;
bool operator>=(nullptr_t, const Q_ptr & x) noexcept;
void swap(Q_ptr& a, Q_ptr& b) noexcept;
Pointer class constructors [3.4.3.5]
Q may be global, local, private, or constant.
Construct an object which
points to nothing
Construct an object which
points to p
constexpr Q_ptr() noexcept;
explicit Q_ptr(pointer p) noexcept;
Q_ptr(const Q_ptr &) noexcept;
Copy constructor
The explicit address space pointer classes can be converted to
and from pointers with compatible address spaces, qualifiers,
Move constructor
Q_ptr(Q_ptr &&r) noexcept;
and types. Local, global, and private pointers can be converted to
an object
standard C++ pointers.
constexpr Q_ptr(nullptr_t) noexcept; Construct
initialized with nullptr
typedef T element_type;
typedef ptrdiff_t difference_type;
Pointer class assignment operators [3.4.3.6]
typedef add_global_t& reference;
Q may be global, local, private, or constant.
typedef const add_global_t& const_reference;
Q_ptr &operator=(const Q_ptr &r) Copy assignment operator
typedef add_global_t* pointer;
noexcept;
typedef const add_global_t* const_pointer;
Q_ptr &operator=(Q_ptr &&r)
Move assignment operator
The following pointer classes are defined in the header file
noexcept;
:
Q_ptr &operator=(pointer r)
Assign r pointer to the
noexcept;
stored pointer
template class global_ptr
Q_ptr &operator=(nullptr_t)
Assign nullptr to the stored
template class local_ptr
noexcept;
pointer
template class private_ptr
template class constant_ptr
Non-member functions [3.4.3.9]
In each of the partial declarations below, the placeholder Q may
be replaced with global, local, private, or constant. The omitted
initial part of each declaration is:
template
bool operator==(const Q_ptr &a, const Q_ptr &b)
noexcept;
bool operator!=(const OP_ptr &a, const Q_ptr &b)
noexcept;
bool operator<(const Q_ptr &a, const Q_ptr &b)
noexcept;
bool operator>(const Q_ptr &a, const Q_ptr &b)
noexcept;
bool operator<=(const Q_ptr &a, const Q_ptr &b)
noexcept;
bool operator>=(const Q_ptr &a, const Q_ptr &b)
noexcept;
In each of the partial declarations below, the omitted initial part
of the declaration is:
template
bool operator==(const Q_ptr &x, nullptr_t) noexcept;
bool operator==(nullptr_t, const Q_ptr &x) noexcept;
bool operator!=(const Q_ptr &x, nullptr_t) noexcept;
Pointer class observers [3.4.3.7]
Q may be global, local, private, or constant.
add_lvalue_reference_t> Return *get()
operator*() const noexcept;
pointer operator->() const noexcept; Return get()
reference operator[](size_t pos)
const noexcept;
Return get()[pos]
pointer get() const noexcept;
Return the stored pointer
explicit operator bool()
const noexcept;
Return get()!=nullptr
Pointer class modifiers [3.4.3.8]
Q may be global, local, private, or constant.
pointer release() noexcept;
Assign nullptr to the stored
pointer, returns the value
that get() had at start
void reset(pointer p = pointer())
noexcept;
void reset(pointer p) noexcept;
Assign p to the stored pointer
void reset(nullptr_t p = nullptr)
noexcept;
Equivalent to reset(pointer())
void swap(Q_ptr& r) noexcept;
Invokes swap on the stored
pointers.
Numeric indices are preceded by the letter s. Swizzling,
duplication, and nesting are allowed, e.g.: v.yx, v.xx, v.lo.x
v.lo
v.hi
v.odd
v.even
v.x, v.s0
v.y, v.s1
v.y, v.s1
v.x, v.s0
float2
v.s01, v.xy v.s23, v.zw v.s13, v.yw v.s02, v.xz
float3 *
v.s01, v.xy v.s23, v.zw v.s13, v.yw v.s02, v.xz
float4
v.s0123
v.s4567
v.s1357
v.s0246
float8
float16 v.s01234567 v.s89abcdef v.s13579bdf v.s02468ace
*When using .lo or .hi with a 3-component vector,
the .w component is undefined.
Q_ptr &operator++() noexcept;
Q_ptr &operator--() noexcept;
Prefix [in/de]crement stored
pointer by one.
Q_ptr operator++(int) noexcept;
Q_ptr operator--(int) noexcept;
Postfix [in/de]crement stored
pointer by one.
Q_ptr &operator+=(
difference_type r) noexcept;
Q_ptr &operator-=(
difference_type r) noexcept;
Adds r to the stored pointer
and returns *this.
Subtracts r to the stored
pointer and returns *this.
Q_ptr operator+(
difference_type r) noexcept;
[Adds/subtracts] r to the
stored pointer and returns
the value *this has at the
start of the operation.
Q_ptr operator-(
difference_type r) noexcept;
Other address space functions [3.4.4]
template
mem_fence get_mem_fence (
T *ptr);
Return the mem_fence value
for ptr.
template
T dynamic_as_cast(U *ptr);
Returns a pointer to a region
in the address space pointer
class specified in T
Atomic Operations Library [3.24]
Header
template struct atomic;
template<> struct atomic;
template struct atomic;
enum memory_order
memory_order_x where x may be relaxed, acquire,
acq_rel, seq_cst, release
enum memory_scope
memory_scope_x where x may be work_item,
work_group, sub_group, all_svm_devices, device
Atomic types [3.24.4]
Combined members from struct atomic, including
specializations for integers (atomic) and pointers
(atomic). For struct atomic, replace T with
integral. For struct atomic, replace T with T*.
The pointer specialization is available if
__INTPTR_WIDTH__== 32, or both extensions
cl_khr_int64_[base, extended]_atomics are enabled and
__INTPTR_WIDTH__== 64.
bool is_lock_free() const [volatile] noexcept;
void store(T, memory_order = memory_order_seq_cst,
memory_scope = memory_scope_device) [volatile] noexcept;
T load(memory_order = memory_order_seq_cst,
memory_scope = memory_scope_device)
const [volatile] noexcept;
operator T() const [volatile] noexcept;
T exchange(T, memory_order = memory_order_seq_cst,
memory_scope = memory_scope_device) [volatile] noexcept;
bool compare_exchange_[weak, strong](T&, T, memory_order,
memory_order, memory_scope) [volatile] noexcept;
bool compare_exchange_[weak, strong](T&, T, memory_order
= memory_order_seq_cst,
memory_scope = memory_scope_device) [volatile] noexcept;
atomic() noexcept = default;
constexpr atomic(T) noexcept;
T operator=(T) [volatile] noexcept;
(Continued on next page >)
©2017 Khronos Group - Rev. 0817
www.khronos.org/opencl
OpenCL 2.2 Reference Guide
Atomic Operations Library (continued)
Members available in specializations atomic and
atomic. For struct atomic, replace T with integral,
and for struct atomic, replace T with T*. op may be one of
add, sub, and, or, xor, min, or max.
T fetch_op(T, memory_order = memory_order_seq_cst,
memory_scope = memory_scope_device) [volatile] noexcept;
Ti operator[++, --]([int]) [volatile] noexcept;
Ti operator[+, -, &, |, ^]=(Ti) [volatile] noexcept;
Atomic types
Pointer specializations indicated with a dot are available when
these extensions are enabled:
cl_khr_fp64
both cl_khr_int64_[base, extended]_atomics
••
using atomic_[u]int = atomic<[u]int>;
using atomic_float = atomic;
using atomic_[u]long = atomic<[u]long>;
&& using atomic_double = atomic;
OpenCL C++ Language
Available if __INTPTR_WIDTH__== 32, or both extensions
cl_khr_int64_[base, extended]_atomics are enabled and
__INTPTR_WIDTH__== 64.
using atomic_intptr_t = atomic;
using atomic_uintptr_t = atomic;
Available if __SIZE_WIDTH__== 32, or both extensions
cl_khr_int64_[base, extended]_atomics are enabled and
__SIZE_WIDTH__== 64:
using atomic_size_t = atomic;
Available if __PTRDIFF_WIDTH__ == 32, or both extensions
cl_khr_int64_[base, extended]_atomics are enabled and
__PTRDIFF_WIDTH__ == 64:
using atomic_ptrdiff_t = atomic;
Members of struct atomic_flag:
atomic_flag() noexcept = default;
bool test_and_set(memory_order = memory_order_seq_cst,
memory_scope = memory_scope_device) [volatile] noexcept;
void clear(memory_order = memory_order_seq_cst,
memory_scope = memory_scope_device) [volatile] noexcept;
Images and Samplers Library [3.11]
Image dimension [3.11.5]
Header
struct sampler;
template constexpr sampler make_sampler();
template struct image;
Image types [3.11.2]
T is the type of value returned when reading or sampling from
given image, or the type of color used to write to image.
using image1d = image;
using image1d_buffer = image;
using image1d_array = image;
using image2d = image;
using image2d_depth = image;
using image2d_array = image;
using image2d_array_depth = image;
using image3d = image;
The extensions cl_khr_gl_msaa_sharing and
cl_khr_gl_depth_images add the following functions.
using image2d_ms = image;
using image2d_array_ms = image;
using image2d_depth_ms = image;
using image2d_array_depth_ms = image;
Image element types [3.11.4]
In OpenCL terminology, images are classified as depth images,
which have the Depth template parameter set to true, or
normal images, which have the Depth template parameter set
to false. Half types are only available if cl_khr_fp16 extension
is enabled.
Non-multisample depth image types: float, half
depth images For multi-sample 2D and multi-sample 2D
array images, only valid type: float
Valid types: float4, int4, uint4, and half4
normal images For multi-sample 2D and multi-sample 2D
array images, only valid types: float4, int4
and uint4
template
struct image_dim_num;
enum image_dim
image_1d, image_2d, image_3d, image_buffer
Members of class image
Members indicated with a dot are available when these
extensions are enabled:
• cl_khr_mipmap_image[_writes]
• cl_khr_gl_msaa_sharing and cl_khr_gl_depth_images
For images specified with image_dim::image1d and
image_dim::buffer
int width() const noexcept;
int width(float lod) const noexcept;
•
For images specified with image_dim::image2d
int [width, height]() const noexcept;
int [width, height](float lod) const noexcept;
int num_samples() const noexcept;
••
For images specified with image_dim::image3d
int [width, height, depth]() const noexcept;
int [width, height, depth](float lod) const noexcept;
•
For arrayed images
int array_size() const noexcept;
int array_size(int lod) const noexcept;
•
Image access [3.11.6]
enum image_access
sample, read, write, read_write
Members of class image
The non-multisample image template class specializations
present different sets of methods based on their access
parameter. Members indicated with a dot are available when
these extensions are enabled:
• cl_khr_mipmap_image
• cl_khr_mipmap_image_writes
• cl_ khr_gl_msaa_sharing and cl_khr_gl_depth_images
For images specified with image_access::read
element_type image::read(integer_coord coord) const noexcept;
pixel image::operator[](integer_coord coord) const noexcept;
element_type image::pixel::operator element_type()
const noexcept;
element_type image::read(integer_coord coord, int sample)
noexcept;
•
Non-member functions:
bool atomic_flag_test_and_set([volatile]atomic_flag*) noexcept;
bool atomic_flag_test_and_set_explicit([volatile]atomic_flag*,
memory_order, memory_scope) noexcept;
void atomic_flag_clear([volatile]atomic_flag*) noexcept;
void atomic_flag_clear_explicit([volatile]atomic_flag*,
memory_order, memory_scope) noexcept;
Fences [3.24.6]
void atomic_fence(mem_fence flags,
memory_order order, memory_scope scope) noexcept;
flags: mem_fence::global, mem_fence::local,
mem_fence::image or a combination of these values
ORed together
scope: memory_scope_x where x may be all_svm_devices,
device, work_group, sub_group, work_item
For images specified with image_access::write
void image::write(integer_coord coord, element_type color)
noexcept;
image::pixel image::operator[](integer_coord coord) noexcept;
image::pixel & image::pixel::operator=(element_type color)
noexcept;
void image::write(integer_coord coord, element_type color,
int lod) noexcept;
•
For images specified with image_access::read_write
element_type image::read(integer_coord coord) const noexcept;
void image::write(integer_coord coord, element_type color)
noexcept;
image::pixel image::operator[](integer_coord coord) noexcept;
element_type image::pixel::operator element_type(
) const noexcept;
image::pixel & image::pixel::operator=(element_type color)
noexcept;
For images specified with image_access::sample
element_type image::read(integer_coord coord) const noexcept;
element_type image::sample(const sampler &s,
integer_coord coord) const noexcept;
element_type image::sample(const sampler &s,
float_coord coord) const noexcept;
image::pixel image::operator[](integer_coord coord)
const noexcept;
element_type image::pixel::operator element_type(
) const noexcept;
element_type image::sample(const sampler &s,
float_coord coord, float lod) const noexcept;
element_type image::sample(const sampler &s,
integer_coord coord, gradient_coord gradient_x,
gradient_coord gradient_y) const noexcept;
•
•
Common image methods [3.11.7]
Each image type implements this set of common members.
Member indicated with a dot is available when these extensions
are enabled:
• cl_khr_mipmap_image[_writes]
image_channel_type image::data_type() const noexcept;
image_channel_order image::order() const noexcept;
int image::miplevels() const noexcept;
•
enum image_channel_type
snorm_int8, snorm_int16, unorm_int8,
unorm_int16, unorm_int24,
unorm_short_565, unorm_short_555,
unorm_short_101010, unorm_short_101010_2, sint8,
sint16, sint32, uint8, uint16, uint32,
float16, float32
enum image_channel_order
a, r, rx, rg, rgx, ra, rgb, rgbx, rgba, argb, bgra, intensity,
luminance, abgr, srgb, srgbx, srgba, sbgra,
depth, depth_stencil
(Continued on next page >)
©2017 Khronos Group - Rev. 0817
www.khronos.org/opencl
OpenCL C++ Language
•
• •
Page 7
Page 8
OpenCL C++ Language
Images and Samplers Library (continued)
Other image methods [3.11.8]
•
Members indicated with a dot are available when these
extensions are enabled:
cl_khr_mipmap_image
•cl_khr_mipmap_image
cl_khr_mipmap_image_writes
•cl_ khr_gl_msaa_sharingor and
cl_khr_gl_depth_images
•
element_type image::sample(const sampler &s,
float_coord coord) const noexcept;
element_type image::sample(const sampler &s,
integer_coord coord) const noexcept;
element_type image::sample(const sampler &s,
float_coord coord, float lod) const noexcept;
element_type image::sample(const sampler &s,
float_coord coord, gradient_coord gradient_x,
gradient_coord gradient_y) const noexcept;
element_type image::read(integer_coord coord) const noexcept;
void image::read(integer_coord coord, int sample) noexcept;
OpenCL C++ Language
pixel operator[](integer_coord coord) noexcept;
pixel operator[](integer_coord coord) const noexcept;
element_type pixel::operator element_type() const noexcept;
pixel & pixel::operator=(element_type color) noexcept;
int width() const noexcept;
int width(int lod) const noexcept;
•
•
•
void image::write(integer_coord coord, element_type color)
noexcept;
void image::write(integer_coord coord, element_type color,
int lod) noexcept;
•
int height() const noexcept;
• int height(int lod) const noexcept;
int depth() const noexcept;
• int depth(int lod) const noexcept;
enum class pipe_access { read, write };
template struct pipe;
template pipe
class pipe methods [3.8.4]
read
write
bool write(const T& ref) noexcept;
read
reservation
reserve(uint num_packets) const noexcept;
reservation
reserve(uint num_packets) noexcept;
reservation
work_group_reserve(uint num_packets)
const noexcept;
reservation
work_group_reserve(uint num_packets)
noexcept;
reservation
sub_group_reserve(uint num_packets)
const noexcept;
reservation
sub_group_reserve(uint num_packets)
noexcept;
write
read
write
read
write
read, write uint num_packets() const noexcept;
read, write uint max_packets() const noexcept;
integer_coord size() const noexcept;
int miplevels() const noexcept;
int num_samples() const noexcept;
•
•
Sampler [3.11.9]
Acquire a sampler inside of a kernel by passing it as a kernel
parameter from host using clSetKernelArg, or creating it using
the make_sampler function in the kernel code.
mirrored_repeat, repeat, clamp_to_edge, clamp, none
enum normalized_coordinates
normalized, unnormalized
enum normalized_coordinates
nearest, linear
•
Pipes Library
Member function
bool read(T& ref) const noexcept;
image_channel_type image::data_type() const noexcept;
image_channel_order image::order() const noexcept;
enum addressing_mode
int array_size() const noexcept;
int array_size(int lod) const noexcept;
Header
Use pipe and pipe_storage template classes as a communication channel between kernels.
When
pipe_access is:
OpenCL 2.2 Reference Guide
Description
Read packet from pipe into ref.
Write packet specified by ref to
pipe.
Reserve num_packets entries for
reading/writing from/to pipe.
When
pipe_access is:
read
write
Member function
Description
packet from the reserved area
bool pipe::reservation::read(uint index, T& ref) Read
of the pipe referred to by index
const noexcept;
into ref.
Write packet specified by ref to the
bool pipe::reservation::write(uint index,
reserved area of the pipe referred
const T& ref) noexcept;
to by index.
void pipe::reservation::commit()
const noexcept;
write
bool pipe::reservation::commit() noexcept;
read
bool pipe::reservation::is_valid();
bool pipe::reservation::is_valid()
write
const noexcept;
explicit
pipe::reservation::operator bool()
read, write
const noexcept;
read
Non-member functions
template
pipe make_pipe(const pipe_storage
& ps);
Returns current number of packets
that have been written to but not
yet been read from the pipe.
Returns max. number of packets
specified when pipe was created.
Device Enqueue Library [3.13]
Constructs a read only or write only pipe
from pipe_storage object.
N in the following declaration specifies the maximum number of packets which can be held by an
object.
template struct pipe_storage;
Members of struct pipe_storage:
pipe_storage();
pipe_storage(const pipe_storage&) = default;
Constructs a read only or write only pipe
pipe_storage(pipe_storage&&) = default;
from pipe_storage object.
template
pipe get() noexcept;
enum event_status
template
enqueue_status enqueue_kernel(enqueue_policy policy,
uint num_events_in_wait_list, const event *event_wait_list,
event *event_ret, const ndrange &ndrange, Fun fun,
Args... args) noexcept;
device_queue(const device_queue&) = default;
device_queue(device_queue&&) = default;
enum enqueue_status
Members of struct event [3.13.4]
enum event_profiling_info
bool is_valid() const noexcept;
explicit operator bool() const noexcept;
void retain() noexcept;
void release() noexcept;
void set_status(event_status status) noexcept;
void profiling_info(event_profiling_info name,
global_ptr value) noexcept;
enum enqueue_policy
no_wait, wait_kernel, wait_work_group
submitted, complete, error
success, failure, invalid_queue, invalid_ndrange, invalid_event_wait_list, queue_full,
invalid_arg_size, event_allocation_failure, out_of_resources
exec_time
Members of struct device_queue [3.13.3]
struct device_queue: marker_type;
©2017 Khronos Group - Rev. 0817
Return true if reservation is a valid
reservation ID.
pipe_storage class [3.8.5]
Header
Allows a kernel to independently enqueue the same device, without host interaction.
enqueue_status
enqueue_marker(uint num_events_in_wait_list,
const event *event_wait_list, event *event_ret) noexcept;
template
enqueue_status enqueue_kernel(enqueue_policy policy,
const ndrange &ndrange, Fun fun, Args... args) noexcept;
Indicates that all reads/writes
to num_packets associated with
reservation are completed.
Enqueues a marker to device queue
after a list of events specified by
event_wait_list completes.
Enqueue functor or lambda fun on
the device with specified policy over
the specified ndrange.
Enqueues functor or lambda fun in
the same way as the overload above
with the exception for the passed
event list.
Constructors
struct event;
Returns true if event object is a valid event.
Returns true if event object is a valid event.
Increments the event reference count.
Decrements the event reference count.
Sets the execution status of a user event.
Captures the profiling information for
functions that are enqueued as commands.
(Continued on next page >)
www.khronos.org/opencl
OpenCL 2.2 Reference Guide
OpenCL C++ Language
Device Enqueue Library (continued)
Work-Item Functions [3.14]
Non-member functions [3.13.5]
device_queue get_default_device_queue();
event make_user_event();
template uint
get_kernel_work_group_size(
Fun fun, Args... args);
template uint
get_kernel_preferred_work_group_size_multiple(Fun fun, Args... args);
template uint
get_kernel_sub_group_count_for_ndrange(
const ndrange & ndrange, Fun fun, Args... args);
template uint
get_kernel_max_sub_group_size_for_ndrange(
const ndrange & ndrange, Fun fun, Args... args);
Returns the default device queue.
Creates, returns, and sets the execution status
of the user event to event_status::submitted.
Provides a mechanism to query the maximum
work-group size that can be used to execute
a functor
Number of dimensions in use
size_t get_global_size(uint dimindx);
Number of global work-items
size_t get_global_id(uint dimindx);
Global work-item ID value
size_t get_local_size(uint dimindx);
Number of local work-items if kernel executed with
uniform work-group size
Returns the number of sub-groups in each
work-group of the dispatch
size_t get_enqueued_local_size(uint dimindx);
Number of local work-items
size_t get_local_id(uint dimindx);
Local work-item ID
Returns the maximum sub-group size for a
functor.
size_t get_num_groups(uint dimindx);
Number of work-groups
size_t get_group_id(uint dimindx);
Work-group ID
size_t get_global_offset(uint dimindx);
Global offset
size_t get_global_linear_id();
Work-items 1-dimensional global ID
size_t get_local_linear_id();
Work-items 1-dimensional local ID
size_t get_sub_group_size();
Number of work-items in the subgroup
size_t get_max_sub_group_size();
Maximum size of a subgroup
size_t get_num_sub_groups();
Number of subgroups
size_t get_enqueued_num_sub_groups();
If kernel executed with a uniform work-group size,
results are same as for get_num_sub_groups.
size_t get_sub_group_id();
Sub-group ID
size_t get_sub_group_local_id();
Unique work-item ID
Returns the preferred multiple of work-group
size for launch.
Enqueue enums [3.13.7-11]
enum enqueue_policy
no_wait, wait_kernel, wait_work_group
enum enqueue_status
success, failure, invalid_queue, invalid_ndrange, invalid_event_wait_list, queue_full,
invalid_arg_size, event_allocation_failure, out_of_resources
enum event_status
submitted, complete, error
enum event_profiling_info
exec_time
Workgroup Functions [3.15]
Header
Logical operations [3.15.2]
Synchronization Functions [3.16]
bool work_group_all (bool predicate)
bool work_group_any (bool predicate)
Header
struct work_group_named_barrier: marker_type;
Barriers [3.16.2]
Work-items in a work-group must
execute this before any can continue
void sub_group_barrier(mem_fence flags, memory_scope Work-items in a sub-group must
scope = memory_scope_work_group);
execute this before any can continue
flags: mem_fence::global, mem_fence::local, mem_fence::image or a combination of
these values ORed together
scope: memory_scope_x where x may be all_svm_devices, device, work_group,
sub_group, work_item
void work_group_barrier(mem_fence flags,
memory_scope scope = memory_scope_work_group);
Named barriers [3.16.3]
Members from struct work_group_named_barrier. work_group_named_barrier requires
the cl_khr_sub_group_named_barrier extension be enabled.
bool sub_group_all (bool predicate)
bool sub_group_any (bool predicate)
Broadcast functions [3.15.3]
T is type int, uint, long, ulong, or float, double (if cl_khr_fp64 is enabled) or half (if cl_khr_fp16 is
enabled).
T work_group_broadcast(T a, size_t local_id);
T work_group_broadcast(T a, size_t local_id_x, size_t local_id_y);
T work_group_broadcast(T a, size_t local_id_x, size_t local_id_y,
size_t local_id_z);
Broadcast the value of a for
work-item identified by local_id
to all work-items in the workgroup.
T sub_group_broadcast(T x, size_t sub_group_local_id);
Broadcast the value of x
for work-item returned by
get_sub_group_local_id to all
work-items in the sub-group.
work_group_named_barrier(uint sub_group_count);
work_group_named_barrier(
const work_group_named_barrier&) = default;
work_group_named_barrier(
work_group_named_barrier&&) = default;
wait(mem_fence flags,
memory_scope scope = memory_scope_work_group)
const noexcept;
©2017 Khronos Group - Rev. 0817
Initialize a new named barrier object
to synchronize sub_group_count subgroups in the current work-group.
All work-items in a sub-group
executing the kernel on a processor
must execute this method before any
are allowed to continue.
Evaluates predicate for all work-items in the workgroup and returns true if predicate evaluates to
true for all/any work-items in the work-group.
Evaluates predicate for all work-items in the subgroup and returns a non-zero value if predicate
evaluates to non-zero for all/any work-items in the
sub-group.
Numeric operations [3.15.4]
enum work_group_op
add, min, max
T is type int, uint, long, ulong, or float, double (if cl_khr_fp64 is enabled) or half (if cl_khr_fp16 is
enabled).
T work_group_reduce(T x);
Return result of reduction operation for all
values of x specified by work-items in a work-group.
template
T work_group_scan_[ex, in]clusive(T x);
Perform an exclusive/inclusive scan operation
of all values specified by work-items in the
work-group.
template
T sub_group_reduce(T x);
Return result of reduction operation for all
values of x specified by work-items in a sub-group.
template
T sub_group_scan_[ex, in]clusive(T x);
Perform an exclusive/inclusive scan operation
of all values specified by work-items in a sub-group.
The scan results are returned for each work-item.
template
www.khronos.org/opencl
OpenCL C++ Language
template
uint get_kernel_max_num_sub_groups(
Fun fun, Args... args);
Header
Query the number of dimensions, global, and local work size specified to
clEnqueueNDRangeKernel, and global and local identifier of each work-item when this kernel is
executed on a device.
uint get_work_dim();
Returns a valid local size that would produce
the requested number of sub-groups such that
each sub-group is complete with no partial
sub-groups
Provides a mechanism to query the maximum
number of sub-groups that can be used to
execute the passed functor on the current
device.
template uint
get_kernel_local_size_for_sub_group_count(
uint num_sub_groups, Fun fun, Args... args);
Page 9
Page 10
OpenCL C++ Language
OpenCL C++ Language
Math Functions [3.19]
Header
Vector versions of the math functions operate component-wise.
The description is per-component. T is halfn (if cl_khr_fp16 is
enabled), floatn, or doublen (if cl_khr_fp64 is enabled), where n
is 2, 3, 4, 8, or 16. Tf may only be floatn. All angles are in radians.
T rsqrt (T)
Tf native_math::rsqrt(Tf x);
Tf half_math::rsqrt(Tf x);
T sqrt (T)
Tf native_math::sqrt(Tf x);
Tf half_math::sqrt(Tf x);
Trigonometric functions [3.19.2]
Logarithmic functions [3.19.4]
T acos (T)
Arc cosine
intn ilogb (T x)
T acosh (T)
T acospi (T x)
T asin (T)
T asinh (T)
T asinpi (T x)
T atan (T y_over_x)
T atan2 (T y, T x)
T atanh (T)
T atanpi (T x)
T atan2pi (T x, T y)
T cos (T x)
Tf native_math::cos(Tf x);
Tf half_math::cos(Tf x);
T cosh (T x)
T cospi (T x)
T sin (T x)
Tf native_math::sin(Tf x);
Tf half_math::sin(Tf x);
Inverse hyperbolic cosine
Compute acos (x) / π
Arc sine
Inverse hyperbolic sine
Compute asin (x) / π
Arc tangent
Arc tangent of y / x
Hyperbolic arc tangent
Compute atan (x) / π
Compute atan2 (y, x) / π
Return exponent as an integer
value
T lgamma (T x)
T lgamma_r (T x, intn *signp)
T log (T)
Tf native_math::log( Tf x);
Tf half_math::log( Tf x);
T log2 (T)
Tf native_math::log2(Tf x);
Tf half_math::log2(Tf x);
T log10 (T)
Tf native_math::log10(Tf x);
Tf half_math::log10(Tf x);
T log1p (T x)
Log gamma function
T logb (T x)
Exponent of x
T sincos (T x, T *cosval)
Sine and cosine of x
T sinh (T x)
T sinpi (T x)
T tan (T x)
Tf native_math::tan(Tf x);
Tf half_math::tan(Tf x);
T tanh (T x)
T tanpi (T x)
Hyperbolic sine
sin (π x)
Cosine, x is an angle
Hyperbolic cosine
Compute cos (π x)
Sine, x is an angle
Tangent
Hyperbolic tangent
tan (π x)
Power functions [3.19.3]
T cbrt (T)
T pow (T x, T y)
Cube root
Compute x to the power of y
Compute x y, where y is an
integer
floatn pown (T x, intn y)
T powr (T x, T y)
Tf native_math::powr(Tf x,
Compute x y, where x is >= 0
Tf y);
Tf half_math::powr(Tf x, Tf y);
Tf rootn (T x, intn y)
Compute x to the power of 1/y
Integer Built-in Functions [3.20]
Header
T is type char, charn, uchar, ucharn, short, shortn, ushort,
ushortn, int, intn, uint, uintn, long, longn, ulong, or ulongn,
where n is 2, 3, 4, 8, or 16. Tu is the unsigned version of T. Tsc is
the scalar version of T.
bitwise functions [3.20.2]
Inverse square root
Square root
T ldexp (T x, intn k)
floatn nan (uintn nancode)
doublen nan
(ulongn nancode)
Quiet NaN
halfn nan
(ushortn nancode)
T remainder (T x, T y)
Floating point remainder
T remquo (T x, T y,
intn *quo)
Remainder and quotient
T rint (T x)
Round to nearest even integer
Base 2 logarithm
T round (T x)
Integral value nearest to x
rounding
Base 10 logarithm
T trunc (T x)
Return integral value nearest to
x rounding halfway cases away
from zero.
Compute loge (1.0 + x)
Comparison functions [3.19.7]
Natural logarithm
Exponential base-e exp. of x
Exponential base 2
Exponential base 10
Compute ex -1.0
x * 2k
Floating point functions [3.19.6]
T ceil (T)
T copysign (T x, T y)
Round to integer toward + infinity
x with sign changed to sign of y
T floor (T)
Round to integer toward infinity
T fma (T a, T b, T c)
Multiply and add, then round
T fmod (T x, T y)
Modulus. Returns x – y * trunc
(x/y)
T fdim (T x, T y)
Positive difference between x and y
T fmax (T x, T y)
Return y if x < y, else returns x
T fmin (T x, T y)
Return y if y < x, else returns x
T fmod (T x, T y)
Modulus. Returns x – y * trunc
(x/y)
T maxmag (T x, T y)
Maximum magnitude of x and y
T minmag (T x, T y)
Minimum magnitude of x and y
Other functions [3.19.8]
Tf native_math::divide(
Tf x, Tf y);
Tf half_math::divide(
Tf x, Tf y);
Compute x / y
T erfc (T)
Complementary error function.
T erf (T x)
Calculates error function of T
T fabs (T x)
Absolute value
T hypot (T x, T y)
Square root of x2 + y2
T mad (T a, T b, T c)
Approximates a * b + c
T fract (T x, T *iptr)
Fractional value in x
Tf native_math::recip(Tf x); Reciprocal
Tf half_math::recip(Tf x);
T frexp (T x, intn *exp)
Extract mantissa and exponent
T tgamma (T x)
numeric functions [3.20.3]
Tu abs (T x)
|x|
Tu abs_diff (T x, T y)
T add_sat (T x, T y)
T hadd (T x, T y)
T rhadd (T x, T y)
| x – y | without modulo overflow
x + y and saturates the result
(x + y) >> 1 without mod. overflow
(x + y + 1) >> 1
Number of leading 0-bits in x
T clamp (T x, T min, T max)
T clamp (T x, Tsc min, Tsc max) min(max(x, minval), maxval)
T ctz (T x)
Number of trailing 0-bits in x
T mad_hi (T a, T b, T c)
mul_hi(a, b) + c
T mad_sat (T a, T b, T c)
T max (T x, T y)
T max (T x, Tsc y)
T min (T x, T y)
T min (T x, Tsc y)
T mul_hi (T x, T y)
T sub_sat (T x, T y)
a * b + c and saturates the result
©2017 Khronos Group - Rev. 0817
Decompose floating-point number
Next representable floating-point
value after x in the direction of y
T clz (T x)
T popcount (T x)
Number of non-zero bits in x
T rotate (T v, T i)
result[indx] = v[indx] << i[indx]
For upsample, return type is scalar when the parameters are scalar.
short[n] upsample (
result[i]= (([u]short)hi[i]<< 8)|lo[i]
char[n] hi, uchar[n] lo)
ushort[n] upsample (
result[i]=((ushort)hi[i]<< 8)|lo[i]
uchar[n] hi, uchar[n] lo)
int[n] upsample (
result[i]=((int)hi[i]<< 16)|lo[i]
short[n] hi, ushort[n] lo)
uint[n] upsample (
result[i]=((uint)hi[i]<< 16)|lo[i]
ushort[n] hi, ushort[n] lo)
long[n] upsample (
result[i]=((long)hi[i]<< 32)|lo[i]
int[n] hi, uint[n] lo)
ulong[n] upsample (
result[i]=((ulong)hi[i]<< 32)|lo[i]
uint[n] hi, uint[n] lo)
T modf (T x, T *iptr)
T nextafter (T x, T y)
Exponential functions [3.19.5]
T exp (T x)
Tf native_math::exp(Tf x);
Tf half_math::exp(Tf x);
T exp2 (T)
Tf native_math::exp2(Tf x);
Tf half_math::exp2(Tf x);
T exp10 (T x)
Tf native_math::exp10(
Tf x);
Tf half_math::exp10(Tf x);
T expm1 (T x)
OpenCL 2.2 Reference Guide
y if x < y, otherwise it returns x
Gamma function
Common Functions [3.17]
Header
These functions are implemented using the round to nearest
even rounding mode. Vector versions operate component-wise.
Ts is type float, optionally double (if cl_khr_fp64 is enabled), or
half if cl_khr_fp16 is enabled. Tn is the vector form of Ts, where
n is 2, 3, 4, 8, or 16. T is Ts and Tn.
T clamp (T x, T min, T max)
Clamp x to range given by
min, max
T degrees (T radians)
radians to degrees
T max (T x, T y)
Max of x and y
T min (T x, T y)
Min of x and y
y if y < x, otherwise it returns x
T mix (T x, T y, T a)
Linear blend of x and y
High half of the product of x and y
x - y and saturates the result
T radians (T degrees)
degrees to radians
T step (T edge, T x)
0.0 if x < edge, else 1.0
24-bit operations [3.20.4]
The following fast integer functions optimize the performance
of kernels. In these functions, T is type int, uint, intn or uintn,
where n is 2, 3, 4, 8, or 16.
intn mad24 (T x, T y, T z)
Multiply 24-bit integer values x, y, add
32-bit int. result to 32-bit integer z
T mul24 (T x, T y)
Multiply 24-bit integer values x and y
T smoothstep (T edge0, T edge1, T x) Step and interpolate
T sign (T x)
Sign of x
www.khronos.org/opencl
OpenCL 2.2 Reference Guide
Geometric Functions [3.18]
OpenCL C++ Language
Page 11
Relational Built-in Functions [3.21]
Header
Header
These functions use the round to nearest even rounding mode.
Vector versions operate component-wise. Ts is scalar type float,
double if cl_khr_fp64 is enabled, or half if cl_khr_fp16 is enabled.
Tn is the vector form of Ts with 2, 3, or 4 components.
These functions can be used with built-in scalar or vector types
as arguments and return a scalar or vector integer result. T
is type float, floatn, char, charn, uchar, ucharn, short, shortn,
ushort, ushortn, int, intn, uint, uintn, long, longn, ulong, ulongn,
or optionally double or doublen (if cl_khr_fp64 is enabled) or
half or halfn (if cl_khr_fp16 is enabled). n is 2, 3, 4, 8, or 16.
booln isinf(floatn x, floatn y);
Test for + or – infinity
booln isnan(floatn x, floatn y);
Test for a NaN
booln isnormal(floatn x, floatn y);
Test for a normal value
Cross product
booln isequal(floatn x, floatn y);
booln isequal(halfn x, halfn y);
Compare of x == y
booln isequal(doublen x, doublen y);
booln signbit(floatn x, floatn y);
Test for sign bit
Ts distance (T p0, T p1)
Vector distance
booln isnotequal(floatn x, floatn y);
Compare of x != y
bool any(booln t);
1 if MSB in component of
x is set; else 0
Ts dot (T p0, T p1)
Dot product
booln isgreater(floatn x, floatn y);
Compare of x > y
Ts length (T p)
Vector length
booln isgreaterequal(floatn x,
floatn y);
Compare of x >= y
bool all(booln t);
1 if MSB in all components
of x are set; else 0
T normalize (T p)
Normal vector
length 1
booln isless(floatn x, floatn y);
Compare of x < y
T bitselect(T a, T b, T c);
booln islessequal(floatn x, floatn y);
Compare of x <= y
Each bit of result is
corresponding bit of a if
corresponding bit of c is 0
booln islessgreater(floatn x, floatn y);
Compare of
(x < y) || (x > y)
booln isordered(floatn x, floatn y);
Test if arguments are
ordered
T select(T a, T b, booln c);
booln isunordered(floatn x, floatn y);
Test if arguments are
unordered
For each component of a
vector type,
result[i] = if MSB of c[i] is
set ? b[i] : a[i] For scalar
type, result = c ? b : a
booln isfinite(floatn x, floatn y);
Test for finite value
Vector Data Load/Store [3.22]
Header
T is type char, uchar, short, ushort, int, uint, long, ulong, or
float, optionally double (if cl_khr_fp64 is enabled), or half (if
cl_khr_fp16 is enabled). Tn refers to the vector form of type T,
where n is 2, 3, 4, 8, or 16.
template
make_vector_t
vload(size_t offset, const T* p);
template
make_vector_t
vload(size_t offset,
const constant_ptr p);
Read vector data
from address
(p + (offset * n))
template
make_vector_t
vload_half(size_t offset, const half* p);
template
make_vector_t
vload_half(size_toffset,
const constant_ptr p);
Read a halfn from
address
(p + (offset * n))
template
make_vector
vloada_half(size_t offset, const half* p);
template
make_vector
vloada_half(size_t offset,
const constant_ptr p);
Read half vector
from (p + (offset *
n)). For half3, read
from (p + (offset
* 4)).
template
void vstore(T data, size_t offset,
vector_element_t* p);
Write vector data to
address (p + (offset
* n)
template
void vstore_half(T data, size_t offset,
half* p);
Write a half to
address
(p + offset)
template
void vstorea_half(T data, size_t offset,
half* p);
Write a half vector
to address (p +
(offset * n))
printf Function [3.23]
When the event associated with a particular kernel invocation
completes, the output of applicable printf calls is flushed to the
implementation-defined output stream.
printf format string
%[flags][width][.precision][vector][length] conversion
Floating point limits
enum float_round_style
[const_]iterator begin() [const] noexcept;
[const_]iterator end() [const] noexcept;
[const_]reverse_iterator rbegin() [const] noexcept;
[const_]reverse_iterator rend() [const] noexcept;
const_iterator cbegin() const noexcept;
const_iterator cend() const noexcept;
const_reverse_iterator crbegin() const noexcept;
const_reverse_iterator crend() const noexcept;
constexpr size_type size() const noexcept;
constexpr size_type max_size() const noexcept;
constexpr bool empty() const noexcept;
The above two printf calls print the following:
f4 = 1.00,2.00,3.00,4.00
uc = 0xfa,0xfb,0xfc,0xfd
Limits [3.26]
Iterators from struct array
Array Library [3.25]
float4 f = float4(1.0f, 2.0f, 3.0f, 4.0f);
uchar4 uc = uchar4(0xFA, 0xFB, 0xFC, 0xFD);
printf(“f4 = %2.2v4hlf\n”, f);
printf(“uc = %#v4hhx\n”, uc);
Header
Half is available if cl_khr_fp16 is enabled, and double is available if cl_khr_fp64 is enabled.
Header
template struct array;
©2017 Khronos Group - Rev. 0817
The following examples show the use of the vector specifier in
the printf format string.
printf output synchronization
OpenCL C++ Macros
(x is HALF, FLT, DBL)
x_DIG
x_MANT_DIG
x_MAX_10_EXP +4
x_MAX_EXP
x_MIN_10_EX
x_MIN_EXP
x_RADIX
x_MAX
x_MIN
x_EPSILON
Capacities from struct array
Examples:
Header
Writes output to an implementation-defined stream.
int printf (constant char * restrict format, …)
HALF
3
11
+4
+16
-4
-13
2
0x1.ffcp15h
0x1.0p-14h
0x1.0p-10h
FLT
6
24
+38
+128
-37
-125
2
0x1.fffffep127f
0x1.0p-126f
0x1.0p-23f
DBL
15
53
+308
+1024
-307
-1021
2
0x1.fffffffffffffp1023
0x1.0p-1022
0x1.0p-52
Application Macro
(x is HALF, FLT, DBL)
CL_x_DIG
CL_x_MANT_DIG
CL_x_MAX_10_EXP
CL_x_MAX_EXP
CL_x_MIN_10_EXP
CL_x_MIN_EXP
CL_x_RADIX
CL_x_MAX
CL_x_MIN
CL_x_EPSILON
round_indeterminate, round_toward_zero, round_to_nearest, round_toward_infinity, round_toward_neg_infinity
enum float_denorm_style
denorm_indeterminate, denorm_absent, denorm_present
Integer limits
#define CHAR_BIT 8
#define CHAR_MAX SCHAR_MAX
#define CHAR_MIN SCHAR_MIN
#define INT_MAX 2147483647
#define INT_MIN (-2147483647 – 1)
#define LONG_MAX 0x7fffffffffffffffL
#define LONG_MIN (-0x7fffffffffffffffL – 1)
#define SCHAR_MAX 127
#define SCHAR_MIN (-127 – 1)
#define SHRT_MAX 32767
#define SHRT_MIN (-32767 – 1)
#define UCHAR_MAX 255
#define USHRT_MAX 65535
#define UINT_MAX 0xffffffff
#define ULONG_MAX 0xffffffffffffffffUL
(Continued on next page >)
www.khronos.org/opencl
OpenCL C++ Language
float{3,4} cross (float{3,4} p0, float{3,4} p1)
double{3,4} cross (double{3,4} p0,
double{3,4} p1)
half{3,4} cross (half{3,4} p0, half{3,4} p1)
Page 12
OpenCL C++ Language
Limits (continued)
Math Constants [3.27]
Class numeric limits [3.26.2]
Header
The values of the following symbolic constants are singleprecision float.
bool is_specialized = false;
T min() noexcept { return T(); }
T max() noexcept { return T(); }
T lowest() noexcept { return T(); }
int digits = 0;
int digits10 = 0;
int max_digits10 = 0;
bool is_signed = false;
bool is_integer = false;
bool is_exact = false;
int radix = 0;
T epsilon() noexcept {return T()};
T round_error() noexcept {
return T(); }
int min_exponent = 0;
int min_exponent10 = 0;
int max_exponent = 0;
int max_exponent10 = 0;
bool has_infinity = false;
bool has_quiet_NaN = false;
MAXFLOAT
template class numeric_limits;
All the members below are declared as static constexpr.
OpenCL C++ Language
OpenCL 2.2 Reference Guide
bool has_signaling_NaN = false;
float_denorm_style has_denorm
= denorm_absent;
bool has_denorm_loss = false;
T infinity() noexcept {
return T(); }
T quiet_NaN() noexcept {
return T(); }
T signaling_NaN() noexcept {
return T(); }
T denorm_min() noexcept {
return T(); }
bool is_iec559 = false;
bool is_bounded = false;
bool is_modulo = false;
bool traps = false;
bool tinyness_before = false;
float_round_style round_style =
round_toward_zero;
bool is_scalar = false;
bool is_vector = false;
Non-members
template class numeric_limits;
template class numeric_limits;
template class numeric_limits<
const volatile T>;
Type Traits Library [3.29]
Header
template class tuple;
Primary type categories
is_void
is_null_pointer
is_integral
is_floating_point
is_array
is_pointer
is_enum
is_union
is_class
is_function
is_lvalue_reference
is_rvalue_reference
is_member_object_pointer
is_member_function_pointer
Composite type categories
is_reference
is_arithmetic
is_object
is_ fundamental
is_scalar
is_compound
is_member_pointer
Type property queries
alignment_of
rank
HUGE_VALF
HUGE_VAL
INFINITY
NAN
template class math_constants;
template<> class math_constants;
template<> class math_constants;
template<> class math_constants;
Tuple Library [3.28]
Header
template class tuple;
Tuple creation
functions
make_tuple()
tie()
forward_as_tuple()
tuple_cat()
template constexpr T Constant_v =
math_constants::FunctName;
Tuple helper
classes
class tuple_size
class tuple_element
template class math_constants;
static constexpr T FunctName noexcept { return T(); }
Element access get()
Relational
operators
operator==()
operator>()
Specialized
algorithms
swap()
Type relations
operator<()
operator<=()
operator!=()
operator>=()
Examples:
template constexpr T pi_v = math_constants::pi();
template class math_constants;
static constexpr T pi() noexcept { return T(); }
Sign modifications
is_same
is_base_of
is_convertible
make_signed
make_unsigned
Array modifications
Const-volatile modifications
remove_const
remove_volatile
remove_cv
add_const
add_volatile
add_cv
As modifications
remove_as
add_constant
add_local
add_global
add_private
add_generic
Replace the placeholders in the templates below with values
from the indicated column in the table above.
remove_attrs
remove_constant
remove_local
remove_global
remove_private
remove_generic
Reference modifications
remove_reference
add_lvalue_reference
add_rvalue_reference
remove_extent
remove_all_extents
Pointer modifications
add_pointer remove_pointer
Built-in vector queries
vector_size
is_vector_type
Built-in vector modifications
vector_element
make_vector
Other transformations
aligned_storage
decay
common_type
conditional
aligned_union
enable_if
underlying_type
result_of
extent
Iterator Library [3.30]
Header
template struct iterator;
Iterator operations
advance()
next()
Value of maximum non-infinite singleprecision floating-point number
Positive float expression, evaluates to +infinity
Positive double expression, evals. to +infinity
Constant float expression, positive or unsigned
infinity
Constant float expression, quiet NaN
Constants, functions, and macros
The preprocessor macros in the table below are shown for
double and are available if cl_khr_fp64 is enabled. Append _F
for float, or append _H for half if cl_khr_fp16 is enabled.
Name of constant
FunctName
Preprocessor macros
e
e()
M_E
log2e
log2e()
M_LOG2E
log10e
log10e()
M_LOG10E
ln2
ln2()
M_LN2
ln10
ln10()
M_LN10
pi
pi()
M_PI
pi_2
pi_2()
M_PI_2
pi_4
pi_4()
M_PI_4
one_pi
one_pi()
M_1_PI
two_pi
two_pi()
M_2_PI
two_sqrtpi
two_sqrtpi()
M_2_SQRTPI
sqrt2
sqrt2()
M_SQRT2
sqrt1_2
sqrt1_2()
M_SQRT1_2
distance()
prev()
Tags
input_iterator_tag
output_iterator_tag
forward_iterator_tag
bidirectional_iterator_tag
random_access_iterator_tag
Range access
begin()
end()
cbegin()
cend()
rbegin()
rend()
Predefined iterators
inserter()
front_inserter() back_inserter()
make_reverse_iterator()
make_move_iterator()
operatorOP() where OP may be ==, !=, <, > <=,
>=, +, -
Vector Wrapper Library [3.7]
Header
template struct vec;
struct vec members
vec( ) = default;
vec(const vec &) = default;
©2017 Khronos Group - Rev. 0817
crbegin()
crend()
vec(vec &&) = default;
vec(const vector_type &r) noexcept;
vec(vector_type &&r) noexcept;
template
vec(Params... params) noexcept;
operator vector_type() const noexcept;
operatorOP() where OP may be =, ++, --, +=,
-=, *=, /=, %=
swizzle()
Simple swizzles
If preprocessor macro SIMPLE_SWIZZLES is
defined, then:
auto func() noexcept; where func may
be x through zzzz
Type properties
is_const
is_volatile
is_private
is_local
is_global
is_constant
is_generic
is_vector
is_trivial
is_trivially_copyable
is_pod
is_literal_type
is_empty
is_polymorphic
is_abstract
is_final
is_signed
is_unsigned
is_standard_layout
is_[trivially_]constructible
is_[trivially_]default_constructible
is_[[trivially_]copy_]constructible
is_[[trivially_]move_]constructible
is_[trivially_]assignable
is_[[trivially_]copy_]assignable
is_[[trivially_]move_]assignable
is_[trivially_, nothrow_]destructible
is_nothrow_[default_]constructible
is_nothrow_[copy_, move_]constructible
is_nothrow_[copy_, move_]assignable
has_virtual_destructor
Vector Utilities [3.9]
Header
template
constexpr remove_attrs_t
>
get(Vec & vector) noexcept;
template
constexpr void set(Vec & vector,
remove_attrs_t>
value) noexcept;
struct channel_ref members
operatorOP() where OP may be =, ++, --, +=, -=,
*=, /=, %=
Non-member operators
operatorOP() where OP may be ==, !=, <, >
<=, >=, +, -, *, /
www.khronos.org/opencl
OpenCL 2.2 Reference Guide
OpenCL C Language
OpenCL C Language Reference
Page 13
Section and table references are to the OpenCL 2.0 C Language specification.
Supported Data Types
Built-in Vector Data Types [6.1.2]
n is 2, 3, 4, 8, or 16.
OpenCL Type
[u]charn
[u]shortn
[u]intn
[u]longn
floatn
doublen
Half vector and scalar types require cl_khr_fp16. Double types
require that CL_DEVICE_DOUBLE_FP_CONFIG is not zero.
Built-in Scalar Data Types [6.1.1]
OpenCL Type
API Type Description
bool
--
true (1) or false (0)
char
cl_char
8-bit signed
unsigned char, uchar
cl_uchar 8-bit unsigned
short
cl_short
API Type
cl_[u]charn
cl_[u]shortn
cl_[u]intn
cl_[u]longn
cl_floatn
cl_doublen
ndrange_t
clk_event_t
reserve_id_t
event_t
cl_mem_fence_flags
Description
8-bit [un]signed
16-bit [un]signed
32-bit [un]signed
64-bit [un]signed
32-bit float
64-bit float
Reserved Data Types [6.1.4]
Other Built-in Data Types [6.1.3]
16-bit signed
The OPTIONAL types shown below are only defined if
CL_DEVICE_IMAGE_SUPPORT is CL_TRUE. API type for application
shown in italics where applicable. Items in blue require the
cl_khr_gl_msaa_sharing extension.
unsigned short, ushort cl_ushort 16-bit unsigned
int
cl_int
32-bit signed
unsigned int, uint
cl_uint
32-bit unsigned
long
cl_long
64-bit signed
unsigned long, ulong
cl_ulong 64-bit unsigned
float
cl_float
double
cl_double 64-bit IEEE 754
half
cl_half
16-bit float (storage only)
size_t
--
32- or 64-bit unsigned integer
ptrdiff_t
--
32- or 64-bit signed integer
intptr_t
--
32- or 64-bit signed integer
uintptr_t
--
32- or 64-bit unsigned integer
void
void
void
OpenCL Type
image2d_[msaa_]t
OPTIONAL
image3d_t
OPTIONAL
image2d_array_ [msaa_]t
OPTIONAL
image1d_t
OPTIONAL
image1d_buffer_t
OPTIONAL
image1d_array_t
OPTIONAL
image2d_ [msaa_]depth_t
OPTIONAL
image2d_array_ [msaa_]depth_t OPTIONAL
sampler_t
OPTIONAL
queue_t
32-bit float
Description
2D image handle
3D image handle
2D image array
1D image handle
1D image buffer
1D image array
2D depth image
2D depth image array
sampler handle
Vector Component Addressing [6.1.7]
1
2
3
v.x, v.s0 v.y, v.s1
v.z, v.s2
float4 v;
v.x, v.s0 v.y, v.s1
v.z, v.s2 v.w, v.s3
5
6
7
8
float8 v;
v.s0
v.s1
v.s2
v.s3
v.s4
v.s5
v.s6 v.s7
float16 v;
v.s0
v.s1
v.s2
v.s3
v.s4
v.s5
v.s6 v.s7
9
10
11
12
13
14
15
v.s8 v.s9 v.sa, v.sb, v.sc, v.sd, v.se, v.sf,
v.sA v.sB v.sC v.sD v.sE v.sF
Numeric indices are preceded by the letter s or S, e.g.: s1. Swizzling, duplication, and nesting are allowed, e.g.: v.yx, v.xx, v.lo.x
v.lo
v.hi
v.odd
v.even
v.x, v.s0
v.y, v.s1
v.y, v.s1
v.x, v.s0
v.s01, v.xy
v.s23, v.zw
v.s13, v.yw
v.s02, v.xz
float8
float16
v.s01, v.xy
v.s23, v.zw
v.s13, v.yw
v.s02, v.xz
*When using .lo or .hi with a 3-component vector, the .w component is undefined.
Preprocessor Directives & Macros [6.10]
#pragma OPENCL FP_CONTRACT on-off-switch
on-off-switch: ON, OFF, DEFAULT
__FILE__
__func__
Current source file
Current function name
__LINE__
Integer line number
__OPENCL_VERSION__
CL_VERSION_1_0
CL_VERSION_1_1
CL_VERSION_1_2
CL_VERSION_2_0
__OPENCL_C_VERSION__
__ENDIAN_LITTLE__
__IMAGE_SUPPORT__
Integer version number, e.g: 200
Substitutes integer 100 for 1.0
Substitutes integer 110 for 1.1
Substitutes integer 120 for 1.2
Substitutes integer 200 for 2.0
Sub. integer for OpenCL C version
1 if device is little endian
1 if images are supported
__FAST_RELAXED_MATH__
1 if –cl-fast-relaxed-math
optimization option is specified
FP_FAST_FMA
Defined if double fma is fast
©2017 Khronos Group - Rev. 0817
16-bit, vector
quad, quadn
128-bit float, vector
complex half, complex halfn
imaginary half, imaginary halfn
16-bit complex, vector
complex float, complex floatn
imaginary float, imaginary floatn
32-bit complex, vector
complex double, complex doublen
imaginary double, imaginary doublen 64-bit complex, vector
complex quad, complex quadn
imaginary quad, imaginary quadn
128-bit complex, vector
floatnxm
n*m matrix of 32-bit floats
doublenxm
n*m matrix of 64-bit floats
These operators behave similarly as in C99 except operands
may include vector types when possible:
+
++
>
&&
==
<
||
*
!=
>=
?:
%
&
<=
>>
,
op=
sizeof
/
~
|
<<
-^
!
=
Address Space Qualifiers [6.5]
Vector Addressing Equivalences
float2
float3 *
float4
boolean vector
halfn
v.lo
v.hi
v.odd
v.even
v.s0123
v.s4567
v.s1357
v.s0246
v.s01234567 v.s89abcdef v.s13579bdf v.s02468ace
FP_FAST_FMAF
Defined if float fma is fast
FP_FAST_FMA_HALF
Defined if half fma is fast
__kernel_exec (X, typen)
Same as:
__kernel __attribute__((work_group_size_hint(X, 1, 1)))
__attribute__((vec_type_hint(typen)))
Conversions, Type Casting Examples [6.2]
T a = (T)b;
// Scalar to scalar, or scalar to vector
T a = convert_T(b);
T a = convert_T_R(b);
T a = as_T(b);
T a = convert_T_sat_R(b);
R: one of the rounding modes
_rte to nearest even
_rtz toward zero
_rtp toward + infinity
_rtn toward - infinity
__global, global
__constant, constant
__local, local
__private, private
Function Qualifiers [6.7]
__kernel, kernel
__attribute__((vec_type_hint(type)))
//type defaults to int
__attribute__((work_group_size_hint(X, Y, Z)))
__attribute__((reqd_work_group_size(X, Y, Z)))
Attribute Qualifiers [6.11]
Use to specify special attributes of enum, struct, and union
types.
__attribute__((aligned(n))) __attribute__((endian(host)))
__attribute__((aligned))
__attribute__((endian(device)))
__attribute__((packed))
__attribute__((endian))
Use to specify special attributes of variables or structure fields.
__attribute__((aligned(alignment)))
__attribute__((nosvm))
Use to specify basic blocks and control-flow-statements.
__attribute__((attr1)) {…}
Use to specify that a loop (for, while, and do loops) can be
unrolled. (Must appear immediately before the loop to be
affected.)
__attribute__((opencl_unroll_hint(n)))
__attribute__((opencl_unroll_hint))
www.khronos.org/opencl
OpenCL C Language
v.x, v.s0 v.y, v.s1
float3 v;
Description
booln
Operators [6.3]
4
float2 v;
OpenCL Type
Operators and Qualifiers
Vector Components
0
event handle
Page 14
OpenCL C Language
Access Qualifiers [6.6]
Apply to 2D and 3D image types to declare if the image memory
object is being read or written by a kernel.
__read_only, read_only
__write_only, write_only
__read_write, read_write
Blocks [6.12]
A result value type with a list of parameter types: for example:
1. The ^ declares variable “myBlock” is a Block.
2. The return type for the Block “myBlock”is int.
3. myBlock takes a single argument of type int.
4. The argument is named “num.”
5. Multiplier captured from block’s environment.
k
j
l
int (^myBlock)(int) =
^(int num) {return num * multiplier; };
m
n
Math Built-in Functions [6.13.2]
Ts is type float, optionally double (if cl_khr_fp64 is enabled), or
half (if cl_khr_fp16 is enabled). Tn is the vector form of Ts, where
n is 2, 3, 4, 8, or 16. T is Ts and Tn. All angles are in radians.
OpenCL C Language
HN indicates that half and native variants are available using only
the float or floatn types by prepending “half_” or “native_” to the
function name. Prototypes shown in brown text are available in
half_ and native_ forms only using the float or floatn types.
Work-Item Built-in Functions [6.13.1]
Query the number of dimensions, global, and local work size
specified to clEnqueueNDRangeKernel, and global and local
identifier of each work-item when this kernel is executed on a
device.
OpenCL 2.2 Reference Guide
size_t get_group_id (
uint dimindx)
Work-group ID
size_t get_global_offset (
uint dimindx)
Global offset
uint get_work_dim ()
Number of dimensions in use
size_t get_global_linear_id ()
Work-items 1-dimensional
global ID
size_t get_global_size (
uint dimindx)
Number of global work-items
size_t get_local_linear_id ()
Work-items 1-dimensional
local ID
size_t get_global_id (
uint dimindx)
Global work-item ID value
uint get_sub_group_size ()
size_t get_local_size (
uint dimindx)
Number of local work-items if kernel
executed with uniform work-group size
Number of work-items in
the subgroup
uint get_max_sub_group_size ()
Maximum size of a
subgroup
uint get_num_sub_groups ()
Number of subgroups
size_t get_enqueued_local_size (
uint dimindx)
Number of local workitems
size_t get_local_id (uint dimindx)
Local work-item ID
size_t get_num_groups (
uint dimindx)
Number of work-groups
uint get_enqueued_num_sub_groups ()
uint get_sub_group_id ()
Sub-group ID
uint get_sub_group_local_id ()
Unique work-item ID
T fmod (T x, T y)
Modulus. Returns x – y * trunc (x/y)
T sin (T)
T fract (T x, T *iptr)
Fractional value in x
T sincos (T x, T *cosval)
Sine and cosine of x
Ts frexp (T x, int *exp)
Tn frexp (T x, intn *exp)
Extract mantissa and exponent
T sinh (T)
Hyperbolic sine
T hypot (T x, T y)
Square root of x2 + y2
T sinpi (T x)
sin (π x)
int[n] ilogb (T x)
Return exponent as an integer value
Ts ldexp (T x, int n)
Tn ldexp (T x, intn n)
x * 2n
HN Sine
T sqrt (T)
HN Square root
T tan (T)
HN Tangent
T acos (T)
Arc cosine
T acosh (T)
Inverse hyperbolic cosine
T acospi (T x)
acos (x) / π
T asin (T)
Arc sine
T lgamma (T x)
Ts lgamma_r (Ts x, int *signp) Log gamma function
Tn lgamma_r (Tn x, intn *signp)
T asinh (T)
Inverse hyperbolic sine
T log (T)
HN Natural logarithm
T asinpi (T x)
asin (x) / π
T log2 (T)
HN Base 2 logarithm
T atan (T y_over_x)
Arc tangent
T log10 (T)
HN Base 10 logarithm
T atan2 (T y, T x)
Arc tangent of y / x
T log1p (T x)
ln (1.0 + x)
T atanh (T)
Hyperbolic arc tangent
T logb (T x)
Exponent of x
T atanpi (T x)
atan (x) / π
T mad (T a, T b, T c)
Approximates a * b + c
MAXFLOAT
Value of maximum non-infinite single-precision
floating-point number
T atan2pi (T x, T y)
atan2 (y, x) / π
T maxmag (T x, T y)
Maximum magnitude of x and y
HUGE_VALF
Positive float expression, evaluates to +infinity
T cbrt (T)
Cube root
T minmag (T x, T y)
Minimum magnitude of x and y
Round to integer toward + infinity
HUGE_VAL
Positive double expression, evals. to +infinity
T ceil (T)
T modf (T x, T *iptr)
Decompose floating-point number
T copysign (T x, T y)
x with sign changed to sign of y
Quiet NaN (Return is scalar when
nancode is scalar)
INFINITY
float[n] nan (uint[n] nancode)
Constant float expression, positive or unsigned
infinity
NAN
Constant float expression, quiet NaN
half[n] nan (ushort[n]
nancode)
double[n] nan (ulong[n]
nancode)
Quiet NaN
(Return is scalar when nancode is
scalar)
T nextafter (T x, T y)
Next representable floating-point
value after x in the direction of y
M_E_F
Value of e
T pow (T x, T y)
Compute x to the power of y
M_LOG2E_F
Value of log2e
Compute x y, where y is an integer
M_LOG10E_F
Value of log10e
M_LN2_F
Value of loge2
M_LN10_F
Value of loge10
M_PI_F
Value of π
M_PI_2_F
Value of π / 2
M_PI_4_F
Value of π / 4
M_1_PI_F
Value of 1 / π
M_2_PI_F
Value of 2 / π
M_2_SQRTPI_F
Value of 2 / √π
M_SQRT2_F
Value of √2
M_SQRT1_2_F
Value of 1 / √2
T cos (T)
HN Cosine
T cosh (T)
Hyperbolic cosine
T cospi (T x)
cos (π x)
T half_divide (T x, T y)
T native_divide (T x, T y)
x/y
(T may only be float or floatn)
T erfc (T)
Complementary error function
T erf (T)
Calculates error function of T
T exp (T x)
HN Exponential base e
Ts pown (T x, int y)
Tn pown (T x, intn y)
T exp2 (T)
HN Exponential base 2
T powr (T x, T y)
T exp10 (T)
HN Exponential base 10
T half_recip (T x)
T native_recip (T x)
1/x
(T may only be float or floatn)
T remainder (T x, T y)
Floating point remainder
T expm1 (T x)
e x -1.0
T fabs (T)
Absolute value
T fdim (T x, T y)
Positive difference between x and y
T floor (T)
Round to integer toward infinity
T fma (T a, T b, T c)
Multiply and add, then round
T fmax (T x, T y)
Tn fmax (Tn x, Ts y)
Return y if x < y,
otherwise it returns x
T fmin (T x, T y)
Tn fmin (Tn x, Ts y)
Return y if y < x,
otherwise it returns x
©2017 Khronos Group - Rev. 0817
HN Compute x y, where x is >= 0
Ts remquo (Ts x, Ts y, int *quo)
Remainder and quotient
Tn remquo (Tn x, Tn y, intn *quo)
T rint (T)
Round to nearest even integer
Ts rootn (T x, int y)
Tn rootn (T x, intn y)
Compute x to the power of 1/y
T round (T x)
Integral value nearest to x rounding
T rsqrt (T)
T tanh (T)
Hyperbolic tangent
T tanpi (T x)
tan (π x)
T tgamma (T)
Gamma function
T trunc (T)
Round to integer toward zero
Math Constants [6.13.2]
The values of the following symbolic constants are
single-precision float.
OPTIONAL
When double precision is supported (if cl_khr_fp64 is enabled)
macros ending in _F are available in type double by removing
_F from the macro name, and in type half (if cl_khr_fp16 is
enabled) by replacing _F with _H.
HN Inverse square root
www.khronos.org/opencl
OpenCL 2.2 Reference Guide
Image Read and Write Functions [6.13.14]
The built-in functions defined in this section can only be used
with image memory objects created with clCreateImage.
sampler specifies the addressing and filtering mode to use.
aQual refers to one of the access qualifiers. For samplerless
read functions this may be read_only or read_write.
• Writes to images with sRGB channel orders requires device
support of the cl_khr_srgb_image_writes extension.
• read_imageh and write_imageh require the
cl_khr_fp16 extension.
• MSAA images require the cl_khr_gl_msaa_sharing extension.
• Image 3D writes require the extension
cl_khr_3d_image_writes. [9.4.8]
Read and write functions for 2D images
OpenCL C Language
void write_imagef (aQual image2d_depth_t image,
int2 coord, float depth)
void write_imagef (aQual image2d_array_depth_t image,
int4 coord, float depth)
void write_imageh (aQual image2d_array_t image,
int4 coord, half4 color)
Read and write functions for 1D images
Read an element from a 1D image, or write a color value to a
location in a 1D image.
float4 read_imagef (read_only image1d_t image,
sampler_t sampler, {int, float} coord)
Page 15
Read and write functions for 3D images
Read an element from a 3D image, or write a color value to
a location in a 3D image. Writing to 3D images requires the
cl_khr_3d_image_writes extension [9.4.8].
float4 read_imagef (read_only image3d_t image,
sampler_t sampler, {int4, float4} coord)
int4 read_imagei (read_only image3d_t image,
sampler_t sampler, int4 coord)
int4 read_imagei (read_only image3d_t image,
sampler_t sampler, float4 coord)
uint4 read_imageui (read_only image3d_t image,
sampler_t sampler, {int4, float4} coord)
float4 read_imagef (aQual image3d_t image, int4 coord)
Read an element from a 2D image, or write a color value to a
location in a 2D image.
int4 read_imagei (read_only image1d_t image,
sampler_t sampler, {int, float} coord)
float4 read_imagef (read_only image2d_t image,
sampler_t sampler, {int2, float2} coord)
uint4 read_imageui (read_only image1d_t image,
sampler_t sampler, {int, float} coord)
int4 read_imagei (read_only image2d_t image,
sampler_t sampler, {int2, float2} coord)
float4 read_imagef (read_only image1d_array_t image,
sampler_t sampler, {int2, float4} coord)
uint4 read_imageui (read_only image2d_t image,
sampler_t sampler, {int2, float2} coord)
int4 read_imagei (read_only image1d_array_t image,
sampler_t sampler, {int2, float2} coord)
float4 read_imagef (read_only image2d_array_t image,
sampler_t sampler, {int4, float4} coord)
uint4 read_imageui (read_only image1d_array_t image,
sampler_t sampler, {int2, float2} coord)
int4 read_imagei (read_only image2d_array_t image,
sampler_t sampler, {int4, float4} coord)
float4 read_imagef (aQual image1d_t image, int coord)
void write_imageui (aQual image3d_t image,
int4 coord, uint4 color)
float4 read_imagef (aQual image1d_buffer_t image, int coord)
void write_imageh (aQual image3d_t image,
int4 coord, half4 color)
uint4 read_imageui (read_only image2d_array_t image,
sampler_t sampler, {int4, float4} coord)
int4 read_imagei (aQual image1d_t image, int coord)
float read_imagef (read_only image2d_depth_t image,
sampler_t sampler, {int2, float2} coord)
uint4 read_imageui (aQual image1d_t image, int coord)
float read_imagef (read_only image2d_array_depth_t image,
sampler_t sampler, {int4, float4} coord)
uint4 read_imageui (aQual image1d_buffer_t image, int coord)
float4 read_imagef (aQual image2d_t image, int2 coord)
int4 read_imagei (aQual image2d_t image, int2 coord)
float4 read_imagef (aQual image2d_array_t image, int4 coord)
int4 read_imagei (aQual image2d_array_t image, int4 coord)
float4 read_imagef (aQual image1d_array_t image, int2 coord)
int4 read_imagei (aQual image1d_array_t image, int2 coord)
uint4 read_imageui (aQual image1d_array_t image, int2 coord)
half4 read_imageh (read_only image1d_t image,
sampler_t sampler, {int, float} coord)
half4 read_imageh (aQual image1d_t image, int coord)
uint4 read_imageui (aQual image2d_array_t image, int4 coord)
half4 read_imageh (read_only image1d_array_t image,
sampler_t sampler, {int2, float4} coord)
float read_imagef (aQual image2d_depth_t image, int2 coord)
half4 read_imageh (aQual image1d_array_t image, int2 coord)
float read_imagef (aQual image2d_array_depth_t image,
int4 coord)
half4 read_imageh (read_only image2d_t image,
sampler_t sampler, {int2, float2} coord)
half4 read_imageh (aQual image2d_t image, int2 coord)
half4 read_imageh (read_only image2d_array_t image,
sampler_t sampler, {int4, float4} coord)
half4 read_imageh (aQual image2d_array_t image,
int4 coord)
void write_imagef (aQual image2d_t image,
int2 coord, float4 color)
void write_imagei (aQual image2d_t image,
int2 coord, int4 color)
void write_imageui (aQual image2d_t image,
int2 coord, uint4 color)
half4 read_imageh (aQual image1d_buffer_t image, int coord)
void write_imagef (aQual image1d_t image,
int coord, float4 color)
void write_imagei (aQual image1d_t image,
int coord, int4 color)
void write_imageui (aQual image1d_t image,
int coord, uint4 color)
void write_imageh (aQual image1d_t image,
int coord, half4 color)
void write_imagef (aQual image1d_buffer_t image,
int coord, float4 color)
void write_imagei (aQual image1d_buffer_t image,
int coord, int4 color)
void write_imageui (aQual image1d_buffer_t image,
int coord, uint4 color)
void write_imageh (aQual image1d_buffer_t image,
int coord, half4 color)
void write_imageh (aQual image2d_t image,
int2 coord, half4 color)
void write_imagef (aQual image1d_array_t image,
int2 coord, float4 color)
void write_imagef (aQual image2d_array_t image,
int4 coord, float4 color)
void write_imagei (aQual image1d_array_t image,
int2 coord, int4 color)
void write_imagei (aQual image2d_array_t image,
int4 coord, int4 color)
void write_imageui (aQual image1d_array_t image,
int2 coord, uint4 color)
void write_imageui (aQual image2d_array_t image,
int4 coord, uint4 color)
void write_imageh (aQual image1d_array_t image,
int2 coord, half4 color)
uint4 read_imageui (aQual image3d_t image, int4 coord)
half4 read_imageh (read_only image3d_t image,
sampler_t sampler, {int4, float4} coord)
half4 read_imageh (aQual image3d_t image, int4 coord)
void write_imagef (aQual image3d_t image,
int4 coord, float4 color)
void write_imagei (aQual image3d_t image,
int4 coord, int4 color)
Extended mipmap read and write functions
These functions require the cl_khr_mipmap_image and
cl_khr_mipmap_image_writes extensions.
float read_imagef (read_only image2d_[depth_]t image,
sampler_t sampler, float2 coord, float lod)
int4 read_imagei (read_only image2d_t image,
sampler_t sampler, float2 coord, float lod)
uint4 read_imageui (read_only image2d_t image,
sampler_t sampler, float2 coord, float lod)
float read_imagef (read_only image2d_ [depth_]t image,
sampler_t sampler, float2 coord, float2 gradient_x,
float2 gradient_y)
int4 read_imagei (read_only image2d_t image,
sampler_t sampler, float2 coord, float2 gradient_x,
float2 gradient_y)
uint4 read_imageui (read_only image2d_t image,
sampler_t sampler, float2 coord, float2 gradient_x,
float2 gradient_y)
float4 read_imagef (read_only image1d_t image,
sampler_t sampler, float coord, float lod)
int4 read_imagei (read_only image1d_t image,
sampler_t sampler, float coord, float lod)
uint4 read_imageui(read_only image1d_t image,
sampler_t sampler, float coord, float lod)
float4 read_imagef (read_only image1d_t image,
sampler_t sampler, float coord, float gradient_x,
float gradient_y)
int4 read_imagei (read_only image1d_t image,
sampler_t sampler, float coord, float gradient_x,
float gradient_y)
uint4 read_imageui(read_only image1d_t image,
sampler_t sampler, float coord, float gradient_x,
float gradient_y)
float4 read_imagef (read_only image3d_t image,
sampler_t sampler, float4 coord, float lod)
int4 read_imagei(read_only image3d_t image,
sampler_t sampler, float4 coord, float lod)
uint4 read_imageui(read_only image3d_t image,
sampler_t sampler, float4 coord, float lod)
float4 read_imagef (read_only image3d_t image,
sampler_t sampler, float4 coord, float4 gradient_x,
float4 gradient_y)
(Continued on next page >)
©2017 Khronos Group - Rev. 0817
www.khronos.org/opencl
OpenCL C Language
uint4 read_imageui (aQual image2d_t image, int2 coord)
int4 read_imagei (aQual image1d_buffer_t image, int coord)
int4 read_imagei (aQual image3d_t image, int4 coord)
Page 16
Image Read and Write (continued)
Extended mipmap read and write functions (cont’d)
int4 read_imagei(read_only image3d_t image,
sampler_t sampler, float4 coord, float4 gradient_x,
float4 gradient_y)
uint4 read_imageui(read_only image3d_t image,
sampler_t sampler, float4 coord, float4 gradient_x,
float4 gradient_y)
float4 read_imagef (read_only image1d_array_t image,
sampler_t sampler, float2 coord, float lod)
int4 read_imagei (read_only image1d_array_t image,
sampler_t sampler, float2 coord, float lod)
uint4 read_imageui(read_only image1d_array_t image,
sampler_t sampler, float2 coord, float lod)
float4 read_imagef (read_only image1d_array_t image,
sampler_t sampler, float2 coord, float gradient_x,
float gradient_y)
int4 read_imagei (read_only image1d_array_t image,
sampler_t sampler, float2 coord, float gradient_x,
float gradient_y)
uint4 read_imageui(read_only image1d_array_t image,
sampler_t sampler, float2 coord, float gradient_x,
float gradient_y)
float read_imagef (read_only image2d_array_ [depth_]t image,
sampler_t sampler, float4 coord, float lod)
int4 read_imagei (read_only image2d_array_t image,
sampler_t sampler, float4 coord, float lod)
Image Query Functions [6.13.14.5] [9.10.3]
OpenCL C Language
The MSAA forms require the extension cl_khr_gl_msaa_sharing.
Mipmap requires the extension cl_khr_mipmap_image.
Query image width, height, and depth in pixels
int get_image_width (aQual image{1,2,3}d_t image)
int get_image_width (aQual image1d_buffer_t image)
int get_image_width (aQual image{1,2}d_array_t image)
int get_image_width (
aQual image2d_[array_]depth_t image)
int get_image_width (aQual image2d_[array_]msaa_t image)
int get_image_width (
aQual image2d_ [array_]msaa_depth_t image)
int get_image_height (aQual image{2,3}d_t image)
int get_image_height (aQual image2d_array_t image)
int get_image_height (
aQual image2d_[array_]depth_t image)
int get_image_height (
aQual image2d_[array_]msaa_t image)
int get_image_height (
aQual image2d_[array_]msaa_depth_t image)
int get_image_depth (image3d_t image)
Query image array size
size_t get_image_array_size (aQual image1d_array_t image)
size_t get_image_array_size (aQual image2d_array_t image)
size_t get_image_array_size (
aQual image2d_array_depth_t image)
size_t get_image_array_size (
aQual image2d_array_msaa_depth_t image)
Query image dimensions
int2 get_image_dim (aQual image2d_t image)
int2 get_image_dim (aQual image2d_array_t image)
int4 get_image_dim (aQual image3d_t image)
int2 get_image_dim (aQual image2d_[array_]depth_t image)
int2 get_image_dim (aQual image2d_[array_]msaa_t image)
int2 get_image_dim (
aQual image2d_ [array_]msaa_depth_t image)
©2017 Khronos Group - Rev. 0817
OpenCL C Language
uint4 read_imageui (read_only image2d_array_t image,
sampler_t sampler, float4 coord, float lod)
float read_imagef (
read_only image2d_array_ [depth_]t image,
sampler_t sampler, float4 coord, float2 gradient_x,
float2 gradient_y)
int4 read_imagei (read_only image2d_array_t image,
sampler_t sampler, float4 coord, float2 gradient_x,
float2 gradient_y)
uint4 read_imageui (read_only image2d_array_t image,
sampler_t sampler, float4 coord, float2 gradient_x,
float2 gradient_y)
void write_imagef (aQual image2d_ [depth_]t image,
int2 coord, int lod, float4 color)
void write_imagei (aQual image2d_t image, int2 coord, int lod,
int4 color)
void write_imageui (aQual image2d_t image, int2 coord, int lod,
uint4 color)
void write_imagef (aQual image1d_t image, int coord, int lod,
float4 color)
void write_imagei (aQual image1d_t image, int coord, int lod,
int4 color)
void write_imageui (aQual image1d_t image, int coord, int lod,
uint4 color)
void write_imagef (aQual image1d_array_t image, int2 coord,
int lod, float4 color)
void write_imagei (aQual image1d_array_t image, int2 coord,
int lod, int4 color)
void write_imageui (aQual image1d_array_t image, int2 coord,
int lod, uint4 color)
Query image channel data type and order
int get_image_channel_data_type (
aQual image{1,2,3}d_t image)
OpenCL 2.2 Reference Guide
void write_imagef (aQual image2d_array_ [depth_]t image,
int4 coord, int lod, float4 color)
void write_imagei (aQual image2d_array_t image, int4 coord,
int lod, int4 color)
void write_imageui (aQual image2d_array_t image, int4 coord,
int lod, uint4 color)
void write_imagef (aQual image3d_t image, int4 coord, int lod,
float4 coord)
void write_imagei (aQual image3d_t image, int4 coord, int lod,
int4 color)
void write_imageui (aQual image3d_t image, int4 coord, int lod,
uint4 color)
Extended multi-sample image read functions [9.10.3]
The extension cl_khr_gl_msaa_sharing adds the following built-in
functions.
float read_imagef (aQual image2d_msaa_depth_t image,
int2 coord, int sample)
float read_imagef (aQual image2d_array_depth_msaa_t image,
int4 coord, int sample)
float4 read_image{f, i, ui} (image2d_msaa_t image,
int2 coord, int sample)
float4 read_image{f, i, ui} (image2d_array_msaa_t image,
int4 coord, int sample)
Notes
int get_image_channel_data_type (
aQual image1d_buffer_t image)
int get_image_channel_data_type (
aQual image{1,2}d_array_t image)
int get_image_channel_data_type (aQual
image2d_[array_]depth_t image)
int get_image_channel_data_type (
aQual image2d_[array_]msaa_t image)
int get_image_channel_data_type (
aQual image2d_[array_]msaa_depth_t image)
int get_image_channel_order (aQual image{1,2,3}d_t image)
int get_image_channel_order (
aQual image1d_buffer_t image)
int get_image_channel_order (
aQual image{1,2}d_array_t image)
int get_image_channel_order (
aQual image2d_[array_]depth_t image)
int get_image_channel_order (
aQual image2d_[array_]msaa_t image)
int get_image_channel_order(
aQual image2d_[array_]msaa_depth_t image)
Extended query functions [9.10.3]
These functions require the cl_khr_mipmap_image extension.
int get_image_num_mip_levels (aQual image1d_t image)
int get_image_num_mip_levels (
aQual image2d_ [depth_]t image)
int get_image_num_mip_levels (aQual image3d_t image)
int get_image_num_mip_levels (
aQual image1d_array_t image)
int get_image_num_mip_levels (
aQual image2d_array_[depth_]t image)
int get_image_num_samples (
aQual image2d_[array_]msaa_t image)
int get_image_num_samples (
aQual image2d_ [array_]msaa_depth_t image)
www.khronos.org/opencl
OpenCL 2.2 Reference Guide
Integer Built-in Functions [6.13.3]
T is type char, charn, uchar, ucharn, short, shortn, ushort,
ushortn, int, intn, uint, uintn, long, longn, ulong, or ulongn,
where n is 2, 3, 4, 8, or 16. Tu is the unsigned version of T. Tsc is
the scalar version of T.
Tu abs (T x)
|x|
Tu abs_diff (T x, T y)
| x – y | without modulo overflow
T add_sat (T x, T y)
x + y and saturates the result
T hadd (T x, T y)
(x + y) >> 1 without mod. overflow
T rhadd (T x, T y)
(x + y + 1) >> 1
T clamp (T x, T min, T max)
T clamp (T x, Tsc min, Tsc max) min(max(x, minval), maxval)
T clz (T x)
Number of leading 0-bits in x
T ctz (T x)
Number of trailing 0-bits in x
T mad_hi (T a, T b, T c)
mul_hi(a, b) + c
T mad_sat (T a, T b, T c)
a * b + c and saturates the result
T max (T x, T y)
T max (T x, Tsc y)
y if x < y, otherwise it returns x
T min (T x, T y)
T min (T x, Tsc y)
y if y < x, otherwise it returns x
T mul_hi (T x, T y)
High half of the product of x and y
T rotate (T v, T i)
result[indx] = v[indx] << i[indx]
Relational Built-in Functions [6.13.6]
These functions can be used with built-in scalar or vector types
as arguments and return a scalar or vector integer result. T
is type float, floatn, char, charn, uchar, ucharn, short, shortn,
ushort, ushortn, int, intn, uint, uintn, long, longn, ulong, ulongn,
or optionally double or doublen (if cl_khr_fp64 is enabled) or half
and halfn (if cl_khr_fp16 is enabled). Ti is type char, charn, short,
shortn, int, intn, long, or longn. Tu is type uchar, ucharn, ushort,
ushortn, uint, uintn, ulong, or ulongn. n is 2, 3, 4, 8, or 16.
Compare of x == y
int isnotequal (float x, float y)
intn isnotequal (floatn x, floatn y)
int isnotequal (double x, double y)
longn isnotequal (doublen x, doublen y)
int isnotequal (half x, half y)
shortn isnotequal (halfn x, halfn y)
Compare of x != y
int isgreater (float x, float y)
intn isgreater (floatn x, floatn y)
int isgreater (double x, double y)
longn isgreater (doublen x, doublen y)
int isgreater (half x, half y)
shortn isgreater (halfn x, halfn y)
Compare of x > y
int isgreaterequal (float x, float y)
intn isgreaterequal (floatn x, floatn y)
int isgreaterequal (double x, double y)
Compare of x >= y
T sub_sat (T x, T y)
x - y and saturates the result
T popcount (T x)
Number of non-zero bits in x
For upsample, return type is scalar when the parameters are scalar.
Common Built-in Functions [6.13.4]
These functions operate component-wise and use round to
nearest even rounding mode. Ts is type float, optionally double
(if cl_khr_fp64 is enabled) or half and halfn (if cl_khr_fp16 is
enabled). Tn is the vector form of Ts, where n is 2, 3, 4, 8, or 16.
T is Ts and Tn.
short[n] upsample (
char[n] hi, uchar[n] lo)
result[i]= ((short)hi[i]<< 8)|lo[i]
ushort[n] upsample (
uchar[n] hi, uchar[n] lo)
result[i]=((ushort)hi[i]<< 8)|lo[i]
T clamp (T x, T min, T max)
Tn clamp (Tn x, Ts min, Ts max)
Clamp x to range given by
min, max
int[n] upsample (
short[n] hi, ushort[n] lo)
result[i]=((int)hi[i]<< 16)|lo[i]
T degrees (T radians)
radians to degrees
uint[n] upsample (
ushort[n] hi, ushort[n] lo)
result[i]=((uint)hi[i]<< 16)|lo[i]
T max (T x, T y)
Tn max (Tn x, Ts y)
Max of x and y
long[n] upsample (
int[n] hi, uint[n] lo)
result[i]=((long)hi[i]<< 32)|lo[i]
T min (T x, T y)
Tn min (Tn x, Ts y)
Min of x and y
ulong[n] upsample (
uint[n] hi, uint[n] lo)
result[i]=((ulong)hi[i]<< 32)|lo[i]
T mix (T x, T y, T a)
Tn mix (Tn x, Tn y, Ts a)
Linear blend of x and y
T radians (T degrees)
degrees to radians
T step (T edge, T x)
Tn step (Ts edge, Tn x)
0.0 if x < edge, else 1.0
The following fast integer functions optimize the performance
of kernels. In these functions, T is type int, uint, intn, or uintn,
where n is 2, 3, 4, 8, or 16.
T mad24 (T x, T y, T z)
Multiply 24-bit integer values x, y, add
32-bit int. result to 32-bit integer z
T mul24 (T x, T y)
Multiply 24-bit integer values x and y
longn isless (doublen x, doublen y)
int isless (half x, half y)
shortn isless (halfn x, halfn y)
int islessequal (float x, float y)
intn islessequal (floatn x, floatn y)
int islessequal (double x, double y)
longn islessequal (doublen x, doublen y)
int islessequal (half x, half y)
shortn islessequal (halfn x, halfn y)
T smoothstep (T edge0, T edge1, T x)
Step and interpolate
T smoothstep (Ts edge0, Ts edge1, T x)
T sign (T x)
Sign of x
Compare of x < y
longn isnormal (doublen)
int isnormal (half)
shortn isnormal (halfn)
Test for a normal
value
Compare of x <= y
int isordered (float x, float y)
intn isordered (floatn x, floatn y)
int isordered (double x, double y)
longn isordered (doublen x, doublen y)
int isordered (half x, half y)
shortn isordered (halfn x, halfn y)
Test if arguments are
ordered
int islessgreater (float x, float y)
intn islessgreater (floatn x, floatn y)
int islessgreater (double x, double y)
Compare of
longn islessgreater (doublen x, doublen y) (x < y) || (x > y)
int islessgreater (half x, half y)
shortn islessgreater (halfn x, halfn y)
int isunordered (float x, float y)
intn isunordered (floatn x, floatn y)
int isunordered (double x, double y)
Test if arguments are
longn isunordered (doublen x, doublen y) unordered
int isunordered (half x, half y)
shortn isunordered (halfn x, halfn y)
int isfinite (float)
intn isfinite (floatn)
int isfinite (double)
longn isfinite (doublen)
int isfinite (half)
shortn isfinite (halfn)
int signbit (float)
intn signbit (floatn)
int signbit (double)
longn signbit (doublen)
int signbit (half)
shortn signbit (halfn)
Test for sign bit
int any (Ti x)
1 if MSB in component
of x is set; else 0
int all (Ti x)
1 if MSB in all
components of x are
set; else 0
T bitselect (T a, T b, T c)
half bitselect (half a, half b, half c)
halfn bitselect (halfn a, halfn b, halfn c)
Each bit of result is
corresponding bit of
a if corresponding bit
of c is 0
T select (T a, T b, Ti c)
T select (T a, T b, Tu c)
halfn select (halfn a, halfn b, shortn c)
half select (half a, half b, short c)
halfn select (halfn a, halfn b, ushortn c)
half select (half a, half b, ushort c)
For each component
of a vector type,
result[i] = if MSB of
c[i] is set ? b[i] : a[i]
For scalar type, result
=c?b:a
Test for finite value
int isinf (float)
intn isinf (floatn)
int isinf (double)
longn isinf (doublen)
int isinf (half)
shortn isinf (halfn)
Test for + or – infinity
int isnan (float)
intn isnan (floatn)
Test for a NaN
longn isgreaterequal (doublen x, doublen y)
int isgreaterequal (half x, half y)
Compare of x >= y
shortn isgreaterequal (halfn x, halfn y)
int isnan (double)
longn isnan (doublen)
int isnan (half)
shortn isnan (halfn)
Test for a NaN
int isless (float x, float y)
intn isless (floatn x, floatn y)
int isless (double x, double y)
int isnormal (float)
intn isnormal (floatn)
int isnormal (double)
Test for a normal
value
Ts is scalar type float, optionally double (if cl_khr_fp64 is enabled),
or half (if cl_khr_fp16 is enabled). T is Ts and the 2-, 3-, or
4-component vector forms of Ts.
Ts distance (T p0, T p1)
Vector distance
Ts dot (T p0, T p1)
Dot product
float{3,4} cross (float{3,4} p0, float{3,4} p1)
double{3,4} cross (double{3,4} p0, double{3,4} p1) Cross product
half{3,4} cross (half{3,4} p0, half{3,4} p1)
Ts length (T p)
T normalize (T p)
Compare of x < y
Page 17
Geometric Built-in Functions [6.13.5]
©2017 Khronos Group - Rev. 0817
float fast_distance (float p0, float p1)
float fast_distance (floatn p0, floatn p1)
Vector distance
Vector length
float fast_length (float p)
float fast_length (floatn p)
Vector length
Normal vector
length 1
float fast_normalize (float p)
floatn fast_normalize (floatn p)
Normal vector
length 1
www.khronos.org/opencl
OpenCL C Language
int isequal (float x, float y)
intn isequal (floatn x, floatn y)
int isequal (double x, double y)
longn isequal (doublen x, doublen y)
int isequal (half x, half y)
shortn isequal (halfn x, halfn y)
OpenCL C Language
Page 18
OpenCL C Language
Vector Data Load/Store [6.13.7]
T is type char, uchar, short, ushort, int, uint, long, ulong, or float,
optionally double (if cl_khr_fp64 is enabled), or half (if cl_khr_fp16
is enabled). Tn refers to the vector form of type T, where n is 2, 3,
4, 8, or 16. R defaults to current rounding mode, or is one of the
rounding modes listed in 6.2.3.2.
Tn vloadn (size_t offset,
const [constant] T *p)
Read vector data from
address (p + (offset * n))
void vstoren (Tn data,
size_t offset, T *p)
float vload_half (size_t offset,
const [constant] half *p)
floatn vload_halfn (size_t offset,
const [constant] half *p)
Write vector data to address
(p + (offset * n)
Read a half from address
(p + offset)
Read a halfn from address
(p + (offset * n))
void vstore_half (float data,
size_t offset, half *p)
void vstore_half_R (float data,
size_t offset, half *p)
void vstore_half (double data,
size_t offset, half *p)
Write a half to address
(p + offset)
void vstore_half_R (double data,
size_t offset, half *p)
Write a half to address
(p + offset)
void vstore_halfn (floatn data,
size_t offset, half *p)
void vstore_halfn_R (floatn data, Write a half vector to address
size_t offset, half *p)
(p + (offset * n))
void vstore_halfn (doublen data,
size_t offset, half *p)
Synchronization & Memory Fence Functions [6.13.8]
void vstore_halfn_R (doublen
data, size_t offset, half *p)
Write a half vector to address
(p + (offset * n))
floatn vloada_halfn (size_t offset, Read half vector data from
(p + (offset * n)). For half3,
const [constant] half *p)
read from (p + (offset * 4)).
void vstorea_halfn (floatn data,
size_t offset, half *p)
void vstorea_halfn_R (floatn data,
Write half vector data to (p +
size_t offset, half *p)
(offset * n)). For half3, write
void vstorea_halfn (doublen data, to (p + (offset * 4)).
size_t offset, half *p)
void vstorea_halfn_R (doublen
data, size_t offset, half *p)
Async Copies and Prefetch [6.13.10]
flags argument is the memory address space, set to a 0 or an OR’d combination of
CLK_X_MEM_FENCE where X may be LOCAL, GLOBAL, or IMAGE. Memory fence functions provide
ordering between memory operations of a work-item.
T is type char, charn, uchar, ucharn, short, shortn, ushort, ushortn, int, intn, uint, uintn, long,
longn, ulong, ulongn, float, floatn, optionally double or doublen (if cl_khr_fp64 is enabled), or
half or halfn (if cl_khr_fp16 is enabled).
void work_group_barrier (cl_mem_fence_flags flags
[, memory_scope scope])
event_t async_work_group_copy ( __local T *dst,
const __global T *src, size_t num_gentypes, event_t event)
event_t async_work_group_copy ( __global T *dst,
const __local T *src, size_t num_gentypes, event_t event)
Work-items in a work-group must
execute this before any can continue
void atomic_work_item_fence (cl_mem_fence_flags flags Orders loads and stores of a work[, memory_scope scope])
item executing a kernel
void sub_group_barrier (cl_mem_fence_flags flags
[, memory_scope scope])
Work-items in a sub-group must
execute this before any can continue
Atomic Functions [6.13.11]
OpenCL C Language
OpenCL 2.2 Reference Guide
OpenCL C implements a subset of the C11 atomics (see section 7.17 of the C11 specification) and
synchronization operations.
In the following tables, A refers to an atomic_* type (not including atomic_flag). C refers to
its corresponding non-atomic type. M refers to the type of the other argument for arithmetic
operations. For atomic integer types, M is C. For atomic pointer types, M is ptrdiff_t.
The type atomic_* is a 32-bit integer. atomic_long and atomic_ulong require extension
cl_khr_int64_base_atomics or cl_khr_int64_extended_atomics. The atomic_double type is available
if cl_khr_fp64 is enabled. The default scope is work_group for local atomics and all_svm_devices
for global atomics. The extensions cl_khr_int64_base_atomics and cl_khr_int64_extended_atomics
implement atomic operations on 64-bit signed and unsigned integers to locations in __global and
__local memory.
See the table under Atomic Types and Enum Constants for information about parameter types
memory_order, memory_scope, and memory_flag.
void atomic_init(volatile A *obj, C value)
Initializes the atomic object pointed to by obj to
the value value.
void atomic_work_item_fence(
cl_mem_fence_flags flags, memory_order
order, memory_scope scope)
Effects based on value of order. flags must be
CLK_{GLOBAL, LOCAL, IMAGE}_MEM_FENCE or a
combination of these.
void atomic_store(volatile A *object, C desired)
void atomic_store_explicit(volatile A *object,
C desired, memory_order order
[ , memory_scope scope])
Atomically replace the value pointed to by object
with the value of desired. Memory is affected
according to the value of order.
event_t async_work_group_strided_copy ( __local T *dst, const __global T *src,
size_t num_gentypes, size_t src_stride, event_t event)
event_t async_work_group_strided_copy ( __global T *dst, const __local T *src,
size_t num_gentypes, size_t dst_stride, event_t event)
void wait_group_events (
int num_events, event_t *event_list)
Wait for completion of
async_work_group_copy
void prefetch (const __global T *p,
size_t num_gentypes)
Prefetch num_gentypes * sizeof(T) bytes
into global cache
bool atomic_flag_test_and_set(
volatile atomic_flag *object)
bool atomic_flag_test_and_set_explicit(
volatile atomic_flag *object,
memory_order order[ , memory_scope scope])
Atomically sets the value pointed to by object
to true. Memory is affected according to the
value of order. Returns atomically, the value of
the object immediately before the effects.
void atomic_flag_clear(volatile atomic_flag *object)
void atomic_flag_clear_explicit(
volatile atomic_flag *object,
memory_order order[ , memory_scope scope])
Atomically sets the value pointed to by object
to false. The order argument shall not be
memory_order_acquire nor
memory_order_acq_rel. Memory is affected
according to the value of order.
Values for key for atomic_fetch and modify functions
key
add
sub
or
xor
op
+
|
^
computation
addition
subtraction
bitwise inclusive or
bitwise exclusive or
key
and
min
max
C atomic_load(volatile A *object)
Atomically returns the value pointed to by
object. Memory is affected according to the
C atomic_load_explicit(volatile A *object,
memory_order order[ , memory_scope scope]) value of order.
Atomic Types and Enum Constants
C atomic_exchange(volatile A *object, C desired)
C atomic_exchange_explicit(volatile A *object,
C desired, memory_order order
[ , memory_scope scope])
memory_order
Atomically replace the value pointed to by object
with desired. Memory is affected according to
the value of order.
Atomically compares the value pointed to by
object for equality with that in expected, and
if true, replaces the value pointed to by object
with desired, and if false, updates the value
in expected with the value pointed to by object.
These operations are atomic read-modify-write
operations.
C atomic_fetch_(volatile A *object, M operand)
C atomic_fetch__explicit(volatile A *object,
M operand, memory_order order
[ , memory_scope scope])
Atomically replaces the value pointed to by
object with the result of the computation
applied to the value pointed to by object and
the given operand.
©2017 Khronos Group - Rev. 0817
op computation
&
bitwise and
min compute min
max compute max
Parameter Type Values
memory_order_relaxed
memory_order_ acq_rel
memory_scope_work_item
memory_order_acquire
memory_order_seq_cst
memory_order_release
memory_scope_work_group
memory_scope_sub_group memory_scope_all_svm_devices
memory_scope_device (default for functions that do not take a memory_scope
argument)
memory_scope
bool atomic_compare_exchange_strong(
volatile A *object, C *expected, C desired)
bool atomic_compare_exchange_strong_explicit(
volatile A *object, C *expected, C desired,
memory_order success,
memory_order failure[ , memory_scope scope])
bool atomic_compare_exchange_weak(
volatile A *object,
C *expected, C desired)
bool atomic_compare_exchange_weak_explicit(
volatile A *object, C *expected, C desired,
memory_order success,
memory_order failure[ , memory_scope scope])
Copies
num_gentypes
T elements from
src to dst
Atomic integer and floating-point types
† indicates types supported by a limited subset of atomic operations
‡ indicates size depends on whether implemented on 64-bit or 32-bit architecture.
§ indicates types supported only if extensions cl_khr_int64_base_atomics and
cl_khr_int64_extended_atomics are enabled.
atomic_int
atomic_uint
atomic_flag
atomic_long §
atomic_ulong §
atomic_float † atomic_intptr_t ‡§ atomic_size_t ‡§
atomic_double †§ atomic_uintptr_t ‡§ atomic_ptrdiff_t ‡§
Atomic macros
to a token sequence to initialize an atomic object of
#define ATOMIC_VAR_INIT(C value) Eaxpands
type that is initialization-compatible with value.
#define ATOMIC_FLAG_INIT
Initialize an atomic_flag to the clear state.
www.khronos.org/opencl
OpenCL 2.2 Reference Guide
Address Space Qualifier Functions [6.13.9]
T refers to any of the built-in data types supported by OpenCL C
or a user-defined type.
OpenCL C Language
Workgroup Functions [6.13.15]
T is type int, uint, long, ulong, or float, optionally double (if
cl_khr_fp64 is enabled) or half (if cl_khr_fp64 is enabled).
[const] global T * to_global (
[const] T *ptr)
global address space
[const] local T * to_local (
[const] T *ptr)
Returns a non-zero value if predicate evaluates to non-zero for
all or any workitems in the work-group.
local address space
[const] private T * to_private (
[const] T *ptr)
private address space
[const] cl_mem_fence_flags
get_fence( [const] T *ptr)
Memory fence value:
CLK_GLOBAL_MEM_FENCE,
CLK_LOCAL_MEM_FENCE
int work_group_all (int predicate)
int work_group_any (int predicate)
int sub_group_all (int predicate)
int sub_group_any (int predicate)
printf Function [6.13.13]
Return result of reduction operation specified by for all
values of x specified by workitems in work-group. may be
min, max, or add.
printf output synchronization
When the event associated with a particular kernel invocation
completes, the output of applicable printf calls is flushed to the
implementation-defined output stream.
printf format string
The format string follows C99 conventions and supports an
optional vector specifier:
%[flags][width][.precision][vector][length] conversion
Examples:
The following examples show the use of the vector specifier in
the printf format string.
The above two printf calls print the following:
f4 = 1.00,2.00,3.00,4.00
uc = 0xfa,0xfb,0xfc,0xfd
Tn shuffle (Tm x,
TUn mask)
Tn shuffle2 (Tm x, Tm y,
TUn mask)
Construct permutation of elements
from one or two input vectors, return
a vector with same element type as
input and length that is the same as
the shuffle mask.
Do an exclusive or inclusive scan operation specified by of
all values specified by work-items in the work-group. The scan
results are returned for each work-item. may be min, max,
or add.
int read_pipe (
__read_only pipe T p, T *ptr)
Read packet from p
into ptr.
int read_pipe (__read_only pipe T p,
reserve_id_t reserve_id,
uint index, T *ptr)
Read packet from
reserved area of the
pipe reserve_id and
index into ptr.
reserve_id_t reserve_read_pipe (
__read_only pipe T p,
uint num_packets)
reserve_id_t reserve_write_pipe (
__write_only pipe T p,
uint num_packets)
void commit_read_pipe (
__read_only pipe T p,
reserve_id_t reserve_id)
void commit_write_pipe (
__write_only pipe T p,
reserve_id_t reserve_id)
Reserve num_packets
entries for reading from
or writing to p.
Indicates that all reads
and writes to num_
packets associated with
reservation reserve_id
are completed.
int write_pipe (
Write packet specified
__write_only pipe T p, const T *ptr) by ptr to p.
uint get_pipe_max_packets (
pipe T p)
int write_pipe (
__write_only pipe T p,
reserve_id_t reserve_id,
uint index, const T *ptr)
Returns maximum
number of packets
specified when p was
created.
Write packet specified
by ptr to reserved area
reserve_id and index.
uint get_pipe_num_packets (
pipe T p)
Returns the number of
available entries in p.
bool is_valid_reserve_id (
reserve_id_t reserve_id)
Return true if reserve_id
is a valid reservation ID
and false otherwise.
void work_group_commit_read_pipe (pipe T p, reserve_id_t reserve_id)
void work_group_commit_write_pipe (pipe T p, reserve_id_t reserve_id)
void sub_group_commit_read_pipe (pipe T p, reserve_id_t reserve_id)
void sub_group_commit_write_pipe (pipe T p, reserve_id_t reserve_id)
Indicates that all reads and writes
to num_packets associated with
reservation reserve_id are completed.
reserve_id_t work_group_reserve_read_pipe (pipe T p, uint num_packets)
reserve_id_t work_group_reserve_write_pipe (pipe T p, uint num_packets)
reserve_id_t sub_group_reserve_read_pipe (pipe T p, uint num_packets)
reserve_id_t sub_group_reserve_write_pipe (pipe T p, uint num_packets)
Reserve num_packets entries for
reading from or writing to p. Returns a
valid reservation ID if the reservation
is successful.
Enqueuing and Kernel Query Built-in Functions [6.13.17]
A kernel may enqueue code represented by Block syntax, and control execution order with event
dependencies including user events and markers. There are several advantages to using the Block
syntax: it is more compact; it does not require a cl_kernel object; and enqueuing can be done as
a single semantic step. The macro CLK_NULL_EVENT refers to an invalid device event. The macro
CLK_NULL_QUEUE refers to an invalid device queue.
uint get_kernel_work_group_size (void (^block)(void))
int enqueue_kernel (queue_t queue, kernel_enqueue_flags_t flags,
const ndrange_t ndrange, void (^block)(void))
int enqueue_kernel (queue_t queue, kernel_enqueue_flags_t flags,
const ndrange_t ndrange, uint num_events_in_wait_list,
const clk_event_t *event_wait_list, clk_event_t *event_ret,
void (^block)(void))
int enqueue_kernel (queue_t queue, kernel_enqueue_flags_t flags,
const ndrange_t ndrange,
void (^block)(local void *, …), uint size0, …)
int enqueue_kernel (queue_t queue, kernel_ enqueue_flags_t flags,
const ndrange_t ndrange,
uint num_events_in_wait_list, const clk_event_t *event_wait_list,
clk_event_t *event_ret, void (^block)(local void *, …), uint size0, …)
uint get_kernel_preferred_work_group_size_multiple (
void (^block)(local void *, …))
int enqueue_marker (queue_t queue, uint num_events_in_wait_list,
const clk_event_t *event_wait_list, clk_event_t *event_ret)
©2017 Khronos Group - Rev. 0817
Allows a work-item to
enqueue a block for
execution to queue.
Work-items can enqueue
multiple blocks to a device
queue(s).
flags may be one of
CLK_ENQUEUE_FLAGS_
{NO_WAIT, WAIT_KERNEL,
WAIT_WORK_GROUP}
uint get_kernel_work_group_size (void (^block)(local void *, …))
uint get_kernel_preferred_work_group_size_multiple (
void (^block)(void))
Query the maximum workgroup size that can be
used to execute a block.
Returns the preferred
multiple of work-group
size for launch.
Enqueue a marker
command to queue.
uint get_kernel_sub_group_count_for_ndrange
(const ndrange_t ndrange, void (^block)(void))
uint get_kernel_sub_group_count_for_ndrange
(const ndrange_t ndrange, void (^block)(local void *, …))
Returns number of
subgroups in each
workgroup of the dispatch.
uint get_kernel_max_sub_group_size_for_ndrange
(const ndrange_t ndrange, void (^block)(void))
uint get_kernel_max_sub_group_size_for_ndrange
(const ndrange_t ndrange, void (^block)(local void *, …))
Returns the maximum
sub-group size for a block.
www.khronos.org/opencl
OpenCL C Language
Miscellaneous Vector Functions [6.13.12]
Tm and Tn are type charn, ucharn, shortn, ushortn, intn, uintn,
longn, ulongn, floatn, optionally doublen (if cl_khr_fp64 is
enabled) or halfn (if cl_khr_fp16 is enabled), where n is 2,4,8, or
16 except in vec_step it may also be 3. TUn is ucharn, ushortn,
uintn, or ulongn.
int vec_step (Tn a)
int vec_step (typename)
T work_group_broadcast (T a, size_t local_id)
T work_group_broadcast (T a, size_t local_id_x,
size_t local_id_y)
T work_group_broadcast (T a, size_t local_id_x,
size_t local_id_y, size_t local_id_z)
T sub_group_broadcast (T x, size_t local_id)
Pipe Built-in Functions [6.13.16.2-4]
T represents the built-in OpenCL C scalar or vector integer or
floating-point data types or any user defined type built from these
scalar and vector data types. Half scalar and vector types require
the cl_khr_fp16 extension. Double or vector double types require
the cl_khr_fp64 extension. The macro CLK_NULL_RESERVE_ID
refers to an invalid reservation ID.
float4 f = (float4)(1.0f, 2.0f, 3.0f, 4.0f);
uchar4 uc = (uchar4)(0xFA, 0xFB, 0xFC, 0xFD);
printf("f4 = %2.2v4hlf\n", f);
printf("uc = %#v4hhx\n", uc);
Takes built-in scalar or vector data type
argument. Returns 1 for scalar, 4 for
3-component vector, else number of
elements in the specified type.
Broadcast the value of a to all work-items in the work-group.
local_id must be the same value for all workitems in the workgroup.
T work_group_scan_exclusive_ (T x)
T work_group_scan_inclusive_ (T x)
T sub_group_scan_exclusive_ (T x)
T sub_group_scan_inclusive_ (T x)
T work_group_reduce_ (T x)
T sub_group_reduce_ (T x)
Writes output to an implementation-defined stream.
int printf (constant char * restrict format, …)
Page 19
Page 20
OpenCL C Language
Event Built-in Functions [6.13.17.8]
T is type int, uint, long, ulong, or float, optionally double (if cl_khr_fp64 is enabled), or half (if
cl_khr_fp16 is enabled).
void retain_event (clk_event_t event)
Increments event reference count.
void release_event (clk_event_t event)
Decrements event reference count.
clk_event_t create_user_event ()
Create a user event.
bool is_valid_event (clk_event_t event)
True for valid event.
void set_user_event_status (
clk_event_t event, int status)
Sets the execution status of a user event.
status: CL_COMPLETE or a negative error
value.
void capture_event_profiling_info (
Captures profiling information for command
clk_event_t event, clk_profiling_info name,
associated with event in value.
global void *value)
OpenCL 2.2 Reference Guide
Helper Built-in Functions [6.13.17.9]
queue_t get_default_queue (void)
Default queue or CLK_NULL_QUEUE
ndrange_t ndrange_1D (size_t global_work_size)
ndrange_t ndrange_1D (size_t global_work_size,
size_t local_work_size)
ndrange_t ndrange_1D (size_t global_work_offset,
size_t global_work_size, size_t local_work_size)
Builds a 1D ND-range
descriptor.
ndrange_t ndrange_nD (const size_t global_work_size[n])
ndrange_t ndrange_nD (size_t global_work_size,
const size_t local_work_size[n])
ndrange_t ndrange_nD (const size_t global_work_offset,
const size_t global_work_size,
const size_t local_work_size[n])
Builds a 2D or 3D ND-range descriptor.
n may be 2 or 3.
OpenCL C Language
Notes
©2017 Khronos Group - Rev. 0817
www.khronos.org/opencl
OpenCL 2.2 Reference Guide
Page 21
OpenCL Extensions Reference
Using OpenCL Extensions [9]
In this section, extensions shown in italics provide core features.
#pragma OPENCL EXTENSION extension_name : {enable |disable}
To test if an extension is supported, use
clGetPlatformInfo() or clGetDeviceInfo()
To get the address of the extension function:
clGetExtensionFunctionAddressForPlatform()
cl_apple_gl_sharing (see cl_khr_gl_sharing)
cl_khr_3d_image_writes
cl_khr_byte_addressable_store
cl_khr_context_abort
cl_khr_d3d10_sharing
cl_khr_d3d11_sharing
OpenGL, OpenGL ES Sharing [9.3 - 9.5]
These functions require the cl_khr_gl_sharing or
cl_apple_gl_sharing extension.
CL Context > GL Context, Sharegroup
cl_int clGetGLContextInfoKHR (
const cl_context_properties *properties,
cl_gl_context_info param_name,
size_t param_value_size, void *param_value,
size_t *param_value_size_ret)
param_name: CL_DEVICES_FOR_GL_CONTEXT_KHR,
CL_CURRENT_DEVICE_FOR_GL_CONTEXT_KHR
CL Buffer Objects > GL Buffer Objects
cl_mem clCreateFromGLBuffer (cl_context context,
cl_mem_flags flags, GLuint bufobj, cl_int *errcode_ret)
flags: CL_MEM_{READ_ONLY, WRITE_ONLY, READ_WRITE}
CL Image Objects > GL Textures
cl_mem clCreateFromGLTexture (cl_context context,
cl_mem_flags flags, GLenum texture_target,
GLint miplevel, GLuint texture, cl_int *errcode_ret)
flags: See clCreateFromGLBuffer
texture_target: GL_TEXTURE_{1D, 2D}[_ARRAY],
GL_TEXTURE_{3D, BUFFER, RECTANGLE},
GL_TEXTURE_CUBE_MAP_POSITIVE_{X, Y, Z},
GL_TEXTURE_CUBE_MAP_NEGATIVE_{X, Y, Z},
GL_TEXTURE_2D_MULTISAMPLE[_ARRAY] (Requires
extension cl_khr_gl_msaa_sharing)
DX9 Media Surface Sharing [9.7]
Header
Enable the extension cl_khr_dx9_media_sharing.
cl_int clGetDeviceIDsFromDX9MediaAdapterKHR (
cl_platform_id platform,
cl_uint num_media_adapters,
cl_dx9_media_adapter_type_khr*media_adapters_type,
void *media_adapters,
cl_dx9_media_adapter_set_khr media_adapter_set,
cl_uint num_entries, cl_device_id *devices,
cl_int *num_devices)
media_adapter_type:
CL_ADAPTER_{D3D9, D3D9EX, DXVA}_KHR
media_adapter_set: CL_{ALL, PREFERRED}_DEVICES_FOR_DX9_MEDIA_ADAPTER_KHR
cl_int
clEnqueue{Acquire, Release}DX9MediaSurfacesKHR(
cl_command_queue command_queue,
cl_uint num_objects, const cl_mem *mem_objects,
cl_uint num_events_in_wait_list,
const cl_event *event_wait_list, cl_event *event)
cl_khr_depth_images
cl_khr_device_enqueue_local_arg_types
cl_khr_dx9_media_sharing
cl_khr_egl_event
cl_khr_egl_image
cl_khr_fp16
cl_khr_fp64
cl_khr_gl_depth_images
cl_khr_gl_event
cl_khr_gl_msaa_sharing
cl_khr_gl_sharing
cl_khr_global_int32_base_atomics - atomic_*()
cl_khr_global_int32_extended_atomics - atomic_*()
cl_khr_icd
CL Image Objects > GL Renderbuffers
cl_mem clCreateFromGLRenderbuffer (
cl_context context, cl_mem_flags flags,
GLuint renderbuffer, cl_int *errcode_ret)
flags: See clCreateFromGLBuffer
Query Information
cl_int clGetGLObjectInfo (cl_mem memobj,
cl_gl_object_type *gl_object_type,
GLuint *gl_object_name)
*gl_object_type returns:
CL_GL_OBJECT_TEXTURE_BUFFER,
CL_GL_OBJECT_TEXTURE{1D, 2D, 3D},
CL_GL_OBJECT_TEXTURE{1D, 2D}_ARRAY,
CL_GL_OBJECT_{BUFFER, RENDERBUFFER}
cl_int clGetGLTextureInfo (cl_mem memobj,
cl_gl_texture_info param_name,
size_t param_value_size, void *param_value,
size_t *param_value_size_ret)
param_name: CL_GL_{TEXTURE_TARGET,
MIPMAP_LEVEL}, CL_GL_NUM_SAMPLES (Requires
extension cl_khr_gl_msaa_sharing)
Share Objects
cl_mem clCreateFromD3D10BufferKHR (
cl_context context, cl_mem_flags flags,
ID3D10Buffer *resource, cl_int *errcode_ret)
flags: See clCreateFromGLBuffer
cl_mem clCreateFromD3D10Texture2DKHR (
cl_context context, cl_mem_flags flags,
ID3D10Texture2D *resource, UINT subresource,
cl_int *errcode_ret)
flags: See clCreateFromD3D10BufferKHR
CL Event Objects > GL Sync Objects
cl_int clEnqueue{Acquire, Release}D3D10ObjectsKHR (
cl_ command_queue command_queue,
cl_uint num_objects, const cl_mem *mem_objects,
cl_uint num_events_in_wait_list,
const cl_event *event_wait_list, cl_event *event)
cl_event clCreateEventFromGLsyncKHR (
cl_context context, GLsync sync, cl_int *errcode_ret)
Requires the cl_khr_gl_event extension.
Direct3D 11 Sharing [9.8.7]
Header These functions require the
cl_khr_d3d11_sharing extension. For values of flags, see
clCreateFromGLBuffer.
cl_int clGetDeviceIDsFromD3D11KHR (
cl_platform_id platform,
cl_d3d11_device_source_khr d3d_device_source,
void *d3d_object,
cl_d3d11_device_set_khr d3d_device_set,
cl_uint num_entries, cl_device_id *devices,
cl_uint *num_devices)
d3d_device_source: CL_D3D11_DEVICE_KHR,
CL_D3D11_DXGI_ADAPTER_KHR
d3d_device_set: CL_ALL_DEVICES_FOR_D3D11_KHR,
CL_PREFERRED_DEVICES_FOR_D3D11_KHR
Create CL Event Objects from EGL
cl_mem clCreateFromEGLImageKHR (
cl_context context, CLeglDisplayKHR display,
CLeglImageKHR image, cl_mem_flags flags,
const cl_egl_image_properties_khr *properties,
cl_int *errcode_ret)
©2017 Khronos Group - Rev. 0817
cl_int clGetDeviceIDsFromD3D10KHR (
cl_platform_id platform,
cl_d3d10_device_source_khr d3d_device_source,
void *d3d_object,
cl_d3d10_device_set_khr d3d_device_set,
cl_uint num_entries, cl_device_id *devices,
cl_uint *num_devices)
d3d_device_source:
CL_D3D10_{DEVICE, DXGI_ADAPTER}_KHR
d3d_device_set:
CL_{ALL, PREFERRED}_DEVICES_FOR_D3D10_KHR
cl_mem clCreateFromD3D10Texture3DKHR (
cl_context context, cl_mem_flags flags,
ID3D10Texture3D *resource, UINT subresource,
cl_int *errcode_ret)
flags: See clCreateFromGLBuffer
Create CL Image Objects from EGL
cl_event clCreateEventFromEGLsyncKHR (
cl_context context, CLeglSyncKHR sync,
CLeglDisplayKHR display, cl_int *errcode_ret)
Direct3D 10 Sharing [9.6.7]
These functions require the cl_khr_d3d10_sharing extension.
The associated header file is .
cl_int clEnqueue{Acquire, Release}GLObjects (
cl_command_queue command_queue,
cl_uint num_objects, const cl_mem *mem_objects,
cl_uint num_events_in_wait_list,
const cl_event *event_wait_list, cl_event *event)
EGL Interoperabililty [9.16, 9.17]
This function requires the extension cl_khr_egl_event.
cl_khr_image2d_from_buffer
cl_khr_initialize_memory
cl_khr_int64_base_atomics - atom_*()
cl_khr_int64_extended_atomics - atom_*()
cl_khr_local_int32_base_atomics - atomic_*()
cl_khr_local_int32_extended_atomics - atomic_*()
cl_khr_mipmap_image
cl_khr_mipmap_image_writes
cl_khr_priority_hints
cl_khr_srgb_image_writes
cl_khr_spir
cl_khr_subgroup_named_barrier
cl_khr_terminate_context
cl_khr_throttle_hints
These functions require the extension cl_khr_egl_image.
cl_mem clCreateFromD3D11BufferKHR (
cl_context context, cl_mem_flags flags,
ID3D11Buffer *resource, cl_int *errcode_ret)
cl_mem clCreateFromD3D11Texture3DKHR (
cl_context context, cl_mem_flags flags,
ID3D11Texture3D *resource, UINT subresource,
cl_int *errcode_ret)
cl_mem clCreateFromD3D11Texture2DKHR (
cl_context context, cl_mem_flags flags,
ID3D11Texture2D *resource,
UINT subresource, cl_int *errcode_ret)
cl_int clEnqueue{Acquire, Release}D3D11ObjectsKHR (
cl_command_queue command_queue,
cl_uint num_objects, const cl_mem *mem_objects,
cl_uint num_events_in_wait_list,
const cl_event *event_wait_list, cl_event *event)
cl_int clEnqueue{Acquire, Release}EGLObjectsKHR (
cl_command_queue command_queue,
cl_uint num_objects, const cl_mem *mem_objects,
cl_uint num_events_in_wait_list,
const cl_event *event_wait_list,
cl_event *event)
www.khronos.org/opencl
OpenCL Extensions
cl_mem clCreateFromDX9MediaSurfaceKHR (
cl_context context, cl_mem_flags flags,
cl_dx9_media_adapter_type_khr adapter_type,
void *surface_info, cl_uint plane, cl_int *errcode_ret)
flags: See clCreateFromGLBuffer
adapter_type: CL_ADAPTER_{D3D9, D3D9EX, DXVA}_KHR
Section and table references are to the OpenCL Extensions 2.1 specification.
Page 22
OpenCL 2.2 Reference Guide
Example of Enqueuing Kernels
Arguments that are a pointer type to local address space [6.13.17.2]
A block passed to enqueue_kernel can have arguments declared to be a pointer to local memory.
The enqueue_kernel built-in function variants allow blocks to be enqueued with a variable
number of arguments. Each argument must be declared to be a void pointer to local memory.
These enqueue_kernel built-in function variants also have a corresponding number of arguments
each of type uint that follow the block argument. These arguments specify the size of each local
memory pointer argument of the enqueued block.
kernel void
my_func_A_local_arg1(global int *a, local int *lptr, ...)
{
...
}
kernel void
my_func_A_local_arg2(global int *a,
local int *lptr1, local float4 *lptr2, ...)
{
...
}
kernel void
my_func_B(global int *a, ...)
{
...
ndrange_t ndrange = ndrange_1d(...);
uint local_mem_size = compute_local_mem_size();
enqueue_kernel(get_default_queue(),
CLK_ENQUEUE_FLAGS_WAIT_KERNEL,
ndrange,
^(local void *p){
my_func_A_local_arg1(a, (local int *)p, ...);},
local_mem_size);
}
kernel void
my_func_C(global int *a, ...)
{
...
ndrange_t ndrange = ndrange_1d(...);
void (^my_blk_A)(local void *, local void *) =
^(local void *lptr1, local void *lptr2){
my_func_A_local_arg2(
a,
(local int *)lptr1,
(local float4 *)lptr2, ...);};
// calculate local memory size for lptr
// argument in local address space for my_blk_A
uint local_mem_size = compute_local_mem_size();
enqueue_kernel(get_default_queue(),
CLK_ENQUEUE_FLAGS_WAIT_KERNEL,
ndrange,
my_blk_A,
local_mem_size, local_mem_size * 4);
}
©2017 Khronos Group - Rev. 0517
A Complete Example [6.13.17.3]
The example below shows how to implement an iterative algorithm where the host enqueues
the first instance of the nd-range kernel (dp_func_A). The kernel dp_func_A will launch a kernel
(evaluate_dp_work_A) that will determine if new nd-range work needs to be performed. If
new nd-range work does need to be performed, then evaluate_dp_work_A will enqueue a new
instance of dp_func_A . This process is repeated until all the work is completed.
kernel void
dp_func_A(queue_t q, ...)
{
...
// queue a single instance of evaluate_dp_work_A to
// device queue q. queued kernel begins execution after
// kernel dp_func_A finishes
if (get_global_id(0) == 0)
{
enqueue_kernel(q,
CLK_ENQUEUE_FLAGS_WAIT_KERNEL,
ndrange_1d(1),
^{evaluate_dp_work_A(q, ...);});
}
}
kernel void
evaluate_dp_work_A(queue_t q,...)
{
// check if more work needs to be performed
bool more_work = check_new_work(...);
if (more_work)
{
size_t global_work_size = compute_global_size(...);
void (^dp_func_A_blk)(void) =
^{dp_func_A(q, ...});
// get local WG-size for kernel dp_func_A
size_t local_work_size =
get_kernel_work_group_size(dp_func_A_blk);
// build nd-range descriptor
ndrange_t ndrange = ndrange_1D(global_work_size,
local_work_size);
// enqueue dp_func_A
enqueue_kernel(q,
CLK_ENQUEUE_FLAGS_WAIT_KERNEL,
ndrange,
dp_func_A_blk);
}
...
}
www.khronos.org/opencl
OpenCL 2.2 Reference Guide
Page 23
OpenCL Class Diagram
OpenCL Device Architecture Diagram
The figure below describes the OpenCL specification as a class diagram using the Unified Modeling
Language1 (UML) notation. The diagram shows both nodes and edges which are classes and their
relationships. As a simplification it shows only classes, and no attributes or operations.
The table below shows memory regions with allocation and memory access capabilities. R=Read,
W=Write
Annotations
Relationships
abstract classes
aggregations
{abstract}
inheritance
relationship
navigability
Cardinality
many
one and only one
optionally one
one or more
1
*
1
0..1
1..*
Global
Constant
Local
Private
Host
Dynamic allocation
R/W access
Dynamic allocation
R/W access
Dynamic allocation
No access
No allocation
No access
Kernel
No allocation
R/W access
Static allocation
R-only access
Static allocation
R/W access
Static allocation
R/W access
The conceptual OpenCL
device architecture diagram
shows processing elements
(PE), compute units (CU),
and devices. The host is not
shown.
Unified Modeling Language (http://www.uml.org/) is a trademark of Object Management Group (OMG).
Notes
©2017 Khronos Group - Rev. 0517
www.khronos.org/opencl
Page 24
OpenCL 2.2 Reference Guide
OpenCL Reference Card Index
The following index shows the page number for each item included in this guide. The color of the row in the table below is the color of the box to which you should refer.
A
Access Qualifiers
Address Space Qualifier functions
Address Space Qualifiers
Address Spaces
Array library
Async Copies
Atomics
Atomics library
Attribute Qualifiers
Attributes
C
C
C
C++
C++
C
C
C++
C
C++
14
19
13
6
11
18
18
6-7
13
5
B
Barrier functions
bitwise functions
Blocks
Broadcast functions
Buffer objects
C++
C++
C
C++
9
10
14
9
2
C
C Language Reference
C++ 14
C++ Language Reference
channel_ref
Class Diagram
cl[Release, Retain]CommandQueue ()
cl[Release, Retain]Context ()
cl[Release, Retain]Device ()
cl[Release, Retain]Event ()
cl[Release, Retain]Kernel ()
cl[Release, Retain]Memobject ()
cl[Release, Retain]Program ()
cl[Release, Retain]Sampler ()
clBuildProgram ()
clCloneKernel ()
clCompileProgram ()
clCreateBuffer ()
clCreateCommandQueue* ()
clCreateContext* ()
clCreateImage ()
clCreateKernel ()
clCreateKernelsInProgram ()
clCreatePipe ()
clCreateProgramWith* ()
clCreateSamplerWithProperties ()
clCreateSubBuffer ()
clCreateSubDevices ()
clCreateUserEvent ()
clEnqueueBarrierWithWaitList ()
clEnqueueCopyBuffer ()
clEnqueueCopyBufferRect ()
clEnqueueCopyBufferToImage ()
clEnqueueCopyImage ()
clEnqueueCopyImageToBuffer ()
clEnqueueFillBuffer ()
clEnqueueFillImage ()
clEnqueueMapBuffer ()
clEnqueueMapImage ()
clEnqueueMarkerWithWaitList ()
clEnqueueMigrateMemobjects ()
clEnqueueNativeKernel ()
clEnqueueNDRangeKernel ()
C
C++
C++
C++
13
5
5
12
23
1
1
1
4
4
3
3
3
3
4
3
2
1
1
2
4
4
3
3
3
2
1
4
4
2
2
2
2
2
2
2
2
2
4
3
4
4
clEnqueueReadBuffer ()
clEnqueueReadBufferRect ()
clEnqueueReadImage ()
clEnqueueSVM* ()
clEnqueueUnmapMemobject ()
clEnqueueWriteBuffer ()
clEnqueueWriteBufferRect ()
clEnqueueWriteImage ()
clFinish ()
clFlush ()
clGetCommandQueueInfo ()
clGetContextInfo ()
clGetDeviceAndHostTimer ()
clGetDeviceIDs ()
clGetDeviceInfo ()
clGetEventInfo ()
clGetEventProfilingInfo ()
clGetExtensionFunctionAddress* ()
clGetHostTimer ()
clGetImageInfo ()
clGetKernelArgInfo ()
clGetKernelInfo ()
clGetKernelSubGroupInfo ()
clGetKernelWorkGroupInfo ()
clGetMemobjectInfo ()
clGetPipeInfo ()
clGetPlatformIDs ()
clGetPlatformInfo ()
clGetProgramBuildInfo ()
clGetProgramInfo ()
clGetSamplerInfo ()
clGetSupportedImageFormats ()
clIcdGetPlatformIDsKHR ()
clLinkProgram ()
clSetDefaultDeviceCommandQueue ()
clSetEventCallback ()
clSetKernelArg ()
clSetKernelExecInfo ()
clSetMemobjectDestructorCallback ()
clSetProgramReleaseCallback ()
clSetProgramSpecializationConstant ()
clSetUserEventStatus ()
clSVMAlloc ()
clSVMFree ()
clTerminateContextKHR ()
clUnloadPlatformCompiler ()
clWaitForEvents ()
Command queues
Common functions
Common functions
Comparison functions
Compile
Compiler options
constant
Constants
Constants
Contexts
Conversions
Conversions and Type Casting
C++
C
C++
C++
C++
C
C++
C
2
2
2
3
3
2
2
2
3
3
1
1
1
1
1
4
4
1
1
2
4
4
4
4
3
3
1
1
3
3
3
2
1
3
1
4
4
4
3
3
3
4
3
3
1
3
4
1
10
17
10
3-4
4
6
12
14
1
5
13
D
Debugging options
depth images
Device Architecture Diagram
Device Enqueue
Direct3D 10 Sharing
Direct3D 11 Sharing
DX9 Media Surface Sharing
C++
C++
Ext
Ext
Ext
4
7
23
8-9
21
21
21
E
EGL Interoperabililty
Enqueue
Enqueue
Event functions
Event objects
Extensions
Ext
C++
C
C
Ext
21
8-9
19
20
4
21
F
fast
Fast 24-bit operations
Fences
Flush and Finish
Function qualifier
Function qualifier
C
C++
C++
C++
C
13
10
7
3
5
13
G
Geometric functions
Geometric functions
global
C++
C
C++
11
17
6
H-I
half
Image objects
Image query functions
Image Read and Write functions
Images
Images and Samplers library
Integer functions
Integer functions
Iterator library
C++
C
C
C++
C++
C
C++
C++
K-L
Kernel objects
Kernel query functions
Limits
Link
Linker
local
C
5
2
16
C++
11-12
C++
3-4
4
6
C
C++
C
C++
C
C++
C
Named barriers
Named Barriers for Subgroups
ndrange
ndrange
C++
Ext
C++
C
Pipe functions
Pipes
Pipes
Platform Layer
Pointer class
pragma
Prefetch
Preprocessor Directives & Macros
Preprocessor Directives & Macros
printf function
printf function
priv
Profiling operations
Program linking options
Program objects
Qualifiers
read_image*()
Reinterpreting types
Relational functions
Relational functions
Retain and release program objects
Rounding modes
Rounding modes
Runtime
21
13
C
19
3
8
1
6
13
18
5
13
11
19
6
4
4
3-4
13
C
15-16
C
C++
C++
C
C
C++
C
C++
C
C++
C++
C
C++
C++
C
5
17
11
3
5
13
1
S
C++
7-8
3
3
21
3
4
3
12
C
18
C++
Ext
C++
9
C++
12
12
5
13
T
Traits
Tuple library
Types
Types
V
15-16
W
9
21
5
20
C
R
14
4
12
14
10
14
3
7-8
N-O
Ext
P-Q
Sampler
Sampler objects
15-16 Shared Virtual Memory
7-8 SPIR 1.2 Binaries
7-8 SPIR-V specialization constants
17 SVM
10 SVM operations
12 swizzles
Synchronization & Memory Fence
functions
4
Synchronization functions
19
M
Macros
Markers, barriers, & waiting for events
Math constants
Math constants
Math functions
Math functions
Memory objects
mipmap
mipmap
OpenGL and OpenGL ES Sharing
Operators
C++
C++
C
Vector Component Addressing
Vector Component Addressing
Vector Data Load/Store
Vector Data Load/Store
Vector functions
Vector Utilities
Vector Wrapper library
Work-Item functions
Work-Item functions
Workgroup functions
Workgroup functions
write_image*()
C++
C
C
C++
C
C++
C++
6
13
18
11
19
12
12
C
14
9
9
19
C
15-16
C
C++
C++
The Khronos Group is an industry consortium creating open standards for the authoring and
acceleration of parallel computing, graphics and dynamic media on a wide variety of platforms and
devices. See www.khronos.org to learn more about the Khronos Group.
OpenCL is a trademark of Apple Inc. and is used under license by Khronos.
©2017 Khronos Group - Rev. 0517
Reference card production by Miller & Mattson www.millermattson.com
www.khronos.org/opencl
Source Exif Data:
File Type : PDF File Type Extension : pdf MIME Type : application/pdf Linearized : No XMP Toolkit : XMP Core 5.4.0 Creator Tool : Adobe InDesign CC 2017 (Macintosh) Metadata Date : 2017:10:11 06:48-07:00 Create Date : 2017:10:11 06:47:49-07:00 Modify Date : 2017:10:11 06:48-07:00 Format : application/pdf Original Document ID : xmp.did:3195D91657CDDD11841EF19E5A285EBE History Software Agent : Adobe InDesign CC 2017 (Macintosh) History Parameters : from application/x-indesign to application/pdf History Changed : / History When : 2017:10:11 06:47:49-07:00 History Action : converted Instance ID : uuid:e4749a05-ca9b-f549-8b67-774c691e20ad Document ID : xmp.id:41716525-e2e4-4cc5-9acf-a1a6f477f7d1 Derived From Rendition Class : default Derived From Document ID : xmp.did:45cf9c02-fa63-43f1-aef1-37d1fc0455d8 Derived From Instance ID : xmp.iid:45cf9c02-fa63-43f1-aef1-37d1fc0455d8 Derived From Original Document ID: xmp.did:3195D91657CDDD11841EF19E5A285EBE Rendition Class : proof:pdf Trapped : False Producer : Adobe PDF Library 15.0 Page Count : 24 PDF Version : 1.4 Creator : Adobe InDesign CC 2017 (Macintosh)EXIF Metadata provided by EXIF.tools