Opencl22 Reference Guide

User Manual: Pdf

Open the PDF directly: View PDF .
Page Count: 24

OpenCL 2.2 Reference Guide Page 1

OpenCL API

The OpenCL Runme

API calls that manage OpenCL objects such as command-

queues, memory objects, program objects, kernel objects

for __kernel funcons in a program and calls that allow you to

enqueue commands to a command-queue such as execung a

kernel, reading, or wring a memory object.

Command queues [5.1]

cl_command_queue

clCreateCommandQueueWithProperes (

cl_context context, cl_device_id device,

const cl_command_queue_properes *properes,

cl_int *errcode_ret)

*properes: Points to a zero-terminated list of properes

and their values: [Table 5.1] CL_QUEUE_SIZE,

CL_QUEUE_PROPERTIES (biield which may be

set to an OR of CL_QUEUE_* where * may

be: OUT_OF_ORDER_EXEC_MODE_ENABLE,

PROFILING_ENABLE, ON_DEVICE[_DEFAULT]),

CL_QUEUE_THROTTLE_{HIGH, MED, LOW}_KHR

(requires the cl_khr_throle_hint extension),

CL_QUEUE_PRIORITY_KHR (biield which may be

one of CL_QUEUE_PRIORITY_HIGH_KHR,

CL_QUEUE_PRIORITY_MED_KHR,

CL_QUEUE_PRIORITY_LOW_KHR

(requires the cl_khr_priority_hints extension))

cl_int clSetDefaultDeviceCommandQueue (

cl_context context, cl_device_id device,

cl_command_queue command_queue)

cl_int clRetainCommandQueue (

cl_command_queue command_queue)

cl_int clReleaseCommandQueue (

cl_command_queue command_queue)

cl_int clGetCommandQueueInfo (

cl_command_queue command_queue,

cl_command_queue_info param_name,

size_t param_value_size, void *param_value,

size_t *param_value_size_ret)

param_name: [Table 5.2]

CL_QUEUE_CONTEXT,

CL_QUEUE_DEVICE[_DEFAULT], CL_QUEUE_SIZE,

CL_QUEUE_REFERENCE_COUNT,

CL_QUEUE_PROPERTIES

OpenCL API Reference

Secon and table references are to the OpenCL API 2.2 specicaon.

OpenCLTM (Open Compung Language) is a mul-vendor

open standard for general-purpose parallel programming of

heterogeneous systems that include CPUs, GPUs, and other

processors. OpenCL provides a uniform programming environment

for soware developers to write ecient, portable code for high-

performance compute servers, desktop computer systems, and

handheld devices.

Specicaon documents and online reference are available at

www.khronos.org/opencl.

[n.n.n] and purple text: secons and text in the OpenCL API 2.2 Spec.

[n.n.n] and green text: secons and text in the OpenCL C++ 2.2 Spec.

[n.n.n] and brown text: secons and text in the OpenCL C 2.0 Spec.

[n.n.n] and blue text: secons and text in the OpenCL Extension 2.2 Spec.

The OpenCL Plaorm Layer

The OpenCL plaorm layer implements plaorm-specic

features that allow applicaons to query OpenCL devices,

device conguraon informaon, and to create OpenCL

contexts using one or more devices. Items in blue apply only

when the appropriate extension is enabled (see Extensions on

page 21 of this reference guide).

Querying plaorm info & devices [4.1-2] [9.16.9]

cl_int clGetPlaormIDs (cl_uint num_entries,

cl_plaorm_id *plaorms, cl_uint *num_plaorms)

cl_int clIcdGetPlaormIDsKHR (cl_uint num_entries,

cl_plaorm_id * plaoms, cl_uint *num_plaorms)

cl_int clGetPlaormInfo (cl_plaorm_id plaorm,

cl_plaorm_info param_name,

size_t param_value_size, void *param_value,

size_t *param_value_size_ret)

param_name: CL_PLATFORM_{PROFILE, VERSION},

CL_PLATFORM_{NAME, VENDOR, EXTENSIONS},

CL_PLATFORM_HOST_TIMER_RESOLUTION,

CL_PLATFORM_ICD_SUFFIX_KHR [Table 4.1]

cl_int clGetDeviceIDs (cl_plaorm_id plaorm,

cl_device_type device_type, cl_uint num_entries,

cl_device_id *devices, cl_uint *num_devices)

device_type: [Table 4.2]

CL_DEVICE_TYPE_{ACCELERATOR, ALL, CPU},

CL_DEVICE_TYPE_{CUSTOM, DEFAULT, GPU}

cl_int clGetDeviceInfo (cl_device_id device,

cl_device_info param_name,

size_t param_value_size, void *param_value,

size_t *param_value_size_ret)

param_name: [Table 4.3]

CL_DEVICE_ADDRESS_BITS, CL_DEVICE_AVAILABLE,

CL_DEVICE_BUILT_IN_KERNELS,

CL_DEVICE_COMPILER_AVAILABLE,

CL_DEVICE_{DOUBLE, HALF, SINGLE}_FP_CONFIG,

CL_DEVICE_ENDIAN_LITTLE,

CL_DEVICE_EXTENSIONS,

CL_DEVICE_ERROR_CORRECTION_SUPPORT,

CL_DEVICE_EXECUTION_CAPABILITIES,

CL_DEVICE_GLOBAL_MEM_CACHE_{SIZE, TYPE},

CL_DEVICE_GLOBAL_MEM_{CACHELINE_SIZE, SIZE},

CL_DEVICE_GLOBAL_VARIABLE_PREFERRED_TOTAL_SIZE,

CL_DEVICE_IL_VERSION,

CL_DEVICE_IMAGE_MAX_{ARRAY, BUFFER}_SIZE,

CL_DEVICE_IMAGE_SUPPORT,

CL_DEVICE_IMAGE2D_MAX_{WIDTH, HEIGHT},

CL_DEVICE_IMAGE3D_MAX_{WIDTH, HEIGHT,

DEPTH},

CL_DEVICE_IMAGE_BASE_ADDRESS_ALIGNMENT,

CL_DEVICE_IMAGE_PITCH_ALIGNMENT,

CL_DEVICE_LINKER_AVAILABLE,

CL_DEVICE_LOCAL_MEM_{TYPE, SIZE},

CL_DEVICE_MAX_{CLOCK_FREQUENCY, PIPE_ARGS},

CL_DEVICE_MAX_{COMPUTE_UNITS, SAMPLERS},

CL_DEVICE_MAX_CONSTANT_{ARGS, BUFFER_SIZE},

CL_DEVICE_MAX_GLOBAL_VARIABLE_SIZE,

CL_DEVICE_MAX_{MEM_ALLOC, PARAMETER}_SIZE,

CL_DEVICE_MAX_NUM_SUB_GROUPS,

CL_DEVICE_MAX_ON_DEVICE_{QUEUES, EVENTS},

CL_DEVICE_MAX_{READ, WRITE}_IMAGE_ARGS,

CL_DEVICE_MAX_READ_WRITE_IMAGE_ARGS,

CL_DEVICE_MAX_SUB_GROUPS,

CL_DEVICE_MAX_WORK_GROUP_SIZE,

CL_DEVICE_MAX_WORK_ITEM_{DIMENSIONS, SIZES},

CL_DEVICE_MEM_BASE_ADDR_ALIGN,

CL_DEVICE_NAME,

CL_DEVICE_NATIVE_VECTOR_WIDTH_-

{CHAR, INT, DOUBLE, HALF, LONG, SHORT, FLOAT),

CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT,

CL_DEVICE_{OPENCL_C_VERSION, PARENT_DEVICE},

CL_DEVICE_PARTITION_AFFINITY_DOMAIN,

CL_DEVICE_PARTITION_MAX_SUB_DEVICES,

CL_DEVICE_PARTITION_{PROPERTIES, TYPE},

CL_DEVICE_PIPE_MAX_ACTIVE_RESERVATIONS,

CL_DEVICE_PIPE_MAX_PACKET_SIZE,

CL_DEVICE_{PLATFORM, PRINTF_BUFFER_SIZE},

CL_DEVICE_PREFERRED_Y_ATOMIC_ALIGNMENT

(where Y may be LOCAL, GLOBAL, PLATFORM),

CL_DEVICE_PREFERRED_VECTOR_WIDTH_Z

(where Z may be CHAR, INT, DOUBLE, HALF, LONG,

SHORT, FLOAT),

CL_DEVICE_PREFERRED_INTEROP_USER_SYNC,

CL_DEVICE_PROFILE,

CL_DEVICE_PROFILING_TIMER_RESOLUTION,

CL_DEVICE_SPIR_VERSIONS,

CL_DEVICE_SUB_GROUP_INDEPENDENT_FORWARD_-

PROGRESS

CL_DEVICE_QUEUE_ON_{DEVICE, HOST}_PROPERTIES,

CL_DEVICE_QUEUE_ON_DEVICE_MAX_SIZE,

CL_DEVICE_QUEUE_ON_DEVICE_PREFERRED_SIZE,

CL_DEVICE_{REFERENCE_COUNT, VENDOR_ID},

CL_DEVICE_SVM_CAPABILITIES,

CL_DEVICE_TERMINATE_CAPABILITY_KHR,

CL_DEVICE_{TYPE, VENDOR},

CL_DEVICE_VENDOR_ID,

CL_{DEVICE, DRIVER}_VERSION,

CL_DEVICE_MAX_NAMED_BARRIER_COUNT_KHR

cl_int clGetDeviceAndHostTimer (cl_device_id device,

cl_ulong *device_mestamp,

cl_ulong *host_mestamp)

cl_int clGetHostTimer (cl_device_id device,

cl_ulong *host_mestamp)

Paroning a device [4.3]

cl_int clCreateSubDevices (cl_device_id in_device,

const cl_device_paron_property *properes,

cl_uint num_devices, cl_device_id *out_devices,

cl_uint *num_devices_ret)

properes: [Table 4.4] CL_DEVICE_PARTITION_EQUALLY,

CL_DEVICE_PARTITION_BY_COUNTS,

CL_DEVICE_PARTITION_BY_AFFINITY_DOMAIN

cl_int clRetainDevice (cl_device_id device)

cl_int clReleaseDevice (cl_device_id device)

Contexts [4.4]

cl_context clCreateContext (

const cl_context_properes *properes,

cl_uint num_devices, const cl_device_id *devices,

void (CL_CALLBACK*pfn_nofy)

(const char *errinfo, const void *private_info,

size_t cb, void *user_data),

void *user_data, cl_int *errcode_ret)

properes: [Table 4.5]

NULL or CL_CONTEXT_PLATFORM,

CL_CONTEXT_INTEROP_USER_SYNC,

CL_CONTEXT_{D3D10, D3D11}_DEVICE_KHR,

CL_CONTEXT_ADAPTER_{D3D9, D3D9EX}_KHR,

CL_CONTEXT_ADAPTER_DXVA_KHR,

CL_CONTEXT_MEMORY_INITIALIZE_KHR,

CL_CONTEXT_TERMINATE_KHR,

CL_GL_CONTEXT_KHR, CL_CGL_SHAREGROUP_KHR,

CL_{EGL, GLX}_DISPLAY_KHR, CL_WGL_HDC_KHR

cl_context clCreateContextFromType (

const cl_context_properes *properes,

cl_device_type device_type,

void (CL_CALLBACK *pfn_nofy)

(const char *errinfo, const void *private_info,

size_t cb, void *user_data),

void *user_data, cl_int *errcode_ret)

properes: See clCreateContext

device_type: See clGetDeviceIDs

cl_int clRetainContext (cl_context context)

cl_int clReleaseContext (cl_context context)

cl_int clGetContextInfo (cl_context context,

cl_context_info param_name,

size_t param_value_size, void *param_value,

size_t *param_value_size_ret)

param_name:

CL_CONTEXT_X where X may be REFERENCE_COUNT,

DEVICES, NUM_DEVICES, PROPERTIES,

D3D10_PREFER_SHARED_RESOURCES_KHR,

D3D11_PREFER_SHARED_RESOURCES_KHR [Table 4.6]

cl_int clTerminateContextKHR (cl_context context)

Get CL extension funcon pointers [9.2]

void* clGetExtensionFunconAddressForPlaorm (

cl_plaorm_id plaorm, const char *funcname)

OpenCL 2.2 Reference GuidePage 2

OpenCL API

Buer Objects

Elements of buer objects are stored sequenally and accessed using a pointer by a kernel execung

on a device.

Create buer objects [5.2.1]

cl_mem clCreateBuer (

cl_context context, cl_mem_ags ags, size_t size,

void *host_ptr, cl_int *errcode_ret)

ags: [Table 5.3] CL_MEM_READ_WRITE, CL_MEM_{WRITE, READ}_ONLY,

CL_MEM_HOST_NO_ACCESS, CL_MEM_HOST_{READ, WRITE}_ONLY,

CL_MEM_{USE, ALLOC, COPY}_HOST_PTR

cl_mem clCreateSubBuer (

cl_mem buer, cl_mem_ags ags, cl_buer_create_type buer_create_type,

const void *buer_create_info, cl_int *errcode_ret)

ags: See clCreateBuer

buer_create_type: CL_BUFFER_CREATE_TYPE_REGION

Read, write, copy, & ll buer objects [5.2.2-3]

cl_int clEnqueueReadBuer (

cl_command_queue command_queue, cl_mem buer, cl_bool blocking_read,

size_t oset, size_t size, void *ptr, cl_uint num_events_in_wait_list,

const cl_event *event_wait_list, cl_event *event)

cl_int clEnqueueReadBuerRect (

cl_command_queue command_queue, cl_mem buer, cl_bool blocking_read,

const size_t *buer_origin, const size_t *host_origin, const size_t *region,

size_t buer_row_pitch, size_t buer_slice_pitch, size_t host_row_pitch,

size_t host_slice_pitch, void *ptr, cl_uint num_events_in_wait_list,

const cl_event *event_wait_list, cl_event *event)

cl_int clEnqueueWriteBuer (

cl_command_queue command_queue, cl_mem buer, cl_bool blocking_write,

size_t oset, size_t size, const void *ptr, cl_uint num_events_in_wait_list,

const cl_event *event_wait_list, cl_event *event)

cl_int clEnqueueWriteBuerRect (

cl_command_queue command_queue, cl_mem buer, cl_bool blocking_write,

const size_t *buer_origin, const size_t *host_origin, const size_t *region,

size_t buer_row_pitch, size_t buer_slice_pitch, size_t host_row_pitch,

size_t host_slice_pitch, const void *ptr, cl_uint num_events_in_wait_list,

const cl_event *event_wait_list, cl_event *event)

cl_int clEnqueueFillBuer (

cl_command_queue command_queue, cl_mem buer, const void *paern,

size_t paern_size, size_t oset, size_t size, cl_uint num_events_in_wait_list,

const cl_event *event_wait_list, cl_event *event)

cl_int clEnqueueCopyBuer (

cl_command_queue command_queue, cl_mem src_buer, cl_mem dst_buer,

size_t src_oset, size_t dst_oset, size_t size, cl_uint num_events_in_wait_list,

const cl_event *event_wait_list, cl_event *event)

cl_int clEnqueueCopyBuerRect (

cl_command_queue command_queue, cl_mem src_buer, cl_mem dst_buer,

const size_t *src_origin, const size_t *dst_origin, const size_t *region,

size_t src_row_pitch, size_t src_slice_pitch, size_t dst_row_pitch,

size_t dst_slice_pitch, cl_uint num_events_in_wait_list,

const cl_event *event_wait_list, cl_event *event)

Map buer objects [5.2.4]

void * clEnqueueMapBuer (

cl_command_queue command_queue, cl_mem buer, cl_bool blocking_map,

cl_map_ags map_ags, size_t oset, size_t size,

cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event,

cl_int *errcode_ret)

map_ags: CL_MAP_{READ, WRITE}, CL_MAP_WRITE_INVALIDATE_REGION

Image Objects

Items in blue apply when the appropriate extension is enabled.

Create image objects [5.3.1]

cl_mem clCreateImage (

cl_context context, cl_mem_ags ags, const cl_image_format *image_format,

const cl_image_desc *image_desc, void *host_ptr, cl_int *errcode_ret)

ags: See clCreateBuer

Query list of supported image formats [5.3.2]

cl_int clGetSupportedImageFormats (

cl_context context, cl_mem_ags ags, cl_mem_object_type image_type,

cl_uint num_entries, cl_image_format *image_formats,

cl_uint *num_image_formats)

ags: See clCreateBuer

image_type: CL_MEM_OBJECT_IMAGE{1D, 2D, 3D},

CL_MEM_OBJECT_IMAGE1D_BUFFER, CL_MEM_OBJECT_IMAGE{1D, 2D}_ARRAY

Read, write, copy, & ll image objects [5.3.3-4]

cl_int clEnqueueReadImage (

cl_command_queue command_queue, cl_mem image, cl_bool blocking_read,

const size_t *origin, const size_t *region, size_t row_pitch, size_t slice_pitch,

void *ptr, cl_uint num_events_in_wait_list, const cl_event *event_wait_list,

cl_event *event)

cl_int clEnqueueWriteImage (

cl_command_queue command_queue, cl_mem image, cl_bool blocking_write,

const size_t *origin, const size_t *region, size_t input_row_pitch,

size_t input_slice_pitch, const void *ptr, cl_uint num_events_in_wait_list,

const cl_event *event_wait_list, cl_event *event)

cl_int clEnqueueFillImage (

cl_command_queue command_queue, cl_mem image, const void *ll_color,

const size_t *origin, const size_t *region,cl_uint num_events_in_wait_list,

const cl_event *event_wait_list, cl_event *event)

cl_int clEnqueueCopyImage (

cl_command_queue command_queue, cl_mem src_image, cl_mem dst_image,

const size_t *src_origin, const size_t *dst_origin, const size_t *region,

cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event)

Copy between image & buer objects [5.3.5]

cl_int clEnqueueCopyImageToBuer (

cl_command_queue command_queue, cl_mem src_image, cl_mem dst_buer,

const size_t *src_origin, const size_t *region, size_t dst_oset,

cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event)

cl_int clEnqueueCopyBuerToImage (

cl_command_queue command_queue, cl_mem src_buer, cl_mem dst_image,

size_t src_oset, const size_t *dst_origin, const size_t *region,

cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event)

Map and unmap image objects [5.3.6]

void * clEnqueueMapImage (

cl_command_queue command_queue, cl_mem image, cl_bool blocking_map,

cl_map_ags map_ags, const size_t *origin, const size_t *region,

size_t *image_row_pitch, size_t *image_slice_pitch, cl_uint num_events_in_wait_list,

const cl_event *event_wait_list, cl_event *event, cl_int *errcode_ret)

map_ags: CL_MAP_{READ, WRITE}, CL_MAP_WRITE_INVALIDATE_REGION

Query image objects [5.3.7]

cl_int clGetImageInfo (

cl_mem image,

cl_image_info param_name, size_t param_value_size,

void *param_value, size_t *param_value_size_ret)

param_name: [Table 5.10] CL_IMAGE_FORMAT, CL_IMAGE_{ARRAY, ELEMENT}_SIZE,

CL_IMAGE_{ROW, SLICE}_PITCH, CL_IMAGE_{HEIGHT, WIDTH, DEPTH},

CL_IMAGE_NUM_{SAMPLES, MIP_LEVELS},

CL_IMAGE_DX9_MEDIA_PLANE_KHR,

CL_IMAGE_{D3D10, D3D11}_SUBRESOURCE_KHR

Image Formats [5.3.1.1]

Supported combinaons of image_channel_order and image_channel_data_type.

Built-in support [Table 5.8]

CL_R (read or write): CL_[HALF_]FLOAT, CL_UNORM_INT{8,16}, CL_SNORM_INT{8,16},

CL_SIGNED_INT{8,16,32}, CL_UNSIGNED_INT{8,16,32}

CL_DEPTH (read or write): CL_FLOAT, CL_UNORM_INT16

CL_DEPTH_STENCIL (read only): CL_FLOAT, CL_UNORM_INT24

(Requires the extension cl_khr_gl_depth_images)

CL_RG (read or write): CL_[HALF_]FLOAT, CL_UNORM_INT{8,16}, CL_SNORM_INT{8,16},

CL_SIGNED_INT{8,16,32}, CL_UNSIGNED_INT{8,16,32}

CL_RGBA (read or write): CL_[HALF_]FLOAT, CL_UNORM_INT{8,16}, CL_UNORM_INT_101010_2,

CL_SNORM_INT{8,16}, CL_SIGNED_INT{8,16,32}, CL_UNSIGNED_INT{8,16,32}

CL_BGRA (read or write): CL_UNORM_INT8

CL_sRGBA (read only): CL_UNORM_INT8

(Requires the extension cl_khr_srgb_image_writes)

Supported image channel order values [Table 5.6]

CL_R, CL_A (read and write): CL_[HALF_]FLOAT, CL_UNORM_INT{8,16},

CL_SIGNED_INT{8,16,32}, CL_UNSIGNED_INT{8,16,32}, CL_SNORM_INT{8,16}

CL_INTENSITY: CL_[HALF_]FLOAT, CL_UNORM_INT{8,16}, CL_SNORM_INT{8|16}

CL_DEPTH_STENCIL: Only used if extension cl_khr_gl_depth_images is enabled and

channel data type = CL_UNORM_INT24 or CL_FLOAT

CL_LUMINANCE: CL_UNORM_INT{8,16}, CL_[HALF_]FLOAT, CL_SNORM_INT{8,16}

CL_RG, CL_RA: CL_[HALF_]FLOAT, CL_UNORM_INT{8,16}, CL_SIGNED_INT{8,16, 32} ,

CL_UNSIGNED_INT{8,16,32}, CL_SNORM_INT{8,16}

CL_RGB: CL_UNORM_SHORT_{555,565} , CL_UNORM_INT_101010

CL_ARGB: CL_UNORM_INT8, CL_SIGNED_INT8,

CL_UNSIGNED_INT8, CL_SNORM_INT8

CL_BGRA: CL_{SIGNED, UNSIGNED}_INT8, CL_SNORM_INT8

OpenCL 2.2 Reference Guide Page 3

OpenCL API

Shared Virtual Memory [5.6]

Shared Virtual Memory (SVM) allows the host and kernels

execung on devices to directly share complex, pointer-

containing data structures such as trees and linked lists.

SVM sharing granularity

void* clSVMAlloc (

cl_context context, cl_svm_mem_ags ags,

size_t size, cl_uint alignment)

ags: [Table 5.14]

CL_MEM_READ_WRITE,

CL_MEM_{WRITE, READ}_ONLY,

CL_MEM_SVM_FINE_GRAIN_BUFFER,

CL_MEM_SVM_ATOMICS

void clSVMFree (cl_context context, void *svm_pointer)

Enqueuing SVM operaons

cl_int clEnqueueSVMFree (

cl_command_queue command_queue,

cl_uint num_svm_pointers, void *sym_pointers[],

void (CL_CALLBACK*pfn_free_func)(

cl_command_queue command_queue,

cl_uint num_svm_pointers,

void *sym_pointers[], void *user_data),

void *user_data, cl_uint num_events_in_wait_list,

const cl_event *event_wait_list, cl_event *event)

cl_int clEnqueueSVMMemcpy (

cl_command_queue command_queue,

cl_bool blocking_copy, void *dst_ptr,

const void *src_ptr, size_t size,

cl_uint num_events_in_wait_list,

const cl_event *event_wait_list, cl_event *event)

cl_int clEnqueueSVMMemFill (

cl_command_queue command_queue,

void *svm_ptr, const void *paern,

size_t paern_size, size_t size,

cl_uint num_events_in_wait_list,

const cl_event *event_wait_list, cl_event *event)

cl_int clEnqueueSVMMap (

cl_command_queue command_queue,

cl_bool blocking_map, cl_map_ags map_ags,

void *svm_ptr, size_t size,

cl_uint num_events_in_wait_list,

const cl_event *event_wait_list, cl_event *event)

cl_int clEnqueueSVMUnmap (

cl_command_queue command_queue,

void *svm_ptr, cl_uint num_events_in_wait_list,

const cl_event *event_wait_list, cl_event *event)

cl_int clEnqueueSVMMigrateMem (

cl_command_queue command_queue,

cl_uint num_svm_pointers, const void **svm_pointers,

const size_t *sizes, cl_mem_migraon_ags ags,

cl_uint num_events_in_wait_list,

const cl_event *event_wait_list, cl_event *event)



Program Objects

An OpenCL program consists of a set of kernels that are

idened as funcons declared with the __kernel qualier in

the program source.

Create program objects [5.8.1]

cl_program clCreateProgramWithSource (

cl_context context, cl_uint count, const char **strings,

const size_t *lengths, cl_int *errcode_ret)

cl_program clCreateProgramWithIL (cl_context context,

const void *il, size_t length, cl_int *errcode_ret)

cl_program clCreateProgramWithBinary (

cl_context context, cl_uint num_devices,

const cl_device_id *device_list, const size_t *lengths,

const unsigned char **binaries,

cl_int *binary_status, cl_int *errcode_ret)

cl_program clCreateProgramWithBuiltInKernels (

cl_context context, cl_uint num_devices,

const cl_device_id *device_list,

const char *kernel_names, cl_int *errcode_ret)

Retain and release program objects [5.8.2]

cl_int clRetainProgram (cl_program program)

cl_int clReleaseProgram (cl_program program)

cl_int clSetProgramReleaseCallback

(cl_program program, void (CL_CALLBACK*pfn_nofy)

(cl_program prog, void *user_data),

void *user_data)

Set SPIR-V specializaon constants [5.8.3]

cl_int clSetProgramSpecializaonConstant (

cl_program program, cl_uint spec_id, size_t spec_size,

const void *spec_value)

Building program executables [5.8.4]

cl_int clBuildProgram (cl_program program,

cl_uint num_devices, const cl_device_id *device_list,

const char *opons, void (CL_CALLBACK*pfn_nofy)

(cl_program program, void *user_data),

void *user_data)

Separate compilaon and linking [5.8.5]

cl_int clCompileProgram (cl_program program,

cl_uint num_devices, const cl_device_id *device_list,

const char *opons, cl_uint num_input_headers,

const cl_program *input_headers,

const char **header_include_names,

void (CL_CALLBACK*pfn_nofy)

(cl_program program, void *user_data),

void *user_data)

cl_program clLinkProgram (cl_context context,

cl_uint num_devices, const cl_device_id *device_list,

const char *opons, cl_uint num_input_programs,

const cl_program *input_programs,

void (CL_CALLBACK*pfn_nofy)

(cl_program program, void *user_data),

void *user_data, cl_int *errcode_ret)

Unload the OpenCL compiler [5.8.8]

cl_int clUnloadPlaormCompiler (

cl_plaorm_id plaorm)

Query program objects [5.8.9]

cl_int clGetProgramInfo (cl_program program,

cl_program_info param_name,

size_t param_value_size, void *param_value,

size_t *param_value_size_ret)

param_name: [Table 5.17]

CL_PROGRAM_{IL, REFERENCE_COUNT},

CL_PROGRAM_{CONTEXT, NUM_DEVICES, DEVICES},

CL_PROGRAM_{SOURCE, BINARY_SIZES, BINARIES},

CL_PROGRAM_{NUM_KERNELS, KERNEL_NAMES},

CL_PROGRAM_SCOPE_GLOBAL_{C,D}TORS_PRESENT

cl_int clGetProgramBuildInfo (

cl_program program, cl_device_id device,

cl_program_build_info param_name,

size_t param_value_size, void *param_value,

size_t *param_value_size_ret)

param_name: [Table 5.18]

CL_PROGRAM_BINARY_TYPE,

CL_PROGRAM_BUILD_{STATUS, OPTIONS, LOG},

CL_PROGRAM_BUILD_GLOBAL_VARIABLE_TOTAL_SIZE

(Connued on next page >)

Memory Objects

A memory object is a handle to a reference counted region

of global memory. Includes buer objects, image objects,

and pipe objects. Items in blue apply when the appropriate

extension is enabled.

Memory objects [5.5.1, 5.5.2]

cl_int clRetainMemObject (cl_mem memobj)

cl_int clReleaseMemObject (cl_mem memobj)

cl_int clSetMemObjectDestructorCallback (

cl_mem memobj, void (CL_CALLBACK *pfn_nofy)

(cl_mem memobj, void *user_data),

void *user_data)

cl_int clEnqueueUnmapMemObject (

cl_command_queue command_queue,

cl_mem memobj, void *mapped_ptr,

cl_uint num_events_in_wait_list,

const cl_event *event_wait_list, cl_event *event)

Migrate memory objects [5.5.4]

cl_int clEnqueueMigrateMemObjects (

cl_command_queue command_queue,

cl_uint num_mem_objects,

const cl_mem *mem_objects,

cl_mem_migraon_ags ags,

cl_uint num_events_in_wait_list,

const cl_event *event_wait_list, cl_event *event)

ags: CL_MIGRATE_MEM_OBJECT_HOST,

CL_MIGRATE_MEM_OBJECT_CONTENT_UNDEFINED

Query memory object [5.5.5]

cl_int clGetMemObjectInfo (cl_mem memobj,

cl_mem_info param_name, size_t param_value_size,

void *param_value, size_t *param_value_size_ret)

param_name: [Table 5.13]

CL_MEM_{TYPE, FLAGS, SIZE, HOST_PTR},

CL_MEM_CONTEXT, CL_MEM_OFFSET,

CL_MEM_{MAP, REFERENCE}_COUNT,

CL_MEM_ASSOCIATED_MEMOBJECT,

CL_MEM_USES_SVM_ POINTER,

CL_MEM_{D3D10, D3D11}_RESOURCE_KHR,

CL_MEM_DX9_MEDIA_ADAPTER_TYPE_KHR,

CL_MEM_DX9_MEDIA_SURFACE_INFO_KHR

Pipes

A pipe is a memory object that stores data organized as a FIFO.

Pipe objects can only be accessed using built-in funcons that

read from and write to a pipe. Pipe objects are not accessible

from the host.

Create pipe objects [5.4.1]

cl_mem clCreatePipe (cl_context context,

cl_mem_ags ags, cl_uint pipe_packet_size,

cl_uint pipe_max_packets,

const cl_pipe_properes *properes,

cl_int *errcode_ret)

ags: 0 or CL_MEM_{READ, WRITE}_ONLY,

CL_MEM_{READ_WRITE, HOST_NO_ACCESS}

Pipe object queries [5.4.2]

cl_int clGetPipeInfo (cl_mem pipe,

cl_pipe_info param_name, size_t param_value_size,

void *param_value, size_t *param_value_size_ret)

param_name:

CL_PIPE_PACKET_SIZE, CL_PIPE_MAX_PACKETS

Sampler Objects [5.7]

Items in blue require the cl_khr_mipmap_image extension.

cl_sampler

clCreateSamplerWithProperes (cl_context context,

const cl_sampler_properes *sampler_properes,

cl_int *errcode_ret)

sampler_properes: [Table 5.15]

CL_SAMPLER_NORMALIZED_COORDS,

CL_SAMPLER_{ADDRESSING, FILTER}_MODE,

CL_SAMPLER_MIP_FILTER_MODE,

CL_SAMPLER_LOD_{MIN, MAX}

cl_int clRetainSampler (cl_sampler sampler)

cl_int clReleaseSampler (cl_sampler sampler)

cl_int clGetSamplerInfo (cl_sampler sampler,

cl_sampler_info param_name,

size_t param_value_size, void *param_value,

size_t *param_value_size_ret)

param_name: CL_SAMPLER_REFERENCE_COUNT,

CL_SAMPLER_{CONTEXT, FILTER_MODE},

CL_SAMPLER_ADDRESSING_MODE,

CL_SAMPLER_NORMALIZED_COORDS [Table 5.16]

Sampler declaraon elds [6.13.14.1]

The sampler can be passed as an argument to the kernel using

clSetKernelArg, or declared in the outermost scope of kernel

funcons, or it can be a constant variable of type sampler_t

declared in the program source.

const sampler_t <sampler-name> =

<normalized-mode> | <address-mode> | <lter-mode>

normalized-mode:

CLK_NORMALIZED_COORDS_{TRUE, FALSE}

address-mode:

CLK_ADDRESS_X, where X may be NONE, REPEAT,

CLAMP, CLAMP_TO_EDGE, MIRRORED_REPEAT

lter-mode: CLK_FILTER_NEAREST, CLK_FILTER_LINEAR

Flush and Finish [5.15]

cl_int clFlush (cl_command_queue command_queue)

cl_int clFinish (cl_command_queue command_queue)

OpenCL 2.2 Reference GuidePage 4

OpenCL API

OpenCL 2.2 Reference GuidePage 4

OpenCL API



Event Objects

Event objects can be used to refer to a kernel execuon

command, and read, write, map, and copy commands on

memory objects or user events.

Event objects [5.11]

cl_event clCreateUserEvent (

cl_context context, cl_int *errcode_ret)

cl_int clSetUserEventStatus (

cl_event event, cl_int execuon_status)

cl_int clWaitForEvents (cl_uint num_events,

const cl_event *event_list)

cl_int clGetEventInfo (cl_event event,

cl_event_info param_name, size_t param_value_size,

void *param_value, size_t *param_value_size_ret)

param_name: [Table 5.24]

CL_EVENT_COMMAND_{QUEUE, TYPE},

CL_EVENT_{CONTEXT, REFERENCE_COUNT},

CL_EVENT_COMMAND_EXECUTION_STATUS

cl_int clRetainEvent (cl_event event)

cl_int clReleaseEvent (cl_event event)

cl_int clSetEventCallback (cl_event event,

cl_int command_exec_callback_type,

void (CL_CALLBACK *pfn_event_nofy)

(cl_event event, cl_int event_command_exec_status,

void *user_data), void *user_data)

Markers, barriers, & waing for events [5.12]

cl_int clEnqueueMarkerWithWaitList (

cl_command_queue command_queue,

cl_uint num_events_in_wait_list,

const cl_event *event_wait_list, cl_event *event)

cl_int clEnqueueBarrierWithWaitList (

cl_command_queue command_queue,

cl_uint num_events_in_wait_list,

const cl_event *event_wait_list, cl_event *event)

Proling operaons [5.14]

cl_int clGetEventProlingInfo (cl_event event,

cl_proling_info param_name,

size_t param_value_size, void *param_value,

size_t *param_value_size_ret)

param_name: [Table 5.25]

CL_PROFILING_COMMAND_{COMPLETE, QUEUED},

CL_PROFILING_COMMAND_{SUBMIT, START, END}

cl_int clGetKernelSubGroupInfo (

cl_kernel kernel, cl_device_id device,

cl_kernel_sub_group_info param_name,

size_t input_value_size, const void *input_value,

size_t param_value_size, void *param_value,

size_t *param_value_size_ret)

param_name: [Table 5.22]

CL_KERNEL_LOCAL_SIZE_FOR_SUB_GROUP_COUNT,

CL_KERNEL_MAX_SUB_GROUP_SIZE_FOR_NDRANGE,

CL_KERNEL_SUB_GROUP_COUNT_FOR_NDRANGE

Execute kernels [5.10]

cl_int clEnqueueNDRangeKernel (

cl_command_queue command_queue,

cl_kernel kernel, cl_uint work_dim,

const size_t *global_work_oset,

const size_t *global_work_size,

const size_t *local_work_size,

cl_uint num_events_in_wait_list,

const cl_event *event_wait_list, cl_event *event)

cl_int clEnqueueNaveKernel (

cl_command_queue command_queue,

void (CL_CALLBACK *user_func)(void *), void *args,

size_t cb_args, cl_uint num_mem_objects,

const cl_mem *mem_list, const void **args_mem_loc,

cl_uint num_events_in_wait_list,

const cl_event *event_wait_list, cl_event *event)

Kernel Objects

A kernel object encapsulates the specic __kernel funcon and

the argument values to be used when execung it.

Create kernel objects [5.9.1]

cl_kernel clCreateKernel (cl_program program,

const char *kernel_name, cl_int *errcode_ret)

cl_int clCreateKernelsInProgram (cl_program program,

cl_uint num_kernels, cl_kernel *kernels,

cl_uint *num_kernels_ret)

cl_int clRetainKernel (cl_kernel kernel)

cl_int clReleaseKernel (cl_kernel kernel)

Kernel arguments and queries [5.9.2-4]

cl_int clSetKernelArg (cl_kernel kernel,

cl_uint arg_index, size_t arg_size,

const void *arg_value)

cl_int clSetKernelArgSVMPointer (cl_kernel kernel,

cl_uint arg_index, const void *arg_value)

cl_int clSetKernelExecInfo (cl_kernel kernel,

cl_kernel_exec_info param_name,

size_t param_value_size, const void *param_value)

param_name: CL_KERNEL_EXEC_INFO_SVM_PTRS,

CL_KERNEL_EXEC_INFO_SVM_FINE_GRAIN_SYSTEM

cl_kernel clCloneKernel (cl_kernel source_kernel,

cl_int *errcode_ret)

cl_int clGetKernelInfo (cl_kernel kernel,

cl_kernel_info param_name,

size_t param_value_size, void *param_value,

size_t *param_value_size_ret)

param_name: [Table 5.20]

CL_KERNEL_{FUNCTION_NAME, NUM_ARGS},

CL_KERNEL_REFERENCE_COUNT,

CL_KERNEL_{ATTRIBUTES, CONTEXT, PROGRAM}

cl_int clGetKernelWorkGroupInfo (cl_kernel kernel,

cl_device_id device,

cl_kernel_work_group_info param_name,

size_t param_value_size, void *param_value,

size_t *param_value_size_ret)

param_name: CL_KERNEL_GLOBAL_WORK_SIZE,

CL_KERNEL_[COMPILE_]WORK_GROUP_SIZE,

CL_KERNEL_{COMPILE, MAX}_NUM_SUB_GROUPS,

CL_KERNEL_{LOCAL, PRIVATE}_MEM_SIZE,

CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE

cl_int clGetKernelArgInfo (cl_kernel kernel,

cl_uint arg_indx, cl_kernel_arg_info param_name,

size_t param_value_size, void *param_value,

size_t *param_value_size_ret)

param_name: [Table 5.23] CL_KERNEL_ARG_NAME,

CL_KERNEL_ARG_{ACCESS, ADDRESS}_QUALIFIER,

CL_KERNEL_ARG_TYPE_{NAME, QUALIFIER}

Debugging opons:

-g Generate addional errors for built-in funcons

that allow you to enqueue commands on a device

SPIR binary opons:

Requires the cl_khr_spir extension.

-x spir Indicate that binary is in SPIR format

-spir-std=x x is SPIR spec version, e.g.: 1.2

Double and half-precision oang-point in C++:

-cl-fp16-enable Enable full half data type support

cl_khr_fp16 macro.

-cl-fp64-enable Enable full half data type support

cl_khr_fp64 macro.

Linker opons [5.8.7]

Library linking opons:

-create-library

-enable-link-opons

Program linking opons:

-cl-denorms-are-zero -cl-no-signed-zeroes

-cl-nite-math-only -cl-fast-relaxed-math

-cl-unsafe-math-opmizaons

Program Objects (connued)

Compiler opons [5.8.6]

Preprocessor:

(-D processed in order for clBuildProgram or

clCompileProgram)

-D name -D name=denion -I dir

Math intrinsics:

-cl-single-precision-constant

-cl-denorms-are-zero

-cl-fp32-correctly-rounded-divide-sqrt

Opmizaon opons:

-cl-opt-disable -cl-mad-enable

-cl-no-signed-zeros -cl-nite-math-only

-cl-unsafe-math-opmizaons -cl-fast-relaxed-math

-cl-uniform-work-group-size

Warning request/suppress:

-w -Werror

Control OpenCL C and C++ language version:

-cl-std=CL1.1 OpenCL 1.1 specicaon

-cl-std=CL1.2 OpenCL 1.2 specicaon

-cl-std=CL2.0 OpenCL 2.0 specicaon

-cl-std=C++ OpenCL C++ specicaon

Query kernel argument informaon:

-cl-kernel-arg-info

Summary of SVM opons in OpenCL [3.3.3, Table 3-2]

SVM Granularity of sharing Memory allocaon Mechanisms to enforce consistency Explicit updates between host and device?

Non-SVM buers OpenCL Memory objects (buer) clCreateBuer Host synchronizaon points on the same

or between devices. Yes, through Map and Unmap commands.

Coarse-Grained buer SVM OpenCL Memory objects (buer) clSVMAlloc Host synchronizaon points between

devices Yes, through Map and Unmap commands.

Fine Grained buer SVM Bytes within OpenCL Memory objects (buer) clSVMAlloc Synchronizaon points plus atomics (if

supported) No

Fine-Grained system SVM Bytes within Host memory (system) Host memory allocaon

mechanisms (e.g. malloc)

Synchronizaon points plus atomics (if

supported) No

Memory Model: SVM

[3.3.3]

OpenCL extends the global memory region into host memory through a

shared virtual memory (SVM) mechanism. Three types of SVM in OpenCL:

• Coarse-Grained buer SVM: (Required) Sharing at the granularity of regions of OpenCL buer memory objects.

• Fine-Grained buer SVM: (Oponal) Sharing occurs at the granularity of individual loads/stores into bytes within

OpenCL buer memory objects.

• Fine-Grained system SVM: Sharing occurs at the granularity of individual loads/stores into bytes occurring

anywhere within the host memory.

OpenCL 2.2 Reference Guide Page 5

OpenCL C++ Language

OpenCL C++ and C++ 14

The OpenCL C++ programming language is based on the

ISO/IEC JTC1 SC22 WG21 N3690 language (a.k.a. C++14)

specicaon with specic restricons and excepons.

Secon numbers denoted here with § refer to the C++ 14

specicaon.

• Implicit conversions for pointer types follow the rules

described in the C++ 14 specicaon.

• Conversions between integer types follow the conversion

rules specied in the C++14 specicaon except for

specic out-of-range behavior and saturated conversions.

• The preprocessing direcves dened by the C++14

specicaon are supported.

• Macro names dened by the C++14 specicaon but not

currently supported by OpenCL are reserved for future

use.

• OpenCL C++ standard library implements modied version

of the C++ 14 numeric limits library.

• OpenCL C++ implements the following parts of the C++ 14

iterator library: Primives, iterator operaons, predened

iterators, and range access.

• The OpenCL C++ kernel language doesn’t support variadic

funcons and variable length arrays.

• OpenCL C++ library implements most of the C++14 tuples

except for allocator related traits (§ 20.4.2.8).

• OpenCL C++ supports type traits dened in the C++ 14

specicaon with addions and changes to the following:

 UnaryTypeTraits (§ 3.15.1)

 BinaryTypeTraits (§ 3.15.2)

 TransformaonTraits (§ 3.15.3)

• OpenCL C++ standard library implements most C++ 14

tuples excluding allocator related traits.

• C++14 features not supported by OpenCL C++:

 the dynamic_cast operator (§ 5.2.7)

 type idencaon (§ 5.2.8)

 recursive funcon calls (§ 5.2.2, item 9) unless they are a

compile-me constant expression

 non-placement new and delete operators (§ 5.3.4, 5.3.5)

 goto statement (§ 6.6)

 register and thread_local storage qualiers (§ 7.1.1)

 virtual funcon qualier (§ 7.1.2)

 funcon pointers (§ 8.3.5, 8.5.3) unless they are a

compile-me constant expression

 virtual funcons and abstract classes (§ 10.3, 10.4)

 excepon handling (§ 15)

 the C++ standard library (§ 17 … 30)

 asm declaraon (§ 7.4)

 no implicit lambda to funcon pointer conversion (§ 5.1.2)

OpenCL C++ Language Reference

Secon and table references are to the OpenCL 2.2 C++ Language specicaon.

Qualiers and Oponal Aributes

Funcon Qualier [2.6.1]

__kernel, kernel

Type and Variable Aributes [2.8]

[[cl::aligned(X)]] [[cl::aligned]]

Species a minimum alignment (in bytes) for variables of the

specied type.

[[cl::packed]]

Species that each member of the structure or union is placed to

minimize the memory required.

Kernel Funcon Aributes [2.8.3]

[[cl::work_group_size_hint(X, Y, Z)]]

A hint to the compiler to specify the value most likely

to be specied by the local_work_size argument to

clEnqueueNDRangeKernel.

[[cl::required_work_group_size(X, Y, Z)]]

The work-group size that must be used as the local_work_size

argument to clEnqueueNDRangeKernel.

[[cl::required_num_sub_groups(X)]]

The number of sub-groups that must be generated by a kernel

launch.

[[cl::vec_type_hint(<type>)]]

A hint to the compiler as a representaon of the computaonal

width of the kernel.

Kernel Parameter Aribute [2.8.4]

[[cl::max_size(n)]]

The value of the aribute species the maximum size in bytes of

the corresponding memory object.

Loop Aributes [2.8.5]

[[cl::unroll_hint(n)]] [[cl::unroll_hint]]

Used to specify that a loop (for, while, and do loops) can be

unrolled.

[[cl::ivdep(len)]] [[cl::ivdep]]

A hint to indicate that the compiler may assume there are

no memory dependencies across loop iteraons in order to

autovectorize consecuve iteraons of loop.

Conversions and Reinterpretaon

Header <opencl_convert>

Conversion types [3.2]

Conversions are available for the scalar types bool, char,

uchar, short, ushort, int, uint, long, ulong, half (if cl_khr_fp16

extension is enabled), oat, double (if cl_khr_fp64 is enabled),

and derived vector types.

template <class T, rounding_mode rmode, class U>

T convert_cast(U const& arg);

template <class T, rounding_mode rmode>

T convert_cast(T const& arg);

// and more...

Rounding modes [3.2.3]

::rte to nearest even ::rtz toward zero

::rtp toward + innity ::rtn toward - innity

Reinterpreng types [3.3]

Header <opencl_reinterpret>

Supported data types except bool and void may be

reinterpreted as another data type of the same size using the

as_type funcon for scalar and vector data types.

template <class T, class U>

T as_type(U const& arg);

Preprocessor Direcves & Macros [2.7]

#pragma OPENCL FP_CONTRACT on-o-switch

on-o-switch: ON, OFF, or DEFAULT

#pragma OPENCL EXTENSION extensionname : behavior

#pragma OPENCL EXTENSION all : behavior

__FILE__ Current source le

__LINE__ Integer line number

__OPENCL_CPP_VERSION__ Integer version number, e.g: 100

__func__ Current funcon name

Supported Data Types [3.1]

Header <opencl_def>

cl_* types have exactly the same size as their host counterparts

dened in <cl_plaorm.h> le. Half types require cl_khr_fp16.

Double types require that cl_khr_fp64 be enabled and that

CL_DEVICE_DOUBLE_FP_CONFIG is not zero.

Built-in scalar data types

OpenCL Type API Type Descripon

bool -- true (1) or false (0)

char cl_char 8-bit signed

unsigned char, uchar cl_uchar 8-bit unsigned

short cl_short 16-bit signed

unsigned short, ushort cl_ushort 16-bit unsigned

int cl_int 32-bit signed

unsigned int, uint cl_uint 32-bit unsigned

long cl_long 64-bit signed

unsigned long, ulong cl_ulong 64-bit unsigned

oat cl_oat 32-bit oat

double cl_double 64-bit IEEE 754

half cl_half 16-bit oat (storage only)

void void empty set of values

Built-in vector data types

n is 2, 3, 4, 8, or 16. The halfn vector data type is required to be

supported as a data storage format.

OpenCL Type API Type Descripon

bool

[u]char

cl_[u]charn 8-bit [un]signed

[u]short

cl_ [u]shortn 16-bit [un]signed

[u]int

cl_ [u]intn 32-bit [un]signed

[u]long

cl_ [u]longn 64-bit [un]signed

oat

cl_oatn32-bit oat

double

cl_doublen64-bit oat

half

cl_ halfn16-bit oat

Other types

[3.7.1, 3.8.1]

Header <opencl_image>

Image and sampler types require CL_DEVICE_IMAGE_SUPPORT

is CL_TRUE. See header <opencl_pipe> for pipe type. See

header <opencl_device_queue> for device_queue type.

Type in OpenCL C++ API type for applicaon

cl::sampler cl_sampler

cl::image[1d, 2d, 3d]

cl::image1d_[buer, array]

cl::image2d_ms

cl::image2d_array[_ms]

cl::image2d_depth[_ms]

cl::image2d_array_depth[_ms]

cl_image

cl::pipe cl_pipe

cl::device_queue cl_queue

half wrapper [3.6.1]

Header <opencl_half> OpenCL C++ implements a wrapper

class for the built-in half data type. The class methods perform

implicit vload_half and vstore_half operaons from Vector Data

Load and Store Funcons secon.

fp16(const half &r) noexcept; Constructs an object with a

half built-in type.

fp16(const oat &r) noexcept; Constructs an object with a

oat built-in type.

fp16(const double &r) noexcept; Constructs an object with a

double built-in type.

ndrange [3.13.6]

Header <opencl_device_queue> The ndrange type is used to

represent the size of the enqueued workload with a dimension

from 1 to 3.

struct ndrange {

explicit ndrange(size_t global_work_size) noexcept;

ndrange(size_t global_work_size, size_t local_work_size

noexcept;

ndrange(size_t global_work_oset, size_t global_work_size,

size_t local_work_size) noexcept;

template <size_t N>

ndrange(const size_t (&global_work_size)[N]) noexcept;

template <size_t N>

ndrange(const size_t (&global_work_size)[N],

const size_t (&global_work_size)[N]) noexcept;

template <size_t N>

ndrange(const size_t (&global_work_oset)[N],

const size_t (&global_work_size)[N],

const size_t (&global_work_size)[N]) noexcept;

};

Example

#include <opencl_device_queue>

#include <opencl_work_item>

using namespace cl;

kernel void foo(device_queue q) {

q.enqueue_kernel(cl::enqueue_policy::no_wait, cl::ndrange( 1 ),

[](){ uint d = get_global_id(0); } );

}

OpenCL 2.2 Reference GuideOpenCL C++ LanguagePage 6

OpenCL C++ Language



Address Spaces Library

Header <opencl_memory>

Explicit address space storage classes [3.4.2]

global<T> class

Can only be used to declare variables at program scope, with

stac specier, extern specier, or passed as a kernel argument.

local<T> class

Can only be used to declare variables at kernel funcon scope,

program scope, with stac keyword, extern specier, or passed

as a kernel argument.

priv<T> class

Cannot be used to declare variables in the program scope, with

stac specier, or extern specier.

constant<T> class

Can only be used to declare variables at program scope, with

stac specier, extern specier, or passed as a kernel argument.

Explicit address space pointer classes [3.4.3]

The explicit address space pointer classes can be converted to

and from pointers with compable address spaces, qualiers,

and types. Local, global, and private pointers can be converted to

standard C++ pointers.

typedef T element_type;

typedef ptrdi_t dierence_type;

typedef add_global_t<T>& reference;

typedef const add_global_t<T>& const_reference;

typedef add_global_t<T>* pointer;

typedef const add_global_t<T>* const_pointer;

The following pointer classes are dened in the header le

<opencl_memory>:

template <class T> class global_ptr

template <class T> class local_ptr

template <class T> class private_ptr

template <class T> class constant_ptr

Non-member funcons [3.4.3.9]

In each of the paral declaraons below, the placeholder Q may

be replaced with global, local, private, or constant. The omied

inial part of each declaraon is:

template<class T, class U>

bool operator==(const

_ptr<T> &a, const

_ptr &b)

noexcept;

bool operator!=(const OP_ptr<T> &a, const

_ptr &b)

noexcept;

bool operator<(const

_ptr<T> &a, const

_ptr &b)

noexcept;

bool operator>(const

_ptr<T> &a, const

_ptr &b)

noexcept;

bool operator<=(const

_ptr<T> &a, const

_ptr &b)

noexcept;

bool operator>=(const

_ptr<T> &a, const

_ptr &b)

noexcept;

In each of the paral declaraons below, the omied inial part

of the declaraon is:

template<class T>

bool operator==(const Q_ptr<T> &x, nullptr_t) noexcept;

bool operator==(nullptr_t, const Q_ptr<T> &x) noexcept;

bool operator!=(const Q_ptr<T> &x, nullptr_t) noexcept;

bool operator!=(nullptr_t, Q_ptr global_ptr<T> & x) noexcept;

bool operator<(const Q_ptr<T> &x, nullptr_t) noexcept;

bool operator<(nullptr_t, const Q_ptr<T> & x) noexcept;

bool operator>(const Q_ptr<T> &x, nullptr_t) noexcept;

bool operator>(nullptr_t, const Q_ptr<T> & x) noexcept;

bool operator<=(const Q_ptr<T> &x, nullptr_t) noexcept;

bool operator<=(nullptr_t, const Q_ptr<T> & x) noexcept;

bool operator>=(const Q_ptr<T> &x, nullptr_t) noexcept;

bool operator>=(nullptr_t, const Q_ptr<T> & x) noexcept;

void swap(Q_ptr<T>& a, Q_ptr<T>& b) noexcept;

Pointer class constructors [3.4.3.5]

Q may be global, local, private, or constant.

constexpr Q_ptr() noexcept; Construct an object which

points to nothing

explicit Q_ptr(pointer p) noexcept; Construct an object which

points to p

Q_ptr(const Q_ptr &) noexcept; Copy constructor

Q_ptr(Q_ptr &&r) noexcept; Move constructor

constexpr Q_ptr(nullptr_t) noexcept; Construct an object

inialized with nullptr

Pointer class assignment operators [3.4.3.6]

Q may be global, local, private, or constant.

Q_ptr &operator=(const Q_ptr &r)

noexcept; Copy assignment operator

Q_ptr &operator=(Q_ptr &&r)

noexcept; Move assignment operator

Q_ptr &operator=(pointer r)

noexcept;

Assign r pointer to the

stored pointer

Q_ptr &operator=(nullptr_t)

noexcept;

Assign nullptr to the stored

pointer

Pointer class observers [3.4.3.7]

Q may be global, local, private, or constant.

add_lvalue_reference_t<add_Q <T>>

operator*() const noexcept; Return *get()

pointer operator->() const noexcept; Return get()

reference operator[](size_t pos)

const noexcept; Return get()[pos]

pointer get() const noexcept; Return the stored pointer

explicit operator bool()

const noexcept; Return get()!=nullptr

Pointer class modiers [3.4.3.8]

Q may be global, local, private, or constant.

pointer release() noexcept;

Assign nullptr to the stored

pointer, returns the value

that get() had at start

void reset(pointer p = pointer())

noexcept;

void reset(pointer p) noexcept;

Assign p to the stored pointer

void reset(nullptr_t p = nullptr)

noexcept; Equivalent to reset(pointer())

void swap(Q_ptr& r) noexcept; Invokes swap on the stored

pointers.

Atomic Operaons Library [3.24]

Header <opencl_atomic>

template<class T> struct atomic;

template<> struct atomic<integral>;

template<class T> struct atomic<T*>;

enum memory_order

memory_order_x where x may be relaxed, acquire,

acq_rel, seq_cst, release

enum memory_scope

memory_scope_x where x may be work_item,

work_group, sub_group, all_svm_devices, device

Atomic types [3.24.4]

Combined members from struct atomic, including

specializaons for integers (atomic<integral>) and pointers

(atomic<T*>). For struct atomic<integral>, replace T with

integral. For struct atomic<T*>, replace T with T*.

The pointer specializaon is available if

__INTPTR_WIDTH__== 32, or both extensions

cl_khr_int64_[base, extended]_atomics are enabled and

__INTPTR_WIDTH__== 64.

bool is_lock_free() const [volale] noexcept;

void store(T, memory_order = memory_order_seq_cst,

memory_scope = memory_scope_device) [volale] noexcept;

T load(memory_order = memory_order_seq_cst,

memory_scope = memory_scope_device)

const [volale] noexcept;

operator T() const [volale] noexcept;

T exchange(T, memory_order = memory_order_seq_cst,

memory_scope = memory_scope_device) [volale] noexcept;

bool compare_exchange_[weak, strong](T&, T, memory_order,

memory_order, memory_scope) [volale] noexcept;

bool compare_exchange_[weak, strong](T&, T, memory_order

= memory_order_seq_cst,

memory_scope = memory_scope_device) [volale] noexcept;

atomic() noexcept = default;

constexpr atomic(T) noexcept;

T operator=(T) [volale] noexcept;

(Connued on next page >)

Vector Component Addressing

[2.1.2.3]

Vector Components

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

oat2 v; v.x, v.r,

v.s0

v.y, v.g,

v.s1

oat3 v; v.x, v.r,

v.s0

v.y, v.g,

v.s1

v.z, v.b,

v.s2

oat4 v; v.x, v.r,

v.s0

v.y, v.g,

v.s1

v.z, v.b,

v.s2

v.w, v.a,

v.s3

oat8 v; v.s0 v.s1 v.s2 v.s3 v.s4 v.s5 v.s6 v.s7

oat16 v; v.s0 v.s1 v.s2 v.s3 v.s4 v.s5 v.s6 v.s7 v.s8 v.s9 v.sa,

v.sA

v.sb,

v.sB

v.sc,

v.sC

v.sd,

v.sD

v.se,

v.sE

v.sf,

v.sF

Vector Addressing Equivalences

Numeric indices are preceded by the leer s. Swizzling,

duplicaon, and nesng are allowed, e.g.: v.yx, v.xx, v.lo.x

v.lo v.hi v.odd v.even

oat2 v.x, v.s0 v.y, v.s1 v.y, v.s1 v.x, v.s0

oat3 *v.s01, v.xy v.s23, v.zw v.s13, v.yw v.s02, v.xz

oat4 v.s01, v.xy v.s23, v.zw v.s13, v.yw v.s02, v.xz

oat8 v.s0123 v.s4567 v.s1357 v.s0246

oat16 v.s01234567 v.s89abcdef v.s13579bdf v.s02468ace

*When using .lo or .hi with a 3-component vector,

the .w component is undened.

Q_ptr &operator++() noexcept;

Q_ptr &operator--() noexcept;

Prex [in/de]crement stored

pointer by one.

Q_ptr operator++(int) noexcept;

Q_ptr operator--(int) noexcept;

Posix [in/de]crement stored

pointer by one.

Q_ptr &operator+=(

dierence_type r) noexcept;

Adds r to the stored pointer

and returns *this.

Q_ptr &operator-=(

dierence_type r) noexcept;

Subtracts r to the stored

pointer and returns *this.

Q_ptr operator+(

dierence_type r) noexcept; [Adds/subtracts] r to the

stored pointer and returns

the value *this has at the

start of the operaon.

Q_ptr operator-(

dierence_type r) noexcept;

Other address space funcons [3.4.4]

template <class T>

mem_fence get_mem_fence (

T *ptr);

Return the mem_fence value

for ptr.

template<class T, class U>

T dynamic_as_cast(U *ptr);

Returns a pointer to a region

in the address space pointer

class specied in T

OpenCL 2.2 Reference Guide Page 7

OpenCL C++ Language



Images and Samplers Library [3.11]

Header <opencl_image>

struct sampler;

template <addressing_mode A, normalized_coordinates C,

ltering_mode F> constexpr sampler make_sampler();

template <class T, image_access A, image_dim Dim,

bool Depth, bool Array, bool MS> struct image;

Image types [3.11.2]

T is the type of value returned when reading or sampling from

given image, or the type of color used to write to image.

using image1d = image<T, A, image_dim::image_1d, false, false,

false>;

using image1d_buer = image<T, A, image_dim::image_buer,

false, false, false>;

using image1d_array = image<T, A, image_dim::image_1d, false,

true, false>;

using image2d = image<T, A, image_dim::image_2d, false, false,

false>;

using image2d_depth = image<T, A, image_dim::image_2d, true,

false, false>;

using image2d_array = image<T, A, image_dim::image_2d, false,

true, false>;

using image2d_array_depth = image<T, A,

image_dim:: image_2d, true, true, false>;

using image3d = image<T, A, image_dim::image_3d, false, false,

false>;

The extensions cl_khr_gl_msaa_sharing and

cl_khr_gl_depth_images add the following funcons.

using image2d_ms = image<T, A, image_dim::image_2d, false,

false, true>;

using image2d_array_ms = image<T, A, image_dim::image_2d,

false, true, true>;

using image2d_depth_ms = image<T, A, image_dim::image_2d,

true, false, true>;

using image2d_array_depth_ms = image<T, A,

image_dim::image_2d, true, true, true>;

Image element types [3.11.4]

In OpenCL terminology, images are classied as depth images,

which have the Depth template parameter set to true, or

normal images, which have the Depth template parameter set

to false. Half types are only available if cl_khr_fp16 extension

is enabled.

depth images

Non-mulsample depth image types: oat, half

For mul-sample 2D and mul-sample 2D

array images, only valid type: oat

normal images

Valid types: oat4, int4, uint4, and half4

For mul-sample 2D and mul-sample 2D

array images, only valid types: oat4, int4

and uint4

Image dimension [3.11.5]

template <image_dim Dim>

struct image_dim_num;

enum image_dim

image_1d, image_2d, image_3d, image_buer

Members of class image

Members indicated with a dot are available when these

extensions are enabled:

•

cl_khr_mipmap_image[_writes]

•

cl_khr_gl_msaa_sharing and cl_khr_gl_depth_images

For images specied with image_dim::image1d and

image_dim::buer

int width() const noexcept;

•

int width(oat lod) const noexcept;

For images specied with image_dim::image2d

int [width, height]() const noexcept;

•

int [width, height](oat lod) const noexcept;

•

int num_samples() const noexcept;

For images specied with image_dim::image3d

int [width, height, depth]() const noexcept;

•

int [width, height, depth](oat lod) const noexcept;

For arrayed images

int array_size() const noexcept;

•

int array_size(int lod) const noexcept;

Image access [3.11.6]

enum image_access

sample, read, write, read_write

Members of class image

The non-mulsample image template class specializaons

present dierent sets of methods based on their access

parameter. Members indicated with a dot are available when

these extensions are enabled:

•

cl_khr_mipmap_image

•

cl_khr_mipmap_image_writes

•

cl_ khr_gl_msaa_sharing and cl_khr_gl_depth_images

For images specied with image_access::read

element_type image::read(integer_coord coord) const noexcept;

pixel image::operator[](integer_coord coord) const noexcept;

element_type image::pixel::operator element_type()

const noexcept;

•

element_type image::read(integer_coord coord, int sample)

noexcept;

For images specied with image_access::write

void image::write(integer_coord coord, element_type color)

noexcept;

image::pixel image::operator[](integer_coord coord) noexcept;

image::pixel & image::pixel::operator=(element_type color)

noexcept;

•

void image::write(integer_coord coord, element_type color,

int lod) noexcept;

For images specied with image_access::read_write

element_type image::read(integer_coord coord) const noexcept;

void image::write(integer_coord coord, element_type color)

noexcept;

image::pixel image::operator[](integer_coord coord) noexcept;

element_type image::pixel::operator element_type(

) const noexcept;

image::pixel & image::pixel::operator=(element_type color)

noexcept;

For images specied with image_access::sample

element_type image::read(integer_coord coord) const noexcept;

element_type image::sample(const sampler &s,

integer_coord coord) const noexcept;

element_type image::sample(const sampler &s,

oat_coord coord) const noexcept;

image::pixel image::operator[](integer_coord coord)

const noexcept;

element_type image::pixel::operator element_type(

) const noexcept;

•

element_type image::sample(const sampler &s,

oat_coord coord, oat lod) const noexcept;

•

element_type image::sample(const sampler &s,

integer_coord coord, gradient_coord gradient_x,

gradient_coord gradient_y) const noexcept;

Common image methods [3.11.7]

Each image type implements this set of common members.

Member indicated with a dot is available when these extensions

are enabled:

•

cl_khr_mipmap_image[_writes]

image_channel_type image::data_type() const noexcept;

image_channel_order image::order() const noexcept;

•

int image::miplevels() const noexcept;

enum image_channel_type

snorm_int8, snorm_int16, unorm_int8,

unorm_int16, unorm_int24,

unorm_short_565, unorm_short_555,

unorm_short_101010, unorm_short_101010_2, sint8,

sint16, sint32, uint8, uint16, uint32,

oat16, oat32

enum image_channel_order

a, r, rx, rg, rgx, ra, rgb, rgbx, rgba, argb, bgra, intensity,

luminance, abgr, srgb, srgbx, srgba, sbgra,

depth, depth_stencil

(Connued on next page >)

Atomic Operaons Library (connued)

Members available in specializaons atomic<integral> and

atomic<T*>. For struct atomic<integral>, replace T with integral,

and for struct atomic<T*>, replace T with T*. op may be one of

add, sub, and, or, xor, min, or max.

fetch_op(

, memory_order = memory_order_seq_cst,

memory_scope = memory_scope_device) [volale] noexcept;

Ti operator[++, --]([int]) [volale] noexcept;

Ti operator[+, -, &, |, ^]=(Ti) [volale] noexcept;

Atomic types

Pointer specializaons indicated with a dot are available when

these extensions are enabled:

•

cl_khr_fp64

•

both cl_khr_int64_[base, extended]_atomics

using atomic_[u]int = atomic<[u]int>;

using atomic_oat = atomic<oat>;

•

using atomic_[u]long = atomic<[u]long>;

•

using atomic_double = atomic<double>;

Available if __INTPTR_WIDTH__== 32, or both extensions

cl_khr_int64_[base, extended]_atomics are enabled and

__INTPTR_WIDTH__== 64.

using atomic_intptr_t = atomic<intptr_t>;

using atomic_uintptr_t = atomic<uintptr_t>;

Available if __SIZE_WIDTH__== 32, or both extensions

cl_khr_int64_[base, extended]_atomics are enabled and

__SIZE_WIDTH__== 64:

using atomic_size_t = atomic<size_t>;

Available if __PTRDIFF_WIDTH__ == 32, or both extensions

cl_khr_int64_[base, extended]_atomics are enabled and

__PTRDIFF_WIDTH__ == 64:

using atomic_ptrdi_t = atomic<ptrdi_t>;

Members of struct atomic_ag:

atomic_ag() noexcept = default;

bool test_and_set(memory_order = memory_order_seq_cst,

memory_scope = memory_scope_device) [volale] noexcept;

void clear(memory_order = memory_order_seq_cst,

memory_scope = memory_scope_device) [volale] noexcept;

Non-member funcons:

bool atomic_ag_test_and_set([volale]atomic_ag*) noexcept;

bool atomic_ag_test_and_set_explicit([volale]atomic_ag*,

memory_order, memory_scope) noexcept;

void atomic_ag_clear([volale]atomic_ag*) noexcept;

void atomic_ag_clear_explicit([volale]atomic_ag*,

memory_order, memory_scope) noexcept;

Fences [3.24.6]

void atomic_fence(mem_fence ags,

memory_order order, memory_scope scope) noexcept;

ags: mem_fence::global, mem_fence::local,

mem_fence::image or a combinaon of these values

ORed together

scope: memory_scope_x where x may be all_svm_devices,

device, work_group, sub_group, work_item

OpenCL 2.2 Reference GuideOpenCL C++ LanguagePage 8

OpenCL C++ Language



Device Enqueue Library [3.13]

Header <opencl_device_queue>

Allows a kernel to independently enqueue the same device, without host interacon.

enum enqueue_policy

no_wait, wait_kernel, wait_work_group

enum event_status

submied, complete, error

enum enqueue_status

success, failure, invalid_queue, invalid_ndrange, invalid_event_wait_list, queue_full,

invalid_arg_size, event_allocaon_failure, out_of_resources

enum event_proling_info

exec_me

Members of struct device_queue [3.13.3]

struct device_queue: marker_type;

enqueue_status

enqueue_marker(uint num_events_in_wait_list,

const event *event_wait_list, event *event_ret) noexcept;

Enqueues a marker to device queue

aer a list of events specied by

event_wait_list completes.

template <class Fun , class... Args>

enqueue_status enqueue_kernel(enqueue_policy policy,

const ndrange &ndrange, Fun fun, Args... args) noexcept;

Enqueue functor or lambda fun on

the device with specied policy over

the specied ndrange.

template <class Fun, class... Args>

enqueue_status enqueue_kernel(enqueue_policy policy,

uint num_events_in_wait_list, const event *event_wait_list,

event *event_ret, const ndrange &ndrange, Fun fun,

Args... args) noexcept;

Enqueues functor or lambda fun in

the same way as the overload above

with the excepon for the passed

event list.

device_queue(const device_queue&) = default;

device_queue(device_queue&&) = default; Constructors

Members of struct event [3.13.4]

struct event;

bool is_valid() const noexcept; Returns true if event object is a valid event.

explicit operator bool() const noexcept; Returns true if event object is a valid event.

void retain() noexcept; Increments the event reference count.

void release() noexcept; Decrements the event reference count.

void set_status(event_status status) noexcept; Sets the execuon status of a user event.

void proling_info(event_proling_info name,

global_ptr<long> value) noexcept;

Captures the proling informaon for

funcons that are enqueued as commands.

(Connued on next page >)

Pipes Library

Header <opencl_pipe>

Use pipe and pipe_storage template classes as a communicaon channel between kernels.

enum class pipe_access { read, write };

template <class T, pipe_access Access = pipe_access::read> struct pipe;

template<cl::pipe_access Access = pipe_access::read, class T, size_t N> pipe<T, Access>

class pipe methods

[3.8.4]

When

pipe_access is: Member funcon Descripon

read bool read(T& ref) const noexcept; Read packet from pipe into ref.

write bool write(const T& ref) noexcept; Write packet specied by ref to

pipe.

read reservaon<memory_scope_work_item>

reserve(uint num_packets) const noexcept;

Reserve num_packets entries for

reading/wring from/to pipe.

write reservaon<memory_scope_work_item>

reserve(uint num_packets) noexcept;

read

reservaon<memory_scope_work_group>

work_group_reserve(uint num_packets)

const noexcept;

write

reservaon<memory_scope_work_group>

work_group_reserve(uint num_packets)

noexcept;

read

reservaon <memory_scope_sub_group>

sub_group_reserve(uint num_packets)

const noexcept;

write

reservaon <memory_scope_sub_group>

sub_group_reserve(uint num_packets)

noexcept;

read, write uint num_packets() const noexcept;

Returns current number of packets

that have been wrien to but not

yet been read from the pipe.

read, write uint max_packets() const noexcept; Returns max. number of packets

specied when pipe was created.

When

pipe_access is: Member funcon Descripon

read bool pipe::reservaon::read(uint index, T& ref)

const noexcept;

Read packet from the reserved area

of the pipe referred to by index

into ref.

write bool pipe::reservaon::write(uint index,

const T& ref) noexcept;

Write packet specied by ref to the

reserved area of the pipe referred

to by index.

read void pipe::reservaon::commit()

const noexcept; Indicates that all reads/writes

to num_packets associated with

reservaon are completed.

write bool pipe::reservaon::commit() noexcept;

read bool pipe::reservaon::is_valid();

Return true if reservaon is a valid

reservaon ID.

write bool pipe::reservaon::is_valid()

const noexcept;

read, write explicit pipe::reservaon::operator bool()

const noexcept;

Non-member funcons

template<pipe_access Access = pipe_access::read,

class T, size_t N>

pipe<T, Access> make_pipe(const pipe_storage

<T, N>& ps);

Constructs a read only or write only pipe

from pipe_storage object.

pipe_storage class [3.8.5]

N in the following declaraon species the maximum number of packets which can be held by an

object.

template <class T, size_t N> struct pipe_storage;

Members of struct pipe_storage:

pipe_storage();

pipe_storage(const pipe_storage&) = default;

pipe_storage(pipe_storage&&) = default;

template<pipe_access Access = pipe_access::read>

pipe<T, Access> get() noexcept;

Constructs a read only or write only pipe

from pipe_storage object.

Images and Samplers Library (connued)

Other image methods [3.11.8]

Members indicated with a dot are available when these

extensions are enabled:

•

cl_khr_mipmap_image

•

cl_khr_mipmap_image or cl_khr_mipmap_image_writes

•

cl_ khr_gl_msaa_sharing and cl_khr_gl_depth_images

element_type image::sample(const sampler &s,

oat_coord coord) const noexcept;

element_type image::sample(const sampler &s,

integer_coord coord) const noexcept;

•

element_type image::sample(const sampler &s,

oat_coord coord, oat lod) const noexcept;

•

element_type image::sample(const sampler &s,

oat_coord coord, gradient_coord gradient_x,

gradient_coord gradient_y) const noexcept;

element_type image::read(integer_coord coord) const noexcept;

•

void image::read(integer_coord coord, int sample) noexcept;

void image::write(integer_coord coord, element_type color)

noexcept;

•

void image::write(integer_coord coord, element_type color,

int lod) noexcept;

pixel operator[](integer_coord coord) noexcept;

pixel operator[](integer_coord coord) const noexcept;

element_type pixel::operator element_type() const noexcept;

pixel & pixel::operator=(element_type color) noexcept;

int width() const noexcept;

•

int width(int lod) const noexcept;

int height() const noexcept;

•

int height(int lod) const noexcept;

int depth() const noexcept;

•

int depth(int lod) const noexcept;

int array_size() const noexcept;

•

int array_size(int lod) const noexcept;

image_channel_type image::data_type() const noexcept;

image_channel_order image::order() const noexcept;

integer_coord size() const noexcept;

•

int miplevels() const noexcept;

•

int num_samples() const noexcept;

Sampler [3.11.9]

Acquire a sampler inside of a kernel by passing it as a kernel

parameter from host using clSetKernelArg, or creang it using

the make_sampler funcon in the kernel code.

enum addressing_mode

mirrored_repeat, repeat, clamp_to_edge, clamp, none

enum normalized_coordinates

normalized, unnormalized

enum normalized_coordinates

nearest, linear

OpenCL 2.2 Reference Guide Page 9

OpenCL C++ Language



Synchronizaon Funcons [3.16]

Header <opencl_synchronizaon>

struct work_group_named_barrier: marker_type;

Barriers [3.16.2]

void work_group_barrier(mem_fence ags,

memory_scope scope = memory_scope_work_group);

Work-items in a work-group must

execute this before any can connue

void sub_group_barrier(mem_fence ags, memory_scope

scope = memory_scope_work_group);

Work-items in a sub-group must

execute this before any can connue

ags: mem_fence::global, mem_fence::local, mem_fence::image or a combinaon of

these values ORed together

scope: memory_scope_x where x may be all_svm_devices, device, work_group,

sub_group, work_item

Named barriers [3.16.3]

Members from struct work_group_named_barrier. work_group_named_barrier requires

the cl_khr_sub_group_named_barrier extension be enabled.

work_group_named_barrier(uint sub_group_count);

Inialize a new named barrier object

to synchronize sub_group_count sub-

groups in the current work-group.

work_group_named_barrier(

const work_group_named_barrier&) = default;

work_group_named_barrier(

work_group_named_barrier&&) = default;

wait(mem_fence ags,

memory_scope scope = memory_scope_work_group)

const noexcept;

All work-items in a sub-group

execung the kernel on a processor

must execute this method before any

are allowed to connue.

Workgroup Funcons [3.15]

Header <opencl_work_group>

Logical operaons [3.15.2]

bool work_group_all (bool predicate)

bool work_group_any (bool predicate)

Evaluates predicate for all work-items in the work-

group and returns true if predicate evaluates to

true for all/any work-items in the work-group.

bool sub_group_all (bool predicate)

bool sub_group_any (bool predicate)

Evaluates predicate for all work-items in the sub-

group and returns a non-zero value if predicate

evaluates to non-zero for all/any work-items in the

sub-group.

Broadcast funcons [3.15.3]

T is type int, uint, long, ulong, or oat, double (if cl_khr_fp64 is enabled) or half (if cl_khr_fp16 is

enabled).

T work_group_broadcast(T a, size_t local_id);

T work_group_broadcast(T a, size_t local_id_x, size_t local_id_y);

T work_group_broadcast(T a, size_t local_id_x, size_t local_id_y,

size_t local_id_z);

Broadcast the value of a for

work-item idened by local_id

to all work-items in the work-

group.

T sub_group_broadcast(T x, size_t sub_group_local_id);

Broadcast the value of x

for work-item returned by

get_sub_group_local_id to all

work-items in the sub-group.

Numeric operaons [3.15.4]

enum work_group_op

add, min, max

T is type int, uint, long, ulong, or oat, double (if cl_khr_fp64 is enabled) or half (if cl_khr_fp16 is

enabled).

template <work_group_op op>

T work_group_reduce(T x);

Return result of reducon operaon <op> for all

values of x specied by work-items in a work-group.

template <work_group_op op>

T work_group_scan_[ex, in]clusive(T x);

Perform an exclusive/inclusive scan operaon

<op> of all values specied by work-items in the

work-group.

template <work_group_op op>

T sub_group_reduce(T x);

Return result of reducon operaon <op> for all

values of x specied by work-items in a sub-group.

template <work_group_op op>

T sub_group_scan_[ex, in]clusive(T x);

Perform an exclusive/inclusive scan operaon <op>

of all values specied by work-items in a sub-group.

The scan results are returned for each work-item.

Device Enqueue Library (connued)

Non-member funcons [3.13.5]

device_queue get_default_device_queue(); Returns the default device queue.

event make_user_event(); Creates, returns, and sets the execuon status

of the user event to event_status::submied.

template <class Fun, class... Args>uint

get_kernel_work_group_size(

Fun fun, Args... args);

Provides a mechanism to query the maximum

work-group size that can be used to execute

a functor

template <class Fun, class... Args> uint

get_kernel_preferred_work_group_size_-

mulple(Fun fun, Args... args);

Returns the preferred mulple of work-group

size for launch.

template <class Fun, class... Args> uint

get_kernel_sub_group_count_for_ndrange(

const ndrange & ndrange, Fun fun, Args... args);

Returns the number of sub-groups in each

work-group of the dispatch

template <class Fun, class... Args> uint

get_kernel_max_sub_group_size_for_ndrange(

const ndrange & ndrange, Fun fun, Args... args);

Returns the maximum sub-group size for a

functor.

template <class Fun, class... Args> uint

get_kernel_local_size_for_sub_group_count(

uint num_sub_groups, Fun fun, Args... args);

Returns a valid local size that would produce

the requested number of sub-groups such that

each sub-group is complete with no paral

sub-groups

template <class Fun, class... Args>

uint get_kernel_max_num_sub_groups(

Fun fun, Args... args);

Provides a mechanism to query the maximum

number of sub-groups that can be used to

execute the passed functor on the current

device.

Enqueue enums

[3.13.7-11]

enum enqueue_policy

no_wait, wait_kernel, wait_work_group

enum enqueue_status

success, failure, invalid_queue, invalid_ndrange, invalid_event_wait_list, queue_full,

invalid_arg_size, event_allocaon_failure, out_of_resources

enum event_status

submied, complete, error

enum event_proling_info

exec_me

Work-Item Funcons [3.14]

Header <opencl_work_item>

Query the number of dimensions, global, and local work size specied to

clEnqueueNDRangeKernel, and global and local idener of each work-item when this kernel is

executed on a device.

uint get_work_dim(); Number of dimensions in use

size_t get_global_size(uint dimindx); Number of global work-items

size_t get_global_id(uint dimindx); Global work-item ID value

size_t get_local_size(uint dimindx); Number of local work-items if kernel executed with

uniform work-group size

size_t get_enqueued_local_size(uint dimindx); Number of local work-items

size_t get_local_id(uint dimindx); Local work-item ID

size_t get_num_groups(uint dimindx); Number of work-groups

size_t get_group_id(uint dimindx); Work-group ID

size_t get_global_oset(uint dimindx); Global oset

size_t get_global_linear_id(); Work-items 1-dimensional global ID

size_t get_local_linear_id(); Work-items 1-dimensional local ID

size_t get_sub_group_size(); Number of work-items in the subgroup

size_t get_max_sub_group_size(); Maximum size of a subgroup

size_t get_num_sub_groups(); Number of subgroups

size_t get_enqueued_num_sub_groups(); If kernel executed with a uniform work-group size,

results are same as for get_num_sub_groups.

size_t get_sub_group_id(); Sub-group ID

size_t get_sub_group_local_id(); Unique work-item ID

OpenCL 2.2 Reference GuideOpenCL C++ LanguagePage 10

OpenCL C++ Language

Math Funcons [3.19]

Header <opencl_math>

Vector versions of the math funcons operate component-wise.

The descripon is per-component. T is halfn (if cl_khr_fp16 is

enabled), oatn, or doublen (if cl_khr_fp64 is enabled), where n

is 2, 3, 4, 8, or 16. Tf may only be oatn. All angles are in radians.

Trigonometric funcons

[3.19.2]

T acos (T) Arc cosine

T acosh (T) Inverse hyperbolic cosine

T acospi (T x) Compute acos (x) / π

T asin (T) Arc sine

T asinh (T) Inverse hyperbolic sine

T asinpi (T x) Compute asin (x) / π

T atan (T y_over_x) Arc tangent

T atan2 (T y, T x) Arc tangent of y / x

T atanh (T) Hyperbolic arc tangent

T atanpi (T x) Compute atan (x) / π

T atan2pi (T x, T y) Compute atan2 (y, x) / π

T cos (T x)

Tf nave_math::cos(Tf x);

Tf half_math::cos(Tf x);

Cosine, x is an angle

T cosh (T x) Hyperbolic cosine

T cospi (T x) Compute cos (π x)

T sin (T x)

Tf nave_math::sin(Tf x);

Tf half_math::sin(Tf x);

Sine, x is an angle

T sincos (T x, T *cosval) Sine and cosine of x

T sinh (T x) Hyperbolic sine

T sinpi (T x) sin (π x)

T tan (T x)

Tf nave_math::tan(Tf x);

Tf half_math::tan(Tf x);

Tangent

T tanh (T x) Hyperbolic tangent

T tanpi (T x) tan (π x)

Power funcons

[3.19.3]

T cbrt (T) Cube root

T pow (T x, T y) Compute x to the power of y

oatn pown (T x, intn y) Compute x y, where y is an

integer

T powr (T x, T y)

Tf nave_math::powr(Tf x,

Tf y);

Tf half_math::powr(Tf x, Tf y);

Compute x y, where x is >= 0

Tf rootn (T x, intn y) Compute x to the power of 1/y

T rsqrt (T)

Tf nave_math::rsqrt(Tf x);

Tf half_math::rsqrt(Tf x);

Inverse square root

T sqrt (T)

Tf nave_math::sqrt(Tf x);

Tf half_math::sqrt(Tf x);

Square root

Logarithmic funcons

[3.19.4]

intn ilogb (T x) Return exponent as an integer

value

T lgamma (T x)

T lgamma_r (T x, intn *signp) Log gamma funcon

T log (T)

Tf nave_math::log( Tf x);

Tf half_math::log( Tf x);

Natural logarithm

T log2 (T)

Tf nave_math::log2(Tf x);

Tf half_math::log2(Tf x);

Base 2 logarithm

T log10 (T)

Tf nave_math::log10(Tf x);

Tf half_math::log10(Tf x);

Base 10 logarithm

T log1p (T x)Compute loge (1.0 + x)

T logb (T x) Exponent of x

Exponenal funcons

[3.19.5]

T exp (T x)

Tf nave_math::exp(Tf x);

Tf half_math::exp(Tf x);

Exponenal base-e exp. of x

T exp2 (T)

Tf nave_math::exp2(Tf x);

Tf half_math::exp2(Tf x);

Exponenal base 2

T exp10 (T x)

Tf nave_math::exp10(

Tf x);

Tf half_math::exp10(Tf x);

Exponenal base 10

T expm1 (T x) Compute ex -1.0

T ldexp (T x, intn k) x * 2k

Floang point funcons

[3.19.6]

T ceil (T) Round to integer toward + innity

T copysign (T x, T y)x with sign changed to sign of y

T oor (T) Round to integer toward innity

T fma (T a, T b, T c) Mulply and add, then round

T fmod (T x, T y) Modulus. Returns x – y * trunc

(x/y)

T fract (T x, T *iptr) Fraconal value in x

T frexp (T x, intn *exp) Extract manssa and exponent

T modf (T x, T *iptr) Decompose oang-point number

oatn nan (uintn nancode)

doublen nan

(ulongn nancode)

halfn nan

(ushortn nancode)

Quiet NaN

T nextaer (T x, T y) Next representable oang-point

value aer x in the direcon of y

T remainder (T x, T y) Floang point remainder

T remquo (T x, T y,

intn *quo)Remainder and quoent

T rint (T x)Round to nearest even integer

T round (T x) Integral value nearest to x

rounding

T trunc (T x)

Return integral value nearest to

x rounding halfway cases away

from zero.

Comparison funcons

[3.19.7]

T fdim (T x, T y) Posive dierence between x and y

T fmax (T x, T y) Return y if x < y, else returns x

T fmin (T x, T y) Return y if y < x, else returns x

T fmod (T x, T y) Modulus. Returns x – y * trunc

(x/y)

T maxmag (T x, T y) Maximum magnitude of x and y

T minmag (T x, T y) Minimum magnitude of x and y

Other funcons

[3.19.8]

Tf nave_math::divide(

Tf x, Tf y);

Tf half_math::divide(

Tf x, Tf y);

Compute x / y

T erfc (T) Complementary error funcon.

T erf (T x) Calculates error funcon of T

T fabs (T x) Absolute value

T hypot (T x, T y) Square root of x2 + y2

T mad (T a, T b, T c) Approximates a * b + c

Tf nave_math::recip(Tf x);

Tf half_math::recip(Tf x); Reciprocal

T tgamma (T x) Gamma funcon

Integer Built-in Funcons [3.20]

Header <opencl_integer>

T is type char, charn, uchar, ucharn, short, shortn, ushort,

ushortn, int, intn, uint, uintn, long, longn, ulong, or ulongn,

where n is 2, 3, 4, 8, or 16. Tu is the unsigned version of T. Tsc is

the scalar version of T.

bitwise funcons

[3.20.2]

T clz (T x)Number of leading 0-bits in x

T ctz (T x)Number of trailing 0-bits in x

T popcount (T x)Number of non-zero bits in x

T rotate (T v, T i)result[indx] = v[indx] << i[indx]

For upsample, return type is scalar when the parameters are scalar.

short[n] upsample (

char[n] hi, uchar[n] lo)result[i]= (([u]short)hi[i]<< 8) | lo[i]

ushort[n] upsample (

uchar[n] hi, uchar[n] lo)result[i]=((ushort)hi[i]<< 8) | lo[i]

int[n] upsample (

short[n] hi, ushort[n] lo)result[i]=((int)hi[i]<< 16) | lo[i]

uint[n] upsample (

ushort[n] hi, ushort[n] lo)result[i]=((uint)hi[i]<< 16) | lo[i]

long[n] upsample (

int[n] hi, uint[n] lo)result[i]=((long)hi[i]<< 32) | lo[i]

ulong[n] upsample (

uint[n] hi, uint[n] lo)result[i]=((ulong)hi[i]<< 32) | lo[i]

numeric funcons

[3.20.3]

Tu abs (T x)| x |

Tu abs_di (T x, T y)| x – y | without modulo overow

T add_sat (T x, T y)x + y and saturates the result

T hadd (T x, T y) (x + y) >> 1 without mod. overow

T rhadd (T x, T y) (x + y + 1) >> 1

T clamp (T x, T min, T max)

T clamp (T x, Tsc min, Tsc max)min(max(x, minval), maxval)

T mad_hi (T a, T b, T c)mul_hi(a, b) + c

T mad_sat (T a, T b, T c)a * b + c and saturates the result

T max (T x, T y)

T max (T x, Tsc y)y if x < y, otherwise it returns x

T min (T x, T y)

T min (T x, Tsc y)y if y < x, otherwise it returns x

T mul_hi (T x, T y)High half of the product of x and y

T sub_sat (T x, T y)x - y and saturates the result

24-bit operaons

[3.20.4]

The following fast integer funcons opmize the performance

of kernels. In these funcons, T is type int, uint, intn or uintn,

where n is 2, 3, 4, 8, or 16.

intn mad24 (T x, T y, T z) Mulply 24-bit integer values x, y, add

32-bit int. result to 32-bit integer z

T mul24 (T x, T y) Mulply 24-bit integer values x and y

Common Funcons [3.17]

Header <opencl_common>

These funcons are implemented using the round to nearest

even rounding mode. Vector versions operate component-wise.

Ts is type oat, oponally double (if cl_khr_fp64 is enabled), or

half if cl_khr_fp16 is enabled. Tn is the vector form of Ts, where

n is 2, 3, 4, 8, or 16. T is Ts and Tn.

T clamp (T x, T min, T max) Clamp x to range given by

min, max

T degrees (T radians) radians to degrees

T max (T x, T y) Max of x and y

T min (T x, T y) Min of x and y

T mix (T x, T y, T a) Linear blend of x and y

T radians (T degrees) degrees to radians

T step (T edge, T x) 0.0 if x < edge, else 1.0

T smoothstep (T edge0, T edge1, T x) Step and interpolate

T sign (T x) Sign of x

OpenCL 2.2 Reference Guide Page 11

OpenCL C++ Language



Relaonal Built-in Funcons [3.21]

Header <opencl_relaonal>

These funcons can be used with built-in scalar or vector types

as arguments and return a scalar or vector integer result. T

is type oat, oatn, char, charn, uchar, ucharn, short, shortn,

ushort, ushortn, int, intn, uint, uintn, long, longn, ulong, ulongn,

or oponally double or doublen (if cl_khr_fp64 is enabled) or

half or halfn (if cl_khr_fp16 is enabled). n is 2, 3, 4, 8, or 16.

booln isequal(oatn x, oatn y);

booln isequal(halfn x, halfn y);

booln isequal(doublen x, doublen y);

Compare of x == y

booln isnotequal(oatn x, oatn y); Compare of x != y

booln isgreater(oatn x, oatn y); Compare of x > y

booln isgreaterequal(oatn x,

oatn y); Compare of x >= y

booln isless(oatn x, oatn y); Compare of x < y

booln islessequal(oatn x, oatn y); Compare of x <= y

booln islessgreater(oatn x, oatn y); Compare of

(x < y) || (x > y)

booln isordered(oatn x, oatn y); Test if arguments are

ordered

booln isunordered(oatn x, oatn y); Test if arguments are

unordered

booln isnite(oatn x, oatn y); Test for nite value

booln isinf(oatn x, oatn y); Test for + or – innity

booln isnan(oatn x, oatn y); Test for a NaN

booln isnormal(oatn x, oatn y); Test for a normal value

booln signbit(oatn x, oatn y); Test for sign bit

bool any(booln t); 1 if MSB in component of

x is set; else 0

bool all(booln t); 1 if MSB in all components

of x are set; else 0

T bitselect(T a, T b, T c);

Each bit of result is

corresponding bit of a if

corresponding bit of c is 0

T select(T a, T b, booln c);

For each component of a

vector type,

result[i] = if MSB of c[i] is

set ? b[i] : a[i] For scalar

type, result = c ? b : a

Array Library

[3.25]

Header <opencl_array>

template<class T, size_t N> struct array;

Iterators from struct array

[const_]iterator begin() [const] noexcept;

[const_]iterator end() [const] noexcept;

[const_]reverse_iterator rbegin() [const] noexcept;

[const_]reverse_iterator rend() [const] noexcept;

const_iterator cbegin() const noexcept;

const_iterator cend() const noexcept;

const_reverse_iterator crbegin() const noexcept;

const_reverse_iterator crend() const noexcept;

Capacies from struct array

constexpr size_type size() const noexcept;

constexpr size_type max_size() const noexcept;

constexpr bool empty() const noexcept;

prin Funcon [3.23]

Header <opencl_prin>

Writes output to an implementaon-dened stream.

int prin (constant char * restrict format, …)

prin output synchronizaon

When the event associated with a parcular kernel invocaon

completes, the output of applicable prin calls is ushed to the

implementaon-dened output stream.

prin format string

%[ags][width][.precision][vector][length] conversion

Examples:

The following examples show the use of the vector specier in

the prin format string.

oat4 f = oat4(1.0f, 2.0f, 3.0f, 4.0f);

uchar4 uc = uchar4(0xFA, 0xFB, 0xFC, 0xFD);

prin(“f4 = %2.2v4hlf\n”, f);

prin(“uc = %#v4hhx\n”, uc);

The above two prin calls print the following:

f4 = 1.00,2.00,3.00,4.00

uc = 0xfa,0x,0xfc,0xfd

Limits

[3.26]

Header <opencl_limits>

Half is available if cl_khr_fp16 is enabled, and double is available if cl_khr_fp64 is enabled.

Floang point limits

OpenCL C++ Macros

(x is HALF, FLT, DBL) HALF FLT DBL

Applicaon Macro

(x is HALF, FLT, DBL)

x_DIG 3 6 15 CL_x_DIG

x_MANT_DIG 11 24 53 CL_x_MANT_DIG

x_MAX_10_EXP +4 +4 +38 +308 CL_x_MAX_10_EXP

x_MAX_EXP +16 +128 +1024 CL_x_MAX_EXP

x_MIN_10_EX -4 -37 -307 CL_x_MIN_10_EXP

x_MIN_EXP -13 -125 -1021 CL_x_MIN_EXP

x_RADIX 2 2 2 CL_x_RADIX

x_MAX 0x1.cp15h 0x1.fep127f 0x1.fp1023 CL_x_MAX

x_MIN 0x1.0p-14h 0x1.0p-126f 0x1.0p-1022 CL_x_MIN

x_EPSILON 0x1.0p-10h 0x1.0p-23f 0x1.0p-52 CL_x_EPSILON

enum oat_round_style

round_indeterminate, round_toward_zero, round_to_nearest, round_toward_innity, round_toward_neg_innity

enum oat_denorm_style

denorm_indeterminate, denorm_absent, denorm_present

Integer limits

#dene CHAR_BIT 8

#dene CHAR_MAX SCHAR_MAX

#dene CHAR_MIN SCHAR_MIN

#dene INT_MAX 2147483647

#dene INT_MIN (-2147483647 – 1)

#dene LONG_MAX 0x7fL

#dene LONG_MIN (-0x7fL – 1)

#dene SCHAR_MAX 127

#dene SCHAR_MIN (-127 – 1)

#dene SHRT_MAX 32767

#dene SHRT_MIN (-32767 – 1)

#dene UCHAR_MAX 255

#dene USHRT_MAX 65535

#dene UINT_MAX 0x

#dene ULONG_MAX 0xUL

(Connued on next page >)

Geometric Funcons [3.18]

Header <opencl_geometric>

These funcons use the round to nearest even rounding mode.

Vector versions operate component-wise. Ts is scalar type oat,

double if cl_khr_fp64 is enabled, or half if cl_khr_fp16 is enabled.

Tn is the vector form of Ts with 2, 3, or 4 components.

oat{3,4} cross (oat{3,4} p0, oat{3,4} p1)

double{3,4} cross (double{3,4} p0,

double{3,4} p1)

half{3,4} cross (half{3,4} p0, half{3,4} p1)

Cross product

Ts distance (T p0, T p1)Vector distance

Ts dot (T p0, T p1)Dot product

Ts length (T p)Vector length

T normalize (T p)Normal vector

length 1

Vector Data Load/Store [3.22]

Header <opencl_vector_load_store>

T is type char, uchar, short, ushort, int, uint, long, ulong, or

oat, oponally double (if cl_khr_fp64 is enabled), or half (if

cl_khr_fp16 is enabled). Tn refers to the vector form of type T,

where n is 2, 3, 4, 8, or 16.

template <size_t N, class T>

make_vector_t<T, N>

vload(size_t oset, const T* p);

template <size_t N, class T>

make_vector_t<T, N>

vload(size_t oset,

const constant_ptr<T> p);

Read vector data

from address

(p + (oset * n))

template <size_t N>

make_vector_t<oat, N>

vload_half(size_t oset, const half* p);

template <size_t N>

make_vector_t<oat, N>

vload_half(size_toset,

const constant_ptr<half> p);

Read a halfn from

address

(p + (oset * n))

template <size_t N>

make_vector<oat, N>

vloada_half(size_t oset, const half* p);

template <size_t N>

make_vector<oat, N>

vloada_half(size_t oset,

const constant_ptr<half> p);

Read half vector

from (p + (oset *

n)). For half3, read

from (p + (oset

* 4)).

template <class T>

void vstore(T data, size_t oset,

vector_element_t<T>* p);

Write vector data to

address (p + (oset

* n)

template <rounding_mode rmode =

rounding_mode::rte, class T>

void vstore_half(T data, size_t oset,

half* p);

Write a half to

address

(p + oset)

template <rounding_mode rmode =

rounding_mode::rte, class T>

void vstorea_half(T data, size_t oset,

half* p);

Write a half vector

to address (p +

(oset * n))

OpenCL 2.2 Reference GuideOpenCL C++ LanguagePage 12

OpenCL C++ Language

OpenCL 2.2 Reference GuidePage 12

OpenCL C++ Language



Tuple Library [3.28]

Header <opencl_tuple>

template <class... Types> class tuple;

Tuple creaon

funcons

make_tuple() forward_as_tuple()

e() tuple_cat()

Tuple helper

classes class tuple_size class tuple_element

Element access get()

Relaonal

operators

operator==() operator<() operator!=()

operator>() operator<=() operator>=()

Specialized

algorithms swap()

Type Traits Library [3.29]

Header <opencl_type_traits>

template <class... Types> class tuple;

Primary type categories

is_void is_null_pointer

is_integral is_oang_point

is_array is_pointer

is_enum is_union

is_class is_funcon

is_lvalue_reference

is_rvalue_reference

is_member_object_pointer

is_member_funcon_pointer

Composite type categories

is_reference is_arithmec

is_object is_ fundamental

is_scalar is_compound

is_member_pointer

Type property queries

alignment_of rank extent

Type relaons

is_same

is_base_of

is_converble

Const-volale modicaons

remove_const add_const

remove_volale add_volale

remove_cv add_cv

As modicaons

remove_as remove_ars

add_constant remove_constant

add_local remove_local

add_global remove_global

add_private remove_private

add_generic remove_generic

Reference modicaons

remove_reference

add_lvalue_reference

add_rvalue_reference

Sign modicaons

make_signed make_unsigned

Array modicaons

remove_extent remove_all_extents

Pointer modicaons

add_pointer remove_pointer

Built-in vector queries

vector_size is_vector_type

Built-in vector modicaons

vector_element

make_vector

Other transformaons

aligned_storage aligned_union

decay enable_if

common_type underlying_type

condional result_of

Type properes

is_const is_volale

is_private is_local

is_global is_constant

is_generic is_vector

is_trivial is_trivially_copyable

is_pod is_literal_type

is_empty is_polymorphic

is_abstract is_nal

is_signed is_unsigned

is_standard_layout

is_[trivially_]construcble

is_[trivially_]default_construcble

is_[[trivially_]copy_]construcble

is_[[trivially_]move_]construcble

is_[trivially_]assignable

is_[[trivially_]copy_]assignable

is_[[trivially_]move_]assignable

is_[trivially_, nothrow_]destrucble

is_nothrow_[default_]construcble

is_nothrow_[copy_, move_]construcble

is_nothrow_[copy_, move_]assignable

has_virtual_destructor

Iterator Library [3.30]

Header <opencl_iterator>

template<class Category, class T,

class Distance = ptrdi_t,

class Pointer = T*,

class Reference = T&> struct iterator;

Iterator operaons

advance() distance()

next() prev()

Tags

input_iterator_tag

output_iterator_tag

forward_iterator_tag

bidireconal_iterator_tag

random_access_iterator_tag

Vector Wrapper Library [3.7]

Header <opencl_vec>

template<class T, size_t Size> struct vec;

struct vec members

vec( ) = default;

vec(const vec &) = default;

vec(vec &&) = default;

vec(const vector_type &r) noexcept;

vec(vector_type &&r) noexcept;

template <class... Params>

vec(Params... params) noexcept;

operator vector_type() const noexcept;

operatorOP() where OP may be =, ++, --, +=,

-=, *=, /=, %=

swizzle()

Simple swizzles

If preprocessor macro SIMPLE_SWIZZLES is

dened, then:

auto func() noexcept; where func may

be x through zzzz

Non-member operators

operatorOP() where OP may be ==, !=, <, >

<=, >=, +, -, *, /

Vector Ulies [3.9]

Header <opencl_vector_ulity>

template <size_t Channel, class Vec>

constexpr remove_ars_t

<vector_element_t<Vec>>

get(Vec & vector) noexcept;

template <size_t Channel, class Vec>

constexpr void set(Vec & vector,

remove_ars_t<vector_element_t<Vec>>

value) noexcept;

struct channel_ref members

operatorOP() where OP may be =, ++, --, +=, -=,

*=, /=, %=

Math Constants

[3.27]

Header <opencl_math_constants>

The values of the following symbolic constants are single-

precision oat.

MAXFLOAT Value of maximum non-innite single-

precision oang-point number

HUGE_VALF Posive oat expression, evaluates to +innity

HUGE_VAL Posive double expression, evals. to +innity

INFINITY Constant oat expression, posive or unsigned

innity

NAN Constant oat expression, quiet NaN

template<class T> class math_constants;

template<> class math_constants<halfn>;

template<> class math_constants<oatn>;

template<> class math_constants<doublen>;

Constants, funcons, and macros

The preprocessor macros in the table below are shown for

double and are available if cl_khr_fp64 is enabled. Append _F

for oat, or append _H for half if cl_khr_fp16 is enabled.

Name of constant FunctName Preprocessor macros

e e() M_E

log2e log2e() M_LOG2E

log10e log10e() M_LOG10E

ln2 ln2() M_LN2

ln10 ln10() M_LN10

pi pi() M_PI

pi_2 pi_2() M_PI_2

pi_4 pi_4() M_PI_4

one_pi one_pi() M_1_PI

two_pi two_pi() M_2_PI

two_sqrtpi two_sqrtpi() M_2_SQRTPI

sqrt2 sqrt2() M_SQRT2

sqrt1_2 sqrt1_2() M_SQRT1_2

Replace the placeholders in the templates below with values

from the indicated column in the table above.

template<class T> constexpr T Constant_v =

math_constants<T>::FunctName;

template<class T> class math_constants;

stac constexpr T FunctName noexcept { return T(); }

Examples:

template<class T> constexpr T pi_v = math_constants<T>::pi();

template<class T> class math_constants;

stac constexpr T pi() noexcept { return T(); }

Limits (connued)

Class numeric limits

[3.26.2]

template<class T> class numeric_limits;

All the members below are declared as stac constexpr.

bool is_specialized = false;

T min() noexcept { return T(); }

T max() noexcept { return T(); }

T lowest() noexcept { return T(); }

int digits = 0;

int digits10 = 0;

int max_digits10 = 0;

bool is_signed = false;

bool is_integer = false;

bool is_exact = false;

int radix = 0;

T epsilon() noexcept {return T()};

T round_error() noexcept {

return T(); }

int min_exponent = 0;

int min_exponent10 = 0;

int max_exponent = 0;

int max_exponent10 = 0;

bool has_innity = false;

bool has_quiet_NaN = false;

bool has_signaling_NaN = false;

oat_denorm_style has_denorm

= denorm_absent;

bool has_denorm_loss = false;

T innity() noexcept {

return T(); }

T quiet_NaN() noexcept {

return T(); }

T signaling_NaN() noexcept {

return T(); }

T denorm_min() noexcept {

return T(); }

bool is_iec559 = false;

bool is_bounded = false;

bool is_modulo = false;

bool traps = false;

bool nyness_before = false;

oat_round_style round_style =

round_toward_zero;

bool is_scalar = false;

bool is_vector = false;

Non-members

template<class T> class numeric_limits<const T>;

template<class T> class numeric_limits<volale T>;

template<class T> class numeric_limits<

const volale T>;

Range access

begin() cbegin() rbegin() crbegin()

end() cend() rend() crend()

Predened iterators

inserter() front_inserter() back_inserter()

make_reverse_iterator()

make_move_iterator()

operatorOP() where OP may be ==, !=, <, > <=,

>=, +, -

OpenCL 2.2 Reference Guide Page 13

OpenCL C Language

Aribute Qualiers [6.11]

Use to specify special aributes of enum, struct, and union

types.

__aribute__((aligned(n))) __aribute__((endian(host)))

__aribute__((aligned)) __aribute__((endian(device)))

__aribute__((packed)) __aribute__((endian))

Use to specify special aributes of variables or structure elds.

__aribute__((aligned(alignment)))

__aribute__((nosvm))

Use to specify basic blocks and control-ow-statements.

__aribute__((ar1)) {…}

Use to specify that a loop (for, while, and do loops) can be

unrolled. (Must appear immediately before the loop to be

aected.)

__aribute__((opencl_unroll_hint(n)))

__aribute__((opencl_unroll_hint))

Preprocessor Direcves & Macros [6.10]

#pragma OPENCL FP_CONTRACT on-o-switch

on-o-switch: ON, OFF, DEFAULT

__FILE__ Current source le

__func__ Current funcon name

__LINE__ Integer line number

__OPENCL_VERSION__ Integer version number, e.g: 200

CL_VERSION_1_0 Substutes integer 100 for 1.0

CL_VERSION_1_1 Substutes integer 110 for 1.1

CL_VERSION_1_2 Substutes integer 120 for 1.2

CL_VERSION_2_0 Substutes integer 200 for 2.0

__OPENCL_C_VERSION__ Sub. integer for OpenCL C version

__ENDIAN_LITTLE__ 1 if device is lile endian

__IMAGE_SUPPORT__ 1 if images are supported

__FAST_RELAXED_MATH__ 1 if –cl-fast-relaxed-math

opmizaon opon is specied

FP_FAST_FMA Dened if double fma is fast

FP_FAST_FMAF Dened if oat fma is fast

FP_FAST_FMA_HALF Dened if half fma is fast

__kernel_exec (X, typen) Same as:

__kernel __aribute__((work_group_size_hint(X, 1, 1)))

__aribute__((vec_type_hint(typen)))

OpenCL C Language Reference

Secon and table references are to the OpenCL 2.0 C Language specicaon.

Supported Data Types

Half vector and scalar types require cl_khr_fp16. Double types

require that CL_DEVICE_DOUBLE_FP_CONFIG is not zero.

Built-in Scalar Data Types

[6.1.1]

OpenCL Type API Type Descripon

bool -- true (1) or false (0)

char cl_char 8-bit signed

unsigned char, uchar cl_uchar 8-bit unsigned

short cl_short 16-bit signed

unsigned short, ushort cl_ushort 16-bit unsigned

int cl_int 32-bit signed

unsigned int, uint cl_uint 32-bit unsigned

long cl_long 64-bit signed

unsigned long, ulong cl_ulong 64-bit unsigned

oat cl_oat 32-bit oat

double cl_double 64-bit IEEE 754

half cl_half 16-bit oat (storage only)

size_t -- 32- or 64-bit unsigned integer

ptrdi_t -- 32- or 64-bit signed integer

intptr_t -- 32- or 64-bit signed integer

uintptr_t -- 32- or 64-bit unsigned integer

void void void

Built-in Vector Data Types

[6.1.2]

n is 2, 3, 4, 8, or 16.

OpenCL Type API Type Descripon

[u]char

cl_[u]charn 8-bit [un]signed

[u]short

cl_[u]shortn 16-bit [un]signed

[u]int

cl_[u]intn 32-bit [un]signed

[u]long

cl_[u]longn 64-bit [un]signed

oat

cl_oatn32-bit oat

double

cl_doublen64-bit oat

Other Built-in Data Types

[6.1.3]

The

OPTIONAL

types shown below are only dened if

CL_DEVICE_IMAGE_SUPPORT is CL_TRUE. API type for applicaon

shown in italics where applicable. Items in blue require the

cl_khr_gl_msaa_sharing extension.

OpenCL Type Descripon

image2d_[msaa_]t OPTIONAL 2D image handle

image3d_t OPTIONAL 3D image handle

image2d_array_ [msaa_]t OPTIONAL 2D image array

image1d_t OPTIONAL 1D image handle

image1d_buer_t OPTIONAL 1D image buer

image1d_array_t OPTIONAL 1D image array

image2d_ [msaa_]depth_t OPTIONAL 2D depth image

image2d_array_ [msaa_]depth_t OPTIONAL 2D depth image array

sampler_t OPTIONAL sampler handle

queue_t

ndrange_t

clk_event_t

reserve_id_t

event_t event handle

cl_mem_fence_ags

Reserved Data Types

[6.1.4]

OpenCL Type Descripon

boolnboolean vector

halfn 16-bit, vector

quad, quadn 128-bit oat, vector

complex half, complex halfn

imaginary half, imaginary halfn16-bit complex, vector

complex oat, complex oatn

imaginary oat, imaginary oatn32-bit complex, vector

complex double, complex doublen

imaginary double, imaginary doublen 64-bit complex, vector

complex quad, complex quadn

imaginary quad, imaginary quadn 128-bit complex, vector

oatnxm n*m matrix of 32-bit oats

doublenxm n*m matrix of 64-bit oats

Vector Component Addressing

[6.1.7]

Vector Components

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

oat2 v; v.x, v.s0 v.y, v.s1

oat3 v; v.x, v.s0 v.y, v.s1 v.z, v.s2

oat4 v; v.x, v.s0 v.y, v.s1 v.z, v.s2 v.w, v.s3

oat8 v; v.s0 v.s1 v.s2 v.s3 v.s4 v.s5 v.s6 v.s7

oat16 v; v.s0 v.s1 v.s2 v.s3 v.s4 v.s5 v.s6 v.s7 v.s8 v.s9 v.sa,

v.sA

v.sb,

v.sB

v.sc,

v.sC

v.sd,

v.sD

v.se,

v.sE

v.sf,

v.sF

Vector Addressing Equivalences

Numeric indices are preceded by the leer s or S, e.g.: s1. Swizzling, duplicaon, and nesng are allowed, e.g.: v.yx, v.xx, v.lo.x

v.lo v.hi v.odd v.even v.lo v.hi v.odd v.even

oat2 v.x, v.s0 v.y, v.s1 v.y, v.s1 v.x, v.s0 oat8 v.s0123 v.s4567 v.s1357 v.s0246

oat3 *v.s01, v.xy v.s23, v.zw v.s13, v.yw v.s02, v.xz oat16 v.s01234567 v.s89abcdef v.s13579bdf v.s02468ace

oat4 v.s01, v.xy v.s23, v.zw v.s13, v.yw v.s02, v.xz *When using .lo or .hi with a 3-component vector, the .w component is undened.

Operators and Qualiers

Operators [6.3]

These operators behave similarly as in C99 except operands

may include vector types when possible:

+-*%/--

++ == != & ~ ^

> < >= <= |!

&& || ?: >> << =

,op=sizeof

Address Space Qualiers [6.5]

__global, global __local, local

__constant, constant __private, private

Funcon Qualiers [6.7]

__kernel, kernel

__aribute__((vec_type_hint(type)))

//type defaults to int

__aribute__((work_group_size_hint(X, Y, Z)))

__aribute__((reqd_work_group_size(X, Y, Z)))

Conversions, Type Casng Examples [6.2]

T a = (T)b; // Scalar to scalar, or scalar to vector

T a = convert_T(b);

T a = convert_T_R(b);

T a = as_T(b);

T a = convert_T_sat_R(b);

R: one of the rounding modes

_rte to nearest even

_rtz toward zero

_rtp toward + innity

_rtn toward - innity

OpenCL 2.2 Reference GuidePage 14

OpenCL C Language

Math Constants

[6.13.2]

The values of the following symbolic constants are

single-precision oat.

MAXFLOAT Value of maximum non-innite single-precision

oang-point number

HUGE_VALF Posive oat expression, evaluates to +innity

HUGE_VAL Posive double expression, evals. to +innity

OPTIONAL

INFINITY Constant oat expression, posive or unsigned

innity

NAN Constant oat expression, quiet NaN

When double precision is supported (if cl_khr_fp64 is enabled)

macros ending in _F are available in type double by removing

_F from the macro name, and in type half (if cl_khr_fp16 is

enabled) by replacing _F with _H.

M_E_F Value of e

M_LOG2E_F Value of log2e

M_LOG10E_F Value of log10e

M_LN2_F Value of loge2

M_LN10_F Value of loge10

M_PI_F Value of π

M_PI_2_F Value of π / 2

M_PI_4_F Value of π / 4

M_1_PI_F Value of 1 / π

M_2_PI_F Value of 2 / π

M_2_SQRTPI_F Value of 2 / √π

M_SQRT2_F Value of √2

M_SQRT1_2_F Value of 1 / √2

Math Built-in Funcons [6.13.2]

Ts is type oat, oponally double (if cl_khr_fp64 is enabled), or

half (if cl_khr_fp16 is enabled). Tn is the vector form of Ts, where

n is 2, 3, 4, 8, or 16. T is Ts and Tn. All angles are in radians.

HN indicates that half and nave variants are available using only

the oat or oatn types by prepending “half_” or “nave_” to the

funcon name. Prototypes shown in brown text are available in

half_ and nave_ forms only using the oat or oatn types.

T acos (T) Arc cosine

T acosh (T) Inverse hyperbolic cosine

T acospi (T x) acos (x) / π

T asin (T) Arc sine

T asinh (T) Inverse hyperbolic sine

T asinpi (T x) asin (x) / π

T atan (T y_over_x) Arc tangent

T atan2 (T y, T x) Arc tangent of y / x

T atanh (T) Hyperbolic arc tangent

T atanpi (T x) atan (x) / π

T atan2pi (T x, T y) atan2 (y, x) / π

T cbrt (T) Cube root

T ceil (T) Round to integer toward + innity

T copysign (T x, T y) x with sign changed to sign of y

T cos (T)

Cosine

T cosh (T) Hyperbolic cosine

T cospi (T x) cos (π x)

T half_divide (T x, T y)

T nave_divide (T x, T y)

x / y

(T may only be oat or oatn)

T erfc (T) Complementary error funcon

T erf (T) Calculates error funcon of T

T exp (T x)

Exponenal base e

T exp2 (T)

Exponenal base 2

T exp10 (T)

Exponenal base 10

T expm1 (T x) ex -1.0

T fabs (T) Absolute value

T fdim (T x, T y) Posive dierence between x and y

T oor (T) Round to integer toward innity

T fma (T a, T b, T c) Mulply and add, then round

T fmax (T x, T y)

Tn fmax (Tn x, Ts y)

Return y if x < y,

otherwise it returns x

T fmin (T x, T y)

Tn fmin (Tn x, Ts y)

Return y if y < x,

otherwise it returns x

T fmod (T x, T y) Modulus. Returns x – y * trunc (x/y)

T fract (T x, T *iptr) Fraconal value in x

Ts frexp (T x, int *exp)

Tn frexp (T x, intn *exp) Extract manssa and exponent

T hypot (T x, T y) Square root of x2 + y2

int[n] ilogb (T x) Return exponent as an integer value

Ts ldexp (T x, int n)

Tn ldexp (T x, intn n) x * 2n

T lgamma (T x)

Ts lgamma_r (Ts x, int *signp)

Tn lgamma_r (Tn x, intn *signp)

Log gamma funcon

T log (T)

Natural logarithm

T log2 (T)

Base 2 logarithm

T log10 (T)

Base 10 logarithm

T log1p (T x)ln (1.0 + x)

T logb (T x) Exponent of x

T mad (T a, T b, T c) Approximates a * b + c

T maxmag (T x, T y) Maximum magnitude of x and y

T minmag (T x, T y) Minimum magnitude of x and y

T modf (T x, T *iptr) Decompose oang-point number

oat[n] nan (uint[n] nancode) Quiet NaN (Return is scalar when

nancode is scalar)

half[n] nan (ushort[n]

nancode)

double

[n]

nan (ulong

[n]

nancode)

Quiet NaN

(Return is scalar when nancode is

scalar)

T nextaer (T x, T y) Next representable oang-point

value aer x in the direcon of y

T pow (T x, T y) Compute x to the power of y

Ts pown (T x, int

Tn pown (T x, intn

y) Compute xy, where y is an integer

T powr (T x, T y)

Compute xy, where x is >= 0

T half_recip (T x)

T nave_recip (T x)

1 / x

(T may only be oat or oatn)

T remainder (T x, T y) Floang point remainder

Ts remquo (Ts x, Ts y, int *quo)

Tn remquo (Tn x, Tn y, intn *quo)Remainder and quoent

T rint (T) Round to nearest even integer

Ts rootn (T x, int

Tn rootn (T x, intn

y) Compute x to the power of 1/y

T round (T x) Integral value nearest to x rounding

T rsqrt (T)

Inverse square root

T sin (T)

Sine

T sincos (T x, T *cosval) Sine and cosine of x

T sinh (T) Hyperbolic sine

T sinpi (T x) sin (π x)

T sqrt (T)

Square root

T tan (T)

Tangent

T tanh (T) Hyperbolic tangent

T tanpi (T x) tan (π x)

T tgamma (T) Gamma funcon

T trunc (T) Round to integer toward zero

Work-Item Built-in Funcons [6.13.1]

Query the number of dimensions, global, and local work size

specied to clEnqueueNDRangeKernel, and global and local

idener of each work-item when this kernel is executed on a

device.

uint get_work_dim () Number of dimensions in use

size_t get_global_size (

uint dimindx) Number of global work-items

size_t get_global_id (

uint dimindx) Global work-item ID value

size_t get_local_size (

uint dimindx)

Number of local work-items if kernel

executed with uniform work-group size

size_t get_enqueued_local_size (

uint dimindx)

Number of local work-

items

size_t get_local_id (uint dimindx) Local work-item ID

size_t get_num_groups (

uint dimindx) Number of work-groups

size_t get_group_id (

uint dimindx) Work-group ID

size_t get_global_oset (

uint dimindx) Global oset

size_t get_global_linear_id () Work-items 1-dimensional

global ID

size_t get_local_linear_id () Work-items 1-dimensional

local ID

uint get_sub_group_size () Number of work-items in

the subgroup

uint get_max_sub_group_size () Maximum size of a

subgroup

uint get_num_sub_groups () Number of subgroups

uint get_enqueued_num_sub_groups ()

uint get_sub_group_id () Sub-group ID

uint get_sub_group_local_id () Unique work-item ID

Blocks [6.12]

A result value type with a list of parameter types: for example:

1. The ^ declares variable “myBlock” is a Block.

2. The return type for the Block “myBlock”is int.

3. myBlock takes a single argument of type int.

4. The argument is named “num.”

5. Mulplier captured from block’s environment.

int (^myBlock)(int) =

^(int num) {return num * multiplier; };

jk l

Access Qualiers [6.6]

Apply to 2D and 3D image types to declare if the image memory

object is being read or wrien by a kernel.

__read_only, read_only __write_only, write_only

__read_write, read_write

OpenCL 2.2 Reference Guide Page 15

OpenCL C Language



Image Read and Write Funcons [6.13.14]

The built-in funcons dened in this secon can only be used

with image memory objects created with clCreateImage.

sampler species the addressing and ltering mode to use.

aQual refers to one of the access qualiers. For samplerless

read funcons this may be read_only or read_write.

• Writes to images with sRGB channel orders requires device

support of the cl_khr_srgb_image_writes extension.

• read_imageh and write_imageh require the

cl_khr_fp16 extension.

• MSAA images require the cl_khr_gl_msaa_sharing extension.

• Image 3D writes require the extension

cl_khr_3d_image_writes. [9.4.8]

Read and write funcons for 2D images

Read an element from a 2D image, or write a color value to a

locaon in a 2D image.

oat4 read_imagef (read_only image2d_t image,

sampler_t sampler, {int2, oat2} coord)

int4 read_imagei (read_only image2d_t image,

sampler_t sampler, {int2, oat2} coord)

uint4 read_imageui (read_only image2d_t image,

sampler_t sampler, {int2, oat2} coord)

oat4 read_imagef (read_only image2d_array_t image,

sampler_t sampler, {int4, oat4} coord)

int4 read_imagei (read_only image2d_array_t image,

sampler_t sampler, {int4, oat4} coord)

uint4 read_imageui (read_only image2d_array_t image,

sampler_t sampler, {int4, oat4} coord)

oat read_imagef (read_only image2d_depth_t image,

sampler_t sampler, {int2, oat2} coord)

oat read_imagef (read_only image2d_array_depth_t image,

sampler_t sampler, {int4, oat4} coord)

oat4 read_imagef (aQual image2d_t image, int2 coord)

int4 read_imagei (aQual image2d_t image, int2 coord)

uint4 read_imageui (aQual image2d_t image, int2 coord)

oat4 read_imagef (aQual image2d_array_t image, int4 coord)

int4 read_imagei (aQual image2d_array_t image, int4 coord)

uint4 read_imageui (aQual image2d_array_t image, int4 coord)

oat read_imagef (aQual image2d_depth_t image, int2 coord)

oat read_imagef (aQual image2d_array_depth_t image,

int4 coord)

half4 read_imageh (read_only image2d_t image,

sampler_t sampler, {int2, oat2} coord)

half4 read_imageh (aQual image2d_t image, int2 coord)

half4 read_imageh (read_only image2d_array_t image,

sampler_t sampler, {int4, oat4} coord)

half4 read_imageh (aQual image2d_array_t image,

int4 coord)

void write_imagef (aQual image2d_t image,

int2 coord, oat4 color)

void write_imagei (aQual image2d_t image,

int2 coord, int4 color)

void write_imageui (aQual image2d_t image,

int2 coord, uint4 color)

void write_imageh (aQual image2d_t image,

int2 coord, half4 color)

void write_imagef (aQual image2d_array_t image,

int4 coord, oat4 color)

void write_imagei (aQual image2d_array_t image,

int4 coord, int4 color)

void write_imageui (aQual image2d_array_t image,

int4 coord, uint4 color)

void write_imagef (aQual image2d_depth_t image,

int2 coord, oat depth)

void write_imagef (aQual image2d_array_depth_t image,

int4 coord, oat depth)

void write_imageh (aQual image2d_array_t image,

int4 coord, half4 color)

Read and write funcons for 1D images

Read an element from a 1D image, or write a color value to a

locaon in a 1D image.

oat4 read_imagef (read_only image1d_t image,

sampler_t sampler, {int, oat} coord)

int4 read_imagei (read_only image1d_t image,

sampler_t sampler, {int, oat} coord)

uint4 read_imageui (read_only image1d_t image,

sampler_t sampler, {int, oat} coord)

oat4 read_imagef (read_only image1d_array_t image,

sampler_t sampler, {int2, oat4} coord)

int4 read_imagei (read_only image1d_array_t image,

sampler_t sampler, {int2, oat2} coord)

uint4 read_imageui (read_only image1d_array_t image,

sampler_t sampler, {int2, oat2} coord)

oat4 read_imagef (aQual image1d_t image, int coord)

oat4 read_imagef (aQual image1d_buer_t image, int coord)

int4 read_imagei (aQual image1d_t image, int coord)

uint4 read_imageui (aQual image1d_t image, int coord)

int4 read_imagei (aQual image1d_buer_t image, int coord)

uint4 read_imageui (aQual image1d_buer_t image, int coord)

oat4 read_imagef (aQual image1d_array_t image, int2 coord)

int4 read_imagei (aQual image1d_array_t image, int2 coord)

uint4 read_imageui (aQual image1d_array_t image, int2 coord)

half4 read_imageh (read_only image1d_t image,

sampler_t sampler, {int, oat} coord)

half4 read_imageh (aQual image1d_t image, int coord)

half4 read_imageh (read_only image1d_array_t image,

sampler_t sampler, {int2, oat4} coord)

half4 read_imageh (aQual image1d_array_t image, int2 coord)

half4 read_imageh (aQual image1d_buer_t image, int coord)

void write_imagef (aQual image1d_t image,

int coord, oat4 color)

void write_imagei (aQual image1d_t image,

int coord, int4 color)

void write_imageui (aQual image1d_t image,

int coord, uint4 color)

void write_imageh (aQual image1d_t image,

int coord, half4 color)

void write_imagef (aQual image1d_buer_t image,

int coord, oat4 color)

void write_imagei (aQual image1d_buer_t image,

int coord, int4 color)

void write_imageui (aQual image1d_buer_t image,

int coord, uint4 color)

void write_imageh (aQual image1d_buer_t image,

int coord, half4 color)

void write_imagef (aQual image1d_array_t image,

int2 coord, oat4 color)

void write_imagei (aQual image1d_array_t image,

int2 coord, int4 color)

void write_imageui (aQual image1d_array_t image,

int2 coord, uint4 color)

void write_imageh (aQual image1d_array_t image,

int2 coord, half4 color)

Read and write funcons for 3D images

Read an element from a 3D image, or write a color value to

a locaon in a 3D image. Wring to 3D images requires the

cl_khr_3d_image_writes extension [9.4.8].

oat4 read_imagef (read_only image3d_t image,

sampler_t sampler, {int4, oat4} coord)

int4 read_imagei (read_only image3d_t image,

sampler_t sampler, int4 coord)

int4 read_imagei (read_only image3d_t image,

sampler_t sampler, oat4 coord)

uint4 read_imageui (read_only image3d_t image,

sampler_t sampler, {int4, oat4} coord)

oat4 read_imagef (aQual image3d_t image, int4 coord)

int4 read_imagei (aQual image3d_t image, int4 coord)

uint4 read_imageui (aQual image3d_t image, int4 coord)

half4 read_imageh (read_only image3d_t image,

sampler_t sampler, {int4, oat4} coord)

half4 read_imageh (aQual image3d_t image, int4 coord)

void write_imagef (aQual image3d_t image,

int4 coord, oat4 color)

void write_imagei (aQual image3d_t image,

int4 coord, int4 color)

void write_imageui (aQual image3d_t image,

int4 coord, uint4 color)

void write_imageh (aQual image3d_t image,

int4 coord, half4 color)

Extended mipmap read and write funcons

These funcons require the cl_khr_mipmap_image and

cl_khr_mipmap_image_writes extensions.

oat read_imagef (read_only image2d_[depth_]t image,

sampler_t sampler, oat2 coord, oat lod)

int4 read_imagei (read_only image2d_t image,

sampler_t sampler, oat2 coord, oat lod)

uint4 read_imageui (read_only image2d_t image,

sampler_t sampler, oat2 coord, oat lod)

oat read_imagef (read_only image2d_ [depth_]t image,

sampler_t sampler, oat2 coord, oat2 gradient_x,

oat2 gradient_y)

int4 read_imagei (read_only image2d_t image,

sampler_t sampler, oat2 coord, oat2 gradient_x,

oat2 gradient_y)

uint4 read_imageui (read_only image2d_t image,

sampler_t sampler, oat2 coord, oat2 gradient_x,

oat2 gradient_y)

oat4 read_imagef (read_only image1d_t image,

sampler_t sampler, oat coord, oat lod)

int4 read_imagei (read_only image1d_t image,

sampler_t sampler, oat coord, oat lod)

uint4 read_imageui(read_only image1d_t image,

sampler_t sampler, oat coord, oat lod)

oat4 read_imagef (read_only image1d_t image,

sampler_t sampler, oat coord, oat gradient_x,

oat gradient_y)

int4 read_imagei (read_only image1d_t image,

sampler_t sampler, oat coord, oat gradient_x,

oat gradient_y)

uint4 read_imageui(read_only image1d_t image,

sampler_t sampler, oat coord, oat gradient_x,

oat gradient_y)

oat4 read_imagef (read_only image3d_t image,

sampler_t sampler, oat4 coord, oat lod)

int4 read_imagei(read_only image3d_t image,

sampler_t sampler, oat4 coord, oat lod)

uint4 read_imageui(read_only image3d_t image,

sampler_t sampler, oat4 coord, oat lod)

oat4 read_imagef (read_only image3d_t image,

sampler_t sampler, oat4 coord, oat4 gradient_x,

oat4 gradient_y)

(Connued on next page >)

OpenCL 2.2 Reference GuidePage 16

OpenCL C Language



Image Query Funcons

[6.13.14.5] [9.10.3]

The MSAA forms require the extension cl_khr_gl_msaa_sharing.

Mipmap requires the extension cl_khr_mipmap_image.

Query image width, height, and depth in pixels

int get_image_width (aQual image{1,2,3}d_t image)

int get_image_width (aQual image1d_buer_t image)

int get_image_width (aQual image{1,2}d_array_t image)

int get_image_width (

aQual image2d_[array_]depth_t image)

int get_image_width (aQual image2d_[array_]msaa_t image)

int get_image_width (

aQual image2d_ [array_]msaa_depth_t image)

int get_image_height (aQual image{2,3}d_t image)

int get_image_height (aQual image2d_array_t image)

int get_image_height (

aQual image2d_[array_]depth_t image)

int get_image_height (

aQual image2d_[array_]msaa_t image)

int get_image_height (

aQual image2d_[array_]msaa_depth_t image)

int get_image_depth (image3d_t image)

Query image array size

size_t get_image_array_size (aQual image1d_array_t image)

size_t get_image_array_size (aQual image2d_array_t image)

size_t get_image_array_size (

aQual image2d_array_depth_t image)

size_t get_image_array_size (

aQual image2d_array_msaa_depth_t image)

Query image dimensions

int2 get_image_dim (aQual image2d_t image)

int2 get_image_dim (aQual image2d_array_t image)

int4 get_image_dim (aQual image3d_t image)

int2 get_image_dim (aQual image2d_[array_]depth_t image)

int2 get_image_dim (aQual image2d_[array_]msaa_t image)

int2 get_image_dim (

aQual image2d_ [array_]msaa_depth_t image)

Query image channel data type and order

int get_image_channel_data_type (

aQual image{1,2,3}d_t image)

int get_image_channel_data_type (

aQual image1d_buer_t image)

int get_image_channel_data_type (

aQual image{1,2}d_array_t image)

int get_image_channel_data_type (aQual

image2d_[array_]depth_t image)

int get_image_channel_data_type (

aQual image2d_[array_]msaa_t image)

int get_image_channel_data_type (

aQual image2d_[array_]msaa_depth_t image)

int get_image_channel_order (aQual image{1,2,3}d_t image)

int get_image_channel_order (

aQual image1d_buer_t image)

int get_image_channel_order (

aQual image{1,2}d_array_t image)

int get_image_channel_order (

aQual image2d_[array_]depth_t image)

int get_image_channel_order (

aQual image2d_[array_]msaa_t image)

int get_image_channel_order(

aQual image2d_[array_]msaa_depth_t image)

Extended query funcons [9.10.3]

These funcons require the cl_khr_mipmap_image extension.

int get_image_num_mip_levels (aQual image1d_t image)

int get_image_num_mip_levels (

aQual image2d_ [depth_]t image)

int get_image_num_mip_levels (aQual image3d_t image)

int get_image_num_mip_levels (

aQual image1d_array_t image)

int get_image_num_mip_levels (

aQual image2d_array_[depth_]t image)

int get_image_num_samples (

aQual image2d_[array_]msaa_t image)

int get_image_num_samples (

aQual image2d_ [array_]msaa_depth_t image)

Image Read and Write (connued)

Extended mipmap read and write funcons (cont’d)

int4 read_imagei(read_only image3d_t image,

sampler_t sampler, oat4 coord, oat4 gradient_x,

oat4 gradient_y)

uint4 read_imageui(read_only image3d_t image,

sampler_t sampler, oat4 coord, oat4 gradient_x,

oat4 gradient_y)

oat4 read_imagef (read_only image1d_array_t image,

sampler_t sampler, oat2 coord, oat lod)

int4 read_imagei (read_only image1d_array_t image,

sampler_t sampler, oat2 coord, oat lod)

uint4 read_imageui(read_only image1d_array_t image,

sampler_t sampler, oat2 coord, oat lod)

oat4 read_imagef (read_only image1d_array_t image,

sampler_t sampler, oat2 coord, oat gradient_x,

oat gradient_y)

int4 read_imagei (read_only image1d_array_t image,

sampler_t sampler, oat2 coord, oat gradient_x,

oat gradient_y)

uint4 read_imageui(read_only image1d_array_t image,

sampler_t sampler, oat2 coord, oat gradient_x,

oat gradient_y)

oat read_imagef (read_only image2d_array_ [depth_]t image,

sampler_t sampler, oat4 coord, oat lod)

int4 read_imagei (read_only image2d_array_t image,

sampler_t sampler, oat4 coord, oat lod)

uint4 read_imageui (read_only image2d_array_t image,

sampler_t sampler, oat4 coord, oat lod)

oat read_imagef (

read_only image2d_array_ [depth_]t image,

sampler_t sampler, oat4 coord, oat2 gradient_x,

oat2 gradient_y)

int4 read_imagei (read_only image2d_array_t image,

sampler_t sampler, oat4 coord, oat2 gradient_x,

oat2 gradient_y)

uint4 read_imageui (read_only image2d_array_t image,

sampler_t sampler, oat4 coord, oat2 gradient_x,

oat2 gradient_y)

void write_imagef (aQual image2d_ [depth_]t image,

int2 coord, int lod, oat4 color)

void write_imagei (aQual image2d_t image, int2 coord, int lod,

int4 color)

void write_imageui (aQual image2d_t image, int2 coord, int lod,

uint4 color)

void write_imagef (aQual image1d_t image, int coord, int lod,

oat4 color)

void write_imagei (aQual image1d_t image, int coord, int lod,

int4 color)

void write_imageui (aQual image1d_t image, int coord, int lod,

uint4 color)

void write_imagef (aQual image1d_array_t image, int2 coord,

int lod, oat4 color)

void write_imagei (aQual image1d_array_t image, int2 coord,

int lod, int4 color)

void write_imageui (aQual image1d_array_t image, int2 coord,

int lod, uint4 color)

void write_imagef (aQual image2d_array_ [depth_]t image,

int4 coord, int lod, oat4 color)

void write_imagei (aQual image2d_array_t image, int4 coord,

int lod, int4 color)

void write_imageui (aQual image2d_array_t image, int4 coord,

int lod, uint4 color)

void write_imagef (aQual image3d_t image, int4 coord, int lod,

oat4 coord)

void write_imagei (aQual image3d_t image, int4 coord, int lod,

int4 color)

void write_imageui (aQual image3d_t image, int4 coord, int lod,

uint4 color)

Extended mul-sample image read funcons

[9.10.3]

The extension cl_khr_gl_msaa_sharing adds the following built-in

funcons.

oat read_imagef (aQual image2d_msaa_depth_t image,

int2 coord, int sample)

oat read_imagef (aQual image2d_array_depth_msaa_t image,

int4 coord, int sample)

oat4 read_image{f, i, ui} (image2d_msaa_t image,

int2 coord, int sample)

oat4 read_image{f, i, ui} (image2d_array_msaa_t image,

int4 coord, int sample)

Notes

OpenCL 2.2 Reference Guide Page 17

OpenCL C Language

Relaonal Built-in Funcons [6.13.6]

These funcons can be used with built-in scalar or vector types

as arguments and return a scalar or vector integer result. T

is type oat, oatn, char, charn, uchar, ucharn, short, shortn,

ushort, ushortn, int, intn, uint, uintn, long, longn, ulong, ulongn,

or oponally double or doublen (if cl_khr_fp64 is enabled) or half

and halfn

(if cl_khr_fp16 is enabled). Ti is type char, charn, short,

shortn, int, intn, long, or longn. Tu is type uchar, ucharn, ushort,

ushortn, uint, uintn, ulong, or ulongn. n is 2, 3, 4, 8, or 16.

int isequal (oat x, oat y)

intn isequal (oatn x, oatn y)

int isequal (double x, double y)

longn isequal (doublen x, doublen y)

int isequal (half x, half y)

shortn isequal (halfn x, halfn y)

Compare of x == y

int isnotequal (oat x, oat y)

intn isnotequal (oatn x, oatn y)

int isnotequal (double x, double y)

longn isnotequal (doublen x, doublen y)

int isnotequal (half x, half y)

shortn isnotequal (halfn x, halfn y)

Compare of x != y

int isgreater (oat x, oat y)

intn isgreater (oatn x, oatn y)

int isgreater (double x, double y)

longn isgreater (doublen x, doublen y)

int isgreater (half x, half y)

shortn isgreater (halfn x, halfn y)

Compare of x > y

int isgreaterequal (oat x, oat y)

intn isgreaterequal (oatn x, oatn y)

int isgreaterequal (double x, double y)

Compare of x >= y

longn isgreaterequal (doublen x, doublen y)

int isgreaterequal (half x, half y)

shortn isgreaterequal (halfn x, halfn y)

Compare of x >= y

int isless (oat x, oat y)

intn isless (oatn x, oatn y)

int isless (double x, double y)

Compare of x < y

longn isless (doublen x, doublen y)

int isless (half x, half y)

shortn isless (halfn x, halfn y)

Compare of x < y

int islessequal (oat x, oat y)

intn islessequal (oatn x, oatn y)

int islessequal (double x, double y)

longn islessequal (doublen x, doublen y)

int islessequal (half x, half y)

shortn islessequal (halfn x, halfn y)

Compare of x <= y

int islessgreater (oat x, oat y)

intn islessgreater (oatn x, oatn y)

int islessgreater (double x, double y)

longn islessgreater (doublen x, doublen y)

int islessgreater (half x, half y)

shortn islessgreater (halfn x, halfn y)

Compare of

(x < y) || (x > y)

int isnite (oat)

intn isnite (oatn)

int isnite (double)

longn isnite (doublen)

int isnite (half)

shortn isnite (halfn)

Test for nite value

int isinf (oat)

intn isinf (oatn)

int isinf (double)

longn isinf (doublen)

int isinf (half)

shortn isinf (halfn)

Test for + or – innity

int isnan (oat)

intn isnan (oatn)Test for a NaN

int isnan (double)

longn isnan (doublen)

int isnan (half)

shortn isnan (halfn)

Test for a NaN

int isnormal (oat)

intn isnormal (oatn)

int isnormal (double)

Test for a normal

value

longn isnormal (doublen)

int isnormal (half)

shortn isnormal (halfn)

Test for a normal

value

int isordered (oat x, oat y)

intn isordered (oatn x, oatn y)

int isordered (double x, double y)

longn isordered (doublen x, doublen y)

int isordered (half x, half y)

shortn isordered (halfn x, halfn y)

Test if arguments are

ordered

int isunordered (oat x, oat y)

intn isunordered (oatn x, oatn y)

int isunordered (double x, double y)

longn isunordered (doublen x, doublen y)

int isunordered (half x, half y)

shortn isunordered (halfn x, halfn y)

Test if arguments are

unordered

int signbit (oat)

intn signbit (oatn)

int signbit (double)

longn signbit (doublen)

int signbit (half)

shortn signbit (halfn)

Test for sign bit

int any (Ti x)1 if MSB in component

of x is set; else 0

int all (Ti x)

1 if MSB in all

components of x are

set; else 0

T bitselect (T a, T b, T c)

half bitselect (half a, half b, half c)

halfn bitselect (halfn a, halfn b, halfn c)

Each bit of result is

corresponding bit of

a if corresponding bit

of c is 0

T select (T a, T b, Ti c)

T select (T a, T b, Tu c)

halfn select (halfn a, halfn b, shortn c)

half select (half a, half b, short c)

halfn select (halfn a, halfn b, ushortn c)

half select (half a, half b, ushort c)

For each component

of a vector type,

result[i] = if MSB of

c[i] is set ? b[i] : a[i]

For scalar type, result

= c ? b : a

Integer Built-in Funcons [6.13.3]

T is type char, charn, uchar, ucharn, short, shortn, ushort,

ushortn, int, intn, uint, uintn, long, longn, ulong, or ulongn,

where n is 2, 3, 4, 8, or 16. Tu is the unsigned version of T. Tsc is

the scalar version of T.

Tu abs (T x)| x |

Tu abs_di (T x, T y)| x – y | without modulo overow

T add_sat (T x, T y)x + y and saturates the result

T hadd (T x, T y) (x + y) >> 1 without mod. overow

T rhadd (T x, T y) (x + y + 1) >> 1

T clamp (T x, T min, T max)

T clamp (T x, Tsc min, Tsc max)min(max(x, minval), maxval)

T clz (T x)Number of leading 0-bits in x

T ctz (T x)Number of trailing 0-bits in x

T mad_hi (T a, T b, T c)mul_hi(a, b) + c

T mad_sat (T a, T b, T c)a * b + c and saturates the result

T max (T x, T y)

T max (T x, Tsc y)y if x < y, otherwise it returns x

T min (T x, T y)

T min (T x, Tsc y)y if y < x, otherwise it returns x

T mul_hi (T x, T y)High half of the product of x and y

T rotate (T v, T i)result[indx] = v[indx] << i[indx]

T sub_sat (T x, T y)x - y and saturates the result

T popcount (T x)Number of non-zero bits in x

For upsample, return type is scalar when the parameters are scalar.

short[n] upsample (

char[n] hi, uchar[n] lo)result[i]= ((short)hi[i]<< 8) | lo[i]

ushort[n] upsample (

uchar[n] hi, uchar[n] lo)result[i]=((ushort)hi[i]<< 8) | lo[i]

int[n] upsample (

short[n] hi, ushort[n] lo)result[i]=((int)hi[i]<< 16) | lo[i]

uint[n] upsample (

ushort[n] hi, ushort[n] lo)result[i]=((uint)hi[i]<< 16) | lo[i]

long[n] upsample (

int[n] hi, uint[n] lo)result[i]=((long)hi[i]<< 32) | lo[i]

ulong[n] upsample (

uint[n] hi, uint[n] lo)result[i]=((ulong)hi[i]<< 32) | lo[i]

The following fast integer funcons opmize the performance

of kernels. In these funcons, T is type int, uint, intn, or uintn,

where n is 2, 3, 4, 8, or 16.

T mad24 (T x, T y, T z) Mulply 24-bit integer values x, y, add

32-bit int. result to 32-bit integer z

T mul24 (T x, T y) Mulply 24-bit integer values x and y

Common Built-in Funcons [6.13.4]

These funcons operate component-wise and use round to

nearest even rounding mode. Ts is type oat, oponally double

(if cl_khr_fp64 is enabled) or half and halfn

(if cl_khr_fp16 is

enabled). Tn is the vector form of Ts, where n is 2, 3, 4, 8, or 16.

T is Ts and Tn.

T clamp (T x, T min, T max)

Tn clamp (Tn x, Ts min, Ts max)

Clamp x to range given by

min, max

T degrees (T radians) radians to degrees

T max (T x, T y)

Tn max (Tn x, Ts y) Max of x and y

T min (T x, T y)

Tn min (Tn x, Ts y) Min of x and y

T mix (T x, T y, T a)

Tn mix (Tn x, Tn y, Ts a) Linear blend of x and y

T radians (T degrees) degrees to radians

T step (T edge, T x)

Tn step (Ts edge, Tn x) 0.0 if x < edge, else 1.0

T smoothstep (T edge0, T edge1, T x)

T smoothstep (Ts edge0, Ts edge1, T x) Step and interpolate

T sign (T x) Sign of x

Geometric Built-in Funcons [6.13.5]

Ts is scalar type oat, oponally double (if cl_khr_fp64 is enabled),

or half (if cl_khr_fp16 is enabled). T is Ts and the 2-, 3-, or

4-component vector forms of Ts.

oat{3,4} cross (oat{3,4} p0, oat{3,4} p1)

double{3,4} cross (double{3,4} p0, double{3,4} p1)

half{3,4} cross (half{3,4} p0, half{3,4} p1)

Cross product

Ts distance (T p0, T p1)Vector distance

Ts dot (T p0, T p1)Dot product

Ts length (T p)Vector length

T normalize (T p)Normal vector

length 1

oat fast_distance (oat p0, oat p1)

oat fast_distance (oatn p0, oatn p1) Vector distance

oat fast_length (oat p)

oat fast_length (oatn p) Vector length

oat fast_normalize (oat p)

oatn fast_normalize (oatn p)

Normal vector

length 1

OpenCL 2.2 Reference GuidePage 18

OpenCL C Language

Vector Data Load/Store [6.13.7]

T is type char, uchar, short, ushort, int, uint, long, ulong, or oat,

oponally double (if cl_khr_fp64 is enabled), or half (if cl_khr_fp16

is enabled). Tn refers to the vector form of type T, where n is 2, 3,

4, 8, or 16. R defaults to current rounding mode, or is one of the

rounding modes listed in 6.2.3.2.

vloadn (size_t oset,

const [constant] T *p)

Read vector data from

address (p + (oset * n))

void vstoren (Tn data,

size_t oset, T *p)

Write vector data to address

(p + (oset * n)

oat vload_half (size_t oset,

const [constant] half *p)

Read a half from address

(p + oset)

oatn vload_halfn (size_t oset,

const [constant] half *p)

Read a halfn from address

(p + (oset * n))

void vstore_half (oat data,

size_t oset, half *p)

void vstore_half_R (oat data,

size_t oset, half *p)

void vstore_half (double data,

size_t oset, half *p)

Write a half to address

(p + oset)

void vstore_half_R (double data,

size_t oset, half *p)

Write a half to address

(p + oset)

void vstore_halfn (oatn data,

size_t oset, half *p)

void vstore_halfn_R (oatn data,

size_t oset, half *p)

void vstore_halfn (doublen data,

size_t oset, half *p)

Write a half vector to address

(p + (oset * n))

void vstore_halfn_R (doublen

data, size_t oset, half *p)

Write a half vector to address

(p + (oset * n))

oatn vloada_halfn (size_t oset,

const [constant] half *p)

Read half vector data from

(p + (oset * n)). For half3,

read from (p + (oset * 4)).

void vstorea_halfn (oatn data,

size_t oset, half *p)

void vstorea_halfn_R (oatn data,

size_t oset, half *p)

void vstorea_halfn (doublen data,

size_t oset, half *p)

void vstorea_halfn_R (doublen

data, size_t oset, half *p)

Write half vector data to (p +

(oset * n)). For half3, write

to (p + (oset * 4)).

Atomic Funcons [6.13.11]

OpenCL C implements a subset of the C11 atomics (see secon 7.17 of the C11 specicaon) and

synchronizaon operaons.

In the following tables, A refers to an atomic_* type (not including atomic_ag). C refers to

its corresponding non-atomic type. M refers to the type of the other argument for arithmec

operaons. For atomic integer types, M is C. For atomic pointer types, M is ptrdi_t.

The type atomic_* is a 32-bit integer. atomic_long and atomic_ulong require extension

cl_khr_int64_base_atomics or cl_khr_int64_extended_atomics. The atomic_double type is available

if cl_khr_fp64 is enabled. The default scope is work_group for local atomics and all_svm_devices

for global atomics. The extensions cl_khr_int64_base_atomics and cl_khr_int64_extended_atomics

implement atomic operaons on 64-bit signed and unsigned integers to locaons in __global and

__local memory.

See the table under Atomic Types and Enum Constants for informaon about parameter types

memory_order, memory_scope, and memory_ag.

void atomic_init(volale A *obj, C value)Inializes the atomic object pointed to by obj to

the value value.

void atomic_work_item_fence(

cl_mem_fence_ags ags, memory_order

order, memory_scope scope)

Eects based on value of order. ags must be

CLK_{GLOBAL, LOCAL, IMAGE}_MEM_FENCE or a

combinaon of these.

void atomic_store(volale A *object, C desired)

void atomic_store_explicit(volale A *object,

C desired, memory_order order

[ , memory_scope scope])

Atomically replace the value pointed to by object

with the value of desired. Memory is aected

according to the value of order.

C atomic_load(volale A *object)

C atomic_load_explicit(volale A *object,

memory_order order[ , memory_scope scope])

Atomically returns the value pointed to by

object. Memory is aected according to the

value of order.

C atomic_exchange(volale A *object, C desired)

C atomic_exchange_explicit(volale A *object,

C desired, memory_order order

[ , memory_scope scope])

Atomically replace the value pointed to by object

with desired. Memory is aected according to

the value of order.

bool atomic_compare_exchange_strong(

volale A *object, C *expected, C desired)

bool atomic_compare_exchange_strong_explicit(

volale A *object, C *expected, C desired,

memory_order success,

memory_order failure[ , memory_scope scope])

bool atomic_compare_exchange_weak(

volale A *object,

C *expected, C desired)

bool atomic_compare_exchange_weak_explicit(

volale A *object, C *expected, C desired,

memory_order success,

memory_order failure[ , memory_scope scope])

Atomically compares the value pointed to by

object for equality with that in expected, and

if true, replaces the value pointed to by object

with desired, and if false, updates the value

in expected with the value pointed to by object.

These operaons are atomic read-modify-write

operaons.

C atomic_fetch_<key>(volale A *object, M operand)

C atomic_fetch_<key>_explicit(volale A *object,

M operand, memory_order order

[ , memory_scope scope])

Atomically replaces the value pointed to by

object with the result of the computaon

applied to the value pointed to by object and

the given operand.

bool atomic_ag_test_and_set(

volale atomic_ag *object)

bool atomic_ag_test_and_set_explicit(

volale atomic_ag *object,

memory_order order[ , memory_scope scope])

Atomically sets the value pointed to by object

to true. Memory is aected according to the

value of order. Returns atomically, the value of

the object immediately before the eects.

void atomic_ag_clear(volale atomic_ag *object)

void atomic_ag_clear_explicit(

volale atomic_ag *object,

memory_order order[ , memory_scope scope])

Atomically sets the value pointed to by object

to false. The order argument shall not be

memory_order_acquire nor

memory_order_acq_rel. Memory is aected

according to the value of order.

Values for key for atomic_fetch and modify funcons

key op computaon key op computaon

add + addion and & bitwise and

sub - subtracon min min compute min

or | bitwise inclusive or max max compute max

xor ^ bitwise exclusive or

Atomic Types and Enum Constants

Parameter Type Values

memory_order memory_order_relaxed memory_order_acquire memory_order_release

memory_order_ acq_rel memory_order_seq_cst

memory_scope

memory_scope_work_item memory_scope_work_group

memory_scope_sub_group memory_scope_all_svm_devices

memory_scope_device (default for funcons that do not take a memory_scope

argument)

Atomic integer and oang-point types

† indicates types supported by a limited subset of atomic operaons

‡ indicates size depends on whether implemented on 64-bit or 32-bit architecture.

§ indicates types supported only if extensions cl_khr_int64_base_atomics and

cl_khr_int64_extended_atomics are enabled.

atomic_int

atomic_uint

atomic_ag

atomic_long §

atomic_ulong §

atomic_oat

†

atomic_double

†§

atomic_intptr_t ‡§

atomic_uintptr_t ‡§

atomic_size_t ‡§

atomic_ptrdi_t ‡§

Atomic macros

#dene ATOMIC_VAR_INIT(C value) Expands to a token sequence to inialize an atomic object of

a type that is inializaon-compable with value.

#dene ATOMIC_FLAG_INIT Inialize an atomic_ag to the clear state.

Async Copies and Prefetch [6.13.10]

T is type char, charn, uchar, ucharn, short, shortn, ushort, ushortn, int, intn, uint, uintn, long,

longn, ulong, ulongn, oat, oatn, oponally double or doublen (if cl_khr_fp64 is enabled), or

half or halfn (if cl_khr_fp16 is enabled).

event_t async_work_group_copy ( __local T *dst,

const __global T *src, size_t num_gentypes, event_t event)

event_t async_work_group_copy ( __global T *dst,

const __local T *src, size_t num_gentypes, event_t event)Copies

num_gentypes

T elements from

src to dst

event_t async_work_group_strided_copy ( __local T *dst, const __global T *src,

size_t num_gentypes, size_t src_stride, event_t event)

event_t async_work_group_strided_copy ( __global T *dst, const __local T *src,

size_t num_gentypes, size_t dst_stride, event_t event)

void wait_group_events (

int num_events, event_t *event_list)

Wait for compleon of

async_work_group_copy

void prefetch (const __global T *p,

size_t num_gentypes)

Prefetch num_gentypes * sizeof(T) bytes

into global cache

Synchronizaon & Memory Fence Funcons [6.13.8]

ags argument is the memory address space, set to a 0 or an OR’d combinaon of

CLK_X_MEM_FENCE where X may be LOCAL, GLOBAL, or IMAGE. Memory fence funcons provide

ordering between memory operaons of a work-item.

void work_group_barrier (cl_mem_fence_ags ags

[, memory_scope scope])

Work-items in a work-group must

execute this before any can connue

void atomic_work_item_fence (cl_mem_fence_ags ags

[, memory_scope scope])

Orders loads and stores of a work-

item execung a kernel

void sub_group_barrier (cl_mem_fence_ags ags

[, memory_scope scope])

Work-items in a sub-group must

execute this before any can connue

OpenCL 2.2 Reference Guide Page 19

OpenCL C Language

Address Space Qualier Funcons [6.13.9]

T refers to any of the built-in data types supported by OpenCL C

or a user-dened type.

[const] global T * to_global (

[const] T *ptr)global address space

[const] local T * to_local (

[const] T *ptr)local address space

[const] private T * to_private (

[const] T *ptr)private address space

[const] cl_mem_fence_ags

get_fence( [const] T *ptr)

Memory fence value:

CLK_GLOBAL_MEM_FENCE,

CLK_LOCAL_MEM_FENCE

prin Funcon [6.13.13]

Writes output to an implementaon-dened stream.

int prin (constant char * restrict format, …)

prin output synchronizaon

When the event associated with a parcular kernel invocaon

completes, the output of applicable prin calls is ushed to the

implementaon-dened output stream.

prin format string

The format string follows C99 convenons and supports an

oponal vector specier:

%[ags][width][.precision][vector][length] conversion

Examples:

The following examples show the use of the vector specier in

the prin format string.

oat4 f = (oat4)(1.0f, 2.0f, 3.0f, 4.0f);

uchar4 uc = (uchar4)(0xFA, 0xFB, 0xFC, 0xFD);

prin("f4 = %2.2v4hlf\n", f);

prin("uc = %#v4hhx\n", uc);

The above two prin calls print the following:

f4 = 1.00,2.00,3.00,4.00

uc = 0xfa,0x,0xfc,0xfd

Workgroup Funcons [6.13.15]

T is type int, uint, long, ulong, or oat, oponally double (if

cl_khr_fp64 is enabled) or half (if cl_khr_fp64 is enabled).

Returns a non-zero value if predicate evaluates to non-zero for

all or any workitems in the work-group.

int work_group_all (int predicate)

int work_group_any (int predicate)

int sub_group_all (int predicate)

int sub_group_any (int predicate)

Return result of reducon operaon specied by <op> for all

values of x specied by workitems in work-group. <op> may be

min, max, or add.

T work_group_reduce_<op> (T x)

T sub_group_reduce_<op> (T x)

Broadcast the value of a to all work-items in the work-group.

local_id must be the same value for all workitems in the work-

group.

T work_group_broadcast (T a, size_t local_id)

T work_group_broadcast (T a, size_t local_id_x,

size_t local_id_y)

T work_group_broadcast (T a, size_t local_id_x,

size_t local_id_y, size_t local_id_z)

T sub_group_broadcast (T x, size_t local_id)

Do an exclusive or inclusive scan operaon specied by <op> of

all values specied by work-items in the work-group. The scan

results are returned for each work-item. <op> may be min, max,

or add.

T work_group_scan_exclusive_<op> (T x)

T work_group_scan_inclusive_<op> (T x)

T sub_group_scan_exclusive_<op> (T x)

T sub_group_scan_inclusive_<op> (T x)

Pipe Built-in Funcons [6.13.16.2-4]

T represents the built-in OpenCL C scalar or vector integer or

oang-point data types or any user dened type built from these

scalar and vector data types. Half scalar and vector types require

the cl_khr_fp16 extension. Double or vector double types require

the cl_khr_fp64 extension. The macro CLK_NULL_RESERVE_ID

refers to an invalid reservaon ID.

int read_pipe (

__read_only pipe T p, T *ptr)

Read packet from p

into ptr.

int read_pipe (__read_only pipe T p,

reserve_id_t reserve_id,

uint index, T *ptr)

Read packet from

reserved area of the

pipe reserve_id and

index into ptr.

int write_pipe (

__write_only pipe T p, const T *ptr)

Write packet specied

by ptr to p.

int write_pipe (

__write_only pipe T p,

reserve_id_t reserve_id,

uint index, const T *ptr)

Write packet specied

by ptr to reserved area

reserve_id and index.

bool is_valid_reserve_id (

reserve_id_t reserve_id)

Return true if reserve_id

is a valid reservaon ID

and false otherwise.

reserve_id_t reserve_read_pipe (

__read_only pipe T p,

uint num_packets)

reserve_id_t reserve_write_pipe (

__write_only pipe T p,

uint num_packets)

Reserve num_packets

entries for reading from

or wring to p.

void commit_read_pipe (

__read_only pipe T p,

reserve_id_t reserve_id)

void commit_write_pipe (

__write_only pipe T p,

reserve_id_t reserve_id)

Indicates that all reads

and writes to num_

packets associated with

reservaon reserve_id

are completed.

uint get_pipe_max_packets (

pipe T p)

Returns maximum

number of packets

specied when p was

created.

uint get_pipe_num_packets (

pipe T p)

Returns the number of

available entries in p.

void work_group_commit_read_pipe (pipe T p, reserve_id_t reserve_id)

void work_group_commit_write_pipe (pipe T p, reserve_id_t reserve_id)

void sub_group_commit_read_pipe (pipe T p, reserve_id_t reserve_id)

void sub_group_commit_write_pipe (pipe T p, reserve_id_t reserve_id)

Indicates that all reads and writes

to num_packets associated with

reservaon reserve_id are completed.

reserve_id_t work_group_reserve_read_pipe (pipe T p, uint num_packets)

reserve_id_t work_group_reserve_write_pipe (pipe T p, uint num_packets)

reserve_id_t sub_group_reserve_read_pipe (pipe T p, uint num_packets)

reserve_id_t sub_group_reserve_write_pipe (pipe T p, uint num_packets)

Reserve num_packets entries for

reading from or wring to p. Returns a

valid reservaon ID if the reservaon

is successful.

Miscellaneous Vector Funcons [6.13.12]

Tm and Tn are type charn, ucharn, shortn, ushortn, intn, uintn,

longn, ulongn, oatn, oponally doublen (if cl_khr_fp64 is

enabled) or halfn (if cl_khr_fp16 is enabled), where n is 2,4,8, or

16 except in vec_step it may also be 3. TUn is ucharn, ushortn,

uintn, or ulongn.

int vec_step (Tn a)

int vec_step (typename)

Takes built-in scalar or vector data type

argument. Returns 1 for scalar, 4 for

3-component vector, else number of

elements in the specied type.

Tn shue (Tm x,

TUn mask)

Tn shue2 (Tm x, Tm y,

TUn mask)

Construct permutaon of elements

from one or two input vectors, return

a vector with same element type as

input and length that is the same as

the shue mask.

Enqueuing and Kernel Query Built-in Funcons [6.13.17]

A kernel may enqueue code represented by Block syntax, and control execuon order with event

dependencies including user events and markers. There are several advantages to using the Block

syntax: it is more compact; it does not require a cl_kernel object; and enqueuing can be done as

a single semanc step. The macro CLK_NULL_EVENT refers to an invalid device event. The macro

CLK_NULL_QUEUE refers to an invalid device queue.

int enqueue_kernel (queue_t queue, kernel_enqueue_ags_t ags,

const ndrange_t ndrange, void (^block)(void))

int enqueue_kernel (queue_t queue, kernel_enqueue_ags_t ags,

const ndrange_t ndrange, uint num_events_in_wait_list,

const clk_event_t *event_wait_list, clk_event_t *event_ret,

void (^block)(void))

int enqueue_kernel (queue_t queue, kernel_enqueue_ags_t ags,

const ndrange_t ndrange,

void (^block)(local void *, …), uint size0, …)

int enqueue_kernel (queue_t queue, kernel_ enqueue_ags_t ags,

const ndrange_t ndrange,

uint num_events_in_wait_list, const clk_event_t *event_wait_list,

clk_event_t *event_ret, void (^block)(local void *, …), uint size0, …)

Allows a work-item to

enqueue a block for

execuon to queue.

Work-items can enqueue

mulple blocks to a device

queue(s).

ags may be one of

CLK_ENQUEUE_FLAGS_

{NO_WAIT, WAIT_KERNEL,

WAIT_WORK_GROUP}

uint get_kernel_work_group_size (void (^block)(void))

uint get_kernel_work_group_size (void (^block)(local void *, …))

Query the maximum work-

group size that can be

used to execute a block.

uint get_kernel_preferred_work_group_size_mulple (

void (^block)(void))

uint get_kernel_preferred_work_group_size_mulple (

void (^block)(local void *, …))

Returns the preferred

mulple of work-group

size for launch.

int enqueue_marker (queue_t queue, uint num_events_in_wait_list,

const clk_event_t *event_wait_list, clk_event_t *event_ret)

Enqueue a marker

command to queue.

uint get_kernel_sub_group_count_for_ndrange

(const ndrange_t ndrange, void (^block)(void))

uint get_kernel_sub_group_count_for_ndrange

(const ndrange_t ndrange, void (^block)(local void *, …))

Returns number of

subgroups in each

workgroup of the dispatch.

uint get_kernel_max_sub_group_size_for_ndrange

(const ndrange_t ndrange, void (^block)(void))

uint get_kernel_max_sub_group_size_for_ndrange

(const ndrange_t ndrange, void (^block)(local void *, …))

Returns the maximum

sub-group size for a block.

OpenCL 2.2 Reference GuidePage 20

OpenCL C Language

OpenCL C Language OpenCL 2.2 Reference GuidePage 20

OpenCL C Language

Event Built-in Funcons [6.13.17.8]

T is type int, uint, long, ulong, or oat, oponally double (if cl_khr_fp64 is enabled), or half (if

cl_khr_fp16 is enabled).

void retain_event (clk_event_t event)Increments event reference count.

void release_event (clk_event_t event)Decrements event reference count.

clk_event_t create_user_event () Create a user event.

bool is_valid_event (clk_event_t event)True for valid event.

void set_user_event_status (

clk_event_t event, int status)

Sets the execuon status of a user event.

status: CL_COMPLETE or a negave error

value.

void capture_event_proling_info (

clk_event_t event, clk_proling_info name,

global void *value)

Captures proling informaon for command

associated with event in value.

Helper Built-in Funcons [6.13.17.9]

queue_t get_default_queue (void) Default queue or CLK_NULL_QUEUE

ndrange_t ndrange_1D (size_t global_work_size)

ndrange_t ndrange_1D (size_t global_work_size,

size_t local_work_size)

ndrange_t ndrange_1D (size_t global_work_oset,

size_t global_work_size, size_t local_work_size)

Builds a 1D ND-range

descriptor.

ndrange_t ndrange_nD (const size_t global_work_size[n])

ndrange_t ndrange_nD (size_t global_work_size,

const size_t local_work_size[n])

ndrange_t ndrange_nD (const size_t global_work_oset,

const size_t global_work_size,

const size_t local_work_size[n])

Builds a 2D or 3D ND-range descriptor.

n may be 2 or 3.

Notes

OpenCL 2.2 Reference Guide Page 21

OpenCL Extensions

EGL Interoperabililty [9.16, 9.17]

Create CL Event Objects from EGL

This funcon requires the extension cl_khr_egl_event.

cl_event clCreateEventFromEGLsyncKHR (

cl_context context, CLeglSyncKHR sync,

CLeglDisplayKHR display, cl_int *errcode_ret)

Create CL Image Objects from EGL

These funcons require the extension cl_khr_egl_image.

cl_mem clCreateFromEGLImageKHR (

cl_context context, CLeglDisplayKHR display,

CLeglImageKHR image, cl_mem_ags ags,

const cl_egl_image_properes_khr *properes,

cl_int *errcode_ret)

cl_int clEnqueue{Acquire, Release}EGLObjectsKHR (

cl_command_queue command_queue,

cl_uint num_objects, const cl_mem *mem_objects,

cl_uint num_events_in_wait_list,

const cl_event *event_wait_list,

cl_event *event)

DX9 Media Surface Sharing [9.7]

Header <cl_dx9_media_sharing.h>

Enable the extension cl_khr_dx9_media_sharing.

cl_int clGetDeviceIDsFromDX9MediaAdapterKHR (

cl_plaorm_id plaorm,

cl_uint num_media_adapters,

cl_dx9_media_adapter_type_khr *media_adapters_type,

void *media_adapters,

cl_dx9_media_adapter_set_khr media_adapter_set,

cl_uint num_entries, cl_device_id *devices,

cl_int *num_devices)

media_adapter_type:

CL_ADAPTER_{D3D9, D3D9EX, DXVA}_KHR

media_adapter_set: CL_{ALL, PREFERRED}_DEVICES_-

FOR_DX9_MEDIA_ADAPTER_KHR

cl_mem clCreateFromDX9MediaSurfaceKHR (

cl_context context, cl_mem_ags ags,

cl_dx9_media_adapter_type_khr adapter_type,

void *surface_info, cl_uint plane, cl_int *errcode_ret)

ags: See clCreateFromGLBuer

adapter_type: CL_ADAPTER_{D3D9, D3D9EX, DXVA}_KHR

cl_int

clEnqueue{Acquire, Release}DX9MediaSurfacesKHR(

cl_command_queue command_queue,

cl_uint num_objects, const cl_mem *mem_objects,

cl_uint num_events_in_wait_list,

const cl_event *event_wait_list, cl_event *event)

Direct3D 11 Sharing [9.8.7]

Header <cl_d3d11.h> These funcons require the

cl_khr_d3d11_sharing extension. For values of ags, see

clCreateFromGLBuer.

cl_int clGetDeviceIDsFromD3D11KHR (

cl_plaorm_id plaorm,

cl_d3d11_device_source_khr d3d_device_source,

void *d3d_object,

cl_d3d11_device_set_khr d3d_device_set,

cl_uint num_entries, cl_device_id *devices,

cl_uint *num_devices)

d3d_device_source: CL_D3D11_DEVICE_KHR,

CL_D3D11_DXGI_ADAPTER_KHR

d3d_device_set: CL_ALL_DEVICES_FOR_D3D11_KHR,

CL_PREFERRED_DEVICES_FOR_D3D11_KHR

cl_mem clCreateFromD3D11BuerKHR (

cl_context context, cl_mem_ags ags,

ID3D11Buer *resource, cl_int *errcode_ret)

cl_mem clCreateFromD3D11Texture3DKHR (

cl_context context, cl_mem_ags ags,

ID3D11Texture3D *resource, UINT subresource,

cl_int *errcode_ret)

cl_mem clCreateFromD3D11Texture2DKHR (

cl_context context, cl_mem_ags ags,

ID3D11Texture2D *resource,

UINT subresource, cl_int *errcode_ret)

cl_int clEnqueue{Acquire, Release}D3D11ObjectsKHR (

cl_command_queue command_queue,

cl_uint num_objects, const cl_mem *mem_objects,

cl_uint num_events_in_wait_list,

const cl_event *event_wait_list, cl_event *event)

Direct3D 10 Sharing

[9.6.7]

These funcons require the cl_khr_d3d10_sharing extension.

The associated header le is <cl_d3d10.h>.

cl_int clGetDeviceIDsFromD3D10KHR (

cl_plaorm_id plaorm,

cl_d3d10_device_source_khr d3d_device_source,

void *d3d_object,

cl_d3d10_device_set_khr d3d_device_set,

cl_uint num_entries, cl_device_id *devices,

cl_uint *num_devices)

d3d_device_source:

CL_D3D10_{DEVICE, DXGI_ADAPTER}_KHR

d3d_device_set:

CL_{ALL, PREFERRED}_DEVICES_FOR_D3D10_KHR

cl_mem clCreateFromD3D10BuerKHR (

cl_context context, cl_mem_ags ags,

ID3D10Buer *resource, cl_int *errcode_ret)

ags: See clCreateFromGLBuer

cl_mem clCreateFromD3D10Texture2DKHR (

cl_context context, cl_mem_ags ags,

ID3D10Texture2D *resource, UINT subresource,

cl_int *errcode_ret)

ags: See clCreateFromD3D10BuerKHR

cl_mem clCreateFromD3D10Texture3DKHR (

cl_context context, cl_mem_ags ags,

ID3D10Texture3D *resource, UINT subresource,

cl_int *errcode_ret)

ags: See clCreateFromGLBuer

cl_int clEnqueue{Acquire, Release}D3D10ObjectsKHR (

cl_ command_queue command_queue,

cl_uint num_objects, const cl_mem *mem_objects,

cl_uint num_events_in_wait_list,

const cl_event *event_wait_list, cl_event *event)

Using OpenCL Extensions

[9]

In this secon, extensions shown in italics provide core features.

#pragma OPENCL EXTENSION extension_name : {enable |disable}

To test if an extension is supported, use

clGetPlaormInfo() or clGetDeviceInfo()

To get the address of the extension funcon:

clGetExtensionFunconAddressForPlaorm()

cl_apple_gl_sharing (see cl_khr_gl_sharing)

cl_khr_3d_image_writes

cl_khr_byte_addressable_store

cl_khr_context_abort

cl_khr_d3d10_sharing

cl_khr_d3d11_sharing

cl_khr_depth_images

cl_khr_device_enqueue_local_arg_types

cl_khr_dx9_media_sharing

cl_khr_egl_event

cl_khr_egl_image

cl_khr_fp16

cl_khr_fp64

cl_khr_gl_depth_images

cl_khr_gl_event

cl_khr_gl_msaa_sharing

cl_khr_gl_sharing

cl_khr_global_int32_base_atomics - atomic_*()

cl_khr_global_int32_extended_atomics - atomic_*()

cl_khr_icd

cl_khr_image2d_from_buer

cl_khr_inialize_memory

cl_khr_int64_base_atomics - atom_*()

cl_khr_int64_extended_atomics - atom_*()

cl_khr_local_int32_base_atomics - atomic_*()

cl_khr_local_int32_extended_atomics - atomic_*()

cl_khr_mipmap_image

cl_khr_mipmap_image_writes

cl_khr_priority_hints

cl_khr_srgb_image_writes

cl_khr_spir

cl_khr_subgroup_named_barrier

cl_khr_terminate_context

cl_khr_throle_hints

OpenCL Extensions Reference

Secon and table references are to the OpenCL Extensions 2.1 specicaon.

CL Image Objects > GL Renderbuers

cl_mem clCreateFromGLRenderbuer (

cl_context context, cl_mem_ags ags,

GLuint renderbuer, cl_int *errcode_ret)

ags: See clCreateFromGLBuer

Query Informaon

cl_int clGetGLObjectInfo (cl_mem memobj,

cl_gl_object_

type *gl_object_type,

GLuint *gl_object_name)

*gl_object_type returns:

CL_GL_OBJECT_TEXTURE_BUFFER,

CL_GL_OBJECT_TEXTURE{1D, 2D, 3D},

CL_GL_OBJECT_TEXTURE{1D, 2D}_ARRAY,

CL_GL_OBJECT_{BUFFER, RENDERBUFFER}

cl_int clGetGLTextureInfo (cl_mem memobj,

cl_gl_texture_info param_name,

size_t param_value_size, void *param_value,

size_t *param_value_size_ret)

param_name: CL_GL_{TEXTURE_TARGET,

MIPMAP_LEVEL}, CL_GL_NUM_SAMPLES (Requires

extension cl_khr_gl_msaa_sharing)

Share Objects

cl_int clEnqueue{Acquire, Release}GLObjects (

cl_command_queue command_queue,

cl_uint num_objects, const cl_mem *mem_objects,

cl_uint num_events_in_wait_list,

const cl_event *event_wait_list, cl_event *event)

CL Event Objects > GL Sync Objects

cl_event clCreateEventFromGLsyncKHR (

cl_context context, GLsync sync, cl_int *errcode_ret)

Requires the cl_khr_gl_event extension.

OpenGL, OpenGL ES Sharing

[9.3 - 9.5]

These funcons require the cl_khr_gl_sharing or

cl_apple_gl_sharing extension.

CL Context > GL Context, Sharegroup

cl_int clGetGLContextInfoKHR (

const cl_context_properes *properes,

cl_gl_context_info param_name,

size_t param_value_size, void *param_value,

size_t *param_value_size_ret)

param_name: CL_DEVICES_FOR_GL_CONTEXT_KHR,

CL_CURRENT_DEVICE_FOR_GL_CONTEXT_KHR

CL Buer Objects > GL Buer Objects

cl_mem clCreateFromGLBuer (cl_context context,

cl_mem_ags ags, GLuint bufobj, cl_int *errcode_ret)

ags:

CL_MEM_{READ_ONLY, WRITE_ONLY, READ_WRITE}

CL Image Objects > GL Textures

cl_mem clCreateFromGLTexture (cl_context context,

cl_mem_ags ags, GLenum texture_target,

GLint miplevel, GLuint texture, cl_int *errcode_ret)

ags: See clCreateFromGLBuer

texture_target: GL_TEXTURE_{1D, 2D}[_ARRAY],

GL_TEXTURE_{3D, BUFFER, RECTANGLE},

GL_TEXTURE_CUBE_MAP_POSITIVE_{X, Y, Z},

GL_TEXTURE_CUBE_MAP_NEGATIVE_{X, Y, Z},

GL_TEXTURE_2D_MULTISAMPLE[_ARRAY] (Requires

extension cl_khr_gl_msaa_sharing)

OpenCL 2.2 Reference GuidePage 22

Example of Enqueuing Kernels

Arguments that are a pointer type to local address space [6.13.17.2]

A block passed to enqueue_kernel can have arguments declared to be a pointer to local memory.

The enqueue_kernel built-in funcon variants allow blocks to be enqueued with a variable

number of arguments. Each argument must be declared to be a void pointer to local memory.

These enqueue_kernel built-in funcon variants also have a corresponding number of arguments

each of type uint that follow the block argument. These arguments specify the size of each local

memory pointer argument of the enqueued block.

kernel void

my_func_A_local_arg1(global int *a, local int *lptr, ...)

{

...

}

kernel void

my_func_A_local_arg2(global int *a,

local int *lptr1, local float4 *lptr2, ...)

{

...

}

kernel void

my_func_B(global int *a, ...)

{

...

ndrange_t ndrange = ndrange_1d(...);

uint local_mem_size = compute_local_mem_size();

enqueue_kernel(get_default_queue(),

CLK_ENQUEUE_FLAGS_WAIT_KERNEL,

ndrange,

^(local void *p){

my_func_A_local_arg1(a, (local int *)p, ...);},

local_mem_size);

}

kernel void

my_func_C(global int *a, ...)

{

...

ndrange_t ndrange = ndrange_1d(...);

void (^my_blk_A)(local void *, local void *) =

^(local void *lptr1, local void *lptr2){

my_func_A_local_arg2(

(local int *)lptr1,

(local float4 *)lptr2, ...);};

// calculate local memory size for lptr

// argument in local address space for my_blk_A

uint local_mem_size = compute_local_mem_size();

enqueue_kernel(get_default_queue(),

CLK_ENQUEUE_FLAGS_WAIT_KERNEL,

ndrange,

my_blk_A,

local_mem_size, local_mem_size * 4);

}

A Complete Example [6.13.17.3]

The example below shows how to implement an iterave algorithm where the host enqueues

the rst instance of the nd-range kernel (dp_func_A). The kernel dp_func_A will launch a kernel

(evaluate_dp_work_A) that will determine if new nd-range work needs to be performed. If

new nd-range work does need to be performed, then evaluate_dp_work_A will enqueue a new

instance of dp_func_A . This process is repeated unl all the work is completed.

kernel void

dp_func_A(queue_t q, ...)

{

...

// queue a single instance of evaluate_dp_work_A to

// device queue q. queued kernel begins execution after

// kernel dp_func_A finishes

if (get_global_id(0) == 0)

{

enqueue_kernel(q,

CLK_ENQUEUE_FLAGS_WAIT_KERNEL,

ndrange_1d(1),

^{evaluate_dp_work_A(q, ...);});

}

kernel void

evaluate_dp_work_A(queue_t q,...)

{

// check if more work needs to be performed

bool more_work = check_new_work(...);

if (more_work)

{

size_t global_work_size = compute_global_size(...);

void (^dp_func_A_blk)(void) =

^{dp_func_A(q, ...});

// get local WG-size for kernel dp_func_A

size_t local_work_size =

get_kernel_work_group_size(dp_func_A_blk);

// build nd-range descriptor

ndrange_t ndrange = ndrange_1D(global_work_size,

local_work_size);

// enqueue dp_func_A

enqueue_kernel(q,

CLK_ENQUEUE_FLAGS_WAIT_KERNEL,

ndrange,

dp_func_A_blk);

}

...

}

OpenCL 2.2 Reference Guide Page 23

Notes

OpenCL Class Diagram

The gure below describes the OpenCL specicaon as a class diagram using the Unied Modeling

Language

(UML) notaon. The diagram shows both nodes and edges which are classes and their

relaonships. As a simplicaon it shows only classes, and no aributes or operaons.

Annotaons

Relaonships

abstract classes {abstract}

aggregaons

inheritance

relaonship

navigability

Cardinality

many *

one and only one 1

oponally one 0..1

one or more 1..*

Unied Modeling Language (hp://www.uml.org/) is a trademark of Object Management Group (OMG).

OpenCL Device Architecture Diagram

The table below shows memory regions with allocaon and memory access capabilies. R=Read,

W=Write

Global Constant Local Private

Host Dynamic allocaon

R/W access

Dynamic allocaon

R/W access

Dynamic allocaon

No access

No allocaon

No access

Kernel No allocaon

R/W access

Stac allocaon

R-only access

Stac allocaon

R/W access

Stac allocaon

R/W access

The conceptual OpenCL

device architecture diagram

shows processing elements

(PE), compute units (CU),

and devices. The host is not

shown.

OpenCL 2.2 Reference GuidePage 24

The Khronos Group is an industry consorum creang open standards for the authoring and

acceleraon of parallel compung, graphics and dynamic media on a wide variety of plaorms and

devices. See www.khronos.org to learn more about the Khronos Group.

OpenCL is a trademark of Apple Inc. and is used under license by Khronos.

Reference card producon by Miller & Mattson www.millermason.com

OpenCL Reference Card Index

The following index shows the page number for each item included in this guide. The color of the row in the table below is the color of the box to which you should refer.

Access Qualiers C14

Address Space Qualier funcons C19

Address Space Qualiers C13

Address Spaces C++ 6

Array library C++ 11

Async Copies C18

Atomics C18

Atomics library C++ 6-7

Aribute Qualiers C13

Aributes C++ 5

Barrier funcons C++ 9

bitwise funcons C++ 10

Blocks C14

Broadcast funcons C++ 9

Buer objects 2

C Language Reference C13

C++ 14 C++ 5

C++ Language Reference C++ 5

channel_ref C++ 12

Class Diagram 23

cl[Release, Retain]CommandQueue () 1

cl[Release, Retain]Context () 1

cl[Release, Retain]Device () 1

cl[Release, Retain]Event () 4

cl[Release, Retain]Kernel () 4

cl[Release, Retain]Memobject () 3

cl[Release, Retain]Program () 3

cl[Release, Retain]Sampler () 3

clBuildProgram () 3

clCloneKernel () 4

clCompileProgram () 3

clCreateBuer () 2

clCreateCommandQueue* () 1

clCreateContext* () 1

clCreateImage () 2

clCreateKernel () 4

clCreateKernelsInProgram () 4

clCreatePipe () 3

clCreateProgramWith* () 3

clCreateSamplerWithProperes () 3

clCreateSubBuer () 2

clCreateSubDevices () 1

clCreateUserEvent () 4

clEnqueueBarrierWithWaitList () 4

clEnqueueCopyBuer () 2

clEnqueueCopyBuerRect () 2

clEnqueueCopyBuerToImage () 2

clEnqueueCopyImage () 2

clEnqueueCopyImageToBuer () 2

clEnqueueFillBuer () 2

clEnqueueFillImage () 2

clEnqueueMapBuer () 2

clEnqueueMapImage () 2

clEnqueueMarkerWithWaitList () 4

clEnqueueMigrateMemobjects () 3

clEnqueueNaveKernel () 4

clEnqueueNDRangeKernel () 4

clEnqueueReadBuer () 2

clEnqueueReadBuerRect () 2

clEnqueueReadImage () 2

clEnqueueSVM* () 3

clEnqueueUnmapMemobject () 3

clEnqueueWriteBuer () 2

clEnqueueWriteBuerRect () 2

clEnqueueWriteImage () 2

clFinish () 3

clFlush () 3

clGetCommandQueueInfo () 1

clGetContextInfo () 1

clGetDeviceAndHostTimer () 1

clGetDeviceIDs () 1

clGetDeviceInfo () 1

clGetEventInfo () 4

clGetEventProlingInfo () 4

clGetExtensionFunconAddress* () 1

clGetHostTimer () 1

clGetImageInfo () 2

clGetKernelArgInfo () 4

clGetKernelInfo () 4

clGetKernelSubGroupInfo () 4

clGetKernelWorkGroupInfo () 4

clGetMemobjectInfo () 3

clGetPipeInfo () 3

clGetPlaormIDs () 1

clGetPlaormInfo () 1

clGetProgramBuildInfo () 3

clGetProgramInfo () 3

clGetSamplerInfo () 3

clGetSupportedImageFormats () 2

clIcdGetPlaormIDsKHR () 1

clLinkProgram () 3

clSetDefaultDeviceCommandQueue () 1

clSetEventCallback () 4

clSetKernelArg () 4

clSetKernelExecInfo () 4

clSetMemobjectDestructorCallback () 3

clSetProgramReleaseCallback () 3

clSetProgramSpecializaonConstant () 3

clSetUserEventStatus () 4

clSVMAlloc () 3

clSVMFree () 3

clTerminateContextKHR () 1

clUnloadPlaormCompiler () 3

clWaitForEvents () 4

Command queues 1

Common funcons C++ 10

Common funcons C17

Comparison funcons C++ 10

Compile 3-4

Compiler opons 4

constant<T> C++ 6

Constants C++ 12

Constants C14

Contexts 1

Conversions C++ 5

Conversions and Type Casng C13

Debugging opons 4

depth images C++ 7

Device Architecture Diagram 23

Device Enqueue C++ 8-9

Direct3D 10 Sharing Ext 21

Direct3D 11 Sharing Ext 21

DX9 Media Surface Sharing Ext 21

EGL Interoperabililty Ext 21

Enqueue C++ 8-9

Enqueue C19

Event funcons C20

Event objects 4

Extensions Ext 21

fast C13

Fast 24-bit operaons C++ 10

Fences C++ 7

Flush and Finish 3

Funcon qualier C++ 5

Funcon qualier C13

Geometric funcons C++ 11

Geometric funcons C17

global<T> C++ 6

H-I

half C++ 5

Image objects 2

Image query funcons C16

Image Read and Write funcons C15-16

Images C++ 7-8

Images and Samplers library C++ 7-8

Integer funcons C17

Integer funcons C++ 10

Iterator library C++ 12

K-L

Kernel objects 4

Kernel query funcons C19

Limits C++ 11-12

Link 3-4

Linker 4

local<T> C++ 6

Macros C14

Markers, barriers, & waing for events 4

Math constants C++ 12

Math constants C14

Math funcons C++ 10

Math funcons C14

Memory objects 3

mipmap C++ 7-8

mipmap C15-16

N-O

Named barriers C++ 9

Named Barriers for Subgroups Ext 21

ndrange C++ 5

ndrange C20

OpenGL and OpenGL ES Sharing Ext 21

Operators C13

P-Q

Pipe funcons C19

Pipes 3

Pipes C++ 8

Plaorm Layer 1

Pointer class C++ 6

pragma C13

Prefetch C18

Preprocessor Direcves & Macros C++ 5

Preprocessor Direcves & Macros C13

prin funcon C++ 11

prin funcon C19

priv<T> C++ 6

Proling operaons 4

Program linking opons 4

Program objects 3-4

Qualiers C13

read_image*() C15-16

Reinterpreng types C++ 5

Relaonal funcons C17

Relaonal funcons C++ 11

Retain and release program objects 3

Rounding modes C++ 5

Rounding modes C13

Runme 1

Sampler C++ 7-8

Sampler objects 3

Shared Virtual Memory 3

SPIR 1.2 Binaries Ext 21

SPIR-V specializaon constants 3

SVM 4

SVM operaons 3

swizzles C++ 12

Synchronizaon & Memory Fence

funcons C18

Synchronizaon funcons C++ 9

Traits C++ 12

Tuple library C++ 12

Types C++ 5

Types C13

Vector Component Addressing C++ 6

Vector Component Addressing C13

Vector Data Load/Store C18

Vector Data Load/Store C++ 11

Vector funcons C19

Vector Ulies C++ 12

Vector Wrapper library C++ 12

Work-Item funcons C14

Work-Item funcons C++ 9

Workgroup funcons C++ 9

Workgroup funcons C19

write_image*() C15-16

Opencl22 Reference Guide

Navigation menu

Versions of this User Manual:

Views

Navigation