I.MX Graphics User’s Guide User's

User Manual:

Open the PDF directly: View PDF PDF.
Page Count: 170 [warning: Documents this large are best viewed by clicking the View PDF Link!]

NXP Semiconductors Document Number: IMXGRAPHICUG
Rev. 0, 02/2018
i.MX Graphics User’s Guide
i.MX Graphics User’s Guide, Rev. 0, 02/2018
2 NXP Semiconductors
Contents
Chapter 1 Introduction ............................................................................................................................................. 6
Chapter 2 i.MX G2D API ............................................................................................................................................ 6
2.1 Overview ...................................................................................................................................................... 6
2.2 Enumerations and structures ....................................................................................................................... 6
2.3 G2D function descriptions .......................................................................................................................... 10
2.4 Support of new operating system in G2D .................................................................................................. 16
2.5 Sample code for G2D API usage ................................................................................................................. 16
2.6 Feature list on multiple platforms.............................................................................................................. 19
Chapter 3 i.MX EGL and OGL Extension Support .................................................................................................... 20
3.1 Introduction ............................................................................................................................................... 20
3.2 EGL extension support ............................................................................................................................... 20
3.3 OpenGL ES extension support .................................................................................................................... 23
3.4 Extension GL_VIV_direct_texture .............................................................................................................. 29
3.5 Extension GL_VIV_texture_border_clamp ................................................................................................. 32
Chapter 4 i.MX Framebuffer API ............................................................................................................................ 35
4.1 Overview .................................................................................................................................................... 35
4.2 API data types and environment variables ................................................................................................ 35
4.3 API description and syntax ......................................................................................................................... 37
Chapter 5 OpenCL ................................................................................................................................................... 44
5.1 Overview .................................................................................................................................................... 44
5.2 Vivante OpenCL implementation ............................................................................................................... 51
5.3 Optimization for OpenCL embedded profile .............................................................................................. 53
5.4 OpenCL Debug messages ........................................................................................................................... 56
Chapter 6 OpenVX Introduction ............................................................................................................................. 57
6.1 Overview .................................................................................................................................................... 57
6.2 Designing framework of OpenVX ............................................................................................................... 57
6.3 OpenVX extension implementation ........................................................................................................... 59
6.4 OpenCL functions compatible with Vivante vision..................................................................................... 62
Chapter 7 Vulkan .................................................................................................................................................... 65
7.1 OverView .................................................................................................................................................... 65
7.2 Vivante Extension Support for Vulkan ....................................................................................................... 65
Chapter 8 Multiple GPUs and Virtualization........................................................................................................... 67
8.1 Overview .................................................................................................................................................... 67
8.2 Multi-GPU configurations .......................................................................................................................... 67
8.3 GPU affinity configuration .......................................................................................................................... 67
i.MX Graphics User’s Guide, Rev. 0, 02/2018
3 NXP Semiconductors
8.4 OpenCL on multi-GPU device ..................................................................................................................... 67
8.5 GPU virtualization configuration ................................................................................................................ 68
Chapter 9 G2D compositor on Weston .................................................................................................................. 69
9.1 Overview .................................................................................................................................................... 69
9.2 Enabe G2D compositor .............................................................................................................................. 69
Chapter 10 XServer Video Driver ......................................................................................................................... 70
10.1 EXA driver ................................................................................................................................................... 70
10.2 XRandR ....................................................................................................................................................... 71
Chapter 11 Advanced GPU Configuration ............................................................................................................ 82
11.1 GPU Scaling Governor ................................................................................................................................ 82
11.2 GPU Device Cooling .................................................................................................................................... 82
Chapter 12 Vivante Software Tool Kit .................................................................................................................. 82
12.1 Vivante Tool Kit overview .......................................................................................................................... 82
12.2 vEmulator ................................................................................................................................................... 84
12.3 vShader ...................................................................................................................................................... 95
12.4 vCompiler ................................................................................................................................................. 103
12.5 vTexture ................................................................................................................................................... 107
12.6 vProfiler and vAnalyzer ............................................................................................................................ 111
12.7 Debug and performance counters ........................................................................................................... 125
Chapter 13 GPU Tools ........................................................................................................................................ 127
13.1 gpuinfo tool .............................................................................................................................................. 127
13.2 gmem_info tool ........................................................................................................................................ 129
13.3 Apitrace user guide .................................................................................................................................. 130
Chapter 14 GPU Memory Introduction .............................................................................................................. 135
14.1 GPU memory overview ............................................................................................................................ 135
14.2 GPU memory pools .................................................................................................................................. 135
14.3 GPU memory allocators ........................................................................................................................... 135
14.4 GPU reserved memory ............................................................................................................................. 136
14.5 GPU memory base address ...................................................................................................................... 136
Chapter 15 Application Programming Recommendations ................................................................................. 138
15.1 Understand the system configuration and target application ................................................................. 138
15.2 Optimize off chip data transfer such as accessing off-chip DDR memory/mobile DDR memory ............ 138
15.3 Avoid W-Clipping issue in the Application Program ................................................................................. 138
15.4 Avoid GPU hang and data corruption when use occlusion query ............................................................ 139
15.5 Avoid random cache or memory accesses ............................................................................................... 139
15.6 Optimize your use of system memory ..................................................................................................... 139
i.MX Graphics User’s Guide, Rev. 0, 02/2018
4 NXP Semiconductors
15.7 Target a fixed frame rate that is visibly smooth....................................................................................... 139
15.8 Minimize GL state changes ...................................................................................................................... 140
15.9 Batch primitives to minimize the number of draw calls .......................................................................... 140
15.10 Perform calculations per vertex instead of per fragment/pixel .......................................................... 140
15.11 Enable early-Z, hierarchical-Z and back face culling ............................................................................ 140
15.12 Use branching carefully ....................................................................................................................... 141
15.13 Do not use static or stack data as vertex data - use VBOs instead ...................................................... 141
15.14 Use dynamic VBO if data is changing frame by frame ......................................................................... 141
15.15 Tessellate your data so that Hierarchical Z (HZ) can do its job ............................................................ 142
15.16 Use dynamic textures as a texture cache (texture atlas) ..................................................................... 142
15.17 If you use many small triangle strips, stitch them together ................................................................ 142
15.18 Specify EGL configuration attributes precisely .................................................................................... 142
15.19 Use aligned texture/render buffers ..................................................................................................... 142
15.20 Disable MSAA rendering unless high quality is needed ....................................................................... 143
15.21 Avoid partial clears .............................................................................................................................. 143
15.22 Avoid mask operations ........................................................................................................................ 143
15.23 Use MIPMAP textures .......................................................................................................................... 143
15.24 Use compressed textures if constricted by RAM/ROM budget ........................................................... 143
15.25 Draw objects from near to far if possible ............................................................................................ 143
15.26 Avoid indexed triangle strips. .............................................................................................................. 143
15.27 Vertex attribute stride should not be larger than 256 bytes ............................................................... 144
15.28 Avoid binding buffers to mixed index/vertex array ............................................................................. 144
15.29 Avoid using CPU to update texture/buffer contexts during render .................................................... 144
15.30 Avoid frequent context switching ........................................................................................................ 144
15.31 Optimize resources within a shader .................................................................................................... 144
15.32 Avoid using glScissor Clear for small regions ....................................................................................... 144
15.33 Use PRE to accelerate data transfer .................................................................................................... 144
15.34 i.MX 8QuadMax dual-GPU performance ............................................................................................. 145
Chapter 16 Demo Framework ............................................................................................................................ 146
16.1 Summaries................................................................................................................................................ 146
16.2 Introduction ............................................................................................................................................. 146
16.3 Design overview ....................................................................................................................................... 147
16.4 High level overview .................................................................................................................................. 147
16.5 Demo application details ......................................................................................................................... 148
16.6 Helper Class Overview .............................................................................................................................. 152
16.7 Android SDK+NDK on Windows OS build guide ....................................................................................... 157
i.MX Graphics User’s Guide, Rev. 0, 02/2018
5 NXP Semiconductors
16.8 Ubuntu build guide .................................................................................................................................. 158
16.9 Windows OS build guide .......................................................................................................................... 160
16.10 Yocto build guide ................................................................................................................................. 162
16.11 FslContentSync.py notes ...................................................................................................................... 166
16.12 Roadmap Upcoming features ........................................................................................................... 166
16.13 Known limitations ................................................................................................................................ 167
Chapter 17 Environment Variables Summary .................................................................................................... 168
17.1 Environment variable for drivers and HAL ............................................................................................... 168
17.2 Environment variable for compiler .......................................................................................................... 169
i.MX Graphics User’s Guide, Rev. 0, 02/2018
6 NXP Semiconductors
Chapter 1 Introduction
The purpose of this document is to provide information on graphic APIs and driver support. Each chapter describes
a specific set of APIs or driver integration as well as specific hardware acceleration customization. The target
audiences for this document are developers writing graphics applications or video drivers.
Chapter 2 i.MX G2D API
2.1 Overview
The G2D Application Programming Interface (API) is designed to be easy to understand and to use the 2D Bit blit
(BLT) function. It allows the user to implement the customized applications with simple interfaces. It is hardware
and platform independent for i.MX 2D Graphics.
G2D API supports the following features but is not limited to these:
Simple BLT operation from source to destination
Alpha blending for source and destination with Porter-Duff rules
High-performance memory copy from source to destination
Up-scaling and down-scaling from source to destination
90/180/270 degree rotation from source to destination
Horizontal and vertical flip from source to destination
Enhanced visual quality with dither for pixel precision-loss
High performance memory clear for destination
Pixel-level cropping for source surface
Global alpha blending for source only
Asynchronous mode and sync
Contiguous memory allocator
Support VG engine
The G2D API document includes a detailed interface description and sample code for reference.
The API is designed with C-Style coding and can be used in both C and C++ applications.
G2D API supports the following features but is not limited to these:
Multi source blit
2.2 Enumerations and structures
This chapter describes all enumeration and structure definitions in G2D.
2.2.1 g2d_format enumeration
This enumeration describes the pixel format for source and destination.
Table 1. g2d_format enumeration
Name
Numeric
Description
G2D_RGB565
0
RGB565 pixel format
G2D_RGBA8888
1
32 bit-RGBA pixel format
G2D_RGBX8888
2
32 bit-RGBX without alpha blending
G2D_BGRA8888
3
32 bit-BGRA pixel format
G2D_BGRX8888
4
32 bit-BGRX without alpha blending
i.MX Graphics User’s Guide, Rev. 0, 02/2018
7 NXP Semiconductors
G2D_BGR565
5
16 bit-BGR565 pixel format
G2D_ARGBA8888
6
32 bit-ARGB pixel format
G2D_ABGR8888
7
32 bit-ABGR pixel format
G2D_XRGB8888
8
32 bit-XRGB without alpha
G2D_XBGR8888
9
32 bit-XBGR without alpha
G2D_RGB888
10
24 bit-RGB
G2D_NV12
20
Y plane followed by interleaved U/V plane
G2D_I420
21
Y, U, V are within separate planes
G2D_YV12
22
Y, V, U are within separate planes
G2D_NV21
23
Y plane followed by interleaved V/U plane
G2D_YUYV
24
Interleaved Y/U/Y/V plane
G2D_YVYU
25
Interleaved Y/V/Y/U plane
G2D_UYVY
26
Interleaved U/Y/V/Y plane
G2D_VYUY
27
Interleaved V/Y/U/Y plane
G2D_NV16
28
Y plane followed by interleaved U/V plane
G2D_NV61
29
Y plane followed by interleaved V/U plane
2.2.2 g2d_blend_func enumeration
This enumeration describes the blend factor for source and destination.
Table 2. g2d_blend_func enumeration
Name
Numeric
Description
G2D_ZERO
0
Blend factor with 0
G2D_ONE
1
Blend factor with 1
G2D_SRC_ALPHA
2
Blend factor with source alpha
G2D_ONE_MINUS_SRC_ALPHA
3
Blend factor with 1 - source alpha
G2D_DST_ALPHA
4
Blend factor with destination alpha
G2D_ONE_MINUS_DST_ALPHA
5
Blend factor with 1 - destination alpha
G2D_PRE_MULTIPLIED_ALPHA
0 x 10
Extensive blend as pre-multiplied alpha
G2D_DEMULTIPLY_OUT_ALPHA
0 x 20
Extensive blend as demultiply out alpha
2.2.3 g2d_cap_mode enumeration
This enumeration describes the alternative capability in 2D BLT.
Table 3. g2d_cap_mode enumeration
Numeric
Description
0
Enable alpha blend in 2D BLT
1
Enable dither in 2D BLT
2
Enable global alpha in blend
Note: G2D_GLOBAL_ALPHA is only valid when G2D_BLEND is enabled.
i.MX Graphics User’s Guide, Rev. 0, 02/2018
8 NXP Semiconductors
2.2.4 g2d_rotation enumeration
This enumeration describes the rotation mode in 2D BLT.
Table 4. g2d_rotation enumeration
Numeric
Description
0
No rotation
1
Rotation with 90 degree
2
Rotation with 180 degree
3
Rotation with 270 degree
4
Horizontal flip
5
Vertical flip
2.2.5 g2d_cache_mode enumeration
This enumeration describes the cache operation mode.
Table 5. g2d_cache_mode enumeration
Numeric
Description
0
Clean the cacheable buffer
1
Clean and invalidate cacheable buffer
2
Invalidate the cacheable buffer
2.2.6 g2d_hardware_type enumeration
This enumeration describes the supported hardware type.
Table 6. g2d_hardware_type enumeration
Numeric
Description
0
2D hardware type by default
1
VG hardware type
2.2.7 g2d_surface structure
This structure describes the surface with operation attributes.
Table 7. g2d_surface structure
g2d_surface Members
Type
Description
format
g2d_format
Pixel format of surface buffer
planes[3]
Int
Physical addresses of surface buffer
left
Int
Left offset in blit rectangle
top
Int
Top offset in blit rectangle
right
Int
Right offset in blit rectangle
i.MX Graphics User’s Guide, Rev. 0, 02/2018
9 NXP Semiconductors
bottom
Int
Left offset in blit rectangle
stride
Int
RGB/Y stride of surface buffer
width
Int
Surface width in pixel unit
height
Int
Surface height in pixel unit
blendfunc
g2d_blend_func
Alpha blend mode
global_alpha
Int
Global alpha value 0~255
clrcolor
Int
Clear color is 32bit RGBA
rot
g2d_rotation
Rotation mode
Notes:
RGB and YUV formats can be set in source surface, but only RGB format can be set in destination surface.
RGB pixel buffer only uses planes [0], buffer address is with 16bytes alignment on i.MX
6Quad/Dual/DualLite/Solo/SoloLite, 1 pixel alignment on i.MX 6QuadPlus.
NV12: Y in planes [0], UV in planes [1], with 64bytes alignment,
I420: Y in planes [0], U in planes [1], U in planes [2], with 64 bytes alignment
The cropped region in source surface is specified with left, top, right and bottom parameters.
RGB stride alignment is 16bytes on i.MX 6Quad/Dual/DualLite/Solo/SoloLite, 1 pixel on i.MX 6QuadPlus,
both for source and destination surface.
NV12 stride alignment is 8bytes for source surface, UV stride = Y stride,
I420 stride alignment is 8bytes for source surface, U stride=V stride = ½ Y stride.
G2D_ROTATION_0/G2D_FLIP_H/G2D_FLIP_V shall be set in source surface, and the clockwise rotation
degree shall be set in destination surface.
Application should calculate the rotated position and set it for destination surface.
The geometry definition of surface structure is described as follows.
stride
left
top
bottom
right
width
Planes
height
Figure 1 g2d_surface structure
2.2.8 g2d_buf structure
This structure describes the buffer used as G2D interfaces.
i.MX Graphics User’s Guide, Rev. 0, 02/2018
10 NXP Semiconductors
Table 8. g2d_buf structure
g2d_buf Members
Type
Description
buf_handle
void *
The handle associated with buffer
buf_vaddr
void *
Virtual address of the buffer
buf_paddr
int
Physical address of the buffer
buf_size
int
The actual size of the buffer
2.2.9 g2d_surface_pair structure
This structure binds one source g2d_surface and one destination g2d_surface as a pair. When doing multi-source
blit, they are one-to-one correspondent.
Table 9. g2d_surface_pair structure
g2d_surface_pair Members
Type
Description
s
g2d_surface
Source g2d_surface
d
g2d_surface
Destination g2d_surface
2.2.10 g2d_feature enumeration
This enumeration describes the features in G2D BLT.
Table 10. g2d_feature enumeration
Name
Numeric
Description
G2D_SCALING
0
Scaling
G2D_ROTATION
1
Rotation
G2D_SRC_YUV
2
Source YUV format
G2D_DST_YUV
3
Destination YUV format
G2D_MULTI_SOURCE_BLT
4
Multisource blit
2.3 G2D function descriptions
2.3.1 g2d_open
Description:
Open a G2D device and return a handle.
Syntax:
int g2d_open (void **handle);
Parameters:
handle Pointer to receive G2D device handle
Returns:
Success with 0, fail with -1
i.MX Graphics User’s Guide, Rev. 0, 02/2018
11 NXP Semiconductors
2.3.2 g2d_close
Description:
Close G2D device with the handle.
Syntax:
int g2d_close (void *handle);
Parameters:
handle G2D device handle
Returns:
Success with 0, fail with -1
2.3.3 g2d_make_current
Description:
Set the specific hardware type for current context, and the default is G2D_HARDWARE_2D.
Syntax:
int g2d_make_current (void *handle, enum g2d_hardware_type type);
Parameters:
handle G2D device handle
type G2D hardware type
Returns:
Success with 0, fail with -1
2.3.4 g2d_clear
Description:
Clear a specific area.
Syntax:
int g2d_clear (void *handle, struct g2d_surface *area);
Parameters:
handle G2D device handle
area The area to be cleared
Returns:
Success with 0, fail with -1
2.3.5 g2d_blit
Description:
G2D blit from source to destination with alternative operation (Blend, Dither, etc.).
Syntax:
i.MX Graphics User’s Guide, Rev. 0, 02/2018
12 NXP Semiconductors
int g2d_blit (void *handle, struct g2d_surface *src, struct g2d_surface *dst);
Parameters:
handle G2D device handle
src source surface
dst destination surface
Returns:
Success with 0, fail with -1
2.3.6 g2d_copy
Description:
G2D copy with specified size.
Syntax:
int g2d_copy (void *handle, struct g2d_buf *d, struct g2d_buf* s, int size);
Parameters:
handle G2D device handle
d destination buffer
s source buffer
size copy bytes
Limitations:
If the destination buffer is cacheable, it must be invalidated before g2d_copy
due to the alignment limitation of G2D driver.
Returns:
Success with 0, fail with -1
2.3.7 g2d_query_cap
Description:
Query the alternative capability enablement.
Syntax:
int g2d_query_cap (void *handle, enum g2d_cap_mode cap, int *enable);
Parameters:
handle G2D device handle
cap G2D capability to query
enable Pointer to receive G2D capability enablement
Returns: Success with 0, fail with -1
2.3.8 g2d_enable
Description:
i.MX Graphics User’s Guide, Rev. 0, 02/2018
13 NXP Semiconductors
Enable G2D capability with the specific mode.
Syntax:
int g2d_enable (void *handle, enum g2d_cap_mode cap);
Parameters:
handle G2D device handle
cap G2D capability to enable
Returns:
Success with 0, fail with -1
2.3.9 g2d_disable
Description:
Enable G2D capability with the specific mode.
Syntax:
int g2d_disable (void *handle, enum g2d_cap_mode cap);
Parameters:
handle G2D device handle
cap G2D capability to disable
Returns:
Success with 0, fail with -1
2.3.10 g2d_cache_op
Description:
Perform cache operations for the cacheable buffer allocated through the G2D driver.
Syntax:
int g2d_cache_op (struct g2d_buf *buf, enum g2d_cache_mode op);
Parameters:
buf the buffer to be handled with cache operations
op cache operation type
Returns:
Success with 0, fail with -1
2.3.11 g2d_alloc
Description:
Allocate a buffer through G2D device
Syntax:
struct g2d_buf *g2d_alloc (int size, int cacheable);
i.MX Graphics User’s Guide, Rev. 0, 02/2018
14 NXP Semiconductors
Parameters:
size allocated bytes
cacheable 0, non-cacheable, 1, cacheable attribute defined by system
Returns:
Success with valid G2D buffer pointer, fail with 0
2.3.12 g2d_free
Description:
Free the buffer through G2D device.
Syntax:
int g2d_free (struct g2d_buf *buf);
Parameters:
buf G2D buffer to free
Returns:
Success with 0, fail with -1
2.3.13 g2d_flush
Description:
Flush G2D command and return without completing pipeline.
Syntax:
int g2d_flush (void *handle);
Parameters:
handle G2D device handle
Returns:
Success with 0, fail with -1
2.3.14 g2d_finish
Description:
Flush G2D command and then return when pipeline is finished.
Syntax:
int g2d_finish (void *handle);
Parameters:
handle G2D device handle
Returns:
Success with 0, fail with -1
i.MX Graphics User’s Guide, Rev. 0, 02/2018
15 NXP Semiconductors
2.3.15 g2d_multi_blit
Description:
Blit multiple sources to one destination.
Syntax:
int g2d_multi_blit (void *handle, struct g2d_surface_pair *sp[], int layers);
Parameters:
handle G2D device handle
sp array in which elements point to g2d_surface_pair
layers number of the source layers that need to be blited
Returns:
Success with 0, fail with -1
Note:
There are some restrictions for this API that we should be aware of.
This API only works on the i.MX 6DualPlus/QuadPlus platform.
The maximum number of the source layers that can be blited one time is 8.
Although g2d_surface_pair binds one source g2d_surface and one destination g2d_surface as a pair, it
only supports one destination surface. The relationship between the source and destination is many to
one, but each source surface can be set separately and differently, and its dimension, stride, rotation, and
format can differ with that of the destination surface.
The rotation of the destination surface is set to 0 degree by defaut, and cannot be changed.
The key restriction is that the destination rectangle cannot be set, which means that the destination
rectangle must be the same as the source rectangle. Therefore, if the source rectangle is set to (l, t, r, b),
the destination rectangle should also be set to (l, t, r, b) by hardware. In the chapter on multi source blit
(2.4.4), as it makes no sense to set the destination rectangles, we just set all of them to (0, 0, width,
height) for future extension.
2.3.16 g2d_query_hardware
Description:
Query whether 2D and VG hardware are available in the current G2D.
Syntax:
int g2d_query_hardware (void *handle, enum g2d_hardware_type type, int *available);
Parameters:
handle G2D device handle
type G2D hardware type
available Pointer to receive G2D hardware type availability
Returns:
Success with 0, fail with -1
2.3.17 g2d_query_feature
Description:
Query if the features are available in G2D BLT.
Syntax:
int g2d_query_feature (void *handle, enum g2d_feature feature, int *available);
Parameters:
handle G2D device handle
i.MX Graphics User’s Guide, Rev. 0, 02/2018
16 NXP Semiconductors
feature G2D feature in g2d_blit
available Pointer to receive G2D feature availability
Returns:
Success with 0, fail with -1
2.4 Support of new operating system in G2D
G2D code is independent on operating system (OS) except of buffer allocation. Allocating the memory for buffer is
made by mechanism that is offered by each OS differently. The code for allocation is located in [G2D repository
copy]/source/os/[OS name]. Therefore, supporting new OS includes the following steps:
1. Create a new folder in [G2D repository copy]/source/os/ with the name of the new OS and update
implementation in the included source code according to the new OS allocation mechanism.
2. When creating new makefiles for the OS, include the files from the new folder.
3. The test named overlay_test contains the OS dependent code. For supporting the new OS in this test,
create new folder in [G2D repository copy]/test/overlay_test/os and update the code according to the
new OS mechanism for display initialization. Also update makefiles to include code from the new folder.
2.5 Sample code for G2D API usage
This chapter provides the brief prototype code with G2D API.
2.5.1 Color space conversion from YUV to RGB
g2d_open(&handle);
src.planes[0] = buf_y;
src.planes[1] = buf_u;
src.planes[2] = buf_v;
src.left = crop.left;
src.top = crop.top;
src.right = crop.right;
src.bottom = crop.bottom;
src.stride = y_stride;
src.width = y_width;
src.height = y_height;
src.rot = G2D_ROTATION_0;
src.format = G2D_I420;
dst.planes[0] = buf_rgba;
dst.left = 0;
dst.top = 0;
dst.right = disp_width;
dst.bottom = disp_height;
dst.stride = disp_width;
dst.width = disp_width;
dst.height = disp_height;
dst.rot = G2D_ROTATION_0;
dst.format = G2D_RGBA8888;
i.MX Graphics User’s Guide, Rev. 0, 02/2018
17 NXP Semiconductors
g2d_blit(handle, &src, &dst);
g2d_finish(handle);
g2d_close(handle);
2.5.2 Alpha blend in source over mode
g2d_open(&handle);
src.planes[0] = src_buf;
src.left = 0;
src.top = 0;
src.right = test_width;
src.bottom = test_height;
src.stride = test_width;
src.width = test_width;
src.height = test_height;
src.rot = G2D_ROTATION_0;
src.format = G2D_RGBA8888;
src.blendfunc = G2D_ONE;
dst.planes[0] = dst_buf;
dst.left = 0;
dst.top = 0;
dst.right = test_width;
dst.bottom = test_height;
dst.stride = test_width;
dst.width = test_width;
dst.height = test_height;
dst.format = G2D_RGBA8888;
dst.rot = G2D_ROTATION_0;
dst.blendfunc = G2D_ONE_MINUS_SRC_ALPHA;
g2d_enable(handle,G2D_BLEND);
g2d_blit(handle, &src, &dst);
g2d_finish(handle);
g2d_disable(handle,G2D_BLEND);
g2d_close(handle);
2.5.3 Source cropping and destination rotation
g2d_open(&handle);
src.planes[0] = src_buf;
src.left = crop.left;
src.top = crop.left;
src.right = crop.right;
i.MX Graphics User’s Guide, Rev. 0, 02/2018
18 NXP Semiconductors
src.bottom = crop.bottom;
src.stride = src_stride;
src.width = src_width;
src.height = src_height;
src.format = G2D_RGBA8888;
src.rot = G2D_ROTATION_0;//G2D_FLIP_H or G2D_FLIP_V
dst.planes[0] = dst_buf;
dst.left = 0;
dst.top = 0;
dst.right = dst_width;
dst.bottom = dst_height;
dst.stride = dst_width;
dst.width = dst_width;
dst.height = dst_height;
dst.format = G2D_RGBA8888;
dst.rot = G2D_ROTATION_90;
g2d_blit(handle, &src, &dst);
g2d_finish(handle);
g2d_close(handle);
2.5.4 Multi source blit
const int layers = 8;
struct g2d_buf *d_buf;
struct g2d_buf *mul_s_buf[layers];
struct g2d_surface_pair *sp[layers];
g2d_open(&handle)
for(n = 0; n < layers; n++) {
sp[n] = (struct g2d_surface_pair *)malloc(sizeof(struct g2d_surface_pair));
}
d_buf = g2d_alloc(test_width * test_height * 4, 0);
for(n = 0; n < layers; n++) {
mul_s_buf[n] = g2d_alloc(test_width * test_height * 4, 0);
}
for(n = 0; n < layers; n++) {
sp[n]->s.left = img_info_ptr[n]->img_left;
sp[n]->s.top = img_info_ptr[n]->img_top;
sp[n]->s.right = img_info_ptr[n]->img_right;
sp[n]->s.bottom = img_info_ptr[n]->img_bottom;
sp[n]->s.stride = img_info_ptr[n]->img_width;
sp[n]->s.width = img_info_ptr[n]->img_width;
sp[n]->s.height = img_info_ptr[n]->img_height;
i.MX Graphics User’s Guide, Rev. 0, 02/2018
19 NXP Semiconductors
sp[n]->s.rot = img_info_ptr[n]->img_rot;
sp[n]->s.format = img_info_ptr[n]->img_format;
sp[n]->s.planes[0] = mul_s_buf[n]->buf_paddr;
}
sp[0]->d.left = 0;
sp[0]->d.top = 0;
sp[0]->d.right = test_width;
sp[0]->d.bottom = test_height;
sp[0]->d.stride = test_width;
sp[0]->d.width = test_width;
sp[0]->d.height = test_height;
sp[0]->d.format = G2D_RGBA8888;
sp[0]->d.rot = G2D_ROTATION_0;
sp[0]->d.planes[0] = d_buf->buf_paddr;
for(n = 1; n < layers; n++) {
sp[n]->d = sp[0]->d;
}
g2d_multi_blit(handle, sp, layers);
g2d_finish(handle);
for(n = 0; n < layers; n++)
g2d_free(mul_s_buf[n]);
g2d_free(d_buf);
g2d_close(handle);
2.6 Feature list on multiple platforms
This user guide is for multiple platforms, such as i.MX 6 and i.MX 8, and the hardwares for the G2D implementation
are different on those platforms, so some G2D features are also different.
For example, the G2D_YVYU and G2D_VYUY formats are not supported on the i.MX8, and the g2d_multi_blit
function only works on the i.MX 6DualPlus/QuadPlus. Therefore, we list those differences in the following feature
table.
Table 11. Feature list on multiple platforms
Feature
i.MX 6
i.MX 7
i.MX 8
Solo/Dual/Quad
DualPlus/QuadPlus
ULP1
QuadMax
G2D_YVYU
Yes
Yes
Yes
No
G2D_VYUY
Yes
Yes
Yes
No
G2D_HARDWARE_VG
Yes
Yes
Yes
No
G2D_MULTI_SOURCE_BLT
No
Yes
Yes
No
g2d_cache_op
Yes
Yes
Yes
No
i.MX Graphics User’s Guide, Rev. 0, 02/2018
20 NXP Semiconductors
Chapter 3 i.MX EGL and OGL Extension Support
3.1 Introduction
The following tables list the level of support for EGL and OES extensions available with i.MX hardware and
software. Support levels are current as of the date of the document and subject to change.
Two tables are provided. The first table lists the EGL interface extensions. The second table lists extensions for
OpenGL ES 1.1, OpenGL ES 2.0, and OpenGL ES 3.0.
Key:
Extension Name and Number: Each listed extension is derived from the relevant khronos.org webpage list and
includes the extension number as well as a hyperlink to the khronos description of the extension.
Yes: Support is currently available.
No: Support is not available. (Reasons for lack of support may vary: the extension may be proprietary or obsolete,
or not applicable to the specified OES version.)
N/A: Support is not provided as the extension is not applicable in this and subsequent versions of the specification.
3.2 EGL extension support
The following table includes the list of all current EGL Extensions and indicates their support level.
(list from www.khronos.org/registry/egl/ as of 1/24/2013)
Table 12. EGL extension support
EGL Extension Number, Name and hyperlink
Supported
1. EGL_KHR_config_attribs
2. EGL_KHR_lock_surface
YES
3. EGL_KHR_image
YES
4. EGL_KHR_vg_parent_image
5. EGL_KHR_gl_texture_2D_image
EGL_KHR_gl_texture_cubemap_image
EGL_KHR_gl_texture_3D_image
EGL_KHR_gl_renderbuffer_image
YES
YES
no
YES
6. EGL_KHR_reusable_sync
YES
8. EGL_KHR_image_base
YES
9. EGL_KHR_image_pixmap
YES
10. EGL_IMG_context_priority
16. EGL_KHR_lock_surface2
17. EGL_NV_coverage_sample
18. EGL_NV_depth_nonlinear
19. EGL_NV_sync
20. EGL_KHR_fence_sync
YES
24. EGL_HI_clientpixmap
25. EGL_HI_colorformats
26. EGL_MESA_drm_image
27. EGL_NV_post_sub_buffer
28. EGL_ANGLE_query_surface_pointer
i.MX Graphics User’s Guide, Rev. 0, 02/2018
21 NXP Semiconductors
29. EGL_ANGLE_surface_d3d_texture_2d_share_handle
30. EGL_NV_coverage_sample_resolve
31. EGL_NV_system_time
32. EGL_KHR_stream
33. EGL_KHR_stream_consumer_gltexture
34. EGL_KHR_stream_producer_eglsurface
35. EGL_KHR_stream_producer_aldatalocator
36. EGL_KHR_stream_fifo
37. EGL_EXT_create_context_robustness
YES
38. EGL_ANGLE_d3d_share_handle_client_buffer
39. EGL_KHR_create_context
YES
40. EGL_KHR_surfaceless_context
41. EGL_KHR_stream_cross_process_fd
42. EGL_EXT_multiview_window
43. EGL_KHR_wait_sync
44. EGL_NV_post_convert_rounding
45. EGL_NV_native_query
46. EGL_NV_3dvision_surface
47. EGL_ANDROID_framebuffer_target
48. EGL_ANDROID_blob_cache
49. EGL_ANDROID_image_native_buffer
YES
50. EGL_ANDROID_native_fence_sync
YES
51. EGL_ANDROID_recordable
52. EGL_EXT_buffer_age
YES
53. EGL_EXT_image_dma_buf_import
YES
54. EGL_ARM_pixmap_multisample_discard
55. EGL_EXT_swap_buffers_with_damage
56. EGL_NV_stream_sync
57. EGL_EXT_platform_base
58. EGL_EXT_client_extensions
59. EGL_EXT_platform_x11
60. EGL_KHR_cl_event
61. EGL_KHR_get_all_proc_addresses
EGL_KHR_client_get_all_proc_addresses
62. EGL_MESA_platform_gbm
63. EGL_EXT_platform_wayland
64. EGL_KHR_lock_surface3
65. EGL_KHR_cl_event2
66. EGL_KHR_gl_colorspace
67. EGL_EXT_protected_surface
YES
68. EGL_KHR_platform_android
69. EGL_KHR_platform_gbm
70. EGL_KHR_platform_wayland
YES
71. EGL_KHR_platform_x11
72. EGL_EXT_device_base
73. EGL_EXT_platform_device
74. EGL_NV_device_cuda
i.MX Graphics User’s Guide, Rev. 0, 02/2018
22 NXP Semiconductors
75. EGL_NV_cuda_event
76. EGL_TIZEN_image_native_buffer
77. EGL_TIZEN_image_native_surface
78. EGL_EXT_output_base
79. EGL_EXT_device_drm
EGL_EXT_output_drm
80. EGL_EXT_device_openwf
EGL_EXT_output_openwf
81. EGL_EXT_stream_consumer_egloutput
83. EGL_KHR_partial_update
84. EGL_KHR_swap_buffers_with_damage
85. EGL_ANGLE_window_fixed_size
86. EGL_EXT_yuv_surface
87. EGL_MESA_image_dma_buf_export
88. EGL_EXT_device_enumeration
89. EGL_EXT_device_query
90. EGL_ANGLE_device_d3d
91. EGL_KHR_create_context_no_error
92. EGL_KHR_debug
93. EGL_NV_stream_metadata
94. EGL_NV_stream_consumer_gltexture_yuv
95. EGL_IMG_image_plane_attribs
96. EGL_KHR_mutable_render_buffer
97. EGL_EXT_protected_content
98. EGL_ANDROID_presentation_time
99. EGL_ANDROID_create_native_client_buffer
100. EGL_ANDROID_front_buffer_auto_refresh
101. EGL_KHR_no_config_context
102. EGL_KHR_context_flush_control
103. EGL_ARM_implicit_external_sync
104. EGL_MESA_platform_surfaceless
105. EGL_EXT_image_dma_buf_import_modifiers
106. EGL_EXT_pixel_format_float
107. EGL_EXT_gl_colorspace_bt2020_linear
EGL_EXT_gl_colorspace_bt2020_pq
108. EGL_EXT_gl_colorspace_scrgb_linear
109. EGL_EXT_surface_SMPTE2086_metadata
110. EGL_NV_stream_fifo_next
111. EGL_NV_stream_fifo_synchronous
112. EGL_NV_stream_reset
113. EGL_NV_stream_frame_limits
114. EGL_NV_stream_remote
EGL_NV_stream_cross_object
EGL_NV_stream_cross_display
EGL_NV_stream_cross_process
EGL_NV_stream_cross_partition
EGL_NV_stream_cross_system
i.MX Graphics User’s Guide, Rev. 0, 02/2018
23 NXP Semiconductors
3.3 OpenGL ES extension support
The following table includes the list of all current OpenGL ES Extensions and indicates their support level.
(list from www.khronos.org/registry/gles/ as of 9/27/2012)
Table 13. OpenGL ES extension support
Extension Number, Name and hyperlink
ES1.1
ES2.0/3.0/3.1/3.2
1. GL_OES_blend_equation_separate
YES
na
2. GL_OES_blend_func_separate
YES
na
3. GL_OES_blend_subtract
YES
na
4. GL_OES_byte_coordinates
YES
na
5. GL_OES_compressed_ETC1_RGB8_texture
YES
YES
6. GL_OES_compressed_paletted_texture
YES
YES
7. GL_OES_draw_texture
YES
na
8. GL_OES_extended_matrix_palette
YES
9. GL_OES_fixed_point
YES
10. GL_OES_framebuffer_object
YES
na
11. GL_OES_matrix_get
YES
na
12. GL_OES_matrix_palette
YES
na
14. GL_OES_point_size_array
YES
15. GL_OES_point_sprite
YES
16. GL_OES_query_matrix
YES
na
17. GL_OES_read_format
YES
18. GL_OES_single_precision
YES
19. GL_OES_stencil_wrap
YES
20. GL_OES_texture_cube_map
YES
na
21. GL_OES_texture_env_crossbar
22. GL_OES_texture_mirrored_repeat
YES
na
23. GL_OES_EGL_image
YES
YES
24. GL_OES_depth24
YES
YES
25. GL_OES_depth32
YES
26. GL_OES_element_index_uint
YES
YES
27. GL_OES_fbo_render_mipmap
YES
YES
28. GL_OES_fragment_precision_high
YES
29. GL_OES_mapbuffer
YES
YES
30. GL_OES_rgb8_rgba8
YES
YES
31. GL_OES_stencil1
32. GL_OES_stencil4
33. GL_OES_stencil8
YES
na
115. EGL_NV_stream_socket
EGL_NV_stream_socket_unix
EGL_NV_stream_socket_inet
EGL_ANDROID_get_render_buffer
YES
EGL_ANDROID_swap_rectangle
YES
EGL_WL_bind_wayland_display
YES
i.MX Graphics User’s Guide, Rev. 0, 02/2018
24 NXP Semiconductors
Extension Number, Name and hyperlink
ES1.1
ES2.0/3.0/3.1/3.2
34. GL_OES_texture_3D
35. GL_OES_texture_float_linear
no
GL_OES_texture_half_float_linear
CORE
36. GL_OES_texture_float
CORE
GL_OES_texture_half_float
CORE
37. GL_OES_texture_npot
YES
YES
38. GL_OES_vertex_half_float
YES
YES
39. GL_AMD_compressed_3DC_texture
40. GL_AMD_compressed_ATC_texture
41. GL_EXT_texture_filter_anisotropic
CORE
CORE
42. GL_EXT_texture_type_2_10_10_10_REV
CORE
43. GL_OES_depth_texture
YES
44. GL_OES_packed_depth_stencil
YES
YES
45. GL_OES_standard_derivatives
YES
46. GL_OES_vertex_type_10_10_10_2
CORE
47. GL_OES_get_program_binary
YES
48. GL_AMD_program_binary_Z400
49. GL_EXT_texture_compression_dxt1
YES
50. GL_AMD_performance_monitor
51. GL_EXT_texture_format_BGRA8888
YES
YES
52. GL_NV_fence
53. GL_IMG_read_format
54. GL_IMG_texture_compression_pvrtc
55. GL_QCOM_driver_control
56. GL_QCOM_performance_monitor_global_mode
57. GL_IMG_user_clip_plane
58. GL_IMG_texture_env_enhanced_fixed_function
59. GL_APPLE_texture_2D_limited_npot
60. GL_EXT_texture_lod_bias
YES
N/A
61. GL_QCOM_writeonly_rendering
62. GL_QCOM_extended_get
63. GL_QCOM_extended_get2
64. GL_EXT_discard_framebuffer
YES
65. GL_EXT_blend_minmax
YES
YES
66. GL_EXT_read_format_bgra
YES
YES
67. GL_IMG_program_binary
68. GL_IMG_shader_binary
69. GL_EXT_multi_draw_arrays
YES
YES
GL_SUN_multi_draw_arrays
no
no
70. GL_QCOM_tiled_rendering
71. GL_OES_vertex_array_object
YES
72. GL_NV_coverage_sample
73. GL_NV_depth_nonlinear
74. GL_IMG_multisampled_render_to_texture
75. GL_OES_EGL_sync
YES
YES
76. GL_APPLE_rgb_422
i.MX Graphics User’s Guide, Rev. 0, 02/2018
25 NXP Semiconductors
Extension Number, Name and hyperlink
ES1.1
ES2.0/3.0/3.1/3.2
77. GL_EXT_shader_texture_lod
78. GL_APPLE_framebuffer_multisample
79. GL_APPLE_texture_format_BGRA8888
80. GL_APPLE_texture_max_level
81. GL_ARM_mali_shader_binary
82. GL_ARM_rgba8
83. GL_ANGLE_framebuffer_blit
84. GL_ANGLE_framebuffer_multisample
85. GL_VIV_shader_binary
86. GL_EXT_frag_depth
YES
87. GL_OES_EGL_image_external
YES
YES
88. GL_DMP_shader_binary
89. GL_QCOM_alpha_test
90. GL_EXT_unpack_subimage
N/A
91. GL_NV_draw_buffers
92. GL_NV_fbo_color_attachments
93. GL_NV_read_buffer
94. GL_NV_read_depth_stencil
95. GL_NV_texture_compression_s3tc_update
96. GL_NV_texture_npot_2D_mipmap
97. GL_EXT_color_buffer_half_float
CORE
98. GL_EXT_debug_label
99. GL_EXT_debug_marker
100. GL_EXT_occlusion_query_boolean
101. GL_EXT_separate_shader_objects
102. GL_EXT_shadow_samplers
103. GL_EXT_texture_rg
YES
104. GL_NV_EGL_stream_consumer_external
105. GL_EXT_sRGB
106. GL_EXT_multisampled_render_to_texture
YES
107. GL_EXT_robustness
YES
108. GL_EXT_texture_storage
109. GL_ANGLE_instanced_arrays
110. GL_ANGLE_pack_reverse_row_order
111. GL_ANGLE_texture_compression_dxt3
GL_ANGLE_texture_compression_dxt5
112. GL_ANGLE_texture_usage
113. GL_ANGLE_translated_shader_source
114. GL_FJ_shader_binary_GCCSO
115. GL_OES_required_internalformat
YES
116. GL_OES_surfaceless_context
YES
117. GL_KHR_texture_compression_astc_hdr
GL_KHR_texture_compression_astc_ldr
YES
118. GL_KHR_debug
YES
119. GL_QCOM_binning_control
120. GL_ARM_mali_program_binary
i.MX Graphics User’s Guide, Rev. 0, 02/2018
26 NXP Semiconductors
Extension Number, Name and hyperlink
ES1.1
ES2.0/3.0/3.1/3.2
121. GL_EXT_map_buffer_range
122. GL_EXT_shader_framebuffer_fetch
CORE
123. GL_APPLE_copy_texture_levels
124. GL_APPLE_sync
125. GL_EXT_multiview_draw_buffers
126. GL_NV_draw_texture
127. GL_NV_packed_float
128. GL_NV_texture_compression_s3tc
129. GL_NV_3dvision_settings
130. GL_NV_texture_compression_latc
131. GL_NV_platform_binary
132. GL_NV_pack_subimage
133. GL_NV_texture_array
134. GL_NV_pixel_buffer_object
135. GL_NV_bgr
136. GL_OES_depth_texture_cube_map
YES
137. GL_EXT_color_buffer_float
CORE
138. GL_ANGLE_depth_texture
139. GL_ANGLE_program_binary
140. GL_IMG_texture_compression_pvrtc2
141. GL_NV_draw_instanced
142. GL_NV_framebuffer_blit
143. GL_NV_framebuffer_multisample
144. GL_NV_generate_mipmap_sRGB
145. GL_NV_instanced_arrays
146. GL_NV_shadow_samplers_array
147. GL_NV_shadow_samplers_cube
148. GL_NV_sRGB_formats
149. GL_NV_texture_border_clamp
150. GL_EXT_disjoint_timer_query
151. GL_EXT_draw_buffers
152. GL_EXT_texture_sRGB_decode
YES
153. GL_EXT_sRGB_write_control
154. GL_EXT_texture_compression_s3tc
YES
155. GL_EXT_pvrtc_sRGB
156. GL_EXT_instanced_arrays
157. GL_EXT_draw_instanced
158. GL_NV_copy_buffer
159. GL_NV_explicit_attrib_location
160. GL_NV_non_square_matrices
161. GL_EXT_shader_integer_mix
162. GL_OES_texture_compression_astc
163. GL_NV_blend_equation_advanced
GL_NV_blend_equation_advanced_coherent
164. GL_INTEL_performance_query
165. GL_ARM_shader_framebuffer_fetch
i.MX Graphics User’s Guide, Rev. 0, 02/2018
27 NXP Semiconductors
Extension Number, Name and hyperlink
ES1.1
ES2.0/3.0/3.1/3.2
166. GL_ARM_shader_framebuffer_fetch_depth_stencil
167. GL_EXT_shader_pixel_local_storage
168. GL_KHR_blend_equation_advanced
CORE
GL_KHR_blend_equation_advanced_coherent
169. GL_OES_sample_shading
CORE
170. GL_OES_sample_variables
CORE
171. GL_OES_shader_image_atomic
CORE
172. GL_OES_shader_multisample_interpolation
CORE
173. GL_OES_texture_stencil8
CORE
174. GL_OES_texture_storage_multisample_2d_array
CORE
175. GL_EXT_copy_image
CORE
176. GL_EXT_draw_buffers_indexed
CORE
177. GL_EXT_geometry_shader
CORE
GL_EXT_geometry_point_size
CORE
178. GL_EXT_gpu_shader5
CORE
179. GL_EXT_shader_implicit_conversions
CORE
180. GL_EXT_shader_io_blocks
CORE
181. GL_EXT_tessellation_shader
CORE
GL_EXT_tessellation_point_size
CORE
182. GL_EXT_texture_border_clamp
CORE
183. GL_EXT_texture_buffer
CORE
184. GL_EXT_texture_cube_map_array
CORE
185. GL_EXT_texture_view
186. GL_EXT_primitive_bounding_box
CORE
187. GL_ANDROID_extension_pack_es31a
CORE
188. GL_EXT_compressed_ETC1_RGB8_sub_texture
189. GL_KHR_robust_buffer_access_behavior
YES
190. GL_KHR_robustness
YES
191. GL_KHR_context_flush_control
192. GL_DMP_program_binary
193. GL_APPLE_clip_distance
194. GL_APPLE_color_buffer_packed_float
195. GL_APPLE_texture_packed_float
196. GL_NV_internalformat_sample_query
197. GL_NV_bindless_texture
198. GL_NV_conditional_render
199. GL_NV_path_rendering
200. GL_NV_image_formats
201. GL_NV_shader_noperspective_interpolation
202. GL_NV_viewport_array
203. GL_EXT_base_instance
204. GL_EXT_draw_elements_base_vertex
CORE
205. GL_EXT_multi_draw_indirect
CORE
206. GL_EXT_render_snorm
207. GL_EXT_texture_norm16
208. GL_OES_copy_image
CORE
i.MX Graphics User’s Guide, Rev. 0, 02/2018
28 NXP Semiconductors
Extension Number, Name and hyperlink
ES1.1
ES2.0/3.0/3.1/3.2
209. GL_OES_draw_buffers_indexed
CORE
210. GL_OES_geometry_shader
CORE
211. GL_OES_gpu_shader5
CORE
212. GL_OES_primitive_bounding_box
CORE
213. GL_OES_shader_io_blocks
CORE
214. GL_OES_tessellation_shader
CORE
215. GL_OES_texture_border_clamp
CORE
216. GL_OES_texture_buffer
CORE
217. GL_OES_texture_cube_map_array
CORE
218. GL_OES_texture_view
219. GL_OES_draw_elements_base_vertex
CORE
220. GL_OES_copy_image
CORE
221. GL_EXT_texture_sRGB_R8
222. GL_EXT_yuv_target
223. GL_EXT_texture_sRGB_RG8
224. GL_EXT_float_blend
225. GL_EXT_post_depth_coverage
226. GL_EXT_raster_multisample
227. GL_EXT_texture_filter_minmax
228. GL_NV_conservative_raster
229. GL_NV_fragment_coverage_to_color
230. GL_NV_fragment_shader_interlock
231. GL_NV_framebuffer_mixed_samples
232. GL_NV_fill_rectangle
233. GL_NV_geometry_shader_passthrough
234. GL_NV_path_rendering_shared_edge
235. GL_NV_sample_locations
236. GL_NV_sample_mask_override_coverage
237. GL_NV_viewport_array2
238. GL_NV_polygon_mode
239. GL_EXT_buffer_storage
240. GL_EXT_sparse_texture
241. GL_OVR_multiview
242. GL_OVR_multiview2
243. GL_KHR_no_error
246. GL_INTEL_framebuffer_CMAA
247. GL_EXT_blend_func_extended
248. GL_EXT_multisample_compatibility
249. GL_KHR_texture_compression_astc_sliced_3d
250. GL_OVR_multiview_multisampled_render_to_texture
251. GL_IMG_texture_filter_cubic
251. GL_IMG_texture_filter_cubic
252. GL_EXT_polygon_offset_clamp
253. GL_EXT_shader_pixel_local_storage2
254. GL_EXT_shader_group_vote
255. GL_IMG_framebuffer_downsample
i.MX Graphics User’s Guide, Rev. 0, 02/2018
29 NXP Semiconductors
Extension Number, Name and hyperlink
ES1.1
ES2.0/3.0/3.1/3.2
256. GL_EXT_protected_textures
CORE
257. GL_EXT_clip_cull_distance
258. GL_NV_viewport_swizzle
259. GL_EXT_sparse_texture2
260. GL_NV_gpu_shader5
261. GL_NV_shader_atomic_fp16_vector
262. GL_NV_conservative_raster_pre_snap_triangles
263. GL_EXT_window_rectangles
264. GL_EXT_shader_non_constant_global_initializers
265. GL_INTEL_conservative_rasterization
266. GL_NVX_blend_equation_advanced_multi_draw_buffers
267. GL_OES_viewport_array
268. GL_EXT_conservative_depth
269. GL_EXT_clear_texture
270. GL_IMG_bindless_texture
271. GL_NV_texture_barrier
GL_VIV_direct_texture
YES
YES
3.4 Extension GL_VIV_direct_texture
Name
VIV_direct_texture
Name strings
GL_VIV_direct_texture
IPStatus
Contact NXP Semiconductor regarding any intellectual property questions associated with this extension.
Status
Implemented: July, 2011
Version
Last modified: 29 July, 2011
Revision: 2
Number
Unassigned
Dependencies
OpenGL ES 1.1 is required. OpenGL ES 2.0 support is available.
Overview
Create a texture with direct access support. This is useful when an application desires to use the same texture over and over
while frequently updating its content. It could also be used for mapping live video to a texture. A video decoder could write its
result directly to the texture and then the texture could be directly rendered onto a 3D shape. glTexDirectVIVMap is similar
i.MX Graphics User’s Guide, Rev. 0, 02/2018
30 NXP Semiconductors
to glTexDirectVIV. The only difference is that it has two inputs, “Logical” and “Physical,” which support mapping a user
space memory or a physical address into the texture surface.
New Procedures and Functions
glTexDirectVIV
Syntax:
GL_API void GL_APIENTRY
glTexDirectVIV (
GLenum Target,
GLsizei Width,
GLsizei Height,
GLenum Format,
GLvoid ** Pixels
);
Parameters
Target
Target texture. Must be GL_TEXTURE_2D.
Width
Height
Size of LOD 0. Width must be 16 pixel aligned. The width and
height of LOD 0 of the texture is specified by the Width and Height
parameters. The driver may auto-generate the rest of LODs if the
hardware supports high quality scaling (for non-power of 2
textures) and LOD generation. If the hardware does not support
high quality scaling and LOD generation, the texture remains a
single-LOD texture.
Format
Choose the format of the pixel data from the following formats:
GL_VIV_YV12, GL_VIV_NV12, GL_VIV_NV21, GL_VIV_YUY2,
GL_VIV_UYVY, GL_RGBA, and GL_BGRA_EXT.
If the format is GL_VIV_YV12, glTexDirectVIV creates a planar
YV12 4:2:0 texture and the format of the Pixels array is as
follows: Yplane, Vplane, Uplane.
If the format is GL_VIV_NV12, glTexDirectVIV creates a planar
NV12 4:2:0 texture and the format of the Pixels array is as
follows: Yplane, UVplane.
If the format is GL_VIV_NV21, glTexDirectVIV creates a planar
NV21 4:2:0 texture and the format of the Pixels array is as
follows: Yplane, VUplane.
If the format is GL_VIV_YUY2 or GL_VIV_UYVY, glTexDirectVIV
creates a packed 4:2:2 texture and the Pixels array contains
only one pointer to the packed YUV texture.
If Format is GL_RGBA, glTexDirectVIV creates a pixel array
with four GL_UNSIGNED_BYTE components: the first byte for
red pixels, the second byte for green pixels, the third byte for
blue, and the fourth byte for alpha.
If Format is GL_BGRA_EXT, glTexDirectVIV creates a pixel
array with four GL_UNSIGNED_BYTE components: the first
byte for blue pixels, the second byte for green pixels, the third
byte for red, and the fourth byte for alpha.
i.MX Graphics User’s Guide, Rev. 0, 02/2018
31 NXP Semiconductors
Pixels
Stores the memory pointer created by the driver.
Output
If the function succeeds, it returns a pointer, or, for some YUV formats, it returns a set of pointers that
directly point to the texture. The pointer(s) are returned in the user-allocated array pointed to by the Pixels
parameter.
GlTexDirectVIVMap
Syntax:
GL_API void GL_APIENTRY
glTexDirectVIVMap (
Glenum Target,
Glsizei Width,
Glsizei Height,
Glenum Format,
Glvoid ** Logical,
const Gluint * Physical
);
Parameters
Target
Target texture. Must be GL_TEXTURE_2D.
Width
Height
Size of LOD 0. Width must be 16 pixel aligned. See glTexDirectVIV.
Format
Same as glTexDirectVIV Format.
Logical
Pointer to the logical address of the application-defined texture
buffer. Logical address must be 64 bit (8 byte) aligned.
Physical
Pointer to the physical address of the application-defined buffer to
the texture, or ~0 if no physical address has been provided.
GlTexDirectInvalidateVIV
Syntax:
GL_API void GL_APIENTRY
glTexDirectInvalidateVIV (
Glenum Target
);
Parameters
Target
Target texture. Must be GL_TEXTURE_2D.
New Tokens
GL_VIV_YV12
0x8FC0
GL_VIV_NV12
0x8FC1
GL_VIV_YUY2
0x8FC2
i.MX Graphics User’s Guide, Rev. 0, 02/2018
32 NXP Semiconductors
GL_VIV_UYVY
0x8FC3
GL_VIV_NV21
0x8FC4
Error codes
GL_INVALID_ENUM
Target is not GL_TEXTURE_2D, or format is not a valid format.
GL_INVALID_VALUE
Width or Height parameter is less than 1.
GL_OUT_OF_MEMORY
A memory allocation error occurred.
GL_INVALID_OPERATION
Specified format is not supported by the hardware, or
no texture is bound to the active texture unit, or
some other error occurs during the call.
Example 1.
First, call glTexDirectVIV to get a pointer.
Second, copy the texture data to this memory address.
Then, call glTexDirectInvalidateVIV to apply the texture before drawing something with that texture.
… …
glTexDirectVIV(GL_TEXUTURE_2D, 512, 512, GL_VIV_YV12, &texels);
… …
GlTexDirectInvalidateVIV(GL_TEXTURE_2D);
glDrawArrays(…);
Example 2.
First, call glTexDirectVIVMap to map Logical and Physical address to the texture.
Second, modify Logical and Physical data.
Then, call glTexDirectInvalidateVIV to apply the texture before drawing something with that texture.
… …
char *Logical = (char*) malloc (sizeof(char)*size);
Gluint physical = ~0U;
glTexDirectVIVMap(GL_TEXUTURE_2D, 512, 512, GL_VIV_YV12,
(void**)&Logical, &
32
hysical);
… …
GlTexDirectInvalidateVIV(GL_TEXTURE_2D);
glDrawArrays(…);
Issues
None
3.5 Extension GL_VIV_texture_border_clamp
Name
VIV_texture_border_clamp
i.MX Graphics User’s Guide, Rev. 0, 02/2018
33 NXP Semiconductors
Name Strings
GL_VIV_texture_border_clamp
Status
Implemented September 2012.
Version
Last modified: 27 September 2012
Vivante revision: 1
Number
Unassigned
Dependencies
This extension is implemented for use with OpenGL ES 1.1 and OpenGL ES 2.0.
This extension is based on OpenGL ARB Extension #13: GL_ARB_texture_border_clamp:
www.opengl.org/registry/specs/ARB/texture_border_clamp.txt. See also vendor extension GL_SGIS_texture_border_clamp:
www.opengl.org/registry/specs/SGIS/texture_border_clamp.txt.
Overview
This extension was adapted from the OpenGL extension for use with OpenGL ES implementations. The OpenGL ARB Extension
13 description applies here as well:
“The base OpenGL provides clamping such that the texture coordinates are limited to exactly the range
[0,1]. When a texture coordinate is clamped using this algorithm, the texture sampling filter straddles the
edge of the texture image, taking 1/2 its sample values from within the texture image, and the other 1/2
from the texture border. It is sometimes desirable for a texture to be clamped to the border color, rather
than to an average of the border and edge colors.
This extension defines an additional texture clamping algorithm. CLAMP_TO_BORDER_[VIV] clamps texture
coordinates at all mipmap levels such that NEAREST and LINEAR filters return only the color of the border
texels.”
The color returned is derived only from border texels and cannot be configured.
Issues
None
New Tokens
Accepted by the <param> parameter of TexParameteri and TexParameterf, and by the <params> parameter of
TexParameteriv and TexParameterfv, when their <pname> parameter is TEXTURE_WRAP_S, TEXTURE_WRAP_T, or
TEXTURE_WRAP_R:
CLAMP_TO_BORDER_VIV
0x812D
Errors
None.
New State
Only the type information changes for these parameters.
i.MX Graphics User’s Guide, Rev. 0, 02/2018
34 NXP Semiconductors
See OES 2.0 Specification Section 3.7.4, page 75-76, Table 3.10, “Texture parameters and their values.”
i.MX Graphics User’s Guide, Rev. 0, 02/2018
35 NXP Semiconductors
Chapter 4 i.MX Framebuffer API
4.1 Overview
The graphics software includes i.MX Framebuffer (FB) API which enables users to easily create and port their
graphics applications by using a framebuffer device without the need to expend additional effort handling
platform-related tasks. i.MX Framebuffer API focuses on providing mechanisms for controlling display, window,
and pixmap render surfaces.
The EGL Native Platform Graphics Interface provides mechanisms for creating rendering surfaces onto which client
APIs can draw, creating graphics contexts for client APIs, and synchronizing drawing by client APIs as well as native
platform rendering APIs. This enables seamless rendering using Khronos APIs such as OpenGL ES and OpenVG for
high-performance, accelerated, mixed-mode 2D, and 3D rendering. For further information on EGL, see
www.khronos.org/registry/egl. The API described in this document is compatible with EGL version 1.4 of the
specification.
The following platforms are supported:
Linux® OS/X11
Android platform
Windows® Embedded Compact OS
QNX®
4.2 API data types and environment variables
4.2.1 Data types
The GPU software provides platform independent member definitions for the following EGL types:
typedef struct _FBDisplay * EGLNativeDisplayType;
typedef struct _FBWindow * EGLNativeWindowType;
typedef struct _FBPixmap * EGLNativePixmapType;
Figure 2 Types as listed on EGL 1.4 API Quick Reference Card
(from www.khronos.org/files/egl-1-4-quick-reference-card.pdf)
i.MX Graphics User’s Guide, Rev. 0, 02/2018
36 NXP Semiconductors
4.2.2 Environment variables
Table 14. i.MX FB API environment variables
Environment Variables
Description
FB_MULTI_BUFFER
To use multiple-buffer rendering, set the environment variable
FB_MULTI_BUFFER to an unsigned integer value, which indicates the
number of buffers required. The maximum is 8.
Recommended values: 4.
The FB_MULTI_BUFFER variable can be set to any positive integer value.
If set to 1, the multiple-buffer function is not enabled, and the VSYNC
is also disabled, so there may be tearing on screen, but it is good for
benchmark test.
If set to 2 or 3, VSYNC is enabled and there are double or trible frame
buffer. Because of the hardware limitation of current IPU, there may
be tearing on screen.
If set to 4 or more, VSYNC is enabled and no screen tearing appears.
If set to a value more than 8, the driver uses 8 as the buffer count.
FB_FRAMEBUFFER_0,
FB_FRAMEBUFFER_1,
FB_FRAMEBUFFER_2,
FB_FRAMEBUFFER_n
To open a specified framebuffer device, set the environment variable
FB_FRAMEBUFFER_n to a proper value (for example,
FB_FRAMEBUFFER_0 = /dev/fb0).
Allowed values for n: any positive integer.
Note: If there are no environment variables set, the driver tries to use the
default framebuffer devices (fb0 for index 0, fb1 for index 1, fb2 for index
2, fb3 for index 3, and so on).
FB_IGNORE_DISPLAY_SIZE
When set to a positive integer and a window’s initial size request is
greater than the display size, the window size is not reduced to fit within
the display. Global.
Allowed values: any positive integer.
Note: The drivers read the value from this environment variable as a
Boolean to check if the user wants to ignore the display size when creating
a window.
If the variable is set to value, 0, or this environment variable is not
set, when creating window, the driver uses display size to cut down
the size of the window to ensure that the entire window area is
inside the display screen.
If the user sets this variable to 1, or any positive integer value, then
the window area can be partly or entirely outside of the display
screen area (see the image below in which the ignore display size is
equal to 1).
GPU_VIV_DISABLE_CLEAR_FB
It turns off zero fill memory, so the content of FBDEV buffer is not cleared.
FB_LEGACY
If the board support drm-fb, the gpu will render though drm by default. If
the user wants to render to framebuffer directly instead of through drm,
window
Display
i.MX Graphics User’s Guide, Rev. 0, 02/2018
37 NXP Semiconductors
sets this variable to 1.
Below are some usage syntax examples for environment variables:
To create a window with its size different from the display size, use the environment variable
FB_IGNORE_DISPLAY_SIZE. Example usage syntax:
export FB_IGNORE_DISPLAY_SIZE=1
To let the driver use multiple buffers to do swap work, use the environment variable FB_MULTI_BUFFER. Example
usage syntax:
export FB_MULTI_BUFFER=2
To specify the display device, use the environment variable FB_FRAMEBUFFER_n, where n = any positive integer.
Example usage syntax:
export FB_FRAMEBUFFER_0=/dev/fb0
export FB_FRAMEBUFFER_1=/dev/fb1
export FB_FRAMEBUFFER_2=/dev/fb2
export FB_FRAMEBUFFER_3=/dev/fb3
4.3 API description and syntax
fbGetDisplay
Description:
This function is used to get the default display of the framebuffer device.
To open the framebuffer device, set an environment variable FB_FRAMEBUFFER_n to the framebuffer location.
Syntax:
EGLNativeDisplayType
fbGetDisplay (
void * context
);
Parameters:
context Pointer to the native display instance.
Return Values:
The function returns a pointer to the EGL native display instance if successful; otherwise, it returns a NULL pointer.
fbGetDisplayByIndex
Description:
This function is used to get a specified display within a multiple framebuffer environment by providing an index
number.
i.MX Graphics User’s Guide, Rev. 0, 02/2018
38 NXP Semiconductors
To use multiple buffers when rendering, set the environment variable FB_MULTI_BUFFER to an unsigned integer
value, which indicates the number of buffers. Maximum is 3.
To open a specific Framebuffer device, set environment variables to their proper values (e.g., set
FB_FRAMEBUFFER_0 = /dev/fb0). If there are no environment variables set, the driver tries to use the default fb
devices (fb0 for index 0, fb1 for index 1, fb2 for index 2, fb3 for index 3, and so on).
Syntax:
EGLNativeDisplayType
fbGetDisplayByIndex (
int DisplayIndex
);
Parameters:
DisplayIndex An integer value where the integer is associated with one of the following environment
variables for framebuffer devices:
FB_FRAMEBUFFER_0
FB_FRAMEBUFFER_1
FB_FRAMEBUFFER_2
FB_FRAMEBUFFER_n
Return Value:
The function returns a pointer to the EGL native display instance if successful; otherwise, it returns a NULL pointer.
fbGetDisplayGeometry
Description:
This function is used to get display width and height information.
Syntax:
void
fbGetDisplayGeometry (
EGLNativeDisplayType Display,
int * Width,
int * Height
);
Parameters:
Display [in] Pointer to EGL native display instance created by fbGetDisplay.
Width [out] Pointer that receives the width of the display.
Height [out] Pointer that receives the height of the display.
fbGetDisplayInfo
Description:
This function is used to get display information.
Syntax:
void
fbGetDisplayInfo (
EGLNativeDisplayType Display,
i.MX Graphics User’s Guide, Rev. 0, 02/2018
39 NXP Semiconductors
int * Width,
int * Height,
unsigned long * Physical,
int * Stride,
int * BitsPerPixel
);
Parameters:
Display [in] A pointer to the EGL native display instance created by fbGetDisplay.
Width [out] A pointer to the location that contains the width of the display.
Height [out] A pointer to the location that contains the height of the display.
Physical [out] A pointer to the location that contains the physical start address of the display.
Stride [out] A pointer to the location that contains the stride of the display.
BitsPerPixel [out] A pointer to the location that contains the pixel depth of the display.
fbDestroyDisplay
Description:
This function is used to destroy a display.
Syntax:
void
fbDestroyDisplay (
EGLNativeDisplayType Display
);
Parameters:
Display [in] Pointer to EGL native display instance created by fbGetDisplay.
fbCreateWindow
Description:
This function is used to create a window for the framebuffer platform with the specified position and size. If
width/height is 0, it uses the display width/height as its value.
Note: When either window X + width or the Y + height is larger than the displays width or height respectively, the
API reduces the window size to force the whole window inside the display screen limits. To avoid reducing the
window size in this scenario, users can set a value of 1 to the environment variable FB_IGNORE_DISPLAY_SIZE.
Syntax:
EGLNativeWindowType
fbCreateWindow (
EGLNativeDisplayType Display,
int X,
int Y,
int Width,
int Height
i.MX Graphics User’s Guide, Rev. 0, 02/2018
40 NXP Semiconductors
);
Parameters:
Display [in] Pointer to EGL native display instance created by fbGetDisplay.
X [in] Specifies the initial horizontal position of the window.
Y [in] Specifies the initial vertical position of the window.
Width [in] Specifies the width of the window.
Height [in] Specifies the height of the window in device units.
Return Value:
The function returns a pointer to the EGL native window instance if successful; otherwise, it returns a NULL
pointer.
fbGetWindowGeometry
Description:
This function is used to get window position and size information.
Syntax:
void
fbGetWindowGeometry (
EGLNativeWindowType Window,
int * X,
int * Y,
int * Width,
int * Height
);
Parameters:
Window [in] Pointer to EGL native window instance created by fbCreateWindow.
X [out] Pointer that receives the horizontal position value of the window.
Y [out] Pointer that receives the vertical position value of the window.
Width [out] Pointer that receives the width value of the window.
Height [out] Pointer that receives the height value of the window.
fbGetWindowInfo
Description:
This function is used to get window position and size and address information.
Syntax:
void
fbGetWindowInfo (
EGLNativeWindowType Window,
int * X,
int * Y,
int * Width,
int * Height
int * BitsPerPixel,
i.MX Graphics User’s Guide, Rev. 0, 02/2018
41 NXP Semiconductors
unsigned int * Offset
);
Parameters:
Window [in] A pointer to the EGL native window instance created by fbCreateWindow.
X [out] A pointer to the location that contains the horizontal position value of the window.
Y [out] A pointer to the location that contains the vertical position value of the window.
Width [out] A pointer to the location that contains the width of the window.
Height [out] A pointer to the location that contains the height of the window.
BitsPerPixel [out] A pointer to the location that contains the pixel depth of the window.
Offset [out] A pointer to the location that contains the offset of the window.
fbDestroyWindow
Description:
This function is used to destroy a window.
Syntax:
void
fbDestroyWindow (
EGLNativeWindowType Window
);
Parameters:
Window [in] Pointer to EGL native window instance created by fbCreateWindow.
fbCreatePixmap
Description:
This function is used to create a pixmap of a specific size on the specified framebuffer device. If either the width or
height is 0, the function fails to create a pixmap and return NULL.
Syntax:
EGLNativePixmapType
fbCreatePixmap (
EGLNativeDisplayType Display,
int Width,
int Height
);
Parameters:
Display [in] Pointer to the EGL native display instance created by fbGetDisplay.
Width [in] Specifies the width of the pixmap.
Height [in] Specifies the height of the pixmap.
Return Value:
The function returns a pointer to the EGL native pixmap instance if successful; otherwise, it returns a NULL pointer.
i.MX Graphics User’s Guide, Rev. 0, 02/2018
42 NXP Semiconductors
fbCreatePixmapWithBpp
Description:
This function is used to create a pixmap of a specific size and bit depth on the specified framebuffer device. If
either the width or height is 0, the function fails to create a pixmap and return NULL.
Syntax:
EGLNativePixmapType
fbCreatePixmapWithBpp (
EGLNativeDisplayType Display,
int Width,
int Height
int BitsPerPixel
);
Parameters:
Display [in]A pointer to the EGL native display instance created by fbGetDisplay.
Width [in] Specifies the width of the pixmap.
Height [in] Specifies the height of the pixmap.
BitsPerPixel [in] Specifies the bit depth of the pixmap.
Return Value:
The function returns a pointer to the EGL native pixmap instance if successful; otherwise, it returns a NULL pointer.
fbGetPixmapGeometry
Description:
This function is used to get pixmap size information.
Syntax:
void
fbGetPixmapGeometry (
EGLNativePixmapType Pixmap,
int * Width,
int * Height
);
Parameters:
Pixmap [in] Pointer to the EGL native pixmap instance created by fbCreatePixmap.
Width [out] Pointer that receives a width value for pixmap.
Height [out] Pointer that receives a height value for pixmap.
fbGetPixmapInfo
Description:
This function is used to get pixmap size and depth information.
Syntax:
void
i.MX Graphics User’s Guide, Rev. 0, 02/2018
43 NXP Semiconductors
fbGetPixmapInfo (
EGLNativePixmapType Pixmap,
int * Width,
int * Height
int * BitsPerPixel
int * Stride,
void ** Bits
);
Parameters:
Pixmap [in] A pointer to the EGL native pixmap instance created by fbCreatePixmap.
Width [out] A pointer to the location that contains a width value for pixmap.
Height [out] A pointer to the location that contains a height value for pixmap.
BitsPerPixel [out] A pointer to the location that contains the pixel depth of the pixmap.
Stride [out] A pointer to the location that contains the stride of the pixmap.
Bits [out] A pointer to the location that contains the bit address of the pixmap.
fbDestroyPixmap
Description:
This function is used to destroy a pixmap.
Syntax:
void
fbDestroyPixmap (
EGLNativePixmapType Pixmap
);
Parameters:
Pixmap [in] Pointer to the EGL native pixmap instance created by fbCreatePixmap.
i.MX Graphics User’s Guide, Rev. 0, 02/2018
44 NXP Semiconductors
Chapter 5 OpenCL
5.1 Overview
5.1.1 General description
OpenCL (Open Computing Language) is an open industry standard application programming interface (API) used to
program multiple devices including GPUs, CPUs, as well as other devices organized as part of a single
computational platform. The OpenCL standard targets a wide range of devices from mobile phones, tablets, PCs,
and consumer electronic (CE) devices, all the way to embedded applications such as automotive and image
processing functions. The API takes advantage of all resources in a platform to fully utilize all compute capability
and to efficiently process the growing complexity of incoming data streams from multiple I/O (input/output)
sources. I/O streams can be camera inputs, images, scientific or mathematical data, and any other form of complex
data that can make use of data or task parallelism.
OpenCL uses parallel execution SIMD (single instruction, multiple data) engines found in GPUs to enhance data
computational density by performing massively parallel data processing on multiple data items, across multiple
compute engines. Each compute unit has its own arithmetic logic units (ALUs), including pipelined floating point
(FP), integer (INT) units and a special function unit (SFU) that can perform computations as well as transcendental
operations. The parallel computations and associated series of operations are called a kernel, and the GPU cores
can execute a kernel on thousands of work-items in parallel at any given time.
At a high level, OpenCL provides both a programming language and a framework to enable parallel programming.
OpenCL includes APIs, libraries and a runtime system to assist and support software development. With OpenCL, it
is possible to write general purpose programs that can execute directly on GPUs, without needing to know
graphics architecture details or using 3D graphics APIs like OpenGL or DirectX. OpenCL also provides a low-level
Hardware Abstraction Layer (HAL) as well as a framework that exposes many details of the underlying hardware
layer and thus allows the programmer to take full advantage of the hardware.
For more details on all the capabilities of OpenCL, see the following specifications from the Khronos Group:
OpenCL 1.2 Specification
www.khronos.org/registry/cl/specs/opencl-1.2.pdf
OpenCL 1.2 C++ Bindings Specification
www.khronos.org/registry/cl/specs/opencl-cplusplus-1.2.pdf
5.1.2 OpenCL framework
The OpenCL framework has two principal parts, similar to OpenGL, the host C API and the device C-based language
runtime. The host in OpenCL terminology corresponds to the client in OpenGL and the device corresponds to the
server. Device programs are called kernels. Execution of an OpenCL program is preceded by a series of API calls
that configure the system and GPGPU for execution.
OpenCL abstracts today's heterogeneous architectures using a hierarchical platform model. A host coordinates the
execution and data transfers on, to and from one or several compute devices. Compute devices are comprised of
compute units and each such unit contains an array of processing elements.
5.1.2.1 OpenCL execution model: kernels and work elements
The OpenCL execution model is defined by how the kernels are executed. When a kernel is submitted for
execution by the host, an index space is defined. An instance of the kernel executes for each point in this index
space. This kernel instance is called a work-item. Work-items are identified by their position in the index space
i.MX Graphics User’s Guide, Rev. 0, 02/2018
45 NXP Semiconductors
that provides the global ID for the work-item. Each work-item executes the same code but the specific pathway
through the code and the data operated upon varies by work-item.
Work-items are organized into work-groups. Work-groups provide a broader decomposition of the index space.
Work-groups are each assigned a unique work-group ID with the same dimensionality as the index space used for
the work-items. Work-items are assigned a unique local ID within a work-group so that a single work-item can be
uniquely identified by its global ID or by a combination of its local ID and work-group ID. The work-items in a given
work-group execute concurrently on the same compute device.
The index space supported in OpenCL is called an NDRange. An NDRange is an N-dimensional index space, where N
is one (1), two (2) or three (3). An NDRange is defined by an integer array of length N specifying the extent of the
index space in each dimension starting at an offset index F (zero by default). Each work-item’s global ID and local
ID are N-dimensional tuples. The global ID components are values in the range from F, to F plus the number of
elements in that dimension minus one.
Work-groups are assigned IDs using a similar approach to that used for work-item global IDs. An array of length N
defines the number of work-groups in each dimension. Work-items are assigned to a work-group and given a local
ID with components in the range from zero to the size of the work-group in that dimension minus one. Hence, the
combination of a work-group ID and the local-ID within a work-group uniquely defines a work-item. Each work-
item is identifiable in two ways; in terms of a global index, unique through the whole kernel index space, and in
terms of a local index, unique within a work group.
5.1.2.2 OpenCL command queues
OpenCL provides both task and data parallelism. Data movements are coordinated via command queues which
provide a general means of specifying inter-task relationships and task execution orders that obey the
dependencies in the computation. OpenCL may execute several tasks in parallel, if they are not order dependent.
Tasks are composed of data-parallel kernels which, similarly to shaders, apply a single function to a range of
elements in parallel. Only restricted synchronization and communication is allowed during kernel execution.
OpenCL kernels execute over a 1, 2 or 3 dimensional index space. All work-items execute the same program
(kernel) but their execution may diverge, with branching dependent on the data or their index. For details
regarding how many work groups are allowed within an index space see Using clEnqueueNDRangeKernel.
A kernel or a memory operation is first enqueued onto a command queue. Kernels are executed asynchronously
and the host application execution may proceed right after the enqueue operation. The application may opt to
wait for an operation to complete and an operation (kernel or memory) may be marked with a list of events that
must occur before it executes.
Events are kernel completion and memory operations. OpenCL traverses the dependence graph between the
kernels and memory transfers in a queue and ensures the correct execution order. Multiple command queues may
be constructed, further enhancing parallelism control across platforms and multiple compute devices.
Command-queue barriers are used to control the commands within the command queue. The
command-queue barrier indicates which commands must be finished before proceeding. This allows
for out-of-order command processing. The command queue barrier ensures that all previously
enqueued commands finish execution before any following commands begin execution.
i.MX Graphics User’s Guide, Rev. 0, 02/2018
46 NXP Semiconductors
Figure 3 Command queue barrier
The work-group barrier built-in function provides control of the work-item flow within work-groups. All work-items
must execute the barrier construct before any can continue execution beyond the barrier.
5.1.2.3 OpenCL memory model
The OpenCL memory model is divided into four different types of memory domains. These are:
Global Memory. Each compute device has global memory space which can reside off-chip in system
memory (DRAM) or inside the chip at the L1 or temporary register level. Global memory is accessible to
all work-items executing in a context, as well as to the host (read, write, and map commands).
Constant Memory is also global memory, but it is read-only. Constant memory can be placed in any
level of memory that the application programmer decides, making it an implementation dependent
decision. This is the region for host-allocated and host-initialized objects that are not changed during
kernel execution.
Local Memory. Each compute unit has local memory which resides very near the processing elements.
Access to local memory is very fast and the size of local memory is much smaller than global memory,
making it a scarce resource that needs to be controlled for optimal communication of work-items inside
a work-group. Local memory is specific to a work-group, and is accessible only by work-items belonging
to that work group.
Private Memory. Each processing element has another level of memory called private memory, which is
only accessible to a single work-item. Private memory is specific to a work-item and is not visible to
other work-items.
During run-time, each processing element is assigned a set of on-chip registers that are used for data storage of
intermediate data. Data that cannot be stored in registers spills over to global memory which can be very costly in
i.MX Graphics User’s Guide, Rev. 0, 02/2018
47 NXP Semiconductors
terms of performance and constant data movement to/from temporary registers. Software may emulate local and
private memory using global memory. System Memory is often loaded to L1 cache, Temporary or Local Storage
Registers and the GPGPU reads from those locations. At every level of the application program, the programmer
must be aware of the size and hierarchy of storage elements.
Table 15. Vivante memory structures mapped to Khronos OpenCL memory types
Khronos OpenCL Memory
Model Name
Vivante GPGPU OpenCL Memory
Structures Utilized
Definition
Private Memory
Registers, System Memory
Accessible only to an individual work-item; not visible
to any other work-items
Local Memory
Local Storage Registers, System
Memory
Accessible to all work-items within a specific work-
group; accessible only by work-items belonging to
that work-group
Global Memory
System Memory
Accessible to all-work-items executing in a context, as
well as to the host (read, write, and map commands).
Constant Memory
Constant Registers, System
Memory
Read only global memory region for host-allocated
and initialized objects that are not changed during
kernel execution
Host (CPU) Memory
Host Memory
Region for a kernel application’s program data and
structures
The OpenCL concurrent-read /concurrent-write (CRCW) memory model has so-called relaxed consistency which
means that different work-items may see a different view of global memory as the computation proceeds. Within
individual work-items reads and writes to all memory spaces are ordered. Synchronization between work-items in
a work-group is necessary to ensure consistency. No mechanism for synchronization between work-groups is
provided. Such a model assures parallel scalability by requiring explicit synchronization and communication.
For the highest throughput and computational speed, kernels should use high-speed on-chip memories and
registers as much as possible. Instruction control flow and memory operations, including data gathering /
scattering and direct memory access (DMA) should be automatically reorganized / re-ordered depending on data
dependencies detected by the optimized compiler. The Vivante OpenCL compiler automatically maps
dependencies and re-orders instructions for the best performance.
5.1.2.4 Host to GPGPU compute device data transfers
The application running on the host uses the OpenCL API to create memory objects in global memory, and to
enqueue memory commands that operate on these memory objects. The host and OpenCL device memory models
are, for the most part, independent of each other. This is by necessity as the host is defined outside of OpenCL.
They do, however, at times need to interact. This interaction occurs in one of two ways: by explicitly copying data
from the host to the GPU compute device memory, or implicitly, by mapping and unmapping regions of a memory
object.
Explicit using clEnqueueReadBuffer and clEnqueueWriteBuffer (clEnqueueReadImage,
clEnqueueWriteImage.)
To copy data explicitly, the host enqueues commands to transfer data between the memory object and
host memory. These memory transfer commands may be blocking or non-blocking. The OpenCL function
call for a blocking memory transfer returns once the associated memory resources on the host can be
i.MX Graphics User’s Guide, Rev. 0, 02/2018
48 NXP Semiconductors
safely reused. For a non-blocking memory transfer, the OpenCL function call returns as soon as the
command is enqueued regardless of whether host memory is safe to use.
Implicit using clEnqueueMapBuffer and clEnqueueUnMapMemObject.
The mapping/unmapping method of interaction between the host and OpenCL memory objects allows
the host to map a region from the memory object into its address space. The memory map command may
be blocking or non-blocking. Once a region from the memory object has been mapped, the host can read
or write to this region. The host unmaps the region when accesses (reads and/or writes) to this mapped
region by the host are complete.
The OpenCL specification does not explicitly state where each memory space will be mapped to on
individual implementations. This provides great freedom for vendors on the one hand and some
uncertainty for programmers on the other. Fortunately, kernels may be compiled just-in-time and
possible differences may be tackled during run-time.
When using these interfaces, it is important to consider the amount of copying involved to/from system
memory and the various levels within the compute device(s). There is a two-copy process: between host
and AXI (or SoC internal bus), and between AXI (or SoC internal bus) and the Vivante GPGPU compute
device. Double copying lowers overall system memory bandwidth and lowers performance. Because of
variations in system architecture (both internal and external/memory), there is sometimes a large
performance delta between the system or calculated GFLOPS and the kernel or GPGPU GFLOPS. GPGPU
GFLOPS are based on the theoretical computational capability of the ALUs within the GPGPU, assuming
the system architecture can deliver full data to the GPGPU. OpenCL APIs for buffers and images aid in
avoiding double copy by allowing the mapping of host memory to device memory. With proper memory
transfer management and the use of host/CPU memory remapped to the GPGPU memory space, copying
between host memory and GPGPU memory can be skipped so data transfer becomes a one-copy process.
The trade-off is that the programmer needs to be mindful of page boundaries and memory alignment
issues.
5.1.3 OpenCL profiles
In addition to Full Profile, the OpenCL specification also includes an Embedded Profile, which relaxes the OpenCL
compliance requirements for mobile and embedded devices. The main commons and differences between OpenCL
1.1/1.2 EP (Embedded Profile) and FP (Full Profile) come down to:
Commons:
Both EP and FP significantly offload the CPU of parallel, multi-threaded tasks.
For both EP and FP double precision and half-precision floating point are optional.
Difference:
Full Profile is for highly complex, accurate, and real time computations, while Embedded Profile is a
small subset targeting smaller devices (handheld, mobile, embedded) that perform GPGPU/OpenCL
processing with relaxed data type and precision requirements (image processing, augmented reality,
gesture recognition, and more).
64-bit integers are required for FP and optional for EP.
EP requires either RTZ or RTE. FP requires both.
Computational precision (units in the last place; i.e., ULP) requirements in EP are relaxed.
Atomic instruction support is not required in EP.
3D Image support is not required in EP.
i.MX Graphics User’s Guide, Rev. 0, 02/2018
49 NXP Semiconductors
Minimum requirements for constant buffer size, object allocation size, constant argument counts and
local memory sizes are scaled down in EP.
And more (in general EP is a scaled down version of FP).
Die size and power increase with FP because of the higher requirements, features and memory sizes.
5.1.4 Vivante OpenCL embedded compatible IP
As of the date of this document, select Vivante GPGPU cores are compatible with OpenCL Embedded Profile
version 1.1. Hardware capability deltas include:
Table 16. Vivante OpenCL embedded profile hardware
Hardware and revision
GC2000
Feature
5.1.0.rc8a
Compute Devices (GPGPU cores)
1
Compute Units per device (Shader cores)
4
Processing Elements per compute unit
4
Profile
Embedded
Preferred work-group/thread group size
16
Max count global work-items each dim
64K
Max count of work-items each dim per work-group
1K
Local Storage Registers On-chip
64
Instruction Memory
512
Texture Samplers
8 PS + 4 VS
Texture Samplers available to OCL
(HW, unlimited via SW)
4
L1 Cache Size
4 KB
L1 Cache Banks
1
L1 Cache Sets/Bank
4
L1 Cache Ways/Set
16
L1 Cache Line Size
64B
L1 Cache MC ports
1
5.1.5 Vivante OpenCL full profile hardware model
As of the date of this document, select Vivante GPGPU cores are compatible with OpenCL Full Profile version 1.2.
Hardware capability deltas are subject to change and includes:
Table 17. Vivante OpenCL full profile hardware
Hardware and revision
GC2000+
GC7000XSVX
GC7000L
i.MX SOC
i.MX 6QuadPlus, i.MX 6DualPlus
i.MX 8QuadMax
i.MX 8 MQuad
Compute Devices (GPGPU
cores)
1
1
1
Compute Units per device
(Shader cores)
4
8
1
i.MX Graphics User’s Guide, Rev. 0, 02/2018
50 NXP Semiconductors
Processing Elements per
compute unit
4
32
16
Profile
Full-Lite*
Full
Full
Preferred work-group/ thread
group size
16
32
8
Max count global work-items
each dim
(if 3D only 1 dim can be up to
4G, the others 64K)
4 G/64 K
4 G/64 K
4 G/64 K
Max count of work-items each
dim per work-group
1 K
1 K
1K
Local Storage Registers On-chip
0
2048 (32 K)
16K
Instruction Memory
I$:512/1 M
I$:512/1 M
I$:512/1 M
Texture Samplers
32
32
32
Texture Samplers available to
OCL
32
32
32
L1 Cache Size
4 KB
64 KB
16K
L1 Cache Banks
2
4
2
L1 Cache Sets/Bank
2
8
8
L1 Cache Ways/Set
16
16
8
L1 Cache Line Size
64 B
64 B
64B
L1 Cache MC ports per GPGPU
core
2
2
2
i.MX Graphics User’s Guide, Rev. 0, 02/2018
51 NXP Semiconductors
5.2 Vivante OpenCL implementation
5.2.1 OpenCL pipeline
Figure 4 Vivante OpenCL data pipeline for an OpenCL compute device
i.MX Graphics User’s Guide, Rev. 0, 02/2018
52 NXP Semiconductors
Figure 5 Vivante OpenCL compute device showing memory scheme
5.2.2 Front end
The front end passes the instructions and constant data as State Loads to the OpenCL Compute Unit (Shader)
block. State Loads program instructions and constant data and work groups initiate execution on the instructions
and the constants loaded.
5.2.3 The OpenCL compute unit
All OpenCL executions occur in this block and all work-groups in a compute unit should belong to the same kernel.
Threads from a work-group are grouped into internal “Thread-groups”. All the threads in a thread-group execute in
parallel. Barrier instruction is supported to enforce synchronization within a work-group.
The compute unit contains Local Memory and the L1 Cache and is where the Load/Store instruction to access
global memory originates. The compute unit can accommodate multiple work-groups (based on the temporary
register and local memory usage) simultaneously.
i.MX Graphics User’s Guide, Rev. 0, 02/2018
53 NXP Semiconductors
5.2.4 Memory hierarchy
Figure 6 OpenCL memory hierarchy
5.3 Optimization for OpenCL embedded profile
OpenCL EP (Embedded Profile) is basically a scaled down version of OpenCL FP(Full Profile) and thus may require
extra optimization. The guidelines below help with the optimization of Vivante OpenCL Embedded Profile GPGPU
cores.
When optimizing code on Vivante hardware, it is important to remember a few key points to get the best
performance from the hardware:
Take advantage of algorithm and data parallelism
Choose the correct execution configuration (more details below)
Overlap memory transfer from different levels of the OpenCL memory hierarchy with simultaneous
thread execution
Maximize memory bandwidth and minimize data transfers (large transfers are more beneficial than many
smaller transfers because of the impact of latency)
Maximize instruction throughput and minimize instruction count
i.MX Graphics User’s Guide, Rev. 0, 02/2018
54 NXP Semiconductors
5.3.1 Using preferred multiple of work-group size
The work-group size should be a multiple of the thread group size, otherwise some threads remain idle and the
application does not fully utilize all the compute resources. For example, if the work-group size is 8 and the Vivante
core supports 16, only half the compute resources are used. For example, in some early Vivante GPGPU revisions,
the work-group size limit is 192 and the thread group size is 16. See the Overview section on OpenCL Compatible IP
for IP-specific capabilities.
5.3.2 Using multiple work-groups of reduced size
Multiple work groups need to be set to reduce synchronization penalties. To prevent stalls at barriers, it is
recommended to have at least four (4) work-groups to keep the cores busy or as long as the number of work-
groups is greater than or equal to two (2). One work-group is very inefficient; four or more is preferred and helps
avoid latency.
5.3.3 Packing work-item data
It is important to pack data to extract the optimal performance from the SIMD ALU hardware and align the data
into a format supported by the hardware. Efficient use of the Vivante GPGPU core requires that the kernel
contains enough parallelism to fill all four vector units. Work-items in the same thread group have the same
program counter and execute the same instruction for each cycle. Whenever possible, pack together work-items
that follow the same direction (e.g., on branches) since the granularity is very close and there may be less
divergence and higher performance. If each work-item handles less than or equal to 8 bytes, it is better to combine
two or more work-items into one to improve utilization of the SIMD ALU.
5.3.4 Improving locality
If the input data is an array-of-structs, and each work-item needs to access only a small part of the struct across
many array elements at different stages, it may be better to convert and use a struct-of-arrays or several different
arrays as input to improve data locality and avoid cache thrashing.
If each work-item needs to process a row of data without sharing any data with other work-items, it is better to
check if the algorithm can be converted to make each work-item process a column of data so that data accessed by
adjacent work-items can share the same cache lines.
5.3.5 Minimizing use of 1 KB local memory
The OpenCL Embedded Profile specification defines the minimum requirement for local memory to be 1KB to pass
conformance testing. Based on algorithm analysis and profiling different image and computer vision algorithms,
we found that a 1KB local memory size was too small to benefit those algorithms. In most instances, those
algorithms actually slowed down when using 1KB local memory. To increase performance, we recommend not
using local memory since it is more efficient to transfer larger chunks of data from system memory to keep the
OpenCL pipeline full.
Note: if local memory type is CL_GLOBAL, the local memory is emulated using global memory, and the
performance is the same as global memory. There is extra overhead on data copy from global to local, which slows
down the performance.
5.3.6 Using 16 byte memory Read/Write size
When accessing memory, it is important to minimize the read/write count and to ensure L1 cache utilization is high
to reduce outstanding read/write requests. Since the internal GPGPU read-write-request queue has a limit, if the
queue and L1 cache are filled, then the GPGPU remains idle.
i.MX Graphics User’s Guide, Rev. 0, 02/2018
55 NXP Semiconductors
5.3.7 Useing _RTZ rounding mode
Wherever possible, use _RTZ (round to zero) since it is natively supported in hardware with one instruction.
Support for _RTE (round to nearest even) is optional in OpenCL EP and is only supported in Vivante GPGPU EP
hardware from 2013. This function is handled in software for EP cores if necessary.
5.3.8 Using native functions
5.3.8.1 Using native_function() for increased performance
There are two types of runtime math libraries available to developers. Native_function() and regular function().
Function(): slower, computationally expensive, higher instruction count, and greater accuracy
Native_function(): faster, computationally inexpensive, lower instruction count (sometimes reduced to
one instruction), and lower accuracy.
If accuracy is not important but speed/performance is, use native math functions that map directly to the
Vivante GPGPU hardware.
For image processing computations that do not require high accuracy, use native instructions to significantly lower
the instruction count and speed up performance. Based on actual analysis and performance profiling with the
Vivante GPGPU, we found that using native_function() instructions such as sin, cos, etc., reduces the instruction
count from many instructions to one or two instructions. Use of native functions also sped performance by 3x-10x.
5.3.8.2 Using native_divide and native_reciprocal for faster floating point calculations
There are two use cases for floating point division which a user can select:
Normal use of the division operator ( / ) in OpenCL has high precision and covers all corner use cases. This
operator generates more instructions and runs slower.
Native Divide: this use case uses the built-in function native_divide or native_reciprocal, which uses what
the hardware supports. The Vivante OpenCL compiler generates one or two instructions for each native_divide or
native_reciprocal instruction. If there are no corner use cases in applications, such as NaN, INF, or (2^127) /
(2^127), it is better to use native_divide since it is faster.
5.3.8.3 Using compile option for native functions
Both the function() and native_function() methods are supported in the Vivante GPGPUs, so it is up to the
developer to use whichever method makes sense for their application. If the OpenCL program uses the standard
division operator and a developer wants to use native_divide or native_reciprocal without modifying their
program, the Vivante OpenCL compiler has a simple option “-cl-fast-relaxed-math” that uses native built-in
functions during compilation.
5.3.9 Using buffers instead of images
For the following image functions, it is better to use buffers instead of images.
read_image{f/i/ui/h}
write_image{f/i/ui/h}
Write_image* functions are implemented by software; it is better to use buffers to reduce the additional overhead
involved in checking for size, format, etc. Since a few formats are not supported by Vivante GPGPU hardware,
some built-in read_image() functions are implemented in software. The software implementation uses more
instructions with many steps of “condition” checking. To improve performance, we recommend using buffers since
it reduces instruction count.
i.MX Graphics User’s Guide, Rev. 0, 02/2018
56 NXP Semiconductors
5.4 OpenCL Debug messages
When writing OpenCL applications, it is important to check the code returned by the API. Since the return codes
specified in the OpenCL specification may not be descriptive enough to isolate where the problem is located, the
Vivante OpenCL driver provides an environment variable, VIV_DEBUG to help debug problems. When VIV_DEBUG
is set to -MSG_LEVEL:ERROR, the Vivante OpenCL driver prints onscreen error messages as well as return the error
code to the caller.
The following error code descriptions and suggested workarounds are provided.
5.4.1 OCL-007005: (clCreateKernel) cannot link kernel
One of the following “Not Enough” messages usually precedes this message. Issuer indicates the real reason for
the problem which may be:
Not Enough Register Memory (constant or temp)
Not Enough Instruction Memory
5.4.2 Not enough register memory
Local variables, including arrays, are implemented using temp registers. If an array is larger than then number of
available temp registers, a link-time failure occurs.
WORKAROUNDS:
1. If the array size is more than 64, use an array address to force the compiler to use private memory instead
of temp registers.
2. If there are many variables, use variable addresses to force the compiler to use private memory to reduce
register usage.
Note that there is performance degradation when using private memory instead of registers. It is better to change
the algorithm to use a smaller array or less variables.
5.4.3 Not enough instruction memory
WORKAROUNDS:
1. Replace sin/cos/tan/divide/powr/exp/exp2/exp10/log/log2/log10/sqrt/rsqrt/recip with
native_sin/native_divide, etc.
2. Convert unrolled-loops back to loops.
3. Use buffer instead of image for write, and for reads which are not linear-filtered.
4. If the program is just too long, it should be split into two or more programs with intermediate data saved
from one program to next.
5.4.4 GlobalWorkSize over hardware limit
WORKAROUND:
1. Split one clEnqueueNDRangeKernel into several instances. Change the kernel source to compute real
global/local/group ID using offset as a parameter.
2. Convert one dimension to two dimensions, or two dimensions to three. For example, one dimension of
1M work-items can be converted to a GlobalWorkSize of 64K x16 work-items. The kernel function needs
modification to reflect the change of dimension.
i.MX Graphics User’s Guide, Rev. 0, 02/2018
57 NXP Semiconductors
Chapter 6 OpenVX Introduction
6.1 Overview
OpenVX is a low-level programming framework domain to enable software developers to efficiently access
computer vision hardware acceleration with both functional and performance portability. OpenVX has been
designed to support modern hardware architectures, such as mobile and embedded SoCs as well as desktop
systems. Many of these systems are parallel and heterogeneous: containing multiple processor types including
multi-core CPUs, DSP subsystems, GPUs, dedicated vision computing fabrics as well as hardwired functionality.
Additionally, vision system memory hierarchies can often be complex, distributed, and not fully coherent. OpenVX
is designed to maximize functional and performance portability across these diverse hardware platforms, providing
a computer vision framework that efficiently addresses current and future hardware architectures with minimal
impact on applications.
OpenVX defines a C Application Programming Interface (API) for building, verifying, and coordinating graph
execution, as well as for accessing memory objects. The graph abstraction enables OpenVX implementers to
optimize the execution of the graph for the underlying acceleration architecture.
OpenVX also defines the vxu utility library, which exposes each OpenVX predefined function as a directly callable C
function, without the need for first creating a graph. Applications built using the vxu library do not benefit from the
optimizations enabled by graphs; however, the vxu library can be useful as the simplest way to use OpenVX and as
first step in porting existing vision applications.
For more details of programming with OpenVX, see the following specification from Khronos Group,
OpenVX 1.0.1 specification (https://www.khronos.org/registry/vx ).
6.2 Designing framework of OpenVX
6.2.1 Software landscape
OpenVX (OVX) is intended to be used either directly by applications or as the acceleration layer for higher-level
vision frameworks, engines or platform APIs. Vivante software includes VX (Vision Imaging Accelleration) control
mechanisms for hardware accelerated vision imaging, therefy allowing the user to implement customized
applications and drivers using the Vivantespecific Vivante VX API (Application Programming Interface). This API
provides programmable user kernel extensions for OpenCL 1.2 and provides additional Vision functionality to
supplement those currently available with OpenVX 1.0.1 open standard from the Khronos group.
Figure 7 OVX usage overview
6.2.2 Object-oriented behaviors
OpenVX objects are both strongly typed at compile-time for safety critical applications and are strongly typed at
i.MX Graphics User’s Guide, Rev. 0, 02/2018
58 NXP Semiconductors
run-time for dynamic applications.
The objects of OVX framework are:
Context, The OpenVX context is the object domain for all OpenVX objects.
Kernel, A Kernel in OpenVX is the abstract representation of a computer vision function, such as a “Sobel
Gradient” or “Lucas Kanade Feature Tracking”.
Parameter, an abstract input, output, or bidirectional data object passed to a computer vision function.
Node, A node is an instance of a kernel that will be paired with a specific set of references (the parameters).
Graph, A set of nodes connected in a directed (only goes one-way) acyclic (does not loop back) fashion.
OpenVX Data Objects:
Array, An opaque array object that could be an array of primitive data types or an array of structures.
Convolution, An opaque object that contains MxN matrix of vx_int16 values. Also contains a scaling factor
for normalization.
Delay, An opaque object that contains a manually controlled, temporally-delayed list of objects.
Distribution, An opaque object that contains a frequency distribution (e.g., a histogram).
Image, An opaque image object that may be some format in vx_df_image_e.
LUT, An opaque lookup table object used with vxTableLookupNode and vxuTableLookup
Matrix, An opaque object that contains MxN matrix of some scalar values.
Pyramid, An opaque object that contains multiple levels of scaled vx_image objects.
Remap, An opaque object that contains the map of source points to destination points used to transform
images.
Scalar, An opaque object that contains a single primitive data type.
Threshold, An opaque object that contains the thresholding configuration.
Error objects of OVX:
Error objects are specialized objects that may be returned from other object creator functions when serious
platform issue occur (i.e., out of memory or out of handles). These can be checked at the time of creation of these
objects, but checking also may be put-off until usage in other APIs or verification time, in which case, the
implementation must return appropriate errors to indicate that an invalid object type was used.
6.2.3 Graphs concepts
The graph is the central computation concept of OpenVX. The purpose of using graphs to express the Computer
Vision problem is to allow for the possibility of any implementation to maximize its optimization potential because
all the operations of the graph and its dependencies are known ahead of time, before the graph is processed.
Graphs are composed of one or more nodes that are added to the graph through node creation functions. Graphs
in OpenVX must be created ahead of processing time and verified by the implementation, after which they can be
processed as many times as needed.
There are several nodes in a graph, which are responsible for independent computation. One node can be linked to
another by data dependencies.
6.2.4 User kernels
OpenVX allows users to define new functions that can be excuted as Nodes from inside Graph or are Graph
internal. Users will benefit from this mode,
Exploiting
Allow componentized functions to be reused elsewhere in OpenVX
Formalize strict verification requirements (i.e., Contract Programming).
i.MX Graphics User’s Guide, Rev. 0, 02/2018
59 NXP Semiconductors
Figure 8 Graph and user kernel usage
6.3 OpenVX extension implementation
VeriSilicon’s VX Extensions for Vision Imaging provide additional functionality for Vision Image processing beyond
the functions provided through the Khronos Group OpenVX API version 1.0.1. These enhancements take
advantage of the enhanced Vision capabilities available in VeriSilicon’s Vision-capable hardware. VeriSilicon
software provides a set of extensions which interface with OpenCL 1.2 and support higher level C language
programming of VeriSilicon’s custom EVIS (Enhanced Vision Instruction Set).
The VeriSilicon VX extension and enhancements includes three major components:
An API level interface to the EVIS (Enhanced Vision Instruction Set),
Extended C language features for Vision Processing,
Supported for a subset of Vision-compatible OpenCL built-in functions.
6.3.1 Hardware requirements
Initial VeriSilicon cores with Vision Imaging hardware capabilities include:
GC7000XSVX (available in i.MX 8QuadMax)
6.3.2 EVIS instruction interface
Vivante’s Vision Imaging capable IP have an Enhanced Vision Instruction Set (EVIS), which enhances the ability of
the GPU or VIP (Vision Image Processor) to process complex vision operations. A single EVIS instruction can do a
task which may require tens or even hundreds of normal ISA instructions to finish.
Table 18 shows the instructions supported as Intrinsic calls.
6.3.3 Extended language features
Vivante’s OpenVX C programming Language corresponds closely to the OpenCL C programming language.
Vivante’s C language extensions for OpenVX C share many language facilities with OpenCL C 1.2.
However, it can be considered a subset of OpenCL C 1.2, as it does not include OCL features which are
useless for OpenVX and other Vision Imaging applications.
Vivante’s OpenVX C includes specific language facilities like Vision built-ins and data types specific for
OpenVX.
i.MX Graphics User’s Guide, Rev. 0, 02/2018
60 NXP Semiconductors
Table 18. OPCODE EVIS instructions supported as intrinsic calls
EVIS OP_CODE
Description
Supported by
Vivante VX
ABS_DIFF
Absolute difference between two values
Y
IADD
Adds two or three integer values
Y
IACC_SQ
Squares a value and adds it to an accumulator
Y
LERP
Linear interpolation between two values
Y
FILTER
Performs a filter on a 3x3 block
Y
MAG_PHASE
Computes magnitude and phase of 2 packed data values
Y
MUL_SHIFT
Multiples two 8-or 16-bit integers and shifts
Y
DP16X1
1 Dot Product from 2 16 component values
Y
DP8X2
2 Dot Products from 2 8 component values
Y
DP4X4
4 Dot Products from 2 4 component values
Y
DP2X8
8 Dot Products from 2 2 component values
Y
CLAMP
Clamps up to 16 values to a max or min value
Y
BI_LINEAR
Computes a bi0linear interpolation of 4 pixel values
Y
SELECT_ADD
Adds a pixel value or increments a counter inside bins
Y
ATOMIC_ADD
Adds a valid atomically to an address
Y
BIT_EXTRACT
Extracts up to 8 bitfields from a packed stream
Y
BIT_REPLACE
Replaces up to 8 bitfields from a packed stream
Y
DP32X1
1 Dot Product from 2 32 component values
Y
DP16X2
2 Dot Products from 2 16 component values
Y
DP8X4
4 Dot Products from 2 8 component values
Y
DP4X8
8 Dot Products from 2 4 component values
Y
DP2X16
16 Dot Products from 2 2 component values
Y
6.3.4 Packed types
Vivante’s OpenCL compiler implements OpenCL C signed and unsigned char and short types in an unpacked
format, such that a normal char4 occupies 128 bits (4 32-bit registers). This is undesirable for Vision applications,
where packed data is the “natural” layout for almost all operations. To fully utilize the computing power of EVIS
instructions, Vivante VX includes additional packed types, which can be identified by their vxc_ prefix.
/* packed char2/4/8/16 */
typedef _viv_char2_packed vxc_char2;
typedef _viv_char4_packed vxc_char4;
typedef _viv_char8_packed vxc_char8;
typedef _viv_char16_packed vxc_char16;
/* packed uchar2/4/8/16 */
typedef _viv_uchar2_packed vxc_uchar2;
typedef _viv_uchar4_packed vxc_uchar4;
typedef _viv_uchar8_packed vxc_uchar8;
typedef _viv_uchar16_packed vxc_uchar16;
/* packed short2/4/8 */
typedef _viv_short2_packed vxc_short2;
typedef _viv_short4_packed vxc_short4;
i.MX Graphics User’s Guide, Rev. 0, 02/2018
61 NXP Semiconductors
typedef _viv_short8_packed vxc_short8;
/* packed ushort2/4/8 */
typedef _viv_ushort2_packed vxc_ushort2;
typedef _viv_ushort4_packed vxc_ushort4;
typedef _viv_ushort8_packed vxc_ushort8;
6.3.5 Initializing constants on load
Constant data in OpenCL requires compile-time initialization. There is also a need to initialize the data when the
kernel is loaded/run, so that the application can control the behavior of a program by changing its constants at
load-time. The VeriSilicon VX extended keyword _viv_uniform can be used to define load-time initialization
constant data,
_viv_uniform vxc_512bits u512;
An application using VeriSilicon VX needs to set the proper values for _viv_uniform before the kernel program is
run.
6.3.6 Inline assembly
A packed type cannot be used as an unpacked type in expressions or built-in functions. The programmer needs to
convert packed type data to unpacked type data in order to perform these operations. The conversion negatively
impacts performance in terms of both instruction count and register usage, so it is desirable to perform operations
directly on packed data whenever possible. The Vivante Vision compiler accepts inline assembly for a wide range of
operations to speed up packed data calculations.
For example, to add two packed char16 data, the programmer can use following inline assembly:
vxc_uchar16 a, b, c;
vxc_short8 b;
_viv_uniform vxc_512bits u512;
...
_viv_asm(ADD, c, a, b); /* c = a + b; */
where the syntax of inline assembly is:
_viv_asm(
OP_CODE,
dest,
source0,
source1
);
Table 19 lists the standard shader instructions that operate on packed data and are supported through inline
assembly, keyword _viv_asm.
Table 19. OPCODES IR instructions supported by inline assembly
IR OP_CODE Instruction
Description
Supported by Vivante VX
ABS
Absolute value
Y
ADD
Add
Y
ADD_SAT
Integer add with saturation
Y
AND_BITWISE
Bitwise AND
Y
BIT_REVERSAL
Integer bit-wise reversal
ES31
BITEXTRACT
Extract Bits from src to dest
ES31
i.MX Graphics User’s Guide, Rev. 0, 02/2018
62 NXP Semiconductors
BITINSERT
Bit replacement
ES31
BITSEL
Bitwise Select
Y
BYTE_REVERSAL
Integer byte-wise reversal
ES31
CLAMP0MAX
clamp0max dest, value, max
Y
CMP
Compare each component
Y
CONV
Convert
Y
DIV
Divide
Y
FINDLSB
Find least significant bit
ES31
FINDMSB
Find most significant bit
ES31
LEADZERO
Detect Leading Zero
Y
LSHIFT
Left Shifter
Y
MADSAT
Integer multiple and add with saturation
Y
MOD
Modulus
Y
MOV
Move
Y
MUL
Multiply
Y
MULHI
Integer only
Y
MULSAT
Integer multiply with saturation
Y
NEG
neg(a) is similar to (0 - (a))
Y
NOT_BITWISE
Bitwise NOT
Y
OR_BITWISE
Bitwise OR
Y
POPCOUNT
Population Count
ES31/OCL1.2
ROTATE
Rotate
Y
RSHIFT
Right Shifter
Y
SUB
Substract
Y
SUBSAT
Integer subtraction with saturation
Y
XOR_BITWISE
Bitwise XOR
Y
*ES31 = Supported by VivanteVX, but may not be needed for Vision processing
6.4 OpenCL functions compatible with Vivante vision
Vivante’s VX extensions for Vision Image processing support most of the OpenCL 1.2 built-in functions for normal
OCL data types. Packed types are not supported in these built-in functions.
For image read/write functions, only sample-less 1D/1D array/2D image read/write functions are supported.
6.4.1 Read_Imagef,i,ui
/* OCL image builtins can be used in VX kernel */
float4 read_imagef (image2d_t image, int2 coord);
int4 read_imagei (image2d_t image, int2 coord);
uint4 read_imageui (image2d_t image, int2 coord);
float4 read_imagef (image1d_t image, int coord);
int4 read_imagei (image1d_t image, int coord);
i.MX Graphics User’s Guide, Rev. 0, 02/2018
63 NXP Semiconductors
uint4 read_imageui (image1d_t image, int coord);
float4 read_imagef (image1d_array_t image, int2 coord);
int4 read_imagei (image1d_array_t image, int2 coord);
uint4 read_imageui (image1d_array_t image, int2 coord);
6.4.2 Write_Imagef,i,ui
void write_imagef (image2d_t image, int2 coord, float4 color);
void write_imagei (image2d_t image, int2 coord, int4 color);
void write_imageui (image2d_t image, int2 coord, uint4 color);
void write_imagef (image1d_t image, int coord, float4 color);
void write_imagei (image1d_t image, int coord, int4 color);
void write_imageui (image1d_t image, int coord, uint4 color);
void write_imagef (image1d_array_t image, int2 coord, float4 color);
void write_imagei (image1d_array_t image, int2 coord, int4 color);
void write_imageui (image1d_array_t image, int2 coord, uint4 color)
6.4.3 Query Image Dimensions
int2 get_image_dim (image2d_t image);
size_t get_image_array_size(image1d_array_t image);
/* Built-in Image Query Functions */
int get_image_width (image1d_t image);
int get_image_width (image2d_t image);
int get_image_width (image1d_array_t image);
int get_image_height (image2d_t image);
6.4.4 Channel Data Types Supported
/* Return the channel data type. Valid values are:
* CLK_SNORM_INT8
* CLK_SNORM_INT16
* CLK_UNORM_INT8
* CLK_UNORM_INT16
* CLK_UNORM_SHORT_565
* CLK_UNORM_SHORT_555
* CLK_UNORM_SHORT_101010
* CLK_SIGNED_INT8
* CLK_SIGNED_INT16
* CLK_SIGNED_INT32
* CLK_UNSIGNED_INT8
* CLK_UNSIGNED_INT16
* CLK_UNSIGNED_INT32
* CLK_HALF_FLOAT
* CLK_FLOAT
*/
int get_image_channel_data_type (image1d_t image);
int get_image_channel_data_type (image2d_t image);
int get_image_channel_data_type (image1d_array_t image);
6.4.5 Image Channel Orders Supported
/* Return the image channel order. Valid values are:
i.MX Graphics User’s Guide, Rev. 0, 02/2018
64 NXP Semiconductors
* CLK_A
* CLK_R
* CLK_Rx
* CLK_RG
* CLK_RGx
* CLK_RA
* CLK_RGB
* CLK_RGBx
* CLK_RGBA
* CLK_ARGB
* CLK_BGRA
* CLK_INTENSITY
* CLK_LUMINANCE
*/
int get_image_channel_order (image1d_t image);
int get_image_channel_order (image2d_t image);
int get_image_channel_order (image1d_array_t image);
i.MX Graphics User’s Guide, Rev. 0, 02/2018
65 NXP Semiconductors
Chapter 7 Vulkan
7.1 OverView
Vulkan is a new generation graphics and compute API that provides high-efficiency, cross-platform access to
modern GPUs used in a wide variety of devices from PCs and consoles to mobile phones and embedded platforms.
Vulkan defines as an API (Application Programming Interface) for graphics and compute hardware. The API consists
of many commands that allow a programmer to specify shader programs, compute kernels, objects, and
operations involved in producing high-quality graphical images, specifically color images of three-dimensional
objects.
To the programmer, Vulkan is a set of commands that allow the specification of shader programs or shaders,
kernels, data used by kernels or shaders, and state controlling aspects of Vulkan outside the scope of shaders.
Typically, the data represents geometry in two or three dimensions and texture images, while the shaders and
kernels control the processing of the data, rasterization of the geometry, and the lighting and shading of fragments
generated by rasterization, resulting in the rendering of geometry into the framebuffer.
A typical Vulkan program begins with platform-specific calls to open a window or otherwise prepare a display
device onto which the program will draw. Then, calls are made to open queues to which command buffers are
submitted. The command buffers contain lists of commands which will be executed by the underlying hardware.
The application can also allocate device memory, associate resources with memory and refer to these resources
from within command buffers. Drawing commands cause application-defined shader programs to be invoked,
which can then consume the data in the resources and use them to produce graphical images. To display the
resulting images, further platform-specific commands are made to transfer the resulting image to a display device
or window.
For more details of programming with Vulkan, refer to the following specification from Khronos Group.
https://www.khronos.org/registry/vulkan/
7.2 Vivante Extension Support for Vulkan
The following table includes a list of all current Vulkan extensions and indicates their support level in Vivante
software.
(list from https://www.khronos.org/registry/vulkan/ as of 5/24/2017)
Note: This list does not include unsupported vendor specific extensions.
Table 20. Vulkan extension
Vulkan Extension Name
SW 6.2.x for Vulkan 1.0
VK_KHR_android_surface
YES
VK_KHR_descriptor_update_template
VK_KHR_display
YES
VK_KHR_display_swapchain
YES
VK_KHR_get_physical_device_properties2
VK_KHR_get_surface_capabilities2
VK_KHR_incremental_present
VK_KHR_maintenance1
VK_KHR_mir_surface
VK_KHR_push_descriptor
VK_KHR_sampler_mirror_clamp_to_edge
VK_KHR_shader_draw_parameters
VK_KHR_shared_presentable_image
VK_KHR_surface
YES
i.MX Graphics User’s Guide, Rev. 0, 02/2018
66 NXP Semiconductors
VK_KHR_swapchain
YES
VK_KHR_wayland_surface
YES
Vulkan Extension Name
SW 6.2.x for Vulkan 1.0
VK_KHR_win32_surface
YES
VK_KHR_xcb_surface
VK_KHR_xlib_surface
EXT Extensions (Multivendor)
VK_EXT_acquire_xlib_display
VK_EXT_debug_marker
VK_EXT_debug_report
YES
VK_KHR_get_surface_capabilities2
VK_KHR_incremental_present
VK_KHR_maintenance1
VK_EXT_direct_mode_display
VK_EXT_discard_rectangles
VK_EXT_display_control
VK_EXT_display_surface_counter
VK_EXT_hdr_metadata
VK_EXT_shader_subgroup_ballot
VK_EXT_shader_subgroup_vote
VK_EXT_swapchain_colorspace
VK_EXT_validation_flags
GOOGLE Extensions (Google, Inc.)
VK_GOOGLE_display_timing
KHX Extensions (full vendor description unavailable)
VK_KHX_device_group
VK_KHX_device_group_creation
VK_KHX_external_memory
VK_KHX_external_memory_capabilities
VK_KHX_external_memory_fd
VK_KHX_external_memory_win32
VK_KHX_external_semaphore
VK_KHX_external_semaphore_capabilities
VK_KHX_external_semaphore_fd
VK_KHX_external_semaphore_win32
VK_KHX_multiview
VK_KHX_win32_keyed_mutex
i.MX Graphics User’s Guide, Rev. 0, 02/2018
67 NXP Semiconductors
Chapter 8 Multiple GPUs and Virtualization
8.1 Overview
Vivante multi-GPU implementations provide a variety of capabilities which can be managed through hardware
and software controls. This chapter intends to summarize the software controls used for Vivante multi-GPU IP
implementations.
Multi-GPU feature can be enabled with dual GC7000XSVX on i.MX 8QuadMax and the derived devices.
8.2 Multi-GPU configurations
Vivante Multi-GPU IP may be configured into one of the following behavior model through SW:
Combined Mode where two (or more) GPU cores in the multi-GPU design behave in concert. Driver presents
multi-GPU to SW application as a single logical GPU. The multiple GPUs work in the same virtual address space
and share the same MMU page table. The multiple GPUs fetch and excute a shared Command Buffer.
Independent Mode where each GPU in the multi-GPU design performs independently. The multiple GPUs work
in different virtual address spaces but share the same MMU page table. Each GPU core fetches and excutes its
own Command Buffer. This enables different SW applications to run simultaneously on different GPU cores.
Note, OpenCL API allows application to handle the multi-GPU Independent Mode directly, as each GPU core in a
multi-GPU design represents an independent OpenCL Compute Device. OpenCL driver does not support the
multi-GPU combined mode.
8.3 GPU affinity configuration
In the multi-GPU Independent Mode, application can specify to run on a specific GPU among the multiple GPUs
through an environment variable VIV_MGPU_AFFINITY. Once an application’s GPU affinity is specified, the
application will only run on the specified GPU and will not migrate to other GPUs even if those GPUs are idle.
VIV_MGPU_AFFINITY is the environment variable to control the application GPU affinity on multi-GPU platform.
The client drivers will assume they are using a standalone GPU through a gcoHARDWARE object no matter how
this variable is set. The possible values for the environment variable VIV_MGPU_AFFINITY include:
Not defined or
Defined as "0" gcoHARDWARE objects work in gcvMULTI_GPU_COMBINED mode (default)
"1:0" gcoHARDWARE objects work in gcvMULTI_GPU_INDEPENDENT mode and GPU0 is used
"1:1" gcoHARDWARE objects work in gcvMULTI_GPU_INDEPENDENT mode and GPU1 is used
On a single GPU device, setting VIV_MGPU_AFFINITY to 0 or 1 does not make any difference as all application
processes/threads are bound to GPU0. But the application will fail the GPU context initialization if
VIV_MGPU_AFFINITY is set to "1:1" (driver reports error).
8.4 OpenCL on multi-GPU device
OpenCL driver works in multi-GPU Independent Mode only. In this configuration, multiple GPUs in the device
operate as individual OpenCL Compute Devices. The OpenCL application is responsible to assign and dispatch the
compute tasks to each GPU (Compute Device).
The following OpenCL APIs return the list of compute devices available on a platform, and the device
information.
cl_int clGetDeviceIDs (cl_platform_id platform, cl_device_type device_type, cl_uint num_entries,
cl_device_id *devices, cl_uint *num_devices)
cl_int clGetDeviceInfo (cl_device_id device, cl_device_info param_name, size_t param_value_size,
void *param_value, size_t *param_value_size_ret)
i.MX Graphics User’s Guide, Rev. 0, 02/2018
68 NXP Semiconductors
8.5 GPU virtualization configuration
Multi-GPU also can be used on different OS systems as independent mode separately, this can be configured by
overriding the irq availability n DTS entry for different OS implementation, in arch/arm64/boot/dts/freescale/fsl-
imx8qmxxx.dts.
Guest OS 1 (GPU0 only)
&gpu_3d1 {
status = "disable";
};
Guest OS 2 (GPU1 only)
&gpu_3d0 {
status = "disable";
};
i.MX Graphics User’s Guide, Rev. 0, 02/2018
69 NXP Semiconductors
Chapter 9 G2D compositor on Weston
9.1 Overview
Wayland is intended as a simpler replacement for X, easier to develop and maintain. GNOME and KDE are
expected to be ported to it.
Wayland is a protocol for a compositor to talk to its clients as well as a C library implementation of that protocol.
The compositor can be a standalone display server running on Linux kernel modesetting and evdev input devices,
an X application, or a wayland client itself. The clients can be traditional applications, X servers (rootless or
fullscreen) or other display servers.
Part of the Wayland project is also the Weston reference implementation of a Wayland compositor. Weston can
run as an X client or under Linux KMS and ships with a few demo clients. The Weston compositor is a minimal and
fast compositor and is suitable for many embedded and mobile use cases.
This chapter describes how to enable Weston accelerated by G2D APIS. G2D compositor can increase system
bandwidth utilization, so the performance was better than GL compositor in the complex environment, but it still
doesn’t support display rotation and EXT_RESOLVE feature.
9.2 Enabe G2D compositor
9.2.1 Open the file: /etc/default/Weston in the Release image.
# cat /etc/default/weston
#!/bin/sh
OPTARGS="--xwayland"
9.2.2 Add the parameters in the OPTARGS, and disable EXT_RESOLVE feature in compositor.
OPTARGS="xwayland use-g2d=1"
GPU_VIV_EXT_RESOLVE=0
9.2.3 Restart Weston by this command:
# systemctl restart weston
9.2.4 Disable EXT_RESOLVE feature before running the client application.
# export GPU_VIV_EXT_RESOLVE=0
# Weston-simple-egl
i.MX Graphics User’s Guide, Rev. 0, 02/2018
70 NXP Semiconductors
Chapter 10 XServer Video Driver
10.1 EXA driver
XServer video driver is designed to help XServer to render desktop onto a screen. It manages the display driver,
and provides rendering acceleration and other display features, such as rotation and multiple display methods. The
video driver implements XServer EXcellent Architecture (EXA).
10.1.1 EXA driver options
These options are used in the configuration file /etc/X11/xorg.conf:
Section "Device"
Identifier "i.MX Accelerated Framebuffer Device"
Driver "vivante"
Option "fbdev" "/dev/fb1"
Option "vivante_fbdev" "/dev/fb1"
Option "SyncDraw" "false"
EndSection
Table 21. EXA driver options
Option
Meaning
Default Value
Comment
ShadowFB
Whether to enable the
shadow frame buffer (FB).
False
Deprecated technology. It rotates the FB. If
it is enabled, acceleration is disabled.
Rotate
Rotation of FB.
<null>
Deprecated technology. It can be CW/CCW/
UD. If it is set to one of these values,
Shadow FB is automatically enabled.
Rotation cannot change after XServer is
started.
NoAccel
Disables EXA acceleration.
False
If it is set to True, the EXA functions are not
accelerated by the GPU.
VivCacheMem
Pixmap created by GPU is
generally cacheable.
True
Normal Pixmaps are created cacheable.
Special Pixmaps used for EGL are still non-
cacheable.
SyncDraw
Wait for the GPU to
complete for every single
drawing.
False
This affects the performance if it is set to
True.
10.1.2 24 bpp pixmap
The GPU can only accelerate a 16 bpp or 32 bpp pixmap. For a 24 bpp screen, a 32 bpp buffer is actually reserved.
10.1.3 Shared pixmap extension
The Shared Pixmap Extension (SHM) pixmap will be described in next release.
i.MX Graphics User’s Guide, Rev. 0, 02/2018
71 NXP Semiconductors
10.1.4 How to disable XRandR
For an embedded device that does not support XRandR (for which the memory can be reduced), set
gEnableXRandR to False in vivante_fbdev_driver.c.
10.1.5 Cursor
Hardware IPU does not provide a hardware cursor.
10.1.6 DRI
DRI is designed to accelerate OpenGL rendering. It enables the GPU direct render to the on-screen buffer. Due to
the lack of hard cursor support, and because often the window location is not well aligned, the GPU cannot render
to screen directly. Therefore, DRI is not fully used.
DRI is supported in this video driver. DRI2 or DRI3 is not supported.
10.1.7 Tearing
XServer (and early Microsoft Windows OS) does not support double buffering for the screen. There is a copy from
off-screen buffer to target on-screen area (or direct rendering to on-screen). The operation cannot be completed
in the blank time of the display, and the IPU cannot provide an ideal VSYNC signal. Therefore, there is tearing.
To remove tearing, a GLES compositor is needed. This tearing free feature will be described in next release.
10.2 XRandR
This video driver supports XRandR.
The X Resize, Rotate and Reflect Extension (RandR) is an X Window System extension, which allows clients to
dynamically resize, rotate, and reflect the root window of a screen (en.wikipedia.org/wiki/Xrandr).
10.2.1 Useful commands
If the display supports multiple resolution types, use the following commands for a query:
root@imx6qsabresd:~# export DISPLAY=:0.0
root@imx6qsabresd:~# xrandr
Screen 0: minimum 240 x 240, current 1920 x 1080, maximum 8192 x 8192
DISP3 BG connected 1920x1080+0+0 (normal left inverted right x axis y axis) 0mm x 0mm
S:1920x1080p-50 50.0*
S:1920x1080p-60 60.0
S:1280x720p-50 50.0
S:1280x720p-60 60.0
S:720x576p-50 50.0
S:720x480p-60 59.9
V:640x480p-60 60.0
S:640x480p-60 59.9
If using the console serial port for the command line interface, the DISPLAY environment variable is not configured
by default and the xrandr command fails. The solution is to set the DISPLAY environment variable. (Reference: see
manpage for X)
root@imx6qsabresd:~# xrandr
Can't open display
root@imx6qsabresd:~# echo $DISPLAY
i.MX Graphics User’s Guide, Rev. 0, 02/2018
72 NXP Semiconductors
root@imx6qsabresd:~# export DISPLAY=:0.0
root@imx6qsabresd:~# xrandr
Screen 0: minimum 240 x 240, current 1024 x 768, maximum 8192 x 8192
DISP4 BG - DI1 connected 1024x768+0+0 (normal left inverted right x axis y axis) 0mm x
0mm
U:1024x768p-60 60.0*+
Change the resolution:
root@imx6qsabresd:~# xrandr -s 1920x1080
Figure 9 Changing the resolution
i.MX Graphics User’s Guide, Rev. 0, 02/2018
73 NXP Semiconductors
Rotate the screen:
root@imx6qsabresd:~# xrandr -o left:
Figure 10 Rotating the screen
i.MX Graphics User’s Guide, Rev. 0, 02/2018
74 NXP Semiconductors
root@imx6qsabresd:~# xrandr -o right:
Figure 11 Rotating the screen
i.MX Graphics User’s Guide, Rev. 0, 02/2018
75 NXP Semiconductors
root@imx6qsabresd:~# xrandr -o inverted:
Figure 12 Rotating the screen
i.MX Graphics User’s Guide, Rev. 0, 02/2018
76 NXP Semiconductors
Reflect the screen:
root@imx6qsabresd:~# xrandr -x
Figure 13 Reflecting the screen
i.MX Graphics User’s Guide, Rev. 0, 02/2018
77 NXP Semiconductors
root@imx6qsabresd:~# xrandr -y
Figure 14 Reflecting the screen
i.MX Graphics User’s Guide, Rev. 0, 02/2018
78 NXP Semiconductors
Restore to normal state:
root@imx6qsabresd:~# xrandr -o normal:
Figure 15 Restoring to normal state
10.2.2 Rendering the desktop on overlay
/dev/fb1 is the overlay device on the same screen as /dev/fb0; and /dev/fb3 is the overlay of /dev/fb2. Use
xorg.conf to specify fb1 or fb3:
Section "Device"
Identifier "i.MX Accelerated Framebuffer Device"
Driver "vivante"
Option "fbdev" "/dev/fb1"
Option "vivante_fbdev" "/dev/fb1"
EndSection
After rebooting the system, the desktop is rendered on the overlay:
i.MX Graphics User’s Guide, Rev. 0, 02/2018
79 NXP Semiconductors
Figure 16 Rendering the desktop on overlay
If the size is too small (240x240), XRandR can be used to define a new mode.
1. Get the output name:
root@imx6qsabresd:~# xrandr
Screen 0: minimum 240 x 240, current 240 x 320, maximum 8192 x 8192
DISP4 FG connected 240x320+0+0 (normal left inverted right x axis y axis) 0mm x 0mm
U:240x320p-60 60.0*
2. Define a new mode:
root@imx6qsabresd:~# xrandr --newmode "640x480R" 23.50 640 688 720 800 480 483 487 494 +hsync -
vsync
3. Add the newly created mode:
root@imx6qsabresd:~# xrandr --addmode "DISP4 FG" 640x480R
4. Check the modes:
root@imx6qsabresd:~# xrandr
Screen 0: minimum 240 x 240, current 240 x 320, maximum 8192 x 8192
DISP4 FG connected 240x320+0+0 (normal left inverted right x axis y axis) 0mm x 0mm
U:240x320p-60 60.0*
640x480R 59.5
i.MX Graphics User’s Guide, Rev. 0, 02/2018
80 NXP Semiconductors
5. Switch to a new mode:
root@imx6qsabresd:~# xrandr -s 640x480
Figure 17 Switching to a new mode
Note:
The overlay size cannot exceed the display size. For example, if LVDS is 1024x768, the overlay size cannot
be larger than this.
Timings for overlay are meaningless, but wrong timings may damage the display, so be careful when
creating a new display mode for the display.
If fb3 is used, fb2 must be enabled. Otherwise, fb3 is invisible.
10.2.3 Process of selecting the HDMI default resolution
The process of selecting the HDMI default resolution is as follows:
1. Set the user preferred mode (must be within the initial size).
2. Set the display preferred mode (must be within the initial size).
3. Check the aspect (if not found, use 4:3. Find the biggest resolution within the initial size for the aspect
ratio).
4. Check the first mode.
Initial size: initial FB virtual size or configured maximum size.
To specify the user preferred mode, add the option “PreferredMode” or “modes”.
i.MX Graphics User’s Guide, Rev. 0, 02/2018
81 NXP Semiconductors
10.2.4 Performance
The performance is decreased during screen rotation or mirroring.
10.2.5 Memory consumption
The video driver supports a maximum of 1920x1080@32bpp. To support rotation, a shadow buffer is reserved, so
the total memory consumption is 16 MB (1920x1080x4x2).
i.MX Graphics User’s Guide, Rev. 0, 02/2018
82 NXP Semiconductors
Chapter 11 Advanced GPU Configuration
11.1 GPU Scaling Governor
i.MX 8QuadMax GPU DVFS design supports different running modes: overdrive, nominal, and underdrive.
Nominal is the default, the overdrive is supposed to be performance/benchmark mode, and underdrive mode is
expected as energy saving mode.
Try to switch among the 3 modes, just using command line after boot without recompile the gpu driver.
$ echo "overdrive" > /sys/bus/platform/drivers/galcore/gpu_mode
$ echo "nominal" > /sys/bus/platform/drivers/galcore/gpu_mode
$ echo "underdrive" > /sys/bus/platform/drivers/galcore/gpu_mode
Try to check which mode is running on now, using command line as below:
$ cat /sys/bus/platform/drivers/galcore/gpu_mode
11.2 GPU Device Cooling
i.MX device support the thermal driver, which could signal the overheat event to GPU driver, once GPU driver
receive the event, it can enable GPU DFS feature to reduce GPU frequency as N/64 of the original designated clock.
The default N factor is 1 in the original BSP release, the end-user can reconfigure it through below command:
echo N >/sys/bus/platform/drivers/galcore/gpu3DMinClock
The user also can check the existing config as below
cat /sys/bus/platform/drivers/galcore/gpu3DMinClock
Chapter 12 Vivante Software Tool Kit
This chapter contains copyright material disclosed with permission of Vivante Corporation.
12.1 Vivante Tool Kit overview
The Vivante Tool Kit (VTK) is a set of applications designed to be used by graphics application developers to rapidly
develop and port graphics applications either stand alone, or as part of an IDE targeting a system-on-chip (SoC)
platform containing an embedded GPU.
12.1.1 VTK component overview
The VTK includes a graphics and OpenCL emulator (vEmulator) to enable embedded graphics and compute
application development on a PC platform, a driver and hardware performance profiling utility (vProfiler), and a
visual analyzer (vAnalyzer) for graphing the performance metrics. Also provided are pre-processing utilities for
stand-alone development of optimized shader programs (vShader) and for compiling shader code (vCompiler) into
binary executables targeting Vivante accelerated hardware platforms. An image transfer utility (vTexture) provides
compression and decompression options.
i.MX Graphics User’s Guide, Rev. 0, 02/2018
83 NXP Semiconductors
Figure 18 Vivante Tool Kit vTools components
12.1.2 VTK operating system requirements
Most VTK vTools applications are designed to run on Microsoft Windows operating systems. The following systems
are compatible with current releases of vTools:
Microsoft Windows® XP Professional, with Service Pack 2 or later
Microsoft Windows® Vista with Service Pack 2 or later
Microsoft Windows® 7 Professional
Some components, such as the vProfiler, are run on other platforms. See the individual vTools component detail
description.
12.1.3 VTK installation
The vProfiler tool is not included in the VTK. This tool can be built by setting a build command option when making
the Vivante Graphics Drivers.
The VTK package contains a vtools folder. Inside this folder are six .zip packages which can be individually
extracted. As an example, for a WinRAR system, right-click and select Extract Here. A folder is created with the
same name as the .zip file.
vAnalyze.zip
vCompiler.zip
vEmulator.zip
vShader.zip
vTexture.zip
vTracer.zip
Each vTools extracted folder contains a SETUP.exe and a vToolName.msi file. The tool can be installed
independently by running the SETUP.exe located in the tool folder. Typical licensing and folder placement
options may appear as part of the installation prompts.
i.MX Graphics User’s Guide, Rev. 0, 02/2018
84 NXP Semiconductors
vAnalyzer and vShader have a Windows GUI. vEmulator is a library. vCompiler and vTexture are utilities run from
the command line.
NOTES:
The default installation location for the VTK is usually a folder named something like C:\Program
Files\Vivante\vToolName, where vToolName is the name of the tool being installed. Some systems may install to a
Program Files (x86) folder.
Windows OS navigation instructions such as Control Panel navigation vary with the different Windows operating
systems.
Administrator rights may be required to install the tool.
When installing an updated version, use Windows OS Add/Remove programs to remove the installed version of
the tool, before installing the update version.
12.1.4 Software release compatibility
SW release 5.0.11.p7 - VTK v1.6.2.p1
SW release 5.1.1 - VTK v1.6.3
SW release 5.0.11_p6 - VTK v1.6.2
SW release 5.0.11_p5 - VTK v1.6.1
SW release 5.0.11_p4 - VTK v1.6.0
SW release 5.0.11_p3 - VTK v1.5.9 and v1.5.8
SW release 5.0.11_p2 - VTK v1.5.7
SW release 4.6.9.p13, 5.0.9 and 5.0.9.1 - VTK v1.5.3
SW release 4.6.9.p13 and 5.0.7 - VTK v1.5
SW release 4.6.9.p9 - VTK v1.4.2
SW release 4.6.9 - VTK v1.4
12.2 vEmulator
Vivante’s vEmulator duplicates the graphics and compute functionality of the Khronos APIsnamely, OpenGL ES
3.0, 2.0, 1.1 and OpenCL 1.1in a desktop PC environment. This enables developers to write and test applications
for Vivante embedded GPU cores prior to their availability, using the graphics cards on Windows® XP or Windows®
Vista or Windows® 7 PC platforms.
i.MX Graphics User’s Guide, Rev. 0, 02/2018
85 NXP Semiconductors
Figure 19 vEmulator embedded graphics emulator
vEmulator is not an application, but rather a set of libraries that convert Khronos mobile API function calls into
OpenGL desktop or OpenCL function calls. These libraries can be accessed directly by the graphics / compute
application.
12.2.1 Supported operating systems and graphics hardware
vEmulator libraries are available for Microsoft Windows XP, Windows Vista and Windows 7 operating systems:
Microsoft Windows XP Professional, with Service Pack 2 or later
Microsoft Windows Vista with Service Pack 2 or later
Microsoft Windows 7 Professional
vEmulator has been tested on popular graphics cards, including:
NVIDIA GeForce GTX 200 series with driver version 182.05 or later
NVIDIA GeForce 9000 and 8000 series with driver version 182.05 or later
NVIDIA GeForce 8400 GSwithForceWare driver version 176.44 or later
ATI Radeon HD 3000 and 4000 series with driver version Catalyst 9.1 or later
vEmulator for OpenGL ES 3 has been tested on the nVidia GeForce GT430 card with driver version 310.90.
Additional graphics cards to be added as testing is confirmed.
12.2.1.1 Specifying platform mode for Windows OS
vEmulator supports both 32-bit and 64-bit operation on the same host (from VTK 1.61). The installation uses the
following locations for vEmulator files on Windows platforms:
C:\Program Files\vivante\vEmulator\x86 (for 32-bit emulation)
C:\Program Files\vivante\vEmulator\x64 (for 64-bit emulation)
Start Menu location: All Programs\Vivante\vEmulator\x86 (for 32-bit)
Start Menu location: All Programs\Vivante\vEmulator\x64 (for 64-bit)
i.MX Graphics User’s Guide, Rev. 0, 02/2018
86 NXP Semiconductors
To run samples for 32-bit emulation in the x86 folder, select the platform option Win32 from the dropdown list
box in the toolbar area:
Figure 20 Selecting Win32
To run samples for 64-bit emulation in the x64 folder, select the platform option x64 from the dropdown list box in
the toolbar area:
Figure 21 Selecting x64
12.2.2 vEmulator components
vEmulator libraries are packaged with the Vivante VTK installer. Once installed, the libraries resides in a folder
vEmulator in the VTK installation path, which can be specified by the user at time of installation. The default
location of the Vivante VTK is:
C:\Program Files\Vivante
The vEmulator folder contains everything that is needed for emulation. The vEmulator directory structure and its
files are described in the following table.
Table 22. vEmulator Directory Contents
vEmulator
subdirectory
Filename
Description
bin
libEGL.dll
Dynamic library for invoking EGL at runtime
libGLESv1_CM.dll
Dynamic library for OpenGL ES 1.1 emulation
libGLESv2x.dll
Dynamic library for OpenGL ES 2.0 emulation
libGLESv3.dll
Dynamic library for OpenGL ES 3.0 emulation
libOpenCL.dll
Dynamic library for OpenCL 1.1 emulation
libVEmulatorVDK.dll
Dynamic library for vEmulator VDK functions
inc
gc_vdk.h
Vivante VDK declarations
gc_vdk_types.h
Vivante VDK type declarations
inc/EGL
gc_sdk.h
Vivante SDK declarations and definitions
egl.h
EGL declarations
eglext.h
EGL extension declarations
eglplatform.h
Platform-specific EGL declarations
eglrename.h
Rename for building static link driver
eglunname.h
For mixed usage of ES11, ES20
inc/GLES
eglvivante.h
Vivante EGL declarations
egl.h
EGL declarations
gl.h
OpenGL 1.1 declarations
glext.h
OpenGL1.1 extension declarations
i.MX Graphics User’s Guide, Rev. 0, 02/2018
87 NXP Semiconductors
glplatform.h
Platform-specific OpenGL 1.1 declarations
glrename.h
Rename for building static link driver
inc/GLES2
glunname.h
For mixed usage of ES11, ES20
gl2.h
OpenGL 2.0 declarations
gl2ext.h
OpenGL 2.0 extension declarations
gl2platform.h
Platform-specific OpenGL 2.0 declarations
gl2rename.h
Rename for building static link driver
inc/GLES3
gl2unname.h
Unified name definitions
gl3.h
OpenGL 3.0 declarations
gl3ext.h
OpenGL 3.0 extension declarations
gl3platform.h
Platform-specific OpenGL 3.0 declarations
inc/hal
inc/KHR