Cg Toolkit User's Manual

Page Count: 356

Release 1.4
September 2005
Cg Language Toolkit
NVIDIA Corporation
2701 San Tomas Expressway
Santa Clara, CA 95050
www.nvidia.com
ALL NVIDIA DESIGN SPECIFICATIONS, REFERENCE BOARDS, FILES, DRAWINGS, DIAGNOSTICS,
LISTS, AND OTHER DOCUMENTS (TOGETHER AND SEPARATELY, "MATERIALS") ARE BEING PROVIDED
"AS IS." NVIDIA MAKES NO WARRANTIES, EXPRESSED, IMPLIED, STATUTORY, OR OTHERWISE WITH
RESPECT TO THE MATERIALS, AND EXPRESSLY DISCLAIMS ALL IMPLIED WARRANTIES OF
NONINFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR PURPOSE.
Information furnished is believed to be accurate and reliable. However, NVIDIA Corporation assumes
no responsibility for the consequences of use of such information or for any infringement of patents or
other rights of third parties that may result from its use. No license is granted by implication or
otherwise under any patent or patent rights of NVIDIA Corporation. Specifications mentioned in this
publication are subject to change without notice. This publication supersedes and replaces all
information previously supplied. NVIDIA Corporation products are not authorized for use as critical
components in life support devices or systems without express written approval of NVIDIA
Corporation.
Trademarks
NVIDIA and the NVIDIA logo are trademarks or registered trademarks of NVIDIA Corporation in the
United States and other countries.
Microsoft, Windows, the Windows logo, and DirectX are registered trademarks of Microsoft
Corporation.
OpenGL is a trademark of SGI.
Other company and product names may be trademarks of the respective companies with which they
are associated.
Updates
Any changes, additions, or corrections will be posted at the NVIDIA Cg Web site:
http://developer.nvidia.com/Cg
Refer to this site often to keep up on the latest changes and additions to the Cg language.
Copyright
© 2002-2005 NVIDIA Corporation. All rights reserved.
808-00504-0000-006 i
NVIDIA
Table of Contents
Foreword
Preface
Release Notes
Online Updates
Introduction to the Cg Language
The Cg Language
Cg’s Programming Model for GPUs
Cg Language Profiles
Declaring Programs in Cg
Program Inputs and Outputs
Working with Data
Basic Data Types
Type Conversions
Structures and Member Functions
Arrays
Statements and Operators
Control Flow
Function Definitions and Function Overloading
Arithmetic Operators from C
Multiplication Functions
Vector Constructor
Boolean and Comparison Operators
Swizzle Operator
Write Mask Operator
Conditional Operator
Texture Lookups in Advanced Fragment Profiles
Effects
Techniques
Passes
State Assignments
Parameters and Semantics
Vertex and Fragment Programs
Textures and Samplers
Interfaces and Unsized Arrays
Running Cg Programs on the CPU
Annotations
More Details
Cg Standard Library Functions
Mathematical Functions
Geometric Functions
Texture Map Functions
Derivative Functions
Debugging Function
Predefined Fragment Program Output Structures
Introduction to the Cg Runtime Library
Introducing the Cg Runtime
Benefits of the Cg Runtime
Overview of the Cg Runtime
Core Cg Runtime
Core Cg Context
Core Cg Program
Core Cg Parameters
Core Cg Error Reporting
API-Specific Cg Runtimes
Parameter Shadowing
OpenGL Cg Runtime
Direct3D Cg Runtime
Introduction to CgFX
CgFX Overview
Key Concepts
Getting Started
Technique Validation
Passes and Pass State
Effect Parameters
Vertex and Fragment Programs
Textures and Samplers
Interfaces and Unsized Arrays
Evaluating Cg Programs using the Virtual Machine
Annotations
OpenGL State
OpenGL Sampler State
OpenGL State Not Specifiable with State Assignments
A Brief Tutorial
Loading the Workspace
Understanding simple.cg
Program Listing for simple.cg
Definitions for Structures with Varying Data
Passing Arguments
Basic Transformations
Prepare for Lighting
Calculating the Vertex Color
Further Experimentation
Advanced Profile Sample Shaders
Improved Skinning
Description
Vertex Shader Source Code for Improved Skinning
Improved Water
Description
Vertex Shader Source Code for Improved Water
Pixel Shader Source Code for Improved Water
Melting Paint
Description
Vertex Shader Source Code for Melting Paint
Pixel Shader Source Code for Melting Paint
MultiPaint
Description
Vertex Shader Source Code for MultiPaint
Pixel Shader Source Code for MultiPaint
Ray-Traced Refraction
Description
Vertex Shader Source Code for Ray-Traced Refraction
Pixel Shader Source Code for Ray-Traced Refraction
Skin
Description
Pixel Shader Source Code for Skin
Thin Film Effect
Description
Vertex Shader Source Code for Thin Film Effect
Pixel Shader Source Code for Thin Film Effect
Car Paint 9
Description
Vertex Shader Source Code for Car Paint 9
Pixel Shader Source Code for Car Paint 9
Basic Profile Sample Shaders
Anisotropic Lighting
Description
Vertex Shader Source Code for Anisotropic Lighting
Bump Dot3x2 Diffuse and Specular
Description
Vertex Shader Source Code for Bump Dot3x2
Pixel Shader Source Code for Bump Dot3x2
Bump-Reflection Mapping
Description
Vertex Shader Source Code for Bump-Reflection Mapping
Pixel Shader Source Code for Bump and Reflection Mapping
Fresnel
Description
Vertex Shader Source Code for Fresnel
Grass
Description
Vertex Shader Source Code for Grass
Refraction
Description
Vertex Shader Source Code for Refraction
Pixel Shader Source Code for Refraction
Shadow Mapping
Description
Vertex Shader Source Code for Shadow Mapping
Pixel Shader Source Code for Shadow Mapping
Shadow Volume Extrusion
Description
Vertex Shader Source Code for Shadow Volume Extrusion
Sine Wave Demo
Description
Vertex Shader Source Code for Sine Wave
Matrix Palette Skinning
Description
Vertex Shader Source Code for Matrix Palette Skinning
Appendix A: Cg Language Specification
Language Overview
Silent Incompatibilities
Similar Operations That Must be Expressed Differently
Differences from ANSI C
Detailed Language Specification
Definitions
Profiles
The Uniform Modifier
Function Declarations
Overloading of Functions by Profile
Syntax for Parameters in Function Definitions
Function Calls
Method Calls
Interfaces
Types
Partial Support of Types
Type Categories
Constants
Type Qualifiers
Type Conversions
Type Equivalency
Type-Promotion Rules
Namespaces
Arrays and Subscripting
Unsized Arrays
Function Overloading
Global Variables
Use of Uninitialized Variables
Preprocessor
Overview of Binding Semantics
Binding Semantics
Aliasing of Semantics
Restrictions on Semantics Within a Structure
Additional Details for Binding Semantics
How Programs Receive and Return Data
Statements
Minimum Requirements for if, while, and for Statements
New Vector Operators
Arithmetic Precision and Range
Operator Precedence
Operator Enhancements
Operators
Reserved Words
Cg Standard Library Functions
Vertex Program Profiles
Mandatory Computation of Position Output
Position Invariance
Binding Semantics for Outputs
Fragment Program Profiles
Binding Semantics for Outputs
Appendix B: Language Profiles
OpenGL ARB Vertex Program Profile (arbvp1)
Overview
Accessing OpenGL State
Position Invariance
Data Types
Compatibility with the vp20 Vertex Program Profile
Loading Constants
Bindings
Options
OpenGL ARB Fragment Program Profile (arbfp1)
Accessing OpenGL State
MRT Support
Resource Limits
Language Constructs and Support
Bindings
Options
OpenGL NV_vertex_program 3.0 Profile (vp40)
Vertex Texturing
OpenGL NV_fragment_program 2.0 Profile (fp40)
Branching
FACE Semantic
OpenGL NV_vertex_program 2.0 Profile (vp30)
Position Invariance
Language Constructs
Bindings
OpenGL NV_fragment_program Profile (fp30)
Language Constructs and Support
Bindings
Pack and Unpack Functions
OpenGL NV_vertex_program 1.0 Profile (vp20)
Overview
Position Invariance
Data Types
Bindings
OpenGL NV_texture_shader and NV_register_combiners Profile (fp20)
Overview
Restrictions
Modifiers
Language Constructs and Support
Standard Library Functions
Bindings
Auxiliary Texture Functions
Examples
DirectX Vertex Shader 2.x Profiles (vs_2_*)
Overview
Memory
Statements and Operators
Data Types
Using Arrays
Bindings
Options
DirectX Pixel Shader 2.x Profiles (ps_2_*)
Memory
Language Constructs and Support
Bindings
Options
Limitations in this Implementation
DirectX Vertex Shader 1.1 Profile (vs_1_1)
Memory Restrictions
Language Constructs and Support
Bindings
Options
DirectX Pixel Shader 1.x Profiles (ps_1_*)
Overview
Modifiers
Language Constructs and Support
Standard Library Functions
Bindings
Auxiliary Texture Functions
Examples
Appendix C: Nine Steps to High-Performance Cg
Appendix D: Cg Compiler Options
Index
Contents, Figures, and Tables
List of Figures
Fig. 1. Cg’s Model of the GPU . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
Fig. 2. The Parts of the Cg Runtime API . . . . . . . . . . . . . . . . . . . . . . . 45
Fig. 3. The Cg_Simple Workspace . . . . . . . . . . . . . . . . . . . . . . . . . 145
Fig. 4. The simple.cg Shader . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
Fig. 5. Example of Improved Skinning . . . . . . . . . . . . . . . . . . . . . . . . 154
Fig. 6. Example of Improved Water . . . . . . . . . . . . . . . . . . . . . . . . . 157
Fig. 7. Example of Melting Paint . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
Fig. 8. Example of MultiPaint . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
Fig. 9. Example of Ray-Traced Refraction . . . . . . . . . . . . . . . . . . . . . . . 170
Fig. 10. Example of Skin . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
Fig. 11. Example of Thin Film Effect . . . . . . . . . . . . . . . . . . . . . . . . . 180
Fig. 12. Example of Car Paint 9 . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
Fig. 13. Example of Anisotropic Lighting . . . . . . . . . . . . . . . . . . . . . . . 190
Fig. 14. Example of Bump Dot3x2 Diffuse and Specular . . . . . . . . . . . . . . . . 192
Fig. 15. Example of Bump-Reflection Mapping . . . . . . . . . . . . . . . . . . . . 196
Fig. 16. Example of Fresnel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
Fig. 17. Example of Grass . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
Fig. 18. Example of Refraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
Fig. 19. Example of Shadow Mapping . . . . . . . . . . . . . . . . . . . . . . . . 208
Fig. 20. Example of Shadow Volume Extrusion . . . . . . . . . . . . . . . . . . . . 211
Fig. 21. Example of Sine Wave . . . . . . . . . . . . . . . . . . . . . . . . . . . 214
Fig. 22. Example of Matrix Palette Skinning . . . . . . . . . . . . . . . . . . . . . . 217
List of Tables
Table 1. Mathematical Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . .34
Table 2. Geometric Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . .38
Table 3. Texture Map Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . .39
Table 4. Derivative Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . .41
Table 5. Debugging Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . .42
Table 6. CgFX OpenGL State Manager States . . . . . . . . . . . . . . . . . . . . . 130
Table 7. Enable/Disable States. . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
Table 8. sampler_state State Assignments . . . . . . . . . . . . . . . . . . . . . . 141
Table 9. Type Conversions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
Table 10. Expanded Operators. . . . . . . . . . . . . . . . . . . . . . . . . . . . 247
Table 11. Vertex Output Binding Semantics. . . . . . . . . . . . . . . . . . . . . . 251
Table 12. Fragment Output Binding Semantics . . . . . . . . . . . . . . . . . . . . 252
Table 16. arbvp1 Uniform Input Binding Semantics . . . . . . . . . . . . . . . . . 260
Table 17. arbvp1 Varying Input Binding Semantics. . . . . . . . . . . . . . . . . . 261
Table 18. arbvp1 Varying Output Binding Semantics. . . . . . . . . . . . . . . . . 261
Table 19. arbfp1 Uniform Input Binding Semantics . . . . . . . . . . . . . . . . . 265
Table 20. arbfp1 Varying Input Binding Semantics . . . . . . . . . . . . . . . . . 265
Table 21. arbfp1 Varying Output Binding Semantics. . . . . . . . . . . . . . . . . 265
Table 22. fp40 Compiler Branching Options . . . . . . . . . . . . . . . . . . . . . 269
Table 23. vp30 Uniform Input Binding Semantics . . . . . . . . . . . . . . . . . . 271
Table 24. vp30 Varying Input Binding Semantics . . . . . . . . . . . . . . . . . . . 272
Table 25. vp30 Varying Output Binding Semantics . . . . . . . . . . . . . . . . . . 272
Table 26. fp30 Uniform Input Binding Semantics . . . . . . . . . . . . . . . . . . 275
Table 27. fp30 Varying Input Binding Semantics. . . . . . . . . . . . . . . . . . . 275
Table 28. fp30 Varying Output Binding Semantics . . . . . . . . . . . . . . . . . . 276
Table 29. vp20 Uniform Input Binding Semantics . . . . . . . . . . . . . . . . . . 280
Table 30. vp20 Varying Input Binding Semantics. . . . . . . . . . . . . . . . . . . 281
Table 31. vp20 Varying Output Binding Semantics . . . . . . . . . . . . . . . . . . 281
Table 32. NV_texture_shader and NV_register_combiners Instruction Set Modifiers . . . 285
Table 33. Supported Standard Library Functions . . . . . . . . . . . . . . . . . . . 286
Table 34. Required Projective Texture Lookup Swizzles . . . . . . . . . . . . . . . . 288
Table 35. fp20 Uniform Binding Semantics . . . . . . . . . . . . . . . . . . . . . 289
Table 36. fp20 Varying Input Binding Semantics . . . . . . . . . . . . . . . . . . . 289
Table 37. fp20 Varying Output Binding Semantics . . . . . . . . . . . . . . . . . . 290
Table 38. fp20 Auxiliary Texture Functions . . . . . . . . . . . . . . . . . . . . . 291
Table 39. vs_2_* Uniform Input Binding Semantics . . . . . . . . . . . . . . . . . 298
Table 40. vs_2_* Varying Input Binding Semantics . . . . . . . . . . . . . . . . . 298
Table 41. vs_2_* Varying Output Binding Semantics . . . . . . . . . . . . . . . . . 299
Table 42. ps_2_* Uniform Input Binding Semantics . . . . . . . . . . . . . . . . . 302
Table 43. ps_2_* Varying Input Binding Semantics . . . . . . . . . . . . . . . . . 302
Table 44. ps_2_* Varying Output Binding Semantics . . . . . . . . . . . . . . . . . 302
Table 45. vs_1_1 Uniform Input Binding Semantics . . . . . . . . . . . . . . . . . 306
Table 46. vs_1_1 Varying Input Binding Semantics . . . . . . . . . . . . . . . . . . 306
Table 47. vs_1_1 Varying Output Binding Semantics . . . . . . . . . . . . . . . . . 307
Table 48. ps_1_x Instruction Set Modifiers . . . . . . . . . . . . . . . . . . . . . 309
Table 49. Supported Standard Library Functions . . . . . . . . . . . . . . . . . . . 311
Table 50. Required Projective Texture Lookup Swizzles . . . . . . . . . . . . . . . . 312
Table 51. ps_1_x Uniform Input Binding Semantics . . . . . . . . . . . . . . . . . 313
Table 52. ps_1_x Varying Input Binding Semantics . . . . . . . . . . . . . . . . . 314
Table 53. ps_1_x Varying Output Binding Semantics . . . . . . . . . . . . . . . . . 314
Table 54. ps_1_x Auxiliary Texture Functions . . . . . . . . . . . . . . . . . . . . 315
Foreword
We are in the midst of a great transition in computer graphics, both in terms
of graphics hardware and in terms of the visual quality and authoring
process for games, interactive applications, and animation. Graphics
hardware has evolved from “big iron” graphics workstations costing
hundreds of thousands of dollars to single-chip graphics processing units
(GPUs) whose performance and features have grown to match and now even
to exceed traditional workstations. The processing power provided by a
modern GPU in a single frame rivals the amount of computation that used to
be expended for an offline-rendered animation frame. Indeed, at the launch
of GeForce3 on the Apple Macintosh, a convincing version of Pixar’s Luxo, Jr.
was demonstrated running interactively in real time. At the 2001 SIGGRAPH
conference, an interactive version of a more recent film, Square Studios’ Final
Fantasy, was shown running in real time, again on a GeForce3.

Although these feats of computation are astounding, there is much more to
come. Today’s GPUs evolve very quickly. Typically, a product generation is
only six months long, and with each new product generation comes a twofold
increase in performance. Graphics processor performance increases at
approximately three times the rate of microprocessors: Moore’s Law cubed!

In addition to the performance increases, each year brings new hardware
features, supported by new application programming interfaces (APIs). This
dizzying pace is difficult for developers to adapt to, but adapt they must.
Developers and users are demanding better rendering quality and more
realistic imagery and experiences. Users don’t care about the details; they
simply want games and other interactive applications to look more like
movies, special effects, and animation. Developers want more power (always
more), along with more flexibility in controlling the massively capable GPUs
of today and tomorrow. APIs do not, and cannot, keep up with the rapid
pace of innovation in GPUs. As APIs and underlying technologies change,
programmers, artists, and software publishers struggle to adapt to the
change and the churn of the hardware/software platform.

What’s needed is to raise the level of abstraction for interaction with GPUs.
Continued updates and improvements to the hardware and APIs are too
painful if developers are too “close to the metal.” This problem was
exacerbated by the advent of programmability in GPUs. Older GPUs had a
small number of controllable or configurable rendering paths, but the most
recent technology is highly programmable, and becoming ever more so. We
can now write short vertex and fragment programs to be executed by the
GPU. This requires great skill, and is only possible with short programs.
When GPU hardware grows to allow programs of hundreds, thousands, or
even more instructions, assembly coding will no longer be practical. Rather
than programming each rendering state, each bit, byte, and word of data and
control through a low-level assembly language, we want to express our ideas
in a more straightforward form, using a high-level language.

Thus Cg, “C for Graphics,” becomes necessary and inevitable. Just as C was
derived to expose the specific capabilities of processors while allowing
higher-level abstraction, Cg allows the same abstraction for GPUs. Cg
changes the way programmers can program: focusing on the ideas, the
concepts, and the effects they wish to create, not on the details of the
hardware implementation. Cg also decouples programs from specific
hardware because the language is functional, not hardware implementation
specific. Also, since Cg can be compiled at run time on any platform,
operating system, and for any graphics hardware, Cg programs are truly
portable. Finally, and perhaps best of all, Cg programs are future-proof and
can adapt to run well on future products. The compiler can optimize directly
for a new target GPU that perhaps did not even exist when the original Cg
program was written.

This book is intended as an introduction to Cg, as well as a practical
handbook to get programmers started developing in Cg. It includes a
language description, a reference for the standard and runtime libraries, and
is full of helpful examples. The goal for this book is to be both an
introduction and a tool for the new user, as well as a reference and resource
for developers as they become more proficient.

Welcome to the world of Cg!

David Kirk
Chief Scientist
NVIDIA Corporation
Preface
The goal of this book is to introduce you to Cg, a new high-level language for
graphics programming. To that end, we have organized this document into
the following sections:

• “Introduction to the Cg Language” on page 1
  A quick introduction to the current release of Cg, with everything you
  need to know to start working with it.
• “Cg Standard Library Functions” on page 33
  A list of the Standard Library functions, which can help to reduce your
  program development time.
• “Introduction to the Cg Runtime Library” on page 43
  An introduction to the Cg runtime APIs, which allow you to easily
  compile Cg programs and pass data to them from within applications.
• “Introduction to CgFX” on page 117
  A description of the CgFX API, which supports this extended Cg file
  format.
• “A Brief Tutorial” on page 145
  A description of a simple Cg program and Microsoft Visual Studio
  workspace (both provided on the accompanying CD) that you can use to
  start experimenting with Cg.
• “Advanced Profile Sample Shaders” on page 153
  A list of sample NV30 shaders, complete with source code.
• “Basic Profile Sample Shaders” on page 189
  A list of sample NV2X shaders, complete with source code.
• Appendix A, “Cg Language Specification” on page 221
  The formal Cg language specification.
• Appendix B, “Language Profiles” on page 255
  Describes features and restrictions of the currently supported language
  profiles: DirectX 8 vertex, DirectX 8 pixel, OpenGL ARB vertex, NV2X
  OpenGL vertex, NV30 OpenGL vertex, NV30 OpenGL fragment,
  OpenGL ARB fragment, NV40 OpenGL vertex, and NV40 OpenGL
  fragment.
• Appendix C, “Nine Steps to High-Performance Cg” on page 321
  Strategies for getting the most out of your Cg code.
• Appendix D, “Cg Compiler Options” on page 329
  A list of the various command-line options that the Cg compiler accepts.
• Cg Developer’s CD
  The CD provided with this book contains the entire Cg release, which
  allows you to get started immediately. The readme.txt file on the CD
  describes the contents of the release in detail.

You can begin working with Cg immediately by reading the “Introduction to
the Cg Language” on page 1 and then going through “A Brief Tutorial” on
page 145. Once you have a basic understanding of the Cg language, use the
“Advanced Profile Sample Shaders” on page 153 and “Basic Profile Sample
Shaders” on page 189 as a basis to build your own effects.
Release Notes
Release notes for Cg are now contained in a separate document that is part of
the Cg distribution.

Please report any bugs, issues, and feedback to NVIDIA by e-mailing
cgsupport@nvidia.com. We will expeditiously address any reported
problems.
Online Updates
Any changes, additions, or corrections are posted at the NVIDIA Cg Web
site:
http://developer.nvidia.com/Cg
Refer to this site often to keep up on the latest changes and additions to the
Cg language. Information on how to report any bugs you may find in the
release is also available on this site.
Introduction to the Cg Language
Historically, graphics hardware has been programmed at a very low level.
Fixed-function pipelines were configured by setting states such as the
texture-combining modes. More recently, programmers configured
programmable pipelines by using programming interfaces at the assembly
language level. In theory, these low-level programming interfaces provided
great flexibility. In practice, they were painful to use and presented a serious
barrier to the effective use of hardware.

Using a high-level programming language, rather than the low-level
languages of the past, provides several advantages:

• A high-level language speeds up the tweak-and-run cycle when a shader
  is developed. The ultimate test for a shader is “Does it look right?” To
  that end, the ability to quickly prototype and modify a shader is crucial
  to the rapid development of high-quality effects.
• The compiler optimizes code automatically and performs low-level
  tasks, such as register allocation, that are tedious and prone to error.
• Shading code written in a high-level language is much easier to read and
  understand. It also allows new shaders to be easily created by modifying
  previously written shaders. What better way to learn than from a shader
  written by the best artists and programmers?
• Shaders written in a high-level language are portable to a wider range of
  hardware platforms than shaders written in assembly code.

This chapter introduces Cg (C for Graphics), a high-level language tailored
for programming GPUs. Cg offers all the advantages just described, allowing
programmers to finally combine the inherent power of the GPU with a
language that makes GPU programming easy.
The Cg Language
Cg is based on C, but with enhancements and modifications that make it easy
to write programs that compile to highly optimized GPU code. Cg code looks
almost exactly like C code, with the same syntax for declarations, function
calls, and most data types.

Before describing the Cg language in detail, it is important to explain the
reason for some of the differences that exist between Cg and C.
Fundamentally, it comes down to the difference in the programming models
for GPUs and for CPUs.
Cg’s Programming Model for GPUs
CPUs normally have only one programmable processor. In contrast, GPUs
have at least two programmable processors, the vertex processor and the
fragment processor, plus other non-programmable hardware units. The
processors, the non-programmable parts of the graphics hardware, and the
application are all linked through data flows. Cg’s model of the GPU is
illustrated by Fig. 1.
Fig. 1. Cg’s Model of the GPU
The Cg language allows you to write programs for both the vertex processor
and the fragment processor. We refer to these programs as vertex programs and
fragment programs, respectively. (Fragment programs are also known as pixel
programs or pixel shaders, and we use these terms interchangeably in this
document.) Cg code can be compiled into GPU assembly code, either on
demand at run time or beforehand.

Cg makes it easy to combine a Cg fragment program with a handwritten
vertex program, or even with the non-programmable OpenGL or DirectX
vertex pipeline. Likewise, a Cg vertex program can be combined with a
handwritten fragment program, or with the non-programmable OpenGL or
DirectX fragment pipeline.
Cg Language Profiles
Because all CPUs support essentially the same set of basic capabilities, the C
language supports this set on all CPUs. However, GPU programmability has
not quite yet reached this same level of generality. For example, the current
generation of programmable vertex processors supports a greater range of
capabilities than do the programmable fragment processors. Cg addresses
this issue by introducing the concept of language profiles. A Cg profile defines
a subset of the full Cg language that is supported on a particular hardware
platform or API. The current release of the Cg compiler supports the
following profiles:
• OpenGL ARB vertex programs
  Runtime profile: CG_PROFILE_ARBVP1
  Compiler option: -profile arbvp1
• OpenGL ARB fragment programs
  Runtime profile: CG_PROFILE_ARBFP1
  Compiler option: -profile arbfp1
• OpenGL NV40 vertex programs
  Runtime profile: CG_PROFILE_VP40
  Compiler option: -profile vp40
• OpenGL NV40 fragment programs
  Runtime profile: CG_PROFILE_FP40
  Compiler option: -profile fp40
• OpenGL NV30 vertex programs
  Runtime profile: CG_PROFILE_VP30
  Compiler option: -profile vp30
• OpenGL NV30 fragment programs
  Runtime profile: CG_PROFILE_FP30
  Compiler option: -profile fp30
• OpenGL NV2X vertex programs
  Runtime profile: CG_PROFILE_VP20
  Compiler option: -profile vp20
• OpenGL NV2X fragment programs
  Runtime profile: CG_PROFILE_FP20
  Compiler option: -profile fp20
• DirectX 9 vertex shaders
  Runtime profiles: CG_PROFILE_VS_2_X
                    CG_PROFILE_VS_2_0
  Compiler options: -profile vs_2_x
                    -profile vs_2_0
• DirectX 9 pixel shaders
  Runtime profiles: CG_PROFILE_PS_2_X
                    CG_PROFILE_PS_2_0
  Compiler options: -profile ps_2_x
                    -profile ps_2_0
• DirectX 8 vertex shaders
  Runtime profile: CG_PROFILE_VS_1_1
  Compiler option: -profile vs_1_1
• DirectX 8 pixel shaders
  Runtime profiles: CG_PROFILE_PS_1_3
                    CG_PROFILE_PS_1_2
                    CG_PROFILE_PS_1_1
  Compiler options: -profile ps_1_3
                    -profile ps_1_2
                    -profile ps_1_1

The DirectX 9 profiles (vs_2_x and ps_2_x), OpenGL ARB profiles (arbfp1
and arbvp1), NV30 OpenGL profiles (fp30 and vp30), and NV40 OpenGL
profiles (fp40 and vp40) generally support longer, more complex programs
and offer more features and functionality to the developer. These are referred
to as advanced profiles.

The DirectX 8 profiles (vs_1_1 and ps_1_3) and NV2X OpenGL profiles
(fp20 and vp20) have more restrictions on program length and available
features, especially in fragment programs. These are referred to as basic
profiles.

See “Language Profiles” on page 255 for detailed descriptions of these
and related profiles.
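
As a rough illustration of how one source file can target several profiles, the sketch below shows a fragment program simple enough that it should be accepted by both an advanced and a basic fragment profile. The file name tint.cg and the parameter name tint are hypothetical. The compiler invocations in the comments use only the -profile option listed above and assume the command-line compiler is installed as cgc.

```cg
// tint.cg (hypothetical file name)
//
// Possible compiler invocations, one per target profile:
//   cgc -profile arbfp1 tint.cg    (advanced profile)
//   cgc -profile fp20   tint.cg    (basic profile)

// Modulate the interpolated vertex color by an application-supplied tint.
float4 main(float4 color : COLOR0,     // varying input from the rasterizer
            uniform float4 tint)       // uniform input set by the application
            : COLOR
{
    return color * tint;
}
```

A longer or more complex program could still compile under arbfp1 yet be rejected under fp20, which is the practical meaning of the advanced/basic distinction.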
Declaring Programs in Cg
CPU code generally consists of one program specified by main() in C. In
contrast, a Cg program can have any name. A program is defined using the
following syntax:

<return-type> <program-name>(<parameters>)[: <semantic-name>]
{ /* ... */ }

Program Inputs and Outputs
The programmable processors in GPUs operate on streams of data. The
vertex processor operates on a stream of vertices, and the fragment processor
operates on a stream of fragments.

A programmer can think of the main program as being executed just once on
a CPU. In contrast, a program is executed repeatedly on a GPU, once for each
element of data in a stream. The vertex program is executed once for each
vertex, and the fragment program is executed once for each fragment.

The Cg language adds several capabilities to C to support this stream-based
programming model. For new Cg programmers, these capabilities often take
some time to understand because they have no direct correspondence to C
capabilities. However, the sample programs later in this document
demonstrate that it really is easy to use these capabilities in Cg programs.

Two Kinds of Program Inputs
A Cg program can consume two different kinds of inputs:

• Varying inputs are used for data that is specified with each element of the
  stream of input data. For example, the varying inputs to a vertex
  program are the per-vertex values that are specified in vertex arrays. For
  a fragment program, the varying inputs are the interpolants, such as
  texture coordinates.
• Uniform inputs are used for values that are specified separately from the
  main stream of input data, and don’t change with each stream element.
  For example, a vertex program typically requires a transformation
  matrix as a uniform input. Often, uniform inputs are thought of as
  graphics state.
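
To make the program syntax and the two kinds of inputs concrete, here is a minimal sketch of a vertex program. All names (transformVertex, modelViewProj, and the parameter names) are hypothetical and chosen only for illustration. The identifiers after the colons are binding semantics, and out marks an output parameter; both are covered in the sections that follow.

```cg
// A vertex program with an arbitrary (non-main) name.
void transformVertex(
    // Varying inputs: a new value arrives with every vertex.
    float4 position : POSITION,
    float2 texCoord : TEXCOORD0,

    // Uniform input: set by the application, constant across the stream.
    uniform float4x4 modelViewProj,

    // Varying outputs, passed on to the rasterizer.
    out float4 oPosition : POSITION,
    out float2 oTexCoord : TEXCOORD0)
{
    oPosition = mul(modelViewProj, position);  // transform to clip space
    oTexCoord = texCoord;                      // pass through unchanged
}
```

The program body runs once per vertex: position and texCoord differ on every invocation, while modelViewProj keeps whatever value the application last assigned.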
Varying Inputs to a Vertex Program
A vertex program typically consumes several different per-vertex (varying)
inputs. For example, the program might require that the application specify
the following varying inputs for each vertex, typically in a vertex array:

• Model-space position
• Model-space normal vector
• Texture coordinate

In a fixed-function graphics pipeline, the set of possible per-vertex inputs is
small and predefined. This predefined set of inputs is exposed to the
application through the graphics API. For example, OpenGL 1.4 provides the
ability to specify a vertex array of normal vectors.

In a programmable graphics pipeline, there is no longer a small set of
predefined inputs. It is perfectly reasonable for the developer to write a
vertex program that uses a per-vertex refractive index value as long as the
application provides this value with each vertex.

Cg provides a flexible mechanism for specifying these per-vertex inputs in
the form of a set of predefined names. Each program input must be bound to
a name from this set. In the following structure, the vertex program
definition binds its parameters to the predefined names POSITION, NORMAL,
TANGENT, and TEXCOORD3. The application must provide the vertex array data
associated with these predefined names.

struct myinputs {
    float3 myPosition : POSITION;
    float3 myNormal : NORMAL;
    float3 myTangent : TANGENT;
    float refractive_index : TEXCOORD3;
};
outdata foo(myinputs indata) {
    /* ... */
    // Within the program, the parameters are referred to as
    // “indata.myPosition”, “indata.myNormal”, and so on.
    /* ... */
}

We refer to the predefined names as binding semantics. The following set of
binding semantics is supported in all Cg vertex program profiles. Some Cg
profiles support additional binding semantics.

POSITION        BLENDWEIGHT
NORMAL          TANGENT
BINORMAL        PSIZE
BLENDINDICES    TEXCOORD0–TEXCOORD7

The binding semantic POSITION0 is equivalent to the binding semantic
POSITION; likewise, the other binding semantics have similar equivalents.

In the OpenGL Cg profiles, binding semantics implicitly specify the mapping
of varying inputs to particular hardware registers. However, in DirectX-based
Cg profiles there is no such implied mapping.

Binding semantics may be specified directly on program parameters rather
than on struct elements. Thus, the following vertex program definition is
legal:

outdata foo(float3 myPosition : POSITION,
            float3 myNormal : NORMAL,
            float3 myTangent : TANGENT,
            float refractive_index : TEXCOORD3) {
    /* ... */
    // Within the program, the parameters are referred to by
    // their variable names: “myPosition”, “myNormal”,
    // “myTangent”, and “refractive_index”.
    /* ... */
}

Varying Outputs to and from Vertex Programs
The outputs of a vertex program pass through the rasterizer and are made
available to a fragment program as varying inputs. For a vertex program and
fragment program to interoperate, they must agree on the data being passed
between them.

As it does with the data flow between the application and vertex program,
Cg uses binding semantics to specify the data flow between the vertex
program and fragment program.

This example shows the use of binding semantics for vertex program output:

// Vertex program
struct myvf {
    float4 pout : POSITION; // Used for rasterization
    float4 diffusecolor : COLOR0;
    float4 uv0 : TEXCOORD0;
    float4 uv1 : TEXCOORD1;
};
myvf foo(/* ... */) {
    myvf outstuff;
    /* ... */
    return outstuff;
}

And this example shows how to use this same data as the input to a
fragment program:

// Fragment program
struct myvf {
    float4 diffusecolor : COLOR0;
    float4 uv0 : TEXCOORD0;
    float4 uv1 : TEXCOORD1;
};
fragout bar(myvf indata) {
    float4 x = indata.uv0;
    /* ... */
}

The following binding semantics are available in all Cg vertex profiles for
output from vertex programs: POSITION, PSIZE, FOG, COLOR0–COLOR1, and
TEXCOORD0–TEXCOORD7.

All vertex programs must declare and set a vector output that uses the
POSITION binding semantic. This value is required for rasterization.

To ensure interoperability between vertex programs and fragment programs,
both must use the same struct for their respective outputs and inputs. For
example:

struct myvert2frag {
    float4 pos : POSITION;
    float4 uv0 : TEXCOORD0;
    float4 uv1 : TEXCOORD1;
};
// Vertex program
myvert2frag vertmain(...) {
    myvert2frag outdata;
    /* ... */
    return outdata;
}
// Fragment program
void fragmain(myvert2frag indata ) {
    float4 tcoord = indata.uv0;
    /* ... */
}

Note that values associated with some vertex output semantics are intended
for and are used by the rasterizer. These values cannot actually be used in the
fragment program, even though they appear in the input struct. For
example, the indata.pos value associated with the POSITION fragment
semantic may not be read in the fragmain shader.
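
The examples above elide their bodies. The following is a minimal, complete sketch of a vertex program and a fragment program that share one connector struct. The names v2f, vertmain, fragmain, modelViewProj, and decal are hypothetical. It assumes the two entry points are compiled separately, each with a suitable profile, and that the application binds a 2D texture to decal.

```cg
// Connector struct shared by both programs.
struct v2f {
    float4 pos   : POSITION;   // consumed by the rasterizer
    float4 color : COLOR0;
    float2 uv    : TEXCOORD0;
};

// Vertex program: fills in every member of the connector.
v2f vertmain(float4 position : POSITION,
             float4 color    : COLOR0,
             float2 uv       : TEXCOORD0,
             uniform float4x4 modelViewProj)
{
    v2f OUT;
    OUT.pos   = mul(modelViewProj, position);
    OUT.color = color;
    OUT.uv    = uv;
    return OUT;
}

// Fragment program: reads the interpolated color and texture coordinate.
// IN.pos is deliberately left unread, as the note above requires.
float4 fragmain(v2f IN, uniform sampler2D decal) : COLOR
{
    return IN.color * tex2D(decal, IN.uv);
}
```

Because both programs use the same struct, the semantics on the vertex outputs and fragment inputs cannot drift apart when a member is added or renamed.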
Varying Outputs from Fragment Programs
Binding semantics are always required on the outputs of fragment programs.
Fragment programs are required to declare and set a vector output that uses
the COLOR semantic. This value is usually used by the hardware as the final
color of the fragment. Some fragment profiles also support the DEPTH output
semantic, which allows the depth value of the fragment to be modified, and
some support additional color outputs for hardware that supports multiple
render targets (MRTs).

As with vertex programs, fragment programs may return their outputs in the
body of a structure. However, it is usually more convenient to either declare
outputs as out parameters:

void main(/* ... */,
          out float4 color : COLOR, out float depth : DEPTH) {
    /* ...*/
    color = diffuseColor * /* ...*/;
    depth = /*...*/;
}

or to associate a semantic with the return value of the shader:

float4 main(/* ... */) : COLOR {
    /* ... */
    return diffuseColor * /* ... */;
}

The following example shows a simple vertex program that calculates
diffuse and specular lighting. Two structures for varying data, appin and
vertout, are also declared. Don’t worry about understanding exactly what
the program is doing; the goal is simply to give you an idea of what Cg code
looks like. “A Brief Tutorial” on page 145 explains this shader in detail.

// Define inputs from application.
struct appin
{
    float4 Position : POSITION;
    float4 Normal : NORMAL;
};
// Define outputs from vertex shader.
struct vertout
{
    float4 HPosition : POSITION;
    float4 Color : COLOR;
};
vertout main(appin IN,
             uniform float4x4 ModelViewProj,
             uniform float4x4 ModelViewIT,
             uniform float4 LightVec)
{
    vertout OUT;
    // Transform vertex position into homogeneous clip-space.
    OUT.HPosition = mul(ModelViewProj, IN.Position);
    // Transform normal from model-space to view-space.
    float3 normalVec = normalize(mul(ModelViewIT,
                                     IN.Normal).xyz);
    // Store normalized light vector.
    float3 lightVec = normalize(LightVec.xyz);
    // Calculate half angle vector.
    float3 eyeVec = float3(0.0, 0.0, 1.0);
    float3 halfVec = normalize(lightVec + eyeVec);
    // Calculate diffuse component.
    float diffuse = dot(normalVec, lightVec);
    // Calculate specular component.
    float specular = dot(normalVec, halfVec);
    // Use the lit function to compute lighting vector from
    // diffuse and specular values.
    float4 lighting = lit(diffuse, specular, 32);
    // Blue diffuse material
    float3 diffuseMaterial = float3(0.0, 0.0, 1.0);
    // White specular material
    float3 specularMaterial = float3(1.0, 1.0, 1.0);
    // Combine diffuse and specular contributions and
    // output final vertex color.
    OUT.Color.rgb = lighting.y * diffuseMaterial +
                    lighting.z * specularMaterial;
    OUT.Color.a = 1.0;
    return OUT;
}

Working with Data
Like C, Cg supports features that create and manipulate data:

• Basic types
• Structures
• Arrays
• Type conversions

Basic Data Types
Cg supports seven basic data types:

• float
  A 32-bit IEEE floating point (s23e8) number that has one sign bit, a 23-bit
  mantissa, and an 8-bit exponent. This type is supported in all profiles,
  although the DirectX 8 pixel profiles implement it with reduced
  precision and range for some operations.
• half
  A 16-bit IEEE-like floating point (s10e5) number.
• int
  A 32-bit integer. Profiles may omit support for this type or have the
  option to treat int as float.
• fixed
  A 12-bit fixed-point (s1.10) number. It is supported in all
  fragment profiles.
• bool
  Boolean data is produced by comparisons and is used in if and
  conditional operator (?:) constructs. This type is supported in all
  profiles.
• sampler*
  The handle to a texture object comes in six variants: sampler, sampler1D,
  sampler2D, sampler3D, samplerCUBE, and samplerRECT. With one
  exception, these types are supported in all pixel profiles, fragment
  profiles, and the NV40 vertex program profile. The samplerRECT type is
  not supported in the DirectX profiles.
string
AlthoughitisnotpossibletousestringsinCgprogramcodeforany
currentlyexistingprofile,theycanbesetandhavetheirvaluesqueried
thoughtheCgruntimeAPI;thus,theycanbeusefulforstoring
informationaboutthecontentsofaCgfile.
Cg also includes built-in vector data types that are based on the basic data
types. A sample of these built-in vector data types includes (but is not
limited to) the following:

    float4 float3 float2 float1
    bool4 bool3 bool2 bool1

Additional support is provided for matrices of up to four-by-four elements.
Here are some examples of matrix declarations:

    float1x1 matrix1; // One element matrix
    float2x3 matrix2; // Two-by-three matrix (six elements)
    float4x2 matrix3; // Four-by-two matrix (eight elements)
    float4x4 matrix4; // Four-by-four matrix (sixteen elements)

Note that the multidimensional array float M[4][4] is not type-equivalent
to the matrix float4x4 M.
There are no unions or bit fields in Cg at present.
Type Conversions
Type conversions in Cg work largely as they do in C. Type conversions may
be explicitly specified using the C (newtype) cast operator.
Cg automatically performs type promotion in mixed-type expressions, just
as C does. For example, the expression floatvar * halfvar is compiled as
floatvar * (float) halfvar.
Cg uses different type-promotion rules than C does in one case: a constant
without an explicit type suffix does not cause type promotion. Cg compiles
the expression halfvar * 2.0 as halfvar * (half) 2.0.
In contrast, C would compile it as ((double) halfvar) * 2.0. Cg uses
different rules than C to minimize inadvertent type promotions that cause
computations to be performed in slower, high-precision arithmetic. If the C
behavior is desired, the constant should be explicitly typed to force the type
promotion: halfvar * 2.0f is compiled as ((float) halfvar) * 2.0f.
Cg uses the following type suffixes for constants:
f for float
h for half
x for fixed
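For comparison, the C-side rule described above can be checked mechanically. The following C++ sketch is only an illustration: C++ has no half type, so a float variable with the hypothetical name halfvar_standin plays that role. It shows that an unsuffixed constant promotes the expression to double in C and C++ (the behavior Cg avoids), while an f-suffixed constant keeps the expression in float.

```cpp
#include <type_traits>

// Hypothetical stand-in for a Cg half variable; C++ has no built-in half type.
float halfvar_standin = 1.5f;

// In C and C++, 2.0 is a double constant, so the float operand is promoted.
using UnsuffixedResult = decltype(halfvar_standin * 2.0);

// With the f suffix the constant is a float, so no promotion to double occurs.
using SuffixedResult = decltype(halfvar_standin * 2.0f);

static_assert(std::is_same<UnsuffixedResult, double>::value,
              "unsuffixed constant promotes the product to double");
static_assert(std::is_same<SuffixedResult, float>::value,
              "f-suffixed constant keeps the product in float");
```

The static_assert lines fail at compile time if either assumption about the C++ promotion rules were wrong, so simply compiling the file is the check.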
Structures and Member Functions
Cg supports structures the same way C does. Cg adopts the C++ convention
of implicitly performing a typedef based on the tag name when a struct is
declared:

    struct mystruct {
        /* ... */
    };
    mystruct s; // Define “s” as a “mystruct”.

Structures may define member functions in addition to member variables.
Member functions provide a convenient way of encapsulating helper
functions associated with the data in the structure, or a means of
describing the behavior of a data object.
Structure member functions are declared and defined within the body of the
structure definition:

    struct Foo {
        float val;
        float helper(float x) {
            return val + x;
        }
    };

Member functions may reference their arguments or the member variables of
the structure in which they are defined. The result of referring to a variable
outside the scope of the enclosing structure (such as a global variable) is
undefined; instead, passing such variables as arguments to member
functions that need them is recommended.
Member functions are invoked using the usual . notation:

    float4 main(uniform Foo myfoo, uniform float myval) : COLOR {
        return myfoo.helper(myval);
    }
Note that in the current release, member variables must be declared before
member functions that reference them; additionally, member functions may
not be overloaded based on profile.
Arrays
Arrays are supported in Cg and are declared just as in C. Because Cg does
not support pointers, arrays must always be defined using array syntax
rather than pointer syntax:

    // Declare a function that accepts an array
    // of five skinning matrices.
    returnType foo(float4x4 mymatrix[5]) {/* ... */};

Basic profiles place substantial restrictions on array declaration and usage.
General-purpose arrays can only be used as uniform parameters to a vertex
program. The intent is to allow an application to pass arrays of skinning
matrices and arrays of light parameters to a vertex program.
The most important difference from C is that arrays are first-class types.
That means array assignments actually copy the entire array, and arrays that
are passed as parameters are passed by value (the entire array is copied
before making any changes), rather than by reference.
Unsized Arrays
Cg supports unsized arrays, that is, arrays with one or more dimensions
having no specified length. This makes it possible to write Cg functions that
operate on arrays of arbitrary size. For example:

    float myfunc(float vals[]) {
        ...
    }

Here, myfunc() is declared to be a function of a single parameter, vals,
which is a one-dimensional array of floats. However, the length of the vals
array is not specified.
The effect of this declaration is that any subsequent call to myfunc() that
passes a one-dimensional array of floats of any size resolves to the declared
function. For example:

    float myfunc(float vals[]) {
        ...
    }
    float4 main(...) {
        float vals1[2];
        float vals2[76];
        ...
        float myval1 = myfunc(vals1); // match
        float myval2 = myfunc(vals2); // match
        ...
    }

The actual length of an array parameter (sized or unsized) may be queried
via the .length pseudo-member:

    float myfunc(float vals[]) {
        float sum = 0;
        for (int i = 0; i < vals.length; i++) {
            sum += vals[i];
        }
        return sum;
    }

The size of a particular dimension of a multidimensional array may be
queried by dereferencing the appropriate number of dimensions of the array.
For example, vals2d[0].length gives the length of the second dimension of
the two-dimensional vals2d array:

    float myfunc(float vals2d[][]) {
        float sum = 0;
        for (int i = 0; i < vals2d.length; i++) {
            for (int j = 0; j < vals2d[0].length; j++) {
                sum += vals2d[i][j];
            }
        }
        return sum;
    }

If the length of any dimension of an array parameter is specified, that
parameter only matches calls with variables whose corresponding
dimension is of the specified length. For example:

    float func(float vals[6][]) {
        ...
    }
    float4 main(...) {
        float v1[6][7];
        float v2[5][11];
        ...
        float myv1 = func(v1); // match: 6 == 6
        float myv2 = func(v2); // no match: 5 != 6
    }

Unsized arrays may only be declared as function parameters; they may not
be declared as variables. Furthermore, in all current profiles, the actual array
length and address calculations implied by array indexing must be known at
compile time.
Unsized array parameters of top-level functions, such as main(), may be
connected to sized arrays that are created in the runtime, or their size may be
set directly for convenience. See the cgSetArraySize() manual in the Cg
core runtime documentation for details.
Interfaces
Cg supports interfaces, a language construct found in other languages,
including Java and C# (and in C++ as pure virtual classes). Interfaces provide
a means of abstractly describing the member functions a particular structure
provides, without specifying how those functions are implemented. When
used in conjunction with parameter instantiation by the Cg runtime, this
abstraction makes it possible to plug in any structure that implements a
given interface into a program, even if the structure was not known to the
author of the original program.
An interface declaration describes a set of member functions that a structure
must define in order to implement the named interface. Interfaces contain
only function prototype definitions. They do not contain actual function
implementations or data members. For example, the following code
defines an interface named Light consisting of two methods, illuminate()
and color():

    interface Light {
        float3 illuminate(float3 P, out float3 L);
        float3 color(void);
    };

A Cg structure may optionally implement an interface. This is signified by
placing a : and the name of the interface after the name of the structure
being defined. The methods required by the interface must be defined within
the body of the structure. For example:

    struct SpotLight : Light {
        sampler2D shadow;
        samplerCUBE distribution;
        float3 Plight, Clight;
        float3 illuminate(float3 P, out float3 L) {
            L = normalize(Plight - P);
            return Clight * tex2D(shadow, P).xxx *
                   texCUBE(distribution, L).xyz;
        }
        float3 color(void) {
            return Clight;
        }
    };

Here, the SpotLight structure is defined, which implements the Light
interface. Note that the illuminate() and color() methods are defined
within the body of the structure, and that their implementations are able to
reference data members of the SpotLight structure (for example, Plight,
Clight, shadow, and distribution).
Function parameters, local variables, and global variables all may have
interface types. Interface parameters to top-level functions, such as
main(), must be declared as uniform.
A structure that implements a particular interface may be used wherever its
interface type is expected. For example:

    float3 myfunc(Light light) {
        float3 result = light.illuminate(...);
        ...
    }
    float4 main(uniform SpotLight spot) {
        float3 color = myfunc(spot);
        ...
    }

Here, the SpotLight variable spot may be used as a generic Light in the call
to myfunc(), because SpotLight implements the Light interface.
It is possible to declare a local variable of an interface type. However, a
concrete structure must be assigned to that variable before any of the
interface's methods may be called. For example:

    Light mylight;
    SpotLight spot;
    float3 color;
    ... /* initialize spot */ ...
    color = mylight.illuminate(...); // Error
    mylight = spot;
    color = mylight.illuminate(...); // OK
Under all current profiles, the concrete implementation of all interface
method calls must be resolvable at compile time. There is no dynamic
run-time determination of which implementation to call under any current
profile.
See the interfaces_ogl example, included in the Cg distribution, for an
example of the use of interfaces.
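Since the text compares Cg interfaces to C++ pure virtual classes, a rough C++ counterpart may help readers coming from that language. This is a sketch under stated assumptions, not Cg code: Vec3, sub(), normalize(), and shade() are hypothetical helpers, the SpotLight here omits the texture lookups, and the template parameter is only a loose analogy for Cg resolving the concrete implementation at compile time.

```cpp
#include <cmath>

// Hypothetical minimal 3-component vector; not part of Cg or any library.
struct Vec3 { float x, y, z; };

inline Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }

inline Vec3 normalize(Vec3 v) {
    float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return {v.x / len, v.y / len, v.z / len};
}

// C++ counterpart of the Cg "interface Light": only pure virtual methods.
struct Light {
    virtual ~Light() = default;
    // The reference parameter L plays the role of Cg's "out float3 L".
    virtual Vec3 illuminate(Vec3 P, Vec3& L) const = 0;
    virtual Vec3 color() const = 0;
};

// Simplified stand-in for the manual's SpotLight (no shadow/distribution maps).
struct SpotLight final : Light {
    Vec3 Plight{0.0f, 0.0f, 0.0f};
    Vec3 Clight{1.0f, 1.0f, 1.0f};
    Vec3 illuminate(Vec3 P, Vec3& L) const override {
        L = normalize(sub(Plight, P));
        return Clight;
    }
    Vec3 color() const override { return Clight; }
};

// Taking the concrete type as a template parameter lets the compiler pick
// the implementation statically, with no run-time dispatch required.
template <typename LightT>
Vec3 shade(const LightT& light, Vec3 P) {
    Vec3 L{};
    return light.illuminate(P, L);
}
```

A SpotLight can be passed to shade() directly or viewed through a const Light& reference, which mirrors using a concrete structure wherever the interface type is expected.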
Notes and Caveats
The following limitations may be addressed in future releases:
• There is no inheritance per se in Cg: a structure may not inherit from
  another structure.
• Structures may only implement a single interface.
• Interfaces cannot be extended or combined.
Although there is no structure inheritance, it is possible to define a default
implementation of a particular interface method. The default
implementation can be defined as a global function, and structures that
implement that interface may then call this default method via a wrapper.
Note, also, that interface and structure parameters of top-level functions,
such as main(), may be connected to structures that are created in the
runtime. See the Cg runtime documentation for more details.
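The default-implementation pattern mentioned above (a global function plus a thin wrapper in each structure) can be pictured with a C++ analogue. The names Light, intensity(), defaultIntensity(), PointLight, and ConstantLight are hypothetical and chosen only for this sketch; the falloff formula is likewise an arbitrary placeholder.

```cpp
// Hypothetical interface with a single method.
struct Light {
    virtual ~Light() = default;
    virtual float intensity(float distance) const = 0;
};

// The "default implementation", written as a free (global) function.
inline float defaultIntensity(float distance) {
    return 1.0f / (1.0f + distance * distance);
}

// This structure opts in to the default through a one-line wrapper.
struct PointLight : Light {
    float intensity(float distance) const override {
        return defaultIntensity(distance);
    }
};

// This structure supplies its own behavior instead of the default.
struct ConstantLight : Light {
    float intensity(float) const override { return 1.0f; }
};
```

The wrapper costs one line per structure, which is the price of sharing behavior without structure inheritance.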
Statements and Operators
Cg supports the following types of statements and operators:
• Control flow
• Function definitions and function overloads
• Arithmetic operators from C
• Multiplication function
• Vector constructor
• Boolean and comparison operators
• Swizzle operator
• Write mask operator
• Conditional operator
Control Flow
Cg uses the following C control constructs:
• Function calls and the return statement
• if/else
• while
• for
These control constructs require that their conditional expressions be of type
bool. Because Cg expressions like i <= 3 are of type bool, this change from
C is normally not apparent.
Profiles like vs_2_x, vp30, and vp40 support branch instructions, so for and
while loops are fully supported in these profiles. In other profiles, for and
while loops may only be used if the compiler can fully unroll them (that is, if
the compiler can determine the iteration count at compile time). Likewise,
return can only appear as the last statement in a function in these profiles.
Function recursion (and co-recursion) is forbidden in Cg.
The switch, case, and default keywords are reserved, but they are not
supported by any profiles in the current release of the Cg compiler.
Function Definitions and Function Overloading
To pass a modifiable function parameter in C, the programmer must
explicitly use pointers. C++ provides a built-in pass-by-reference mechanism
that avoids the need to explicitly use pointers, but this mechanism still
implicitly assumes that the hardware supports pointers. Cg must use a
different mechanism because the vertex and fragment hardware of the GPU
does not support the use of pointers. Cg passes modifiable function
parameters by value-result, instead of by reference. The difference between
these two methods is subtle; it is only apparent when two function
parameters are aliased by a function call. In Cg, the two parameters have
separate storage in the function, whereas in C++ they would share storage.
To reinforce this distinction, Cg uses a different syntax than C++ to declare
function parameters that are modified:

    function blah1(out float x);   // x is output-only
    function blah2(inout float x); // x is input and output
    function blah3(in float x);    // x is input-only
    function blah4(float x);       // x is input-only (default, as in C)
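The aliasing difference between pass-by-reference and pass-by-value-result is easiest to see by running both on the same variable. The C++ sketch below emulates value-result by hand (copy in, work on locals, copy out). The function names are hypothetical, and the order in which results are copied back is an assumption of this sketch, not something the text specifies.

```cpp
// Pass by reference (C++): when the caller passes the same variable twice,
// both parameters share one storage location.
inline void byReference(float& a, float& b) {
    a = a + 1.0f;   // also changes b when a and b alias
    b = b * 2.0f;
}

// Emulation of value-result (copy-in, copy-out): each parameter has its own
// storage inside the function, and results are copied back on return.
inline void byValueResult(float& a, float& b) {
    float localA = a;   // copy in
    float localB = b;   // copy in
    localA = localA + 1.0f;
    localB = localB * 2.0f;
    a = localA;         // copy out
    b = localB;         // copy out (this order is an assumption)
}
```

Starting from x = 3, byReference(x, x) leaves x at 8 because the doubling sees the incremented value, while byValueResult(x, x) leaves x at 6 because each parameter was modified independently. With two distinct variables the two functions produce identical results, which is why the difference is rarely noticed.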
Cg supports function overloading by the number of operands and by
operand type. The choice of a function is made by matching one operand at a
time, starting at the first operand. The formal language specification
provides more details on the matching rules, but it is not normally necessary
to study them because the overloading generally works in an intuitive
manner. For example, the following code declares two versions of a function,
one that takes two bool operands, and one that takes two float operands:

    bool same(float a, float b) { return (a == b);}
    bool same(bool a, bool b) { return (a == b);}

Arithmetic Operators from C
Cg includes all the standard C arithmetic operators (+, -, *, /) and allows the
operators to be used on vectors as well as on scalars. The vector operations
are always performed in elementwise fashion. For example,

    float3(a, b, c) * float3(A, B, C) equals float3(a*A, b*B, c*C)

These operators can also be used in a form that mixes scalar and vector: the
scalar is “smeared” to create a vector of the necessary size to perform an
elementwise operation. Thus,

    a * float3(A, B, C) is equal to float3(a*A, a*B, a*C)

The built-in arithmetic operators do not currently support matrix operands. It
is important to remember that matrices are not the same as vectors, even if
their dimensions are the same.
Multiplication Functions
Cg’s mul() functions are for multiplying matrices by vectors, and matrices
by matrices:

    // Matrix by column-vector multiply
    matrix-column vector: mul(M, v);
    // Row-vector by matrix multiply
    row vector-matrix: mul(v, M);
    // Matrix by matrix multiply
    matrix-matrix: mul(M, N);

It is important to use the correct version of mul(). Otherwise, you are likely
to get unexpected results. More detail on the mul() functions is provided
in “Cg Standard Library Functions” on page 33.
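To see why the operand order of mul() matters, the following C++ sketch reproduces the two vector forms for a 2x2 matrix. It is an illustration only: Vec2, Mat2, mulMatVec(), and mulVecMat() are hypothetical names, and the matrix is stored as M[row][column] by assumption.

```cpp
#include <array>

using Vec2 = std::array<float, 2>;
using Mat2 = std::array<std::array<float, 2>, 2>;  // M[row][column]

// Analogue of mul(M, v): v is treated as a column vector.
inline Vec2 mulMatVec(const Mat2& M, const Vec2& v) {
    return {M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1]};
}

// Analogue of mul(v, M): v is treated as a row vector.
inline Vec2 mulVecMat(const Vec2& v, const Mat2& M) {
    return {v[0] * M[0][0] + v[1] * M[1][0],
            v[0] * M[0][1] + v[1] * M[1][1]};
}
```

For the matrix with rows (1, 2) and (3, 4) and the vector (1, 0), the column-vector form picks out the first column, (1, 3), while the row-vector form picks out the first row, (1, 2). Swapping the arguments is therefore equivalent to multiplying by the transposed matrix.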
Vector Constructor
Cg allows vectors (up to size 4) to be constructed using the following
notation:

    y = x * float4(3.0, 2.0, 1.0, -1.0);

The vector constructor can appear anywhere in an expression. Furthermore,
vectors can be constructed from smaller vectors:

    float2 a = ...;
    float4 b = float4(a, 0.0, 1.0);

Boolean and Comparison Operators
Cg includes three of the standard C boolean operators:

    &&   logical AND
    ||   logical OR
    !    logical negation

In C, these operators consume and produce values of type int, but in Cg
they consume and produce values of type bool. This difference is not
normally noticeable, except when declaring a variable that will hold the
value of a boolean expression. Cg also supports the C comparison operators,
which produce values of type bool:

    <    less than
    <=   less than or equal to
    !=   inequality
    ==   equality
    >=   greater than or equal to
    >    greater than

Unlike C, Cg allows all boolean operators to be applied to vectors, in which
case boolean operations are performed in an elementwise fashion. The result
of such a boolean expression is a vector of bool elements with the same
number of elements as the two source vectors. Also unlike C, the logical
AND (&&) and logical OR (||) operators cannot be used for short-circuiting
evaluation; side effects of both sides of these expressions always occur,
regardless of the value of the boolean expression.
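The two differences from C described above (elementwise results and no short-circuiting) can be imitated in C++ with ordinary functions, because function arguments are always evaluated before the call. This is a sketch with hypothetical names (Bool3, Float3, lessThan(), logicalAnd(), countedTrue()); it models the described semantics and is not how a Cg compiler implements them.

```cpp
#include <array>

using Bool3 = std::array<bool, 3>;
using Float3 = std::array<float, 3>;

// Elementwise "<" on vectors: yields one bool per component.
inline Bool3 lessThan(const Float3& a, const Float3& b) {
    return {a[0] < b[0], a[1] < b[1], a[2] < b[2]};
}

// Elementwise logical AND. Both operands are function arguments, so both
// are fully evaluated before the call: there is no short-circuiting.
inline Bool3 logicalAnd(const Bool3& a, const Bool3& b) {
    return {a[0] && b[0], a[1] && b[1], a[2] && b[2]};
}

// Hypothetical helper with a visible side effect, used to observe whether
// the right-hand operand was evaluated.
inline Bool3 countedTrue(int& evaluations) {
    ++evaluations;
    return {true, true, true};
}
```

Calling logicalAnd() with an all-false left operand and countedTrue(n) on the right still increments n, whereas the built-in C++ expression false && countedTrue(n)[0] leaves n untouched.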
Swizzle Operator
Cg has a swizzle operator (.) that allows the components of a vector to be
rearranged to form a new vector. The new vector need not be the same size as
the original vector; elements can be repeated or omitted. The characters x, y,
z, and w represent the first, second, third, and fourth components of the
original vector, respectively. The characters r, g, b, and a can be used for the
same purpose. Because the swizzle operator is implemented efficiently in the
GPU hardware, its use is usually free.
The following are some examples of swizzling:

    float3(a, b, c).zyx yields float3(c, b, a)
    float4(a, b, c, d).xxyy yields float4(a, a, b, b)
    float2(a, b).yyxx yields float4(b, b, a, a)
    float4(a, b, c, d).w yields d

The swizzle operator can also be used to create a vector from a scalar:

    a.xxxx yields float4(a, a, a, a)

The precedence of the swizzle operator is the same as that of the array
subscripting operator ([]).
Write Mask Operator
The write mask operator (.) is placed on the left-hand side of an assignment
statement. It can be used to selectively overwrite the components of a vector.
It is illegal to specify a particular component more than once in a write mask,
or to specify a write mask when initializing a variable as part of a
declaration.
The following is an example of a write mask:

    float4 color = float4(1.0, 1.0, 0.0, 0.0);
    color.a = 1.0; // Set alpha to 1.0, leaving RGB alone.

The write mask operator can be a powerful tool for generating efficient code
because it maps well to the capabilities of GPU hardware. The precedence of
the write mask operator is the same as that of the swizzle operator.
Conditional Operator
Cg includes C’s if/else conditional statement and conditional operator (?:).
With the conditional operator, the control variable may be a bool vector. If
so, the second and third operands must be similarly sized vectors, and
selection is performed on an elementwise basis. Unlike C, any side effects
associated with the second and third operands always occur, regardless of
the conditional.
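The elementwise behavior of the conditional operator with a bool-vector condition can be modeled in C++ as a per-component selection. The sketch below uses hypothetical names (Float3, Bool3, selectPerComponent()) and is only a model of the semantics described in the paragraph above.

```cpp
#include <array>

using Float3 = std::array<float, 3>;
using Bool3 = std::array<bool, 3>;

// Elementwise analogue of "cond ? a : b" with a bool-vector condition.
// Both a and b are computed by the caller before selection happens, so any
// side effects in producing them always occur.
inline Float3 selectPerComponent(const Bool3& cond, const Float3& a,
                                 const Float3& b) {
    return {cond[0] ? a[0] : b[0],
            cond[1] ? a[1] : b[1],
            cond[2] ? a[2] : b[2]};
}
```

With the condition (true, false, true), the result takes its first and third components from a and its second component from b.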
As an example, the following would be a very efficient way to implement a
vector clamp function, if the min() and max() functions did not exist:

    float3 clamp(float3 x, float minval, float maxval) {
        x = (x < minval.xxx) ? minval.xxx : x;
        x = (x > maxval.xxx) ? maxval.xxx : x;
        return x;
    }

Texture Lookups in Advanced Fragment Profiles
Cg’s advanced fragment profiles and the vp40 profile provide a variety of
texture lookup functions. Please note that Cg uses a different set of texture
lookup functions for basic fragment profiles because of the restricted pixel
programmability of that hardware. Basic fragment profile lookup functions
aren’t discussed in this introductory chapter.
Advanced fragment profile texture lookup functions always require at least
two parameters:
• Texture sampler
  A texture sampler is a variable with the type sampler, sampler1D,
  sampler2D, sampler3D, samplerCUBE, or samplerRECT and represents
  the combination of a texture image with a filter, clamp, wrap, or similar
  configuration. Texture sampler variables cannot be set directly within the
  Cg language; instead, they must be provided by the application as
  uniform parameters to a Cg program.
• Texture coordinate
  Depending on the type of texture lookup, the coordinate may be a scalar,
  a two-vector, a three-vector, or a four-vector.
The following fragment program uses the tex2D() function to perform a 2D
texture lookup to determine the fragment’s RGBA color.

    void applytex(uniform sampler2D mytexture,
                  float2 uv : TEXCOORD0,
                  out float4 outcolor : COLOR) {
        outcolor = tex2D(mytexture, uv);
    }

Cg provides a wide variety of texture lookup functions, a sample of which is
given below. For a complete list see “Texture Map Functions” on page 38.
• Standard nonprojective texture lookup:

    tex2D (sampler2D tex, float2 s);
    texRECT (samplerRECT tex, float2 s);
    texCUBE (samplerCUBE tex, float3 s);

• Standard projective texture lookup:

    tex2Dproj (sampler2D tex, float3 sq);
    texRECTproj (samplerRECT tex, float3 sq);
    texCUBEproj (samplerCUBE tex, float4 sq);

• Nonprojective texture lookup with user-specified filter kernel size:

    tex2D (sampler2D tex, float2 s,
           float2 dsdx, float2 dsdy);
    texRECT (samplerRECT tex, float2 s,
             float2 dsdx, float2 dsdy);
    texCUBE (samplerCUBE tex, float3 s,
             float3 dsdx, float3 dsdy);

  The filter size is specified by providing the derivatives of the texture
  coordinates with respect to pixel coordinates x (dsdx) and y (dsdy). For
  more information see “Texture Map Functions” on page 38.
• Shadow map lookup:

    tex2Dproj (sampler2D tex, float4 szq);
    texRECTproj (samplerRECT tex, float4 szq);

  In these functions, the z component of the texture coordinate holds a
  depth value to be compared against the shadow map. Shadow map
  lookups require the associated texture unit to be configured by the
  application for depth-compare texturing; otherwise, no depth
  comparison is actually performed.
Effects
Cg includes a powerful, versatile shader specification and interchange
format: CgFX. For artists and developers of real-time graphics, this format
provides several key benefits:
• Encapsulation of multiple rendering techniques, enabling fallbacks for
  level of detail, functionality, and performance.
• Support for Cg, assembly language, and fixed-function shaders.
• Editable parameters and GUI descriptions embedded in the file.
• Multipass shaders.
• Render state and texture state specification.
In practical terms, by wrapping both Cg vertex programs and Cg fragment
programs together with render state, texture state, and pass information,
developers can describe a complete rendering effect. Although individual Cg
programs may contain the core rendering algorithms necessary for an effect,
only when combined with this additional environmental information does
the shader become complete and self-contained. The addition of
artist-friendly GUI descriptions and fallbacks enables CgFX files to integrate
well with the production workflow used by artists and programmers.
CgFX encapsulates, in a single text file, everything needed to apply a
rendering effect. This feature lets a third-party tool or another 3D application
use a CgFX text file as is, with no external information other than the
necessary geometry and texture data. In this sense, CgFX acts as an
interchange format. CgFX allows shaders to be exchanged without the
associated C++ code that is normally necessary to make a Cg program work
with OpenGL or Direct3D. It addresses the following four issues:
• The Cg language lets you easily express how an object should be
  rendered. Although current Cg profiles describe only a single rendering
  pass, many shading techniques, such as shadow volumes or shadow
  maps, require more than one rendering pass.
• Many applications need to target a wide range of graphics hardware
  functionality and performance. Thus, versions of shaders that run on
  older hardware, and versions that aid performance for distant objects, are
  important.
• Each Cg program typically targets a single profile, and doesn't specify
  how to fall back to other profiles, to assembly language shaders, or to
  fixed-function vertex or fragment processing.
• To generate images with Cg programs, some information about their
  environment is needed. For instance, some programs might require
  alpha blending to be turned on and depth writes to be disabled. Others
  may need a certain texture format to work correctly. This information is
  not present in standard Cg source files.
Techniques
Each CgFX file usually presents a certain effect that the shader author is
trying to achieve, such as bump mapping, environment mapping, or
anisotropic lighting. The CgFX file contains one or more techniques, each of
which describes a way to achieve the effect. Each technique usually targets a
certain level of GPU functionality, so a CgFX file may contain one technique
for an advanced GPU with powerful fragment programmability, and another
technique for older graphics hardware supporting fixed-function texture
blending. CgFX techniques can also be used for functionality, level-of-detail,
or performance fallbacks. For example:

    technique PixelShaderVersion
    {…};
    technique FixedFunctionVersion
    {…};
    technique LowDetailVersion
    {…};

An application can make queries about which techniques are present in an
effect and can choose an appropriate one at run time, based on whatever
criteria are appropriate.
Passes
Each technique contains one or more passes. Each pass represents a set of
render states and shaders to apply for a single rendering pass within a
technique. For instance, the first pass might lay down depth only so that
subsequent passes can apply an additive alpha-blending technique without
requiring polygon sorting.
Each pass may contain a vertex program, a fragment program, or both, and
each pass may use fixed-function vertex processing, pixel processing, or
both. For example, a first pass might use fixed-function pixel processing to
output the ambient color. The next pass could use an fp30 fragment program,
and pass three might use an arbfp1 fragment program.
State Assignments
Each pass also contains render state assignments such as alpha blending,
depth writes, and texture filtering modes, to name a few. For example:

    pass firstPass {
        DepthTestEnable = true;
        DepthFunc = Less;
        AlphaTestEnable = true;
        AlphaFunc = float2(Equal, 0);
    };
Parameters and Semantics
The CgFX file also contains global Cg parameters. These variables are usually
passed as uniform parameters to Cg functions, or as the values for render or
texture state settings. For instance, a bool variable might be used as a
uniform parameter to a Cg function, or as a value enabling or disabling the
alpha blend render state:

    bool AlphaBlending = false;
    float bumpHeight = 0.5f;

These variables can contain a user-defined semantic, which helps
applications provide the correct data to the shader without having to
decipher the variable names:

    float4x4 myViewMatrix : ViewMatrix;
    texture2D someTexture : DiffuseMap;

A CgFX-enabled application can then query the CgFX file for its variables
and their semantics.
Vertex and Fragment Programs
With the OpenGL state manager, vertex and fragment programs are defined
via assignments to the VertexProgram and FragmentProgram states,
respectively. Three different types of expressions can be on the right-hand
side of these state assignments:
• Compile statements
• Inline assembly
• NULL
These three possibilities are demonstrated in the effect file below:

    float4 main(uniform float foo, float4 uv : TEXCOORD0) : COLOR{
        return (foo > 0) ? uv : 2 * uv;
    }
    technique SimpleFrag {
        pass {
            VertexProgram = NULL;
            FragmentProgram = compile arbfp1 main(-2.f);
        }
    }
    technique AsmFrag {
        pass {
            FragmentProgram = asm {
                !!FP1.0
                TEX o[COLR], {0}.x, TEX6, 2D;
                END
            };
        }
    }

Compile statements are generally the most commonly used of these three
options for specifying programs. They take the profile that the program is to
be compiled to (fp30, fp40, arbfp1, vp20, and so on), the name of the
function in the effect file to be compiled, and a list of expressions (-2.f in the
above example). These expressions have a one-to-one correspondence with
the uniform parameters of the program being compiled; there must be
exactly one for each uniform program parameter.
In the example above, the expression -2.f sets the value of the foo
parameter to main(). Because it is using a literal value, CgFX is able to
compile the shader into a particularly efficient version that skips the
comparison and simply returns 2 * uv (because -2.f is not greater than 0).
Inline assembly is given with the asm keyword, with the assembly language
code between braces as in the example above. CgFX depends on having the
appropriate header at the start of the assembly (!!FP1.0 for fp30,
!!ARBvp1.0 for arbvp1, and so on) to determine which assembly profile the
code is given in.
It is also possible to include effect parameters in the expression used in the
compile statement. For example:

    float4 main(uniform float foo, float4 uv : TEXCOORD0) : COLOR{
        return (foo > 0) ? uv : 2 * uv;
    }
    float bar;
    technique NewSimpleFrag {
        pass {
            VertexProgram = NULL;
            FragmentProgram = compile arbfp1 main(2 * bar);
        }
    }

Here, the value 2*bar is associated with the foo parameter of main(). When
the value of bar is changed by the application, the value of foo in main() is
set appropriately.
Finally, vertex or fragment programs may be assigned the value NULL in the
state assignment. This signifies that no program should be used in this pass.
Textures and Samplers
CgFX makes it possible to define state related to textures in the effect file.
The short effect file below shows an example.

    sampler2D samp = sampler_state {
        generateMipMap = true;
        minFilter = LinearMipMapLinear;
        magFilter = Linear;
    };
    float4 texsimple( uniform sampler2D sampler,
                      float2 uv : TEXCOORD0) : COLOR {
        return tex2D(sampler, uv);
    }
    technique TextureSimple {
        pass {
            FragmentProgram = compile arbfp1 texsimple(samp);
        }
    }

Interfaces and Unsized Arrays
CgFX also supports Cg's interfaces and unsized arrays features. Given an
effect file with Cg programs that use these features, the compile statement
can be used in two different ways to resolve the interfaces and unsized arrays
so that the program can be compiled.
Consider the following example: a Light interface has been defined with
SpotLight implementing the interface. The main() program takes an
unsized array of Light interface objects, loops over them, and returns the
sum of the values returned by their respective value() methods.

    interface Light {
        float4 value();
    };
    struct SpotLight : Light {
        float4 value() { return float4(1,2,3,4); }
    };
    float4 main(uniform Light l[]) : COLOR {
        float4 v = float4(0,0,0,0);
        for (int i = 0; i < l.length; ++i)
            v += l[i].value();
        return v;
    }

Recall that all uniform parameters to the program must have expressions in
the parenthesized list in the compile statement and, therefore, one expression
is necessary here for the one parameter. The first way that main() can be
compiled is to give the name of an effect parameter that resolves both the
actual size of the array as well as the concrete type that implements the
Light interface:

    SpotLight spots[4];
    technique {
        pass {
            FragmentProgram = compile arbfp1 main(spots);
        }
    }

Alternatively, the application can leave the resolution of the concrete types
and array size until later so that they can be set via Cg runtime calls from the
application. (This was the usual approach before CgFX 1.4.)
For this case, the expression passed to the compile statement should just be
an unsized array of the abstract interface type:

    Light lights[];
    technique {
        pass {
            FragmentProgram = compile arbfp1 main(lights);
        }
    }

Running Cg Programs on the CPU
There are many situations, such as tabularizing complex functions into
texture maps, where it is useful to execute Cg programs on the CPU and not
on the GPU. While the CPU path doesn't offer the same performance, it can
be useful because it doesn't have the resource limits associated with GPUs.
Programs that run on a CPU in this manner are declared like the following.

    float foo = 4.f;
    float4 func(float2 p : POSITION, float2 delta : PSIZE) : COLOR
    {
        return foo * p.xyxy;
    }

The POSITION semantic denotes the parameter or parameters that should be
set with the coordinates of each point at which the function is evaluated;
there is a coordinate value from zero to one for each dimension over which
the function is being evaluated. The PSIZE semantic denotes a parameter that
should be initialized with the value of the spacing between samples at which
the function is being evaluated, and the COLOR semantic denotes where the
result of the function should be returned. (Thus, the function above could
have been written as a void function with an out float4 ret : COLOR
parameter and an assignment to ret instead of the return statement.)
Given an effect file with such a program, a CGprogram handle to it can be
retrieved by creating a program with the CG_PROFILE_GENERIC profile,
as follows:

    CGprogram tp = cgCreateProgramFromEffect(effect,
        CG_PROFILE_GENERIC, "func", NULL);

With this program handle, cgEvaluateProgram() evaluates the program
over a one-, two-, or three-dimensional domain. Its parameters are as
follows:
• a CGprogram handle
• a float* to an output buffer
• the number of components in the output buffer (1, 2, 3, or 4)
• the number of positions in the x dimension at which to evaluate the
  function
• the number of positions in the y dimension
• the number of positions in the z dimension
The total size of the buffer should be equal to the product of the number of
positions in each of the dimensions and the number of components in the
buffer.
It is a runtime error to pass a CGprogram that doesn't have the
CG_PROFILE_GENERIC profile to cgEvaluateProgram().

    #define RES 256
    #define NCOMPS 4
    float *buf = new float[NCOMPS*RES*RES];
    cgEvaluateProgram(tp, buf, NCOMPS, RES, RES, 1);
    // Do something with buf.
    delete[] buf;
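The buffer-size rule stated above (components times positions in each dimension) can be wrapped in a small helper so the allocation and the evaluation call cannot drift apart. The sketch below makes no Cg runtime calls; evaluationBufferSize() and makeEvaluationBuffer() are hypothetical helper names, and passing 1 for an unused dimension is an assumption taken from the manual's own call, which passes 1 for z.

```cpp
#include <cstddef>
#include <vector>

// Number of floats needed for the output buffer: components per sample
// times the number of positions in each dimension (1 for unused ones).
inline std::size_t evaluationBufferSize(std::size_t ncomps, std::size_t nx,
                                        std::size_t ny, std::size_t nz) {
    return ncomps * nx * ny * nz;
}

// Allocates a zero-initialized buffer of the right size. A std::vector
// releases its memory automatically, unlike a new[]/delete[] pair.
inline std::vector<float> makeEvaluationBuffer(std::size_t ncomps,
                                               std::size_t nx,
                                               std::size_t ny,
                                               std::size_t nz) {
    return std::vector<float>(evaluationBufferSize(ncomps, nx, ny, nz), 0.0f);
}
```

For the manual's values (4 components, 256 by 256 positions, z set to 1) this yields 262144 floats; the vector's data() pointer could then be passed where the listing above passes buf.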
Annotations
Additionally, each variable, technique, pass, and program in the file can have
an optional annotation. The annotation is a per-variable-instance structure
that contains data that the effect author wants to communicate to a
CgFX-aware application, such as an artist tool. The application can then allow
the variable to be manipulated, based on a GUI element that is appropriate for
the type of annotation.
An annotation can be used to describe a user interface element for
manipulating uniform parameters, or to describe the type of render target a
rendering pass is expecting.

    float bumpHeight
    <
        string gui = "slider";
        float uimin = 0.0f;
        float uimax = 1.0f;
        float uistep = 0.1f;
    > = 0.5f;

The annotation appears after the optional semantic and before variable
initialization. Applications can query for annotations, and use them to
expose certain parameters to artists in a CgFX-aware tool, such as Discreet's
3ds max 5 or Alias|Wavefront's Maya 4.5.
More Details
The purpose of this chapter has been to give you a brief overview of Cg so
that you can get started quickly and experiment to gain hands-on experience.
If you would like some more detail about any of the language features
described in this chapter, see “Cg Language Specification” on page 221.
Cg Standard Library Functions
Cg provides a set of built-in functions and predefined structures with
binding semantics to simplify GPU programming. These functions are
similar in spirit to the C standard library, providing a convenient set of
common functions. In many cases, the functions map to a single native GPU
instruction, meaning they are executed very quickly. Of those functions that
map to multiple native GPU instructions, you may expect the most useful to
become more efficient in the near future.
Although customized versions of specific functions can be written for
performance or precision reasons, it is generally wiser to use the standard
library functions when possible. The standard library functions will continue
to be optimized for future GPUs, meaning that a shader written today will
automatically be optimized for the latest architectures at compile time.
Additionally, the standard library provides a convenient unified interface for
both vertex and fragment programs.
This section describes the contents of the Cg Standard Library, including
• Mathematical functions
• Geometric functions
• Texture map functions
• Derivative functions
• Predefined helper struct types
Where appropriate, functions are overloaded to support scalar and vector
variations when the input and output types are the same.
Mathematical Functions
Table 1, “Mathematical Functions,” lists the mathematical functions that the
Cg Standard Library provides. The list includes functions useful for
trigonometry, exponentiation, rounding, and vector and matrix
manipulations, among others. All functions work on scalars and vectors of
all sizes, except where noted.
Table 1. Mathematical Functions
Function                Description
abs(x)                  Absolute value of x.
acos(x)                 Arccosine of x in range [0, π]; x should be in [-1, 1].
all(x)                  Returns true if every component of x is not equal to 0.
                        Returns false otherwise.
any(x)                  Returns true if any component of x is not equal to 0.
                        Returns false otherwise.
asin(x)                 Arcsine of x in range [-π/2, π/2];
                        x should be in [-1, 1].
atan(x)                 Arctangent of x in range [-π/2, π/2].
atan2(y, x)             Arctangent of y/x in range [-π, π].
ceil(x)                 Smallest integer not less than x.
clamp(x, a, b)          x clamped to the range [a, b] as follows:
                        • Returns a if x is less than a.
                        • Returns b if x is greater than b.
                        • Returns x otherwise.
cos(x)                  Cosine of x.
cosh(x)                 Hyperbolic cosine of x.
cross(a, b)             Cross product of vectors a and b;
                        a and b must be 3-component vectors.
degrees(x)              Radian-to-degree conversion.
determinant(M)          Determinant of matrix M.
dot(a, b)               Dot product of vectors a and b.
exp(x)                  Exponential function e^x.
exp2(x)                 Exponential function 2^x.
floor(x)                Largest integer not greater than x.
fmod(x, y)              Remainder of x/y, with the same sign as x.
                        If y is zero, the result is implementation-defined.
frac(x)                 Fractional part of x.
frexp(x, out exp)       Splits x into a normalized fraction in the interval
                        [1/2, 1), which is returned, and a power of 2, which
                        is stored in exp.
                        If x is zero, both parts of the result are zero.
isfinite(x)             Returns true if x is finite.
isinf(x)                Returns true if x is infinite.
isnan(x)                Returns true if x is NaN (not a number).
ldexp(x, n)             x * 2^n
lerp(a, b, f)           Linear interpolation: (1-f)*a + b*f where a and b
                        are matching vector or scalar types. Parameter f can
                        be either a scalar or a vector of the same type as a
                        and b.
lit(ndotl, ndoth, m)    Computes lighting coefficients for ambient, diffuse,
                        and specular light contributions. Returns a 4-vector
                        as follows:
                        • The x component of the result vector contains the
                          ambient coefficient, which is always 1.0.
                        • The y component contains the diffuse coefficient,
                          which is zero if (n · l) < 0; otherwise (n · l).
                        • The z component contains the specular coefficient,
                          which is zero if either (n · l) < 0 or (n · h) < 0;
                          (n · h)^m otherwise.
                        • The w component is 1.0.
                        There is no vectorized version of this function.
log(x)                  Natural logarithm ln(x);
                        x must be greater than zero.
log2(x)                 Base 2 logarithm of x;
                        x must be greater than zero.
log10(x)                Base 10 logarithm of x;
                        x must be greater than zero.
max(a, b)               Maximum of a and b.
min(a, b)               Minimum of a and b.
modf(x, out ip)         Splits x into integral and fractional parts, each
                        with the same sign as x.
                        Stores the integral part in ip and returns the
                        fractional part.
mul(M, N)               Matrix product of matrix M and matrix N.
                        If M has size AxB, and N has size BxC, returns
                        a matrix of size AxC.
mul(M, v)               Product of matrix M and column vector v.
                        If M is an AxB matrix and v is a Bx1 vector, returns
                        an Ax1 vector.
mul(v, M)               Product of row vector v and matrix M.
                        If v is a 1xA vector and M is an AxB matrix, returns
                        a 1xB vector.
noise(x)                Either a 1-, 2-, or 3-dimensional noise function
                        depending on the type of its argument.
                        The returned value is between zero and one and is
                        always the same for a given input value.
pow(x, y)               x^y
radians(x)              Degree-to-radian conversion.
round(x)                Closest integer to x.
rsqrt(x)                Reciprocal square root of x;
                        x must be greater than zero.
saturate(x)             Equivalent to clamp(x, 0, 1):
                        • Returns 0 if x is less than 0.
                        • Returns 1 if x is greater than 1.
                        • Returns x otherwise.
sign(x)                 1 if x > 0;
                        -1 if x < 0;
                        0 otherwise.
sin(x)                  Sine of x.
sincos(float x, out s, out c)
                        s is set to the sine of x, and c is set to the cosine
                        of x. If sin(x) and cos(x) are both needed, this
                        function is more efficient than calculating each
                        individually.
sinh(x)                 Hyperbolic sine of x.
smoothstep(min, max, x) For values of x between min and max, returns a
                        smoothly varying value that ranges from 0 at x = min
                        to 1 at x = max. x is clamped to the range [min, max]
                        and then the interpolation formula is evaluated:
                        -2*((x-min)/(max-min))^3 + 3*((x-min)/(max-min))^2
step(a, x)              0 if x < a;
                        1 if x >= a.
sqrt(x)                 Square root of x;
                        x must be greater than zero.
tan(x)                  Tangent of x.
tanh(x)                 Hyperbolic tangent of x.
transpose(M)            Matrix transpose of matrix M. If M is an AxB matrix,
                        the transpose of M is a BxA matrix whose first column
                        is the first row of M, whose second column is the
                        second row of M, whose third column is the third row
                        of M, and so on.
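The clamp(), saturate(), lerp(), step(), and smoothstep() entries of Table 1 can be read as the following scalar reference functions. This is a minimal C sketch rather than the library's implementation: the ref_ names are hypothetical, only the scalar case is shown, and ref_smoothstep assumes that min is less than max.

```c
/* Scalar reference versions of a few Table 1 entries.
   The ref_ prefix is hypothetical; Cg supplies these as built-ins. */

/* clamp(x, a, b): a if x < a, b if x > b, x otherwise. */
float ref_clamp(float x, float a, float b)
{
    if (x < a) return a;
    if (x > b) return b;
    return x;
}

/* saturate(x) is clamp(x, 0, 1). */
float ref_saturate(float x)
{
    return ref_clamp(x, 0.0f, 1.0f);
}

/* lerp(a, b, f) = (1 - f) * a + b * f. */
float ref_lerp(float a, float b, float f)
{
    return (1.0f - f) * a + b * f;
}

/* step(a, x): 0 if x < a, 1 if x >= a. */
float ref_step(float a, float x)
{
    return (x < a) ? 0.0f : 1.0f;
}

/* smoothstep(min, max, x): clamp x to [min, max], normalize it to
   t in [0, 1], then evaluate -2*t^3 + 3*t^2. Assumes min < max. */
float ref_smoothstep(float min, float max, float x)
{
    float t = (ref_clamp(x, min, max) - min) / (max - min);
    return -2.0f * t * t * t + 3.0f * t * t;
}
```

For example, ref_smoothstep(0.0f, 1.0f, 0.5f) evaluates -2*(0.5)^3 + 3*(0.5)^2, which is 0.5, and any x at or beyond max yields 1.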
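The mul() overloads in Table 1 differ only in how a vector argument is oriented. The C sketch below spells out mul(M, v) and mul(v, M) for a 2x3 matrix. The function names, the fixed ROWS and COLS sizes, and the plain float arrays are assumptions made for illustration; Cg provides mul() as a built-in over its own matrix and vector types.

```c
/* Hypothetical fixed sizes: M is a ROWS x COLS (2x3) matrix. */
enum { ROWS = 2, COLS = 3 };

/* mul(M, v): v is treated as a COLS x 1 column vector,
   so the result is a ROWS x 1 vector. */
void ref_mul_mat_vec(float M[ROWS][COLS], const float v[COLS],
                     float out[ROWS])
{
    for (int r = 0; r < ROWS; ++r) {
        out[r] = 0.0f;
        for (int c = 0; c < COLS; ++c)
            out[r] += M[r][c] * v[c];
    }
}

/* mul(v, M): v is treated as a 1 x ROWS row vector,
   so the result is a 1 x COLS vector. */
void ref_mul_vec_mat(const float v[ROWS], float M[ROWS][COLS],
                     float out[COLS])
{
    for (int c = 0; c < COLS; ++c) {
        out[c] = 0.0f;
        for (int r = 0; r < ROWS; ++r)
            out[c] += v[r] * M[r][c];
    }
}
```

With M = {{1, 2, 3}, {4, 5, 6}}, mul(M, (1, 0, 1)) gives the 2-vector (4, 10), while mul((1, 1), M) gives the 3-vector (5, 7, 9). The two calls are not interchangeable unless M is square, and even then they generally produce different results.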
Geometric Functions
Table 2, “Geometric Functions,” presents the geometric functions that are
provided in the Cg Standard Library.
Table 2. Geometric Functions
Function                Description
distance(pt1, pt2)      Euclidean distance between points pt1 and pt2.
faceforward(N, I, Ng)   N if dot(Ng, I) < 0;
                        otherwise, -N.
length(v)               Euclidean length of a vector.
normalize(v)            Returns a vector of length 1 that points in the same
                        direction as vector v.
reflect(i, n)           Computes reflection vector from entering ray
                        direction i and surface normal n.
                        Only valid for 3-component vectors.
refract(i, n, eta)      Given entering ray direction i, surface normal n,
                        and relative index of refraction eta, computes
                        refraction vector. If the angle between i and n is
                        too large for a given eta, returns (0, 0, 0).
                        Only valid for 3-component vectors.
Texture Map Functions
Table 3, “Texture Map Functions,” presents the texture functions that are
provided in the Cg Standard Library. These texture functions are fully
supported by the ps_2, arbfp1, fp30, and fp40 profiles. The two-dimensional
variants of these functions are supported by the vp40 profile.
All of the functions in the table return a float4 value.
Because of the limited pixel programmability of older hardware, the ps_1
and fp20 profiles use a different set of texture-mapping functions. See
“Language Profiles” on page 255 for more information.
Table 3. Texture Map Functions
Texture Map Functions
Function Description
tex1D(sampler1D tex, float s)
1D nonprojective
tex1D(sampler1D tex, float s, float dsdx, float dsdy)
1D nonprojective with derivatives
tex1D(sampler1D tex, float2 sz)
1D nonprojective depth compare
tex1D(sampler1D tex, float2 sz, float dsdx, float dsdy)
1D nonprojective depth compare with derivatives
tex1Dproj(sampler1D tex, float2 sq)
1D projective
tex1Dproj(sampler1D tex, float3 szq)
1D projective depth compare
tex2D(sampler2D tex, float2 s)
2D nonprojective
tex2D(sampler2D tex, float2 s, float2 dsdx, float2 dsdy)
2D nonprojective with derivatives
tex2D(sampler2D tex, float3 sz)
2D nonprojective depth compare
tex2D(sampler2D tex, float3 sz, float2 dsdx, float2 dsdy)
2D nonprojective depth compare with derivatives
tex2Dproj(sampler2D tex, float3 sq)
2D projective
tex2Dproj(sampler2D tex, float4 szq)
2D projective depth compare
texRECT(samplerRECT tex, float2 s)
2D RECT nonprojective
texRECT(samplerRECT tex, float2 s, float2 dsdx, float2 dsdy)
2D RECT nonprojective with derivatives
texRECT(samplerRECT tex, float3 sz)
2D RECT nonprojective depth compare
texRECT(samplerRECT tex, float3 sz, float2 dsdx, float2 dsdy)
2D RECT nonprojective depth compare with derivatives
texRECTproj(samplerRECT tex, float3 sq)
2D RECT projective
texRECTproj(samplerRECT tex, float4 szq)
2D RECT projective depth compare
tex3D(sampler3D tex, float3 s)
3D nonprojective
tex3D(sampler3D tex, float3 s, float3 dsdx, float3 dsdy)
3D nonprojective with derivatives
tex3Dproj(sampler3D tex, float4 szq)
3D projective depth compare
texCUBE(samplerCUBE tex, float3 s)
Cubemap nonprojective
texCUBE(samplerCUBE tex, float3 s, float3 dsdx, float3 dsdy)
Cubemap nonprojective with derivatives
texCUBEproj(samplerCUBE tex, float4 sq)
Cubemap projective
In the table, the name of the second argument to each function indicates how
its values are used when performing the texture lookup: s indicates a 1-, 2-,
or 3-component texture coordinate; z indicates a depth comparison value for
shadow map lookups; q indicates a perspective value and is used to divide
the texture coordinate, s, before the texture lookup is performed.
For convenience, the standard library also defines versions of the texture
functions prefixed with h4, such as h4tex2D(), that return half4 values and
prefixed with x4, such as x4tex2D(), that return fixed4 values.
When the texture functions that allow specifying a depth comparison value
are used, the associated texture unit must be configured for depth-compare
texturing. Otherwise, no depth comparison is actually performed.
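The role of q in the projective variants of Table 3 can be made concrete with a small C sketch. It shows only the coordinate division that the text describes for a float3 sq argument, as taken by tex2Dproj(). The vec2 and vec3 types and the project_texcoord name are hypothetical stand-ins, and the sketch assumes q is nonzero.

```c
/* Hypothetical stand-ins for Cg's float2 and float3 types. */
typedef struct { float x, y; } vec2;
typedef struct { float x, y, z; } vec3;

/* Divide the 2D coordinate s = (sq.x, sq.y) by q = sq.z, which is the
   step performed before a projective lookup. Assumes q != 0. */
vec2 project_texcoord(vec3 sq)
{
    vec2 s;
    s.x = sq.x / sq.z;
    s.y = sq.y / sq.z;
    return s;
}
```

Under these assumptions, a projective lookup with sq = (1, 2, 4) samples the texture at s = (0.25, 0.5), the same location a nonprojective lookup would use if given that coordinate directly.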
Derivative Functions
Table 4, “Derivative Functions,” presents the derivative functions that are
supported by the Cg Standard Library. Vertex profiles are not required to
support these functions.
Table 4. Derivative Functions
Function                Description
ddx(a)                  Approximate partial derivative of a with respect to
                        screen-space x coordinate.
ddy(a)                  Approximate partial derivative of a with respect to
                        screen-space y coordinate.
Debugging Function
Table 5, “Debugging Function,” presents the debugging function that is
supported by the Cg Standard Library. Vertex profiles are not required to
support this function.
Table 5. Debugging Function
Function                Description
void debug(float4 x)    If the compiler’s DEBUG option is specified, calling
                        this function causes the value x to be copied to the
                        COLOR output of the program, and execution of the
                        program is terminated.
                        If the compiler’s DEBUG option is not specified, this
                        function does nothing.
The debug function is intended to allow a program to be compiled twice,
once with the DEBUG option and once without. By executing both programs,
you can obtain one frame buffer containing the final output of the program
and a second containing an intermediate value to be examined for
debugging.
Predefined Fragment Program Output Structures
A number of helper structure types for use in fragment programs are
predefined in the standard library. Variables of these types can be used to
hold the outputs of a fragment program. Their use is strictly optional.
For the ps_1 and fp20 profiles, the fragout structure is defined as follows:
    struct fragout {
        float4 col : COLOR;
    };
The ps_2, arbfp1, and fp30 profiles have two fragment output types
defined:
    struct fragout {
        half4 col : COLOR;
        float depth : DEPTH;
    };
    struct fragout_float {
        float4 col : COLOR;
        float depth : DEPTH;
    };
Introduction to the Cg Runtime Library
This chapter introduces the Cg Runtime Library. It assumes that you have
some basic knowledge of the Cg language, as well as the OpenGL or
Direct3D APIs, depending on which one you use in your applications.
The first section, “Introducing the Cg Runtime” on page 43, describes the
benefits of using the Cg Runtime Library and gives a brief overview of how it
is used in an application to create and manage Cg programs. The next two
sections, “Core Cg Runtime” on page 49 and “API-Specific Cg Runtimes” on
page 72, describe the APIs composing the Cg Runtime.
This chapter is primarily focused on using the Cg runtime to directly create
and manage Cg programs. The following chapter, “Introduction to CgFX,”
describes how the runtime may also be used to create and manage Cg-based
shader effects.
Introducing the Cg Runtime
Cg programs are lines of code that describe shading, but they need the
support of applications to create images. To interface Cg programs with
applications, you must do two things:
1. Compile the programs for the correct profile. In other words, compile the
   programs into a form that is compatible with the 3D API used by the
   application and the underlying hardware.
2. Link the programs to the application program. This allows the
   application to feed varying and uniform data to the programs.
You have two choices as to when to perform these operations. You can
perform them at compile time, when the application program is compiled
into an executable, or you can perform them at run time, when the
application is actually executed. The Cg runtime is an application
programming interface that allows an application to compile and link Cg
programs at run time.
Benefits of the Cg Runtime
Future Compatibility
Most applications need to run on a range of profiles. If an application
precompiles its Cg programs (the compile-time choice), it must store a
compiled version of each program for each profile. This is reasonable for one
program, but is cumbersome for an application that uses many programs.
What’s worse, the application is frozen in time. It supports only the profiles
that existed when it was compiled; it cannot take advantage of the
optimizations that future compilers could offer.
In contrast, programs compiled by applications at run time
• Benefit from future compiler optimizations for the existing profiles
• Run on future profiles corresponding to new 3D APIs or to hardware
  that did not exist at the time the Cg programs were written
No Dependency Limitations
If you link a Cg program to the application when it is compiled, the
application is too dependent on the result of the compilation. The application
program has to refer to the Cg program input parameters by using the
hardware register names that are output by the Cg compiler. This approach
is awkward for two reasons:
• The register names can’t be easily matched to the corresponding
  meaningful names in the Cg program without looking at the compiler
  output.
• Register allocations can change each time the Cg program, the Cg
  compiler, or the compilation profile changes. This means you have the
  inconvenience of updating the application each time as well.
In contrast, linking a Cg program to the application program at run time
removes the dependency on the Cg compiler. With the runtime, you need to
alter the application code only when you add, delete, or modify Cg input
parameters.
Input Parameter Management
The Cg runtime also offers additional facilities to manage the input
parameters of the Cg program. In particular, it makes data types such as
arrays and matrices easier to deal with. These additional functions also
encompass the necessary 3D API calls to minimize code length and reduce
programmer errors.
Overview of the Cg Runtime
The Cg runtime API consists of three parts (Fig. 2):
• A core set of functions and structures that encapsulates the entire
  functionality of the runtime
• A set of functions specific to OpenGL built on top of the core set
• A set of functions specific to Direct3D built on top of the core set
To make it easier for application writers, the OpenGL and Direct3D runtime
libraries adopt the philosophy and data structure style of their respective
API.
Fig. 2. The Parts of the Cg Runtime API
The rest of the section provides instructions for using the Cg runtime in the
framework of an application. Each step includes source code for OpenGL
and Direct3D programming.
Functions that involve only pure Cg resource management belong to the core
runtime and have a cg prefix. In these cases, the same code is used for
OpenGL and Direct3D.
When functions from the OpenGL or Direct3D Cg runtimes are used, notice
that the API name is indicated by the function name. Functions belonging to
the OpenGL Cg runtime library have a cgGL prefix, and functions in the
Direct3D Cg runtime library have a cgD3D prefix.
There are actually two Direct3D Cg runtime libraries: one for Direct3D 8 and
one for Direct3D 9. Functions belonging to the Direct3D 8 Cg runtime have a
cgD3D8 prefix, and functions belonging to the Direct3D 9 Cg runtime have a
cgD3D9 prefix. Because most of the functions are identical between the two
runtimes, we describe the Direct3D 9 Cg runtime with the understanding
that the description applies to the Direct3D 8 Cg runtime as well, unless
otherwise indicated.
The same prefix convention used for the function names is also used for the
type names, macro names, and enumerant values.
Header Files
Here is how to include the core Cg runtime API into your C or C++ program:
    #include <Cg/cg.h>
Here is how to include the OpenGL Cg runtime API:
    #include <Cg/cgGL.h>
Here is how to include the Direct3D 9 Cg runtime API:
    #include <Cg/cgD3D9.h>
And, here is how to include the Direct3D 8 Cg runtime API:
    #include <Cg/cgD3D8.h>
Creating a Context
A context is a container for multiple Cg programs. It holds the Cg programs,
as well as their shared data.
Here’s how to create a context:
    CGcontext context = cgCreateContext();
Compiling a Program
Compile a Cg program by adding it to a context with cgCreateProgram():
    CGprogram program = cgCreateProgram(context,
        CG_SOURCE, myVertexProgramString,
        CG_PROFILE_ARBVP1, "main", args);
CG_SOURCE indicates that myVertexProgramString, a string argument,
contains Cg source code, not precompiled object code. Indeed, the Cg
runtime also lets you create a program from precompiled object code, if you
want to.
CG_PROFILE_ARBVP1 is the profile the program is to be compiled to. The
main parameter gives the name of the function to use as the main entry
point when the program is executed. Lastly, args is a null-terminated list of
null-terminated strings that is passed as an argument to the compiler.
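The layout of args, a null-terminated list of null-terminated strings, is the same convention used by argv-style arrays in C. The sketch below walks such a list to count its entries. The count_compiler_args helper is hypothetical and not part of the Cg runtime, and treating a null args pointer as an empty list is an assumption of this sketch.

```c
#include <stddef.h>

/* Count the entries of a null-terminated array of C strings.
   Hypothetical helper: a null args pointer is treated here as an
   empty list. */
size_t count_compiler_args(const char **args)
{
    size_t n = 0;
    if (args == NULL)
        return 0;
    while (args[n] != NULL)   /* the final NULL entry ends the list */
        ++n;
    return n;
}
```

An array such as `const char *args[] = { "-DFIRST", "-DSECOND", NULL };` has two entries under this convention; the option strings are placeholders, so consult the compiler documentation for the options it actually accepts. Forgetting the trailing NULL leaves the list unterminated, which is the usual mistake with this layout.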
Loading a Program
After you compile a program, you need to pass the resulting object code to
the 3D API that you’re using. For this, you need to invoke the Cg runtime’s
API-specific functions.
The Direct3D-specific functions require the Direct3D device structure in
order to make the necessary Direct3D calls. The application passes it to the
runtime using the following call:
    cgD3D9SetDevice(Device);
You must do this every time a new Direct3D device is created, typically only
at the beginning of the application.
You can then load a Cg program in this way for the Direct3D 9 Cg runtime:
    cgD3D9LoadProgram(program, CG_FALSE, 0);
or this way for the Direct3D 8 Cg runtime:
    cgD3D8LoadProgram(program, CG_FALSE, 0, 0, vertexDeclaration);
The parameter vertexDeclaration is the Direct3D 8 vertex declaration
array that describes where to find the necessary vertex attributes in the
vertex streams. (See “Expanded Interface Program Execution” on page 103
for the details on the arguments to cgD3D8LoadProgram() and
cgD3D9LoadProgram().)
In OpenGL, the equivalent call is
    cgGLLoadProgram(program);
Modifying Program Parameters
The runtime gives you the option of modifying the values of your program
parameters. The first step is to get a handle to the parameter:
    CGparameter myParameter = cgGetNamedParameter(
        program, "myParameter");
The variable myParameter is the name of the parameter as it appears in the
program source code.
The second step is to set the parameter value. The function used depends on
the parameter type.
Here is an example in OpenGL:
    cgGLSetParameter4fv(myParameter, value);
Here is the same example in Direct3D:
    cgD3D9SetUniform(myParameter, value);
Numeric parameters may also be set using core Cg runtime calls, such as:
    cgSetParameterValuefr(myParameter, 4, value);
These function calls assign the four floating-point values contained in the
array value to the parameter myParameter, which is assumed to be of type
float4.
In both APIs, there are variants of these calls to set matrices, arrays, textures,
and texture states. The core Cg runtime provides variants of these calls to set
the value of numeric parameters, including scalars, vectors, arrays, and
structures. The graphics API-specific runtimes must be used to set
API-specific values, such as sampler handles.
Executing a Program
Before you can execute a program in OpenGL, you must enable its
corresponding profile:
    cgGLEnableProfile(CG_PROFILE_ARBVP1);
In Direct3D, nothing explicitly needs to be done to enable a specific profile.
Next, you bind the program to the current state. This means that in
subsequent drawing calls the program is executed for every vertex in the
case of a vertex program and for every fragment in the case of a fragment
program.
Here’s how to bind a program in OpenGL:
    cgGLBindProgram(program);
Here’s how to bind a program in Direct3D:
    cgD3D9BindProgram(program);
You can only bind one vertex and one fragment program at a time for a
particular profile. Therefore, the same vertex program is executed until
another vertex program is bound. Similarly, the same fragment program is
executed as long as no other fragment program is bound.
In OpenGL, you disable profiles by the following call:
    cgGLDisableProfile(CG_PROFILE_ARBVP1);
Disabling a profile also disables the execution of the corresponding vertex or
fragment program.
Releasing Resources
When your application is ready to close, it is good programming practice to
free resources that you’ve acquired.
Because the Direct3D runtime keeps an internal reference to the Direct3D
device, you must tell it to release this reference when you are done using the
runtime. This is done with the following call:
    cgD3D9SetDevice(0);
To free resources allocated for a program, call this function:
    cgDestroyProgram(program);
To free resources allocated for a context, use this function:
    cgDestroyContext(context);
Note that destroying a context destroys all the programs it contains as well.
Core Cg Runtime
The core Cg runtime provides all the functions necessary to manage Cg
programs from within the application. It makes no assumption about which
3D API the application uses, so that any application could easily ignore the
API-specific Cg runtime libraries and content itself with the core Cg runtime.
The core Cg runtime is built around three main concepts: context, program,
and parameter, which are represented by the CGcontext, CGprogram, and
CGparameter object types. Those concepts are hierarchically related to
each other: a program has several parameters, a context contains several
programs and shared parameters, and the application can define several
contexts.
The next sections describe these three basic object types and the runtime
entry points that operate on them. The three object types have some points in
common:
• The use of CGbool, which is an integer type equal to either CG_TRUE or
  CG_FALSE
• The use of CGenum, which is an enumerated type used to specify various
  enumerated values that are not necessarily related
• The convention that functions that return a value of type CGcontext,
  CGprogram, CGparameter, or const char* indicate failure by returning
  zero
Core Cg Context
The Cg runtime provides functions for creating, destroying, and querying
contexts.
Context Creation and Destruction
Programs can only be created as part of a context that acts as a program
container. A context is created by calling cgCreateContext():
    CGcontext cgCreateContext();
A context is destroyed by cgDestroyContext():
    void cgDestroyContext(CGcontext context);
cgDestroyContext() deletes all data associated with the context, including
all programs it contains. cgDestroyContext() should be called before
destroying any associated OpenGL context or Direct3D device.
Context Query
To check whether a context handle references a valid context or not, use
cgIsContext():
    CGbool cgIsContext(CGcontext context);
Core Cg Program
There are Cg functions for creating, destroying, iterating over, and querying
programs.
Program Creation and Destruction
A program is created by calling either cgCreateProgram():
    CGprogram cgCreateProgram(CGcontext context,
                              CGenum programType,
                              const char* program,
                              CGprofile profile,
                              const char* entry,
                              const char** args);
or cgCreateProgramFromFile():
    CGprogram cgCreateProgramFromFile(CGcontext context,
                                      CGenum programType,
                                      const char* program,
                                      CGprofile profile,
                                      const char* entry,
                                      const char** args);
These functions create a program object, add it to the specified context, and
compile the associated source code. For both of them,
• context is a valid context handle.
• profile is an enumerant specifying the profile to which the program
  must be compiled.
• entry is the name of the function that must be considered as the main
  entry point by the compiler. If the value is zero, the name main is used.
• args is a pointer to a null-terminated array of null-terminated strings
  that are passed as arguments to the compiler. The pointer may itself be
  null.
The only difference between the two functions is how program is interpreted.
For cgCreateProgramFromFile(), program is a string containing the name
of a file containing source code; for cgCreateProgram(), program directly
contains source code. If the enumerant programType is equal to CG_SOURCE,
the source code is Cg source code; if it is equal to CG_OBJECT, the source code
is precompiled object code and does not require any further compilation.
The CGprogram handle returned by cgCreateProgramFromFile() is valid if
it is different from zero, which means that the program has been successfully
created and compiled. The program is destroyed by passing its handle to
cgDestroyProgram():
    void cgDestroyProgram(CGprogram program);
The Cg runtime allows for either automatic or manual compilation of
programs. Compilation of a program is required before the program may be
used when drawing. As such, program compilation is necessary sometime
after the program is first created, or whenever it enters an uncompiled state.
A program may enter an uncompiled state for a variety of reasons, including
• Changing variability of parameters
  Parameters may be changed from uniform variability to literal variability
  (compile-time constant). See the cgSetParameterVariability manual
  page for more information.
• Changing value of literal parameters
  Changing the value of a literal parameter will require recompilation
  since the value is used at compile time. See the cgSetParameter and
  cgSetMatrixParameter manual pages for more information.
• Resizing unsized arrays
  Changing the length of a parameter array may require recompilation
  depending on the capabilities of the program profile. See the
  cgSetArraySize and cgSetMultiDimArraySize manual pages for more
  information.
• Connecting structures to interface parameters
  Structure parameters can be connected to interface program parameters
  to control the behavior of the program. Changing these connections
  requires recompilation on all current profiles. See the
  cgConnectParameter manual page and the Interfaces section of this
  document for more details.
When a program enters an uncompiled state, it is automatically unloaded
and unbound. In order to be used again, the program must be recompiled
(either automatically or manually; see the following), and then reloaded
and rebound.
Compilation can be performed manually by the application via
    void cgCompileProgram(CGprogram program);
or automatically by the runtime.
Compilation behavior is controlled via
    void cgSetAutoCompile(CGcontext ctx, CGenum flag);
Here, flag may be one of the following enumerants:
• CG_COMPILE_MANUAL
  In this mode, the application is responsible for manually compiling a
  program. The application may check to see if a program requires
  recompilation with the entry point cgIsProgramCompiled(). The program
  may then be compiled via cgCompileProgram(). This mode provides
  the application with the most control over how and when program
  recompilation occurs.
• CG_COMPILE_IMMEDIATE
  In this mode, the Cg runtime will force compilation automatically and
  immediately when a program enters an uncompiled state, or when the
  program is first created. This is the default mode.
• CG_COMPILE_LAZY
  This mode is similar to CG_COMPILE_IMMEDIATE, but will delay program
  compilation until the program object code is needed. The advantage of
  this method is the reduction of extraneous recompilations. The
  disadvantage is that compile-time errors will not be encountered when
  the program enters an uncompiled state, but will instead be encountered
  at some later time (most likely when the program is loaded or bound).
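The trade-off between the three compilation modes can be illustrated with a small simulation. The following C sketch does not use the Cg runtime at all: MockProgram, CompileMode, and the mock_ functions are hypothetical names that only model when a compilation would be triggered, and returning 0 for an uncompiled program in manual mode is an assumption of the sketch rather than documented runtime behavior.

```c
/* Hypothetical simulation of the three auto-compile modes.
   None of these names are part of the Cg runtime. */
typedef enum { MODE_MANUAL, MODE_IMMEDIATE, MODE_LAZY } CompileMode;

typedef struct {
    CompileMode mode;
    int compiled;       /* 1 if the object code is up to date */
    int compile_count;  /* number of compilations performed so far */
} MockProgram;

/* Models an explicit or automatic compilation. */
void mock_compile(MockProgram *p)
{
    p->compiled = 1;
    p->compile_count++;
}

/* Models an event (for example, changing a literal parameter) that
   puts the program into an uncompiled state. */
void mock_invalidate(MockProgram *p)
{
    p->compiled = 0;
    if (p->mode == MODE_IMMEDIATE)
        mock_compile(p);        /* recompile right away */
}

/* Models a point where the object code is needed (for example, when
   loading). Returns 1 if the program is usable, 0 if it is still
   uncompiled, which can only happen in manual mode. */
int mock_need_object_code(MockProgram *p)
{
    if (!p->compiled && p->mode == MODE_LAZY)
        mock_compile(p);        /* deferred compilation happens here */
    return p->compiled;
}
```

In this model, three consecutive invalidations cost three compilations in immediate mode but only one in lazy mode, because the lazy compilation is postponed until the object code is requested. In manual mode nothing is compiled until the application calls the compile function itself.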
A call to cgIsProgramCompiled() determines whether a program needs to
be recompiled:
    CGbool cgIsProgramCompiled(CGprogram program);
To recompile a program, use cgCompileProgram():
    void cgCompileProgram(CGprogram program);
Program Iteration
The programs within a context are sequentially ordered and can be iterated
over by using cgGetFirstProgram() and cgGetNextProgram():
    CGprogram cgGetFirstProgram(CGcontext context);
    CGprogram cgGetNextProgram(CGprogram program);
The first program of the sequence is retrieved by cgGetFirstProgram(). If
the context is invalid or does not contain any program, the function returns
zero. Given a program, cgGetNextProgram() returns the program
immediately next in the sequence, or zero if there is none. Here is how those
two functions would typically be used given a valid context named context:
    CGprogram program = cgGetFirstProgram(context);
    while (program != 0) {
        /* Here is the code that handles the program */
        program = cgGetNextProgram(program);
    }
Nothing is guaranteed regarding the order of the programs in the sequence
or how cgGetFirstProgram() and cgGetNextProgram() behave when
programs are created or destroyed during iteration.
Program Query
Program queries encompass validity, compilation results, and attributes.
Program Validity
Use cgIsProgram() to check whether a program handle references a valid
program:
    CGbool cgIsProgram(CGprogram program);
Compilation Result
You can query the result of the compilation resulting from the last call to
cgCreateProgram() for a given context by using cgGetLastListing():
    const char* cgGetLastListing(CGcontext context);
If no call to cgCreateProgram() has been made for the context,
cgGetLastListing() returns zero. Otherwise, it returns a string containing
the output you would typically get from the command-line version of the
compiler.
Program Attributes
To retrieve the context the program belongs to, use
cgGetProgramContext():
    CGcontext cgGetProgramContext(CGprogram program);
Retrieving the profile the program has been compiled to is done with
cgGetProgramProfile():
    CGprofile cgGetProgramProfile(CGprogram program);
The function pair cgGetProfile() and cgGetProfileString() allows you
to find the correspondence between a profile enumerant and its
corresponding string:
    CGprofile cgGetProfile(const char* profileString);
    const char* cgGetProfileString(CGprofile profile);
If the string passed to cgGetProfile() does not correspond to any profile,
CG_PROFILE_UNKNOWN is returned.
The function cgGetProgramString() retrieves various strings related to the
program depending on the value of the enumerant stringType:
    const char* cgGetProgramString(CGprogram program,
                                   CGenum stringType);
The variable stringType can have any of these values:
• CG_PROGRAM_SOURCE: The original Cg source program is returned.
• CG_PROGRAM_ENTRY: The main entry point of the Cg source program is
  returned.
• CG_PROGRAM_PROFILE: The profile string is returned.
• CG_COMPILED_PROGRAM: The resulting compiled program is returned.
Core Cg Parameters
Cg parameters fall into three broad categories: program parameters, effect
parameters, and shared parameters.
Program parameters are associated with Cg programs. A parameter that is
declared as part of the program’s entry point belongs to the program’s
namespace. A parameter that is declared globally in the file scope of the Cg
program belongs to the program’s global namespace.
Effect parameters are associated with Cg Effects. See the Introduction to CgFX
chapter for more information on managing effect parameters.
Shared parameters are associated with Cg contexts. See “Shared Parameters”
on page 59 for more details.
Cg functions exist for retrieving, creating, and querying program
parameters.
Program Parameter Retrieval
Parameters associated with Cg programs may be retrieved iteratively or
directly.
Iteration
A program has a sequence of parameters that can be iterated over by using
cgGetFirstParameter() and cgGetNextParameter():
    CGparameter cgGetFirstParameter(CGprogram program,
                                    CGenum namespace);
    CGparameter cgGetNextParameter(CGparameter parameter);
A call to cgGetFirstParameter() returns the first parameter of the
sequence. If the program is invalid or does not contain any parameter, the
call returns zero. Given a parameter, cgGetNextParameter() returns the
parameter immediately next in the sequence or zero if there is none. The
namespace argument of cgGetFirstParameter() specifies the namespace
of the parameters returned by this function and subsequent calls to
cgGetNextParameter(). Every parameter belongs to a particular
namespace that defines its scope. When CG_GLOBAL is specified, the program’s
global parameters (i.e., those parameters that are in the file scope of the
program’s entry point) are iterated over. When CG_PROGRAM is specified, the
parameters specified in the program’s entry point declaration are iterated
over.
Here is how those two functions would typically be used given a valid
program called program:
    CGparameter parameter = cgGetFirstParameter(program,
                                                CG_PROGRAM);
    while (parameter != 0) {
        /* Here is the code that handles the parameter */
        parameter = cgGetNextParameter(parameter);
    }

These functions don’t provide access to the fields of a structure parameter
(type CG_STRUCT) or the elements of an array parameter (type CG_ARRAY). In
other words, if a struct or array parameter is declared, these entry points
return a handle to the struct or array itself.

One way to access the fields of a structure is to use
cgGetFirstStructParameter() along with cgGetNextParameter():

    CGparameter cgGetFirstStructParameter(CGparameter parameter);

If parameter is not of type CG_STRUCT, cgGetFirstStructParameter()
returns zero.

Similarly, to get access to the elements of an array, you can use
cgGetArrayDimension(), cgGetArraySize(), cgGetArrayParameter(),
and cgGetNextParameter():

    int cgGetArrayDimension(CGparameter parameter);
    int cgGetArraySize(CGparameter parameter, int dimension);
    CGparameter cgGetArrayParameter(CGparameter parameter,
                                    int index);

These three functions return 0 if parameter is not of type CG_ARRAY.

Function cgGetArrayDimension() gives the dimension of the array. It
returns 1 for float4 array[10], 2 for float4 array[10][100], and so on.

Next, cgGetArraySize() gives the size of every dimension. For example, for
float4 array[10][100], cgGetArraySize(array, 0) returns 10 and
cgGetArraySize(array, 1) returns 100. An array, anArray, has
cgGetArraySize(anArray, 0) elements. If its dimension is greater than one,
those elements are themselves arrays.

Here is how these iteration functions could be used given a valid program
named program:

    void RecurseProgramParameters(CGparameter parameter) {
        if (parameter == 0)
            return;
        do {
            switch (cgGetParameterType(parameter)) {
            case CG_STRUCT:
                RecurseProgramParameters(
                    cgGetFirstStructParameter(parameter));
                break;
            case CG_ARRAY: {
                /* Braces give arraySize its own scope in the switch */
                int arraySize = cgGetArraySize(parameter, 0);
                for (int i = 0; i < arraySize; ++i)
                    RecurseProgramParameters(
                        cgGetArrayParameter(parameter, i));
                break;
            }
            default:
                /* Here is the code that handles the parameter */
                break;
            }
        } while ((parameter = cgGetNextParameter(parameter)) != 0);
    }

    void IterateProgramParameters(CGprogram program) {
        RecurseProgramParameters(cgGetFirstParameter(program,
                                                     CG_PROGRAM));
    }

In practice, it is usually simpler to iterate over all of the “leaf”
parameters (that is, nonaggregate parameters) directly using
cgGetFirstLeafParameter() and cgGetNextLeafParameter():

    CGparameter cgGetFirstLeafParameter(CGprogram program,
                                        CGenum namespace);
    CGparameter cgGetNextLeafParameter(CGparameter parameter);

These functions iterate through all the simple parameters, including
structure fields and array elements that serve as inputs to the program.
Nothing is guaranteed regarding the order of the parameters in the
sequence.

Direct Retrieval

Any parameter of a program can also be retrieved directly by using its name
with cgGetNamedProgramParameter():

    CGparameter cgGetNamedProgramParameter(CGprogram program,
                                           CGenum namespace,
                                           const char* name);

Here, namespace may be either CG_GLOBAL or CG_PROGRAM, as above. If the
program has no parameter corresponding to name,
cgGetNamedProgramParameter() returns zero.

The Cg syntax is used to retrieve structure fields or array elements. Let’s
take the following code snippet as an example:

    struct FooStruct {
        float4 A;
        float4 B;
    };
    struct BarStruct {
        FooStruct Foo[2];
    };
    void main(BarStruct Bar[3]) {
        // ...
    }

The following are valid names for retrieving the corresponding parameter:

    “Bar”
    “Bar[1]”
    “Bar[1].Foo”
    “Bar[1].Foo[0]”
    “Bar[1].Foo[0].B”

Parameter Values

The core Cg runtime provides a number of entry points for setting and
retrieving parameter values. In addition, the graphics API-specific Cg
runtimes provide additional entry points for managing parameter values.

When managing numeric parameters, choosing which set of entry points to
use is largely a matter of programmer preference. In some circumstances, it
may be slightly more efficient to use the core Cg runtime entry points.
However, parameters that hold graphics API-specific quantities, such as
sampler handles, must be set using the API-specific entry points, because
the core Cg runtime, which is graphics API-agnostic, provides no such
entry points.

The most often used parameter value routines set and get a parameter’s
current values. A parameter’s current value is initialized to any
default value assigned in the Cg source, or 0 otherwise. The current value of
a numeric parameter can be queried using the family of entry points:

    int cgGetParameterValue{i,f,d}{r,c}(CGparameter param,
                                        int nvals, type *v);

The given parameter must be a scalar, vector, matrix, or a (possibly
multidimensional) array of scalars, vectors, or matrices. There are versions
of each function to retrieve the values into an int, float, or double buffer;
these are signified by the i, f, and d in the entry point name, respectively.
Similarly, there are versions of each function that retrieve any matrices in
the given parameter in row-major or column-major order. These are specified
using r or c, respectively. At most, nvals values will be copied into the
given array, v. The total number of values copied into v is returned.

For example, cgGetParameterValueic() retrieves the values of the given
parameter into the supplied array of integer data, and copies matrix data in
column-major order. The total number of values associated with a given
parameter, and hence the required length of the given array, can be
computed using the core Cg runtime:

    int nrows = cgGetParameterRows(param);
    int ncols = cgGetParameterColumns(param);
    int asize = cgGetArrayTotalSize(param);
    int ntotal = nrows*ncols;
    if (asize > 0) ntotal *= asize;

A similar family of entry points exists for setting a parameter’s values:

    void cgSetParameterValue{i,f,d}{r,c}(CGparameter param,
                                         int nvals, type *v);

The entry points in this family take the same arguments as those of the
cgGetParameterValue family. The total number of values in a parameter
may be computed as above. If nvals is less than the total size of the
parameter, an error is generated.

The core Cg runtime also allows the application to query a parameter’s
default values:

    const double* cgGetParameterValues(CGparameter parameter,
                                       CGenum valueType,
                                       int* numberOfValuesReturned);

This entry point retrieves the parameter’s default value if valueType is
equal to CG_DEFAULT. The components of the value are returned in row-major
order as a pointer to an array containing type double elements. The number
of components available in the array is returned in
numberOfValuesReturned. Function cgGetParameterValues() can also be
used to retrieve a parameter’s constant values, but this functionality is
rarely used; see the corresponding manual page for more details.

Shared Parameters

The core Cg runtime supports the creation of instances of any type of
concrete parameter (e.g., built-in types, user-defined structures) within a
Cg context. A parameter instance may be connected to any number of
compatible parameters, including any program or effect parameter within
the context.

When an instance is connected to another parameter, the second parameter
will inherit its values from the instance. Furthermore, if the variability of
the second parameter has not been explicitly set by a call to
cgSetParameterVariability(), its variability will also be inherited from
the instance.

The ability to create and easily manage shared, context-global parameters
provides a powerful means for creating parameter trees, and for sharing data
and user-defined objects between multiple Cg programs or effects.

Shared Parameter Creation

Shared parameters are associated with a CGcontext. They may be created
with the following entry points:

    CGparameter cgCreateParameter(CGcontext ctx, CGtype type);
    CGparameter cgCreateParameterArray(CGcontext ctx, CGtype type,
                                       int length);
    CGparameter cgCreateParameterMultiDimArray(CGcontext ctx,
                                               CGtype type,
                                               int dim, int *lengths);

Only parameters of concrete types may be created. In particular, parameters
of abstract interface types may not be created. By default, a created
parameter has uniform variability and undefined values.

Shared Parameter Deletion

Shared parameters may be deleted using

    void cgDeleteParameter(CGparameter param);

When a shared parameter is deleted, all parameters connected to it are
disconnected, and vice versa.

Connecting Parameters

Once created, a shared parameter may be connected to any number of
program, effect, or shared parameters using

    void cgConnectParameter(CGparameter source, CGparameter sink);

where source is the shared parameter, and sink is the target parameter that
will inherit the shared parameter’s values.

Once a parameter has had a source connected to it, its value should no
longer be set directly. Instead, its value can be set indirectly by setting
the value of the associated source.

A parameter that has been connected to a shared source parameter may be
disconnected using

    void cgDisconnectParameter(CGparameter param);

Shared Parameters and Interfaces

Using Cg, it is possible to create families of code “modules” that share a
common interface, each member of which has a different implementation.
This ability makes it easy for applications to construct material trees on the
fly, to change the number or type of texture maps applied to an object at
application runtime, and so on.

Specifying which particular implementation of an interface to use is
accomplished through “connecting” parameters. In particular, a shared
instance of a struct that implements the interface is created by the
application. This shared instance is then connected to the interface
parameter. The act of connecting the parameters causes the interface
parameter to inherit the shared parameter’s implementation of the interface.
This process can be thought of as implementing compile-time
polymorphism.

It is legal to connect a shared parameter of a user-defined structure type to
an interface parameter, as long as the structure type implements that
interface type. At runtime, the entry point cgIsParentType(), coupled with
cgGetParameterNamedType(), can be used to determine type parenthood.

When a structure parameter is connected to an interface parameter, copies of
any child (that is, member) variables associated with the source structure
parameter are automatically created as children of the sink parameter.
Under most circumstances, these member variable copies can be ignored by
the application, since their values and variability are automatically set by
the Cg runtime. However, in some situations it may be useful to query a
“sink-side” member parameter, for example for its underlying resource.

A shared instance of a structure whose type is defined in one Cg program or
effect may be connected to parameters of other programs or effects, provided
that the entities involved define the source structure types and destination
interface types equivalently. See “Parameter Type Equivalency” for more
details. If the types are not equivalent, cgConnectParameter()
generates a runtime error.

The following example illustrates structure-to-interface connection by
creating three programs, all of which define types named MyInterface and
MyStruct, with one program’s definitions differing from the others:

    interface MyInterface {
        float Val(float x);
    };
    struct MyStruct : MyInterface {
        float Scale;
        float Val(float x) { return(Scale * x); }
    };
    float4 main(MyInterface foo) : COLOR {
        return(foo.Val(.2).xxxx);
    }
Listing 1: Cg Program 1

    interface MyInterface {
        float Val(float x);
    };
    struct MyStruct : MyInterface {
        float Scale;
        float Val(float x) { return(Scale * x); }
    };
    float4 main(MyInterface foo) : COLOR {
        return(foo.Val(.3).xxxx);
    }

Listing 2: Cg Program 2

    interface MyInterface {
        half Val(half x);
    };
    struct MyStruct : MyInterface {
        float Scale;
        half Val(half x) { return(Scale * x); }
    };
    float4 main(MyInterface foo) : COLOR {
        return(foo.Val(.5).xxxx);
    }

Listing 3: Cg Program 3

Notice that both Cg Program 1 and Cg Program 2 define the Val() method
of the MyInterface and MyStruct types using the float type, whereas Cg
Program 3 does so using the half type. As a result, the MyInterface and
MyStruct types defined in Cg Program 3 are not equivalent to the types in
the other two programs, even though the types have the same names.

The following C program creates all three of the above Cg programs and
connects shared parameter instances to their input parameters:

    /* Program1String, Program2String, and Program3String are assumed
       to hold the Cg source text of Listings 1 through 3. */
    static CGcontext Context;

    static CGprogram CreateProgram(const char *program_str) {
        return cgCreateProgram(Context, CG_SOURCE,
                               program_str, CG_PROFILE_ARBFP1,
                               "main", NULL);
    }

    int main(int argc, char *argv[]) {
        CGprogram Program1, Program2, Program3;
        CGparameter ms1, ms3;

        // Disable automatic compilation, since the
        // programs cannot be compiled until concrete structs
        // are connected to each program's interface parameters.
        Context = cgCreateContext();
        cgSetAutoCompile(Context, CG_COMPILE_MANUAL);

        // Create the programs
        Program1 = CreateProgram(Program1String);
        Program2 = CreateProgram(Program2String);
        Program3 = CreateProgram(Program3String);

        // Create two shared parameters,
        // one of the MyStruct type from Program1, and
        // one of the MyStruct type from Program3.
        ms1 = cgCreateParameter(Context,
                  cgGetNamedUserType(Program1, "MyStruct"));
        ms3 = cgCreateParameter(Context,
                  cgGetNamedUserType(Program3, "MyStruct"));

        /* Connect the same shared parameter to Program1 and
           Program2 */
        cgConnectParameter(ms1, cgGetNamedParameter(Program1,
                                                    "foo"));
        cgConnectParameter(ms1, cgGetNamedParameter(Program2,
                                                    "foo"));

        // The following would generate an error because the type
        // of the ms1 parameter is not equivalent to type
        // "MyStruct" from Program3.
        // cgConnectParameter(ms1,
        //     cgGetNamedParameter(Program3, "foo"));
        cgConnectParameter(ms3, cgGetNamedParameter(Program3,
                                                    "foo"));

        // Now we can compile all three programs.
        cgCompileProgram(Program1);
        cgCompileProgram(Program2);
        cgCompileProgram(Program3);
        // … and so on …
    }

Parameter Properties
Parameter properties encompass validity, references, size, and other
attributes.

Parameter Type

The Cg language defines a number of built-in parameter types, such as
float4, int3x3, and so on. In addition, user-defined types may be specified
in a program when declaring structure and interface types. For example, if
the following Cg code is included in the source to a CGprogram created via
cgCreateProgram(), the types MyInterface and MyStruct will be added to
the resulting CGprogram.

    interface MyInterface {
        float SomeMethod(float x);
    };
    struct MyStruct : MyInterface {
        float Scale;
        float SomeMethod(float x) {
            return(Scale * x);
        }
    };

In order to obtain the unique enumerant associated with a parameter’s type,
the following entry point should be used:

    CGtype cgGetParameterNamedType(CGparameter param);

The CGtype associated with a named user-defined type in a program can be
retrieved using

    CGtype cgGetNamedUserType(CGhandle handle, const char *name);

Here, handle can be either a CGprogram or a CGeffect.

Struct types can implement a given interface. In such a case, the
indicated interface is known as a parent type of the struct type. In the
example above, MyStruct has a single parent type, MyInterface. The parent
types of a given named type may be obtained with the following entry
points:

    int cgGetNumParentTypes(CGtype type);
    CGtype cgGetParentType(CGtype type, int index);

Note that the Cg language specification currently makes it impossible for a
struct type to have more than a single parent type.

All of the user-defined types associated with a program may be obtained
with the following entry points:

    int cgGetNumUserTypes(CGprogram program);
    CGtype cgGetUserType(CGprogram program, int index);

Note that the runtime treats interface program parameters as if they were
structure parameters with no concrete data or function members.

In older applications that use the Cg runtime, you may encounter the
deprecated entry point:

    CGtype cgGetParameterType(CGparameter parameter);

This entry point differs from cgGetParameterNamedType() in that it always
returns CG_STRUCT for any struct parameter, rather than returning the
enumerant associated with the user-defined type of the struct.

The name associated with a given type enumerant can be queried using

    const char* cgGetTypeString(CGtype type);

The reverse lookup, from a type name to its enumerant, is done with
cgGetType(). If the string passed to cgGetType() does not correspond to any
type, CG_UNKNOWN_TYPE is returned.

Function cgGetParameterBaseType() returns the base type of vector
and matrix parameters. For example, given a float4x4 parameter,
cgGetParameterBaseType() returns the CG_FLOAT type. Similarly, given a
multidimensional array of float4x4s, it also returns CG_FLOAT.

It is also possible to determine the general class of the type of a parameter:

    CGparameterclass cgGetParameterClass(CGparameter param);

It returns one of the following enumerated values:

    CG_PARAMETERCLASS_UNKNOWN
    CG_PARAMETERCLASS_SCALAR
    CG_PARAMETERCLASS_VECTOR
    CG_PARAMETERCLASS_MATRIX
    CG_PARAMETERCLASS_STRUCT
    CG_PARAMETERCLASS_ARRAY
    CG_PARAMETERCLASS_OBJECT

Parameter Type Equivalency

If a program containing a user-defined type is created in a context that
already contains another program or effect that defines a user type with the
same name, the two type definitions are compared. If both type definitions
are found to be equivalent, the CGtype enumerant associated with the user
type in the new program will be identical to that of the identical user type
in the existing program or effect. If the types are not equivalent, the new
type will be assigned a unique CGtype. In this way, type equivalency of
parameters shared between multiple programs and effects can be assured
simply by comparing CGtype enumerants.

In order for two types to be considered equivalent, they must meet the
following requirements:

The type names must match.
    Both types must have the exact same name.
The parent types, if any, must match.
    If the type is a structure, both must either not implement an interface,
    or both implement interfaces that are type equivalent.
The member variables and methods must match.
    They must both have the exact same member variables and methods.
    The order and name of the variables must match exactly, and the order
    and name of the methods must match. The signature of the methods,
    including argument and return types, must be identical.

Type equivalency is useful when using shared parameter instances with
multiple programs by connecting them with cgConnectParameter().
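
The three requirements above can be read as a recursive comparison. The following C fragment is a minimal sketch of that reading, not the runtime's actual implementation: the MemberDesc and TypeDesc structures and the types_equivalent() function are hypothetical names invented for this illustration and are not part of the Cg API. As a further assumption, the sketch also compares the declared type of each member variable, which the list above does not spell out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical description of one member variable or method.
   For a variable, 'signature' holds its type; for a method, it holds
   the return and argument types. */
typedef struct {
    const char *name;
    const char *signature;
} MemberDesc;

/* Hypothetical description of a user-defined type. */
typedef struct TypeDesc {
    const char *name;
    const struct TypeDesc *parent;  /* NULL if no interface is implemented */
    const MemberDesc *variables;
    size_t num_variables;
    const MemberDesc *methods;
    size_t num_methods;
} TypeDesc;

/* Members match only if count, order, name, and signature all agree. */
static int members_match(const MemberDesc *a, size_t na,
                         const MemberDesc *b, size_t nb) {
    if (na != nb)
        return 0;
    for (size_t i = 0; i < na; ++i) {
        if (strcmp(a[i].name, b[i].name) != 0 ||
            strcmp(a[i].signature, b[i].signature) != 0)
            return 0;
    }
    return 1;
}

/* Returns 1 if the two descriptions satisfy all three rules, else 0. */
int types_equivalent(const TypeDesc *a, const TypeDesc *b) {
    assert(a != NULL && b != NULL);
    /* Rule 1: the type names must match. */
    if (strcmp(a->name, b->name) != 0)
        return 0;
    /* Rule 2: both have no parent, or both have equivalent parents. */
    if ((a->parent == NULL) != (b->parent == NULL))
        return 0;
    if (a->parent != NULL && !types_equivalent(a->parent, b->parent))
        return 0;
    /* Rule 3: member variables and methods must match. */
    return members_match(a->variables, a->num_variables,
                         b->variables, b->num_variables) &&
           members_match(a->methods, a->num_methods,
                         b->methods, b->num_methods);
}
```

Under this model, two MyStruct descriptions whose Val() methods both use float compare as equivalent, while a MyStruct whose Val() uses half does not, which mirrors the three-program example given earlier.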
Parameter Validity
The function cgIsParameter() allows you to check whether a parameter
handle references a valid parameter or not:

    CGbool cgIsParameter(CGparameter parameter);

A parameter handle becomes invalid when the program or the context of the
program it corresponds to is destroyed.

Parameter References

A parameter that is referenced by the original Cg source code may be
optimized out of the compiled program by the compiler, in which case the
application can simply ignore it and not set its value. Calling
cgIsParameterReferenced() allows you to check whether a parameter is
potentially used by the final compiled program:

    CGbool cgIsParameterReferenced(CGparameter parameter);

Note that the value returned by this entry point is conservative, but not
always exact, particularly if the program has not yet been compiled. Also,
note that no error is generated if you set the value of a parameter that is
not referenced.

Parameter Size
A number of core Cg runtime entry points are provided for querying and
setting parameter size and length.

The number of rows or columns associated with a parameter can be retrieved
using

    int cgGetParameterRows(CGparameter param);
    int cgGetParameterColumns(CGparameter param);

A scalar parameter is considered to have a single row and a single column,
while a vector parameter has a single row and columns equal to the length of
the vector. If param is a matrix parameter, the values returned correspond to
those of the matrix. If param is an array, the number of rows or columns
associated with each element of the array is returned. If param is not a
numeric type, 0 is returned by either entry point.

The dimensionality of an array is queried using

    int cgGetArrayDimension(CGparameter param);

Dimensions are enumerated starting at 0 (zero). The length of a particular
dimension of an array can be retrieved by calling

    int cgGetArraySize(CGparameter param, int dimension);

The total number of elements in an array may be queried using

    int cgGetArrayTotalSize(CGparameter param);

Here, param may be an array of any dimension; the returned value is the
total number of elements across all dimensions of the array.

The type of each element of an array can be queried using

    CGtype cgGetArrayType(CGparameter param);

For example, if a parameter were declared

    float4 array[2][3];

cgGetArrayType() would return CG_FLOAT4. If it were declared

    mystruct array[3];

cgGetArrayType() would return the enumerant corresponding to the
user-defined mystruct type.

Unsized Array Length

Unsized arrays can be assigned concrete sizes via the runtime. Under many
profiles, setting the size of unsized arrays associated with a Cg program is
required before the program can be compiled.

The length of one-dimensional unsized arrays can be set using

    void cgSetArraySize(CGparameter param, int size);

The size of multidimensional arrays may be set using

    void cgSetMultiDimArraySize(CGparameter param, int *sizes);

Note that arrays with completely determined lengths may not have their size
changed using either entry point. Only unsized arrays may be modified
using these entry points.

Parameter Attributes

A parameter’s general class can be queried using

    CGparameterclass cgGetParameterClass(CGparameter param);

The returned CGparameterclass value enumerates the high-level parameter
classes:

CG_PARAMETERCLASS_SCALAR
    A scalar type, such as CG_INT or CG_FLOAT
CG_PARAMETERCLASS_VECTOR
    A vector type, such as CG_INT1 or CG_FLOAT4
CG_PARAMETERCLASS_MATRIX
    A matrix type, such as CG_INT1X2 or CG_FLOAT4X4
CG_PARAMETERCLASS_STRUCT
    A struct or interface
CG_PARAMETERCLASS_SAMPLER
    A sampler type, such as sampler1D or samplerCUBE
CG_PARAMETERCLASS_OBJECT
    A texture, string, or program

The program that the parameter corresponds to is found using
cgGetParameterProgram():

    CGprogram cgGetParameterProgram(CGparameter parameter);

To determine whether the parameter is varying, uniform, or constant,
cgGetParameterVariability() is used:

    CGenum cgGetParameterVariability(CGparameter parameter);

The call returns CG_VARYING if the parameter is a varying parameter,
CG_UNIFORM if the parameter is a uniform parameter, or CG_CONSTANT if the
parameter is a constant parameter. A constant parameter is a parameter whose
value never changes for the life of a compiled program, so that changing its
value requires recompiling the program. For some profiles, the compiler has
to add some constant parameters that correspond to literal constant values
in the code.

A parameter’s variability can also be modified via the core Cg runtime using

    void cgSetParameterVariability(CGparameter parameter,
                                   CGenum vary);

Here, vary may be one of:

CG_UNIFORM
    The parameter is set to uniform variability.
CG_LITERAL
    The parameter is marked as a literal, whose value can be assumed to be
    a compile-time constant. This feature can be used to “bake”
    parameter values into the compiled Cg program, which often produces
    much more efficient compiled code.
CG_DEFAULT
    The parameter reverts to its default variability as specified in the
    program text, or is made to inherit its variability from any source it
    has been connected to.

Note that parameters may not currently be set to CG_VARYING variability.

To obtain the parameter direction, use cgGetParameterDirection():

    CGenum cgGetParameterDirection(CGparameter parameter);

It returns CG_IN if the parameter is an input parameter, CG_OUT if the
parameter is an output parameter, or CG_INOUT if the parameter is both an
input and an output parameter.

The entry point cgGetParameterName() retrieves the parameter name:

    const char* cgGetParameterName(CGparameter parameter);

Use cgGetParameterSemantic() to retrieve the parameter semantic string:

    const char* cgGetParameterSemantic(CGparameter parameter);

If the parameter does not have any semantic, an empty string is returned.

There is a one-to-one correspondence between a set of predefined semantics
(POSITION, COLOR, and so on) and hardware resources (registers, texture
units, and so on). In the Cg runtime, a hardware resource is represented by
the type CGresource, and cgGetParameterResource() retrieves the
resource assigned to a parameter:

    CGresource cgGetParameterResource(CGparameter parameter);

If the parameter does not have any associated resource,
cgGetParameterResource() returns CG_UNDEFINED.

The two functions cgGetResource() and cgGetResourceString() allow
you to determine the correspondence between a resource enumerant and its
corresponding string:

    CGresource cgGetResource(const char* resourceString);
    const char* cgGetResourceString(CGresource resource);

If the string passed to cgGetResource() does not correspond to any
resource, CG_UNDEFINED is returned.

Using cgGetParameterBaseResource() allows you to retrieve the base
resource for a parameter in a Cg program:

    CGresource cgGetParameterBaseResource(
        CGparameter parameter);

The base resource is the first resource in a set of sequential resources. For
example, if a given parameter has a resource equal to CG_TEXCOORD7, its
base resource is CG_TEXCOORD0. Only parameters with resources whose name
ends with a number have a base resource. All other parameters return
CG_UNDEFINED when cgGetParameterBaseResource() is called.

Function cgGetParameterResourceIndex() retrieves the numerical portion
of the resource:

    unsigned long cgGetParameterResourceIndex(
        CGparameter parameter);

For example, if the resource for a given parameter is CG_TEXCOORD7,
cgGetParameterResourceIndex() returns 7.

The cgGetParameterValues() function retrieves the default or constant
value of a uniform parameter:

    const double* cgGetParameterValues(CGparameter parameter,
        CGenum valueType, int* numberOfValuesReturned);

It retrieves the default value if valueType is equal to CG_DEFAULT and the
constant value if valueType is equal to CG_CONSTANT. The components of the
value are returned in row-major order as a pointer to an array containing
type double elements. After cgGetParameterValues() is called, the number
of components available in the array is stored in the integer pointed to by
numberOfValuesReturned.
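
Because the components come back in row-major order, the element at row r and column c of a matrix with ncols columns sits at index r * ncols + c of the returned array. The sketch below illustrates that layout and how such an array could be rearranged into column-major order. The helpers row_major_at() and to_column_major() are hypothetical names written for this illustration; they are not Cg runtime functions, and they operate on a plain array of double rather than on a real CGparameter.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the element at (row, col) of a row-major array that stores
   a matrix with 'ncols' columns. */
double row_major_at(const double *values, size_t ncols,
                    size_t row, size_t col) {
    return values[row * ncols + col];
}

/* Copies an nrows x ncols matrix from row-major order in 'src' to
   column-major order in 'dst'. Both arrays hold nrows * ncols values
   and must not overlap. */
void to_column_major(const double *src, double *dst,
                     size_t nrows, size_t ncols) {
    assert(src != NULL && dst != NULL);
    for (size_t r = 0; r < nrows; ++r)
        for (size_t c = 0; c < ncols; ++c)
            dst[c * nrows + r] = src[r * ncols + c];
}
```

In an application, the source array would be the pointer returned by cgGetParameterValues(), with the matrix shape taken from cgGetParameterRows() and cgGetParameterColumns().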
Core Cg Error Reporting
An error code is associated with each type of runtime error that can be
generated. The runtime caches both the most recently generated error, as
well as the error that was first generated since the error code was last
checked by the application. Applications can query the cached error codes,
as well as the error message corresponding to either, using

    CGerror error = cgGetError();
    CGerror firstError = cgGetFirstError();
    const char* errorString = cgGetErrorString(error);

An error code of 0 indicates no error. When either error-fetching entry
point is called, its cached error value is reset to 0.

More comprehensive error checking and handling can be achieved using
Cg’s error handler callback mechanism. Each time an error occurs, the core
Cg runtime calls an error handler callback function, optionally provided by
the application. The application registers the error handler using

    typedef void (*CGerrorHandlerFunc)(CGcontext ctx, CGerror err,
                                       void *appdata);
    void cgSetErrorHandler(CGerrorHandlerFunc func, void *data);

When an error occurs, the Cg runtime calls the specified function, passing
the CGcontext in which the error occurred, the code associated with the
triggering error, and a copy of the data pointer registered by the
application.

A typical implementation of the error handler might look like this:

    void HandleCgError(CGcontext ctx, CGerror err, void *appdata)
    {
        fprintf(stderr, "Cg error: %s\n", cgGetErrorString(err));
        const char *listing = cgGetLastListing(ctx);
        if (listing != NULL)
            fprintf(stderr, " last listing: %s\n", listing);
    }

Here is a list of some of the CGerror codes specific to the core Cg runtime:

CG_NO_ERROR: Returned when no error has occurred.
CG_COMPILER_ERROR: Returned when the compiler generated an error. A
    call to cgGetLastListing() should be made to get more details on the
    actual compiler error.
CG_INVALID_PARAMETER_ERROR: Returned when the parameter used is
    invalid.
CG_INVALID_PROFILE_ERROR: Returned when the profile is not
    supported.
CG_INVALID_VALUE_TYPE_ERROR: Returned when an unknown value
    type is assigned to a parameter.
CG_NOT_MATRIX_PARAM_ERROR: Returned when the parameter is not of a
    matrix type.
CG_INVALID_ENUMERANT_ERROR: Returned when the enumerant
    parameter has an invalid value.
CG_NOT_4x4_MATRIX_ERROR: Returned when the parameter must be a
    4x4 matrix type.
CG_FILE_READ_ERROR: Returned when the file cannot be read.
CG_FILE_WRITE_ERROR: Returned when the file cannot be written.
CG_MEMORY_ALLOC_ERROR: Returned when a memory allocation fails.
CG_INVALID_CONTEXT_HANDLE_ERROR: Returned when an invalid
    context handle is used.
CG_INVALID_PROGRAM_HANDLE_ERROR: Returned when an invalid
    program handle is used.
CG_INVALID_PARAM_HANDLE_ERROR: Returned when an invalid
    parameter handle is used.
CG_UNKNOWN_PROFILE_ERROR: Returned when the specified profile is
    unknown.
CG_VAR_ARG_ERROR: Returned when the variable arguments are specified
    incorrectly.
CG_INVALID_DIMENSION_ERROR: Returned when the dimension value is
    invalid.
CG_ARRAY_PARAM_ERROR: Returned when the parameter must be an
    array.
CG_OUT_OF_ARRAY_BOUNDS_ERROR: Returned when the index into an
    array is out of bounds.
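
The caching behavior described at the start of this section can be modeled with two slots, one for the most recent error and one for the first error since the last check. The following C sketch is a stand-alone model written for illustration only: ErrorCache and the cache_* functions are hypothetical names, plain int codes stand in for CGerror, and nothing here reflects how the Cg runtime is implemented internally. It assumes, as the text states, that each fetch resets only its own cached value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical two-slot model of the cached error codes.
   A value of 0 means "no error", as in the Cg runtime. */
typedef struct {
    int last;   /* most recently generated error */
    int first;  /* first error generated since 'first' was last fetched */
} ErrorCache;

/* Records a newly generated, nonzero error code. */
void cache_record(ErrorCache *cache, int code) {
    assert(cache != NULL && code != 0);
    cache->last = code;
    if (cache->first == 0)
        cache->first = code;
}

/* Models cgGetError(): returns the most recent error and resets it. */
int cache_get_last(ErrorCache *cache) {
    int code = cache->last;
    cache->last = 0;
    return code;
}

/* Models cgGetFirstError(): returns the first error and resets it. */
int cache_get_first(ErrorCache *cache) {
    int code = cache->first;
    cache->first = 0;
    return code;
}
```

The practical consequence is that an application that checks errors only occasionally can use the "first" slot to find where a chain of failures began, while the "last" slot reports only the final symptom.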
API-Specific Cg Runtimes
Each API-specific Cg runtime provides an additional set of functions on top
of the core Cg runtime to ease the integration of Cg into an application
based on that API. These functions essentially interface between the core
runtime data structures and the API data structures to provide the following
facilities:
Setting the parameter values: A distinction is made between texture,
    matrix, array, vector, and scalar values, as those various types are
    handled differently by each API and have different data structures.
Executing the program: Program execution is divided into program
    loading (passing the result of the Cg compiler to the API) and program
    binding (setting the program as the one to execute for any subsequent
    draw calls). This is because those two operations are usually done at
    different times: a program is loaded each time it is recompiled, and it
    is bound each time it needs to be executed for a particular draw call.

Parameter Shadowing

When the value of a uniform parameter is set by some function of the
OpenGL Cg runtime, it is actually stored internally (or shadowed) by either
the Cg or the OpenGL runtime so that it does not need to be reset every time
the program is about to be executed. This behavior is referred to as
parameter shadowing.

If the Direct3D Cg runtime expanded interface (described in “Direct3D
Expanded Interface”) is used, parameter shadowing can be
turned on or off on a per-program basis. When parameter shadowing is
turned off for a given program and the value of any of its uniform
parameters is set by some function of the Direct3D Cg runtime, it is
immediately downloaded to the GPU constant memory (the memory
containing the values of all the uniform parameters). When parameter
shadowing is turned on, the value is shadowed instead and no Direct3D call
is made at the time it is set; only when the program is bound are all of its
parameters actually downloaded to the constant memory. This means that a
parameter value set after binding the program is not used during the
execution of the program until the next time the program is bound.
Parameter shadowing applies to all parameter settings including texture
state stage and texture mode.

Disabling parameter shadowing allows the runtime to consume less
memory, but forces the application to do the work of making sure that the
constant memory contains all the right values every time it activates a
program.
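
The difference between the two modes can be illustrated with a small model. The C sketch below is only a simplified picture of the behavior described above, not the Direct3D Cg runtime's implementation: ProgramModel, model_set_parameter(), and model_bind() are hypothetical names, and the gpu_constants array is an assumption that stands in for GPU constant memory, which in reality is shared by whichever program is bound.

```c
#include <assert.h>
#include <string.h>

#define NUM_CONSTANTS 4

/* Hypothetical model of one program's uniform storage. */
typedef struct {
    int shadowing;                       /* nonzero: shadowing enabled */
    float shadow[NUM_CONSTANTS];         /* values kept by the runtime */
    float gpu_constants[NUM_CONSTANTS];  /* stand-in for constant memory */
} ProgramModel;

/* Sets a uniform value. With shadowing on, the value is only stored;
   with shadowing off, it is written to constant memory immediately. */
void model_set_parameter(ProgramModel *p, int index, float value) {
    assert(index >= 0 && index < NUM_CONSTANTS);
    if (p->shadowing)
        p->shadow[index] = value;
    else
        p->gpu_constants[index] = value;
}

/* Binds the program. With shadowing on, all shadowed values are
   downloaded to constant memory at this point. */
void model_bind(ProgramModel *p) {
    if (p->shadowing)
        memcpy(p->gpu_constants, p->shadow, sizeof p->shadow);
}
```

In this model, a value set while shadowing is enabled does not reach gpu_constants until the next call to model_bind(), which corresponds to the statement that a value set after binding is not used until the program is bound again.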
OpenGL Cg Runtime
This section discusses setting parameters and program execution for the
OpenGL Cg runtime.
Note: Before any OpenGL Cg runtime functions can be executed, an OpenGL context must
be created with either wglCreateContext() or glXCreateContext().
Setting Parameters in OpenGL
In accordance with the OpenGL convention, many of the functions described
below come in two versions: a version operating on float values, marked
with an f, and a version operating on double values, marked with a d.

Setting Uniform Scalar and Uniform Vector Parameters

To set the values of scalar parameters or vector parameters, use the
cgGLSetParameter functions:

    void cgGLSetParameter1f(CGparameter parameter, float x);
    void cgGLSetParameter1fv(CGparameter parameter,
                             const float* array);
    void cgGLSetParameter1d(CGparameter parameter, double x);
    void cgGLSetParameter1dv(CGparameter parameter,
                             const double* array);
    void cgGLSetParameter2f(CGparameter parameter, float x,
                            float y);
    void cgGLSetParameter2fv(CGparameter parameter,
                             const float* array);
    void cgGLSetParameter2d(CGparameter parameter, double x,
                            double y);
    void cgGLSetParameter2dv(CGparameter parameter,
                             const double* array);
    void cgGLSetParameter3f(CGparameter parameter, float x,
                            float y, float z);
    void cgGLSetParameter3fv(CGparameter parameter,
                             const float* array);
    void cgGLSetParameter3d(CGparameter parameter, double x,
                            double y, double z);
    void cgGLSetParameter3dv(CGparameter parameter,
                             const double* array);
    void cgGLSetParameter4f(CGparameter parameter, float x,
                            float y, float z, float w);
    void cgGLSetParameter4fv(CGparameter parameter,
                             const float* array);
    void cgGLSetParameter4d(CGparameter parameter, double x,
                            double y, double z, double w);
    void cgGLSetParameter4dv(CGparameter parameter,
                             const double* array);

The digit in the name of those functions indicates how many scalar values
are set by the function. The v suffix is for functions that operate on an
array of values as opposed to individual arguments.

If more values are set than the parameter requires, the extra values are
ignored. If fewer values are set than the parameter requires, the last value
is smeared (that is, repeated to fill the remaining components). The
cgGLSetParameter functions may be called for either uniform
or varying parameters. When called for a varying parameter, the appropriate
immediate-mode OpenGL entry point is called.

The corresponding parameter value retrieval functions are as follows:

    void cgGLGetParameter1f(CGparameter parameter, float* array);
    void cgGLGetParameter1d(CGparameter parameter, double* array);
    void cgGLGetParameter2f(CGparameter parameter, float* array);
    void cgGLGetParameter2d(CGparameter parameter, double* array);
    void cgGLGetParameter3f(CGparameter parameter, float* array);
    void cgGLGetParameter3d(CGparameter parameter, double* array);
    void cgGLGetParameter4f(CGparameter parameter, float* array);
    void cgGLGetParameter4d(CGparameter parameter, double* array);

Setting Uniform Matrix Parameters

The cgGLSetMatrixParameter functions are used to set any matrix:

    void cgGLSetMatrixParameterfr(CGparameter parameter,
                                  const float* matrix);
    void cgGLSetMatrixParameterfc(CGparameter parameter,
                                  const float* matrix);
    void cgGLSetMatrixParameterdr(CGparameter parameter,
                                  const double* matrix);
    void cgGLSetMatrixParameterdc(CGparameter parameter,
                                  const double* matrix);

The matrix is passed as an array of floating-point values whose size matches
the number of coefficients of the matrix. The r suffix is for functions that
assume the matrix is laid out in row order, and the c suffix is for functions
that assume the matrix is laid out in column order.

The corresponding parameter value retrieval functions are

    void cgGLGetMatrixParameterfr(CGparameter parameter,
                                  float* matrix);
    void cgGLGetMatrixParameterfc(CGparameter parameter,
                                  float* matrix);
    void cgGLGetMatrixParameterdr(CGparameter parameter,
                                  double* matrix);
    void cgGLGetMatrixParameterdc(CGparameter parameter,
                                  double* matrix);

Use cgGLSetStateMatrixParameter() to set a 4x4 matrix parameter from an
OpenGL state matrix:

    void cgGLSetStateMatrixParameter(CGparameter parameter,
        CGGLenum stateMatrixType, CGGLenum transform);

The variable stateMatrixType is an enumerated type specifying the state
matrix to be used to set the parameter:

    CG_GL_MODELVIEW_MATRIX for the current model-view matrix
    CG_GL_PROJECTION_MATRIX for the current projection matrix
    CG_GL_TEXTURE_MATRIX for the current texture matrix
    CG_GL_MODELVIEW_PROJECTION_MATRIX for the concatenated model-view
        and projection matrices

The variable transform is an enumerated type specifying a transformation
applied to the state matrix before it is used to set the parameter value:

    CG_GL_MATRIX_IDENTITY for applying no transformation at all
    CG_GL_MATRIX_TRANSPOSE for transposing the matrix
    CG_GL_MATRIX_INVERSE for inverting the matrix
    CG_GL_MATRIX_INVERSE_TRANSPOSE for inverting and transposing the
        matrix

Setting Uniform Arrays of Scalar, Vector, and Matrix Parameters

To set the values of arrays of uniform scalar or vector parameters, use the
cgGLSetParameterArray functions:

    void cgGLSetParameterArray1f(CGparameter parameter,
        long startIndex, long numberOfElements,
        const float* array);
    void cgGLSetParameterArray1d(CGparameter parameter,
        long startIndex, long numberOfElements,
        const double* array);
    void cgGLSetParameterArray2f(CGparameter parameter,
        long startIndex, long numberOfElements,
        const float* array);
    void cgGLSetParameterArray2d(CGparameter parameter,
        long startIndex, long numberOfElements,
        const double* array);
808-00504-0000-006 77
NVIDIA
Introduction to the Cg Runtime Library
void cgGLSetParameterArray3f(CGparameter parameter,
    long startIndex, long numberOfElements,
    const float* array);
void cgGLSetParameterArray3d(CGparameter parameter,
    long startIndex, long numberOfElements,
    const double* array);
void cgGLSetParameterArray4f(CGparameter parameter,
    long startIndex, long numberOfElements,
    const float* array);
void cgGLSetParameterArray4d(CGparameter parameter,
    long startIndex, long numberOfElements,
    const double* array);

The digit in the name of those functions indicates the type of the
parameter array elements: 1 for arrays of float1, 2 for arrays of
float2, and so on. The variables startIndex and numberOfElements specify
which elements of the array parameter are set: they are the
numberOfElements elements whose indices range from startIndex to
startIndex+numberOfElements-1. Passing a value of 0 for numberOfElements
tells the functions to set all the values starting at index startIndex
up to the last valid index of the array, namely
cgGetArraySize(parameter, 0)-1. This is equivalent to setting
numberOfElements to cgGetArraySize(parameter, 0)-startIndex. The
parameter array is an array of scalar values. It must contain
numberOfElements values for the cgGLSetParameterArray1 functions,
2*numberOfElements values for the cgGLSetParameterArray2 functions, and
so on.

The corresponding parameter value retrieval functions are as follows:

void cgGLGetParameterArray1f(CGparameter parameter,
    long startIndex, long numberOfElements, float* array);
void cgGLGetParameterArray1d(CGparameter parameter,
    long startIndex, long numberOfElements, double* array);
void cgGLGetParameterArray2f(CGparameter parameter,
    long startIndex, long numberOfElements, float* array);
void cgGLGetParameterArray2d(CGparameter parameter,
    long startIndex, long numberOfElements, double* array);
void cgGLGetParameterArray3f(CGparameter parameter,
    long startIndex, long numberOfElements, float* array);
void cgGLGetParameterArray3d(CGparameter parameter,
    long startIndex, long numberOfElements, double* array);
void cgGLGetParameterArray4f(CGparameter parameter,
    long startIndex, long numberOfElements, float* array);
void cgGLGetParameterArray4d(CGparameter parameter,
    long startIndex, long numberOfElements, double* array);
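The startIndex and numberOfElements rules for the array functions can be
sketched in plain C, independent of the Cg runtime. This is only an
illustration under stated assumptions: resolve_element_count() and
required_scalars() are hypothetical helpers, not Cg functions, and
arraySize stands in for the value that cgGetArraySize(parameter, 0)
would return.

```c
#include <assert.h>

/* Hypothetical helper (not part of Cg): resolves numberOfElements the way
   the text describes. arraySize stands in for cgGetArraySize(parameter, 0). */
long resolve_element_count(long arraySize, long startIndex,
                           long numberOfElements)
{
    assert(startIndex >= 0 && startIndex < arraySize);
    if (numberOfElements == 0) {
        /* 0 means "everything from startIndex to the last valid index". */
        return arraySize - startIndex;
    }
    assert(startIndex + numberOfElements <= arraySize);
    return numberOfElements;
}

/* Hypothetical helper (not part of Cg): number of scalar values the
   caller's buffer must hold for a cgGL{Set,Get}ParameterArray<components>
   call covering elementCount elements. */
long required_scalars(int components, long elementCount)
{
    assert(components >= 1 && components <= 4);
    return (long)components * elementCount;
}
```

For an array of eight float4 elements, a startIndex of 3 with
numberOfElements set to 0 resolves to five elements, so under these
assumptions the buffer passed to cgGLSetParameterArray4f() would need to
hold 20 floats.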
Similar functions exist to set the values of arrays of uniform matrix
parameters:

void cgGLSetMatrixParameterArrayfr(CGparameter parameter,
    long startIndex, long numberOfElements,
    const float* array);
void cgGLSetMatrixParameterArrayfc(CGparameter parameter,
    long startIndex, long numberOfElements,
    const float* array);
void cgGLSetMatrixParameterArraydr(CGparameter parameter,
    long startIndex, long numberOfElements,
    const double* array);
void cgGLSetMatrixParameterArraydc(CGparameter parameter,
    long startIndex, long numberOfElements,
    const double* array);

and to query those values:

void cgGLGetMatrixParameterArrayfr(CGparameter parameter,
    long startIndex, long numberOfElements, float* array);
void cgGLGetMatrixParameterArrayfc(CGparameter parameter,
    long startIndex, long numberOfElements, float* array);
void cgGLGetMatrixParameterArraydr(CGparameter parameter,
    long startIndex, long numberOfElements, double* array);
void cgGLGetMatrixParameterArraydc(CGparameter parameter,
    long startIndex, long numberOfElements, double* array);

The c and r suffixes have the same meaning as they do for the
cgGLSetMatrixParameter functions.

Setting Varying Parameters

The values of fragment program varying parameters are set as the result
of the interpolation across the triangles performed by the GPU, so only
the values of vertex program varying parameters are set by the
application. Setting a vertex varying parameter requires two steps.

The first step consists of passing a pointer to an array containing the
values for each vertex. This is done using cgGLSetParameterPointer():

void cgGLSetParameterPointer(CGparameter parameter,
    GLint size, GLenum type, GLsizei stride,
    GLvoid* array);

The variable size indicates the number of values per vertex that are
stored in array. It is equal to 1, 2, 3, or 4. If fewer values are set
than the parameter requires, the unspecified values default to 0 for x,
y, and z, and 1 for w.
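The default rule for missing components can be illustrated with a small
stand-alone C sketch. It does not call the Cg runtime; expand_to_float4()
is a hypothetical helper that only mirrors the rule stated above for a
per-vertex value with size components.

```c
#include <assert.h>

/* Hypothetical helper (not part of Cg): expands a per-vertex value with
   `size` components to four components, filling the missing ones with
   0 for x, y, z and 1 for w, as described in the text. */
void expand_to_float4(const float* values, int size, float out[4])
{
    static const float defaults[4] = { 0.0f, 0.0f, 0.0f, 1.0f };
    int i;

    assert(size >= 1 && size <= 4);
    for (i = 0; i < 4; ++i) {
        out[i] = (i < size) ? values[i] : defaults[i];
    }
}
```

With size equal to 2, for instance, a texture coordinate (0.25, 0.5)
would be seen by a float4 parameter as (0.25, 0.5, 0, 1).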
The enumerated parameter type specifies the data type of the values
stored in array: GL_SHORT, GL_INT, GL_FLOAT, or GL_DOUBLE.

The parameter stride is the byte offset between any two consecutive
vertices. Passing a value of zero for stride is equivalent to passing a
byte offset equal to size multiplied by the size of type in bytes; in
other words, it means that there is no gap between two consecutive
vertex values. Note that the minimum size for array is implicitly
defined by the biggest vertex index specified in the triangles drawn.
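The stride rule can be written down as a short stand-alone C sketch. The
helpers effective_stride() and vertex_address() are hypothetical names
used only for illustration; they are not Cg or OpenGL functions, and
typeSize stands in for the size in bytes of the data type named by type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of Cg): byte distance between two
   consecutive vertices. A stride of 0 means tightly packed data, that is,
   `size` values of `typeSize` bytes each. */
size_t effective_stride(size_t stride, int size, size_t typeSize)
{
    assert(size >= 1 && size <= 4);
    return stride != 0 ? stride : (size_t)size * typeSize;
}

/* Hypothetical helper (not part of Cg): address of the first value that
   belongs to the vertex with the given index. */
const void* vertex_address(const void* array, size_t index,
                           size_t stride, int size, size_t typeSize)
{
    return (const char*)array
        + index * effective_stride(stride, size, typeSize);
}
```

Under these assumptions, three GL_FLOAT values per vertex with a stride
of 0 place vertex 2 at byte offset 2 * 3 * sizeof(float), whereas an
explicit stride of 32 would place it at byte offset 64.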
The second step consists of enabling the varying parameter for a
specific drawing call:

void cgGLEnableClientState(CGparameter parameter);

The equivalent disabling function is

void cgGLDisableClientState(CGparameter parameter);

Another way to set the vertex varying parameter is to use the
cgGLSetParameter functions. When a cgGLSetParameter function is called
for a varying parameter, the appropriate immediate mode OpenGL entry
point is called. The cgGLGetParameter functions do not apply to varying
parameters.

Setting Sampler Parameters

Setting a sampler parameter requires two steps. First, an OpenGL texture
object handle must be assigned to the sampler parameter. Next, the
texture unit associated with the sampler must be enabled prior to
drawing. The first step must be done explicitly by the application. The
second step may also be performed explicitly by the application, or the
OpenGL Cg runtime can be instructed to automatically manage texture
units itself.

The first step consists of assigning an OpenGL texture object to the
sampler parameter using

void cgGLSetTextureParameter(CGparameter parameter,
    GLuint textureName);

where textureName is the OpenGL texture name. Note that when your
application makes OpenGL calls to initialize the texture environment for
a given sampler, it is important to remember to set the active texture
unit to that associated with the sampler before doing so. The sampler’s
texture unit can be retrieved by calling cgGLGetTextureEnum(); see the
following discussion.

The second step consists of enabling the texture unit associated with
the sampler parameter for a specific drawing call. It is strongly
recommended that applications allow the Cg OpenGL runtime library to
perform this second step itself. This is accomplished by calling:

void cgGLSetManageTextureParameters(CGcontext context,
    CGbool enable);

with enable set to a nonzero value after the Cg context has been
created. When automatic texture parameter management is in effect, the
Cg OpenGL runtime will automatically enable all appropriate texture
units when a CGprogram is bound.

If, despite the above, you wish to manage texture parameters yourself,
you can use the helper function

void cgGLEnableTextureParameter(CGparameter parameter);

which must be called after cgGLSetTextureParameter() and before the
actual drawing call.

The equivalent disabling function is:

void cgGLDisableTextureParameter(CGparameter parameter);

You can retrieve the texture object assigned to a sampler parameter
using

GLuint cgGLGetTextureParameter(CGparameter parameter);

You can retrieve the OpenGL enumerant for the texture unit associated
with a sampler parameter using

GLenum cgGLGetTextureEnum(CGparameter parameter);

The returned enumerant has the form GL_TEXTURE#_ARB where # is the
texture unit index.

OpenGL Profile Support

A convenient function is provided that gives the best available profile
for vertex or fragment programs depending on the available OpenGL
extensions:

CGprofile cgGLGetLatestProfile(CGGLenum profileType);

Parameter profileType is equal to CG_GL_VERTEX or CG_GL_FRAGMENT.
Function cgGLGetLatestProfile() may be used in conjunction with
cgCreateProgram() or cgCreateProgramFromFile() to ensure that the best
available vertex and fragment profiles are used for compilation. This
allows you to make your application future ready, because the Cg
programs are automatically compiled for the best profiles that are
available at runtime, even if these profiles did not exist at the time
the application was written.

Another function that allows you optimal compilation is
cgGLSetOptimalOptions(). It sets implicit compiler arguments that are
appended to the argument list passed to cgCreateProgram() or
cgCreateProgramFromFile().
void cgGLSetOptimalOptions(CGprofile profile);

OpenGL Program Execution

All programs must be loaded before they can be bound. To load a program
use cgGLLoadProgram():

void cgGLLoadProgram(CGprogram program);

Binding a program only works if its profile is enabled. This is done by
calling cgGLEnableProfile() with the program profile:

void cgGLEnableProfile(CGprofile profile);

The binding itself is done using cgGLBindProgram():

void cgGLBindProgram(CGprogram program);

Only one vertex program and one fragment program can be bound at any
given time, so binding a program implicitly unbinds any other program of
that type.

Profiles are disabled using cgGLDisableProfile():

void cgGLDisableProfile(CGprofile profile);

Some profiles may not be supported on some systems. For example, a given
profile is not supported if the OpenGL extensions it requires are not
available. You can check if a profile is supported by using
cgGLIsProfileSupported():

CGbool cgGLIsProfileSupported(CGprofile profile);

It returns CG_TRUE if profile is supported and CG_FALSE otherwise.

OpenGL Program Examples

This section presents code that illustrates how to use functions from
the OpenGL Cg interface to make Cg programs work with OpenGL. The vertex
and fragment programs below are used in “OpenGL Application”.

OpenGL Vertex Program

The following Cg code is assumed to be in a file called
VertexProgram.cg.

void VertexProgram(
    in float4 position : POSITION,
    in float4 color : COLOR0,
    in float4 texCoord : TEXCOORD0,
    out float4 positionO : POSITION,
    out float4 colorO : COLOR0,
    out float4 texCoordO : TEXCOORD0,
    const uniform float4x4 ModelViewMatrix)
{
    positionO = mul(position, ModelViewMatrix);
    colorO = color;
    texCoordO = texCoord;
}

OpenGL Fragment Program

The following Cg code is assumed to be in a file called
FragmentProgram.cg.

void FragmentProgram(
    in float4 color : COLOR0,
    in float4 texCoord : TEXCOORD0,
    out float4 colorO : COLOR0,
    const uniform sampler2D BaseTexture,
    const uniform float4 SomeColor)
{
    colorO = color * tex2D(BaseTexture, texCoord) + SomeColor;
}

OpenGL Application

This C code links the previous vertex and fragment programs to the
application.

#include <cg/cg.h>
#include <cg/cgGL.h>

float* vertexPositions;  // Initialized somewhere else
float* vertexColors;     // Initialized somewhere else
float* vertexTexCoords;  // Initialized somewhere else
GLuint texture;          // Initialized somewhere else
float constantColor[4];  // Initialized somewhere else (one float4)
CGcontext context;
CGprogram vertexProgram, fragmentProgram;
CGprofile vertexProfile, fragmentProfile;
CGparameter position, color, texCoord, baseTexture, someColor,
    modelViewMatrix;

// Called at initialization
void CgGLInit()
{
    // Create context
    context = cgCreateContext();
    // Initialize profiles and compiler options
    vertexProfile = cgGLGetLatestProfile(CG_GL_VERTEX);
    cgGLSetOptimalOptions(vertexProfile);
    fragmentProfile = cgGLGetLatestProfile(CG_GL_FRAGMENT);
    cgGLSetOptimalOptions(fragmentProfile);
    // Create the vertex program
    vertexProgram = cgCreateProgramFromFile(
        context, CG_SOURCE, "VertexProgram.cg",
        vertexProfile, "VertexProgram", 0);
    // Load the program
    cgGLLoadProgram(vertexProgram);
    // Create the fragment program
    fragmentProgram = cgCreateProgramFromFile(
        context, CG_SOURCE, "FragmentProgram.cg",
        fragmentProfile, "FragmentProgram", 0);
    // Load the program
    cgGLLoadProgram(fragmentProgram);
    // Grab some parameters.
    position = cgGetNamedParameter(vertexProgram, "position");
    color = cgGetNamedParameter(vertexProgram, "color");
    texCoord = cgGetNamedParameter(vertexProgram, "texCoord");
    modelViewMatrix = cgGetNamedParameter(vertexProgram,
        "ModelViewMatrix");
    baseTexture = cgGetNamedParameter(fragmentProgram,
        "BaseTexture");
    someColor = cgGetNamedParameter(fragmentProgram,
        "SomeColor");
    // Set parameters that don't change:
    // They can be set only once because of parameter shadowing.
    cgGLSetTextureParameter(baseTexture, texture);
    cgGLSetParameter4fv(someColor, constantColor);
}

// Called to render the scene
void Display()
{
    // Set the varying parameters
    cgGLEnableClientState(position);
    cgGLSetParameterPointer(position, 3, GL_FLOAT, 0,
        vertexPositions);
    cgGLEnableClientState(color);
    cgGLSetParameterPointer(color, 1, GL_FLOAT, 0,
        vertexColors);
    cgGLEnableClientState(texCoord);
    cgGLSetParameterPointer(texCoord, 2, GL_FLOAT, 0,
        vertexTexCoords);
    // Set the uniform parameters that change every frame
    cgGLSetStateMatrixParameter(modelViewMatrix,
        CG_GL_MODELVIEW_PROJECTION_MATRIX,
        CG_GL_MATRIX_IDENTITY);
    // Enable the profiles
    cgGLEnableProfile(vertexProfile);
    cgGLEnableProfile(fragmentProfile);
    // Bind the programs
    cgGLBindProgram(vertexProgram);
    cgGLBindProgram(fragmentProgram);
    // Enable texture
    cgGLEnableTextureParameter(baseTexture);
    // Draw scene
    // ...
    // Disable texture
    cgGLDisableTextureParameter(baseTexture);
    // Disable the profiles
    cgGLDisableProfile(vertexProfile);
    cgGLDisableProfile(fragmentProfile);
    // Disable the varying parameters
    cgGLDisableClientState(position);
    cgGLDisableClientState(color);
    cgGLDisableClientState(texCoord);
}

// Called before application shuts down
void CgShutdown()
{
    // This frees any runtime resource.
    cgDestroyContext(context);
}

OpenGL Error Reporting

Here is the list of the CGerror errors specific to the OpenGL Cg
runtime:
CG_PROGRAM_LOAD_ERROR: Returned when the program could not be loaded.
CG_PROGRAM_BIND_ERROR: Returned when the program could not be bound.
CG_PROGRAM_NOT_LOADED_ERROR: Returned when the program must be loaded
before the operation may be used.
CG_UNSUPPORTED_GL_EXTENSION_ERROR: Returned when an unsupported OpenGL
extension is required to perform the operation.

Any OpenGL Cg runtime function can generate an OpenGL error in addition
to the Cg-specific error. These errors are checked in Cg, as in any
OpenGL application, by using glGetError().

Direct3D Cg Runtime

The Direct3D Cg runtime is composed of two interfaces:
Minimal interface: This interface makes no Direct3D calls itself and
should be used when you prefer to keep the Direct3D code in the
application itself.
Expanded interface: This interface makes the Direct3D calls necessary to
provide enhanced program and parameter management and should be used
when you prefer to let the Cg runtime manage the Direct3D shaders.

Direct3D Minimal Interface

The minimal interface simply supplies convenient functions to convert
some information provided by the core runtime to information specific to
Direct3D.

Vertex Declaration

In Direct3D, you have to supply a vertex declaration that establishes a
mapping between the vertex shader input registers and the data provided
by the application as data streams. In Direct3D 9, this vertex
declaration is bound to the current state the same way the vertex shader
is (see the Direct3D 9 documentation on
IDirect3DDevice9::CreateVertexDeclaration() and
IDirect3DDevice9::SetVertexDeclaration() for a detailed explanation).
In Direct3D 8, the vertex declaration is required at the time you create
the vertex shader (for more information, see the Direct3D 8
documentation on IDirect3DDevice8::CreateVertexShader()).
A data stream is basically an array of data structures. Each of those
structures is of a particular type called the vertex format of the
stream. Here is an example of a vertex declaration for Direct3D 9:

const D3DVERTEXELEMENT9 declaration[] = {
    { 0, 0 * sizeof(float),
      D3DDECLTYPE_FLOAT3, D3DDECLMETHOD_DEFAULT,
      D3DDECLUSAGE_POSITION, 0 },  // Position
    { 0, 3 * sizeof(float),
      D3DDECLTYPE_FLOAT3, D3DDECLMETHOD_DEFAULT,
      D3DDECLUSAGE_NORMAL, 0 },    // Normal
    { 0, 8 * sizeof(float),
      D3DDECLTYPE_FLOAT2, D3DDECLMETHOD_DEFAULT,
      D3DDECLUSAGE_TEXCOORD, 0 },  // Base texture
    { 1, 0 * sizeof(float),
      D3DDECLTYPE_FLOAT3, D3DDECLMETHOD_DEFAULT,
      D3DDECLUSAGE_TEXCOORD, 1 },  // Tangent
    D3DDECL_END()
};

Here is an example of a vertex declaration for Direct3D 8:

const DWORD declaration[] = {
    D3DVSD_STREAM(0),
    D3DVSD_REG(D3DVSDE_POSITION, D3DVSDT_FLOAT3),  // Position
    D3DVSD_REG(D3DVSDE_NORMAL, D3DVSDT_FLOAT3),    // Normal
    D3DVSD_SKIP(2),  // Skip the diffuse and specular color
    D3DVSD_REG(D3DVSDE_TEXCOORD0,
        D3DVSDT_FLOAT2),                           // Base texture
    D3DVSD_STREAM(1),                              // Tangent basis stream
    D3DVSD_REG(D3DVSDE_TEXCOORD1, D3DVSDT_FLOAT3), // Tangent
    D3DVSD_END()
};

Both declarations tell the Direct3D runtime to find (1) the positions of
the vertices in stream 0 as the first three floating point values of the
vertex format, (2) the normals as the next three floating point values
in stream 0, and (3) the texture coordinates as the two floating point
values located at an offset equal to twice the size of a DWORD from the
end of the normal data in stream 0. The tangents are provided in stream
1 as a second texture coordinate set that is found as the first three
floating point values of the vertex format.
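One way to check the offsets quoted in such a declaration is to compare
them with the layout of the application's own vertex structure. The
sketch below is stand-alone C and only an illustration: Stream0Vertex is
a hypothetical structure for stream 0, uint32_t stands in for the two
DWORD-sized colors that the declarations skip, and the helper names are
not part of Cg or Direct3D.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stream-0 vertex format matching the declarations above.
   uint32_t stands in for the DWORD-sized diffuse and specular colors. */
typedef struct Stream0Vertex {
    float    position[3];
    float    normal[3];
    uint32_t diffuse;
    uint32_t specular;
    float    texCoord[2];
} Stream0Vertex;

/* Returns 1 when the structure offsets agree with the byte offsets used
   in the declaration (0, 3 and 8 floats), and 0 otherwise. */
int stream0_layout_matches(void)
{
    return offsetof(Stream0Vertex, position) == 0 * sizeof(float)
        && offsetof(Stream0Vertex, normal)   == 3 * sizeof(float)
        && offsetof(Stream0Vertex, texCoord) == 8 * sizeof(float);
}

/* Intended to be called once at startup as a sanity check. */
void check_stream0_layout(void)
{
    assert(stream0_layout_matches());
}
```

The texture coordinate offset of 8 * sizeof(float) follows from three
position floats, three normal floats, and the two skipped 32-bit colors,
assuming 4-byte floats and no padding between members.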
To get a vertex declaration from a Cg vertex program for the Direct3D 9
Cg runtime use cgD3D9GetVertexDeclaration():

CGbool cgD3D9GetVertexDeclaration(CGprogram program,
    D3DVERTEXELEMENT9 declaration[MAXD3DDECLLENGTH]);

MAXD3DDECLLENGTH is a Direct3D 9 constant that gives the maximum length
of a Direct3D 9 declaration. If no declaration can be derived from the
program, cgD3D9GetVertexDeclaration() fails and returns CG_FALSE.

To get a vertex declaration from a Cg vertex program for the Direct3D 8
Cg runtime use cgD3D8GetVertexDeclaration():

CGbool cgD3D8GetVertexDeclaration(CGprogram program,
    DWORD declaration[MAX_FVF_DECL_SIZE]);

MAX_FVF_DECL_SIZE is a Direct3D constant that gives the maximum length
of a Direct3D declaration. If no declaration can be derived from the
program, cgD3D8GetVertexDeclaration() fails and returns CG_FALSE.

The declaration returned by cgD3D9GetVertexDeclaration() or
cgD3D8GetVertexDeclaration() is for a single stream, so that for the
following program:

void main(in float4 position : POSITION,
          in float4 color : COLOR0,
          in float4 texCoord : TEXCOORD0,
          out float4 hpos : POSITION)
{ }

it is equivalent to:

const D3DVERTEXELEMENT9 declaration[] = {
    { 0, 0 * sizeof(float),
      D3DDECLTYPE_FLOAT4, D3DDECLMETHOD_DEFAULT,
      D3DDECLUSAGE_POSITION, 0 },
    { 0, 4 * sizeof(float),
      D3DDECLTYPE_FLOAT4, D3DDECLMETHOD_DEFAULT,
      D3DDECLUSAGE_COLOR, 0 },
    { 0, 8 * sizeof(float),
      D3DDECLTYPE_FLOAT4, D3DDECLMETHOD_DEFAULT,
      D3DDECLUSAGE_TEXCOORD, 0 },
    D3DDECL_END()
};

for the Direct3D 9 Cg runtime, and it is equivalent to:

const DWORD declaration[] = {
    D3DVSD_STREAM(0),
    D3DVSD_REG(D3DVSDE_POSITION, D3DVSDT_FLOAT4),
    D3DVSD_REG(D3DVSDE_DIFFUSE, D3DVSDT_FLOAT4),
    D3DVSD_REG(D3DVSDE_TEXCOORD0, D3DVSDT_FLOAT4),
    D3DVSD_END()
};

for the Direct3D 8 Cg runtime.

Usually though, you want to apply a vertex program to geometric data
that comes in multiple streams or with specific vertex formats. In this
case, the vertex declaration is based on the vertex formats rather than
the program. To see if it is compatible with the program, use
cgD3D9ValidateVertexDeclaration():

CGbool cgD3D9ValidateVertexDeclaration(CGprogram program,
    const D3DVERTEXELEMENT9* declaration);

for the Direct3D 9 Cg runtime, or cgD3D8ValidateVertexDeclaration():

CGbool cgD3D8ValidateVertexDeclaration(CGprogram program,
    const DWORD* declaration);

for the Direct3D 8 Cg runtime.

A call to cgD3D9ValidateVertexDeclaration() or
cgD3D8ValidateVertexDeclaration() returns CG_TRUE if the vertex
declaration is compatible with the program. A Direct3D 9 declaration is
compatible with the program if the declaration has an entry matching
every varying input parameter used by the program. A Direct3D 8
declaration is compatible with the program if the declaration has a
D3DVSD_REG() macro call matching every varying input parameter used by
the program. For the program

void main(float4 position : POSITION,
          float4 color : COLOR0,
          float4 texCoord : TEXCOORD0)
{ }

the following Direct3D 9 vertex declaration is valid:

const D3DVERTEXELEMENT9 declaration[] = {
    { 0, 0 * sizeof(float),
      D3DDECLTYPE_FLOAT3, D3DDECLMETHOD_DEFAULT,
      D3DDECLUSAGE_POSITION, 0 },
    { 0, 3 * sizeof(float),
      D3DDECLTYPE_D3DCOLOR, D3DDECLMETHOD_DEFAULT,
      D3DDECLUSAGE_COLOR, 0 },
    { 1, 4 * sizeof(float),
      D3DDECLTYPE_FLOAT2, D3DDECLMETHOD_DEFAULT,
      D3DDECLUSAGE_TEXCOORD, 0 },
    D3DDECL_END()
};

and the following Direct3D 8 vertex declaration is valid:

DWORD declaration[] = {
    D3DVSD_STREAM(0),
    D3DVSD_REG(D3DVSDE_POSITION, D3DVSDT_FLOAT3),
    D3DVSD_REG(D3DVSDE_DIFFUSE, D3DVSDT_D3DCOLOR),
    D3DVSD_STREAM(1),
    D3DVSD_SKIP(4),
    D3DVSD_REG(D3DVSDE_TEXCOORD0, D3DVSDT_FLOAT2),
    D3DVSD_END()
};

This is true because D3DDECLUSAGE_POSITION and D3DVSDE_POSITION match
the hardware register associated with the predefined semantic POSITION,
D3DDECLUSAGE_COLOR (with usage index 0) and D3DVSDE_DIFFUSE match the
register associated with COLOR0, and D3DDECLUSAGE_TEXCOORD (with usage
index 0) and D3DVSDE_TEXCOORD0 match the register associated with
TEXCOORD0.

The above declarations can also be written the following way using
cgD3D9ResourceToDeclUsage() or cgD3D8ResourceToInputRegister():

const D3DVERTEXELEMENT9 declaration[] = {
    { 0, 0 * sizeof(float),
      D3DDECLTYPE_FLOAT3, D3DDECLMETHOD_DEFAULT,
      cgD3D9ResourceToDeclUsage(CG_POSITION), 0 },
    { 0, 3 * sizeof(float),
      D3DDECLTYPE_D3DCOLOR, D3DDECLMETHOD_DEFAULT,
      cgD3D9ResourceToDeclUsage(CG_COLOR0), 0 },
    { 1, 4 * sizeof(float),
      D3DDECLTYPE_FLOAT2, D3DDECLMETHOD_DEFAULT,
      cgD3D9ResourceToDeclUsage(CG_TEXCOORD0), 0 },
    D3DDECL_END()
};

DWORD declaration[] = {
    D3DVSD_STREAM(0),
    D3DVSD_REG(cgD3D8ResourceToInputRegister(CG_POSITION),
        D3DVSDT_FLOAT3),
    D3DVSD_REG(cgD3D8ResourceToInputRegister(CG_COLOR0),
        D3DVSDT_D3DCOLOR),
    D3DVSD_STREAM(1),
    D3DVSD_SKIP(4),
    D3DVSD_REG(cgD3D8ResourceToInputRegister(CG_TEXCOORD0),
        D3DVSDT_FLOAT2),
    D3DVSD_END()
};

If it is possible to do so, the functions cgD3D9ResourceToDeclUsage()
and cgD3D8ResourceToInputRegister() convert a CGresource enumerated type
into a Direct3D vertex shader input register:

BYTE cgD3D9ResourceToDeclUsage(CGresource resource);
DWORD cgD3D8ResourceToInputRegister(CGresource resource);

If the resource is not a vertex shader input resource, the call to
cgD3D9ResourceToDeclUsage() returns CGD3D9_INVALID_REG and the call to
cgD3D8ResourceToInputRegister() returns CGD3D8_INVALID_REG.

To write the vertex declarations described above based on the program
parameters, which eliminates the reference to any semantic, use
cgD3D9ResourceToDeclUsage() or cgD3D8ResourceToInputRegister():

CGparameter position =
    cgGetNamedParameter(program, "position");
CGparameter color =
    cgGetNamedParameter(program, "color");
CGparameter texCoord =
    cgGetNamedParameter(program, "texCoord");

const D3DVERTEXELEMENT9 declaration[] = {
    { 0, 0 * sizeof(float),
      D3DDECLTYPE_FLOAT3, D3DDECLMETHOD_DEFAULT,
      cgD3D9ResourceToDeclUsage(
          cgGetParameterResource(position)),
      cgGetParameterResourceIndex(position) },
    { 0, 3 * sizeof(float),
      D3DDECLTYPE_D3DCOLOR, D3DDECLMETHOD_DEFAULT,
      cgD3D9ResourceToDeclUsage(cgGetParameterResource(color)),
      cgGetParameterResourceIndex(color) },
    { 1, 4 * sizeof(float),
      D3DDECLTYPE_FLOAT2, D3DDECLMETHOD_DEFAULT,
      cgD3D9ResourceToDeclUsage(
          cgGetParameterResource(texCoord)),
      cgGetParameterResourceIndex(texCoord) },
    D3DDECL_END()
};

DWORD declaration[] = {
    D3DVSD_STREAM(0),
    D3DVSD_REG(cgD3D8ResourceToInputRegister(
        cgGetParameterResource(position)), D3DVSDT_FLOAT3),
    D3DVSD_REG(cgD3D8ResourceToInputRegister(
        cgGetParameterResource(color)), D3DVSDT_D3DCOLOR),
    D3DVSD_STREAM(1),
    D3DVSD_SKIP(4),
    D3DVSD_REG(cgD3D8ResourceToInputRegister(
        cgGetParameterResource(texCoord)), D3DVSDT_FLOAT2),
    D3DVSD_END()
};

The size specified as the second argument of the D3DVSD_REG() macro call
of a Direct3D 8 declaration does not need to match the size of the
corresponding parameter for the vertex declaration to be valid. Those
sizes are specified to describe how the data is laid out in the streams,
not to perform any type checking with the shader code. The data referred
to by a D3DVSD_REG() macro call is expanded to the four floating point
values of the corresponding hardware register, and the missing values
are set to 0 for x, y, and z, and to 1 for w.

Minimal Interface Type Retrieval

Use cgD3D9TypeToSize() to retrieve the size of a CGtype enumerated type
in terms of floating point numbers:

DWORD cgD3D9TypeToSize(CGtype type);

More precisely, it is the number of floating point values required to
store a parameter of type type. This function does not apply to some
types, like the sampler types, in which case it returns zero. It is
useful because applications can determine how many floating point values
they have to provide to set the value of a given parameter.

Minimal Interface Program Examples

In this section we provide some code samples that illustrate how and
when to use functions from the minimal interface to make Cg programs
work with Direct3D. To enhance clarity, the examples do very little
error checking, but a production application should check the return
values of all Cg functions. The vertex and fragment programs below are
referenced in “Direct3D 9 Application” and “Direct3D 8 Application”.

Vertex Program

The following Cg code is assumed to be in a file called
VertexProgram.cg.

void VertexProgram(
    in float4 position : POSITION,
    in float4 color : COLOR0,
    in float4 texCoord : TEXCOORD0,
    out float4 positionO : POSITION,
    out float4 colorO : COLOR0,
    out float4 texCoordO : TEXCOORD0,
    const uniform float4x4 ModelViewMatrix)
{
    positionO = mul(position, ModelViewMatrix);
    colorO = color;
    texCoordO = texCoord;
}

Fragment Program

The following Cg code is assumed to be in a file called
FragmentProgram.cg.

void FragmentProgram(
    in float4 color : COLOR0,
    in float4 texCoord : TEXCOORD0,
    out float4 colorO : COLOR0,
    const uniform sampler2D BaseTexture,
    const uniform float4 SomeColor)
{
    colorO = color * tex2D(BaseTexture, texCoord) + SomeColor;
}

Direct3D 9 Application

The following C++ code links the previous vertex and fragment programs
to the Direct3D 9 application.

#include <assert.h>   // assert
#include <string.h>   // strlen
#include <atlbase.h>  // CComPtr
#include <cg/cg.h>
#include <cg/cgD3D9.h>

IDirect3DDevice9* device;    // Initialized somewhere else
IDirect3DTexture9* texture;  // Initialized somewhere else
D3DXMATRIX matrix;           // Initialized somewhere else
D3DXCOLOR constantColor;     // Initialized somewhere else
CGcontext context;
CGprogram vertexProgram, fragmentProgram;
IDirect3DVertexDeclaration9* vertexDeclaration;
IDirect3DVertexShader9* vertexShader;
IDirect3DPixelShader9* pixelShader;
CGparameter baseTexture, someColor, modelViewMatrix;

// Called at application startup
void OnStartup()
{
    // Create context
    context = cgCreateContext();
}
// Called whenever the Direct3D device needs to be created
void OnCreateDevice()
{
    // Create the vertex shader
    vertexProgram = cgCreateProgramFromFile(context, CG_SOURCE,
        "VertexProgram.cg", CG_PROFILE_VS_2_0, "VertexProgram", 0);
    CComPtr<ID3DXBuffer> byteCode;
    const char* progSrc = cgGetProgramString(vertexProgram,
        CG_COMPILED_PROGRAM);
    D3DXAssembleShader(progSrc, strlen(progSrc), 0, 0, 0,
        &byteCode, 0);
    // If your program uses explicit binding semantics (like
    // this one), you can create a vertex declaration
    // using those semantics.
    const D3DVERTEXELEMENT9 declaration[] = {
        { 0, 0 * sizeof(float),
          D3DDECLTYPE_FLOAT3, D3DDECLMETHOD_DEFAULT,
          D3DDECLUSAGE_POSITION, 0 },
        { 0, 3 * sizeof(float),
          D3DDECLTYPE_D3DCOLOR, D3DDECLMETHOD_DEFAULT,
          D3DDECLUSAGE_COLOR, 0 },
        { 0, 4 * sizeof(float),
          D3DDECLTYPE_FLOAT2, D3DDECLMETHOD_DEFAULT,
          D3DDECLUSAGE_TEXCOORD, 0 },
        D3DDECL_END()
    };
    // Make sure the resulting declaration is compatible with
    // the shader. This is really just a sanity check.
    assert(cgD3D9ValidateVertexDeclaration(vertexProgram,
        declaration));
    device->CreateVertexDeclaration(
        declaration, &vertexDeclaration);
    // GetBufferPointer() returns void*, so cast to the token stream type.
    device->CreateVertexShader(
        (const DWORD*)byteCode->GetBufferPointer(), &vertexShader);
    // Create the pixel shader.
    fragmentProgram = cgCreateProgramFromFile(context,
        CG_SOURCE, "FragmentProgram.cg",
        CG_PROFILE_PS_2_0, "FragmentProgram", 0);
    {
        CComPtr<ID3DXBuffer> byteCode;
        const char* progSrc = cgGetProgramString(fragmentProgram,
            CG_COMPILED_PROGRAM);
        D3DXAssembleShader(progSrc, strlen(progSrc), 0, 0, 0,
            &byteCode, 0);
        device->CreatePixelShader(
            (const DWORD*)byteCode->GetBufferPointer(), &pixelShader);
    }
    // Grab some parameters.
    modelViewMatrix = cgGetNamedParameter(vertexProgram,
        "ModelViewMatrix");
    baseTexture = cgGetNamedParameter(fragmentProgram,
        "BaseTexture");
    someColor = cgGetNamedParameter(fragmentProgram,
        "SomeColor");
    // Sanity check that parameters have the expected size
    assert(cgD3D9TypeToSize(cgGetParameterType(
        modelViewMatrix)) == 16);
    assert(cgD3D9TypeToSize(cgGetParameterType(someColor))
        == 4);
}

// Called to render the scene
void OnRender()
{
    // Get the Direct3D resource locations for parameters
    // This can be done earlier and saved
    DWORD modelViewMatrixRegister =
        cgGetParameterResourceIndex(modelViewMatrix);
    DWORD baseTextureUnit =
        cgGetParameterResourceIndex(baseTexture);
    DWORD someColorRegister =
        cgGetParameterResourceIndex(someColor);
    // Set the Direct3D state.
    // The constant setters expect const float*, hence the casts.
    device->SetVertexShaderConstantF(modelViewMatrixRegister,
        (const float*)&matrix, 4);
    device->SetPixelShaderConstantF(someColorRegister,
        (const float*)&constantColor, 1);
    device->SetVertexDeclaration(vertexDeclaration);
    device->SetTexture(baseTextureUnit, texture);
    device->SetVertexShader(vertexShader);
    device->SetPixelShader(pixelShader);
    // Draw scene.
    // ...
}
// Called before the device changes or is destroyed
void OnDestroyDevice() {
    vertexShader->Release();
    pixelShader->Release();
    vertexDeclaration->Release();
}

// Called before application shuts down
void OnShutdown() {
    // This frees any core runtime resources.
    // The minimal interface has no dynamic storage to free.
    cgDestroyContext(context);
}

Direct3D 8 Application

The following C++ code links the previous vertex and fragment programs
to the Direct3D 8 application.
#include <assert.h>   // assert
#include <string.h>   // strlen
#include <atlbase.h>  // CComPtr
#include <cg/cg.h>
#include <cg/cgD3D8.h>

IDirect3DDevice8* device;    // Initialized somewhere else
IDirect3DTexture8* texture;  // Initialized somewhere else
D3DXMATRIX matrix;           // Initialized somewhere else
D3DXCOLOR constantColor;     // Initialized somewhere else
CGcontext context;
CGprogram vertexProgram, fragmentProgram;
DWORD vertexShader, pixelShader;
CGparameter baseTexture, someColor, modelViewMatrix;

// Called at application startup
void OnStartup()
{
    // Create context
    context = cgCreateContext();
}

// Called whenever the Direct3D device needs to be created
void OnCreateDevice()
{
    // Create the vertex shader
    vertexProgram = cgCreateProgramFromFile(context, CG_SOURCE,
        "VertexProgram.cg", CG_PROFILE_VS_1_1, "VertexProgram", 0);
    CComPtr<ID3DXBuffer> byteCode;
    const char* progSrc = cgGetProgramString(vertexProgram,
CG_COMPILED_PROGRAM);
// Normally, you also grab the constants and prepend them
// to your vertex declaration. Not shown here for brevity.
D3DXAssembleShader(progSrc, strlen(progSrc), 0, 0, 0,
&byteCode, 0);
// If your program uses explicit binding semantics (like
// this one), you can create a vertex declaration
// using those semantics.
DWORD declaration[] = {
D3DVSD_STREAM(0),
D3DVSD_REG(D3DVSDE_POSITION, D3DVSDT_FLOAT3),
D3DVSD_REG(D3DVSDE_DIFFUSE, D3DVSDT_D3DCOLOR),
D3DVSD_REG(D3DVSDE_TEXCOORD0, D3DVSDT_FLOAT2),
D3DVSD_END()
}
// Make sure the resulting declaration is compatible with
// the shader. This is really just a sanity check.
assert(cgD3D8ValidateVertexDeclaration(vertexProgram,
declaration));
// Create the shader handle using the declaration.
device->CreateVertexShader(declaration,
byteCode->GetBufferPointer(), &vertexShader, 0);
// Create the pixel shader.
fragmentProgram = cgCreateProgramFromFile(context,
CG_SOURCE, "FragmentProgram.cg",
CG_PROFILE_PS_1_1, "FragmentProgram", 0);
{
CComPtr<ID3DXBuffer> byteCode;
const char* progSrc = cgGetProgramString(fragmentProgram,
CG_COMPILED_PROGRAM);
D3DXAssembleShader(progSrc, strlen(progSrc), 0, 0, 0,
&byteCode, 0);
device->CreatePixelShader(byteCode->GetBufferPointer(),
&pixelShader);
}
// Grab some parameters.
modelViewMatrix = cgGetNamedParameter(vertexProgram,
"ModelViewMatrix");
baseTexture = cgGetNamedParameter(fragmentProgram,
"BaseTexture");
someColor = cgGetNamedParameter(fragmentProgram,
"SomeColor");
// Sanity check that parameters have the expected size
assert(cgD3D8TypeToSize(cgGetParameterType(
modelViewMatrix)) == 16);
assert(cgD3D8TypeToSize(cgGetParameterType(someColor))
== 4);
}
// Called to render the scene
void OnRender()
{
// Get the Direct3D resource locations for parameters
// This can be done earlier and saved
DWORD modelViewMatrixRegister =
cgGetParameterResourceIndex(modelViewMatrix);
DWORD baseTextureUnit =
cgGetParameterResourceIndex(baseTexture);
DWORD someColorRegister =
cgGetParameterResourceIndex(someColor);
// Set the Direct3D state.
device->SetVertexShaderConstant(modelViewMatrixRegister,
&matrix, 4);
device->SetPixelShaderConstant(someColorRegister,
&constantColor, 1);
device->SetTexture(baseTextureUnit, texture);
device->SetVertexShader(vertexShader);
device->SetPixelShader(pixelShader);
// Draw scene.
// ...
}
// Called before the device changes or is destroyed
void OnDestroyDevice() {
device->DeleteVertexShader(vertexShader);
device->DeletePixelShader(pixelShader);
}
// Called before application shuts down
void OnShutdown() {
// This frees any core runtime resources.
// The minimal interface has no dynamic storage to free.
cgDestroyContext(context);
}
Direct3D Expanded Interface
If you use the expanded interface for a program, then to avoid
inconsistencies it is advisable to stick with the expanded interface for
all shader-related operations that can be performed through its
functions, such as shader setting, shader activation, and parameter
setting, including setting texture stage states.
Setting the Direct3D Device
The expanded interface encapsulates more functionality than the minimal
interface to ease program and parameter management. It does this by
making the appropriate Direct3D calls at the appropriate times. Because
some of these calls require the Direct3D device, it must be communicated
to the Cg runtime:
HRESULT cgD3D9SetDevice(IDirect3DDevice9* device);
You can get the Direct3D device currently associated with the runtime
using cgD3D9GetDevice():
IDirect3DDevice9* cgD3D9GetDevice();
When cgD3D9SetDevice() is called with zero as an input, all Direct3D
resources used by the expanded interface are released. Since a Direct3D
device is destroyed only when all references to it are removed, the
application should call cgD3D9SetDevice() with zero as an input when it is
done with a Direct3D device so that it gets destroyed when the application
shuts down. Otherwise, Direct3D does not shut down properly and reports
memory leaks to the debug console.
Note that calling cgD3D9SetDevice() with zero as an input does not affect
the Cg core runtime resources in any way: all the related core runtime
handles (of type CGprogram, CGparameter, and so on) remain valid.
If you call cgD3D9SetDevice() a second time with a different device, all
programs managed by the old device are rebuilt using the new device.
Responding to Lost Direct3D Devices
The expanded interface may hold references to Direct3D resources that need
to be recreated in response to a lost device. In particular, certain sampler
parameters might need to be released before a Direct3D device can be reset
from a lost state. The expanded interface is holding a reference to a
texture that needs to be reset in response to a lost device if both of the
following are true for a texture:
- It was created in the D3DPOOL_DEFAULT pool.
- It was bound to a sampler parameter (using cgD3D9SetTexture()) of a
  program for which parameter shadowing is enabled.
In this case, the parameter must be set to zero (using cgD3D9SetTexture())
to remove the expanded interface’s reference to that texture so it can be
destroyed and the Direct3D device can be reset from a lost state. Later,
after resetting the Direct3D device and recreating the texture, it needs to
be re-bound to the sampler parameter. For example,
IDirect3DDevice9* device; // Initialized elsewhere
IDirect3DTexture9* myDefaultPoolTexture;
CGprogram program;
CGparameter mySampler; // Shared by the functions below
void PrepareForReset();
void OnReset();
void OneTimeLoadScene()
{
    // Load the program with cgD3D9LoadProgram and
    // enable parameter shadowing
    /* ... */
    cgD3D9LoadProgram(program, TRUE, 0);
    /* ... */
    // Bind sampler parameter
    mySampler = cgGetNamedParameter(program, "MySampler");
    cgD3D9SetTexture(mySampler, myDefaultPoolTexture);
}
void OnLostDevice()
{
    // First release all necessary resources
    PrepareForReset();
    // Next actually reset the Direct3D device
    device->Reset( /* ... */ );
    // Finally recreate all those resources
    OnReset();
}
void PrepareForReset()
{
    /* ... */
    // Release expanded interface reference
    cgD3D9SetTexture(mySampler, 0);
    // Release local reference
    // and any other references to the texture
    myDefaultPoolTexture->Release();
    /* ... */
}
void OnReset()
{
    // Recreate myDefaultPoolTexture in D3DPOOL_DEFAULT
    /* ... */
    // Since the texture was just recreated,
    // it must be re-bound to the parameter
    cgD3D9SetTexture(mySampler, myDefaultPoolTexture);
    /* ... */
}
See the Direct3D documentation for a full explanation of lost devices and
how to properly handle them.
Setting Expanded Interface Parameters
This section discusses setting the various types of parameters of the
expanded interface, including uniform scalar, uniform vector, uniform
matrix, uniform arrays of the three previous types, and sampler.
Setting Uniform Scalar, Vector, and Matrix Parameters
The function cgD3D9SetUniform() sets floating-point parameters like
float3 and float4x3:
HRESULT cgD3D9SetUniform(CGparameter parameter,
    const void* value);
The amount of data required depends on the type of parameter, but is
always specified as an array of one or more floating-point values. The type
is void* so a compatible user-defined structure can be passed in without
typecasting. Here is some code illustrating the use of cgD3D9SetUniform()
for setting a vectorParam of type float3, matrixParam of type float2x3,
and arrayParam of type float2x2[3]:
D3DXVECTOR3 vectorData(1,2,3);
float matrixData[2][3] = {{1, 2, 3}, {4, 5, 6}};
float arrayData[3][2][2] =
    {{{1, 2}, {3, 4}}, {{5, 6}, {7, 8}}, {{9, 10}, {11, 12}}};
cgD3D9SetUniform(vectorParam, &vectorData);
cgD3D9SetUniform(matrixParam, matrixData);
cgD3D9SetUniform(arrayParam, arrayData);
As mentioned previously, cgD3D9TypeToSize() can be used to determine
how many values are required for setting a parameter of a particular type.
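To make the value counts concrete, the following sketch computes how many floats each of the three example parameters consumes. It is a model in plain C and does not call the Cg runtime: float_count() is a hypothetical helper, and the assumption that values are tightly packed in row-major order is inferred from the example buffers rather than taken from the runtime itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of the Cg runtime): number of float values
   covered by `elements` parameters of shape rows x cols. A scalar is 1x1
   and a vector such as float3 is 1x3. */
static size_t float_count(size_t rows, size_t cols, size_t elements)
{
    assert(rows >= 1 && rows <= 4 && cols >= 1 && cols <= 4);
    return rows * cols * elements;
}

/* Buffers shaped like the ones passed to cgD3D9SetUniform() in the text. */
static const float vector_data[3] = {1, 2, 3};
static const float matrix_data[2][3] = {{1, 2, 3}, {4, 5, 6}};
static const float array_data[3][2][2] =
    {{{1, 2}, {3, 4}}, {{5, 6}, {7, 8}}, {{9, 10}, {11, 12}}};
```

Under these assumptions, float3 needs 3 values, float2x3 needs 6, and float2x2[3] needs 12, which matches the sizes of the three buffers. In an application you would query cgD3D9TypeToSize() on the parameter's type rather than hard-code the shape.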
For convenience, there is also a function to set a parameter from a 4x4
matrix of type D3DMATRIX:
HRESULT cgD3D9SetUniformMatrix(CGparameter parameter,
    const D3DMATRIX* matrix);
The upper-left portion of the matrix is extracted to fit the size of the
input parameter, so that you could set matrixParam this way as well:
D3DXMATRIX matrix(
    1, 1, 1, 0,
    1, 1, 1, 0,
    0, 0, 0, 0,
    0, 0, 0, 0
);
cgD3D9SetUniformMatrix(matrixParam, &matrix);
In the example above, every element of matrixParam is set to 1.
Setting Uniform Arrays of Scalar, Vector, and Matrix Parameters
To set an array parameter, use cgD3D9SetUniformArray():
HRESULT cgD3D9SetUniformArray(CGparameter parameter,
    DWORD startIndex, DWORD numberOfElements,
    const void* array);
The parameters startIndex and numberOfElements specify which elements
of the array parameter are set: those are the numberOfElements elements
with indices ranging from startIndex to startIndex + numberOfElements - 1.
It is assumed that array contains enough values to set all those elements.
As with cgD3D9SetUniform(), cgD3D9TypeToSize() can be used to determine
how many values are required, and the type is void* so a compatible
user-defined structure can be passed in without typecasting.
There is a convenience function, the array counterpart of
cgD3D9SetUniformMatrix():
HRESULT cgD3D9SetUniformMatrixArray(CGparameter parameter,
    DWORD startIndex, DWORD numberOfElements,
    const D3DMATRIX* matrices);
The parameters startIndex and numberOfElements have the same
meanings as for cgD3D9SetUniformArray().
The upper-left portion of each matrix of the array matrices is extracted to
fit the size of the element of the array parameter parameter. The array
matrices is assumed to have numberOfElements elements.
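The upper-left extraction and the startIndex/numberOfElements range can be pictured with a small model in plain C. This is only a sketch of the behavior described above, not the runtime's implementation: extract_upper_left() and range_is_valid() are hypothetical names, and the matrix is assumed to be stored row-major, as D3DMATRIX is. The manual lists CGD3D9ERR_OUTOFRANGE as the error for an out-of-range array range, and the second helper models that check.

```c
#include <assert.h>
#include <stddef.h>

/* Copy the upper-left rows x cols block of a row-major 4x4 matrix into dst,
   which must have room for rows * cols floats. */
static void extract_upper_left(const float src[4][4],
                               size_t rows, size_t cols, float *dst)
{
    assert(rows <= 4 && cols <= 4);
    for (size_t r = 0; r < rows; ++r)
        for (size_t c = 0; c < cols; ++c)
            dst[r * cols + c] = src[r][c];
}

/* Return 1 when elements start .. start + count - 1 all lie inside an
   array parameter that has `length` elements, 0 otherwise. */
static int range_is_valid(size_t start, size_t count, size_t length)
{
    return start <= length && count <= length - start;
}
```

Applied to the 4x4 matrix from the cgD3D9SetUniformMatrix() example with a float2x3 target, the extracted block holds six values that are all 1, which is the result the text states.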
Setting Sampler Parameters
You assign a Direct3D texture to a sampler parameter using
cgD3D9SetTexture():
HRESULT cgD3D9SetTexture(CGparameter parameter,
    IDirect3DBaseTexture9* texture);
To set the sampler state in the Direct3D 9 Cg runtime, use
cgD3D9SetSamplerState():
HRESULT cgD3D9SetSamplerState(CGparameter parameter,
    D3DSAMPLERSTATETYPE type, DWORD value);
Parameter type is any of the D3DSAMPLERSTATETYPE enumerants and
parameter value is a value appropriate for the corresponding type. Here is
an example of how to use this function:
cgD3D9SetSamplerState(parameter, D3DSAMP_MAGFILTER,
    D3DTEXF_LINEAR);
To set the texture stage state in the Direct3D 8 Cg runtime, use
cgD3D8SetTextureStageState():
HRESULT cgD3D8SetTextureStageState(CGparameter parameter,
    D3DTEXTURESTAGESTATETYPE type, DWORD value);
Parameter type must be one of the following values:
D3DTSS_ADDRESSU       D3DTSS_ADDRESSV
D3DTSS_ADDRESSW       D3DTSS_BORDERCOLOR
D3DTSS_MAGFILTER      D3DTSS_MINFILTER
D3DTSS_MIPFILTER      D3DTSS_MIPMAPLODBIAS
D3DTSS_MAXMIPLEVEL    D3DTSS_MAXANISOTROPY
Parameter value is a value appropriate for the corresponding type. Here is
an example of how to use this function:
cgD3D8SetTextureStageState(parameter, D3DTSS_MAGFILTER,
    D3DTEXF_LINEAR);
The texture wrap mode is set using cgD3D9SetTextureWrapMode():
HRESULT cgD3D9SetTextureWrapMode(CGparameter parameter,
    DWORD value);
The input value is either zero or a combination of D3DWRAP_U, D3DWRAP_V,
and D3DWRAP_W. Here is an example of how to use this function:
cgD3D9SetTextureWrapMode(parameter, D3DWRAP_U | D3DWRAP_V);
Parameter Shadowing
Parameter shadowing can be enabled or disabled on a per-program basis:
- When loading the program (see “Expanded Interface Program
  Execution” on page 103)
- At any time using cgD3D9EnableParameterShadowing():
  HRESULT cgD3D9EnableParameterShadowing(
      CGprogram program, CGbool enable);
  for which enable should be set to CG_TRUE to enable parameter
  shadowing and to CG_FALSE to disable it.
To know if parameter shadowing is enabled for a given program, use:
CGbool cgD3D9IsParameterShadowingEnabled(CGprogram program);
This function returns CG_TRUE if parameter shadowing is enabled for
program.
Expanded Interface Program Execution
To load a program in Direct3D 9 use cgD3D9LoadProgram():
HRESULT cgD3D9LoadProgram(CGprogram program,
    CGbool parameterShadowingEnabled,
    DWORD assembleFlags);
This function assembles the result of the compilation of program using
D3DXAssembleShader() with assembleFlags as the D3DXASM flags.
Depending on the program’s profile, it then either uses
IDirect3DDevice9::CreateVertexShader() to create a Direct3D 9 vertex
shader, or uses IDirect3DDevice9::CreatePixelShader() to create a
Direct3D 9 pixel shader.
Here is a typical use of the function:
HRESULT hresult = cgD3D9LoadProgram(vertexProgram, TRUE,
    D3DXASM_DEBUG);
hresult = cgD3D9LoadProgram(fragmentProgram, TRUE, 0);
To load a program in Direct3D 8 use cgD3D8LoadProgram():
HRESULT cgD3D8LoadProgram(CGprogram program,
    BOOL parameterShadowingEnabled, DWORD assembleFlags,
    DWORD vertexShaderUsage, const DWORD* declaration);
This function assembles the result of the compilation of program using
D3DXAssembleShader() with assembleFlags as the D3DXASM flags.
Depending on the program’s profile, it then either uses
IDirect3DDevice8::CreateVertexShader() to create a Direct3D vertex
shader with declaration as the vertex declaration and vertexShaderUsage
as the usage control, or uses IDirect3DDevice8::CreatePixelShader() to
create a Direct3D pixel shader.
The value of parameterShadowingEnabled should be set to TRUE to enable
parameter shadowing for the program. This behavior can be changed after
the program is created by calling cgD3D9EnableParameterShadowing().
Here is a typical use of the function:
HRESULT hresult = cgD3D8LoadProgram(vertexProgram, TRUE,
    D3DXASM_DEBUG, D3DUSAGE_SOFTWAREVERTEXPROCESSING,
    declaration);
hresult = cgD3D8LoadProgram(fragmentProgram, TRUE,
    0, 0, 0);
If you want to apply the same vertex program to several sets of geometric
data, each having a different layout, you need to load the program with
different vertex declarations in Direct3D 8. To do so, you need to make a
duplicate of the program, using cgCopyProgram(), for each of these
declarations. Here is a code sample illustrating this operation:
CGprogram program1, program2;
program1 = cgCreateProgramFromFile(context, CG_SOURCE,
    "VertexProgram.cg", CG_PROFILE_VS_1_1, 0, 0);
const DWORD declaration1 =
    cgD3D8GetVertexDeclaration(program1);
cgD3D8LoadProgram(program1, TRUE, 0, 0, declaration1);
program2 = cgCopyProgram(program1);
const DWORD declaration2[] = {
    //... Custom declaration ...
};
if (cgD3D8ValidateVertexDeclaration(program2, declaration2))
    cgD3D8LoadProgram(program2, TRUE, 0, 0, declaration2);
Only the loading functions differ between Direct3D 9 and Direct3D 8; the
unloading and binding functions are the same.
To release the Direct3D resources allocated by cgD3D9LoadProgram(), such
as the Direct3D shader object and any shadowed parameter, use
cgD3D9UnloadProgram():
HRESULT cgD3D9UnloadProgram(CGprogram program);
Note that cgD3D9UnloadProgram() does not free any core runtime resources,
such as program and any of its parameter handles. On the other hand,
destroying a program with cgDestroyProgram() or cgDestroyContext()
releases any Direct3D resources by indirectly calling
cgD3D9UnloadProgram().
Function cgD3D9IsProgramLoaded() returns CG_TRUE if a program is
loaded:
CGbool cgD3D9IsProgramLoaded(CGprogram program);
All programs must be loaded before they can be bound. Binding a program
is done by calling cgD3D9BindProgram():
HRESULT cgD3D9BindProgram(CGprogram program);
This function basically activates the Direct3D shader corresponding to
program by calling IDirect3DDevice9::SetVertexShader() or
IDirect3DDevice9::SetPixelShader() depending on the program’s
profile. If parameter shadowing is enabled for program, it also sets all the
shadowed parameters and their associated Direct3D states (such as texture
stage states for the sampler parameters). No value or state tracking is
performed by the runtime, so this setting is done regardless of what the
current values of these parameters or of their states are. If a shadowed
parameter has not been set by the time cgD3D9BindProgram() is called, no
Direct3D call of any sort is issued for this parameter.
Only one vertex program and one fragment program can be bound at any
given time, so binding a program of a given type implicitly unbinds any
other program of the same type.
Expanded Interface Profile Support
Two convenient functions are provided that give the highest vertex and
pixel shader versions supported by the device:
CGprofile cgD3D9GetLatestVertexProfile();
CGprofile cgD3D9GetLatestPixelProfile();
This allows you to make your application future-ready, because the Cg
programs are automatically compiled for the best profiles that are available
at runtime, even if these profiles did not exist at the time the application
was written. Another function that allows you optimal compilation is
cgD3D9GetOptimalOptions(). It returns a string representing the optimal
set of compiler options for a given profile:
char const* cgD3D9GetOptimalOptions(CGprofile profile);
This string is meant to be used as part of the argument parameter to
cgCreateProgram(). It does not need to be destroyed by the application.
However, its content could change if cgD3D9GetOptimalOptions() is called
again for the same profile but for a different Direct3D device.
Expanded Interface Program Examples
In this section we provide programs that illustrate how and when to use
functions from the expanded interface to make Cg programs work with
Direct3D. For the sake of clarity, the examples do very little error
checking, but a production application should check the return values of
all Cg functions. The vertex and fragment programs that follow are
referenced in “Expanded Interface Direct3D 9 Application” on page 106 and
“Expanded Interface Direct3D 8 Application” on page 109.
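The examples that follow rely on parameter shadowing, so it may help to picture what cgD3D9BindProgram() is described as doing with shadowed values. The following toy model in plain C is a sketch under stated assumptions, not the runtime's code: ShadowedParam, param_set(), and program_bind() are hypothetical names, and the device array merely stands in for Direct3D constant registers.

```c
#include <assert.h>
#include <string.h>

#define MAX_VALUES 16

/* Toy model of one shadowed parameter; none of these names come from Cg. */
typedef struct {
    float shadow[MAX_VALUES]; /* copy of the last value the application set */
    int   count;              /* number of valid floats in shadow */
    int   is_set;             /* 0 until the application sets a value */
    float device[MAX_VALUES]; /* stands in for Direct3D constant registers */
    int   uploads;            /* how many times the value was sent */
} ShadowedParam;

/* Setting a parameter only records the value in the shadow copy. */
static void param_set(ShadowedParam *p, const float *values, int count)
{
    assert(count >= 1 && count <= MAX_VALUES);
    memcpy(p->shadow, values, (size_t)count * sizeof(float));
    p->count = count;
    p->is_set = 1;
}

/* Binding re-sends every shadowed value that has been set, without
   comparing it against what the device already holds. Parameters that
   were never set are skipped, so no call is issued for them. */
static void program_bind(ShadowedParam *params, int n)
{
    for (int i = 0; i < n; ++i) {
        if (!params[i].is_set)
            continue;
        memcpy(params[i].device, params[i].shadow,
               (size_t)params[i].count * sizeof(float));
        params[i].uploads++;
    }
}
```

In this model, binding twice uploads a set parameter twice, which mirrors the statement that no value tracking is performed. It also shows why a constant such as SomeColor can be set once at load time and still reach the device on every bind.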
Expanded Interface Vertex Program
The following Cg code is assumed to be in a file called VertexProgram.cg.
void VertexProgram(
    in float4 position : POSITION,
    in float4 color : COLOR0,
    in float4 texCoord : TEXCOORD0,
    out float4 positionO : POSITION,
    out float4 colorO : COLOR0,
    out float4 texCoordO : TEXCOORD0,
    const uniform float4x4 ModelViewMatrix)
{
    positionO = mul(position, ModelViewMatrix);
    colorO = color;
    texCoordO = texCoord;
}
Expanded Interface Fragment Program
The following Cg code is assumed to be in a file called FragmentProgram.cg.
void FragmentProgram(
    in float4 color : COLOR0,
    in float4 texCoord : TEXCOORD0,
    out float4 colorO : COLOR0,
    const uniform sampler2D BaseTexture,
    const uniform float4 SomeColor)
{
    colorO = color * tex2D(BaseTexture, texCoord) + SomeColor;
}
Expanded Interface Direct3D 9 Application
The following C code links the previous vertex and fragment programs to
the Direct3D 9 application.
#include <cg/cg.h>
#include <cg/cgD3D9.h>
IDirect3DDevice9* device; // Initialized somewhere else
IDirect3DTexture9* texture; // Initialized somewhere else
D3DXCOLOR constantColor; // Initialized somewhere else
CGcontext context;
IDirect3DVertexDeclaration9* vertexDeclaration;
CGprogram vertexProgram, fragmentProgram;
CGparameter baseTexture, someColor, modelViewMatrix;
// Called at application startup
void OnStartup()
{
// Create context
context = cgCreateContext();
}
// Called whenever the Direct3D device needs to be created
void OnCreateDevice()
{
// Pass the Direct3D device to the expanded interface.
cgD3D9SetDevice(device);
// Determine the best profiles to use
CGprofile vertexProfile = cgD3D9GetLatestVertexProfile();
CGprofile pixelProfile = cgD3D9GetLatestPixelProfile();
// Grab the optimal options for each profile.
const char* vertexOptions[] = {
cgD3D9GetOptimalOptions(vertexProfile), 0 };
const char* pixelOptions[] = {
cgD3D9GetOptimalOptions(pixelProfile), 0 };
// Create the vertex shader.
vertexProgram = cgCreateProgramFromFile(
context, CG_SOURCE, "VertexProgram.cg",
vertexProfile, "VertexProgram", vertexOptions);
// If your program uses explicit binding semantics, you
// can create a vertex declaration using those semantics.
const D3DVERTEXELEMENT9 declaration[] = {
{ 0, 0 * sizeof(float),
D3DDECLTYPE_FLOAT3, D3DDECLMETHOD_DEFAULT,
D3DDECLUSAGE_POSITION, 0 },
{ 0, 3 * sizeof(float),
D3DDECLTYPE_D3DCOLOR, D3DDECLMETHOD_DEFAULT,
D3DDECLUSAGE_COLOR, 0 },
{ 0, 4 * sizeof(float),
D3DDECLTYPE_FLOAT2, D3DDECLMETHOD_DEFAULT,
D3DDECLUSAGE_TEXCOORD, 0 },
D3DDECL_END()
};
// Ensure the resulting declaration is compatible with the
// shader. This is really just a sanity check.
assert(cgD3D9ValidateVertexDeclaration(vertexProgram,
declaration));
device->CreateVertexDeclaration(
declaration, &vertexDeclaration);
// Load the program with the expanded interface.
// Parameter shadowing is enabled (second parameter = TRUE).
cgD3D9LoadProgram(vertexProgram, TRUE, 0);
// Create the pixel shader.
fragmentProgram = cgCreateProgramFromFile(
context, CG_SOURCE, "FragmentProgram.cg",
pixelProfile, "FragmentProgram", pixelOptions);
// Load the program with the expanded interface. Parameter
// shadowing is enabled (second parameter = TRUE). Ignore
// vertex shader specific flags, such as declaration usage.
cgD3D9LoadProgram(fragmentProgram, TRUE, 0);
// Grab some parameters.
modelViewMatrix = cgGetNamedParameter(vertexProgram,
"ModelViewMatrix");
baseTexture = cgGetNamedParameter(fragmentProgram,
"BaseTexture");
someColor = cgGetNamedParameter(fragmentProgram,
"SomeColor");
// Sanity check that parameters have the expected size
assert(cgD3D9TypeToSize(cgGetParameterType(
modelViewMatrix)) == 16);
assert(cgD3D9TypeToSize(cgGetParameterType(someColor))
== 4);
// Set parameters that don't change. They can be set
// only once since parameter shadowing is enabled
cgD3D9SetTexture(baseTexture, texture);
cgD3D9SetUniform(someColor, &constantColor);
}
// Called to render the scene
void OnRender()
{
// Load model-view matrix.
D3DXMATRIX matrix; // Local matrix; modelViewMatrix is the CGparameter
// ...
// Set the parameters that change every frame
// This must be done before binding the programs
cgD3D9SetUniformMatrix(modelViewMatrix, &matrix);
// Set the vertex declaration
device->SetVertexDeclaration(vertexDeclaration);
// Bind the programs. This downloads any parameter values
// that have been previously set.
cgD3D9BindProgram(vertexProgram);
cgD3D9BindProgram(fragmentProgram);
// Draw scene.
// ...
}
// Called before the device changes or is destroyed
void OnDestroyDevice()
{
// Calling this function tells the expanded interface to
// release its internal reference to the Direct3D device
// and free its Direct3D resources.
cgD3D9SetDevice(0);
}
// Called before application shuts down
void OnShutdown()
{
// This frees any core runtime resource.
cgDestroyContext(context);
}
Expanded Interface Direct3D 8 Application
The following C code links the previous vertex and fragment programs to
the Direct3D 8 application.
#include <cg/cg.h>
#include <cg/cgD3D8.h>
IDirect3DDevice8* device; // Initialized somewhere else
IDirect3DTexture8* texture; // Initialized somewhere else
D3DXCOLOR constantColor; // Initialized somewhere else
CGcontext context;
CGprogram vertexProgram, fragmentProgram;
CGparameter baseTexture, someColor, modelViewMatrix;
// Called at application startup
void OnStartup()
{
// Create context
context = cgCreateContext();
}
// Called whenever the Direct3D device needs to be created
void OnCreateDevice()
{
// Pass the Direct3D device to the expanded interface.
cgD3D8SetDevice(device);
// Determine the best profiles to use
CGprofile vertexProfile = cgD3D8GetLatestVertexProfile();
CGprofile pixelProfile = cgD3D8GetLatestPixelProfile();
// Grab the optimal options for each profile.
const char* vertexOptions[] = {
cgD3D8GetOptimalOptions(vertexProfile), 0 };
const char* pixelOptions[] = {
cgD3D8GetOptimalOptions(pixelProfile), 0 };
// Create the vertex shader.
vertexProgram = cgCreateProgramFromFile(
context, CG_SOURCE, "VertexProgram.cg",
vertexProfile, "VertexProgram", vertexOptions);
// If your program uses explicit binding semantics (like
// this one), you can create a vertex declaration
// using those semantics.
DWORD declaration[] = {
D3DVSD_STREAM(0),
D3DVSD_REG(D3DVSDE_POSITION, D3DVSDT_FLOAT3),
D3DVSD_REG(D3DVSDE_DIFFUSE, D3DVSDT_D3DCOLOR),
D3DVSD_REG(D3DVSDE_TEXCOORD0, D3DVSDT_FLOAT2),
D3DVSD_END()
};
// Ensure the resulting declaration is compatible with the
// shader. This is really just a sanity check.
assert(cgD3D8ValidateVertexDeclaration(vertexProgram,
declaration));
// Load the program with the expanded interface.
// Parameter shadowing is enabled (second parameter = TRUE).
cgD3D8LoadProgram(vertexProgram, TRUE, 0, 0, declaration);
// Create the pixel shader.
fragmentProgram = cgCreateProgramFromFile(
context, CG_SOURCE, "FragmentProgram.cg",
pixelProfile, "FragmentProgram", pixelOptions);
// Load the program with the expanded interface.
// Parameter shadowing is enabled (second parameter = TRUE).
// Ignore vertex shader specific flags, like declaration and
// usage.
cgD3D8LoadProgram(fragmentProgram, TRUE, 0, 0, 0);
// Grab some parameters.
modelViewMatrix = cgGetNamedParameter(vertexProgram,
"ModelViewMatrix");
baseTexture = cgGetNamedParameter(fragmentProgram,
"BaseTexture");
someColor = cgGetNamedParameter(fragmentProgram,
"SomeColor");
// Sanity check that parameters have the expected size
assert(cgD3D8TypeToSize(cgGetParameterType(
modelViewMatrix)) == 16);
assert(cgD3D8TypeToSize(cgGetParameterType(someColor))
== 4);
// Set parameters that don't change. They can be set
// only once since parameter shadowing is enabled
cgD3D8SetTexture(baseTexture, texture);
cgD3D8SetUniform(someColor, &constantColor);
}
// Called to render the scene
void OnRender()
{
// Load model-view matrix.
D3DXMATRIX matrix; // Local matrix; modelViewMatrix is the CGparameter
// ...
// Set the parameters that change every frame
// This must be done before binding the programs
cgD3D8SetUniformMatrix(modelViewMatrix, &matrix);
// Bind the programs. This downloads any parameter values
// that have been previously set.
cgD3D8BindProgram(vertexProgram);
cgD3D8BindProgram(fragmentProgram);
// Draw scene.
// ...
}
// Called before the device changes or is destroyed
void OnDestroyDevice()
{
// Calling this function tells the expanded interface to
// release its internal reference to the Direct3D device
// and free its Direct3D resources.
cgD3D8SetDevice(0);
}
// Called before application shuts down
void OnShutdown()
{
// This frees any core runtime resource.
cgDestroyContext(context);
}
Direct3D Debugging Mode
In addition to the error reporting mechanisms described in “Direct3D Error
Reporting” on page 114, a debug version of the Direct3D 9 or Direct3D 8 Cg
runtime DLL is provided to assist you with the development of applications
using the Direct3D 9 or Direct3D 8 Cg runtime. This version does not have
debug symbols, but when used in place of the regular version, it uses the
Win32 function OutputDebugString() to output many helpful messages
and traces to the debug output console. Examples of information the debug
DLL outputs are the following:
- Any Direct3D or Cg core runtime errors
- Debugging information about parameters that are managed by the
  expanded interface
- Potential performance warnings
Here is a sample trace:
cgD3D(TRACE): Creating vertex shader for program 3
cgD3D(TRACE): Discovering parameters for vertex program 3
cgD3D(TRACE): Discovered uniform parameter 'ModelViewProj'
of type float4x4
cgD3D(TRACE): Finished discovering parameters for vertex
program 3
cgD3D(TRACE): Creating pixel shader for program 24
cgD3D(TRACE): Discovering parameters for pixel program 24
cgD3D(TRACE): Discovered sampler parameter 'BaseTexture'
cgD3D(TRACE): Discovered uniform parameter 'SomeColor' of
type float4
cgD3D(TRACE): Finished discovering parameters for pixel
program 24
cgD3D(TRACE): Shadowing state for sampler parameter
BaseTexture
cgD3D(TRACE): Shadowing sampler state D3DTSS_MAGFILTER for
sampler parameter 'BaseTexture'
cgD3D(TRACE): Shadowing sampler state D3DTSS_MINFILTER for
sampler parameter 'BaseTexture'
cgD3D(TRACE): Shadowing sampler state D3DTSS_MIPFILTER for
sampler parameter 'BaseTexture'
cgD3D(TRACE): Shadowing 16 values for uniform parameter
'ModelViewProj' of type float4x4
cgD3D(TRACE): Activating vertex shader for program 3
cgD3D(TRACE): Setting shadowed parameters for program 3
cgD3D(TRACE): Setting registers for uniform parameter
'ModelViewProj' of type float4x4
cgD3D(TRACE): Setting constant registers [0 - 3] for
parameter 'ModelViewProj' of type float4x4
cgD3D(TRACE): Activating pixel shader for program 24
cgD3D(TRACE): Setting shadowed parameters for program 24
cgD3D(TRACE): Setting texture for sampler parameter
'BaseTexture'
cgD3D(TRACE): Setting SamplerState[0].D3DTSS_MAGFILTER for
sampler parameter 'BaseTexture'
cgD3D(TRACE): Setting SamplerState[0].D3DTSS_MINFILTER for
sampler parameter 'BaseTexture'
cgD3D(TRACE): Setting SamplerState[0].D3DTSS_MIPFILTER for
sampler parameter 'BaseTexture'
cgD3D(TRACE): Deleting vertex shader for program 3
cgD3D(TRACE): Deleting pixel shader for program 24
To use the debug DLL:
1. Link your application against cgD3D9d.lib (or cgD3D8d.lib) instead of
   cgD3D9.lib (or cgD3D8.lib).
2. Make sure that the application can find cgD3D9d.dll (or cgD3D8d.dll).
3. Turn on and turn off tracing of portions of your code using
   cgD3D9EnableDebugTracing():
   void cgD3D9EnableDebugTracing(CGbool enable);
Here is how you would enable debug tracing for part of the application code:
cgD3D9EnableDebugTracing(CG_TRUE);
// ...
// Application code that is traced
// ...
cgD3D9EnableDebugTracing(CG_FALSE);
Note that each debug trace output sets an error equal to cgD3D9DebugTrace.
So, if an error callback has been registered with the core runtime using
cgSetErrorCallback(), each debug trace output triggers a call to this error
callback (see “Using Error Callbacks” on page 116).
Direct3D Error Reporting
Error reporting in Cg includes defined error types, functions that allow
testing for errors, and support for error callbacks.
Direct3D Error Types
The Direct3D runtime generates errors of type CGerror, reported by the Cg
core runtime, and of type HRESULT, reported by the Direct3D runtime. In
addition, it returns the errors listed in the next two groups that are
specific to the Direct3D Cg runtime.
- CGerror
  - cgD3D9Failed: Set when a Direct3D runtime function makes a
    Direct3D call that returns an error.
  - cgD3D9DebugTrace: Set when a debug message is output to the
    debug console when using the debug DLL (see “Direct3D
    Debugging Mode” on page 112).
- HRESULT
  - CGD3D9ERR_INVALIDPARAM: Returned when a parameter value
    cannot be set.
  - CGD3D9ERR_INVALIDPROFILE: Returned when a program with an
    unexpected profile is passed to a function.
  - CGD3D9ERR_INVALIDSAMPLERSTATE: Returned when a parameter of
    type D3DTEXTURESTAGESTATETYPE, which is not a valid sampler
    state, is passed to a sampler state function.
  - CGD3D9ERR_INVALIDVEREXDECL: Returned when a program is
    loaded with the expanded interface, but the given declaration is
    incompatible.
  - CGD3D9ERR_NODEVICE: Returned when a required Direct3D device is
    0. This typically occurs when an expanded interface function is
    called and a Direct3D device has not been set with
    cgD3D9SetDevice().
  - CGD3D9ERR_NOTMATRIX: Returned when a parameter that is not a
    matrix type is passed to a function that expects one.
  - CGD3D9ERR_NOTLOADED: Returned when a parameter has not been
    loaded with the expanded interface by cgD3D9LoadProgram().
  - CGD3D9ERR_NOTSAMPLER: Returned when a parameter that is not a
    sampler parameter is passed to a function that expects one.
  - CGD3D9ERR_NOTUNIFORM: Returned when a parameter that is not
    uniform is passed to a function that expects one.
  - CGD3D9ERR_NULLVALUE: Returned when a value of zero is passed to a
    function that requires a nonzero value.
  - CGD3D9ERR_OUTOFRANGE: Returned when an array range specified to
    a function is out of range.
  - CGD3D9_INVALID_REG: Returned when a register number is
    requested for an invalid parameter type. This error is specific to the
    minimal interface functions and does not trigger an error callback.
Testing for Errors
When a Direct3D runtime function is called that returns an error of type
HRESULT, the proper method of testing for success or failure is to use the
Win32 macros FAILED() and SUCCEEDED(). Simply testing the error against
zero or D3D_OK is not sufficient, because there could be more than one
success value.
As an added convenience, and for uniformity with the core runtime, the
Direct3D runtime also supplies cgD3D9GetLastError(), which is analogous
to cgGetLastError() but returns the last Direct3D runtime error of type
HRESULT for which the FAILED() macro returns TRUE:
HRESULT cgD3D9GetLastError();
The last error is always cleared immediately after the call.
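The following sketch shows why comparing an HRESULT against zero is not enough, and models a last-error slot that clears itself when read. It is portable C that does not include the Windows headers: HRESULT_T and the MY_ macros are local stand-ins that mirror the Win32 definitions (success means a non-negative value), and record_result() and take_last_error() are hypothetical names, not Cg functions.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for the Win32 definitions; real code includes <windows.h>
   and uses HRESULT, S_OK, S_FALSE, E_FAIL, SUCCEEDED() and FAILED(). */
typedef int32_t HRESULT_T;
#define MY_S_OK    ((HRESULT_T)0)
#define MY_S_FALSE ((HRESULT_T)1)            /* a success code that is not 0 */
#define MY_E_FAIL  ((HRESULT_T)-2147467259)  /* 0x80004005 */
#define MY_SUCCEEDED(hr) ((HRESULT_T)(hr) >= 0)
#define MY_FAILED(hr)    ((HRESULT_T)(hr) < 0)

/* Toy last-error slot: it remembers failures only and is cleared on read. */
static HRESULT_T last_error = MY_S_OK;

static void record_result(HRESULT_T hr)
{
    if (MY_FAILED(hr))
        last_error = hr;
}

static HRESULT_T take_last_error(void)
{
    HRESULT_T hr = last_error;
    last_error = MY_S_OK;
    assert(MY_SUCCEEDED(last_error)); /* the slot is clear after a read */
    return hr;
}
```

A check written as `hr == 0` would treat MY_S_FALSE as a failure even though MY_SUCCEEDED() accepts it, which is the pitfall the text warns about.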
The function cgD3D9TranslateHRESULT() converts an error of type HRESULT
into a string:
const char* cgD3D9TranslateHRESULT(HRESULT hr);
This function should be called instead of DXGetErrorDescription9()
because it also translates errors that the Cg Direct3D runtime generates.
Using Error Callbacks
Here is an example of a possible error callback that sorts out debug trace
errors from core runtime errors and from Direct3D runtime errors:
void MyErrorCallback() {
CGerror error = cgGetError();
if (error == cgD3D9DebugTrace) {
// This is a debug trace output.
// A breakpoint could be set here to step from one
// debug output to the other.
return;
}
char buffer[1024];
if (error == cgD3D9Failed)
sprintf(buffer, "A Direct3D error occurred: '%s'\n",
cgD3D9TranslateHRESULT(cgD3D9GetLastError()));
else
sprintf(buffer, "A Cg error occurred: '%s'\n",
cgD3D9TranslateCGerror(error));
OutputDebugString(buffer);
}
cgSetErrorCallback(MyErrorCallback);
Introduction to CgFX
CgFX Overview
CgFX is an extended file format for Cg. In addition to Cg programs, CgFX
files can also represent both fixed-function graphics state and
meta-information about shader parameters. The CgFX API makes it possible to
load CgFX effects files, traverse the data in them, set the associated
graphics state, and so on. This chapter introduces this new API and the
ideas behind it and is intended to make it easy to get started using CgFX.
This chapter assumes that the OpenGL state manager, implemented as part
of the CgGL runtime, is being used. Because CgFX allows for extensible,
custom state managers, alternate state managers that accept different state
syntax may also be available. For example, a Direct3D state manager might
accept Direct3D-style state names, while a Direct3D-Under-OpenGL state
manager might accept Direct3D-style state names, but allow for rendering
using OpenGL.
Key Concepts
Effect
An effect file contains a collection of shader source code, parameters, and
rendering techniques. An effect encapsulates one or more different methods
to render a particular visual effect. For example, the effect might provide
one approach intended for use on fixed-function hardware, and a different
approach on more modern, programmable hardware.
Technique
Each effect contains one or more techniques. A technique is intended to
encapsulate the information needed to produce a visual effect: graphics
state, shaders, and at least one rendering pass.
Pass
Each technique contains one or more rendering passes. Passes store graphics
state, possibly including fixed-function state settings and vertex and
fragment shaders. The passes are generally processed in order: CgFX sets the
graphics state for a pass, the application draws the scene geometry, the state
for the next pass is set, geometry is drawn again, and so on.
State assignment
Passes hold state assignments that describe the graphics state for the pass.
Annotation
Annotations make it possible to associate metadata with parameters,
techniques, passes, and so on. For example, a parameter like
lightIntensity might have annotations indicating the minimum and
maximum valid values for the parameter.
Effect parameter
Parameters declared in the global scope of the effect file are effect
parameters. Effect parameter values may be set and queried using the Cg
runtime API. Effect parameters may be referenced on the right-hand side of
state assignments and also as global parameters within Cg functions and
programs defined within the effect.
Getting Started
We expect that the reader is generally familiar with the Cg runtime. See
“Introduction to the Cg Runtime Library” on page 43 for more details.
Consider the following effect:
    float3 DiffuseColor <
        string type = "color";
        float3 minValue = float3(0,0,0);
        float3 maxValue = float3(10,10,10);
    > = { 1, 1, 1 };
    technique FixedFunctionLighting {
        pass {
            LightingEnable = true;
            LightEnable[0] = true;
            LightPosition[0] = float4(-10, 10, 10, 1);
            LightAmbient[0] = float4(.1,.1,.1,.1);
            LightDiffuse[0] = (float4(2*DiffuseColor, 1));
            LightSpecular[0] = float4(1,1,1,1);
            MaterialShininess = 10.f;
            MaterialAmbient = float4(1,1,1,1);
            MaterialDiffuse = float4(.5, .5, .5, 1);
            MaterialSpecular = float4(.5, .5, .5, 1);
        }
    }
The effect defines a single effect parameter, DiffuseColor, with three
associated annotations: a string named type and two float3s named
minValue and maxValue. These annotations exist purely for the use of the
application using the effect file; the Cg runtime does not interpret the
annotation names or values in any way. The effect parameter is initialized to
the value [1,1,1].
The effect also defines a single technique, named FixedFunctionLighting,
which in turn contains a single rendering pass. The rendering pass sets the
appropriate OpenGL state to perform per-vertex lighting using the built-in
fixed-function material model of OpenGL. The complete set of supported
OpenGL states is listed in the section “OpenGL State” on page 129.
Note that the LightDiffuse[0] state value, corresponding to the
fixed-function light's diffuse color, is set with an expression involving the
DiffuseColor effect parameter. If the value of this parameter is changed by
the application and the pass's state is later set, the parameter's new value
is used in the expression that sets the light's diffuse color.
Note also that this expression is parenthesized. In general, CgFX requires
that most expressions, like this one, involving effect parameters be in
parentheses. This is necessary so that CgFX can distinguish between effect
parameters and built-in enumerant values representing constants.
The code below demonstrates how to create an effect given the name of an
effect file. After creating a Cg context, cgGLRegisterStates() sets up the
state assignments that support the standard OpenGL state manager. Most
applications will want to do this immediately after creating the Cg context.
Next, the effect is created and associated with the given context.
    CGcontext context = cgCreateContext();
    cgGLRegisterStates(context);
    CGeffect effect = cgCreateEffectFromFile(context,
                                             "simple.cgfx", NULL);
    if (!effect) {
        fprintf(stderr, "Unable to create effect!\n");
        const char *listing = cgGetLastListing(context);
        if (listing)
            fprintf(stderr, "%s\n", listing);
        exit(1);
    }
Technique Validation
Before using any of the techniques in an effect, it's important to validate
the techniques. Validation fails, for instance, if a technique includes a
“compile” state assignment that references a profile that isn't supported on
the current graphics hardware. Similarly, validation fails if the technique
includes a state assignment that uses an unsupported OpenGL extension.
Effects are commonly written such that the application can iterate over the
given techniques in order and then choose the first technique that passes
validation to apply the effect. For this reason, techniques are usually given
in order of decreasing quality.
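The "first technique that validates" strategy can be sketched independently of the runtime. The technique_info struct and pick_first_valid() below are hypothetical names made up for this illustration; the validates flag stands in for whatever result the application obtained when it tried to validate each technique.

```c
#include <stddef.h>

/* Hypothetical stand-in for a technique: its name plus a flag that
   records whether validation succeeded on the current hardware. */
typedef struct {
    const char *name;
    int validates;   /* nonzero if validation succeeded */
} technique_info;

/* Returns the index of the first technique that validated, or -1 if
   none did. The array is assumed to be in effect-file order, that is,
   in order of decreasing quality, so the first hit is the best one
   the hardware can run. */
static int pick_first_valid(const technique_info *t, size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        if (t[i].validates)
            return (int)i;
    }
    return -1;
}
```

In an application using the Cg runtime, the flag would presumably be filled in from cgValidateTechnique() while walking the effect with cgGetFirstTechnique() and cgGetNextTechnique().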
The code below iterates through the techniques in a CGeffect in turn,
attempting to validate each of them and printing an error for the ones that
fail.
    CGtechnique technique = cgGetFirstTechnique(effect);
    while (technique) {
        if (cgValidateTechnique(technique) == CG_FALSE)
            fprintf(stderr,
                    "Technique %s did not validate. Skipping.\n",
                    cgGetTechniqueName(technique));
        technique = cgGetNextTechnique(technique);
    }
The function cgIsTechniqueValidated() can be used to check if the given
technique has been validated.
Note that any Cg programs referenced in a technique are not compiled until
the technique is validated. This makes it possible to modify the uncompiled
program by connecting concrete shared structs to interface effect
parameters, marking uniforms as literals, changing the program's profile,
and so on.
Passes and Pass State
The heart of CgFX is applying the state defined in the passes in a technique.
The loop below demonstrates the standard approach for looping over a
technique's passes and applying their states in turn.
    CGpass pass = cgGetFirstPass(technique);
    while (pass) {
        cgSetPassState(pass);
        drawGeom();
        cgResetPassState(pass);
        pass = cgGetNextPass(pass);
    }
Each of the state assignments in a pass translates directly to an OpenGL API
call. For example, LightingEnable = true; translates to the call
glEnable(GL_LIGHTING), and LightPosition[0] = float4(-10, 10,
10, 1) translates to the call glLightfv(GL_LIGHT0, GL_POSITION, v)
where v is an array of four GLfloat values.
Before or after the call to cgSetPassState(), the application is of course
free to set other OpenGL state as desired. However, any state set before the
call to cgSetPassState() may be overridden by the pass.
Note that if the technique containing the indicated pass has not been
validated, calling cgSetPassState() triggers an attempted validation of the
technique. If validation fails, a runtime error results.
After the geometry has been drawn, cgResetPassState() resets the state
that was set by the pass to the default values as specified by OpenGL. Note
that it does not reset state to its values before cgSetPassState(); an
application that desires this behavior should either push and pop OpenGL
state, or should manually examine the state assignments in the pass in order
to determine what state was changed, so that it can set it back to the desired
values. (The routines to manually traverse the state in a pass are explained
in “OpenGL State” on page 129.)
Effect Parameters
Handles to effect parameters can be retrieved using
cgGetNamedEffectParameter(). Given such a handle, the name of the
parameter can be found with cgGetParameterName(), its value can be set
using the Cg runtime value-setting entry points, and so on.
    CGparameter c = cgGetNamedEffectParameter(effect, "Color");
    cgSetParameter3fv(c, Color);
    CGparameter mvp = cgGetNamedEffectParameter(effect,
                                                "ModelViewProjection");
    cgGLSetStateMatrixParameter(mvp,
                                CG_GL_MODELVIEW_PROJECTION_MATRIX,
                                CG_GL_MATRIX_IDENTITY);
Vertex and Fragment Programs
With the OpenGL state manager, vertex and fragment programs are defined
via assignments to the VertexProgram and FragmentProgram states,
respectively. Three different classes of expressions can be given on the
right-hand side of these state assignments:
• Compile statements
• Inline assembly
• NULL
These three possibilities are demonstrated in the effect file below:
    float4 main(uniform float foo, float4 uv : TEXCOORD0) : COLOR
    {
        return (foo > 0) ? uv : 2 * uv;
    }
    technique SimpleFrag {
        pass {
            VertexProgram = NULL;
            FragmentProgram = compile arbfp1 main(-2.f);
        }
    }
    technique AsmFrag {
        pass {
            FragmentProgram = asm {
                !!FP1.0
                TEX o[COLR], {0}.x, TEX6, 2D;
                END
            };
        }
    }
The most common of these three options for specifying programs is using
compile statements. The first argument following the compile keyword is
the name of the profile to which the program is to be compiled (for example,
fp30, fp40, arbfp1, or vp20). The next argument gives the name of the
function in the effect file that serves as the program entry point, followed by
a list of expressions (for example, -2.f). These expressions have a one-to-one
correspondence with the uniform parameters of the program being
compiled: there must be exactly one for each uniform program parameter,
no more, and no less.
In the example above, the expression -2.f sets the value for the foo
parameter to main(). Because it is a literal value, CgFX is able to compile the
program to a particularly efficient version: the foo > 0 test is resolved at
compile time, so the program simply returns 2 * uv.
It is also possible to include references to effect parameters in the
expression used in the compile statement; for example:
    float4 main(uniform float foo, float4 uv : TEXCOORD0) : COLOR
    {
        return (foo > 0) ? uv : 2 * uv;
    }
    float bar;
    technique NewSimpleFrag {
        pass {
            VertexProgram = NULL;
            FragmentProgram = compile arbfp1 main(2 * bar);
        }
    }
Here, the value 2 * bar is associated with the foo parameter of main().
When the value of bar is changed by the application, the value of foo in
main() is set appropriately.
The second class of program state assignment types is assembly code. Inline
assembly is indicated using the asm keyword, with the assembly language
code between braces, as in the AsmFrag technique above. CgFX depends on
having the appropriate header at the start of the assembly (!!FP1.0 for fp30,
!!ARBvp1.0 for arbvp1, and so on) to determine the profile for which the
code is given.
Finally, vertex or fragment programs may be assigned the value NULL in the
state assignment. This signifies that no such program should be used in this
pass.
Textures and Samplers
CgFX also makes it possible to define state related to textures in the effect
file. The effect file below shows an example. The full set of supported
OpenGL texture state is listed in “OpenGL State” on page 129.
    sampler2D samp = sampler_state {
        generateMipMap = true;
        minFilter = LinearMipMapLinear;
        magFilter = Linear;
    };
    float4 texsimple(uniform sampler2D sampler,
                     float2 uv : TEXCOORD0) : COLOR {
        return tex2D(sampler, uv);
    }
    technique TextureSimple {
        pass {
            FragmentProgram = compile arbfp1 texsimple(samp);
        }
    }
Given this effect file, the application must take an extra step or two when
setting up the texture in OpenGL. First, the application must indicate which
texture handle should be used for the sampler2D in the effect file. Second,
the application must use the Cg runtime to set the texture state given in the
sampler_state block at the appropriate time.
Under OpenGL, the easiest way to achieve these goals is to call
cgGLSetupSampler(param, textureID). This entry point binds the given
texture, associates the texture handle with the given parameter, and
initializes the sampler state by calling cgSetSamplerState().
Alternately, an application can perform these steps itself. The code below
shows this in practice:
    CGparameter p = cgGetNamedEffectParameter(effect, "samp");
    GLuint handle;
    glGenTextures(1, &handle);
    glBindTexture(GL_TEXTURE_2D, handle);
    cgGLSetTextureParameter(p, handle);
    cgSetSamplerState(p);
    ...
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, RES, RES, 0, GL_RGBA,
                 GL_FLOAT, data);
Note the calls to cgGLSetTextureParameter() and cgSetSamplerState().
The first call is the usual runtime call that needs to be made to tell the
runtime which OpenGL texture object is associated with a given parameter.
The cgSetSamplerState() call ends up making the glTexParameter calls
that set up the texture state defined in the sampler_state block. It expects
that the appropriate texture object has been bound with glBindTexture first.
After the sampler has been initialized in either of these manners, there are
two possibilities for how the texture parameters are managed. By far the
easiest method is to enable texture management in the context:
    cgGLSetManageTextureParameters(context, CG_TRUE);
If this is done, then when the Cg program is bound by a call to
cgSetPassState(), the texture parameters used are associated with the
appropriate hardware texture units automatically.
Alternatively, the mapping of texture parameters to hardware units can be
handled explicitly by the application, using the routine
cgGLEnableTextureParameter():
    CGparameter progParam = cgGetNamedParameter(prog, "sampler");
    cgGLEnableTextureParameter(progParam);
However, note that it is not possible to call cgGLEnableTextureParameter()
with a handle to an effect's sampler parameter; the handle must be to an
actual program parameter.
In general, the first approach is to be preferred for its simplicity.
Interfaces and Unsized Arrays
CgFX also supports Cg's interfaces and unsized arrays features. Given an
effect file with Cg programs that use these features, the compile statement
can be used in two different ways to resolve the interfaces and unsized
arrays so that the program can be compiled. The abstract types may be
resolved using Cg code itself, or they may be resolved using the Cg runtime.
Consider the following example: a Light interface has been defined with
SpotLight implementing the interface. The main() program takes an
unsized array of Light interface objects, loops over them, and returns the
sum of the values returned by their respective value() methods.
    interface Light {
        float4 value();
    };
    struct SpotLight : Light {
        float4 value() { return float4(1,2,3,4); }
    };
    float4 main(uniform Light l[]) : COLOR {
        float4 v = float4(0,0,0,0);
        for (int i = 0; i < l.length; ++i)
            v += l[i].value();
        return v;
    }
Recall that all uniform parameters to the program must have expressions in
the parenthesized list in the compile statement, and therefore one expression
is necessary here for the l parameter.
Resolution using Cg
The first way that main() can be compiled is to provide the name of an effect
parameter that resolves both the actual size of the array as well as the
concrete type that implements the Light interface:
    SpotLight spots[4];
    technique {
        pass {
            FragmentProgram = compile arbfp1 main(spots);
        }
    }
Resolution using the Cg runtime
Alternatively, the application can leave the resolution of the concrete types
and array size until later so that they may be set via Cg runtime calls from
the application, as one typically does for Cg programs that are not CgFX.
For this case, the expression passed to the compile statement should just be
an unsized array of the abstract interface type:
    Light lights[];
    technique {
        pass {
            FragmentProgram = compile arbfp1 main(lights);
        }
    }
The application must then create a shared array of concrete light instances.
To do so, the application proceeds as it would when operating on a
CGprogram: by retrieving the CGtype corresponding to each type of concrete
instance to be created, and calling cgCreateParameter() or
cgCreateParameterArray() to create the shared parameter of the given
type. Lastly, the shared parameter is connected to the effect parameter.
This process is illustrated below:
    CGtype spotType = cgGetNamedUserType(effect, "SpotLight");
    CGparameter spots = cgCreateParameterArray(context,
                                               spotType, 4);
    CGparameter lights = cgGetNamedEffectParameter(effect,
                                                   "lights");
    cgConnectParameter(spots, lights);
Note that cgGetNamedUserType() in this case is passed a CGeffect handle,
rather than a CGprogram handle.
Later, when the associated technique is validated, any programs that make
use of the abstract effect parameters are compiled.
Note that abstract parameters may not be used on the right-hand side of any
state assignments other than compile state assignments. Doing so results in
an error at effect creation time.
Evaluating Cg Programs using the Virtual Machine
There are many situations where it is useful to execute Cg programs on the
CPU using the Cg runtime Virtual Machine (VM). Although running Cg
programs on the CPU doesn't offer the same performance as execution on the
GPU, it is sometimes useful, as in tabularizing complex functions into texture
maps.
Programs that are to run on the VM are declared as follows:
    float foo = 4.f;
    float4 func(float2 p : POSITION, float2 delta : PSIZE) : COLOR
    {
        return foo * p.xyxy;
    }
The POSITION semantic denotes the parameter or parameters that are
initialized with the coordinates of each point at which the function is
evaluated. The value passed varies from zero to one in each of the
dimensions over which the function is being evaluated. The PSIZE semantic
denotes the parameter that is initialized with the spacing between samples at
which the function is being evaluated. Lastly, the COLOR semantic denotes
which parameter (or function return value) holds the computed value. Thus,
the function above could have been written as a void function but with an
out float4 ret : COLOR parameter and an assignment to ret, instead of
using a return statement.
Given an effect file with such a program, a CGprogram handle to it can be
retrieved by creating a program using the CG_PROFILE_GENERIC profile:
    CGprogram tp = cgCreateProgramFromEffect(effect,
                                             CG_PROFILE_GENERIC,
                                             "func", NULL);
Given such a program handle, cgEvaluateProgram evaluates the program
over a one-, two-, or three-dimensional domain:
    cgEvaluateProgram(CGprogram prog, float *obuf, int ncomp,
                      int nx, int ny, int nz);
Where prog is the Cg program handle retrieved using
cgCreateProgramFromEffect(), obuf is the buffer to which output values
are to be written, ncomp is the number of components per pixel in the output
buffer (1, 2, 3, or 4), and nx, ny, and nz indicate the number of positions at
which the function should be evaluated in each of the x, y, and z dimensions.
The total size of the buffer should be equal to the product of the number of
positions in each of the dimensions and the number of components in the
buffer, as in the example below:
    #define RES 256
    #define NCOMPS 4
    float *buf = new float[NCOMPS*RES*RES];
    cgEvaluateProgram(tp, buf, NCOMPS, RES, RES, 1);
    // do something with buf
    delete[] buf;
It is an error to pass a CGprogram that doesn't have the CG_PROFILE_GENERIC
profile to cgEvaluateProgram().
Annotations
Using annotations, it is possible to attach additional information to
parameters, techniques, programs, and passes in the effect file for use by the
application. An annotation is a list of variables and values denoted by angle
brackets immediately following a declaration, as in the effect below:
    float3 LightDir < string UItype = "direction"; >;
    technique fancyHalo <
        bool optional = true;
    > {
        pass < string geometry = "character";
               string destination = "texture"; > {
            ...
        }
    }
CgFX does not interpret the meaning of annotations in any way; annotations
exist solely for the convenience of the application. The example above shows
a few common uses for annotations: the annotation of LightDir indicates
what sort of user interface widget would be appropriate to provide the user
for setting that parameter. The technique's annotation might indicate that
applying the technique is optional when rendering the scene. In the
example above, the pass annotations indicate to the application which part
of the scene geometry to draw when rendering that pass, as well as where to
store the image from rendering the pass.
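Because annotations are left entirely to the application, it is the application that decides what, for instance, a minimum or maximum annotation means. One plausible interpretation is sketched below: treating per-component minimum and maximum annotation values (such as the minValue and maxValue annotations on DiffuseColor earlier in this chapter) as clamping bounds before a value is handed to the runtime. The function name and signature are hypothetical and not part of the Cg API.

```c
/* Clamps each of the n components of value into the inclusive range
   [min_v[i], max_v[i]]. Hypothetical helper: the bounds are assumed
   to have been read from the parameter's annotations beforehand. */
static void clamp_to_annotation_range(float *value, const float *min_v,
                                      const float *max_v, int n)
{
    for (int i = 0; i < n; ++i) {
        if (value[i] < min_v[i])
            value[i] = min_v[i];
        else if (value[i] > max_v[i])
            value[i] = max_v[i];
    }
}
```

In a real application the bounds would presumably come from an annotation query such as cgGetFloatAnnotationValues(), and the clamped result would then be passed to one of the usual parameter-setting entry points.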
Given a handle to a technique, pass, parameter, or program, there are API
entry points for iterating through the annotations in turn:
    CGannotation cgGetFirstTechniqueAnnotation(CGtechnique);
    CGannotation cgGetFirstPassAnnotation(CGpass);
    CGannotation cgGetFirstParameterAnnotation(CGparameter);
    CGannotation cgGetFirstProgramAnnotation(CGprogram);
    CGannotation cgGetNextAnnotation(CGannotation);
In addition, there are entry points for retrieving annotations by name:
    CGannotation cgGetNamedTechniqueAnnotation(CGtechnique,
                                               const char *);
    CGannotation cgGetNamedPassAnnotation(CGpass, const char *);
    CGannotation cgGetNamedParameterAnnotation(CGparameter,
                                               const char *);
    CGannotation cgGetNamedProgramAnnotation(CGprogram,
                                             const char *);
Given an annotation handle, its values may be retrieved through the use of
one of the cgGet*AnnotationValues() entry points:
    const float *cgGetFloatAnnotationValues(CGannotation,
                                            int *nvalues);
    const int *cgGetIntAnnotationValues(CGannotation,
                                        int *nvalues);
    const char *cgGetStringAnnotationValue(CGannotation);
    const int *cgGetBooleanAnnotationValues(CGannotation,
                                            int *nvalues);
OpenGL State
When cgGLRegisterStates() is called, the CgFX OpenGL runtime
initializes state assignments that correspond to almost all appropriate or
useful OpenGL API calls. The set of states and state callbacks that are
registered by this call compose the CgFX OpenGL state manager.
There is a one-to-one mapping between the state assignments that are
provided by the OpenGL state manager and the corresponding OpenGL
calls. Given an OpenGL call of interest, it is intended to be simple to
determine which state assignment it corresponds to, and vice versa. For
example, the state assignment ClearColor = float4(0,1,0,1) leads to the
call glClearColor(0,1,0,1) when the state assignment is executed during
a call to cgSetPassState().
For calls that take enumerated values (for example, GL_DST_COLOR for
glBlendFunc()), corresponding enumerants are defined by the CgFX
OpenGL state manager, again with a straightforward mapping:
GL_DST_COLOR corresponds to DestColor, and so forth. When an OpenGL
call takes multiple parameters or multiple enumerants, a corresponding
vector type is used; for example, a call to glBlendFunc(GL_ZERO,
GL_DST_ALPHA) corresponds to the CgFX state assignment BlendFunc =
int2(Zero, DstAlpha).
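The naming pattern in the glBlendFunc() example (GL_ZERO to Zero, GL_DST_ALPHA to DstAlpha) can be approximated by dropping the GL_ prefix and converting the remaining underscore-separated words to mixed case. The helper below is a hypothetical sketch of that rule only; it is an assumption inferred from those two examples, not the state manager's actual lookup, and some enumerants listed in this chapter (DestColor and LEqual, for instance) do not follow it. A real tool should use a lookup table built from the state tables.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Converts an OpenGL-style name such as "GL_DST_ALPHA" to "DstAlpha".
   Returns 0 on success, or -1 if the name lacks the "GL_" prefix or
   the output buffer is too small. Hypothetical helper. */
static int gl_name_to_cgfx(const char *gl_name, char *out, size_t out_size)
{
    size_t n = 0;
    int start_of_word = 1;

    if (strncmp(gl_name, "GL_", 3) != 0)
        return -1;
    for (const char *p = gl_name + 3; *p != '\0'; ++p) {
        if (*p == '_') {           /* an underscore starts a new word */
            start_of_word = 1;
            continue;
        }
        if (n + 1 >= out_size)     /* keep room for the terminator */
            return -1;
        out[n++] = (char)(start_of_word ? toupper((unsigned char)*p)
                                        : tolower((unsigned char)*p));
        start_of_word = 0;
    }
    if (out_size == 0)
        return -1;
    out[n] = '\0';
    return 0;
}
```

For example, passing "GL_ONE_MINUS_SRC_ALPHA" yields "OneMinusSrcAlpha", which matches the corresponding BlendFunc enumerant.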
When a state assignment depends on the presence of an OpenGL extension
(for example, BlendFuncSeparate requires either
EXT_blend_func_separate or the presence of OpenGL 1.4), it is possible to
successfully load an effect file that uses that extension in one of its
techniques, even if the OpenGL context doesn't support that extension.
However, validation of any technique that uses such an unsupported
extension in one of its passes will fail.
The following table lists the names of the states supported by the CgFX
OpenGL state manager, their types, and valid enumerants. The “Requires”
column in the tables below indicates what OpenGL version or extension is
required for each state assignment.
Table 6. CgFX OpenGL State Manager States
State Name | Type | Valid Enumerants | Requires
AlphaFunc | float2 (enum, reference_value) | Never, Less, LEqual, Equal, Greater, NotEqual, GEqual, Always | OpenGL 1.0
BlendFunc | int2 (src_factor, dst_factor) | Zero, One, DestColor, OneMinusDestColor, SrcAlpha, OneMinusSrcAlpha, DstAlpha, OneMinusDstAlpha, SrcAlphaSaturate, SrcColor, OneMinusSrcColor, ConstantColor, OneMinusConstantColor, ConstantAlpha, OneMinusConstantAlpha | 1.0; 1.4 or NV_blend_square for SrcColor or OneMinusSrcColor for src_factor, and DstColor or OneMinusDstColor for dst_factor
BlendFuncSeparate | int4 (rgb_src, rgb_dst, a_src, a_dst) | Zero, One, DestColor, OneMinusDestColor, SrcAlpha, OneMinusSrcAlpha, DstAlpha, OneMinusDstAlpha, SrcAlphaSaturate, SrcColor, OneMinusSrcColor, ConstantColor, OneMinusConstantColor, ConstantAlpha, OneMinusConstantAlpha | OpenGL 1.4 or EXT_blend_func_separate; 1.4 or NV_blend_square for SrcColor or OneMinusSrcColor for rgb_src, and DstColor or OneMinusDstColor for rgb_dst
BlendEquation | int | FuncAdd, FuncSubtract, Min, Max, LogicOp | 1.4 or ARB_imaging; or EXT_blend_subtract for FuncSubtract or FuncReverseSubtract; or EXT_blend_minmax for Min or Max; or EXT_blend_logic_op for LogicOp
BlendEquationSeparate | int2 (rgb, alpha) | FuncAdd, FuncSubtract, Min, Max, LogicOp | EXT_blend_equation_separate; or 1.4, ARB_imaging, or EXT_blend_subtract for FuncSubtract or FuncReverseSubtract; or 1.4, ARB_imaging, or EXT_blend_minmax for Min or Max; or EXT_blend_logic_op for LogicOp
BlendColor | float4 | | 1.4, ARB_imaging, or EXT_blend_color
ClearColor | float4 | | 1.0
ClearStencil | int | | 1.0
ClearDepth | float | | 1.0
ClipPlane[ndx] | float4 | | OpenGL 1.0; ndx must be greater than or equal to zero and less than the value of GL_MAX_CLIP_PLANES
ColorMask | bool4 | | 1.0
ColorMatrix | float4x4 | | ARB_imaging
ColorMaterial | int2 | Front, Back, FrontAndBack, Emission, Ambient, Diffuse, Specular, AmbientAndDiffuse | 1.0
CullFace | int | Front, Back, FrontAndBack | 1.0
DepthBounds | float2 | | EXT_depth_bounds_test
DepthFunc | int | Never, Less, LEqual, Equal, Greater, NotEqual, GEqual, Always | 1.0
DepthMask | bool | | 1.0
DepthRange | float2 | | 1.0
FogMode | int | Linear, Exp, Exp2 | 1.0
FogDensity | float | | 1.0
FogStart | float | | 1.0
FogEnd | float | | 1.0
FogColor | float4 | | 1.0
FragmentEnvParameter[ndx] | float4 | | ARB_fragment_program; ndx must be greater than or equal to zero and less than the value of GL_MAX_PROGRAM_ENV_PARAMETERS_ARB for the GL_FRAGMENT_PROGRAM_ARB target to glGetProgramivARB
FragmentLocalParameter[ndx] | float4 | | ARB_fragment_program; ndx must be greater than or equal to zero and less than the value of GL_MAX_PROGRAM_LOCAL_PARAMETERS_ARB for the GL_FRAGMENT_PROGRAM_ARB target to glGetProgramivARB
FogCoordSrc | int | FragmentDepth, FogCoord | OpenGL 1.4 or EXT_fog_coord
FogDistanceMode | int | EyeRadial, EyePlane, EyePlaneAbsolute | NV_fog_distance
FragmentProgram | compile statement | | ARB_fragment_program or NV_fragment_program
FrontFace | int | CW, CCW | 1.0
LightModelAmbient | float4 | | 1.0
LightAmbient[ndx] | float4 | | 1.0; ndx must be greater than or equal to 0 and less than the value of GL_MAX_LIGHTS
LightConstantAttenuation[ndx] | float | | Same as LightAmbient
LightDiffuse[ndx] | float4 | | Same as LightAmbient
LightLinearAttenuation[ndx] | float | | Same as LightAmbient
LightPosition[ndx] | float4 | | Same as LightAmbient
LightQuadraticAttenuation[ndx] | float | | Same as LightAmbient
LightSpecular[ndx] | float4 | | Same as LightAmbient
LightSpotCutoff[ndx] | float | | Same as LightAmbient
LightSpotDirection[ndx] | float3 | | Same as LightAmbient
LightSpotExponent[ndx] | float | | Same as LightAmbient
LightModelColorControl | int | SingleColor, SeparateSpecular | OpenGL 1.2 or EXT_separate_specular_color
LineStipple | int2 | | 1.0
LineWidth | float | | 1.0
LogicOp | int | Clear, And, AndReverse, Copy, AndInverted, Noop, Xor, Or, Nor, Equiv, Invert, OrReverse, CopyInverted, Nand, Set | 1.0
MaterialAmbient | float4 | | 1.0
MaterialDiffuse | float4 | | 1.0
MaterialEmission | float4 | | 1.0
MaterialShininess | float | | 1.0
MaterialSpecular | float4 | | 1.0
ModelViewMatrix | float4x4 | | 1.0
PointDistanceAttenuation | float3 | | 1.4, ARB_point_parameters, or EXT_point_parameters
PointFadeThresholdSize | float | | 1.4, ARB_point_parameters, or EXT_point_parameters
PointSize | float | | 1.0
PointSizeMin | float | | 1.4, ARB_point_parameters, or EXT_point_parameters
PointSizeMax | float | | OpenGL 1.4, ARB_point_parameters, or EXT_point_parameters
PointSpriteCoordOrigin | int | LowerLeft, UpperLeft | 2.0
PointSpriteCoordReplace[ndx] | bool | | 2.0, ARB_point_sprite, or NV_point_sprite; ndx must be greater than or equal to zero and less than the value of GL_MAX_TEXTURE_COORDS
PointSpriteRMode | int | Zero, R, S | NV_point_sprite
PolygonMode | int2 | Front, Back, FrontAndBack, Point, Line, Fill | 1.0
PolygonOffset | float2 | | 1.1
ProjectionMatrix | float4x4 | | 1.0
Scissor | int4 | | 1.0
ShadeModel | int | Flat, Smooth | 1.0
StencilFunc | int3 | Never, Less, LEqual, Equal, Greater, NotEqual, GEqual, Always | 1.0
StencilMask | int | | 1.0
StencilOp | int3 | Keep, Zero, Replace, Incr, Decr, Invert, IncrWrap, DecrWrap | 1.0
StencilFuncSeparate | int4 | Front, Back, FrontAndBack, Never, Less, LEqual, Equal, Greater, NotEqual, GEqual, Always | 2.0 or EXT_stencil_two_side
StencilMaskSeparate | int2 | Front, Back, FrontAndBack | OpenGL 2.0 or EXT_stencil_two_side
StencilOpSeparate | int4 | Keep, Zero, Replace, Incr, Decr, Invert, IncrWrap, DecrWrap | 2.0 or EXT_stencil_two_side
TexGenSMode[ndx] | int | ObjectLinear, EyeLinear, SphereMap, ReflectionMap, NormalMap | 1.0; or 1.3, ARB_texture_cube_map, EXT_texture_cube_map, or NV_texgen_reflection for ReflectionMap or NormalMap; ndx must be greater than or equal to zero and less than the value of GL_MAX_TEXTURE_COORDS
TexGenTMode[ndx] | int | Same as TexGenSMode | Same as TexGenSMode
TexGenRMode[ndx] | int | ObjectLinear, EyeLinear, ReflectionMap, NormalMap | 1.0; or 1.3, ARB_texture_cube_map, EXT_texture_cube_map, or NV_texgen_reflection for ReflectionMap or NormalMap; ndx must be greater than or equal to zero and less than the value of GL_MAX_TEXTURE_COORDS
TexGenQMode[ndx] | int | ObjectLinear, EyeLinear | 1.0; ndx must be greater than or equal to zero and less than the value of GL_MAX_TEXTURE_COORDS
TexGenSEyePlane[ndx] | float4 | | 1.0; ndx must be greater than or equal to zero and less than the value of GL_MAX_TEXTURE_COORDS
TexGenTEyePlane[ndx] | float4 | | Same as TexGenSEyePlane
TexGenREyePlane[ndx] | float4 | | Same as TexGenSEyePlane
TexGenQEyePlane[ndx] | float4 | | Same as TexGenSEyePlane
TexGenSObjectPlane[ndx] | float4 | | Same as TexGenSEyePlane
TexGenTObjectPlane[ndx] | float4 | | Same as TexGenSEyePlane
TexGenRObjectPlane[ndx] | float4 | | Same as TexGenSEyePlane
TexGenQObjectPlane[ndx] | float4 | | Same as TexGenSEyePlane
Texture1D[ndx] | sampler1D | | OpenGL 1.0; ndx must be greater than or equal to zero and less than the value of GL_MAX_TEXTURE_IMAGE_UNITS
Texture2D[ndx] | sampler2D | | Same as Texture1D
Texture3D[ndx] | sampler3D | | 1.2 or EXT_texture3D; ndx must be greater than or equal to zero and less than the value of GL_MAX_TEXTURE_IMAGE_UNITS
TextureRectangle[ndx] | samplerRECT | | ARB_texture_rectangle, EXT_texture_rectangle (Apple), or NV_texture_rectangle; ndx must be greater than or equal to zero and less than the value of GL_MAX_TEXTURE_IMAGE_UNITS
TextureCubeMap[ndx] | samplerCUBE | | 1.3, ARB_texture_cube_map, or EXT_texture_cube_map; ndx must be greater than or equal to zero and less than the value of GL_MAX_TEXTURE_IMAGE_UNITS
TextureEnvColor[ndx] | float4 | | OpenGL 1.0; ndx must be greater than or equal to zero and less than the value of GL_MAX_TEXTURE_UNITS
TextureEnvMode[ndx] | int | Modulate, Decal, Blend, Replace, Add | 1.0; 1.3, ARB_texture_env_add, or EXT_texture_env_add for Add; ndx must be greater than or equal to zero and less than the value of GL_MAX_TEXTURE_UNITS
VertexEnvParameter[ndx] | float4 | | ARB_vertex_program; ndx must be greater than or equal to zero and less than the value of GL_MAX_PROGRAM_ENV_PARAMETERS_ARB for the GL_VERTEX_PROGRAM_ARB target to glGetProgramivARB
VertexLocalParameter[ndx] | float4 | | ARB_vertex_program; ndx must be greater than or equal to zero and less than the value of GL_MAX_PROGRAM_LOCAL_PARAMETERS_ARB for the GL_VERTEX_PROGRAM_ARB target to glGetProgramivARB
VertexProgram | compile statement | | ARB_vertex_program or NV_vertex_program
Similarly, there is a simple algorithm for determining the relationship
between enumerants for glEnable() and for glDisable() and each of the
states in the table below: for example, the state assignment BlendEnable =
false corresponds to a call to glDisable(GL_BLEND).
Table 7. Enable/Disable States
Enable/Disable State Name Type Requires
AlphaTestEnable bool OpenGL 1.0
AutoNormalEnable bool 1.0
BlendEnable bool 1.0
ClipPlaneEnable[ndx] bool 1.0; ndx must be greater or equal to zero and less
than the value of GL_MAX_CLIP_PLANES
ColorLogicOpEnable bool 1.2
CullFaceEnable bool 1.0
DepthBoundsEnable bool EXT_depth_bounds
DepthClampEnable bool NV_depth_clamp
DepthTestEnable bool 1.0
DitherEnable bool 1.0
FogEnable bool 1.0
LightEnable[ndx] bool 1.0; ndx must be greater or equal to 0 and less than
the value of GL_MAX_LIGHTS
LightingEnable bool 1.0
LightModelLocalViewerEnable bool 1.0
LightModelTwoSideEnable bool 1.0
LineSmoothEnable bool 1.0
LineStippleEnable bool 1.0
LogicOpEnable bool 1.0
MultisampleEnable bool 1.3 or ARB_multisample
NormalizeEnable bool 1.0
PointSmoothEnable bool 1.0
PointSpriteEnable bool 2.0, ARB_point_sprite, or NV_point_sprite
PolygonOffsetFillEnable bool OpenGL 1.1
PolygonOffsetLineEnable bool 1.1
PolygonOffsetPointEnable bool 1.1
PolygonSmoothEnable bool 1.0
PolygonStippleEnable bool 1.0
RescaleNormalEnable bool 1.2 or EXT_rescale_normal
SampleAlphaToCoverageEnable bool 1.3 or ARB_multisample
SampleAlphaToOneEnable bool 1.3 or ARB_multisample
SampleCoverageEnable bool 1.3 or ARB_multisample
ScissorTestEnable bool 1.0
StencilTestEnable bool 1.0
TexGenSEnable[ndx] bool 1.0; ndx must be greater or equal to zero and less
than the value of GL_MAX_TEXTURE_COORDS
TexGenTEnable[ndx] bool Same as TexGenSEnable
TexGenREnable[ndx] bool Same as TexGenSEnable
TexGenQEnable[ndx] bool Same as TexGenSEnable
Texture1DEnable[ndx] bool 1.0; ndx must be greater or equal to zero and less
than the value of GL_MAX_TEXTURE_IMAGE_UNITS
Texture2DEnable[ndx] bool same as Texture1DEnable
Texture3DEnable[ndx] bool 1.2 or EXT_texture3D; ndx must be greater or
equal to zero and less than the value of
GL_MAX_TEXTURE_IMAGE_UNITS
TextureRectangleEnable[ndx] bool ARB_texture_rectangle, EXT_texture_rectangle (Apple), or
NV_texture_rectangle; ndx must be greater or equal to zero and less
than the value of GL_MAX_TEXTURE_IMAGE_UNITS
TextureCubeMapEnable[ndx] bool OpenGL 1.3, ARB_texture_cube_map, or
EXT_texture_cube_map; ndx must be greater or equal to zero and less
than the value of GL_MAX_TEXTURE_IMAGE_UNITS
OpenGL Sampler State
The following table lists the state assignments available in sampler_state
blocks when using the CgFX OpenGL state manager. Any state values given
are set when the cgSetSamplerState() routine is called with the
CGparameter handle for a particular sampler.
Note that some of these states are defined in OpenGL extensions; for
example, MirrorClampToBorder is defined in the
EXT_texture_mirror_clamp extension. Any state used that is based on an
extension not supported by the current OpenGL context is ignored by the
CgFX runtime.
Table 8. sampler_state State Assignments
Name | Type | Valid Values | Requires
WrapS, WrapT, WrapR | int | Repeat, Clamp, ClampToEdge, ClampToBorder, MirroredRepeat, MirrorClamp, MirrorClampToEdge, MirrorClampToBorder | OpenGL 1.2 or EXT_texture3D for WrapR; 1.2 or EXT_texture_edge_clamp for ClampToEdge; 1.3 or ARB_texture_border_clamp for ClampToBorder; 1.4, ARB_texture_mirrored_repeat, or IBM_texture_mirrored_repeat for MirroredRepeat; EXT_texture_mirror_clamp or ATI_texture_mirror_once for MirrorClamp or MirrorClampToEdge; EXT_texture_mirror_clamp for MirrorClampToBorder
BorderColor | float4 | | OpenGL 1.0
CompareMode | int | None, CompareRToTexture | 1.4 or ARB_shadow
CompareFunc | int | Never, Less, LEqual, Equal, Greater, NotEqual, GEqual, Always | 1.4 or ARB_shadow; 1.5 or EXT_shadow_funcs for Never, Less, Equal, Greater, NotEqual, or Always
DepthMode | int | Alpha, Intensity, Luminance | 1.4 or ARB_depth_texture
GenerateMipMap | bool | | 1.4 or SGIS_generate_mipmap
LODBias | float | | 1.4
MinFilter | int | Nearest, Linear, LinearMipMapNearest, NearestMipMapNearest, NearestMipMapLinear, LinearMipMapLinear | 1.0
MagFilter | int | Nearest, Linear | 1.0
MaxMipLevel | float | | 1.2 or EXT_texture_lod
MaxAnisotropy | float | | EXT_texture_filter_anisotropic
MinMipLevel | float | | 1.2 or EXT_texture_lod
Texture | texture | (Reference to texture parameter) |
OpenGL State Not Specifiable with State Assignments
By design, state assignments are limited to OpenGL state related to
rendering geometric primitives. OpenGL state that is not assignable using
the built-in OpenGL state manager includes the following:
Pixel path state (such as pixel transfer and convolution state)
Per-vertex attributes (such as glColor or glNormal)
Client-side state such as vertex arrays and pixel store modes
Vertex and pixel buffer object state
Miscellaneous state for evaluators, feedback, selection, or occlusion
queries
Texture environment GL_COMBINE state
  Although related to rendering, it is complex and redundant with
  fragment color operations better specified with Cg fragment programs.
Future enhancements may allow assignments for currently unassignable
OpenGL state.
A Brief Tutorial
This section walks you through the sample Cg Microsoft Visual Studio
workspace we have provided, along with a simple Cg program that you can
use for experimentation.
Loading the Workspace
When you load the Cg_Simple file, your workspace should look like the
image in Fig. 3.
Fig. 3. The Cg_Simple Workspace
As usual, click the FileView tab to view the various files in the project.
What's different in this case, though, is that in addition to the usual Source
Files and Header Files folders, there is also a Cg Programs folder.
This Cg Programs folder should contain one Cg program, simple.cg, which
is what you can use for experimentation. Double-click simple.cg to open it
for editing. While you are editing simple.cg, you can press Control+F7 at
any time to compile it. Because of the way the project is set up, any errors in
your code will be shown just as when you compile a normal C or C++
program.
You can also double-click on an error, which takes you to the location in the
source code that caused the error.
Understanding simple.cg
The Cg_Simple application runs the shader defined in simple.cg on a torus.
The provided version of simple.cg calculates diffuse and specular lighting
for each vertex. A screenshot of the shader is shown in Fig. 4.
Fig. 4. The simple.cg Shader
Program Listing for simple.cg
The following is the program listing for simple.cg:
// Define inputs from application.
struct appin
{
    float4 Position : POSITION;
    float4 Normal   : NORMAL;
};
// Define outputs from vertex shader.
struct vertout
{
    float4 HPosition : POSITION;
    float4 Color     : COLOR;
};
vertout main(appin IN,
             uniform float4x4 ModelViewProj,
             uniform float4x4 ModelViewIT,
             uniform float4 LightVec)
{
    vertout OUT;
    // Transform vertex position into homogeneous clip-space.
    OUT.HPosition = mul(ModelViewProj, IN.Position);
    // Transform normal from model-space to view-space.
    float3 normalVec = normalize(mul(ModelViewIT,
                                     IN.Normal).xyz);
    // Store normalized light vector.
    float3 lightVec = normalize(LightVec.xyz);
    // Calculate half angle vector.
    float3 eyeVec = float3(0.0, 0.0, 1.0);
    float3 halfVec = normalize(lightVec + eyeVec);
    // Calculate diffuse component.
    float diffuse = dot(normalVec, lightVec);
    // Calculate specular component.
    float specular = dot(normalVec, halfVec);
    // Use the lit function to compute lighting vector from
    // diffuse and specular values.
    float4 lighting = lit(diffuse, specular, 32);
    // Blue diffuse material
    float3 diffuseMaterial = float3(0.0, 0.0, 1.0);
    // White specular material
    float3 specularMaterial = float3(1.0, 1.0, 1.0);
    // Combine diffuse and specular contributions and
    // output final vertex color.
    OUT.Color.rgb = lighting.y * diffuseMaterial +
                    lighting.z * specularMaterial;
    OUT.Color.a = 1.0;
    return OUT;
}
Definitions for Structures with Varying Data
The first thing to notice is the definitions of structures with binding
semantics for varying data.
Let's take a look at the appin structure:
// define inputs from application
struct appin
{
    float4 Position : POSITION;
    float4 Normal   : NORMAL;
};
This structure contains only two members: Position and Normal. Because
this data varies per vertex, the binding semantics POSITION and NORMAL tell
the compiler that the position information is associated with the predefined
attribute POSITION and that the normal information is associated with the
predefined attribute NORMAL.
The other structure that is defined in simple.cg is vertout, which connects
the vertex to the fragment:
// define outputs from vertex shader
struct vertout
{
    float4 HPosition : POSITION;
    float4 Color     : COLOR;
};
The vertout structure also contains only two members: HPosition, the
vertex position in homogeneous coordinates, and Color, the vertex color.
Again, binding semantics are used to specify register locations for the
variables. In this case, the homogeneous position information resides in the
hardware register corresponding to POSITION, and the color information
resides in the hardware register corresponding to COLOR.
Passing Arguments
Now let's take a look at the body of the program, section by section, starting
with the declaration of main():
vertout main(appin IN,
             uniform float4x4 ModelViewProj,
             uniform float4x4 ModelViewIT,
             uniform float4 LightVec)
As required for a vertex program, main() takes an application-to-vertex
structure as input and returns a vertex-to-fragment structure. In this case, we
are using the two structure types we have already defined: appin and
vertout. Notice that main() takes in three uniform parameters: two
matrices and one vector. All three parameters are passed to simple.cg by
the application, using the runtime library.
The first matrix, ModelViewProj, is the concatenation of the modelview and
projection matrices. Together, these matrices transform points from model
space to clip space. The second matrix, ModelViewIT, is the inverse transpose
of the modelview matrix. The third parameter, LightVec, is a vector that
specifies the location of the light source.
Basic Transformations
Now we start the body of the vertex program:
    vertout OUT;
    OUT.HPosition = mul(ModelViewProj, IN.Position);
A vertex program is responsible for calculating the homogeneous clip-space
position of the vertex (given the vertex's model-space coordinates).
Therefore, the vertex's model-space position (given by IN.Position) needs
to be transformed by the concatenation of the modelview and projection
matrices (called ModelViewProj in this example). The transformed position
is assigned directly to OUT.HPosition. Note that you are not responsible for
the perspective division when using vertex programs. The hardware
automatically performs the division after executing the vertex program.
Since we want to do our lighting in eye space, we have to transform the
model-space normal IN.Normal to eye space:
    // transform normal from model-space to view-space
    float3 normalVec = normalize(mul(ModelViewIT,
                                     IN.Normal).xyz);
Remember that when transforming normals, we need to multiply by the
inverse transpose of the modelview matrix. Then we normalize the eye-space
normal vector and store it as normalVec.
Prepare for Lighting
The subsequent steps prepare for lighting:
    // store normalized light vector
    float3 lightVec = normalize(LightVec.xyz);
    // calculate half angle vector
    float3 eyeVec = float3(0.0, 0.0, 1.0);
    float3 halfVec = normalize(lightVec + eyeVec);
At this point we have to ensure that all our vectors are normalized. We start
by normalizing LightVec (see footnote 1). Then, in preparation for specular
lighting, we have to define the "half angle" vector halfVec, which is the
vector halfway between the light and the eye vectors (that is,
(lightVec + eyeVec)/2). We normalize halfVec, so we don't need to bother
with the division by two, because it cancels out after normalization anyway.
In this example, we assume that the eye is at (0, 0, 1), but an application
would typically pass the eye position also as a uniform parameter, since it
would be unchanged from vertex to vertex. We use Cg's inline vector
construction capability to build a 3-component float vector that contains the
eye position, and then we assign this value to eyeVec.
1. Because LightVec is uniform, it is more efficient to normalize it once in the
application rather than on a per-vertex basis. It is done here for illustrative
purposes.
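Two claims from this part of the tutorial can be checked numerically with a small CPU-side Python sketch: that the division by two drops out of the normalized half-angle vector, and that normals need the inverse transpose of the modelview matrix to stay perpendicular to the surface. The sample vectors and the diagonal (non-uniform scale) modelview matrix are assumptions chosen for illustration; they do not come from the Cg_Simple application.

```python
import math


def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mul_diag(diag, v):
    """Multiply a diagonal matrix (given by its diagonal) with a vector."""
    return tuple(d * c for d, c in zip(diag, v))


# 1. Half-angle vector: normalize((L + E) / 2) equals normalize(L + E).
light_vec = normalize((1.0, 2.0, 3.0))   # assumed light direction
eye_vec = (0.0, 0.0, 1.0)                # eye direction used in simple.cg
summed = tuple(l + e for l, e in zip(light_vec, eye_vec))
half_with_division = normalize(tuple(c / 2.0 for c in summed))
half_without_division = normalize(summed)

# 2. Normals under a non-uniform scale. For a diagonal matrix the inverse
#    transpose is simply the reciprocal of each diagonal entry.
model_view = (2.0, 1.0, 1.0)
model_view_it = tuple(1.0 / d for d in model_view)
tangent = (1.0, 1.0, 0.0)    # lies in the surface
normal = (1.0, -1.0, 0.0)    # perpendicular to the tangent

moved_tangent = mul_diag(model_view, tangent)
wrong_normal = mul_diag(model_view, normal)      # no longer perpendicular
right_normal = mul_diag(model_view_it, normal)   # still perpendicular

print(dot(moved_tangent, wrong_normal), dot(moved_tangent, right_normal))
```

Transforming the normal with the modelview matrix itself leaves a non-zero dot product with the transformed tangent, while the inverse transpose keeps the two perpendicular.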
Calculating the Vertex Color
Now we have to calculate the vertex color to output.
Calculating the Diffuse and Specular Lighting Contributions
In this example, we're going to calculate just a simple combination of diffuse
and specular lighting:
    // calculate diffuse component
    float diffuse = dot(normalVec, lightVec);
    // calculate specular component
    float specular = dot(normalVec, halfVec);
    // Use the lit function to compute lighting vector from
    // diffuse and specular values
    float4 lighting = lit(diffuse, specular, 32);
Here we use the Cg Standard Library to perform dot products (using dot()).
We also make use of the Standard Library's lit() function to calculate a
Blinn-style lighting vector based on the previously computed dot products.
The returned vector holds the diffuse lighting contribution in the y
coordinate, and the specular lighting contribution in the z coordinate.
Remember to take advantage of the Standard Library to help speed up your
development cycle.
Modulating the Diffuse and Specular Lighting Contributions
Once the diffuse and specular lighting contributions lighting.y and
lighting.z have been calculated, we need to modulate them with the
object's material properties:
    // blue diffuse material
    float3 diffuseMaterial = float3(0.0, 0.0, 1.0);
    // white specular material
    float3 specularMaterial = float3(1.0, 1.0, 1.0);
    // combine diffuse and specular contributions and
    // output final vertex color
    OUT.Color.rgb = lighting.y * diffuseMaterial +
                    lighting.z * specularMaterial;
    OUT.Color.a = 1.0;
    return OUT;
We define the object's diffuse material color as blue. We modulate the
lighting contributions with the material properties to get the final vertex
color, and we assign it to the output structure's color field, OUT.Color.
Finally, we set the alpha channel of the final color to 1.0, so that our object
will be opaque, and return the computed position and color values stored in
the OUT structure.
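The color calculation can be mirrored on the CPU with a short Python sketch. The lit() stand-in below follows the behavior this section describes (diffuse in y, specular in z) and, as an assumption based on the usual description of the Standard Library function, clamps negative dot products to zero and suppresses the specular term when the surface faces away from the light. The function name shade is hypothetical.

```python
def lit(n_dot_l, n_dot_h, m):
    """CPU stand-in for lit(): returns (ambient, diffuse, specular, 1)."""
    diffuse = max(0.0, n_dot_l)
    # Assumed behavior: no specular highlight on surfaces facing away.
    specular = max(0.0, n_dot_h) ** m if n_dot_l > 0.0 else 0.0
    return (1.0, diffuse, specular, 1.0)


def shade(n_dot_l, n_dot_h, shininess=32):
    """Combine the lighting terms with the materials used in simple.cg."""
    lighting = lit(n_dot_l, n_dot_h, shininess)
    diffuse_material = (0.0, 0.0, 1.0)    # blue
    specular_material = (1.0, 1.0, 1.0)   # white
    rgb = tuple(lighting[1] * d + lighting[2] * s
                for d, s in zip(diffuse_material, specular_material))
    return rgb + (1.0,)                   # opaque alpha


print(shade(0.5, 1.0))    # prints (1.0, 1.0, 1.5, 1.0)
print(shade(-0.2, 0.9))   # prints (0.0, 0.0, 0.0, 1.0)
```

The first result exceeds 1.0 in the blue channel; on the hardware the output color would be clamped to the displayable range, which this sketch does not model.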
Further Experimentation
Use simple.cg as a framework to try more advanced experiments, perhaps by
adding more parameters to the program or by performing more complex
calculations in the vertex program. Have fun experimenting!
Advanced Profile Sample Shaders
This chapter provides a set of advanced profile sample shaders written in Cg.
Each shader comes with an accompanying snapshot, description, and source
code.
Examples shown are
Improved Skinning
Improved Water
Melting Paint
MultiPaint
Ray-Traced Refraction
Skin
Thin Film Effect
Car Paint 9
Improved Skinning
Description
This shader takes in a set of all the transformation matrices that can affect a
particular bone. Each bone also sends in a list of matrices that affect it. There
is then a simple loop that for each vertex goes through each bone that affects
that vertex and transforms it. This allows just one Cg program to do the
entire skinning for vertices affected by any number of bones, instead of
having one program for one bone, another program for two bones, and so on.
Fig. 5. Example of Improved Skinning
Vertex Shader Source Code for Improved Skinning
struct inputs
{
    float4 position      : POSITION;
    float4 weights       : BLENDWEIGHT;
    float4 normal        : NORMAL;
    float4 matrixIndices : TESSFACTOR;
    float4 numBones      : SPECULAR;
};
struct outputs
{
    float4 hPosition : POSITION;
    float4 color     : COLOR0;
};
outputs main(inputs IN,
             uniform float4x4 modelViewProj,
             uniform float3x4 boneMatrices[30],
             uniform float4 color,
             uniform float4 lightPos)
{
    outputs OUT;
    float4 index = IN.matrixIndices;
    float4 weight = IN.weights;
    // start the accumulators at zero before summing bone contributions
    float4 position = float4(0.0, 0.0, 0.0, 0.0);
    float3 normal = float3(0.0, 0.0, 0.0);
    for (float i = 0; i < IN.numBones.x; i += 1) {
        // transform the offset by bone i
        position = position + weight.x *
            float4(mul(boneMatrices[index.x], IN.position).xyz,
                   1.0);
        // transform normal by bone i
        normal = normal + weight.x *
            mul((float3x3)boneMatrices[index.x],
                IN.normal.xyz).xyz;
        // shift over the index/weight variables; this moves
        // the index and weight for the next bone into
        // the .x component of the index and weight variables
        index = index.yzwx;
        weight = weight.yzwx;
    }
    normal = normalize(normal);
    OUT.hPosition = mul(modelViewProj, position);
    OUT.color = dot(normal, lightPos.xyz) * color;
    return OUT;
}
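The loop structure of the skinning shader, including the .yzwx rotation that brings the next bone's index and weight into the first component, can be sketched in Python for the position only. The function names and the two sample bone matrices (an identity and a translation by 2 along x) are assumptions made for illustration.

```python
def transform(matrix_3x4, point):
    """Multiply a 3x4 matrix (three rows of four) with a 4-component point."""
    return tuple(sum(row[k] * point[k] for k in range(4)) for row in matrix_3x4)


def skin_position(position, weights, indices, num_bones, bone_matrices):
    """Blend the position over num_bones bones, as the shader loop does."""
    index = list(indices)
    weight = list(weights)
    blended = [0.0, 0.0, 0.0, 0.0]       # accumulator starts at zero
    for _ in range(num_bones):
        x, y, z = transform(bone_matrices[int(index[0])], position)
        contribution = (x, y, z, 1.0)
        for k in range(4):
            blended[k] += weight[0] * contribution[k]
        # Equivalent of index.yzwx / weight.yzwx: rotate left by one.
        index = index[1:] + index[:1]
        weight = weight[1:] + weight[:1]
    return tuple(blended)


IDENTITY = ((1.0, 0.0, 0.0, 0.0),
            (0.0, 1.0, 0.0, 0.0),
            (0.0, 0.0, 1.0, 0.0))
SHIFT_X = ((1.0, 0.0, 0.0, 2.0),
           (0.0, 1.0, 0.0, 0.0),
           (0.0, 0.0, 1.0, 0.0))

result = skin_position((1.0, 0.0, 0.0, 1.0),
                       weights=(0.5, 0.5, 0.0, 0.0),
                       indices=(0, 1, 0, 0),
                       num_bones=2,
                       bone_matrices=[IDENTITY, SHIFT_X])
print(result)  # prints (2.0, 0.0, 0.0, 1.0)
```

With equal weights the vertex lands halfway between the two bone transforms, and the w component equals the sum of the weights used.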
Improved Water
Description
This demo gives the appearance that the viewer is surrounded by a large grid
of vertices (because of the free rotation), but switching to wireframe or
increasing the frustum angle makes it apparent that the vertices are a static
mesh with the height, normal, and texture coordinates being calculated on
the fly based on the direction and height of the viewer. This technique allows
for very GPU-friendly water animations because the static mesh can be
precomputed. The vertices are displaced using sine waves, and in this
example a loop is used to sum five sine waves to achieve realistic effects.
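The displacement by summed sine waves can be sketched in Python. The per-wave formula follows the calcWave() helper in this sample's vertex shader, and each wave is packed as (direction x, direction y, frequency, height) like the WaveData entries; the concrete wave values used in the example are assumptions, not the demo's data.

```python
import math


def calc_wave(dampening, view_xy, wave_time, height, frequency, wave_dir):
    """Return (displacement, 2D normal offset) for a single sine wave."""
    phase = frequency * (view_xy[0] * wave_dir[0] +
                         view_xy[1] * wave_dir[1]) + wave_time
    disp = height * math.sin(phase) / dampening
    slope = -math.cos(phase) * height * frequency / (0.4 * dampening)
    return disp, (slope * wave_dir[0], slope * wave_dir[1])


def sum_waves(view_xy, time, wave_data, dampening=1.0):
    """Accumulate displacement and normal offset over all waves."""
    total_disp = 0.0
    total_normal = [0.0, 0.0]
    for dir_x, dir_y, frequency, height in wave_data:
        wave_time = time * frequency
        disp, norm = calc_wave(dampening, view_xy, wave_time,
                               height, frequency, (dir_x, dir_y))
        total_disp += disp
        total_normal[0] += norm[0]
        total_normal[1] += norm[1]
    return total_disp, tuple(total_normal)


# Five identical waves travelling along +x (assumed values).
waves = [(1.0, 0.0, 1.0, 2.0)] * 5
height_at_crest, normal_at_crest = sum_waves((math.pi / 2, 0.0), 0.0, waves)
print(round(height_at_crest, 6))  # prints 10.0
```

At a wave crest the displacement is the sum of the wave heights and the normal offset vanishes, which is what the surface of a sine wave should look like at its peak.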
Fig. 6. Example of Improved Water
Vertex Shader Source Code for Improved Water
struct app2vert
{
float4 Position : POSITION;
};
struct vert2frag
{
float4 HPosition : POSITION;
float4 TexCoord0 : TEXCOORD0;
float4 TexCoord1 : TEXCOORD1;
float4 Color0 : COLOR0;
float4 Color1 : COLOR1;
};
void calcWave(out float disp, out float2 normal,
float dampening, float3 viewPosition,
float waveTime, float height,
float frequency, float2 waveDirection)
{
float distance1 = dot(viewPosition.xy, waveDirection);
distance1 = frequency * distance1 + waveTime;
disp = height * sin(distance1) / dampening;
normal = -cos(distance1) * height * frequency *
(waveDirection.xy) / (.4*dampening);
}
vert2frag main(
app2vert IN,
uniform float4x4 ModelViewProj,
uniform float4x4 ModelView,
uniform float4x4 ModelViewIT,
uniform float4x4 TextureMat,
uniform float Time,
uniform float4 Wave1,
uniform float4 Wave1Origin,
uniform float4 Wave2,
uniform float4 Wave2Origin,
const uniform float4 WaveData[5])
{
vert2frag OUT;
float4 position = float4(IN.Position.x, 0,
IN.Position.y,1);
float4 normal = float4(0,1,0,0);
float dampening = 1 + dot(position.xyz, position.xyz)/1000;
float i, disp;
float2 norm;
for (i = 0; i < 5; i = i + 1)
{
float waveTime = Time.x * WaveData[i].z;
float frequency = WaveData[i].z;
float height = WaveData[i].w;
float2 waveDir = WaveData[i].xy;
calcWave(disp, norm, dampening, IN.Position.xyz,
waveTime, height, frequency, waveDir);
position.y = position.y + disp;
normal.xz = normal.xz + norm;
}
OUT.HPosition = mul(ModelViewProj, position);
// transform normal into eye-space
normal = mul(ModelViewIT, normal);
normal.xyz = normalize(normal.xyz);
// get a vector from the vertex to the eye
float3 eyeToVert = mul(ModelView, position).xyz;
eyeToVert = normalize(eyeToVert);
// calculate the reflected vector for cubemap look-up
float4 reflected = mul(TextureMat,
reflect(eyeToVert, normal.xyz).xyzz);
// output two reflection vectors for the two
// environment cubemaps
OUT.TexCoord0 = reflected;
OUT.TexCoord1 = reflected;
// Calculate a fresnel term (note that f0 = 0)
float fres = 1+dot(eyeToVert,normal.xyz);
fres = pow(fres, 5);
// set the two color coefficients (the magic constants
// are arbitrary), these two color coefficients are used
// to calculate the contribution from each of the two
// environment cubemaps (one bright, one dark)
OUT.Color0 = (fres*1.4 + min(reflected.y,0)).xxxx +
float4(.2,.3,.3,0);
OUT.Color1 = (fres*1.26).xxxx;
return OUT;
}
Pixel Shader Source Code for Improved Water
float4 main(in float3 color0 : COLOR0,
in float3 color1 : COLOR1,
in float3 reflectVec : TEXCOORD0,
in float3 reflectVecDark : TEXCOORD1,
uniform samplerCUBE environmentMaps[2]
) : COLOR
{
float3 reflectColor = texCUBE(environmentMaps[0],
reflectVec).rgb;
float3 reflectColorDark = texCUBE(environmentMaps[1],
reflectVecDark).rgb;
float3 color = (reflectColor * color0) +
(reflectColorDark * color1);
return float4(color, 1.0);
}
808-00504-0000-006 161
NVIDIA
Advanced Profile Sample Shaders
Melting Paint
Description
This shader uses an environment map with procedurally modified texture
lookups to create a melting effect on the surface texture (the NVIDIA logo in
this example). The reflection vector is shifted using a noise function, giving
the appearance of a bumpy surface. The surface texture's texture coordinates
are shifted in a time-dependent manner, also based on a noise texture.
Fig. 7. Example of Melting Paint
Vertex Shader Source Code for Melting Paint
// define inputs from application
struct app2vert
{
float4 Position : POSITION;
float4 Normal : NORMAL;
float4 Color0 : COLOR0;
float4 TexCoord0 : TEXCOORD0;
};
struct vert2frag
{
float4 HPosition : POSITION;
float3 OPosition : TEXCOORD2;
float3 EPosition : TEXCOORD3;
float3 Normal : TEXCOORD1;
float3 TexCoord0 : TEXCOORD0;
float4 Color0 : COLOR0;
float3 LightPos : TEXCOORD4;
float3 ViewerPos : TEXCOORD5;
};
vert2frag main(app2vert In,
uniform float4x4 ModelViewProj,
uniform float4x4 ModelView,
uniform float4x4 ModelViewI,
uniform float4 ViewerPos,
uniform float4 LightPos)
{
vert2frag Out;
// Vertex positions:
// In clip space
Out.HPosition = mul(ModelViewProj, In.Position);
// In object space
Out.OPosition = In.Position.xyz;
// In eye space
Out.EPosition = mul(ModelView, In.Position).xyz;
Out.Normal = normalize(In.Normal.xyz);
// Copy the texture coordinates
Out.TexCoord0 = In.TexCoord0.xyz;
// Generate a white color
Out.Color0 = LightPos;
Out.LightPos = mul(ModelViewI, LightPos).xyz;
Out.ViewerPos = mul(ModelViewI, float4(0,0,0,1)).xyz;
return Out;
}
Pixel Shader Source Code for Melting Paint
struct vert2frag
{
float4 HPosition : POSITION;
float3 OPosition : TEXCOORD2;
float3 EPosition : TEXCOORD3;
float3 Normal : TEXCOORD1;
float3 TexCoord0 : TEXCOORD0;
float4 Color0 : COLOR0;
float3 LightPos : TEXCOORD4;
float3 ViewerPos : TEXCOORD5;
};
void calcLighting(out float diffuse, out float specular,
float3 normal, float3 fragPos, float3 lightPos,
float3 eyePos, float specularExp)
{
float3 light = lightPos - fragPos;
float len = length(light);
light = light / len;
float3 eye = normalize(eyePos - fragPos);
float3 halfVec = normalize(eye + light);
float attenuation = 1. / (.3 * len);
float4 lighting = lit(dot(light, normal),
dot(halfVec, normal), specularExp);
diffuse = lighting.y * attenuation;
specular = lighting.z * attenuation;
}
float4 main(vert2frag IN,
uniform float4 LightPos,
uniform sampler3D noise_map,
uniform sampler2D nv_map,
uniform samplerCUBE cube_map,
uniform float4 interpolate
) : COLOR
{
float diffuse, specular;
float3 biVariate = float3(IN.OPosition.x-IN.OPosition.z,
IN.OPosition.y+IN.OPosition.z, 0);
float3 uniVariate = float3(IN.OPosition.x+IN.OPosition.z,
0, 0);
float3 normal = normalize(IN.Normal);
float3 noiseTex = float3((IN.OPosition.x+IN.OPosition.z)*6,
IN.OPosition.y/2, 0);
float3 noiseSum = tex3D(noise_map, biVariate/3).rgb/12 +
tex3D(noise_map, noiseTex).rgb/18 +
tex3D(noise_map, biVariate*6).rgb/18;
normal = normalize(normal + noiseSum);
calcLighting(diffuse, specular, normal, IN.OPosition,
IN.LightPos, IN.ViewerPos, 32);
float3 nvShift = tex3D(noise_map, uniVariate/3).rgb / 2 +
tex3D(noise_map, uniVariate).rgb / 4 +
tex3D(noise_map, biVariate*3).rgb / 16;
nvShift.x = nvShift.x*nvShift.x * interpolate.x * 3;
nvShift.y = 0;
biVariate = float3(IN.OPosition.x - IN.OPosition.z,
IN.OPosition.y, 0);
float2 texCoord = biVariate.xy/4 + float2(1.1, .5) +
nvShift.yx + float2(0, interpolate.x/8);
float3 nvDecal =
tex2D(nv_map, float2(1-texCoord.x, texCoord.y)).rgb *
(1-interpolate.x * .7).xxx;
float3 eye = IN.ViewerPos - IN.OPosition;
float3 lightMetal = texCUBE(cube_map,
reflect(normal, eye)).rgb;
float3 darkMetal = (diffuse * float3(.5,.25,0) +
specular * float3(.7,.4,0));
float3 finalColor = lerp(lightMetal, darkMetal, nvDecal.x);
return float4(finalColor, 1);
}
MultiPaint
Description
MultiPaint presents a single-pass solution to a common production problem:
mixing multiple kinds of materials on a single polygonal surface. MultiPaint
provides a simple BRDF (bidirectional reflectance distribution function) that
is still complex enough to represent many common metallic and dielectric
surfaces, and controls all key factors of the variable BRDF through texturing.
This permits you to create multiple materials without switching shaders,
splitting your model, or resorting to multiple passes.
Uses for MultiPaint might include complex armor built of inlaid metals,
woods, and stones, all modeled on a single, simple polymesh; buildings
composed of multiple types of stone, glass, and metal, expressed as simple
cubes; cloth with inlaid metallic threads; or as in this demo, metal partially
covered with peeling paint.
Using multiple BRDFs is common in the offline world, but rarely optimized;
instead, two different shaders may be evaluated and their results blended
using a mask texture or chained through if statements. For maximum
real-time performance, MultiPaint instead integrates all of the key parts of the
BRDFs as multiple painted textures so that only one pass through the shader
is required to create the mixed appearance. This permits a single-pass shader
containing diffuse, specular, and environmental lighting effects in a compact,
fast-executing package.
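How painted texture channels can steer a single BRDF may be easier to see in a small Python sketch. It mirrors two expressions from the MultiPaint pixel shader in this chapter: the specular color blended between white and the surface color by the "metalness" channel, and the specular exponent interpolated between a minimum and maximum power. The function names and sample values are assumptions for illustration.

```python
def lerp(a, b, t):
    """Component-wise linear interpolation, like Cg's lerp()."""
    return tuple(x + t * (y - x) for x, y in zip(a, b))


WHITE = (1.0, 1.0, 1.0, 1.0)


def specular_color(surface_color, metalness):
    """Dielectrics (metalness 0) reflect white; metals tint the highlight."""
    return lerp(WHITE, surface_color, metalness)


def specular_power(min_power, max_power, normalized_exponent):
    """Map a 0..1 texture channel onto the [min_power, max_power] range."""
    return min_power + normalized_exponent * (max_power - min_power)


copper = (0.8, 0.5, 0.2, 1.0)           # assumed surface color
print(specular_color(copper, 0.0))      # prints (1.0, 1.0, 1.0, 1.0)
print(specular_power(4.0, 64.0, 0.5))   # prints 34.0
```

Because both quantities are read per pixel from textures, neighboring pixels can behave like different materials without a shader switch.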
Fig. 8. Example of MultiPaint
Vertex Shader Source Code for MultiPaint
// define inputs from vertex buffer
struct appin
{
float4 Position : POSITION;
float4 UV : TEXCOORD0;
float4 Tangent : TEXCOORD1;
float4 Binormal : TEXCOORD2;
float4 Normal : TEXCOORD3;
};
// output -- same struct is the input to "cg_multipaint.cg"
struct MultiPaintV2F {
float4 HPosition : POSITION; // position (clip space)
float4 TexCoords : TEXCOORD0; // base ST coordinates
float3 OPosition : TEXCOORD1; // position (obj space)
float3 Normal : TEXCOORD2; // normal (eye space)
float3 VPosition : TEXCOORD3; // view pos (obj space)
float3 T : TEXCOORD4; // tangent (obj space)
float3 B : TEXCOORD5; // binormal (obj space)
float3 N : TEXCOORD6; // normal (obj space)
float4 LightVecO : TEXCOORD7; // light dir (obj space)
};
MultiPaintV2F main(appin IN,
uniform float4x4 ModelViewProj,
uniform float4x4 ModelViewIT,
uniform float4x4 ModelViewI,
uniform float4 TexRepeats,
uniform float4 LightVec) // (eye space)
{
MultiPaintV2F OUT;
OUT.HPosition = mul(ModelViewProj, IN.Position);
// pass through object-space position
OUT.OPosition = IN.Position.xyz;
// transform normal to eye space
OUT.Normal = normalize(mul(ModelViewIT, IN.Normal).xyz);
OUT.TexCoords = IN.UV * TexRepeats;
// pass through object-space normal, tangent, binormal.
OUT.N = normalize(IN.Normal.xyz);
OUT.T = IN.Tangent.xyz;
OUT.B = IN.Binormal.xyz;
// transform view pos (origin) to obj space
OUT.VPosition = mul(ModelViewI, float4(0,0,0,1)).xyz;
// transform light vector to obj space
OUT.LightVecO = mul(ModelViewI, LightVec);
return OUT;
}
Pixel Shader Source Code for MultiPaint
#define WHITE half4(1.0h,1.0h,1.0h,1.0h)
// input -- same struct is output from "cg_multipaintVP.cg"
struct MultiPaintV2F {
float4 HPosition : POSITION; // position (clip space)
float4 TexCoords : TEXCOORD0; // base ST coordinates
float3 OPosition : TEXCOORD1; // position (obj space)
float3 Normal : TEXCOORD2; // normal (eye space)
float3 VPosition : TEXCOORD3; // view pos (obj space)
float3 T : TEXCOORD4; // tangent (obj space)
float3 B : TEXCOORD5; // binormal (obj space)
float3 N : TEXCOORD6; // normal (obj space)
float4 LightVecO : TEXCOORD7; // light dir (obj space)
};
// channels in our material map:
#define SPEC_STR x
#define METALNESS y
#define NORM_SPEC_EXPON z
// subfields in "SpecData"
#define MINPOWER x
#define MAXPOWER y
#define MAXSPEC z
// subfields in "ReflData"
#define FRESNEL_MIN x
#define FRESNEL_MAX y
#define FRESNEL_EXPON z
#define REFL_STRENGTH w
// subfields in "BumpData"
#define BUMP_SCALE x
half4 main(MultiPaintV2F IN,
uniform sampler2D ColorMap, // color
uniform sampler2D MaterialMap, // see above
uniform sampler2D NormalMap, // tangent-space normals
uniform samplerCUBE EnvMap, // environment skybox
uniform float4 SpecData, // see above
uniform float4 ReflData, // see above
uniform float4 BumpData // see above
) : COLOR
{
half4 surfCol = tex2D(ColorMap, IN.TexCoords.xy);
half4 material = tex2D(MaterialMap, IN.TexCoords.xy);
half3 Nt = tex2D(NormalMap, IN.TexCoords.xy).rgb -
half3(0.5h,0.5h,0.5h);
// SpecData.MAXSPEC *should* range from 0 - 1.
half specStr = material.SPEC_STR * SpecData.MAXSPEC;
half specPower = SpecData.MINPOWER +
material.NORM_SPEC_EXPON *
(SpecData.MAXPOWER - SpecData.MINPOWER);
half3 Vn = -normalize(IN.VPosition - IN.OPosition);
half3 Ln = normalize(IN.LightVecO).xyz;
half3 Nb = normalize(BumpData.BUMP_SCALE *
(Nt.x*IN.T + Nt.y*IN.B) +
(Nt.z*IN.N));
half diff = dot(-Ln, Nb);
half3 Hn = -normalize(Vn + Ln);
half4 lighting = lit(diff, dot(Hn, Nb), specPower);
half4 diffResult = lighting.y * surfCol;
half4 specCol = lerp(WHITE, surfCol, material.METALNESS);
half4 specResult = lighting.z * specStr * specCol;
half3 reflVect = reflect(Vn, Nb);
half4 reflColor = texCUBE(EnvMap, reflVect);
half fakeFresnel = ReflData.FRESNEL_MIN +
ReflData.FRESNEL_MAX *
pow(saturate(1.0h-dot(-Vn,IN.N)),
ReflData.FRESNEL_EXPON);
half4 paintShine = fakeFresnel * reflColor;
half4 metalShine = surfCol * reflColor;
half4 shineCol = ReflData.REFL_STRENGTH *
lerp(paintShine, metalShine,
material.METALNESS);
half4 finalColor = specResult + diffResult + shineCol;
finalColor.w = 1.0h;
return finalColor;
}
Ray-Traced Refraction
Description
This shader presents a method for adding high-quality details to small
objects using a single-bounce, ray-traced pass. In this example, the polygonal
surface is sampled and a refraction vector is calculated. This vector is then
intersected with a plane that is defined as being perpendicular to the object's
x axis. The intersection point is calculated and used as texture indices for a
painted iris.
The demo permits varying the index of refraction, the depth and density of
the lens. Note that the choice of geometry is arbitrary; this sample is a
sphere, but any polygonal model can be used.
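The ray-plane intersection step can be sketched in Python. The function follows the intersect_plane() helper in this sample's pixel shader: the plane is stored as (A, B, C, D) with Ax + By + Cz + D = 0, only planes facing the ray are accepted, and -1 signals a miss. The sample ray and plane values are assumptions for illustration.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def intersect_plane(ray_origin, ray_dir, plane_eq):
    """Distance along a normalized ray to the plane, or -1.0 on a miss."""
    plane_normal = plane_eq[:3]
    denominator = dot(plane_normal, ray_dir)
    # denominator == 0: ray parallel to plane; > 0: plane faces away.
    if denominator < 0.0:
        top = dot(plane_normal, ray_origin) + plane_eq[3]
        return -top / denominator
    return -1.0


# Plane x = 0.5, perpendicular to the x axis (assumed iris depth).
iris_plane = (1.0, 0.0, 0.0, -0.5)
origin = (1.0, 0.0, 0.0)

t_hit = intersect_plane(origin, (-1.0, 0.0, 0.0), iris_plane)
hit_point = tuple(o + t_hit * d for o, d in zip(origin, (-1.0, 0.0, 0.0)))
t_miss = intersect_plane(origin, (1.0, 0.0, 0.0), iris_plane)
print(t_hit, hit_point, t_miss)  # prints 0.5 (0.5, 0.0, 0.0) -1.0
```

In the shader, the y and z coordinates of the hit point are then scaled and offset to form the texture coordinates of the iris image.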
Fig. 9. Example of Ray-Traced Refraction
Vertex Shader Source Code for Ray-Traced Refraction
struct appin
{
float4 Position : POSITION;
float4 Normal : NORMAL;
};
// output -- same struct is the input to fragment shader
struct EyeV2F {
float4 HPosition : POSITION; // clip space pos
float3 OPosition : TEXCOORD0; // Obj-coords location
float3 VPosition : TEXCOORD1; // eye pos (obj space)
float3 N : TEXCOORD2; // normal (obj space)
float4 LightVecO : TEXCOORD3; // light dir (obj sp)
};
EyeV2F main(appin IN,
uniform float4x4 ModelViewProj,
uniform float4x4 ModelViewI,
uniform float4 LightVec) // in EYE coords
{
EyeV2F OUT;
// calculate clip space position for rasterizer use
OUT.HPosition = mul(ModelViewProj, IN.Position);
// pass through object space position
OUT.OPosition = IN.Position.xyz;
// object-space normal
OUT.N = normalize(IN.Normal.xyz);
// transform view pos and light vec to obj space
OUT.VPosition = mul(ModelViewI, float4(0,0,0,1)).xyz;
OUT.LightVecO = normalize(mul(ModelViewI, LightVec));
return OUT;
}
Pixel Shader Source Code for Ray-Traced Refraction
// Assume ray direction is normalized.
// Vector "planeEq" is encoded half4(A,B,C,D) where
// (Ax+By+Cz+D)=0 and half3(A,B,C) has been normalized.
// Returns distance along the ray to the intersection; distance is
// negative if no intersection.
half intersect_plane(half3 rayOrigin,half3 rayDir,
half4 planeEq) {
half3 planeN = planeEq.xyz;
half denominator = dot(planeN, rayDir);
half result = -1.0h;
// d==0 -> parallel || d>0 -> faces away
if (denominator < 0.0h) {
half top = dot(planeN,rayOrigin) + planeEq.w;
result = -top/denominator;
}
return result;
}
// subfields in "BallData"
#define RADIUS x
#define IRIS_DEPTH y
#define ETA z
#define LENS_DENSITY w
// subfields in "GlossData"
#define PHONG x
#define GLOSS1 y
#define GLOSS2 z
#define DROP w
struct EyeV2F {
float4 HPosition : POSITION;
float3 OPosition : TEXCOORD0;
float3 VPosition : TEXCOORD1;
float3 N : TEXCOORD2;
float4 LightVecO : TEXCOORD3;
};
half4 main(EyeV2F IN,
uniform sampler2D ColorMap, // color
// components: {radius,irisDepth,eta,lensDensity}
uniform float4 BallData,
// components: {phongExp,gloss1,gloss2,drop}
uniform float4 GlossData,
uniform float3 AmbiColor,
uniform float3 DiffColor,
uniform float3 SpecColor,
uniform float3 LensColor,
uniform float3 BgColor) : COLOR
{
const half3 baseTex = half3(1.0h,1.0h,1.0h);
const half GRADE = 0.05h;
const half3 yAxis = half3(0.0h,1.0h,0.0h);
const half3 xAxis = half3(1.0h,0.0h,0.0h);
const half3 ballCtr = half3(0.0h,0.0h,0.0h);
// (actually constants - could be done in VP or on CPU)
half irisSize = BallData.RADIUS *
sqrt(1.0h-BallData.IRIS_DEPTH * BallData.IRIS_DEPTH);
half irisScale = 0.3333h / max(0.01h, irisSize);
half irisDist = BallData.RADIUS * BallData.IRIS_DEPTH;
half3 pupilCenter = ballCtr + half3(irisDist,0.0h,0.0h);
// if x axis, returns simple -irisDist
half D = -dot(pupilCenter, xAxis);
half slice = IN.OPosition.x - irisDist;
half4 planeEquation = half4(xAxis, D);
// view vector TO surface
half3 Vn = normalize(IN.OPosition - IN.VPosition);
half3 Nf = normalize(IN.N);
half3 Ln = IN.LightVecO.xyz;
half3 DiffLight = DiffColor * saturate(dot(Nf, -Ln));
half3 missColor = AmbiColor + baseTex * DiffLight;
half3 DiffPupil = AmbiColor + saturate(dot(xAxis, -Ln));
half3 halfAng = normalize(-Ln - Vn);
half ndh = abs(dot(Nf,halfAng));
half spec1 = pow(ndh, GlossData.PHONG);
half s2 = smoothstep(GlossData.GLOSS1, GlossData.GLOSS2,
spec1);
spec1 = lerp(GlossData.DROP, spec1, s2);
half3 SpecularLight = SpecColor * spec1;
half3 hitColor = missColor;
if (slice >= 0.0h) {
half gradedEta = BallData.ETA;
gradedEta = 1.0h/gradedEta;
half3 faceColor = BgColor;
half3 refVector = refract(Vn, Nf, gradedEta);
if (dot(refVector, refVector) > 0) {
// now let's intersect with the iris plane
half irisT = intersect_plane(IN.OPosition, refVector,
planeEquation);
half fadeT = irisT * BallData.LENS_DENSITY;
fadeT = fadeT * fadeT;
faceColor = DiffPupil.xxx;
if (irisT > 0) {
half3 irisPoint = IN.OPosition + irisT*refVector;
half3 irisST = (irisScale*irisPoint) +
half3(0.0h, 0.5h, 0.5h);
faceColor = tex2D(ColorMap, irisST.yz).rgb;
}
faceColor = lerp(faceColor, LensColor, fadeT);
hitColor = lerp(missColor, faceColor,
smoothstep(0.0h, GRADE, slice));
}
}
hitColor = hitColor + SpecularLight;
return half4(hitColor, 1.0h);
}
Skin
Description
This effect demonstrates some techniques for rendering skin ranging from
simple Blinn-Phong bump mapping to more complex subsurface scattering
lighting models. It also illustrates the use of “Rim” lighting and simple
translucency for capturing some of the more subtle properties of skin
resulting from complex, nonlocal lighting interactions. Finally, it shows how
the various techniques can be combined to produce compelling, stylized
skin.
Fig. 10. Example of Skin
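The subsurface term in the skin listing relies on a phase function helper, hgphase, whose form matches the Henyey-Greenstein phase function without the usual 1/(4π) normalization. The following is a minimal CPU-side sketch in Python (not Cg) of that expression, assuming scalar inputs; the names hg_phase and hg_phase3 are illustrative and do not appear in the toolkit.

```python
def hg_phase(cos_theta, g):
    """Unnormalized Henyey-Greenstein term: (1 - g^2) / (1 + g^2 - 2 g cos)^1.5."""
    g2 = g * g
    denom = (1.0 + g2 - 2.0 * g * cos_theta) ** 1.5
    return (1.0 - g2) / denom


# The skin listing evaluates three lobes at once, one per float3 component.
# These asymmetry values are the ones used for "g" in that listing.
G = (0.8, 0.3, 0.0)


def hg_phase3(cos_theta, g3=G):
    """Evaluate one lobe per component, mirroring the float3 arithmetic."""
    return tuple(hg_phase(cos_theta, g) for g in g3)
```

With g = 0 the term is constant (isotropic scattering); larger g concentrates scattering around cos_theta = 1, which is why the listing mixes a strongly forward lobe with two softer ones.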
Pixel Shader Source Code for Skin
struct fragin
{
float2 texcoords : TEXCOORD0;
float4 shadowcoords : TEXCOORD1;
float4 tangentToEyeMat0 : TEXCOORD4;
float3 tangentToEyeMat1 : TEXCOORD5;
float3 tangentToEyeMat2 : TEXCOORD6;
float3 eyeSpacePosition : TEXCOORD7;
};
float3 hgphase( float3 v1, float3 v2, float3 g )
{
float costheta;
float3 g2;
float3 gtemp;
costheta = dot( -v1, v2 );
g2 = g*g;
gtemp = 1.0.xxx + g2 - 2.0*g*costheta;
gtemp = pow( gtemp, 1.5.xxx );
gtemp = (1.0.xxx - g2) / gtemp;
return gtemp;
}
// Computes the single-scattering approximation to
// scattering from a one-dimensional volumetric surface.
float3 singleScatter( float3 wi, float3 wo, float3 n,
float3 g, float3 albedo,
float thickness )
{
float win = abs(dot(wi,n));
float won = abs(dot(wo,n));
float eterm;
float3 result;
eterm = 1.0 - exp( (-((1./win)+(1./won))*thickness) );
result = eterm * (albedo * hgphase( wo, wi, g ) /
(win + won));
return result;
}
// i is the incident ray
// n is the surface normal
// eta is the ratio of indices of refraction
// r is the reflected ray
// t is the transmitted ray
float fresnel( float3 i, float3 n, float eta,
out float3 r, out float3 t )
{
float result;
float c1;
float cs2;
float tflag;
// Refraction vector courtesy Paul Heckbert.
c1 = dot(-i,n);
cs2 = 1.0-eta*eta*(1.0-c1*c1);
tflag = (float) (cs2 >= 0.0);
t = tflag * (((eta*c1-sqrt(cs2))*n) + eta*i);
// t is already unit length or (0,0,0)
// Compute Fresnel terms
// (From Global Illumination Compendium.)
float ndott;
float cosr_div_cosi;
float cosi_div_cosr;
float fs;
float fp;
float kr;
ndott = dot(-n,t);
cosr_div_cosi = ndott / c1;
cosi_div_cosr = c1 / ndott;
fs = (cosr_div_cosi - eta) / (cosr_div_cosi + eta);
fs = fs * fs;
fp = (cosi_div_cosr - eta) / (cosi_div_cosr + eta);
fp = fp * fp;
kr = 0.5 * (fs+fp);
result = tflag*kr + (1.-tflag);
r = reflect( i, n );
return result;
}
float4 main( fragin In,
uniform sampler2D tex0,
uniform sampler2D tex1,
uniform sampler2D tex2,
uniform sampler2D tex3,
uniform float3 eyeSpaceLightPosition,
uniform float thickness,
uniform float4 ambient ) : COLOR
{
float bscale = In.tangentToEyeMat0.w;
float eta = (1.0/1.4);
// ratio of indices of refraction (air/skin)
float m = 34.; // specular exponent
float4 lightColor = { 1, 1, 1, 1 }; // light color
float4 sheenColor = { 1, 1, 1, 1 }; // sheen color
float4 skinColor = tex2D( tex1, In.texcoords );
float3 g = { 0.8, 0.3, 0.0 };
float3 albedo = { 0.8, 0.5, 0.4 };
// oiliness mask
float4 oiliness = 0.9 * tex2D( tex2, In.texcoords);
// Get eye-space eye vector.
float3 v = normalize( -In.eyeSpacePosition );
// Get eye-space light and halfangle vectors.
float3 l = normalize( eyeSpaceLightPosition -
In.eyeSpacePosition );
float3 h = normalize( v + l );
// Get tangent-space normal vector from normal map.
float3 tangentSpaceNormal = tex2D(tex0, In.texcoords).rgb;
float3 bumpscale = { bscale, bscale, 1.0 };
tangentSpaceNormal = tangentSpaceNormal * bumpscale;
// Transform it into eye-space.
float3 n;
n[0] = dot( In.tangentToEyeMat0.xyz, tangentSpaceNormal );
n[1] = dot( In.tangentToEyeMat1, tangentSpaceNormal );
n[2] = dot( In.tangentToEyeMat2, tangentSpaceNormal );
n = normalize( n );
// Compute the lighting equation.
float ndotl = max( dot(n,l), 0 ); // clamp 0 to 1
float ndoth = max( dot(n,h), 0 ); // clamp 0 to 1
float flag = (float)(ndotl > 0);
// Compute oil, sheen, subsurf scattering contributions.
float4 oil;
float4 sheen;
float4 subsurf;
float Kr, Kr2;
float Kt, Kt2;
float3 T, T2;
float3 R, R2;
// Compute fresnel at sheen layer, ramp it up a bit.
Kr = fresnel( -v, n, eta, R, T );
Kr = smoothstep( 0.0, 0.5, Kr );
Kt = 1.0 - Kr;
// Compute the refracted light ray and the refraction
// coefficient.
Kr2 = fresnel( -l, n, eta, R2, T2 );
Kr2 = smoothstep( 0.0, 0.5, Kr2 );
Kt2 = 1.0 - Kr2;
// For oil contribution, modulate the oiliness mask by a
// specular term.
oil = 0.5 * oiliness * pow( ndoth, m );
// For sheen contribution, modulate Fresnel term by
// sheen color times specular. Modulate by additional
// diffuse term to soften it a bit.
sheen = 2.5*Kr*sheenColor*(ndotl*(0.2 + pow( ndoth, m)));
// Compute single scattering approximation to subsurface
// scattering. Here we compute 3 scattering terms
// simultaneously and the results end up in the x,y,z
// components of a float3. Using 3 terms approximates
// distribution of multiply-scattered light. For
// details see: Matt Pharr’s SIGGRAPH 2001 RenderMan
// course notes “Layered Media for Surface Shaders”.
float3 temp = singleScatter( T2, T, n, g, albedo,
thickness );
subsurf = 2.5 * skinColor * ndotl * Kt * Kt2 *
(temp.x+temp.y+temp.z);
// Add contributions from oil, sheen, and subsurface
// scattering and modulate by light color and result
// of a shadow map lookup.
return lightColor*tex2Dproj( tex3, In.shadowcoords ).r *
(oil + sheen + subsurf);
}
Thin Film Effect
Description
This demo shows a thin film interference effect. Specular and diffuse
lighting are computed per vertex in a Cg program, along with a view depth
parameter, which is computed using the view vector, surface normal, and
the depth of the thin film on the surface of the object. The view depth is then
perturbed in an ad hoc manner per fragment by the underlying decal
texture, and is then used to look up into a 1D texture containing the
precomputed destructive interference for red/green/blue wavelengths
given a particular view depth. This interference value is then used to
modulate the specular lighting component of the standard lighting equation.
Fig. 11. Example of Thin Film Effect
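The view depth described above is the film depth divided by the cosine of the angle between the surface normal and the eye vector, so the path through the film lengthens toward grazing angles. The sketch below is a CPU-side Python illustration (not Cg) of that single step; the helper names are illustrative, and, like the vertex listing, it does not guard against a zero or negative cosine.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    length = math.sqrt(dot(v, v))
    return tuple(c / length for c in v)


def view_depth(normal, eye_vec, film_depth):
    """Film depth scaled by 1 / cos(angle between normal and eye vector)."""
    return film_depth / dot(normalize(normal), normalize(eye_vec))
```

For example, a film of depth 0.05 viewed head-on gives a view depth of 0.05, while viewing it 60 degrees off the normal roughly doubles that value. The result is what the pixel program uses as the fringe texture coordinate.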
Vertex Shader Source Code for Thin Film Effect
// define inputs from application
struct a2v
{
float4 Position : POSITION;
float3 Normal : NORMAL;
};
// define outputs from vertex shader
struct v2f
{
float4 HPOS : POSITION;
float4 diffCol : COLOR0;
float4 specCol : COLOR1;
float2 filmDepth : TEXCOORD0;
};
v2f main(a2v IN,
uniform float4x4 WorldViewProj,
uniform float4x4 WorldViewIT,
uniform float4x4 WorldView,
uniform float4 LightVector,
uniform float4 FilmDepth,
uniform float4 EyeVector)
{
v2f OUT;
//transform position to clip space
OUT.HPOS = mul(WorldViewProj, IN.Position);
float4 tempnorm = float4(IN.Normal, 0.0);
// transform normal from model-space to view-space
float3 normalVec = mul(WorldViewIT, tempnorm).xyz;
normalVec = normalize(normalVec);
// compute the eye->vertex vector
float3 eyeVec = EyeVector.xyz;
// compute the view depth for the thin film
float viewdepth = (1.0 / dot(normalVec, eyeVec)) *
FilmDepth.x;
OUT.filmDepth = viewdepth.xx;
// store normalized light vector
float3 lightVec = normalize((float3)LightVector);
// calculate half angle vector
float3 halfAngleVec = normalize(lightVec + eyeVec);
// calculate diffuse component
float diffuse = dot(normalVec, lightVec);
// calculate specular component
float specular = dot(normalVec, halfAngleVec);
// use the lit instruction to calculate lighting,
// automatically clamp
float4 lighting = lit(diffuse, specular, 32);
// output final lighting results
OUT.diffCol = (float4)lighting.y;
OUT.specCol = (float4)lighting.z;
return OUT;
}
Pixel Shader Source Code for Thin Film Effect
struct v2f
{
float3 diffCol : COLOR0;
float3 specCol : COLOR1;
float2 filmDepth : TEXCOORD0;
};
void main( v2f IN,
out float4 color : COLOR,
uniform sampler2D fringeMap,
uniform sampler2D diffMap)
{
// diffuse material color
float3 diffCol = float3(0.3, 0.3, 0.5);
// lookup fringe value based on view depth
float3 fringeCol = (float3)tex2D(fringeMap, IN.filmDepth);
// modulate specular lighting by fringe color,
// combine with regular lighting
color.rgb = fringeCol*IN.specCol + IN.diffCol*diffCol;
color.a = 1.0;
}
Car Paint 9
Description
This car paint shader uses gonioreflectometric paint samples measured by
Cornell University. The samples were converted into a 2D texture map which
is indexed using NdotL and NdotH as the (s,t) coordinate pair, and which
provides the diffuse component of our lighting equation. The specular term
is calculated using the Blinn model, and also includes a term which simulates
the clear coat’s metallic flecks.
The fleck normal mipmap chain has randomly generated vectors which
reside within a positive Z cone in tangent space. The cone is reduced
gradually at every level such that in the distance the flecks are pointing
mostly up. The flecks’ specular power and their contribution are reduced by
distance, to give it a grainier appearance up close and a more uniform
appearance from afar. Next, the view vector is reflected off a wavy normal
map (which represents the object’s natural undulations) to index into the
environment map. The shininess of the clear coat itself is calculated by
scaling the Fresnel term by the luminance of the environment map. (The
luminance transfer function selects only the perceptually bright areas of the
environment map in order not to reflect the darker areas of the scene.)
Finally, the shader lerps between the diffuse paint color and the reflection
based on the Fresnel term, and adds the specular highlights.
Fig. 12. Example of Car Paint 9
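The clear coat step described above can be followed on the CPU. The sketch below is Python rather than Cg and is only an approximation of the data flow: the constants are copied from the Car Paint 9 listings (the Fresnel offset, scale, and power, the ClearCoat luminance weights, and the glossiness exponent), while the function names are illustrative assumptions.

```python
def saturate(x):
    return min(max(x, 0.0), 1.0)


def lerp(a, b, t):
    return tuple(x + t * (y - x) for x, y in zip(a, b))


# Constants as they appear in the listings.
FRESNEL_OFFSET, FRESNEL_SCALE, FRESNEL_POWER = 0.1, 4.2, 4.4
LUMA_WEIGHTS = (0.299, 0.587, 0.114)  # the ClearCoat constant
GLOSSINESS = 3.8                      # NewPaintSpec.z


def fresnel_approx(n_dot_v):
    """Per-vertex approximation: (1 - N.V)^power * scale + offset."""
    return ((1.0 - saturate(n_dot_v)) ** FRESNEL_POWER * FRESNEL_SCALE
            + FRESNEL_OFFSET)


def clearcoat_weight(n_dot_v, env_rgb):
    """Scale the Fresnel term by the sharpened luminance of the reflection."""
    luma = saturate(sum(w * c for w, c in zip(LUMA_WEIGHTS, env_rgb)))
    return saturate(fresnel_approx(n_dot_v) * luma ** GLOSSINESS)


def coat_paint(paint_rgb, env_rgb, n_dot_v):
    """Blend diffuse paint toward the reflection by the clear coat weight."""
    return lerp(paint_rgb, env_rgb, clearcoat_weight(n_dot_v, env_rgb))
```

Raising the luminance to the glossiness exponent is what suppresses dark parts of the environment: a black reflection gets zero weight regardless of viewing angle, so the paint color passes through unchanged.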
Vertex Shader Source Code for Car Paint 9
// This shader is based on the Time Machine temporal rust
// shader. Car paint data was measured by Cornell
// University from samples provided by Ford Motor Company.
struct a2v {
float4 OPosition : POSITION;
float3 ONormal : NORMAL;
float2 uv : TEXCOORD0;
float3 Tangent : TEXCOORD1;
float3 Binormal : TEXCOORD2;
float3 Normal : TEXCOORD3;
};
struct VS_OUTPUT {
float4 HPosition : POSITION; // coord position in window
float2 uv : TEXCOORD0; // wavy/fleckmap coords
float3 light : TEXCOORD1; // light pos (tangent space)
float4 halfangle : TEXCOORD2; // Blinn halfangle
float3 reflection: TEXCOORD3; // Refl vector (per-vertex)
float4 view : TEXCOORD4; // view (tangent space)
float3 tangent : TEXCOORD5; // view-tangent matrix
float3 binormal : TEXCOORD6; // ...
float3 normal : TEXCOORD7; // ...
float fresn : COLOR0;
};
VS_OUTPUT main( a2v vert,
// TRANSFORMATIONS
uniform float4x4 ModelView,
uniform float4x4 ModelViewIT,
uniform float4x4 ModelViewProj,
uniform float3 LightVector, // Obj space
uniform float3 EyePosition ) // Obj space
{
VS_OUTPUT O;
// Generate homogeneous POSITION
O.HPosition = mul(ModelViewProj, vert.OPosition);
// Generate BASIS matrix
float3x3 ModelTangent = { normalize(vert.Tangent),
normalize(vert.Binormal),
normalize(vert.Normal) };
// FRESNEL = { OFFSET, SCALE, POWER, UNUSED };
float4 Fresnel = { 0.1f, 4.2f, 4.4f, 0.0f };
float3x3 ViewTangent = mul(ModelTangent,
(float3x3)ModelViewIT);
// Generate VIEW SPACE vectors
float3 viewN = normalize(mul((float3x3)ModelView,
vert.ONormal));
float4 viewP = mul(ModelView, vert.OPosition);
viewP.w = 1-saturate(sqrt(dot(viewP.xyz,
viewP.xyz))*0.01);
float3 viewV = -viewP.xyz;
// Generate OBJECT SPACE vectors
float3 objV = normalize(EyePosition-vert.OPosition.xyz);
float3 objL = normalize(LightVector);
float3 objH = normalize(objL + objV);
// Generate TANGENT SPACE vectors
float3 tanL = mul(ModelTangent, objL);
float3 tanV = mul(ModelTangent, objV);
float3 tanH = mul(ModelTangent, objH);
// Generate REFLECTION vector for per-vertex
// reflection look-up
float3 reflection = reflect(-viewV, viewN);
// Generate FRESNEL term
float ndv = saturate(dot(viewN, viewV));
float FresnelApprox = (pow((1-ndv),Fresnel.z)*Fresnel.y +
Fresnel.x);
// Fill OUTPUT parameters
O.uv.xy = vert.uv; // TEXCOORD0.xy
O.light = tanL; // Tangent space LIGHT
// Tangent space HALF-ANGLE
O.halfangle = float4(tanH.x, tanH.y,
tanH.z, 1-exp(-viewP.w));
O.reflection = reflection; // View space REFLECTION
// Tangent space VIEW + distance attenuation
O.view = float4(tanV.x, tanV.y,
tanV.z, viewP.w);
// VIEWTANGENT
O.tangent = normalize(ViewTangent[0]); // column 0
O.binormal = normalize(ViewTangent[1]); // column 1
O.normal = normalize(ViewTangent[2]); // column 2
O.fresn = FresnelApprox;
return O;
}
Pixel Shader Source Code for Car Paint 9
// This shader is based on the Time Machine temporal rust
// shader. Car paint data was measured by Cornell
// University from samples provided by Ford Motor Company.
//
struct VS_OUTPUT {
float4 HPosition : POSITION; // coord position in window
float2 uv : TEXCOORD0; // wavy/fleckmap coords
float3 light : TEXCOORD1; // light pos (tangent space)
float4 halfangle : TEXCOORD2; // Blinn halfangle
float3 reflection: TEXCOORD3; // Refl vector (per-vertex)
float4 view : TEXCOORD4; // view (tangent space)
float3 tangent : TEXCOORD5; // view-tangent matrix
float3 binormal : TEXCOORD6; // ...
float3 normal : TEXCOORD7; // ...
float fresn : COLOR0;
};
// PIXEL SHADER
float4 main( VS_OUTPUT vert,
uniform sampler2D WavyMap : register(s0),
uniform samplerCUBE EnvironmentMap : register(s1),
uniform sampler2D PaintMap : register(s2),
uniform sampler2D FleckMap : register(s3),
uniform float Ambient ) : COLOR
{
// NEWPAINTSPEC = { UNUSED, SPEC POWER, GLOSSINESS,
// FLECK SPEC POWER }
float4 NewPaintSpec = { 0.0f, 64.0f, 3.8f, 8.0f };
float3 ClearCoat = { 0.299f,0.587f, 0.114f };
float3 FleckColor = { 0.9, 1.05, 1.0 };
float3 WavyScale = { 0.2, -0.2, 1.0 };
// Tangent space LIGHT vector
float3 L = normalize(vert.light);
// Tangent space HALF-ANGLE vector
float3 H = normalize(vert.halfangle.xyz);
// Tangent space VIEW vector
float3 V = normalize(vert.view.xyz);
float v_dist = vert.view.w;
// Tangent space WAVY_NORMAL
float3 wavyN = (float3)tex2D(WavyMap, vert.uv)*2-1;
wavyN = normalize(wavyN*WavyScale);
// PAINT
// A normal map could be loaded here instead if
// we wanted more detail. In this case we have a
// uniform tangent space normal (0,0,1)
float n_d_l = L.z;
float n_d_h = H.z;
float3 paint_color = (float3)tex2D(PaintMap,
float2(n_d_l, n_d_h));
// SPECULAR POWER - use a saturated diffuse term
// to clamp the backlighting
n_d_h = saturate(n_d_l*4)*pow(n_d_h, NewPaintSpec.y);
// REFLECTION ENVIRONMENT
// Reflect view vector about wavy normal and bring
// to view space
float3 R = reflect(-V, wavyN);
R = R.x*vert.tangent + R.y*vert.binormal +
R.z*vert.normal;
float3 reflect_color = (float3)texCUBE(EnvironmentMap, R);
// FLECKS
// Load random 3-vector flecks from fleck_map
// Reduce tiling artifacts by sampling at
// different frequencies
float3 fleckN = (float3)tex2D(FleckMap, vert.uv*37)*2-1;
fleckN = ((float3)tex2D(FleckMap, vert.uv*23)*2-1)/2 +
fleckN/2;
float fleck_n_d_h = saturate(dot(fleckN, H));
float3 fleck_color = FleckColor * pow(fleck_n_d_h,
lerp(NewPaintSpec.y, NewPaintSpec.w, v_dist));
// Control the ambient fleckiness and also
// attenuate with distance
fleck_color = fleck_color*Ambient*vert.halfangle.w;
// DIFFUSE
float k_d = saturate(n_d_l*1.2);
float3 paintResult = lerp(Ambient*paint_color,
paint_color, k_d);
// FRESNEL
float Fresnel = saturate(dot(ClearCoat, reflect_color));
Fresnel = pow(Fresnel, NewPaintSpec.z);
// This helps make the clear coat less omnipresent --
// only the really (perceptually) bright areas reflect
// the most.
Fresnel = saturate(vert.fresn*Fresnel);
// Show more of the specular reflection environment
// when in fresnel zones
// diffuse * (1-fresnel) + environment * (fresnel)
paintResult = lerp(paintResult, reflect_color, Fresnel);
// SPECULAR
// diffuse + specular + flecks
paintResult = paintResult + n_d_h + fleck_color;
// OUTPUT
return paintResult.xyzz;
}
Basic Profile Sample Shaders
This chapter provides a set of basic profile sample shaders written in Cg.
Each shader comes with an accompanying snapshot, description, and source
code.
Examples shown are:
Anisotropic Lighting
Bump Dot3x2 Diffuse and Specular
Bump-Reflection Mapping
Fresnel
Grass
Refraction
Shadow Mapping
Shadow Volume Extrusion
Sine Wave Demo
Matrix Palette Skinning
Anisotropic Lighting
Description
The anisotropic lighting effect (Fig. 13.) shows the vertex program’s
half-angle vector calculation. It uses HdotN and LdotN per vertex to look up
into a 2D texture to achieve interesting lighting effects.
Fig. 13. Example of Anisotropic Lighting
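The texture coordinate generation described above amounts to two clamped dot products. The following Python sketch (a CPU-side illustration, not Cg) mirrors that calculation under the assumption that the light vector and normal are already unit length; the function name aniso_texcoord is hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    length = math.sqrt(dot(v, v))
    return tuple(c / length for c in v)


def aniso_texcoord(normal, light_vec, vert_to_eye):
    """Return (max(L.N, 0), max(H.N, 0)) with H = normalize(E + L)."""
    eye = normalize(vert_to_eye)
    half_angle = normalize(tuple(e + l for e, l in zip(eye, light_vec)))
    s = max(dot(light_vec, normal), 0.0)
    t = max(dot(half_angle, normal), 0.0)
    return (s, t)
```

Because both coordinates stay in [0, 1], the anisotropic look comes entirely from the contents of the 2D texture that they index, not from the vertex arithmetic.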
Vertex Shader Source Code for Anisotropic Lighting
struct appdata {
float3 Position : POSITION;
float3 Normal : NORMAL;
};
struct vpconn {
float4 Hposition : POSITION;
float4 TexCoord0 : TEXCOORD0;
};
vpconn main(appdata IN,
uniform float4x4 WorldViewProj,
uniform float3x3 WorldIT,
uniform float3x4 World,
uniform float3 LightVec,
uniform float3 EyePos)
{
vpconn OUT;
float3 worldNormal = normalize(mul(WorldIT, IN.Normal));
//build float4
float4 tempPos;
tempPos.xyz = IN.Position.xyz;
tempPos.w = 1.0;
//compute world space position
float3 worldSpacePos = mul(World, tempPos);
//vector from vertex to eye, normalized
float3 vertToEye = normalize(EyePos - worldSpacePos);
//h = normalize(l + e)
float3 halfAngle = normalize(vertToEye + LightVec);
OUT.TexCoord0.x = max(dot(LightVec,worldNormal),0.0);
OUT.TexCoord0.y = max(dot(halfAngle,worldNormal),0.0);
// transform into homogeneous-clip space
OUT.Hposition = mul(WorldViewProj, tempPos);
return OUT;
}
Bump Dot3x2 Diffuse and Specular
Description
The bump dot3x2 diffuse and specular effect mixes bump mapping with
diffuse and specular lighting based on the texm3x2tex DirectX 8 pixel
shader instruction (DOT_PRODUCT_TEXTURE_2D in OpenGL). This
instruction computes the dot product of the normal and the light vector,
corresponding to the diffuse light component, and the dot product of the
normal and the half-angle vector, corresponding to the specular light
component. This results in two scalar values that are used as texture
coordinates to look up a 2D illumination texture containing the diffuse color
and the specular term in its alpha component. Since the normal fetched from
the normal map is in tangent space, both the light vector and the half-angle
vector are transformed to this space by the vertex shader (Fig. 14.).
Fig. 14. Example of Bump Dot3x2 Diffuse and Specular
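The two stages described above, moving vectors into tangent space per vertex and forming two dot products per pixel, can be sketched on the CPU. This Python illustration (not Cg) assumes T, B, and N form the rows of the object-to-tangent matrix and that the normal map stores components in [0, 1]; the function names are illustrative only.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def to_tangent_space(t, b, n, v):
    """Multiply the matrix with rows T, B, N by the object-space vector v."""
    return (dot(t, v), dot(b, v), dot(n, v))


def expand(texel):
    """Map a [0, 1] texel to a [-1, 1] vector."""
    return tuple(2.0 * (c - 0.5) for c in texel)


def illum_coord(light_ts, half_ts, normal_texel):
    """Texture coordinates for the illumination map: (L.N, H.N)."""
    bump = expand(normal_texel)
    return (dot(light_ts, bump), dot(half_ts, bump))
```

A flat normal-map texel of (0.5, 0.5, 1.0) expands to (0, 0, 1), so for an unperturbed surface the coordinates reduce to the z components of the tangent-space light and half-angle vectors.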
Vertex Shader Source Code for Bump Dot3x2
struct a2v {
float4 Position : POSITION; //in object space
float3 Normal : NORMAL; //in object space
float2 TexCoord : TEXCOORD0;
float3 T : TEXCOORD1; //in object space
float3 B : TEXCOORD2; //in object space
float3 N : TEXCOORD3; //in object space
};
struct v2f {
float4 Position : POSITION; //in projection space
float4 Normal : COLOR0; //in tangent space
float4 LightVectorUnsigned : COLOR1; //in tangent space
float3 TexCoord0 : TEXCOORD0;
float3 TexCoord1 : TEXCOORD1;
float4 LightVector : TEXCOORD2; //in tangent space
float4 HalfAngleVector : TEXCOORD3; //in tangent space
};
v2f main(a2v IN,
uniform float4x4 WorldViewProj,
uniform float4 LightVector, //in object space
uniform float4 EyePosition //in object space
)
{
v2f OUT;
// pass texture coordinates for
// fetching the diffuse map
OUT.TexCoord0.xy = IN.TexCoord.xy;
// pass texture coordinates for
// fetching the normal map
OUT.TexCoord1.xy = IN.TexCoord.xy;
// compute the 3x3 transform from
// object space to tangent space
float3x3 objToTangentSpace;
objToTangentSpace[0] = IN.T;
objToTangentSpace[1] = IN.B;
objToTangentSpace[2] = IN.N;
// transform normal from
// object space to tangent space
OUT.Normal.xyz = 0.5 * mul(objToTangentSpace, IN.Normal) +
0.5;
// transform light vector from
// object space to tangent space
float3 lightVectorInTangentSpace =
mul(objToTangentSpace, LightVector.xyz);
OUT.LightVector.xyz = lightVectorInTangentSpace;
OUT.LightVectorUnsigned.xyz = 0.5 *
lightVectorInTangentSpace + 0.5;
// compute view vector
float3 viewVector =
normalize(EyePosition.xyz - IN.Position.xyz);
// compute half angle vector
float3 halfAngleVector =
normalize(LightVector.xyz + viewVector);
// transform half-angle vector from
// object space to tangent space
OUT.HalfAngleVector.xyz =
mul(objToTangentSpace, halfAngleVector);
// transform position to projection space
OUT.Position = mul(WorldViewProj, IN.Position);
return OUT;
}
Pixel Shader Source Code for Bump Dot3x2
struct v2f {
float4 Position : POSITION; //in projection space
float4 Normal : COLOR0; //in tangent space
float4 LightVectorUnsigned : COLOR1; //in tangent space
float3 TexCoord0 : TEXCOORD0;
float3 TexCoord1 : TEXCOORD1;
float4 LightVector : TEXCOORD2; //in tangent space
float4 HalfAngleVector : TEXCOORD3; //in tangent space
};
float4 main(v2f IN,
uniform sampler2D DiffuseMap,
uniform sampler2D NormalMap,
uniform sampler2D IlluminationMap,
uniform float Ambient) : COLOR
{
// fetch base color
float4 color = tex2D(DiffuseMap, IN.TexCoord0.xy);
// fetch bump normal and expand it to [-1,1]
float4 bumpNormal = 2 *
(tex2D(NormalMap, IN.TexCoord1.xy) - 0.5);
// compute the dot product between
// the bump normal and the light vector,
// compute the dot product between
// the bump normal and the half angle vector,
// fetch the illumination map using
// the result of the two previous dot products
// as texture coordinates
// returns the diffuse color in the
// color components and the specular color in the
// alpha component
float2 illumCoord =
float2(dot(IN.LightVector.xyz, bumpNormal.xyz),
dot(IN.HalfAngleVector.xyz, bumpNormal.xyz));
float4 illumination = tex2D(IlluminationMap, illumCoord);
// expand iterated normal to [-1,1]
float4 normal = 2 * (IN.Normal - 0.5);
// compute self-shadowing term
float shadow = saturate(4 * dot(normal.xyz,
IN.LightVectorUnsigned.xyz));
// compute final color
return (Ambient * color + shadow)
* (illumination * color + illumination.wwww);
}
Bump-Reflection Mapping
Description
This effect mixes bump mapping and reflection mapping based on the
texm3x3vspec DirectX 8 pixel shader instruction
(DOT_PRODUCT_REFLECT_CUBE_MAP in OpenGL). This instruction
computes three dot products to transform the normal fetched from the
normal map into the environment cube space, reflects the eye vector
about the transformed normal, and fetches a cube map to get the final
color. The vertex shader is responsible for computing the transform matrix
and the eye vector (Fig. 15.).
Fig. 15. Example of Bump-Reflection Mapping
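The arithmetic that the instruction performs before the cube map fetch can be written out on the CPU. The Python sketch below (not Cg) assumes that the normal has already been expanded to signed values and that the reflection uses the usual mirror formula R = 2(N·E)N/(N·N) − E; the function name reflect_eye_dp3x3 is illustrative and is not the toolkit's own routine.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def reflect_eye_dp3x3(row0, row1, row2, normal_ts, eye):
    """Transform the normal with three dot products, then mirror the eye vector."""
    # Three dot products: tangent-space normal -> cube space.
    n = (dot(row0, normal_ts), dot(row1, normal_ts), dot(row2, normal_ts))
    # Dividing by N.N means the transformed normal need not be unit length.
    k = 2.0 * dot(n, eye) / dot(n, n)
    return tuple(k * nc - ec for nc, ec in zip(n, eye))
```

The returned vector is what would be used to address the environment cube map. With an identity transform and an unperturbed normal (0, 0, 1), an eye vector of (1, 0, 1) is mirrored to (-1, 0, 1).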
Vertex Shader Source Code for Bump-Reflection Mapping
struct a2v {
float4 Position : POSITION; // in object space
float2 TexCoord : TEXCOORD0;
float3 T : TEXCOORD1; // in object space
float3 B : TEXCOORD2; // in object space
float3 N : TEXCOORD3; // in object space
};
struct v2f {
float4 Position : POSITION; // in projection space
float4 TexCoord : TEXCOORD0;
// first row of the 3x3 transform
// from tangent to cube space
float4 TangentToCubeSpace0 : TEXCOORD1;
// second row of the 3x3 transform
// from tangent to cube space
float4 TangentToCubeSpace1 : TEXCOORD2;
// third row of the 3x3 transform
// from tangent to cube space
float4 TangentToCubeSpace2 : TEXCOORD3;
};
v2f main(a2v IN,
uniform float4x4 WorldViewProj,
uniform float3x4 ObjToCubeSpace,
uniform float3 EyePosition, // in cube space
uniform float BumpScale)
{
v2f OUT;
// pass texture coordinates for
// fetching the normal map
OUT.TexCoord.xy = IN.TexCoord.xy;
// compute 3x3 transform from object to tangent space
float3x3 objToTangentSpace;
// first rows are the tangent and binormal
// scaled by the bump scale
objToTangentSpace[0] = BumpScale * IN.T;
objToTangentSpace[1] = BumpScale * IN.B;
objToTangentSpace[2] = IN.N;
// compute the 3x3 transform from
// tangent space to cube space:
// TangentToCubeSpace
// = object2cube * tangent2object
// = object2cube * transpose(objToTangentSpace)
// (since the inverse of a rotation is its transpose)
//
// So a row of TangentToCubeSpace is the transform by
// objToTangentSpace of the corresponding row of
// ObjToCubeSpace
OUT.TangentToCubeSpace0.xyz =
mul(objToTangentSpace, ObjToCubeSpace[0].xyz);
OUT.TangentToCubeSpace1.xyz =
mul(objToTangentSpace, ObjToCubeSpace[1].xyz);
OUT.TangentToCubeSpace2.xyz =
mul(objToTangentSpace, ObjToCubeSpace[2].xyz);
// compute the eye vector
// (going from eye to shaded point) in cube space
float3 eyeVector = mul(ObjToCubeSpace, IN.Position) -
EyePosition;
OUT.TangentToCubeSpace0.w = eyeVector.x;
OUT.TangentToCubeSpace1.w = eyeVector.y;
OUT.TangentToCubeSpace2.w = eyeVector.z;
// transform position to projection space
OUT.Position = mul(WorldViewProj, IN.Position);
return OUT;
}
Pixel Shader Source Code for Bump and Reflection Mapping
struct v2f {
float4 Position : POSITION; //in projection space
float4 TexCoord : TEXCOORD0;
// first row of the 3x3 transform
// from tangent to cube space
float4 TangentToCubeSpace0 : TEXCOORD1;
// second row of the 3x3 transform
// from tangent to cube space
float4 TangentToCubeSpace1 : TEXCOORD2;
// third row of the 3x3 transform
// from tangent to cube space
float4 TangentToCubeSpace2 : TEXCOORD3;
};
float4 main(v2f IN,
uniform sampler2D NormalMap,
uniform samplerCUBE EnvironmentMap,
uniform float3 EyeVector) : COLOR
{
// fetch the bump normal from the normal map
float4 normal = tex2D(NormalMap, IN.TexCoord.xy);
// transform the bump normal into cube space
// then use the transformed normal and eye vector
// to compute the reflection vector that is
// used to fetch the cube map
return texCUBE_reflect_eye_dp3x3(EnvironmentMap,
IN.TangentToCubeSpace2.xyz,
IN.TangentToCubeSpace0,
IN.TangentToCubeSpace1,
normal,
EyeVector);
}
Fresnel
Description
This effect computes a reflection vector to look up into an environment map
for reflections, and modulates this by a Fresnel term. The result is reflections
only at grazing angles (Fig. 16.).
Fig. 16. Example of Fresnel
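The Fresnel term in the vertex listing blends (1 − cos θ)^5 toward 1 by a base reflectance f0 of 0.1. The following Python sketch (a CPU-side illustration, not Cg) reproduces that expression, assuming unit-length inputs and an eye vector that points from the eye toward the vertex; the function name fresnel_term is illustrative.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def lerp(a, b, t):
    return a + t * (b - a)


def fresnel_term(eye_to_vert, normal, f0=0.1):
    """lerp((1 + I.N)^5, 1, f0), where I points toward the surface."""
    # I.N is negative for front-facing surfaces, so 1 + I.N = 1 - cos(theta).
    one_minus_cos = 1.0 + dot(eye_to_vert, normal)
    return lerp(one_minus_cos ** 5, 1.0, f0)
```

Viewed head-on, the term falls to f0 (0.1), and it rises to 1 as the view direction approaches grazing, which is why reflections show up mainly near silhouettes.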
Vertex Shader Source Code for Fresnel
struct app2vert
{
float4 Position : POSITION;
float4 Normal : NORMAL;
float4 TexCoord0 : TEXCOORD0;
};
struct vert2frag
{
float4 HPosition : POSITION;
float4 Color0 : COLOR0;
float4 TexCoord0 : TEXCOORD0;
};
vert2frag main(app2vert IN,
uniform float4x4 ModelViewProj,
uniform float4x4 ModelView,
uniform float4x4 ModelViewIT)
{
vert2frag OUT;
#ifdef PROFILE_ARBVP1
ModelViewProj = glstate.matrix.mvp;
ModelView = glstate.matrix.modelview[0];
ModelViewIT = glstate.matrix.invtrans.modelview[0];
#endif
OUT.HPosition = mul(ModelViewProj, IN.Position);
float3 normal = normalize(mul(ModelViewIT,
IN.Normal).xyz);
float3 eyeToVert = normalize(mul(ModelView,
IN.Position).xyz);
// reflect the eye vector across the normal vector
// for reflection
OUT.TexCoord0 = float4(reflect(eyeToVert, normal), 1.0);
float f0 = .1;
// compute the fresnel term
float oneMCosAngle = 1+dot(eyeToVert,normal);
oneMCosAngle = pow(oneMCosAngle, 5);
OUT.Color0 = lerp(oneMCosAngle, 1, f0).xxxx;
return OUT;
}
Grass
Description
This effect shows procedural animation of geometry using a sine function,
along with calculation of a normal for the procedurally deformed geometry
(Fig. 17.).
Fig. 17. Example of Grass
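Each blade in the grass listing is bent along a quadratic Bezier curve evaluated by repeated linear interpolation, and its normal is estimated from a second sample slightly further along the curve. The Python sketch below (CPU-side, not Cg) shows both steps; the function names and the default step of 0.05 (the offset used in the listing) are laid out here for illustration only.

```python
def lerp(a, b, t):
    return tuple(x + t * (y - x) for x, y in zip(a, b))


def bezier2(c1, c2, c3, t):
    """Quadratic Bezier via three lerps (de Casteljau's construction)."""
    return lerp(lerp(c1, c2, t), lerp(c2, c3, t), t)


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def blade_normal(c1, c2, c3, t, dt=0.05):
    """Unnormalized normal: finite-difference tangent crossed with the x axis."""
    p0 = bezier2(c1, c2, c3, t)
    p1 = bezier2(c1, c2, c3, t + dt)
    tangent = tuple(a - b for a, b in zip(p0, p1))
    return cross(tangent, (1.0, 0.0, 0.0))
```

At t = 0.5 the curve evaluates to 0.25·c1 + 0.5·c2 + 0.25·c3, so the middle control point pulls the blade without the blade passing through it.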
Vertex Shader Source Code for Grass
struct app2vert {
float4 Position : POSITION;
float4 Normal : NORMAL;
float4 TexCoord0 : TEXCOORD0;
float4 Color0 : COLOR0;
};
struct vertout {
float4 Hposition : POSITION;
float4 Color0 : COLOR0;
float4 TexCoord0 : TEXCOORD0;
};
vertout main(app2vert IN,
uniform float4x4 ModelViewProj,
uniform float4x4 ModelView,
uniform float4x4 ModelViewIT,
uniform float4 Constants)
{
vertout OUT;
// we need to figure out what the position is
float4 position = IN.Position;
position.z = 0;
position.y = 0;
// add in the actual base location of
// the straw (stored in Color0.xz)
position.x = position.x + IN.Color0.x;
position.z = position.z + IN.Color0.z;
// figure out where the wind is coming from
float4 origin = float4(20,0,20,0);
float4 dir = position - origin;
// find the intensity of the wind
float inten = sin(Constants.x + .2*length(dir)) *
IN.Position.y;
dir = normalize(dir);
// we need to do some Bezier curve stuff here.
float4 ctrl1 = float4(0,0,0,0);
float4 ctrl2 = float4(0,IN.Color0.y/2,0,0);
float4 ctrl3 = float4(dir.x*inten, IN.Color0.y,
dir.z*inten, 0);
// do the Bezier linear interpolation steps
float t = IN.Color0.w;
float4 temp = lerp(ctrl1, ctrl2, t);
float4 temp2 = lerp(ctrl2, ctrl3, t);
float4 result = lerp(temp, temp2, t);
// add in the height and wind displacement components
position = position + result;
position.w = 1;
// transform for sending to the reg. combiners
OUT.Hposition = mul(ModelViewProj, position);
// calculate the texture coordinate
// from the position passed in
OUT.TexCoord0 = float4((IN.Position.x + .05)*10,t,1,1);
// find the normal
// we need one more point to do a partial
temp = lerp(ctrl1, ctrl2, t+0.05);
temp2 = lerp(ctrl2, ctrl3, t+0.05);
float4 newResult = lerp(temp, temp2, t+0.05);
// do a crossproduct with a vector that
// is horizontal across the screen
float3 normal = cross((result - newResult).xyz,
float3(1,0,0));
normal = normalize(normal);
// calculate diffuse lighting off the normal
// that was just calculated
float3 lightPos = float3(0,5,15);
float3 lightVec = normalize(lightPos - position.xyz);
float diffuseInten = dot(lightVec, normal);
// Set up the final color
// The first term is a semi random term based
// on the total height of this straw
// The second term is the diffuse lighting component
OUT.Color0 = normalize(ctrl3) * diffuseInten *
IN.Position.z;
return OUT;
}
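The two levels of lerp() used above evaluate a quadratic Bezier curve (de Casteljau's construction). As a minimal sketch, the same steps could be factored into a helper; the function name quadraticBezier is hypothetical and is not part of the original sample:

```cg
// Hypothetical helper, not part of the original sample.
// Evaluates a quadratic Bezier curve with control points c1, c2, c3
// at parameter t by repeated linear interpolation.
float4 quadraticBezier(float4 c1, float4 c2, float4 c3, float t)
{
    float4 a = lerp(c1, c2, t);   // point on the segment c1 -> c2
    float4 b = lerp(c2, c3, t);   // point on the segment c2 -> c3
    return lerp(a, b, t);         // point on the curve itself
}
```

With such a helper, result would be quadraticBezier(ctrl1, ctrl2, ctrl3, t) and newResult would be the same call at t + 0.05. Algebraically the value equals (1-t)^2 * c1 + 2t(1-t) * c2 + t^2 * c3.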
Refraction
Description
This effect performs custom texture coordinate generation to compute a
refracted vector per vertex that is then used to look up in a cube map. Fresnel
is also calculated to blend between reflection and refraction (Fig. 18).
Fig. 18. Example of Refraction
Vertex Shader Source Code for Refraction
struct inputs
{
    float4 Position : POSITION;
    float4 Normal : NORMAL;
};
struct outputs
{
    float4 hPosition : POSITION;
    float4 fresnelTerm : COLOR0;
    float4 refractVec : TEXCOORD0;
    float4 reflectVec : TEXCOORD1;
};
// fresnel approximation
fixed fast_fresnel(float3 I, float3 N,
                   float3 fresnelValues)
{
    fixed power = fresnelValues.x;
    fixed scale = fresnelValues.y;
    fixed bias = fresnelValues.z;
    return bias + pow(1.0 - dot(I, N), power) * scale;
}
outputs main(inputs IN,
             uniform float4x4 ModelViewProj,
             uniform float4x4 ModelView,
             uniform float4x4 ModelViewIT,
             uniform float theta)
{
    outputs OUT;
    OUT.hPosition = mul(ModelViewProj, IN.Position);
    // convert the position and normal into
    // appropriate spaces
    float3 eyeToVert = mul(ModelView, IN.Position).xyz;
    eyeToVert = normalize(eyeToVert);
    float3 normal = mul(ModelViewIT, IN.Normal).xyz;
    normal = normalize(normal);
    OUT.refractVec.xyz = refract(eyeToVert, normal, theta);
    OUT.refractVec.w = 1;
    OUT.reflectVec.xyz = reflect(eyeToVert, normal);
    OUT.reflectVec.w = 1;
    // calculate the fresnel reflection
    OUT.fresnelTerm = fast_fresnel(-eyeToVert, normal,
                                   float3(5.0, 1.0, 0.0));
    return OUT;
}
Pixel Shader Source Code for Refraction
float4 main(in float3 refractVec : TEXCOORD0,
            in float3 reflectVec : TEXCOORD1,
            in float3 fresnelTerm : COLOR0,
            uniform samplerCUBE environmentMaps[2],
            uniform float enableRefraction,
            uniform float enableFresnel) : COLOR
{
    float3 refractColor = texCUBE(environmentMaps[0],
                                  refractVec).rgb;
    float3 reflectColor = texCUBE(environmentMaps[1],
                                  reflectVec).rgb;
    float3 reflectRefract = lerp(refractColor, reflectColor,
                                 fresnelTerm);
    float3 finalColor = enableRefraction ?
        (enableFresnel ? reflectRefract : refractColor) :
        (enableFresnel ? reflectColor : fresnelTerm);
    return float4(finalColor, 1.0);
}
Shadow Mapping
Description
This effect shows generating texture coordinates for shadow mapping, along
with using the shadow map in the lighting equation per pixel (Fig. 19).
Fig. 19. Example of Shadow Mapping
Vertex Shader Source Code for Shadow Mapping
struct appdata {
    float3 Position : POSITION;
    float3 Normal : NORMAL;
};
struct vpconn {
    float4 Hposition : POSITION;
    float4 TexCoord0 : TEXCOORD0;
    float4 TexCoord1 : TEXCOORD1;
    float4 Color0 : COLOR0;
};
vpconn main(appdata IN,
            uniform float4x4 WorldViewProj,
            uniform float4x4 TexTransform,
            uniform float3x3 WorldIT,
            uniform float3 LightVec)
{
    vpconn OUT;
    float3 worldNormal = normalize(mul(WorldIT, IN.Normal));
    float ldotn = max(dot(LightVec, worldNormal), 0.0);
    OUT.Color0.xyz = ldotn.xxx;
    float4 tempPos;
    tempPos.xyz = IN.Position.xyz;
    tempPos.w = 1.0;
    OUT.TexCoord0 = mul(TexTransform, tempPos);
    OUT.TexCoord1 = mul(TexTransform, tempPos);
    OUT.Hposition = mul(WorldViewProj, tempPos);
    return OUT;
}
Pixel Shader Source Code for Shadow Mapping
struct v2f_simple {
    float4 Hposition : POSITION;
    float4 TexCoord0 : TEXCOORD0;
    float4 TexCoord1 : TEXCOORD1;
    float4 Color0 : COLOR0;
};
float4 main(v2f_simple IN,
            uniform sampler2D ShadowMap,
            uniform sampler2D SpotLight) : COLOR
{
    float4 shadow = tex2D(ShadowMap, IN.TexCoord0.xy);
    float4 spotlight = tex2D(SpotLight, IN.TexCoord1.xy);
    float4 lighting = IN.Color0;
    return shadow * spotlight * lighting;
}
Shadow Volume Extrusion
Description
This effect uses vertex programs to generate shadow volumes by extruding
geometry along the light vector (Fig. 20).
Fig. 20. Example of Shadow Volume Extrusion
Vertex Shader Source Code for Shadow Volume Extrusion
struct appdata
{
    float4 Position : POSITION;
    float3 Normal : NORMAL;
    float4 DiffuseColor : COLOR0;
    float2 TexCoord0 : TEXCOORD0;
};
struct vpconn {
    float4 Hposition : POSITION;
    float4 Color0 : COLOR0;
    float2 TexCoord0 : TEXCOORD0;
};
vpconn main(appdata IN,
            uniform float4x4 WorldViewProj,
            uniform float4 LightPos, // (in object space)
            uniform float4 Fatness,
            uniform float4 ShadowExtrudeDist,
            uniform float4 Factors)
{
    vpconn OUT;
    // Create normalized vector from vertex to light
    float4 light_to_vert = normalize(IN.Position - LightPos);
    // N dot L to decide if point should be moved away
    // from the light to extrude the volume
    float ndotl = dot(-light_to_vert.xyz, IN.Normal.xyz);
    // Inset the position along
    // the normal vector direction
    // This moves the shadow volume points
    // inside the model slightly to minimize
    // popping of shadowed areas as
    // each facet comes in and out of shadow.
    // The Fatness value should be negative
    float4 inset_pos = (IN.Normal * Fatness.xyz +
                        IN.Position.xyz).xyzz;
    inset_pos.w = IN.Position.w;
    // scale the vector from light to vertex
    float4 extrusion_vec = light_to_vert * ShadowExtrudeDist;
    // if ndotl < 0 then the vertex faces
    // away from the light, so move it.
    // It will be moved along the direction from
    // light to vertex to extrude the shadow volume.
    float away = (float)(ndotl < 0);
    // Move the back-facing shadow volume points
    float4 new_position = extrusion_vec * away + inset_pos;
    // Transform position to hclip space
    OUT.Hposition = mul(WorldViewProj, new_position);
    // Set the color to blue for when the shadow volume
    // is rendered in color for illustrative purposes
    float4 color = float4(0, 0, Factors.x, 0);
    OUT.Color0 = color;
    OUT.TexCoord0.xy = IN.TexCoord0;
    return OUT;
}
Sine Wave Demo
Description
This effect modifies the vertex positions using a sine function based on the
current time. It demonstrates use of the built-in sincos() function. It also
computes a normal based on the perturbed mesh, and uses this to compute a
reflection vector to look up in a cube map (Fig. 21).
Fig. 21. Example of Sine Wave
Vertex Shader Source Code for Sine Wave
struct appdata {
    float4 TexCoord0 : TEXCOORD0;
};
struct vpconn {
    float4 HPOS : POSITION;
    float4 COL0 : COLOR0;
    float4 TEX0 : TEXCOORD0;
};
vpconn main(appdata IN,
            uniform float4x4 WorldViewProj,
            uniform float3x4 WorldView,
            uniform float3x3 WorldViewIT,
            uniform float3 WavesX,
            uniform float3 WavesY,
            uniform float3 WavesH,
            uniform float3 Time)
{
    vpconn OUT;
    float3 angle = WavesX * IN.TexCoord0.x +
                   WavesY * IN.TexCoord0.y;
    angle = angle + Time;
    float3 sine, cosine;
    sincos(angle, sine, cosine);
    // position is: (u, sum(hi * sin(anglei)), v, 1)
    float4 position;
    position.xz = IN.TexCoord0.xy;
    position.y = dot(WavesH, sine);
    position.w = 1.0f;
    OUT.HPOS = mul(WorldViewProj, position);
    // normal is (sum(hi * WavesXi * cos(anglei)),
    //            -1,
    //            sum(hi * WavesYi * cos(anglei)))
    float3 normal;
    normal.x = dot(WavesH * WavesX, cosine);
    normal.y = -1.0f;
    normal.z = dot(WavesH * WavesY, cosine);
    // transform normal into eye-space
    normal = mul(WorldViewIT, normal);
    normal = normalize(normal);
    // Transform vertex to eye-space and
    // compute the vector from the eye to the vertex.
    // Because the eye is at 0, no subtraction is
    // necessary. Because the reflection of this vector
    // looks into a cube-map normalization is also
    // unnecessary!
    float3 eyeVector = mul(WorldView, position);
    OUT.TEX0.xyz = reflect(eyeVector, normal);
    return OUT;
}
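The normal computed in the shader above can be derived from the height function. The following is a sketch of that derivation; the symbols u, v (the mesh coordinates in TexCoord0.xy) and X_i, Y_i, h_i, T_i (the components of WavesX, WavesY, WavesH, and Time) are shorthand introduced here, not names from the sample:

```latex
\begin{align*}
a_i &= X_i\,u + Y_i\,v + T_i, \qquad i = 1, 2, 3 \\
y(u, v) &= \sum_{i=1}^{3} h_i \sin a_i \\
\frac{\partial y}{\partial u} &= \sum_{i=1}^{3} h_i X_i \cos a_i, \qquad
\frac{\partial y}{\partial v} = \sum_{i=1}^{3} h_i Y_i \cos a_i \\
\mathbf{n} &\propto
\left(1,\ \frac{\partial y}{\partial u},\ 0\right) \times
\left(0,\ \frac{\partial y}{\partial v},\ 1\right)
= \left(\frac{\partial y}{\partial u},\ -1,\ \frac{\partial y}{\partial v}\right)
\end{align*}
```

The two sums are what dot(WavesH * WavesX, cosine) and dot(WavesH * WavesY, cosine) evaluate, and the middle component is the constant -1 assigned to normal.y.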
Matrix Palette Skinning
Description
This effect performs matrix palette skinning using two bones per vertex. All
the bones for the mesh are set in the constant memory, and each vertex
includes two indices that indicate which bones influence this vertex. The
final skinned positions are computed using these bones, along with the
weights supplied per vertex. Tangent-space bases are skinned in a similar
fashion and then used to transform the light vector into tangent space for
per-pixel bump mapping (Fig. 22).
Fig. 22. Example of Matrix Palette Skinning
Vertex Shader Source Code for Matrix Palette Skinning
struct appdata {
    float3 Position : POSITION;
    float2 Weights : BLENDWEIGHT0;
    float2 Indices : BLENDINDICES;
    float3 Normal : NORMAL;
    float2 TexCoord0 : TEXCOORD0;
    float3 S : TEXCOORD1;
    float3 T : TEXCOORD2;
    float3 SxT : TEXCOORD3;
};
struct vpconn {
    float4 Hposition : POSITION;
    float4 TexCoord0 : TEXCOORD0;
    float4 TexCoord1 : TEXCOORD1;
    float4 Color0 : COLOR0;
};
vpconn main(appdata IN,
            uniform float4x4 WorldViewProj,
            uniform float3x4 Bones[26],
            uniform float3 LightVec)
{
    vpconn OUT;
    float4 tempPos;
    tempPos.xyz = IN.Position.xyz;
    tempPos.w = 1.0;
    // grab first bone matrix
    float i = IN.Indices.x;
    // transform position
    float3 pos0 = mul(Bones[i], tempPos);
    // create 3x3 version of bone matrix
    float3x3 m;
    m._m00_m01_m02 = Bones[i]._m00_m01_m02;
    m._m10_m11_m12 = Bones[i]._m10_m11_m12;
    m._m20_m21_m22 = Bones[i]._m20_m21_m22;
    // transform S, T, SxT
    float3 s0 = mul(m, IN.S);
    float3 t0 = mul(m, IN.T);
    float3 sxt0 = mul(m, IN.SxT);
    // next bone
    i = IN.Indices.y;
    // create 3x3 version of bone
    m._m00_m01_m02 = Bones[i]._m00_m01_m02;
    m._m10_m11_m12 = Bones[i]._m10_m11_m12;
    m._m20_m21_m22 = Bones[i]._m20_m21_m22;
    float3 pos1 = mul(Bones[i], tempPos);
    // transform S, T, SxT
    float3 s1 = mul(m, IN.S);
    float3 t1 = mul(m, IN.T);
    float3 sxt1 = mul(m, IN.SxT);
    // final blending
    // blend s, t, sxt
    float3 finalS = s0 * IN.Weights.x + s1 * IN.Weights.y;
    float3 finalT = t0 * IN.Weights.x + t1 * IN.Weights.y;
    float3 finalSxT = sxt0 * IN.Weights.x + sxt1 * IN.Weights.y;
    // blend between the two positions
    float3 finalPos = pos0 * IN.Weights.x + pos1 * IN.Weights.y;
    float3x3 worldToTangentSpace;
    worldToTangentSpace._m00_m01_m02 = finalS;
    worldToTangentSpace._m10_m11_m12 = finalT;
    worldToTangentSpace._m20_m21_m22 = finalSxT;
    float3 tangentLight =
        normalize(mul(worldToTangentSpace, LightVec));
    // scale and bias, add bit of ambient
    tangentLight = ((tangentLight + 1.0) * 0.5) + 0.2;
    // create float4 with 1.0 alpha
    float4 tempLight;
    tempLight.xyz = tangentLight.xyz;
    tempLight.w = 1.0;
    OUT.Color0 = tempLight;
    // pass through texcoords
    OUT.TexCoord0.xy = IN.TexCoord0.xy;
    OUT.TexCoord1.xy = IN.TexCoord0.xy;
    float4 tempPos2;
    tempPos2.xyz = finalPos.xyz;
    tempPos2.w = 1.0;
    OUT.Hposition = mul(WorldViewProj, tempPos2);
    return OUT;
}
Appendix A
Cg Language Specification
Language Overview
The Cg language is primarily modeled on ANSI C, but adopts some ideas
from modern languages such as C++ and Java, and from earlier shading
languages such as RenderMan and the Stanford shading language. The
language also introduces a few new ideas. In particular, it includes features
designed to represent data flow in stream-processing architectures such as
GPUs. Profiles, which are specified at compile time, may subset certain
features of the language, including the ability to implement loops and the
precision at which certain computations are performed.
Silent Incompatibilities
Most of the changes from ANSI C are either omissions or additions, but there
are a few potentially silent incompatibilities. These are changes within Cg that
could cause a program that compiles without errors to behave in a manner
different from C:
The type promotion rules for constants are different when the constant is
not explicitly typed using a type cast or type suffix. In general, a binary
operation between a constant that is not explicitly typed and a variable is
performed at the variable’s precision, rather than at the constant’s default
precision.
Declarations of struct perform an automatic typedef (as in C++) and
thus could override a previously declared type.
Arrays are first-class types that are distinct from pointers. As a result,
array assignments semantically perform a copy operation for the entire
array.
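The three incompatibilities listed above can be sketched in a few lines. This is an illustrative fragment only; the names Pair and sumCopy are hypothetical, and whether a given profile accepts array parameters of this size is an assumption:

```cg
// Hypothetical example, not taken from the manual.
struct Pair {              // also introduces the type name "Pair"
    float a;
    float b;
};

float sumCopy(float src[4])
{
    float dst[4];
    dst = src;             // copies all four elements; there is no aliasing
    dst[0] = 0.0;          // changes the copy only, src[0] is untouched

    Pair p;                // no "struct" keyword is needed here
    p.a = dst[0] + dst[1];
    p.b = dst[2] + dst[3];

    half h = (half)p.a;
    half scaled = h * 2.0; // 2.0 is not explicitly typed, so the multiply
                           // is performed at half precision
    return scaled + p.b;
}
```

In C, the expression h * 2.0 would be evaluated at the precision of the constant, and dst = src would not compile at all, so code ported from C may behave differently without any diagnostic.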
Similar Operations That Must be Expressed Differently
There are several changes that force the same operation to be expressed
differently in Cg than in C:
A Boolean type, bool, is introduced, with corresponding implications for
operators and control constructs.
Arrays are first-class types because Cg does not support pointers.
Functions pass values by value/result, and thus use an out or inout
modifier in the formal parameter list to return a parameter. By default,
formal parameters are in, but it is acceptable to specify this explicitly.
Parameters can also be specified as in out, which is semantically the
same as inout.
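As a minimal sketch of how these points combine, the following function returns a bool status and hands results back through out and inout parameters instead of pointers. The function name safeNormalize and its parameters are hypothetical:

```cg
// Hypothetical helper, not taken from the manual.
bool safeNormalize(float3 v,            // in by default (call by value)
                   out float3 result,   // written back to the caller
                   inout float count)   // read on entry, written back on exit
{
    float len = length(v);
    bool ok = len > 0.0;                // a comparison yields a bool
    if (ok)
        result = v / len;
    else
        result = float3(0, 0, 0);
    count = count + 1;
    return ok;
}
```

A caller would declare float3 n; float calls = 0; and then invoke safeNormalize(someVector, n, calls), after which n and calls hold the values written by the function.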
Differences from ANSI C
Cg was developed based on the ANSI C language with the following major
additions, deletions, and changes. (This is a summary; more detail is
provided later in this document):
Language profiles (described in “Profiles” on page 225) may subset
language capabilities in a variety of ways. In particular, language profiles
may restrict the use of for and while loops. For example, some profiles
may only support loops that can be fully unrolled at compile time.
A binding semantic may be associated with a structure tag, a variable, or a
structure element to denote that object’s mapping to a specific hardware
or API resource. See “Binding Semantics” on page 242.
Reserved keywords goto, break, and continue are not supported.
Reserved keywords switch, case, and default are not supported.
Labels are not supported either.
Pointers and pointer-related capabilities (such as the & and -> operators)
are not supported.
Arrays are supported, but with some limitations on size and
dimensionality. Restrictions on the use of computed subscripts are also
permitted. Arrays may be designated as packed. The operations allowed
on packed arrays may be different from those allowed on unpacked
arrays. Predefined packed types are provided for vectors and matrices. It
is strongly recommended these predefined types be used.
Unsized arrays can be created by declaring an array’s dimension as [].
The array’s actual dimension can be set at run time before a final
compilation step.
There is a built-in swizzle operator: .xyzw or .rgba for vectors. This
operator allows the components of a vector to be rearranged and also
replicated. It also allows the creation of a vector from a scalar.
For an lvalue, the swizzle operator allows components of a vector or
matrix to be selectively written.
There is a similar built-in swizzle operator for matrices:
    ._m<row><col>[_m<row><col>][…]
This operator allows access to individual matrix components and allows
the creation of a vector from elements of a matrix. For compatibility with
DirectX 8 notation, there is a second form of matrix swizzle, which is
described later.
Numeric data types are different. Cg’s primary numeric data types are
float, half, and fixed. Fragment profiles are required to support all
three data types, but may choose to implement half and fixed at float
precision. Vertex profiles are required to support half and float, but
may choose to implement half at float precision. Vertex profiles may
omit support for fixed operations, but must still support definition of
fixed variables. Cg allows profiles to omit run-time support for int. Cg
allows profiles to treat double as float.
Many operators support per-element vector operations.
The ?:, ||, &&, !, and comparison operators can be used with bool
four-vectors to perform four conditional operations simultaneously. The side
effects of all operands to the ?:, ||, and && operators are always
executed.
Non-static global variables and parameters to top-level functions (such
as main()) may be designated as uniform. A uniform variable may be
read and written within a program, just like any other variable.
However, the uniform modifier indicates that the initial value of the
variable or parameter is expected to be constant across a large number of
invocations of the program.
A new set of sampler* types represents handles to texture objects.
Functions may have default values for their parameters, as in C++. These
defaults are expressed using assignment syntax.
Function overloading is supported.
There is no enum or union.
Bit-field declarations in structures are not allowed.
Variables may be defined anywhere before they are used, rather than just
at the beginning of a scope as in C. (That is, we adopt the C++ rules that
govern where variable declarations are allowed.) Variables may not be
redeclared within the same scope.
Vector constructors, such as the form float4(1,2,3,4), may be used
anywhere in an expression.
A struct definition automatically performs a corresponding typedef,
as in C++.
An interface can be specified to define a set of methods that comprises
an abstract interface.
A struct type can be declared as implementing an interface by
adding a colon (:) and the name of the interface after the name of the
struct.
Methods can be defined in the body of a struct definition.
C++ style // comments are allowed in addition to C style /*…*/
comments.
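Several of the additions summarized above (vector and matrix swizzles, swizzled lvalues, vector constructors, default parameter values, and element-wise ?:) can be seen together in one short function. This is a sketch only; the name swizzleDemo and its parameters are hypothetical, and support for the bool4 selection depends on the profile:

```cg
// Hypothetical function, not taken from the manual.
float4 swizzleDemo(float4 v, float4x4 m, float s = 0.5)
{
    float3 bgr   = v.zyx;          // rearrange components
    float4 xxyy  = v.xxyy;         // replicate components
    float4 splat = s.xxxx;         // build a vector from a scalar

    float4 r = float4(bgr, 1);     // constructor used inside an expression
    r.xw = xxyy.yz;                // swizzled lvalue: writes x and w only

    float4 row3 = m._m30_m31_m32_m33;  // same elements as m[3]
    float2 diag = m._m00_m11;          // vector built from matrix elements

    bool4 mask = v > splat;        // element-wise comparison
    return mask ? r : (row3 + diag.xyxy);  // four selections at once
}
```

Because s has a default value, the function can be called as swizzleDemo(v, m) or swizzleDemo(v, m, 0.25). Note that both r and row3 + diag.xyxy are evaluated regardless of mask, in line with the rule that all operands of ?: are always executed.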
Detailed Language Specification
Definitions
The following definitions are based on the ANSI C standard:
Object
An object is a region of data storage in the execution environment, the
contents of which can represent values. When referenced, an object may
be interpreted as having a particular type.
Declaration
A declaration specifies the interpretation and attributes of a set of
identifiers.
Definition
A declaration that also causes storage to be reserved for an object or code
that will be generated for a function named by an identifier is a
definition.
Profiles
Compilation of a Cg program, a top-level function, always occurs in the
context of a compilation profile. The profile specifies whether certain
optional language features are supported. These optional language features
include certain control constructs and standard library functions. The
compilation profile also defines the precision of the float, half, and fixed
data types, and specifies whether the fixed and sampler* data types are
fully or only partially supported. The choice of a compilation profile is made
externally to the language, by using a compiler command-line switch, for
example.
The profile restrictions are only applied to the top-level function that is being
compiled and to any variables or functions that it references, either directly
or indirectly. If a function is present in the source code, but not called directly
or indirectly by the top-level function, it is free to use capabilities that are not
supported by the current profile.
The intent of these rules is to allow a single Cg source file to contain many
different top-level functions that are targeted at different profiles. The core
Cg language specification is sufficiently complete to allow all of these
functions to be parsed. The restrictions provided by a compilation profile are
only needed for code generation, and are therefore only applied to those
functions for which code is being generated. This specification uses the word
program to refer to the top-level function, any functions the top-level function
calls, and any global variables or typedef definitions it references.
Each profile must have a separate specification that describes its
characteristics and limitations.
This core Cg specification requires certain minimum capabilities for all
profiles. In some cases, the core specification distinguishes between
vertex-program and fragment-program profiles, with different minimum
capabilities for each.
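The rule that profile restrictions apply only to the function being compiled can be sketched with one source file that holds two entry points. The file name shared.cg, the function names, and the choice of the arbvp1 and arbfp1 profiles are assumptions made for illustration:

```cg
// shared.cg: hypothetical file with two top-level functions.
// Assumed command lines, one per entry point:
//   cgc -profile arbvp1 -entry vertexMain   shared.cg
//   cgc -profile arbfp1 -entry fragmentMain shared.cg

void vertexMain(float4 position : POSITION,
                float2 uv       : TEXCOORD0,
                uniform float4x4 modelViewProj,
                out float4 oPosition : POSITION,
                out float2 oUV       : TEXCOORD0)
{
    oPosition = mul(modelViewProj, position);
    oUV       = uv;
}

// Uses a texture lookup. When vertexMain is the entry point this function
// is parsed but no code is generated for it, so a vertex profile that lacks
// texture sampling does not reject the file.
float4 fragmentMain(float2 uv : TEXCOORD0,
                    uniform sampler2D decal) : COLOR
{
    return tex2D(decal, uv);
}
```

Each compiler invocation names one profile and one entry point, so the same file serves both pipeline stages.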
The Uniform Modifier
Non-static global variables and parameters passed to functions, such as
main(), can be declared with an optional qualifier uniform. To specify a
uniform variable, use this syntax:
    uniform <type> <variable>
For example,
    uniform float4 myVector;
or
    float4 foo(uniform float4 uv);
If the uniform qualifier is specified for a function that is not top level, it is
meaningless and is ignored. The intent of this rule is to allow a function to
serve either as a top-level function or as one that is not.
Note that uniform variables may be read and written just like non-uniform
variables. The uniform qualifier simply provides information about how the
initial value of the variable is to be specified and stored, through a
mechanism external to the language.
Typically, the initial value of a uniform variable or parameter is stored in a
different class of hardware register. Furthermore, the external mechanism for
specifying the initial value of uniform variables or parameters may be
different than that used for specifying the initial value of non-uniform
variables or parameters. Parameters qualified as uniform are normally
treated as persistent state, while non-uniform parameters are treated as
streaming data, with a new value specified for each stream record (such as
within a vertex array).
Function Declarations
Functions are declared essentially as in C. A function that does not return a
value must be declared with a void return type. A function that takes no
parameters may be declared in one of two ways:
As in C, using the void keyword: functionName(void)
With no parameters at all: functionName()
Functions may be declared as static. If so, they may not be compiled as a
program and are not visible from other compilation units.
Overloading of Functions by Profile
Cg supports overloading of functions by compilation profile. This capability
allows a function to be implemented differently for different profiles. It is
also useful because different profiles may support different subsets of the
language capabilities, and because the most efficient implementation of a
function may be different for different profiles.
The profile name must immediately precede the type name in the function
declaration. For example, to define two different versions of the function
myfunc() for the profileA and profileB profiles:
    profileA float myfunc(float x) {/*...*/};
    profileB float myfunc(float x) {/*...*/};
If a type is defined (using a typedef) that has the same name as a profile, the
identifier is treated as a type name and is not available for profile
overloading at any subsequent point in the file.
If a function definition does not include a profile, the function is referred to
as an open-profile function. Open-profile functions apply to all profiles.
Several wildcard profile names are defined. The name vs matches any vertex
profile, while the name ps matches any fragment or pixel profile.
The names ps_1 and ps_2 match any DirectX 8 pixel shader 1.x profile or
DirectX 9 pixel shader 2.x profile, respectively. Similarly, the names vs_1 and
vs_2 match any DirectX vertex shader 1.x or 2.x profile, respectively. Additional
valid wildcard profile names may be defined by individual profiles.
In general, the most specific version of a function is used. More details are
provided in “Function Overloading” on page 240, but roughly speaking, the
search order is the following:
1. Version of the function with the exact profile overload
2. Version of the function with the most specific wildcard profile overload
(such as vs or ps_1)
3. Version of the function with no profile overload
This search process allows generic versions of a function to be defined that
can be overridden as needed for particular hardware.
Syntax for Parameters in Function Definitions
Functions are declared in a manner similar to C, but the parameters in
function definitions may include a binding semantic (see “Binding
Semantics” on page 242) and a default value.
Each parameter in a function definition takes the following form:
    [uniform] <type> identifier [: <binding_semantic>] [= <default>]
where
<type> may include the qualifiers in, out, inout, and const, as
discussed in “Type Qualifiers” on page 233.
<default> is an expression that resolves to a constant at compile time.
Default values are only permitted for uniform parameters, and for in
parameters to functions that are not top level.
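The following sketch combines profile overloading with the parameter syntax just described: uniform parameters, binding semantics, and a default value. The function attenuate, the parameter names, and the particular falloff formulas are hypothetical; vs and ps are the wildcard profile names mentioned above:

```cg
// Hypothetical example, not taken from the manual.
vs float attenuate(float d) { return 1.0 / (1.0 + d * d); } // any vertex profile
ps float attenuate(float d) { return saturate(1.0 - d); }   // any fragment profile
float attenuate(float d)    { return 1.0; }                 // open-profile fallback

void main(float4 position : POSITION,          // varying input with a semantic
          out float4 oPosition : POSITION,
          out float4 oColor    : COLOR0,
          uniform float4x4 modelViewProj,
          uniform float4   lightPos,
          uniform float    intensity = 1.0)    // default on a uniform parameter
{
    oPosition = mul(modelViewProj, position);
    float d = distance(position.xyz, lightPos.xyz);
    float a = intensity * attenuate(d);        // overload chosen by profile
    oColor = float4(a, a, a, 1);
}
```

When main is compiled for a vertex profile, the search order above selects the vs version of attenuate; the open-profile version is used only if no exact or wildcard overload matches.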
Function Calls
A function call returns an rvalue. Therefore, if a function returns an array, the
array may be read but not written. For example, the following is allowed:
    y = myfunc(x)[2];
But this is not:
    myfunc(x)[2] = y;
For multiple function calls within an expression, the order in which the calls
occur is undefined.
Method Calls
Structures may have methods declared and defined in their structure
definitions. For example,
    struct Foo {
        float value;
        float valueTimesTwo() { return 2 * value; }
    };
Structure methods are called using the . notation: given an object f of type
Foo, the valueTimesTwo() method is called by f.valueTimesTwo().
Interfaces
Interfaces may be declared in order to define a set of methods that a structure
must provide in order to implement that interface.
Programs and functions can take interfaces as parameters, where the specific
structure types being passed to them may be resolved at run time. Depending
on hardware limitations, some profiles may require that the concrete types
associated with a particular usage of interfaces be resolved by the runtime
before the program can execute.
Interfaces are specified with the interface keyword:
    interface Light {
        float3 illuminate(float3 position);
    };
A structure indicates that it implements a particular interface with a colon
and the name of the interface:
    struct PointLight : Light {
        float3 illuminate(float3 position) { ... }
    };
A structure may only implement a single interface and inheritance between
structures is not supported.
Types
Cg’s types are as follows:
The int type is preferably 32-bit two’s complement. Profiles may
optionally treat int as float.
The float type is as close as possible to the IEEE single-precision (32-bit)
floating point. Profiles must support the float data type.
The half type is lower-precision IEEE-like floating point. Profiles must
support the half type, but may choose to implement it with the same
precision as the float type.
The fixed type is a signed type with a range of at least [-2,2) and with at
least 10 bits of fractional precision. Overflow operations on the data type
clamp rather than wrap. Fragment profiles must support the fixed type,
but may implement it with the same precision as the half or float
types. Vertex profiles are required to provide partial support (see
“Partial Support of Types” on page 231) for the fixed type. Vertex
profiles have the option to provide full support for the fixed type or to
implement the fixed type with the same precision as the half or float
types.
The bool type represents Boolean values. Objects of bool type are either
true or false.
The cint type is 32-bit two’s complement. This type is meaningful only
at compile time; it is not possible to declare objects of type cint.
The cfloat type is IEEE single-precision (32-bit) floating point. This type
is meaningful only at compile time; it is not possible to declare objects of
type cfloat.
The void type may not be used in any expression. It may only be used as
the return type of functions that do not return a value.
The sampler* types are handles to texture objects. Formal parameters of
a program or function may be of type sampler*. No other definition of
sampler* variables is permitted. A sampler* variable may only be used
by passing it to another function as an in parameter. Assignment to
sampler* variables is not permitted, and sampler* expressions are not
permitted.
The following sampler* types are always defined: sampler, sampler1D,
sampler2D, sampler3D, samplerCUBE, and samplerRECT. The base
sampler type may be used in any context in which a more specific
sampler type is valid. However, a sampler variable must be used in a
consistent way throughout the program. For example, it cannot be used
in place of both a sampler1D and a sampler2D in the same program.
Fragment profiles are required to fully support the sampler, sampler1D,
sampler2D, sampler3D, and samplerCUBE data types. Fragment profiles
are required to provide partial support (see “Partial Support of Types”
on page 231) for the samplerRECT data type and may optionally provide
full support for this data type.
Vertex profiles are required to provide partial support for the six
sampler data types and may optionally provide full support for these
data types.
An array type is a collection of one or more elements of the same type.
An array variable has a single index.
Some array types may be optionally designated as packed, using the
packed type modifier. The storage format of a packed type may be
different from the storage format of the corresponding unpacked type.
The storage format of packed types is implementation dependent, but
must be consistent for any particular combination of compiler and
profile. The operations supported on a packed type in a particular profile
may be different than the operations supported on the corresponding
unpacked type in that same profile. Profiles may define a maximum
allowable size for packed arrays, but must support at least size 4 for
packed vector (one-dimensional array) types, and 4x4 for packed matrix
(two-dimensional array) types.
When declaring an array of arrays in a single declaration, the packed
modifier only refers to the outermost array. However, it is possible to
declare a packed array of packed arrays by declaring the first level of
array in a typedef using the packed keyword and then declaring a
packed array of this type in a second statement. It is not possible to have
a packed array of unpacked arrays.
For any supported numeric data type TYPE, implementations must
support the following packed array types, which are called vector types.
Type identifiers must be predefined for these types in the global scope:
    typedef packed TYPE TYPE1[1];
    typedef packed TYPE TYPE2[2];
    typedef packed TYPE TYPE3[3];
    typedef packed TYPE TYPE4[4];
For example, implementations must predefine the type identifiers
float1, float2, float3, float4, and so on for any other supported
numeric type.
For any supported numeric data type TYPE, implementations must
support the following packed array types, which are called matrix types.
Implementations must also predefine type identifiers (in the global
scope) to represent these types:
    packed TYPE1 TYPE1x1[1];    packed TYPE1 TYPE3x1[3];
    packed TYPE2 TYPE1x2[1];    packed TYPE2 TYPE3x2[3];
    packed TYPE3 TYPE1x3[1];    packed TYPE3 TYPE3x3[3];
    packed TYPE4 TYPE1x4[1];    packed TYPE4 TYPE3x4[3];
    packed TYPE1 TYPE2x1[2];    packed TYPE1 TYPE4x1[4];
    packed TYPE2 TYPE2x2[2];    packed TYPE2 TYPE4x2[4];
    packed TYPE3 TYPE2x3[2];    packed TYPE3 TYPE4x3[4];
    packed TYPE4 TYPE2x4[2];    packed TYPE4 TYPE4x4[4];
For example, implementations must predefine the type identifiers
float2x1, float3x3, float4x4, and so on. A typedef follows the usual
matrix-naming convention of TYPE_rows_X_columns. If we declare
float4x4 a, then a[3] is equivalent to a._m30_m31_m32_m33.
Both expressions extract row 3 of the matrix, that is, the fourth row
when counting from one.
Implementations are required to support indexing of vectors and
matrices with constant indices.
A struct type is a collection of one or more members of possibly
different types.
An interface type defines a collection of methods that comprises an
abstract interface.
Partial Support of Types
This specification mandates partial support for some types. Partial support for
a type requires the following:
Definitions and declarations using the type are supported.
Assignment and copy of objects of that type are supported (including
implicit copies when passing function parameters).
Top-level function parameters may be defined using that type.
If a type is partially supported, variables may be defined using that type but
no useful operations can be performed on them. Partial support for types
makes it easier to share data structures in code that is targeted at different
profiles.
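As a sketch of why partial support is useful, the structure below is shared by a vertex program and a fragment program. The names Material, vertexMain, and fragmentMain are hypothetical, and the example assumes a vertex profile that only partially supports fixed:

```cg
// Hypothetical shared declarations, not taken from the manual.
struct Material {
    float4 diffuse;
    fixed4 tint;      // fixed may be only partially supported in vertex profiles
};

// Vertex side: declares and copies Material values, which partial support
// guarantees, but performs no arithmetic on the fixed member.
void vertexMain(float4 position : POSITION,
                uniform float4x4 modelViewProj,
                uniform Material material,
                out float4 oPosition : POSITION,
                out float4 oColor    : COLOR0)
{
    Material m = material;          // assignment and copy are supported
    oPosition = mul(modelViewProj, position);
    oColor    = m.diffuse;          // only the float member is operated on
}

// Fragment side: fixed is required to be supported, so arithmetic is allowed.
float4 fragmentMain(float4 color : COLOR0,
                    uniform Material material) : COLOR
{
    fixed4 c = (fixed4)color * material.tint;
    return c;
}
```

Both programs can include the same Material declaration, so the application fills in one layout for both stages.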
Type Categories
The integral type category includes types cint and int.
The floating type category includes types cfloat, float, half, and
fixed. (Note that floating really means floating or fixed/fractional.)
The numeric type category includes integral and floating types.
The compile-time type category includes types cfloat and cint. These
types are used by the compiler for constant type conversions.
The concrete type category includes all types that are not included in the
compile-time type category.
The scalar type category includes all types in the numeric category, the
bool type, and all types in the compile-time category. In this
specification, a reference to a <category> type (such as a reference to a
numeric type) means one of the types included in the category (such as
float, half, or fixed).
Constants
A constant may be explicitly typed or implicitly typed. Explicit typing of a
constant is performed, as in C, by suffixing the constant with a single
character indicating the type of the constant:
f for float
d for double
h for half
x for fixed
Any constant that is not explicitly typed is implicitly typed. If the constant
includes a decimal point, it is implicitly typed as cfloat. If it does not
include a decimal point, it is implicitly typed as cint.
By default, constants are base 10. For compatibility with C, integer
hexadecimal constants may be specified by prefixing the constant with 0x,
and integer octal constants may be specified by prefixing the constant with 0.
Compile-time constant folding is preferably performed at the same precision
that would be used if the operation were performed at run time. Some
compilation profiles may allow some precision flexibility for the hardware;
in such cases the compiler should ideally perform the constant folding at the
highest hardware precision allowed for that data type in that profile.
If constant folding cannot be performed at run-time precision, it may
optionally be performed using the precision indicated below for each of the
numeric data types:
float: s23e8 (fp32) IEEE single-precision floating point
half: s10e5 (fp16) floating point with IEEE semantics
fixed: s1.10 fixed point, clamping to [-2,2)
double: s52e11 (fp64) IEEE double-precision floating point
int: signed 32-bit integer
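The constant forms described in this section can be sketched as follows. The function name constantsDemo and all values are arbitrary choices for illustration:

```cg
// Hypothetical function, not taken from the manual.
float constantsDemo(half h, fixed x)
{
    float a = 1.5f;      // explicitly typed float constant
    half  b = 0.25h;     // explicitly typed half constant
    fixed c = 0.5x;      // explicitly typed fixed constant

    half  d = h * 2.0;   // 2.0 is implicitly cfloat: multiply at half precision
    float e = h * 2.0f;  // 2.0f is explicitly float: multiply at float precision
    fixed f = x * 3;     // 3 is implicitly cint: multiply at fixed precision

    int mask = 0x1F;     // hexadecimal integer constant (decimal 31)
    int perm = 017;      // octal integer constant (decimal 15)
    return a + b + c + d + e + f + mask + perm;
}
```

The difference between d and e is the practical consequence of the typing rule: adding a suffix to a constant can raise the precision of the whole operation.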
Type Qualifiers
The type of an object may be qualified with one or more qualifiers. Qualifiers
apply only to objects. Qualifiers are removed from the value of an object
when used in an expression. The qualifiers are
• const
  The value of a const-qualified object cannot be changed after its initial
  assignment. The definition of a const-qualified object that is not a
  parameter must contain an initializer. Named compile-time values are
  inherently qualified as const, but an explicit qualification is also
  allowed.
  The value of a static const cannot be changed after compilation, and
  thus its value may be used in constant folding during compilation. A
  uniform const, on the other hand, is only const for a given execution of
  the program; its value may be changed via the runtime between
  executions.
• in and out
  Formal parameters may be qualified as in, out, or both (by using in out
  or inout). By default, formal parameters are in-qualified. An
  in-qualified parameter is equivalent to a call-by-value parameter. An
  out-qualified parameter is equivalent to a call-by-result parameter, and an
  inout-qualified parameter is equivalent to a value/result parameter. An
  out-qualified parameter cannot be const-qualified, nor may it have a
  default value.
Type Conversions
Some type conversions are allowed implicitly, while others require an explicit
cast. Some implicit conversions may cause a warning, which can be suppressed
by using an explicit cast. Explicit casts are indicated using C-style syntax:
casting variable to the float4 type can be achieved using
(float4)variable.
• Scalar conversions
  Implicit conversion of any scalar numeric type to any other scalar
  numeric type is allowed. A warning may be issued if the conversion is
  implicit and a loss of precision is possible. Implicit conversion of any
  scalar object type to any compatible scalar object type is allowed.
  Conversions between incompatible scalar object types or between object
  and numeric types are not allowed, even with an explicit cast. A sampler
  is compatible with sampler1D, sampler2D, sampler3D, samplerCUBE,
  and samplerRECT. No other object types are compatible: sampler1D is
  not compatible with sampler2D, even though both are compatible with
  sampler.
  Scalar types may be implicitly converted to vectors and matrices of
  compatible type. The scalar is replicated to all elements of the vector or
  matrix. Scalar types may also be explicitly cast to structure types if the
  scalar type can be legally cast to every member of the structure.
• Vector conversions
  Vectors may be converted to scalar types (the first element of the vector is
  selected). A warning is issued if this is done implicitly. A vector may also
  be implicitly converted to another vector of the same size and compatible
  element type.
  A vector may be converted to a smaller compatible vector or a matrix of
  the same total size, but a warning is issued if an explicit cast is not used.
• Matrix conversions
  Matrices may be converted to a scalar type (element (0,0) is selected). As
  with vectors, this causes a warning if it is done implicitly. A matrix may
  also be converted implicitly to a matrix of the same size and shape and
  compatible element type.
  A matrix may be converted to a smaller matrix type (the upper-left
  submatrix is selected) or to a vector of the same total size, but a warning
  is issued if an explicit cast is not used.
• Structure conversions
  A structure may be explicitly cast to the type of its first member or to
  another structure type with the same number of members, if each
  member of the struct can be converted to the corresponding member of
  the new struct. No implicit conversions of struct types are allowed.
• Array conversions
  No conversions of array types are allowed.
Table 9 summarizes the type conversions discussed here. The table entries
have the following meanings, but please pay attention to the footnotes:
• Allowed: allowed implicitly or explicitly
• Warning: allowed, but warning issued if implicit
• Explicit: only allowed with explicit cast
• No: not allowed

Table 9. Type Conversions
                              Source Type
Target Type   Scalar     Vector          Matrix          Struct          Array
Scalar        Allowed    Warning         Warning         Explicit (i)    No
Vector        Allowed    Allowed (ii)    Warning (iii)   Explicit (i)    No
Matrix        Allowed    Warning (iii)   Allowed (ii)    Explicit (i)    No
Struct        Explicit   No              No              Explicit (iv)   No
Array         No         No              No              No              No

i. Only allowed if the first member of the source can be converted to the target.
ii. Not allowed if target is larger than source. Warning issued if target is smaller than source.
iii. Only allowed if source and target are the same total size.
iv. Only allowed if both source and target have the same number of members, and each
    member of the source can be converted to the corresponding member of the target.

Explicit casts are
• Compile-time type when applied to expressions of compile-time type
• Numeric type when applied to expressions of numeric or compile-time
  type
• Numeric vector type when applied to another vector type of the same
  number of elements
• Numeric matrix type when applied to another matrix type of the same
  number of rows and columns
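Table 9 can also be read as a lookup keyed by target and source kind. The sketch below encodes the table body as data in Python; the names CONVERSIONS, conversion, and needs_explicit_cast are hypothetical, and the footnote conditions (size and member checks) are intentionally not modeled.

```python
KINDS = ("scalar", "vector", "matrix", "struct", "array")

# CONVERSIONS[target][i] is the Table 9 entry for source kind KINDS[i].
CONVERSIONS = {
    "scalar": ("Allowed",  "Warning", "Warning", "Explicit", "No"),
    "vector": ("Allowed",  "Allowed", "Warning", "Explicit", "No"),
    "matrix": ("Allowed",  "Warning", "Allowed", "Explicit", "No"),
    "struct": ("Explicit", "No",      "No",      "Explicit", "No"),
    "array":  ("No",       "No",      "No",      "No",       "No"),
}

def conversion(source: str, target: str) -> str:
    """Return the Table 9 entry for converting `source` to `target`."""
    return CONVERSIONS[target][KINDS.index(source)]

def needs_explicit_cast(source: str, target: str) -> bool:
    """True when the conversion is legal only with an explicit cast."""
    return conversion(source, target) == "Explicit"
```

For instance, conversion("vector", "scalar") returns "Warning": the conversion is legal, but a diagnostic is expected unless an explicit cast is written.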
Type Equivalency
Type T1 is equivalent to type T2 if any of the following are true:
• T2 is equivalent to T1.
• T1 and T2 are the same scalar, vector, or structure type.
  A packed array type is not equivalent to the same size unpacked array.
• T1 is a typedef name of T2.
• T1 and T2 are arrays of equivalent types with the same number of
  elements.
• The unqualified types of T1 and T2 are equivalent, and both types have
  the same qualifications.
• T1 and T2 are functions with equivalent return types, the same number
  of parameters, and all corresponding parameters are pairwise
  equivalent.
Type-Promotion Rules
The cfloat and cint types behave like float and int types except for the
usual arithmetic conversion behavior and function overloading rules (see
“Function Overloading” on page 240).
The usual arithmetic conversions for binary operators are defined as follows:
1. If either operand is double, the other is converted to double.
2. Otherwise, if either operand is float, the other operand is converted to
   float.
3. Otherwise, if either operand is half, the other operand is converted to
   half.
4. Otherwise, if either operand is fixed, the other operand is converted to
   fixed.
5. Otherwise, if either operand is cfloat, the other operand is converted to
   cfloat.
6. Otherwise, if either operand is int, the other operand is converted to
   int.
7. Otherwise, both operands have type cint.
Note that conversions happen prior to performing the operation.
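The seven rules amount to a priority list that is scanned from the top. A minimal Python model follows; the names PRIORITY and arithmetic_result_type are hypothetical, and the model covers scalar numeric types only.

```python
# Types in the order in which rules 1 through 7 test for them.
PRIORITY = ("double", "float", "half", "fixed", "cfloat", "int", "cint")

def arithmetic_result_type(left: str, right: str) -> str:
    """Return the common type both operands are converted to."""
    for name in (left, right):
        if name not in PRIORITY:
            raise ValueError(f"not a numeric scalar type: {name}")
    for candidate in PRIORITY:
        if candidate in (left, right):
            return candidate
```

One consequence worth noticing: because cfloat ranks below half and fixed, an unsuffixed constant such as 2.0 does not widen a half or fixed operand, so arithmetic_result_type("half", "cfloat") is half.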
Assignment
Assignment of an expression to an object or compile-time-typed value
converts the expression to the type of the object or value. The resulting value
is then assigned to the object or value.
The value of the assignment expressions (=, *=, and so on) is defined as in C:
An assignment expression has the value of the left operand after the
assignment but is not an lvalue. The type of an assignment expression is the
type of the left operand unless the left operand has a qualified type, in which
case it is the unqualified version of the type of the left operand. The side
effect of updating the stored value of the left operand occurs between the
previous and the next sequence point.
Smearing of Scalars to Vectors
If a binary operator is applied to a vector and a scalar, the scalar is
automatically type-promoted to a same-sized vector by replicating the scalar
into each component. The ternary ?: operator also supports smearing. The
binary rule is applied to the second and third operands first, and then the
binary rule is applied to this result and the first operand.
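As an illustration of smearing for binary operators, here is a Python model in which vectors are lists and scalars are plain numbers. The helper names smear and binary_op are hypothetical; the sketch shows the replication rule only and does not model type promotion.

```python
def smear(value, size):
    """Replicate a scalar into a `size`-element list; pass vectors through."""
    if isinstance(value, (list, tuple)):
        if len(value) != size:
            raise ValueError("vector operands must have the same size")
        return list(value)
    return [value] * size

def binary_op(op, left, right):
    """Apply `op` componentwise, smearing a scalar operand if needed."""
    sizes = [len(v) for v in (left, right) if isinstance(v, (list, tuple))]
    if not sizes:
        return op(left, right)          # scalar with scalar: nothing to smear
    size = sizes[0]
    return [op(a, b) for a, b in zip(smear(left, size), smear(right, size))]
```

With this model, multiplying the vector [1, 2, 3] by the scalar 2 behaves as if the scalar had been written as [2, 2, 2].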
Namespaces
Just as in C, there are two namespaces. Each has multiple scopes, as in C.
• Tag namespace, which consists of struct tags
• Regular namespace:
  - typedef names (including an automatic typedef from a struct
    declaration)
  - Variables
  - Function names
Arrays and Subscripting
Arrays are declared as in C, except that they may optionally be declared to be
packed, as described under “Types” on page 229. Arrays in Cg are first-class
types, so array parameters to functions and programs must be declared
using array syntax, rather than pointer syntax. Likewise, assignment of an
array-typed object implies an array copy rather than a pointer copy.
Arrays with size [1] may be declared but are considered a different type
from the corresponding non-array type.
Because the language does not currently support pointers, the storage order
of arrays is only visible when an application passes parameters to a vertex or
fragment program. Therefore, the compiler is currently free to allocate
temporary variables as it sees fit.
The declaration and use of arrays of arrays is in the same style as in C. That
is, if the 2D array A is declared as
    float A[4][4];
then the following statements are true:
• The array is indexed as A[row][column].
• The array can be built with a constructor using
    A = { {A[0][0], A[0][1], A[0][2], A[0][3]},
          {A[1][0], A[1][1], A[1][2], A[1][3]},
          {A[2][0], A[2][1], A[2][2], A[2][3]},
          {A[3][0], A[3][1], A[3][2], A[3][3]} };
• A[0] is equivalent to {A[0][0], A[0][1], A[0][2], A[0][3]}.
Support must be provided for any struct containing arrays.
Minimum Array Requirements
Profiles are required to provide partial support for certain kinds of arrays.
This partial support is designed to support vectors and matrices in all
profiles. For vertex profiles, it is additionally designed to support arrays of
light state (indexed by light number) passed as uniform parameters, and
arrays of skinning matrices passed as uniform parameters.
Profiles must support subscripting, copying, and swizzling of vectors and
matrices. However, subscripting with run-time-computed indices is not
required to be supported.
Vertex profiles must support the following operations for any non-packed
array that is a uniform parameter to the program, or is an element of a
structure that is a uniform parameter to the program. This requirement also
applies when the array is indirectly a uniform program parameter (that is, it
and/or the structure containing it has been passed via a chain of in function
parameters). There are two operations that must be supported:
• Rvalue subscripting by a run-time-computed value or a compile-time
  value
• Passing the entire array as a parameter to a function, where the
  corresponding formal function parameter is declared as in
The following operations are explicitly not required to be supported:
• Lvalue subscripting
• Copying
• Other operators, including multiply, add, compare, and so on
Note that when the array is rvalue-subscripted, the result is an expression,
and this expression is no longer considered to be a uniform program
parameter. Therefore, if this expression is an array, its subsequent use must
conform to the standard rules for array usage.
These rules are not limited to arrays of numeric types, and thus imply
support for arrays of struct, arrays of matrices, and arrays of vectors when
the array is a uniform program parameter. Maximum array sizes may be
limited by the number of available registers or other resource limits, and
compilers are permitted to issue error messages in these cases. However,
profiles must support sizes of at least float arr[8], float4 arr[8], and
float4x4 arr[4][4].
Fragment profiles are not required to support any operations on arbitrarily
sized arrays; only support for vectors and matrices is required.
Unsized Arrays
An unsized array may be declared by declaring an array with no length
specified between the brackets: float a[]. The actual length of the array
may then be set by the runtime before program execution. In program code,
the length of any array can be queried using the syntax a.length, where
length acts like an undeclared structure parameter that holds the actual
length of the array at run time.
Function Overloading
Multiple functions may be defined with the same name, as long as the
definitions can be distinguished by unqualified parameter types and do not
have an open-profile conflict (see “Overloading of Functions by Profile” on
page 226).
Function-matching rules:
1. Add all visible functions with a matching name in the calling scope to
   the set of function candidates.
2. Eliminate functions whose profile conflicts with the current compilation
   profile.
3. Eliminate functions with the wrong number of formal parameters. If a
   candidate function has excess formal parameters, and each of the excess
   parameters has a default value, do not eliminate the function.
4. If the set is empty, fail.
5. For each actual parameter expression in sequence, perform the
   following:
   a. If the type of the actual parameter matches the unqualified type of the
      corresponding formal parameter in any function in the set, remove all
      functions whose corresponding parameter does not match exactly.
   b. If there is a defined promotion for the type of the actual parameter to
      the unqualified type of the formal parameter of any function, remove
      all functions for which this is not true from the set.
   c. If there is a valid implicit cast that converts the type of the actual
      parameter to the unqualified type of the formal parameter of any
      function, remove all functions without this cast.
   d. Fail.
6. Choose a function based on profile:
   a. If there is at least one function with a profile that exactly matches the
      compilation profile, discard all functions that don’t exactly match.
   b. Otherwise, if there is at least one function with a wildcard profile that
      matches the compilation profile, determine the “most specific”
      matching wildcard profile in the candidate set. Discard all functions
      except those with this most specific wildcard profile. How “specific” a
      given wildcard profile name is relative to a particular profile is
      determined by the profile specification.
7. If the number of functions remaining in the set is not one, then fail.
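The parameter-matching part of these rules (steps 3, 4, 5, and 7) can be sketched as a filter over a candidate list. The Python model below leaves out the profile steps 2 and 6. The Candidate class and the PROMOTIONS and IMPLICIT_CASTS sets are invented for illustration; in particular, the sets do not reproduce the real promotion and cast rules of the language.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    label: str
    params: tuple        # unqualified formal parameter types, in order
    defaults: int = 0    # number of trailing formals with default values

# Hypothetical conversion data, as (actual type, formal type) pairs.
PROMOTIONS = {("cint", "int"), ("cfloat", "float"), ("half", "float")}
IMPLICIT_CASTS = {("int", "float"), ("float", "half"), ("float", "int")}

def is_exact(actual, formal):
    return actual == formal

def is_promotion(actual, formal):
    return (actual, formal) in PROMOTIONS

def is_implicit_cast(actual, formal):
    return (actual, formal) in IMPLICIT_CASTS

def resolve(candidates, actual_types):
    count = len(actual_types)
    # Step 3: excess formal parameters are acceptable only if defaulted.
    live = [c for c in candidates
            if len(c.params) - c.defaults <= count <= len(c.params)]
    if not live:                                    # step 4
        raise LookupError("no candidate accepts this many arguments")
    for index, actual in enumerate(actual_types):   # step 5
        for matches in (is_exact, is_promotion, is_implicit_cast):
            kept = [c for c in live if matches(actual, c.params[index])]
            if kept:                                # 5a, 5b, or 5c applied
                live = kept
                break
        else:                                       # step 5d
            raise LookupError(f"argument {index} matches no candidate")
    if len(live) != 1:                              # step 7
        raise LookupError("ambiguous call")
    return live[0]
```

Each argument is handled in order: an exact match eliminates all other candidates before promotions are considered, and promotions are preferred over implicit casts in the same way.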
Global Variables
Global variables are declared and used as in C. Uniform non-static variables
may have a semantic associated with them. Uniform non-static variables may
have their value set through the run-time API.
Use of Uninitialized Variables
It is incorrect for a program to use an uninitialized variable. However, the
compiler is not obligated to detect such errors, even if it would be possible to
do so by compile-time data-flow analysis. The value obtained from reading
an uninitialized variable is undefined. This same rule applies to the implicit
use of a variable that occurs when it is returned by a top-level function. In
particular, if a top-level function returns a struct, and some element of that
struct is never written, then the value of that element is undefined.
Note: Variables are not defined as being initialized to zero because this would result in a
performance penalty in cases where the compiler is unable to determine if a
variable is properly initialized by the programmer.
Preprocessor
Cg profiles must support the full ANSI C standard preprocessor capabilities:
#if, #define, and so on. However, Cg profiles are not required to support
macro-like #define or the use of #include directives.
Overview of Binding Semantics
In stream-processing architectures, data packets flow between different
programmable units. On a GPU, for example, packets of vertex data flow
from the application to the vertex program.
Because packets are produced by one program (the application, in this case),
and consumed by another (the vertex program), there must be some method
for defining the interface between the two. The approach used in Cg is to
associate a binding semantic with each element of the packet. This is a
bind-by-name approach. For example, an output with the binding semantic FOO is
fed to an input with the binding semantic FOO. Profiles may allow the user to
define arbitrary identifiers in this “semantic namespace,” or they may restrict
the allowed identifiers to a predefined set. Often, these predefined names
correspond to the names of hardware registers or API resources.
In some cases, predefined names may control non-programmable parts of
the hardware. For example, vertex programs normally compute a position
that is fed to the rasterizer, and this position is stored in an output with the
binding semantic POSITION.
For any profile, there are two namespaces for predefined binding
semantics: the namespace used for in variables and the namespace used for
out variables. The primary implication of having two namespaces is that the
binding semantic cannot be used to implicitly specify whether a variable is
in or out.
Binding Semantics
A binding semantic may be associated with an input to a top-level function
in one of three ways:
• The binding semantic is specified in the formal parameter declaration for
  the function. The syntax for formal parameters to a function is
    [const] [in | out | inout]
        <type> <identifier> [ : <binding-semantic>][= <initializer>]
• If the formal parameter is a struct, the binding semantic may be
  specified with an element of the struct when the struct is defined:
    struct <struct-tag> {
        <type> <identifier>[ : <binding-semantic>];
        /*...*/ };
• If the input to the function is implicit (a non-static global variable that is
  read by the function), the binding semantic may be specified when the
  non-static global variable is declared:
    <type> <identifier>[ : <binding-semantic>][ = <initializer>]
  If the non-static global variable is a struct, the binding semantic may be
  specified when the struct is defined, as described in the second bullet
  above.
A binding semantic may be associated with the output of a top-level
function in a similar manner:
    <type> <identifier> ( <parameter-list> )[ : <binding-semantic>]
    { <body> }
Another method available for specifying a semantic for an output value
is to return a struct and to specify the binding semantic(s) with
elements of the struct when the struct is defined. In addition, if the
output is a formal parameter, the binding semantic may be specified
using the same approach used to specify binding semantics for inputs.
Aliasing of Semantics
Semantics must honor a copy-on-input and copy-on-output model. Thus, if
the same input binding semantic is used for two different variables, those
variables are initialized with the same value, but the variables are not aliased
thereafter. Output aliasing is illegal, but implementations are not required to
detect it. If the compiler does not issue an error on a program that aliases
output binding semantics, the results are undefined.
Restrictions on Semantics Within a Structure
For a particular profile, it is illegal to mix input binding semantics and
output binding semantics within a particular struct. That is, for a particular
top-level function, a struct must be either input-only or output-only.
Likewise, a struct must consist exclusively of uniform inputs or exclusively
of non-uniform inputs. It is illegal to use binding semantics to mix the two
within a single struct.
Additional Details for Binding Semantics
The following rules are somewhat redundant, but provide extra clarity:
• Semantics names are case-insensitive.
• Semantics attached to parameters to non-main functions are ignored.
• Input semantics may be aliased by multiple variables.
• Output semantics may not be aliased.
How Programs Receive and Return Data
A program is just a non-static function that has been designated as the main
entry point at compilation time. The varying inputs to the program come
from this top-level function’s varying in parameters. The uniform inputs to
the program come from the top-level function’s uniform in parameters and
from any non-static global variables that are referenced by the top-level
function or by any functions that it calls. The output of the program comes
from the return value of the function (which is always implicitly varying),
and from any out parameters, which must also be varying.
Parameters to a program of type sampler* are implicitly const.
Statements
Statements are expressed just as in C, unless an exception is stated elsewhere
in this document. Additionally,
• The if, while, and for statements require bool expressions in the
  appropriate places.
• Assignment is performed using =. The assignment operator returns a
  value, just as in C, so assignments may be chained.
• The new discard statement terminates execution of the program for the
  current data element (such as the current vertex or current fragment)
  and suppresses its output. Vertex profiles may choose to omit support
  for discard.
Minimum Requirements for if, while, and for Statements
The minimum requirements are as follows:
• All profiles should support if, but such support is not strictly required
  for older hardware.
• All profiles should support for and while loops if the number of loop
  iterations can be determined at compile time.
  “Can be determined at compile time” is defined as follows:
  The loop-iteration expressions can be evaluated at compile time by
  use of intra-procedural constant propagation and folding, where the
  variables through which constant values are propagated do not
  appear as lvalues within any kind of control statement (if, for, or
  while) or ?: construct.
• Profiles may choose to support more general constant propagation
  techniques, but such support is not required.
• Profiles may optionally support fully general for and while loops.
New Vector Operators
These new operators are defined for vector types:
• Vector construction operator: <typeID>(…)
  This operator builds a vector from multiple scalars or shorter vectors:
    float4(scalar, scalar, scalar, scalar)
    float4(float3, scalar)
• Matrix construction operator: <typeID>(…)
  This operator builds a matrix from multiple rows. Each row may be
  specified either as multiple scalars or as any combination of scalars and
  vectors with the appropriate size.
    float3x3(1, 2, 3, 4, 5, 6, 7, 8, 9)
    float3x3(float3, float3, float3)
    float3x3(1, float2, float3, 1, 1, 1)
• Swizzle operator: (.)
    a = b.xxyz; // A swizzle operator example
  - At least one swizzle character must follow the operator.
  - There are two sets of swizzle characters and they may not be mixed.
    Set one is xyzw = 0123, and set two is rgba = 0123.
  - The vector swizzle operator may only be applied to vectors or to
    scalars.
  - Applying the vector swizzle operator to a scalar gives the same
    result as applying the operator to a vector of length one.
    Thus, myscalar.xxx and float3(myscalar, myscalar, myscalar)
    yield the same value.
  - If only one swizzle character is specified, the result is a scalar, not a
    vector of length one. Therefore, the expression b.y returns a scalar.
  - Care is required when swizzling a constant scalar because of
    ambiguity in the use of the decimal point character. For example, to
    create a three-vector from a scalar, use one of the following:
      (1).xxx or 1..xxx or 1.0.xxx or 1.0f.xxx
  - The size of the returned vector is determined by the number of
    swizzle characters. Therefore, the size of the result may be larger or
    smaller than the size of the original vector.
    For example, float2(0,1).xxyy and float4(0,0,1,1) yield the
    same result.
• Matrix swizzle operator:
  For any matrix type of the form <type><rows>x<columns>, the notation
    <matrixObject>._m<row><col>[_m<row><col>][…]
  can be used to access individual matrix elements (in the case of only one
  <row><col> pair) or to construct vectors from elements of a matrix (in
  the case of more than one <row><col> pair). The row and column
  numbers are zero-based.
  For example,
    float4x4 myMatrix;
    float myFloatScalar;
    float4 myFloatVec4;
    // Set myFloatScalar to myMatrix[3][2].
    myFloatScalar = myMatrix._m32;
    // Assign the main diagonal of myMatrix to myFloatVec4.
    myFloatVec4 = myMatrix._m00_m11_m22_m33;
  - For compatibility with the D3DMatrix data type, Cg also allows
    one-based swizzles, using a form with the m omitted after the _ symbol:
      <matrixObject>._<row><col>[_<row><col>][…]
    In this form, the indexes for <row> and <col> are one-based, rather
    than the C standard zero-based. So, the two forms are functionally
    equivalent:
      float4x4 myMatrix;
      float4 myVec;
      // These two statements are functionally equivalent:
      myVec = myMatrix._m00_m23_m11_m31;
      myVec = myMatrix._11_34_22_42;
    Because of the confusion that can be caused by the one-based
    indexing, use of the latter notation is strongly discouraged.
  - The matrix swizzles may only be applied to matrices. When multiple
    components are extracted from a matrix using a swizzle, the result is
    an appropriately sized vector. When a swizzle is used to extract a
    single component from a matrix, the result is a scalar.
• The write mask operator: (.)
  It can only be applied to an lvalue that is a vector. It allows assignment to
  particular elements of a vector or matrix, leaving other elements
  unchanged. The only restriction is that a component cannot be repeated.
Arithmetic Precision and Range
Some hardware may not conform exactly to IEEE arithmetic rules.
Fixed-point data types do not have IEEE-defined rules.
Optimizations are allowed to produce slightly different results than
unoptimized code. Constant folding must be done with approximately the
correct precision and range, but is not required to produce bit-exact results. It
is recommended that compilers provide an option either to forbid these
optimizations or to guarantee that they are made in bit-exact fashion.
Operator Precedence
Cg uses the same operator precedence as C for operators that are common
between the two languages.
The swizzle and write mask operators (.) have the same precedence as the
structure member operator (.) and the array index operator ([]).
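The behavior of the swizzle and write mask operators (described under “New Vector Operators”) can be modeled with a few list operations. The following Python sketch is an illustration of those rules, not Cg code; the function names are hypothetical, and rejecting a matrix swizzle that mixes the zero-based and one-based forms is an assumption of the model.

```python
import re

SWIZZLE_SETS = ("xyzw", "rgba")    # the two sets may not be mixed

def _letters_for(pattern):
    for letters in SWIZZLE_SETS:
        if all(ch in letters for ch in pattern):
            return letters
    raise ValueError("swizzle characters must come from a single set")

def swizzle(value, pattern):
    """Model of value.pattern; a scalar acts like a vector of length one."""
    if not pattern:
        raise ValueError("at least one swizzle character is required")
    vector = list(value) if isinstance(value, (list, tuple)) else [value]
    letters = _letters_for(pattern)
    picked = [vector[letters.index(ch)] for ch in pattern]
    # A single swizzle character yields a scalar, not a one-element vector.
    return picked[0] if len(picked) == 1 else picked

def matrix_swizzle(matrix, pattern):
    """Model of matrix._m<row><col>... (zero-based) and matrix._<row><col>... (one-based)."""
    if re.fullmatch(r"(_m\d\d)+", pattern):
        offset = 0
    elif re.fullmatch(r"(_\d\d)+", pattern):
        offset = 1
    else:
        raise ValueError("malformed or mixed matrix swizzle")
    picked = []
    for row, col in re.findall(r"(\d)(\d)", pattern):
        r, c = int(row) - offset, int(col) - offset
        if r < 0 or c < 0:
            raise ValueError("one-based indexes start at 1")
        picked.append(matrix[r][c])
    return picked[0] if len(picked) == 1 else picked

def write_mask(vector, mask, values):
    """Model of vector.mask = values; unlisted components are unchanged."""
    if len(set(mask)) != len(mask):
        raise ValueError("a component cannot be repeated in a write mask")
    letters = _letters_for(mask)
    result = list(vector)
    for ch, new_value in zip(mask, values):
        result[letters.index(ch)] = new_value
    return result
```

In this model swizzle([0, 1], "xxyy") returns [0, 0, 1, 1], matching the float2(0,1).xxyy example, and the two matrix forms _m00_m23_m11_m31 and _11_34_22_42 select the same four elements.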
Operator Enhancements
The standard C arithmetic operators (+, -, *, /, %, unary -) are extended to
support vectors and matrices. Sizes of vectors and matrices must be
appropriately matched, according to standard mathematical rules.
Scalar-to-vector promotion (see “Smearing of Scalars to Vectors” on page 237) allows
relaxation of these rules.
Table 10. Expanded Operators
Operator                        Description
M[n][m]                         Matrix with n rows and m columns
V[n]                            Vector with n elements
-V[n] -> V[n]                   Unary vector negate
-M[n] -> M[n]                   Unary matrix negate
V[n] * V[n] -> V[n]             Componentwise *
V[n] / V[n] -> V[n]             Componentwise /
V[n] % V[n] -> V[n]             Componentwise %
V[n] + V[n] -> V[n]             Componentwise +
V[n] - V[n] -> V[n]             Componentwise -
M[n][m] * M[n][m] -> M[n][m]    Componentwise *
M[n][m] / M[n][m] -> M[n][m]    Componentwise /
M[n][m] % M[n][m] -> M[n][m]    Componentwise %
M[n][m] + M[n][m] -> M[n][m]    Componentwise +
M[n][m] - M[n][m] -> M[n][m]    Componentwise -
Operators
Boolean
    && || !
Boolean operators may be applied to packed bool vectors, in which
case they are applied in elementwise fashion to produce a result vector of the
same size. Each operand must be a bool vector of the same size.
Both sides of && and || are always evaluated; there is no short-circuiting as
there is in C.
Comparisons
    < > <= >= != ==
Comparison operators may be applied to numeric vectors. Both operands
must be vectors of the same size. The comparison operation is performed in
elementwise fashion to produce a bool vector of the same size.
Comparison operators may also be applied to bool vectors. For the purpose
of relational comparisons, true is treated as one and false is treated as zero.
The comparison operation is performed in elementwise fashion to produce a
bool vector of the same size.
Comparison operators may also be applied to numeric or bool scalars.
Arithmetic
    + - * / % ++ -- unary- unary+
The arithmetic operator % is the remainder operator, as in C. It may only be
applied to two operands of cint or int type.
When / or % is used with cint or int operands, C rules for integer / and %
apply.
The C operators that combine assignment with arithmetic operations (such
as +=) are also supported when the corresponding arithmetic operator is
supported by Cg.
Conditional Operator
    ?:
If the first operand is of type bool, one of the following statements must hold
for the second and third operands:
• Both operands have compatible structure types.
• Both operands are scalars with numeric or bool type.
• Both operands are vectors with numeric or bool type, where the two
  vectors are of the same size, which is less than or equal to four.
If the first operand is a packed vector of bool, then the conditional selection
is performed on an elementwise basis. Both the second and third operands
must be numeric vectors of the same size as the first operand.
Unlike C, side effects in the expressions in the second and third operands are
always executed, regardless of the condition.
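Elementwise comparison and elementwise conditional selection can be pictured with a short Python model in which vectors are lists. The names compare_less and select are hypothetical. Because Python evaluates both value arguments before select is called, the model also mirrors the rule that the second and third operands are always evaluated.

```python
def compare_less(left, right):
    """Model of `left < right` on same-size vectors; returns a bool vector."""
    if len(left) != len(right):
        raise ValueError("both operands must be vectors of the same size")
    return [a < b for a, b in zip(left, right)]

def select(condition, if_true, if_false):
    """Model of `condition ? if_true : if_false` with a bool-vector condition."""
    if not (len(condition) == len(if_true) == len(if_false)):
        raise ValueError("all three operands must have the same size")
    return [t if c else f for c, t, f in zip(condition, if_true, if_false)]
```

Combining the two, select(compare_less(a, b), a, b) picks the smaller component at each position, which is a componentwise minimum.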
Miscellaneous Operators
    (typecast) ,
Cg supports C’s typecast and comma operators.
Reserved Words
The following are the reserved words in Cg:
asm*             asm_fragment     auto
bool             break            case
catch            char             class
column major     compile          const
const_cast       continue         decl*
default          delete           discard
do               double           dword*
dynamic_cast     else             emit
enum             explicit         extern
false            fixed            float*
for              friend           get
goto             half             if
in               inline           inout
int              interface        long
matrix*          mutable          namespace
new              operator         out
packed           pass*            pixelfragment*
pixelshader*     private          protected
public           register         reinterpret_cast
return           row major        sampler
sampler_state    sampler1D        sampler2D
sampler3D        samplerCUBE      shared
short            signed           sizeof
static           static_cast      string*
struct           switch           technique*
template         texture*         texture1D
texture2D        texture3D        textureCUBE
textureRECT      this             throw
true             try              typedef
typeid           typename         uniform
union            unsigned         using
vector*          vertexfragment*  vertexshader*
virtual          void             volatile
while            __identifier (two underscores before identifier)
Cg Standard Library Functions
Cg provides a set of built-in functions and predefined structures with
binding semantics to simplify GPU programming. These functions are
discussed in “Cg Standard Library Functions” on page 33.
Vertex Program Profiles
A few features of the Cg language that are specific to vertex program profiles
are required to be implemented in the same manner for all vertex program
profiles.
Mandatory Computation of Position Output
Vertex program profiles may (and typically do) require that the program
compute a position output. This homogeneous clip-space position is used by
the hardware rasterizer and must be stored in a program output with an
output binding semantic of POSITION (or HPOS for backward compatibility).
Position Invariance
In many graphics APIs, the user can choose between two different
approaches to specifying per-vertex computations: use a built-in
configurable fixed-function pipeline or specify a user-written vertex program.
If the user wishes to mix these two approaches, it is sometimes desirable to
guarantee that the position computed by the first approach is bit-identical to
the position computed by the second approach. This position invariance is
particularly important for multipass rendering.
Support for position invariance is optional in Cg vertex profiles, but for those
vertex profiles that support it, the following rules apply:
• Position invariance with respect to the fixed-function pipeline is
  guaranteed if two conditions are met:
  - The vertex program is compiled using a compiler option indicating
    position invariance (-posinv, for example).
  - The vertex program computes position as follows:
      OUT_POSITION = mul(MVP, IN_POSITION)
    where
    OUT_POSITION is a variable (or structure element) of type float4
    with an output binding semantic of POSITION or HPOS.
    IN_POSITION is a variable (or structure element) of type float4
    with an input binding semantic of POSITION.
    MVP is a uniform variable (or structure element) of type float4x4
    with an input binding semantic that causes it to track the
    fixed-function modelview-projection matrix. (The name of this binding
    semantic is currently profile-specific; for OpenGL profiles, the
    semantic _GL_MVP is recommended.)
• If the first condition is met but not the second, the compiler is
  encouraged to issue a warning.
• Implementations may choose to recognize more general versions of the
  second condition (such as the variables being copy-propagated from the
  original inputs and outputs), but this additional generality is not
  required.
Binding Semantics for Outputs
As shown in Table 11, there are two output binding semantics for vertex
program profiles:
Table 11. Vertex Output Binding Semantics
Name       Meaning                                               Type     Default Value
POSITION   Homogeneous clip-space position; fed to rasterizer.   float4   Undefined
PSIZE      Point size                                            float    Undefined
Profiles may define additional output binding semantics with specific
behaviors, and these definitions are expected to be consistent across
commonly used profiles.
Fragment Program Profiles
A few features of the Cg language that are specific to fragment program
profiles are required to be implemented in the same manner for all fragment
program profiles.
Binding Semantics for Outputs
As shown in Table 12, there are three output binding semantics for fragment
program profiles. Profiles may define additional output binding semantics
with specific behaviors, and these definitions are expected to be consistent
across commonly used profiles.
Table 12. Fragment Output Binding Semantics
Name     Meaning                                  Type     Default Value
COLOR    RGBA output color                        float4   Undefined
COLOR0   Same as COLOR                            -        -
DEPTH    Fragment depth value (in range [0,1])    float    Interpolated depth from rasterizer (in range [0,1])
If a program desires an output color alpha of 1.0, it should explicitly write a
value of 1.0 to the W component of the COLOR output. The language does not
define a default value for this output.
Note: If the target hardware uses a default value for this output, the compiler may
choose to optimize away an explicit write specified by the user if it matches the
default hardware value. Such defaults are not exposed in the language.
In contrast, the language does define a default value for the DEPTH output.
This default value is the interpolated depth obtained from the rasterizer.
Semantically, this default value is copied to the output at the beginning of the
execution of the fragment program.
Note: Although the DEPTH output is assigned a default value, as with all outputs its
value cannot be read in a Cg program.
As discussed earlier, when a binding semantic is applied to an output, the
type of the output variable is not required to match the type of the binding
semantic. For example, the following is legal, although not recommended:
    struct myfragoutput {
        float2 mycolor : COLOR;
    };
In such cases, the variable is implicitly copied (with a typecast) to the
semantic upon program completion. If the variable’s vector size is shorter
than the semantic’s vector size, the larger-numbered components of the
semantic receive their default values, if applicable, and otherwise are
undefined. In the case above, the R and G components of the output color are
obtained from mycolor, while the B and A components of the color are
undefined.
Appendix B
Language Profiles
This appendix describes the language capabilities that are available in each of the following profiles supported by the Cg compiler:
OpenGL ARB Vertex Program Profile (arbvp1)
OpenGL ARB Fragment Program Profile (arbfp1)
OpenGL NV_vertex_program 3.0 Profile (vp40)
OpenGL NV_fragment_program 2.0 Profile (fp40)
OpenGL NV_vertex_program 2.0 Profile (vp30)
OpenGL NV_fragment_program Profile (fp30)
OpenGL NV_vertex_program 1.0 Profile (vp20)
OpenGL NV_texture_shader and NV_register_combiners Profile (fp20)
DirectX Vertex Shader 2.x Profiles (vs_2_*)
DirectX Pixel Shader 2.x Profiles (ps_2_*)
DirectX Vertex Shader 1.1 Profile (vs_1_1)
DirectX Pixel Shader 1.x Profiles (ps_1_*)
In each case, the capabilities are a subset of the full capabilities described by the Cg language specification in "Cg Language Specification" on page 221.
OpenGL ARB Vertex Program Profile (arbvp1)
The OpenGL ARB Vertex Program Profile is used to compile Cg source code to vertex programs compatible with version 1.0 of the GL_ARB_vertex_program extension.
Profile name: arbvp1
How to invoke: Use the compiler option -profile arbvp1.
This section describes the capabilities and restrictions of Cg when using the arbvp1 profile.
Overview
The arbvp1 profile is similar to the vp20 profile except for the format of its output and its capability of accessing OpenGL state easily. ARB_vertex_program has the same capabilities as NV_vertex_program and DirectX 8 vertex shaders, so the limitations that this profile places on the Cg source code written by the programmer are the same as those of the NV_vertex_program profile (see footnote 1).
Accessing OpenGL State
The arbvp1 profile allows Cg programs to refer to the OpenGL state directly, unlike the vp20 profile. However, if you want to write Cg programs that are compatible with the vp20, vp30, and dx8vs profiles, you should use the alternate mechanism of setting uniform variables with the necessary state using the Cg runtime. The compiler relies on the feature of ARB vertex assembly programs that enables parts of the OpenGL state to be written automatically to program parameter registers as the state changes. The OpenGL driver handles this state tracking feature.
A special variable semantic called state can be used to refer to every part of the OpenGL state that ARB vertex programs can reference. Following this paragraph are three lists of the state fields that can be accessed. The array indexes are shown as 0, but an array can be accessed using any nonnegative integer that is less than the limit of the array. For example, the diffuse component of the second light would be accessed by using the semantic state.light[1].diffuse, assuming that GL_MAX_LIGHTS is at least 2, as shown in the following code:
void main( uniform float4 lightColor : state.light[1].diffuse,
… )
The state semantics of type float4x4 that can be accessed are in Table 13. Accessible state semantics of type float4 are listed in Table 14.
1. See "OpenGL NV_vertex_program 1.0 Profile (vp20)" on page 279 for a full explanation of the data types, statements, and operators supported by this profile.
Table 13. float4x4 state Semantics
state.matrix.modelview[0] state.matrix.projection
state.matrix.mvp state.matrix.texture[0]
state.matrix.palette[0] state.matrix.program[0]
state.matrix.inverse.modelview[0] state.matrix.inverse.projection
state.matrix.inverse.mvp state.matrix.inverse.texture[0]
state.matrix.inverse.palette[0] state.matrix.inverse.program[0]
state.matrix.transpose.modelview[0] state.matrix.transpose.projection
state.matrix.transpose.mvp state.matrix.transpose.texture[0]
state.matrix.transpose.palette[0] state.matrix.transpose.program[0]
state.matrix.invtrans.modelview[0] state.matrix.invtrans.projection
state.matrix.invtrans.mvp state.matrix.invtrans.texture[0]
state.matrix.invtrans.palette[0] state.matrix.invtrans.program[0]
Table 14. float4 state Semantics
state.material.ambient state.material.diffuse
state.material.specular state.material.emission
state.material.shininess state.material.front.ambient
state.material.front.diffuse state.material.front.specular
state.material.front.emission state.material.front.shininess
state.material.back.ambient state.material.back.diffuse
state.material.back.specular state.material.back.emission
state.material.back.shininess state.light[0].ambient
state.light[0].diffuse state.light[0].specular
state.light[0].position state.light[0].attenuation
state.light[0].spot.direction state.light[0].half
state.lightmodel.ambient state.lightmodel.scenecolor
state.lightmodel.front.scenecolor state.lightmodel.back.scenecolor
state.lightprod[0].ambient state.lightprod[0].diffuse
state.lightprod[0].specular state.lightprod[0].front.ambient
state.lightprod[0].front.diffuse state.lightprod[0].front.specular
state.lightprod[0].back.ambient state.lightprod[0].back.diffuse
state.lightprod[0].back.specular state.texgen[0].eye.s
state.texgen[0].eye.t state.texgen[0].eye.r
state.texgen[0].eye.q state.texgen[0].object.s
state.texgen[0].object.t state.texgen[0].object.r
state.texgen[0].object.q state.fog.color
state.fog.params state.clip[0].plane
The state semantics of type float that can be accessed are listed in Table 15.
Table 15. float state Semantics
state.point.size state.point.attenuation
Position Invariance
The arbvp1 profile supports position invariance, as described in the core language specification.
The modelview projection matrix is not specified using a binding semantic of _GL_MVP.
Data Types
This profile implements data types as follows:
float data type is implemented as defined in the ARB_vertex_program specification.
half data type is implemented as float.
fixed or sampler* data types are not supported, but the profile does provide the minimal partial support that is required for these data types by the core language specification; that is, it is legal to declare variables using these types as long as no operations are performed on the variables.
Compatibility with the vp20 Vertex Program Profile
Programs that work with the vp20 profile are compatible with the arbvp1 profile as long as they use the Cg runtime to manage all uniform parameters, including OpenGL state. That is, the arbvp1 and vp20 profiles can be used interchangeably without changing the Cg source code or the application program except for specifying a different profile. However, if any of the glProgramParameterxxNV() routines are used, the application program needs to be changed to use the corresponding ARB functions.
Since there is no ARB function corresponding to glTrackMatrixNV(), an application using glTrackMatrixNV() and the arbvp1 profile needs to be modified. One solution is to change the Cg source code to refer to the matrix using the state structure so that the matrix is automatically tracked by the OpenGL driver as part of its GL_ARB_vertex_program support. Another solution is for the application to use the Cg runtime routine cgGLSetStateMatrixParameter() to load the appropriate matrix or matrices when necessary.
Another potential incompatibility between the arbvp1 and vp20 profiles is the way that input varying semantics are handled. In the vp20 profile, semantic names such as POSITION and ATTR0 are aliases of each other, the same way NV_vertex_program aliases Vertex and Attribute 0 (see Table 30, "vp20 Varying Input Binding Semantics," on page 281). In the arbvp1 profile, the semantic names are not aliased because ARB_vertex_program allows the conventional attributes (such as vertex position) to be separate from the generic attributes (such as Attribute 0). For this reason it is important to follow the conventions given in Table 17, "arbvp1 Varying Input Binding Semantics," on page 261 so that arbvp1 programs work for all implementations of ARB_vertex_program. The arbvp1 conventions are compatible with the vp20 and vp30 profiles.
Loading Constants
Applications that do not use the Cg runtime are no longer required to load constant values into program parameter registers as indicated by the #const expressions in the Cg compiler output. The compiler produces output that causes the OpenGL driver to load them. However, uniform variables that have a default definition still require constant values to be loaded into the appropriate program parameter registers, as ARB vertex programs do not support this feature. Application programs must either use the Cg runtime, parse and handle the #default commands themselves, or avoid initializing uniform variables in the Cg source code.
Bindings
Binding Semantics for Uniform Data
The valid binding semantics for uniform parameters in the arbvp1 profile are summarized in Table 16.
Table 16. arbvp1 Uniform Input Binding Semantics
Binding Semantics Name        Corresponding Data
register(c0)–register(c255)   Local parameter with index n, n = [0..255].
C0–C255                       The aliases c0–c255 (lowercase) are also accepted.
                              If used with a variable that requires more than one constant register (for example, a matrix), the semantic specifies the first local parameter that is used.
Binding Semantics for Varying Input/Output Data
The valid binding semantics for varying input parameters in the arbvp1 profile are summarized in Table 17.
The set of binding semantics for varying input data to arbvp1 consists of POSITION, BLENDWEIGHT, NORMAL, COLOR0, COLOR1, TESSFACTOR, PSIZE, BLENDINDICES, and TEXCOORD0–TEXCOORD7. One can also use TANGENT and BINORMAL instead of TEXCOORD6 and TEXCOORD7. Additionally, a set of generic binding semantics of ATTR0–ATTR15 can be used. In OpenGL implementations, conventional and generic vertex attributes may or may not be aliases for each other; see the ARB_vertex_program specification for more details. The mapping of these semantics to the corresponding setting command is listed in Table 17.
The valid binding semantics for varying output parameters in the arbvp1 profile are found in Table 18. These binding semantics map to ARB_vertex_program output registers. The two sets act as aliases to each other.
Table 17. arbvp1 Varying Input Binding Semantics
Binding Semantics Name   Corresponding Data
POSITION                 Input Vertex, through Vertex command
BLENDWEIGHT              Input vertex weight through WeightARB, VertexWeightEXT command
NORMAL                   Input normal through Normal command
COLOR0, DIFFUSE          Input primary color through Color command
COLOR1, SPECULAR         Input secondary color through SecondaryColorEXT command
FOGCOORD                 Input fog coordinate through FogCoordEXT command
TEXCOORD0-TEXCOORD7      Input texture coordinates (texcoord0-texcoord7) through MultiTexCoord command
ATTR0-ATTR15             Generic Attribute 0-15 through VertexAttrib command
PSIZE, ATTR6             Generic Attribute 6
Table 18. arbvp1 Varying Output Binding Semantics
Binding Semantics Name           Corresponding Data
POSITION, HPOS                   Output position
PSIZE, PSIZ                      Output point size
FOG, FOGC                        Output fog coordinate
COLOR0, COL0                     Output primary color
COLOR1, COL1                     Output secondary color
BCOL0                            Output backface primary color
BCOL1                            Output backface secondary color
TEXCOORD0-TEXCOORD7, TEX0-TEX7   Output texture coordinates
Note: The application must call glEnable(GL_COLOR_SUM_ARB) in order to enable COLOR1 output when using the arbvp1 profile.
The profile also allows WPOS to be present as binding semantics on a member of a structure of a varying output data structure, provided the member with this binding semantics is not referenced. This allows Cg programs to have the same structure specify the varying output of an arbvp1 profile program and the varying input of an fp30 profile program.
Options
The arbvp1 profile supports the following profile-specific options:
NumTemps=<n> (where 1 <= n <= 32; default 32)
MaxAddressRegs=<n> (where 1 <= n <= 8; default 1)
MaxInstructions=<n> (where 16 <= n <= 4096; default 1024)
MaxLocalParams=<n> (where 16 <= n <= 256; default 96)
OpenGL ARB Fragment Program Profile (arbfp1)
The OpenGL ARB Fragment Program Profile is used to compile Cg source code to fragment programs compatible with version 1.0 of the GL_ARB_fragment_program OpenGL extension (see footnote 2).
Profile name: arbfp1
How to invoke: Use the compiler option -profile arbfp1.
The arbfp1 profile limits Cg to match the capabilities of OpenGL ARB fragment programs. This section describes the capabilities and restrictions of Cg when using the arbfp1 profile.
Accessing OpenGL State
The arbfp1 profile supports access to OpenGL state with the same set of state semantics provided by the arbvp1 profile. See "Accessing OpenGL State" on page 256 for more information about this feature.
MRT Support
This profile supports multiple render targets (MRTs). When MRTs are used, up to three additional four-component outputs may be written in addition to the COLOR and DEPTH outputs supported in other profiles. These new outputs are available via the output semantics COLOR1 through COLOR3.
The use of MRTs is an optional feature of the ARB_fragment_program and the DirectX Pixel Shader 2 specifications; consequently, not all hardware that supports these profiles supports MRTs. The MaxDrawBuffers profile option may be used to explicitly set the number of draw buffers (that is, render targets) available on the target hardware. If the input program requires more than the specified number of draw buffers, compilation fails.
If the MaxDrawBuffers profile option is not specified, the standalone Cg compiler, cgc, assumes that the target hardware supports MRTs to whatever extent required by the input program.
When compiling programs using the Cg runtime, be sure to call cgGLSetOptimalOptions() under OpenGL, or call cgD3D9GetOptimalOptions() under Direct3D. These functions allow you to automatically determine the value for the MaxDrawBuffers profile option that is appropriate for the graphics hardware on the target machine.
2. To understand the capabilities of OpenGL ARB fragment programs and the code produced by the compiler, refer to the ARB fragment program extension in the OpenGL Extensions documentation.
Resource Limits
The ARB_fragment_program specification allows an OpenGL implementation to place limits on the numbers and types of resources that a fragment program may use. If these resource limits must be exceeded to compile a Cg program, the compilation will fail. Resources that may be limited include the number of instructions, the number of registers, and the number of dependent texture reads.
The arbfp1 profile supports a number of options that allow these limits to be specified on the compiler command line; see "Options" on page 262 for details. These limits may also be set to values appropriate for the host computer's GPU by using the cgGLSetOptimalOptions() Cg runtime call.
Language Constructs and Support
Data Types
This profile implements data types as follows:
float data type is implemented as IEEE 32-bit single precision.
half, fixed, and double data types are treated as float.
int data type is supported using floating-point operations.
sampler* types are supported to specify sampler objects used for texture fetches.
Statements and Operators
With the ARB fragment program profiles, while, do, and for statements are allowed only if the loops they define can be unrolled, because there is no dynamic branching in ARB fragment program 1.
Comparison operators are allowed (>, <, >=, <=, ==, !=) and Boolean operators (||, &&, ?:) are allowed. However, the logic operators (&, |, ^, ~) are not.
Using Arrays and Structures
Variable indexing of arrays is not allowed. Array and structure data is not packed.
Bindings
Binding Semantics for Uniform Data
The valid binding semantics for uniform parameters in the arbfp1 profile are found in Table 19.
Binding Semantics for Varying Input/Output Data
The valid binding semantics for varying input parameters in the arbfp1 profile are summarized in Table 20.
The valid binding semantics for varying output parameters in the arbfp1 profile are summarized in Table 21.
Table 19. arbfp1 Uniform Input Binding Semantics
Binding Semantics Name       Corresponding Data
register(s0)–register(s15)   Texunit image unit N, where N is in range [0..15].
TEXUNIT0-TEXUNIT15           May only be used with uniform inputs with sampler* types.
register(c0)–register(c31)   Local Parameter N, where N is in range [0..31].
C0–C31                       May only be used with uniform inputs.
Table 20. arbfp1 Varying Input Binding Semantics
Binding Semantics Name   Corresponding Data (type)
COLOR0                   Input color 0 (float4)
COLOR1                   Input color 1 (float4)
TEXCOORD0-TEXCOORD7      Input texture coordinates (float4)
Table 21. arbfp1 Varying Output Binding Semantics
Binding Semantics Name   Corresponding Data
COLOR, COLOR0            Output color (float4)
DEPTH                    Output depth (float)
Options
The ARB fragment program profile allows the following profile-specific options:
NumTemps=<n> (where 0 <= n <= 32; default 32)
NumInstructionSlots=<n> (where n >= 0; default 1024)
NumMathInstructionSlots=<n> (where n >= 0; default 1024)
NoDependentReadLimit=<b> (where b = 0 or 1; default 1)
NumTexInstructionSlots=<n> (where n >= 0; default 1024)
MaxTexIndirections=<n> (where n >= 1; default infinite)
NumDrawBuffers=<n> (where 1 <= n <= 4; default 1)
OpenGL NV_vertex_program 3.0 Profile (vp40)
The vp40 profile is an extended version of the arbvp1 profile. It has all of the capabilities of arbvp1 and the added capability described in this section.
Vertex Texturing
The vp40 profile supports accessing texture maps in programs. Textures are available via the usual sampler* types and the tex*() standard library calls.
OpenGL NV_fragment_program 2.0 Profile (fp40)
The fp40 profile is an extended version of the arbfp1 profile. It has all of the capabilities of arbfp1 as well as the added capabilities described in this section.
Branching
The branching support in fp40 allows some if statements and looping constructs to be implemented with branching. In profiles such as fp30, conditional execution of code was always implemented with predicated instructions, and loops were always unrolled.
In the GeForce 6800 GPU, there is a cost associated with executing a branch in the fragment shading engine. As such, it is possible that the cost of the branch will outweigh the savings from skipping over a block of conditionally executed code or of executing an unrolled loop. (Please refer to the NVIDIA developer Web site for more information about the performance of this and other NVIDIA GPUs.) The fp40 profile, therefore, provides two options to control whether the compiler should emit branches or conditionally executed code for the if statements and loops within Cg shaders. The options are described in Table 22.
Setting both -ifcvt and -unroll to all yields behavior similar to the fp30 profile, for which branch instructions are not available. Using -ifcvt=none places the burden on the Cg fragment program author to use if statements where they want true branches and to use conditional expressions otherwise.
FACE Semantic
The FACE semantic can be applied to a varying parameter to a program. Such a parameter has a value less than zero if the fragment being rendered is back facing, greater than zero if it is front facing, and zero if the fragment was from a line or a point.
Table 22. fp40 Compiler Branching Options
Compiler Option                  Description
-ifcvt (all | none | count=N)    Changes the if conversion mode based on the option selected:
                                 all      All if statements are converted to conditional writes.
                                 none     All if statements generate branching code.
                                 count=N  Sets if_limit_cost to N operations.
-unroll (all | none | count=N)   Changes the loop unrolling mode based on the option selected:
                                 all      All loop statements that can be unrolled will be.
                                 none     All loop statements that can be implemented with branching will be.
                                 count=N  Sets loop_limit_cost to N operations.
OpenGL NV_vertex_program 2.0 Profile (vp30)
The vp30 Vertex Program profile is used to compile Cg source code to vertex programs for use by the NV_vertex_program2 OpenGL extension.
Profile name: vp30
How to invoke: Use the compiler option -profile vp30.
The vp30 profile limits Cg to match the capabilities of the NV_vertex_program2 extension. This section describes the capabilities and restrictions of Cg when using the vp30 profile.
Position Invariance
Under vp30, unlike other profiles, the following points can be made:
The -posinv option won't cause an OPTION driver directive to be added to the assembly code header (see the OpenGL specification for more details on this directive).
The instructions for transforming the position using the modelview projection matrix are emitted.
These points hold because the final assembly code itself guarantees that the position calculation is invariant compared to the fixed pipeline calculation.
Language Constructs
Data Types
This profile implements data types as follows:
float data type is implemented as IEEE 32-bit single precision.
half data type is implemented as float.
int data type is supported using floating-point operations, which adds extra instructions for proper truncation for divides, modulos, and casts from floating-point types.
fixed or sampler* data types are not supported, but the profile does provide the minimal partial support that is required for these data types by the core language specification; that is, it is legal to declare variables using these types, as long as no operations are performed on the variables.
Statements and Operators
This profile is a superset of the vp20 profile. Any program that compiles for the vp20 profile should also compile for the vp30 profile, although the converse is not true.
The additional capabilities of the vp30 profile, beyond those of vp20, are:
for, while, and do loops are supported without requiring loop unrolling
Full support for if/else, allowing nonconstant conditional expressions
Bindings
Binding Semantics for Uniform Data
The valid binding semantics for uniform parameters in the vp30 profile are summarized in Table 23.
Table 23. vp30 Uniform Input Binding Semantics
Binding Semantics Name        Corresponding Data
register(c0)–register(c255)   Constant register [0..255].
C0–C255                       The aliases c0–c255 (lowercase) are also accepted.
                              If used with a variable that requires more than one constant register (for example, a matrix), the semantic specifies the first register that is used.
Binding Semantics for Varying Input/Output Data
The valid binding semantics for varying input parameters in the vp30 profile are summarized in Table 24.
One can also use TANGENT and BINORMAL instead of TEXCOORD6 and TEXCOORD7. These binding semantics map to NV_vertex_program2 input attribute parameters. The two sets act as aliases to each other.
The valid binding semantics for varying output parameters in the vp30 profile are summarized in Table 25.
These binding semantics map to NV_vertex_program2 output registers. The two sets act as aliases to each other.
Table 24. vp30 Varying Input Binding Semantics
Binding Semantics Name              Corresponding Data
POSITION, ATTR0                     Input Vertex, Generic Attribute 0
BLENDWEIGHT, ATTR1                  Input vertex weight, Generic Attribute 1
NORMAL, ATTR2                       Input normal, Generic Attribute 2
COLOR0, DIFFUSE, ATTR3              Input primary color, Generic Attribute 3
COLOR1, SPECULAR, ATTR4             Input secondary color, Generic Attribute 4
TESSFACTOR, FOGCOORD, ATTR5         Input fog coordinate, Generic Attribute 5
PSIZE, ATTR6                        Input point size, Generic Attribute 6
BLENDINDICES, ATTR7                 Generic Attribute 7
TEXCOORD0-TEXCOORD7, ATTR8-ATTR15   Input texture coordinates (texcoord0-texcoord7), Generic Attributes 8–15
TANGENT, ATTR14                     Generic Attribute 14
BINORMAL, ATTR15                    Generic Attribute 15
Table 25. vp30 Varying Output Binding Semantics
Binding Semantics Name           Corresponding Data
POSITION, HPOS                   Output position
PSIZE, PSIZ                      Output point size
FOG, FOGC                        Output fog coordinate
COLOR0, COL0                     Output primary color
COLOR1, COL1                     Output secondary color
BCOL0                            Output backface primary color
BCOL1                            Output backface secondary color
TEXCOORD0-TEXCOORD7, TEX0-TEX7   Output texture coordinates
CLP0-CLP5                        Output clip distances
The profile allows WPOS to be present as binding semantics on a member of a structure of a varying output data structure, provided the member with this binding semantics is not referenced. This allows Cg programs to have the same structure specify the varying output of a vp30 profile program and the varying input of an fp30 profile program.
OpenGL NV_fragment_program Profile (fp30)
The fp30 Fragment Program Profile is used to compile Cg source code to fragment programs for use by the NV_fragment_program OpenGL extension.
Profile name: fp30
How to invoke: Use the compiler option -profile fp30.
This section describes the capabilities and restrictions of Cg when using the fp30 profile.
Language Constructs and Support
Data Types
fixed type (s1.10 fixed point) is supported
half type (s10e5 floating point) is supported
It is recommended that you use fixed, half, and float in that order for maximum performance. Reversing this order provides maximum precision. You are encouraged to use the fastest type that meets your needs for precision.
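To give a feel for the precision trade-off, the following C sketch rounds a float to the nearest value an s1.10 format can hold. It is only an illustration under stated assumptions: the helper name quantize_fixed_s1_10 is hypothetical, the format is assumed to be two's complement (one sign bit, one integer bit, ten fraction bits, so the range is -2 to 2 - 1/1024), and round-to-nearest is assumed. Actual hardware rounding may differ.

```c
/* Hypothetical helper (not part of Cg): round x to the nearest value
   representable in an s1.10 fixed-point format.
   Assumption: two's complement, so the raw integer code spans
   -2048..2047 in steps of 1/1024. NaN inputs are not handled. */
static float quantize_fixed_s1_10(float x)
{
    float scaled = x * 1024.0f;
    int code;

    if (scaled <= -2048.0f) {
        code = -2048;                      /* clamp to -2.0 */
    } else if (scaled >= 2047.0f) {
        code = 2047;                       /* clamp to 2 - 1/1024 */
    } else if (scaled >= 0.0f) {
        code = (int)(scaled + 0.5f);       /* round half away from zero */
    } else {
        code = -(int)(-scaled + 0.5f);
    }
    return (float)code / 1024.0f;
}
```

Under these assumptions, values are spaced 1/1024 apart, so a quantity that needs finer resolution or a range beyond about ±2 would call for half or float instead.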
Statements and Operators
Full support for if/else
No for and while loops, unless they can be unrolled by the compiler
Support for flexible texture mapping
Support for screen-space derivative functions
No support for variable indexing of arrays
Bindings
Binding Semantics for Uniform Data
The valid binding semantics for uniform parameters in the fp30 profile are summarized in Table 26.
Binding Semantics for Varying Input/Output Data
The valid binding semantics for varying input parameters in the fp30 profile are summarized in Table 27.
These binding semantics map to NV_fragment_program input registers. The two sets act as aliases to each other. The profile also allows POSITION, FOG, PSIZE, HPOS, FOGC, PSIZ, BCOL0, BCOL1, and CLP0–CLP5 to be present as binding semantics on a member of a structure of a varying input data structure, provided the member with this binding semantics is not referenced. This allows Cg programs to have the same structure specify the varying output of a vp30 profile program and the varying input of an fp30 profile program.
Table 26. fp30 Uniform Input Binding Semantics
Binding Semantics Name       Corresponding Data
register(s0)-register(s15)   Texunit N, where N is in the range [0..15].
TEXUNIT0-TEXUNIT15           May be used only with uniform inputs with sampler* types.
register(c0)-register(c31)   Constant register N, where N is in the range [0..31].
C0-C31                       May be used only with uniform inputs.
Table 27. fp30 Varying Input Binding Semantics
Binding Semantics Name           Corresponding Data (type)
COLOR0, COL0                     Input color0 (float4)
COLOR1, COL1                     Input color1 (float4)
TEXCOORD0-TEXCOORD7, TEX0-TEX7   Input texture coordinates (float4)
WPOS                             Window Position Coordinates (float4)
The valid binding semantics for varying output parameters in the fp30 profile are summarized in Table 28.
Table 28. fp30 Varying Output Binding Semantics
Binding Semantics Name   Corresponding Data
COLOR, COLOR0, COL       Output color (float4)
DEPTH, DEPR              Output depth (float)
Pack and Unpack Functions
The fp30 profile provides a number of functions for packing multiple floating-point values into a single 32-bit result. Corresponding unpacking functions are also provided. These functions map directly to the packing and unpacking instructions defined by the NV_fragment_program OpenGL extension.
pack_2half()
Converts the components of a into a pair of 16-bit floating-point values. The two converted components are then packed into a single 32-bit result. This operation can be reversed using the unpack_2half() function.
float pack_2half(float2 a);
float pack_2half(half2 a);
// C Pseudocode
result = (((half)a.y) << 16) | (half)a.x;
unpack_2half()
Unpacks a 32-bit value into two 16-bit floating-point values.
half2 unpack_2half(float a);
// C Pseudocode (each half occupies 16 bits, so the mask is 0xFFFF)
result.x = (a >> 0) & 0xFFFF;
result.y = (a >> 16) & 0xFFFF;
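The pseudocode for pack_2half() and unpack_2half() treats the half conversion as a cast. As a minimal sketch of what that involves, the following C code performs the conversion on raw bit patterns. The function names are hypothetical, the packed result is returned as a uint32_t rather than as a float carrying those bits, and the conversion is deliberately simplified (mantissa truncation, values below the normal half range flushed to zero), so it is not guaranteed to match the hardware bit for bit.

```c
#include <stdint.h>
#include <string.h>

/* Convert a float to s10e5 half bits. Simplifying assumptions of this
   sketch: the mantissa is truncated, values too small for a normal half
   become signed zero, and values too large become infinity. */
static uint16_t float_to_half_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);                       /* reinterpret float bits */
    uint16_t sign = (uint16_t)((u >> 16) & 0x8000u);
    uint32_t fexp = (u >> 23) & 0xFFu;              /* 8-bit float exponent */
    uint32_t man  = u & 0x7FFFFFu;
    int32_t  hexp = (int32_t)fexp - 127 + 15;       /* rebias to 5 bits */

    if (fexp == 0xFFu)                              /* infinity or NaN */
        return (uint16_t)(sign | 0x7C00u | (man ? 0x200u : 0u));
    if (hexp <= 0)
        return sign;                                /* underflow to zero */
    if (hexp >= 31)
        return (uint16_t)(sign | 0x7C00u);          /* overflow to infinity */
    return (uint16_t)(sign | ((uint32_t)hexp << 10) | (man >> 13));
}

/* Convert half bits back to a float (subnormal halves become zero here). */
static float half_bits_to_float(uint16_t h)
{
    uint32_t sign = ((uint32_t)h & 0x8000u) << 16;
    uint32_t hexp = ((uint32_t)h >> 10) & 0x1Fu;
    uint32_t man  = (uint32_t)h & 0x3FFu;
    uint32_t u;
    float f;

    if (hexp == 0)
        u = sign;                                   /* zero */
    else if (hexp == 31)
        u = sign | 0x7F800000u | (man << 13);       /* infinity or NaN */
    else
        u = sign | ((hexp + 112u) << 23) | (man << 13);
    memcpy(&f, &u, sizeof f);
    return f;
}

/* y goes in the high 16 bits and x in the low 16 bits, as in the pseudocode. */
static uint32_t pack_2half_bits(float x, float y)
{
    return ((uint32_t)float_to_half_bits(y) << 16) | float_to_half_bits(x);
}

static void unpack_2half_bits(uint32_t packed, float *x, float *y)
{
    *x = half_bits_to_float((uint16_t)(packed & 0xFFFFu));
    *y = half_bits_to_float((uint16_t)(packed >> 16));
}
```

Values that are exactly representable as halves, such as 1.0 and -2.0, survive a pack and unpack round trip unchanged in this sketch; other values lose the low 13 mantissa bits.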
pack_2ushort()
Converts the components of a into a pair of 16-bit unsigned integers. The two converted components are then packed into a single 32-bit return value. This operation can be reversed using the unpack_2ushort() function.
float pack_2ushort(float2 a);
float pack_2ushort(half2 a);
// C Pseudocode
ushort.x = round(65535.0 * clamp(a.x, 0.0, 1.0));
ushort.y = round(65535.0 * clamp(a.y, 0.0, 1.0));
result = (ushort.y << 16) | ushort.x;
unpack_2ushort()
Unpacks two 16-bit unsigned integer values from a and scales the results into individual floating-point values between 0.0 and 1.0.
float2 unpack_2ushort(float a);
// C Pseudocode
result.x = ((a >> 0) & 0xFFFF) / 65535.0;
result.y = ((a >> 16) & 0xFFFF) / 65535.0;
pack_4byte()
Converts the four components of a into 8-bit signed integers. The signed integers are such that a representation with all bits set to 0 corresponds to the value -(128/127), and a representation with all bits set to 1 corresponds to +(127/127). The four signed integers are then packed into a single 32-bit result. This operation may be reversed using the unpack_4byte() function.
float pack_4byte(float4 a);
float pack_4byte(half4 a);
// C Pseudocode (floating-point literals, so the clamp bounds are not truncated)
ub.x = round(127.0 * clamp(a.x, -128.0/127.0, 127.0/127.0) + 128.0);
ub.y = round(127.0 * clamp(a.y, -128.0/127.0, 127.0/127.0) + 128.0);
ub.z = round(127.0 * clamp(a.z, -128.0/127.0, 127.0/127.0) + 128.0);
ub.w = round(127.0 * clamp(a.w, -128.0/127.0, 127.0/127.0) + 128.0);
result = (ub.w << 24) | (ub.z << 16) | (ub.y << 8) | ub.x;
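As a concrete counterpart to the pack_2ushort() and unpack_2ushort() pseudocode, the following C sketch works on an explicit 32-bit integer. The function names are hypothetical, and returning a uint32_t (instead of a float that carries the packed bits, as the Cg functions do) is an assumption made so the example runs as ordinary C.

```c
#include <stdint.h>

/* Clamp v to [0, 1] and scale it to an unsigned 16-bit integer,
   rounding to nearest. */
static uint32_t to_unorm16(float v)
{
    if (!(v > 0.0f)) v = 0.0f;         /* also maps NaN to 0 */
    if (v > 1.0f)    v = 1.0f;
    return (uint32_t)(v * 65535.0f + 0.5f);
}

/* y goes in the high 16 bits and x in the low 16 bits. */
static uint32_t pack_2ushort_bits(float x, float y)
{
    return (to_unorm16(y) << 16) | to_unorm16(x);
}

/* Extract both 16-bit fields and rescale them to [0, 1]. */
static void unpack_2ushort_bits(uint32_t packed, float *x, float *y)
{
    *x = (float)(packed & 0xFFFFu) / 65535.0f;
    *y = (float)(packed >> 16) / 65535.0f;
}
```

Inputs outside [0, 1] are clamped before scaling, so a round trip returns the clamped value quantized to steps of 1/65535.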
unpack_4byte()
Unpacks four 8-bit integers from a and scales the results into individual 16-bit floating-point values between -(128/127) and +(127/127).
half4 unpack_4byte(float a);
// C Pseudocode
result.x = (((a >> 0) & 0xFF) - 128) / 127.0;
result.y = (((a >> 8) & 0xFF) - 128) / 127.0;
result.z = (((a >> 16) & 0xFF) - 128) / 127.0;
result.w = (((a >> 24) & 0xFF) - 128) / 127.0;
pack_4ubyte()
Converts the four components of a into 8-bit unsigned integers. The unsigned integers are such that a representation with all bits set to 0 corresponds to 0.0, and a representation with all bits set to 1 corresponds to 1.0. The four unsigned integers are then packed into a single 32-bit result. This operation can be reversed using the unpack_4ubyte() function.
float pack_4ubyte(float4 a);
float pack_4ubyte(half4 a);
// C Pseudocode
ub.x = round(255.0 * clamp(a.x, 0.0, 1.0));
ub.y = round(255.0 * clamp(a.y, 0.0, 1.0));
ub.z = round(255.0 * clamp(a.z, 0.0, 1.0));
ub.w = round(255.0 * clamp(a.w, 0.0, 1.0));
result = (ub.w << 24) | (ub.z << 16) | (ub.y << 8) | ub.x;
unpack_4ubyte()
Unpacks the four 8-bit integers in a and scales the results into individual 16-bit floating-point values between 0.0 and 1.0.
half4 unpack_4ubyte(float a);
// C Pseudocode
result.x = ((a >> 0) & 0xFF) / 255.0;
result.y = ((a >> 8) & 0xFF) / 255.0;
result.z = ((a >> 16) & 0xFF) / 255.0;
result.w = ((a >> 24) & 0xFF) / 255.0;
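The four byte-oriented functions differ only in how a component maps to an 8-bit code. The following C sketch shows the two encodings side by side. All function names are hypothetical, results are plain float rather than half, and the packed value is a uint32_t rather than a float carrying those bits; these are assumptions of the example, not the Cg interface.

```c
#include <stdint.h>

static float clampf(float v, float lo, float hi)
{
    if (!(v > lo)) return lo;          /* also maps NaN to lo */
    return v > hi ? hi : v;
}

/* Unsigned encoding: 0x00 represents 0.0 and 0xFF represents 1.0. */
static uint32_t to_unorm8(float v)
{
    return (uint32_t)(255.0f * clampf(v, 0.0f, 1.0f) + 0.5f);
}

/* Signed (biased) encoding: 0x00 represents -128/127, 0xFF represents +127/127. */
static uint32_t to_biased8(float v)
{
    return (uint32_t)(127.0f * clampf(v, -128.0f / 127.0f, 1.0f) + 128.0f + 0.5f);
}

/* Component 0 occupies the low byte, component 3 the high byte. */
static uint32_t pack_4ubyte_bits(const float a[4])
{
    return (to_unorm8(a[3]) << 24) | (to_unorm8(a[2]) << 16) |
           (to_unorm8(a[1]) << 8)  |  to_unorm8(a[0]);
}

static uint32_t pack_4byte_bits(const float a[4])
{
    return (to_biased8(a[3]) << 24) | (to_biased8(a[2]) << 16) |
           (to_biased8(a[1]) << 8)  |  to_biased8(a[0]);
}

static void unpack_4ubyte_bits(uint32_t p, float out[4])
{
    for (int i = 0; i < 4; ++i)
        out[i] = (float)((p >> (8 * i)) & 0xFFu) / 255.0f;
}

static void unpack_4byte_bits(uint32_t p, float out[4])
{
    for (int i = 0; i < 4; ++i)
        out[i] = ((float)((p >> (8 * i)) & 0xFFu) - 128.0f) / 127.0f;
}
```

In the signed encoding the code 128 stands for 0.0, which is why the unpacking step subtracts 128 before dividing by 127.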
OpenGL NV_vertex_program 1.0 Profile (vp20)
The vp20 Vertex Program profile is used to compile Cg source code to vertex programs for use by the NV_vertex_program OpenGL extension (see footnote 3).
Profile name: vp20
How to invoke: Use the compiler option -profile vp20.
This section describes the capabilities and restrictions of Cg when using the vp20 profile.
Overview
The vp20 profile limits Cg to match the capabilities of the NV_vertex_program extension. NV_vertex_program has the same capabilities as DirectX 8 vertex shaders, so the limitations that this profile places on the Cg source code written by the programmer are the same as those of the DirectX VS 1.1 shader profile (see footnote 4).
Aside from the syntax of the compiler output, the only difference between the vp20 Vertex Shader profile and the DirectX VS 1.1 profile is that the vp20 profile supports two additional outputs: BCOL0 (for back-facing primary color) and BCOL1 (for back-facing secondary color).
Position Invariance
The vp20 profile supports position invariance, as described in the core language specification.
The modelview projection matrix must be specified using a binding semantic of _GL_MVP.
Data Types
This profile implements data types as follows:
float data types are implemented as IEEE 32-bit single precision.
half and double data types are implemented as float.
int data type is supported using floating-point operations, which add extra instructions for proper truncation for divides, modulos, and casts from floating-point types.
fixed or sampler* data types are not supported, but the profile does provide the minimal partial support that is required for these data types by the core language specification; that is, it is legal to declare variables using these types, as long as no operations are performed on the variables.
3. To understand the NV_vertex_program and the code produced by the compiler using the vp20 profile, see the GL_NV_vertex_program extension documentation.
4. See "OpenGL NV_vertex_program 1.0 Profile (vp20)" on page 279 for a full explanation of the data types, statements, and operators supported by this profile.
Bindings
Binding Semantics for Uniform Data
The valid binding semantics for uniform parameters in the vp20 profile are summarized in Table 29.
Table 29. vp20 Uniform Input Binding Semantics
Binding Semantics Name       Corresponding Data
register(c0)–register(c95)   Constant register [0..95].
C0–C95                       The aliases c0–c95 (lowercase) are also accepted.
                             If used with a variable that requires more than one constant register (for example, a matrix), the semantic specifies the first register that is used.
Binding Semantics for Varying Input/Output Data
The valid binding semantics for varying input parameters in the vp20 profile are summarized in Table 30.
One can also use TANGENT and BINORMAL instead of TEXCOORD6 and TEXCOORD7. A second set of binding semantics, ATTR0–ATTR15, can also be used. The two sets act as aliases to each other.
The valid binding semantics for varying output parameters in the vp20 profile are summarized in Table 31.
These binding semantics map to NV_vertex_program output registers. The two sets act as aliases to each other.
Table 30. vp20 Varying Input Binding Semantics
Binding Semantics Name              Corresponding Data
POSITION, ATTR0                     Input Vertex, Generic Attribute 0
BLENDWEIGHT, ATTR1                  Input vertex weight, Generic Attribute 1
NORMAL, ATTR2                       Input normal, Generic Attribute 2
COLOR0, DIFFUSE, ATTR3              Input primary color, Generic Attribute 3
COLOR1, SPECULAR, ATTR4             Input secondary color, Generic Attribute 4
TESSFACTOR, FOGCOORD, ATTR5         Input fog coordinate, Generic Attribute 5
PSIZE, ATTR6                        Input point size, Generic Attribute 6
BLENDINDICES, ATTR7                 Generic Attribute 7
TEXCOORD0-TEXCOORD7, ATTR8-ATTR15   Input texture coordinates (texcoord0-texcoord7), Generic Attributes 8-15
TANGENT, ATTR14                     Generic Attribute 14
BINORMAL, ATTR15                    Generic Attribute 15
Table 31. vp20 Varying Output Binding Semantics
Binding Semantics Name            Corresponding Data
POSITION, HPOS                    Output position
PSIZE, PSIZ                       Output point size
FOG, FOGC                         Output fog coordinate
COLOR0, COL0                      Output primary color
COLOR1, COL1                      Output secondary color
BCOL0                             Output backface primary color
BCOL1                             Output backface secondary color
TEXCOORD0–TEXCOORD3, TEX0–TEX3    Output texture coordinates
The profile also allows WPOS to be present as binding semantics on a member
of a structure of a varying output data structure, provided the member with
this binding semantics is not referenced. This allows Cg programs to have
the same structure specify the varying output of a vp20 profile program and
the varying input of an fp30 profile program.
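A minimal sketch of how the varying input and output semantics of Tables 30 and 31 might be used together follows. The structure and member names (AppData, VertexOut, hpos, uv) are hypothetical; only the semantic names come from the tables.

```cg
// Hypothetical vp20 vertex program; struct and member names are illustrative.
struct AppData {
    float4 position : POSITION;   // alias of ATTR0
    float4 color    : COLOR0;     // alias of DIFFUSE and ATTR3
    float4 uv       : TEXCOORD0;  // alias of ATTR8
};

struct VertexOut {
    float4 hpos  : POSITION;      // alias of HPOS
    float4 color : COLOR0;        // alias of COL0
    float4 uv    : TEXCOORD0;     // alias of TEX0
};

VertexOut main(AppData IN, uniform float4x4 modelViewProj)
{
    VertexOut OUT;
    OUT.hpos  = mul(modelViewProj, IN.position);
    OUT.color = IN.color;
    OUT.uv    = IN.uv;
    return OUT;
}
```

Because each name in a table row aliases the others in that row, replacing POSITION with ATTR0 on the input side should select the same vertex attribute.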
OpenGL NV_texture_shader and NV_register_combiners Profile (fp20)
The OpenGL NV_texture_shader and NV_register_combiners profile is used
to compile Cg source code to the nvparse text format for the
NV_texture_shader and NV_register_combiners family of OpenGL
extensions (see footnote 5).
Profile name: fp20
How to invoke: Use the compiler option -profile fp20.
This section describes the capabilities and restrictions of Cg when using
the fp20 profile.
Overview
Operations in the fp20 profile can be categorized as texture shader
operations and arithmetic operations. Texture shader operations are
operations that generate texture shader instructions; arithmetic operations
are operations that generate register combiner instructions.
The underlying instruction set and machine architecture limit
programmability in this profile compared to what is allowed by Cg
constructs. Thus, this profile places additional restrictions on what can and
cannot be done in a Cg program.
Restrictions
A Cg program in this profile is limited to generating a maximum of
four texture shader instructions and eight register combiner instructions.
Since these numbers are quite small, users need to be very aware of this
limitation while writing Cg code for this profile.
The fp20 profile also restricts when a texture shader operation or arithmetic
operation can occur in the program. A texture shader operation may not
have any dependency on the output of an arithmetic operation unless
the arithmetic operation is a valid input modifier for the texture shader
operation, or
the arithmetic operation is part of a complex texture shader operation
(which are summarized in the section "Auxiliary Texture Functions" on
page 290).
5. For more details about the underlying instruction sets, their capabilities, and their
limitations, please refer to the NV_texture_shader and NV_register_combiners
extensions in the OpenGL Extensions documentation.
Modifiers
There are certain simple arithmetic operations that can be applied to inputs
of texture shader operations and to inputs and outputs of arithmetic
operations without generating a register combiner instruction. These
operations are referred to as input modifiers and output modifiers.
Instead of generating a register combiner instruction, the arithmetic
operation modifies the assembly instruction or source registers to which it is
applied. For example, the following Cg expression
z = (x - 0.5 + y) / 2
could generate the following register combiner instruction (assuming x is in
tex0, y is in tex1, and z is in col0):
rgb
{
    discard = half_bias(tex0.rgb);
    discard = tex1.rgb;
    col0 = sum();
    scale_by_one_half();
}
alpha
{
    discard = half_bias(tex0.a);
    discard = tex1.a;
    col0 = sum();
    scale_by_one_half();
}
How different NV_texture_shader and NV_register_combiners instruction
set modifiers are expressed in Cg programs is summarized in Table 32. For
more details on the context in which each modifier is allowed and ways in
which modifiers may be combined, refer to the NV_texture_shader and
NV_register_combiners documentation.
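For context, the expression z = (x - 0.5 + y) / 2 might appear in a complete fp20 program such as the sketch below. The parameter names (uv0, uv1, map0, map1) are hypothetical, and whether the compiler emits exactly the combiner stage shown in the manual depends on the compiler.

```cg
// Hypothetical fp20 fragment program; parameter names are illustrative.
float4 main(float2 uv0 : TEXCOORD0,
            float2 uv1 : TEXCOORD1,
            uniform sampler2D map0 : TEXUNIT0,
            uniform sampler2D map1 : TEXUNIT1) : COLOR
{
    float4 x = tex2D(map0, uv0);   // texture shader result, in tex0
    float4 y = tex2D(map1, uv1);   // texture shader result, in tex1
    // "x - 0.5" can fold into a half_bias input modifier and "/ 2" into a
    // scale_by_one_half output modifier, so one combiner stage may suffice.
    return (x - 0.5 + y) / 2;
}
```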
Table 32. NV_texture_shader and NV_register_combiners Instruction Set Modifiers
Instruction/Register Modifier                Cg Expression
scale_by_two()                               2*x
scale_by_four()                              4*x
scale_by_one_half()                          x/2
bias_by_negative_one_half()                  x-0.5
bias_by_negative_one_half_scale_by_two()     2*(x-0.5)
unsigned(reg)                                saturate(x) (that is, min(1, max(0, x)))
unsigned_invert(reg)                         1-saturate(x)
half_bias(reg)                               x-0.5
-reg                                         -x
expand(reg)                                  2*(x-0.5)
Language Constructs and Support
Data Types
In the fp20 profile, operations occur on signed clamped floating point values
in the range -1 to 1. This profile allows all data types to be used, but all
operations are carried out in the above range. Refer to the
NV_texture_shader and NV_register_combiners documentation for more
details.
Statements and Operators
The fp20 profile supports all of the Cg language constructs, with the
following exceptions:
Arbitrary swizzles are not supported (though arbitrary write masks are).
Only the following swizzles are allowed:
.x/.r        .y/.g        .z/.b        .w/.a
.xy/.rg      .xyz/.rgb    .xyzw/.rgba
.xxx/.rrr    .yyy/.ggg    .zzz/.bbb    .www/.aaa
.xxxx/.rrrr  .yyyy/.gggg  .zzzz/.bbbb  .wwww/.aaaa
Matrix swizzles are not supported.
Boolean operators other than <, <=, > and >= are not supported.
Furthermore, <, <=, > and >= are only supported as the condition in the
?: operator.
Bitwise integer operators are not supported.
/ is not supported unless the divisor is a nonzero constant or it is used
to compute the depth output.
% is not supported.
Ternary ?: is supported if the boolean test expression is a compile-time
boolean constant, a uniform scalar boolean, or a scalar comparison to a
constant value in the range [0.5, 1.0] (for example, a > 0.5 ? b : c).
do, for, and while loops are supported only when they can be
completely unrolled.
Arrays, vectors, and matrices may be indexed only by compile-time
constant values or index variables in loops that can be completely
unrolled.
The discard statement is not supported. The similar but less general
clip() function is supported.
The use of an allocation-rule-identifier for an input or output
struct is optional.
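The restrictions above can be illustrated with a short sketch that stays inside them. The names (color, uv, map) are hypothetical, and whether a given program fits the combiner instruction limit still depends on the compiler.

```cg
// Hypothetical fp20 fragment program using only constructs listed as supported.
float4 main(float4 color : COLOR0,
            float4 uv    : TEXCOORD0,
            uniform sampler2D map : TEXUNIT0) : COLOR
{
    float4 texel = tex2D(map, uv.xy);          // .xy is a permitted swizzle

    // A comparison is allowed only as the ?: condition, and only against
    // a constant in the range [0.5, 1.0].
    float4 result = (color.a > 0.5) ? texel : color;

    // .rgb as a write mask and .aaa as a replicate swizzle are both permitted.
    result.rgb = result.rgb * texel.aaa;

    // The trip count is a compile-time constant, so the loop can be
    // completely unrolled.
    for (int i = 0; i < 2; i++) {
        result = result * color;
    }

    // Division is allowed because the divisor is a nonzero constant.
    return result / 2;
}
```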
Standard Library Functions
Because the fp20 profile has limited capabilities, not all of the Cg standard
library functions are supported.
The Cg standard library functions that are supported by this profile are
presented in Table 33. See the standard library documentation for
descriptions of these functions.
Table 33. Supported Standard Library Functions
dot(floatN, floatN)
lerp(floatN, floatN, floatN)
lerp(floatN, floatN, float)
tex1D(sampler1D, float)
tex1D(sampler1D, float2)
tex1Dproj(sampler1D, float2)
tex1Dproj(sampler1D, float3)
tex2D(sampler2D, float2)
tex2D(sampler2D, float3)
tex2Dproj(sampler2D, float3)
tex2Dproj(sampler2D, float4)
texRECT(samplerRECT, float2)
texRECT(samplerRECT, float3)
texRECTproj(samplerRECT, float3)
texRECTproj(samplerRECT, float4)
tex3D(sampler3D, float3)
tex3Dproj(sampler3D, float4)
texCUBE(samplerCUBE, float3)
texCUBEproj(samplerCUBE, float4)
Note: The nonprojective texture lookup functions are actually done as projective lookups
on the underlying hardware. Because of this, the w component of the texture
coordinates passed to these functions from the application or vertex program must
contain the value 1.
Texture coordinate parameters for projective texture lookup functions must
have swizzles that match the swizzle done by the generated texture shader
instruction. While this may seem burdensome, it is intended to allow fp20
profile programs to behave correctly under other pixel shader profiles.
The swizzles required on the texture coordinate parameter to the projective
texture lookup functions are listed in Table 34.
Table 34. Required Projective Texture Lookup Swizzles
Texture Lookup Function    Texture Coordinate Swizzle
tex1Dproj                  .xw/.ra
tex2Dproj                  .xyw/.rga
texRECTproj                .xyw/.rga
tex3Dproj                  .xyzw/.rgba
texCUBEproj                .xyzw/.rgba
Bindings
Manual Assignment of Bindings
The Cg compiler can determine bindings between texture units and uniform
sampler parameters/texture coordinate inputs automatically. This automatic
assignment is based on the context in which uniform sampler parameters
and texture coordinate inputs are used together.
To specify bindings between texture units and uniform parameters/texture
coordinates to match their application, all sampler uniform parameters and
texture coordinate inputs that are used in the program must have matching
binding semantics; for example, TEXUNIT<n> may only be used with
TEXCOORD<n>. Partially specified binding semantics may not work in all
cases. Fundamentally, this restriction is due to the close coupling between
texture samplers and texture coordinates in the NV_texture_shader
extension.
Binding Semantics for Uniform Data
If a binding semantic for a uniform parameter is not specified, then the
compiler will allocate one automatically. Scalar uniform parameters may be
allocated to either the xyz or the w portion of a constant register depending
on how they are used within the Cg program. When using the output of the
compiler without the Cg runtime, you must set all values of a scalar uniform
to the desired scalar value, not just the x component.
The valid binding semantics for uniform parameters in the fp20 profile are
summarized in Table 35.
Table 35. fp20 Uniform Binding Semantics
Binding Semantics Name            Corresponding Data
register(s0)–register(s3),        Texture unit N, where N is in range [0..3].
TEXUNIT0–TEXUNIT3                 May be used only with uniform inputs with
                                  sampler* types.
The ps_1_X profiles allow the programmer to decide which constant register
a uniform variable will reside in by specifying the C<n>/register(c<n>)
binding semantic. This is not allowed in the fp20 profile since the
NV_register_combiners extension does not have a single bank of constant
registers. While the NV_register_combiners extension does describe
constant registers, these constant registers are per combiner stage, and
specifying bindings to them in the program would overly constrain the
compiler.
Binding Semantics for Varying Input/Output Data
The varying input binding semantics in the fp20 profile are the same as the
varying output binding semantics of the vp20 profile.
Varying input binding semantics in the fp20 profile consist of COLOR0,
COLOR1, TEXCOORD0, TEXCOORD1, TEXCOORD2 and TEXCOORD3. These map to
output registers in vertex shaders.
The valid binding semantics for varying input parameters in the fp20 profile
are summarized in Table 36.
Table 36. fp20 Varying Input Binding Semantics
Binding Semantics Name            Corresponding Data
COLOR, COLOR0, COL, COL0          Input color value v0
COLOR1, COL1                      Input color value v1
TEXCOORD0–TEXCOORD3, TEX0–TEX3    Input texture coordinates t0–t3
FOGP, FOG                         Input fog color and factor
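The matching rule for manually assigned bindings (TEXUNIT<n> with TEXCOORD<n>) and the projective swizzle requirement of Table 34 can be sketched together as follows. The names (FragIn, baseMap, lightMap) are hypothetical; only the semantics and swizzles come from the tables.

```cg
// Hypothetical fp20 fragment program; struct and parameter names are illustrative.
struct FragIn {
    float4 color : COLOR0;
    float4 uv0   : TEXCOORD0;
    float4 uv1   : TEXCOORD1;
};

float4 main(FragIn IN,
            uniform sampler2D baseMap  : TEXUNIT0,   // sampled with TEXCOORD0
            uniform sampler2D lightMap : TEXUNIT1)   // sampled with TEXCOORD1
            : COLOR
{
    // Nonprojective lookup: the application or vertex program is expected
    // to supply w = 1 in uv0.
    float4 base = tex2D(baseMap, IN.uv0.xy);

    // Projective lookup: tex2Dproj requires the .xyw swizzle (Table 34).
    float4 light = tex2Dproj(lightMap, IN.uv1.xyw);

    return base * light * IN.color;
}
```

Every sampler and every texture coordinate used here carries an explicit semantic, which avoids the partially specified case that the manual warns may not work.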
Additionally, the fp20 profile allows POSITION, PSIZE, TEXCOORD4,
TEXCOORD5, TEXCOORD6, and TEXCOORD7 to be specified on varying inputs,
provided these inputs are not referenced. This allows Cg programs to have
the same structure specify the varying output of a vp20 profile program and
the varying input of an fp20 profile program.
The valid binding semantics for varying output parameters in the fp20
profile are summarized in Table 37.
Table 37. fp20 Varying Output Binding Semantics
Binding Semantics Name            Corresponding Data
COLOR, COLOR0, COL, COL0          Output color (float4)
DEPR, DEPTH                       Output depth (float)
The output depth value is special in that it may only be assigned a value of
the form
...
float4 t = <texture shader operation>;
float z = dot(texCoord<n>, t.xyz);
float w = dot(texCoord<n+1>, t.xyz);
depth = z / w;
...
Auxiliary Texture Functions
Because the capabilities of the texture shader instructions are limited in
NV_texture_shader, a set of auxiliary functions is provided in this profile
that expresses the functionality of the more complex texture shader
instructions. These functions are merely provided as a convenience for
writing fp20 Cg programs. The same result can be achieved by writing the
expanded form of each function directly. Using the expanded form has the
additional advantage of being supported on other profiles.
These functions are summarized in Table 38.
Table 38. fp20 Auxiliary Texture Functions
Texture Function
Description
offsettex2D(uniform sampler2D tex, float2 st,
float4 prevlookup, uniform float4 m)
offsettexRECT(uniform samplerRECT tex, float2 st,
float4 prevlookup, uniform float4 m)
Performs the following:
float2 newst = st + m.xy * prevlookup.xx + m.zw * prevlookup.yy;
return tex2D/RECT(tex, newst);
where
st are texture coordinates associated with sampler tex,
prevlookup is the result of a previous texture operation, and
m is the offset texture matrix.
This function can be used to generate the offset_2d or
offset_rectangle NV_texture_shader instructions.
offsettex2DScaleBias(uniform sampler2D tex, float2 st,
float4 prevlookup, uniform float4 m,
uniform float scale, uniform float bias)
offsettexRECTScaleBias(uniform samplerRECT tex, float2 st,
float4 prevlookup, uniform float4 m,
uniform float scale, uniform float bias)
Performs the following
float2 newst = st + m.xy * prevlookup.xx + m.zw * prevlookup.yy;
float4 result = tex2D/RECT(tex, newst);
return result * saturate(prevlookup.z * scale + bias);
where
st are texture coordinates associated with sampler tex,
prevlookup is the result of a previous texture operation,
m is the offset texture matrix,
scale is the offset texture scale, and
bias is the offset texture bias.
This function can be used to generate the offset_2d_scale or
offset_rectangle_scale NV_texture_shader instructions.
tex1D_dp3(sampler1D tex, float3 str, float4 prevlookup)
Performs the following
return tex1D(tex, dot(str, prevlookup.xyz));
where
str are texture coordinates associated with sampler tex, and
prevlookup is the result of a previous texture operation.
This function can be used to generate the dot_product_1d
NV_texture_shader instruction.
tex2D_dp3x2(uniform sampler2D tex, float3 str,
float4 intermediate_coord, float4 prevlookup)
texRECT_dp3x2(uniform samplerRECT tex, float3 str,
float4 intermediate_coord, float4 prevlookup)
Performs the following
float2 newst = float2(dot(intermediate_coord.xyz, prevlookup.xyz),
dot(str, prevlookup.xyz));
return tex2D/RECT(tex, newst);
where
str are texture coordinates associated with sampler tex,
prevlookup is the result of a previous texture operation, and
intermediate_coord are texture coordinates associated with the previous
texture unit.
This function can be used to generate the dot_product_2d or
dot_product_rectangle NV_texture_shader instruction combinations.
tex3D_dp3x3(sampler3D tex, float3 str,
float4 intermediate_coord1,
float4 intermediate_coord2, float4 prevlookup)
texCUBE_dp3x3(samplerCUBE tex, float3 str,
float4 intermediate_coord1,
float4 intermediate_coord2, float4 prevlookup)
Performs the following
float3 newst = float3(dot(intermediate_coord1.xyz, prevlookup.xyz),
dot(intermediate_coord2.xyz, prevlookup.xyz),
dot(str, prevlookup.xyz));
return tex3D/CUBE(tex, newst);
where
str are texture coordinates associated with sampler tex,
prevlookup is the result of a previous texture operation,
intermediate_coord1 are texture coordinates associated with the n-2
texture unit, and
intermediate_coord2 are texture coordinates associated with the n-1
texture unit.
This function can be used to generate the dot_product_3d or
dot_product_cube_map NV_texture_shader instruction combinations.
texCUBE_reflect_dp3x3(uniform samplerCUBE tex, float4 strq,
float4 intermediate_coord1,
float4 intermediate_coord2,
float4 prevlookup)
Performs the following
float3 E = float3(intermediate_coord2.w, intermediate_coord1.w,
strq.w);
float3 N = float3(dot(intermediate_coord1.xyz, prevlookup.xyz),
dot(intermediate_coord2.xyz, prevlookup.xyz),
dot(strq.xyz, prevlookup.xyz));
return texCUBE(tex, 2 * dot(N, E) / dot(N, N) * N - E);
where
strq are texture coordinates associated with sampler tex,
prevlookup is the result of a previous texture operation,
intermediate_coord1 are texture coordinates associated with the n-2
texture unit, and
intermediate_coord2 are texture coordinates associated with the n-1
texture unit.
This function can be used to generate the
dot_product_reflect_cube_map_eye_from_qs NV_texture_shader
instruction combination.
texCUBE_reflect_eye_dp3x3(uniform samplerCUBE tex,
float3 str,
float4 intermediate_coord1,
float4 intermediate_coord2,
float4 prevlookup,
uniform float3 eye)
Performs the following
float3 N = float3(dot(intermediate_coord1.xyz, prevlookup.xyz),
dot(intermediate_coord2.xyz, prevlookup.xyz),
dot(str, prevlookup.xyz));
return texCUBE(tex, 2 * dot(N, eye) / dot(N, N) * N - eye);
where
str are texture coordinates associated with sampler tex,
prevlookup is the result of a previous texture operation,
intermediate_coord1 are texture coordinates associated with the n-2
texture unit,
intermediate_coord2 are texture coordinates associated with the n-1
texture unit, and
eye is the eye-ray vector.
This function can be used to generate the
dot_product_reflect_cube_map_const_eye NV_texture_shader
instruction combination.
tex_dp3x2_depth(float3 str, float4 intermediate_coord,
float4 prevlookup)
Performs the following
float z = dot(intermediate_coord.xyz, prevlookup.xyz);
float w = dot(str, prevlookup.xyz);
return z / w;
where
str are texture coordinates associated with the nth texture unit,
intermediate_coord are texture coordinates associated with the n-1
texture unit, and
prevlookup is the result of a previous texture operation.
This function can be used in conjunction with the DEPTH varying out semantic
to generate the dot_product_depth_replace NV_texture_shader
instruction combination.
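As the section notes, each auxiliary function can also be written in its expanded form. The sketch below does this for offsettex2D, using the expansion given in Table 38. The parameter names (uv0, uv1, offsetMap, baseMap) are hypothetical.

```cg
// Hypothetical fp20 fragment program; parameter names are illustrative.
float4 main(float2 uv0 : TEXCOORD0,
            float2 uv1 : TEXCOORD1,
            uniform sampler2D offsetMap : TEXUNIT0,
            uniform sampler2D baseMap   : TEXUNIT1,
            uniform float4 m) : COLOR
{
    // Result of a previous texture operation.
    float4 prevlookup = tex2D(offsetMap, uv0);

    // Auxiliary form (specific to this profile):
    //   return offsettex2D(baseMap, uv1, prevlookup, m);

    // Expanded form, following Table 38:
    float2 newst = uv1 + m.xy * prevlookup.xx + m.zw * prevlookup.yy;
    return tex2D(baseMap, newst);
}
```

The expanded form is longer, but according to the manual it has the advantage of being supported on other profiles.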
Examples
The following examples show how a developer can use Cg to achieve
NV_texture_shader and NV_register_combiners functionality.
Example 1
struct VertexOut {
    float4 color     : COLOR0;
    float4 texCoord0 : TEXCOORD0;
    float4 texCoord1 : TEXCOORD1;
};
float4 main(VertexOut IN,
            uniform sampler2D diffuseMap,
            uniform sampler2D normalMap) : COLOR
{
    float4 diffuseTexColor = tex2D(diffuseMap, IN.texCoord0.xy);
    // Expand the normal and the light vector from [0, 1] to [-1, 1].
    float4 normal = 2 * (tex2D(normalMap, IN.texCoord1.xy) - 0.5);
    float3 light_vector = 2 * (IN.color.rgb - 0.5);
    // Clamp the diffuse term to [0, 1] and replicate it to all channels.
    float4 dot_result = saturate(
        dot(light_vector, normal.xyz).xxxx);
    return dot_result * diffuseTexColor;
}
Example 2
struct VertexOut {
    float4 texCoord0 : TEXCOORD0;
    float4 texCoord1 : TEXCOORD1;
    float4 texCoord2 : TEXCOORD2;
    float4 texCoord3 : TEXCOORD3;
};
float4 main(VertexOut IN,
            uniform sampler2D normalMap,
            uniform sampler2D intensityMap,
            uniform sampler2D colorMap) : COLOR
{
    // Expand the normal from [0, 1] to [-1, 1].
    float4 normal = 2 * (tex2D(normalMap, IN.texCoord0.xy) - 0.5);
    // Two dot products form the coordinates of a dependent 2D lookup.
    float2 intensCoord = float2(
        dot(IN.texCoord1.xyz, normal.xyz),
        dot(IN.texCoord2.xyz, normal.xyz));
    float4 intensity = tex2D(intensityMap, intensCoord);
    float4 color = tex2D(colorMap, IN.texCoord3.xy);
    return color * intensity;
}
DirectX Vertex Shader 2.x Profiles (vs_2_*)
The DirectX Vertex Shader 2.0 profiles are used to compile Cg source code to
DirectX 9 VS 2.0 vertex shaders (see footnote 6) and DirectX 9 VS 2.0 Extended
vertex shaders.
Profile names
vs_2_0 (for DirectX 9 VS 2.0 vertex shaders)
vs_2_x (for DirectX 9 VS 2.0 extended vertex shaders)
How to invoke: Use the compiler options
-profile vs_2_0
-profile vs_2_x
This section describes how using the vs_2_0 and vs_2_x profiles affects the
Cg source code that the developer writes.
Overview
The vs_2_0 profile limits Cg to match the capabilities of DirectX VS 2.0
vertex shaders. The vs_2_x profile is the same as the vs_2_0 profile but
allows extended features such as dynamic flow control (branching).
Memory
DirectX 9 vertex shaders have a limited amount of memory for instructions
and data.
Program Instruction Limit
DirectX 9 vertex shaders are limited to 256 instructions. If the compiler needs
to produce more than 256 instructions to compile a program, it reports an
error.
Vector Register Limit
Likewise, there are limited numbers of registers to hold program parameters
and temporary results. Specifically, there are 256 read-only vector registers
and 12–32 read/write vector registers. If the compiler needs more registers to
compile a program than are available, it generates an error.
6. To understand the DirectX VS 2.0 Vertex Shaders and the code the compiler produces, see
the Vertex Shader Reference in the DirectX 9 SDK documentation.
Statements and Operators
If the vs_2_0 profile is used, then if, while, do, and for statements are
allowed only if the loops they define can be unrolled, because there is no
dynamic branching in unextended VS 2.0 shaders.
If the vs_2_x profile is used, then if, while, and do statements are fully
supported as long as the DynamicFlowControlDepth option is not 0.
Comparison operators are allowed (>, <, >=, <=, ==, !=) and Boolean
operators (||, &&, ?:) are allowed. However, the logic operators (&, |, ^, ~)
are not.
Data Types
The profiles implement data types as follows:
float data types are implemented as IEEE 32-bit single precision.
half and double data types are treated as float.
The int data type is supported using floating point operations, which adds
extra instructions for proper truncation for divides, modulos, and casts
from floating point types.
The fixed and sampler* data types are not supported, but the profiles do
provide the minimal partial support that is required for these data types
by the core language specification; that is, it is legal to declare variables
using these types, as long as no operations are performed on the
variables.
Using Arrays
Variable indexing of arrays is allowed as long as the array is a uniform
constant. For compatibility reasons, arrays indexed with variable expressions
need not be declared const, just uniform. However, writing to an array that is
later indexed with a variable expression yields unpredictable results.
Array data is not packed because vertex program indexing does not permit
it. Each element of the array takes a single 4-float program parameter
register. For example, float arr[10], float2 arr[10], float3 arr[10],
and float4 arr[10] all consume 10 program parameter registers.
It is more efficient to access an array of vectors than an array of matrices.
Accessing a matrix requires a floor calculation, followed by a multiply by a
constant to compute the register index. Because vectors (and scalars) take
one register, neither the floor nor the multiply is needed. It is faster to do
matrix skinning using arrays of vectors with a premultiplied index than
using arrays of matrices.
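The skinning advice above can be sketched as follows. All names (boneRows, rowIndex, viewProj) and the bone count are hypothetical, and the sketch assumes the application stores the premultiplied value 3 * boneIndex in the vertex data and that a float index is accepted for the uniform array.

```cg
// Hypothetical vertex program: matrix skinning with an array of vectors.
// Each bone is stored as three float4 rows, so 20 bones need 60 registers.
float4 main(float4 position : POSITION,
            float  rowIndex : BLENDINDICES,   // application supplies 3 * boneIndex
            uniform float4   boneRows[60],
            uniform float4x4 viewProj) : POSITION
{
    // Each row is one register, so the premultiplied index addresses it
    // directly; no floor or multiply is needed in the shader.
    float3 skinned;
    skinned.x = dot(boneRows[rowIndex],     position);
    skinned.y = dot(boneRows[rowIndex + 1], position);
    skinned.z = dot(boneRows[rowIndex + 2], position);

    return mul(viewProj, float4(skinned, 1));
}
```

Under these assumptions the program uses 64 constant registers (60 for the rows and 4 for viewProj), which fits within the register limits stated for this profile.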
Bindings
Binding Semantics for Uniform Data
The valid binding semantics for uniform parameters in the vs_2_0 and
vs_2_x profiles are summarized in Table 39.
Table 39. vs_2_* Uniform Input Binding Semantics
Binding Semantics Name            Corresponding Data
register(c0)–register(c255),      Constant register [0..255].
C0–C255                           The aliases c0–c255 (lowercase) are also accepted.
                                  If used with a variable that requires more than
                                  one constant register (for example, a matrix),
                                  the semantic specifies the first register that
                                  is used.
Binding Semantics for Varying Input/Output Data
Only the binding semantic names need be given for these profiles. The vertex
parameter input registers are allocated dynamically. All the semantic names,
except POSITION, can have a number from 0 to 15 after them. The valid
semantic names for varying input parameters are listed in Table 40.
Table 40. vs_2_* Varying Input Binding Semantics
POSITION        PSIZE
BLENDWEIGHT     BLENDINDICES
NORMAL          TEXCOORD
COLOR           TANGENT
TESSFACTOR      BINORMAL
The valid binding semantics for varying output parameters in the vs_2_0
and vs_2_x profiles are summarized in Table 41. These map to output
registers in DirectX 9 vertex shaders.
Table 41. vs_2_* Varying Output Binding Semantics
Binding Semantics Name            Corresponding Data
POSITION                          Output position: oPos
PSIZE                             Output point size: oPts
FOG                               Output fog value: oFog
COLOR0–COLOR1                     Output color values: oD0, oD1
TEXCOORD0–TEXCOORD7               Output texture coordinates: oT0–oT7
Options
The vs_2_x profile allows the following profile-specific options:
DynamicFlowControlDepth=<n>       (where n = 0 or 24; default 24)
NumTemps=<n>                      (where 12 <= n <= 32; default 16)
Predication                       (default true)
DirectX Pixel Shader 2.x Profiles (ps_2_*)
The DirectX Pixel Shader 2.0 profiles are used to compile Cg source code to
DirectX 9 PS 2.0 pixel shaders (see footnote 7) and DirectX 9 PS 2.0 extended
pixel shaders.
Profile names
ps_2_0 (for DirectX 9 PS 2.0 pixel shaders)
ps_2_x (for DirectX 9 PS 2.0 extended pixel shaders)
How to invoke: Use the compiler options
-profile ps_2_0
-profile ps_2_x
The ps_2_0 profile limits Cg to match the capabilities of DirectX PS 2.0 pixel
shaders. The ps_2_x profile is the same as the ps_2_0 profile but allows
extended features such as arbitrary swizzles, a larger limit on the number of
instructions, no limit on texture instructions, no limit on texture dependent
reads, and support for predication.
This section describes the capabilities and restrictions of Cg when using
these profiles.
Memory
Program Instruction Limit
DirectX 9 pixel shaders have a limit on the number of instructions in a pixel
shader.
PS 2.0 (ps_2_0) pixel shaders are limited to 32 texture instructions and 64
arithmetic instructions.
Extended PS 2 (ps_2_x) shaders have a maximum total instruction count
that lies between 96 and 1024 instructions.
There is no separate texture instruction limit on extended pixel shaders.
If the compiler needs to produce more than the maximum allowed number
of instructions to compile a program, it reports an error.
Vector Register Limit
Likewise, there are limited numbers of registers to hold program parameters
and temporary results. Specifically, there are 32 read-only vector registers
and 12–32 read/write vector registers. If the compiler needs more registers to
compile a program than are available, it generates an error.
7. To understand the capabilities of DirectX PS 2.0 Pixel Shaders and the code produced by
the compiler, refer to the Pixel Shader Reference in the DirectX 9 SDK documentation.
Language Constructs and Support
Data Types
This profile implements data types as follows:
The float data type is implemented as IEEE 32-bit single precision.
half, fixed, and double data types are treated as float.
half data types can be used to specify a partial precision hint for pixel
shader instructions.
The int data type is supported using floating point operations.
sampler* types are supported to specify sampler objects used for texture
fetches.
Statements and Operators
With the ps_2_0 profile, while, do, and for statements are allowed only if
the loops they define can be unrolled, because there is no dynamic branching
in PS 2.0 shaders. In the current Cg implementation, extended ps_2_x shaders
also have the same limitation.
Comparison operators are allowed (>, <, >=, <=, ==, !=) and Boolean
operators (||, &&, ?:) are allowed. However, the logic operators (&, |, ^, ~) are
not.
Using Arrays and Structures
Variable indexing of arrays is not allowed. Array and structure data is not
packed.
Bindings
Binding Semantics for Uniform Data
The valid binding semantics for uniform parameters in the ps_2_0 and
ps_2_x profiles are summarized in Table 42.
Table 42. ps_2_* Uniform Input Binding Semantics
Binding Semantics Name            Corresponding Data
register(s0)–register(s15),       Texture unit N, where N is in range [0..15].
TEXUNIT0–TEXUNIT15                May only be used with uniform inputs with
                                  sampler* types.
register(c0)–register(c31),       Constant register N, where N is in range [0..31].
C0–C31                            May only be used with uniform inputs.
Binding Semantics for Varying Input/Output Data
The valid binding semantics for varying input parameters in the ps_2_0 and
ps_2_x profiles are summarized in Table 43.
Table 43. ps_2_* Varying Input Binding Semantics
Binding Semantics Name            Corresponding Data (type)
COLOR0                            Input color 0 (float4)
COLOR1                            Input color 1 (float4)
TEXCOORD0–TEXCOORD7               Input texture coordinates (float4)
The valid binding semantics for varying output parameters in the ps_2_0
and ps_2_x profiles are summarized in Table 44.
Table 44. ps_2_* Varying Output Binding Semantics
Binding Semantics Name            Corresponding Data
COLOR, COLOR0                     Output color (float4)
DEPTH                             Output depth (float)
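A minimal sketch of a fragment program that uses the bindings in Tables 42 through 44 follows. The parameter names (baseMap, tint) are hypothetical; only the semantic names come from the tables.

```cg
// Hypothetical ps_2_0 fragment program; parameter names are illustrative.
float4 main(float4 color : COLOR0,
            float2 uv    : TEXCOORD0,
            uniform sampler2D baseMap : TEXUNIT0,   // same unit as register(s0)
            uniform float4    tint    : C0)         // same register as register(c0)
            : COLOR
{
    // half is treated as float here, but it can act as a partial precision
    // hint for the generated pixel shader instructions.
    half4 texel = tex2D(baseMap, uv);
    return texel * tint * color;
}
```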
Options
The ps_2_x profile allows the following profile-specific options:
NumTemps=<n>                      (where 0 <= n <= 32; default 32)
NumInstructionSlots=<n>           (where n >= 0; default 1024)
Predication=<b>                   (where b = 0 or 1; default 1)
ArbitrarySwizzle=<b>              (where b = 0 or 1; default 1)
GradientInstructions=<b>          (where b = 0 or 1; default 1)
NoDependentReadLimit=<b>          (where b = 0 or 1; default 1)
NoTexInstructionLimit=<b>         (where b = 0 or 1; default 1)
Limitations in this Implementation
Currently, this profile implementation has the following limitations:
Dynamic flow control is not supported in extended pixel shaders.
Multiple color outputs are not supported in pixel shaders. Only COLOR0
is supported.
DirectX Vertex Shader 1.1 Profile (vs_1_1)
The DirectX Vertex Shader 1.1 profile is used to compile Cg source code to
DirectX 8.1 Vertex Shaders and DirectX 9 VS 1.1 shaders (see footnote 8).
Profile name: vs_1_1
How to invoke: Use the compiler option -profile vs_1_1.
The vs_1_1 profile limits Cg to match the capabilities of DirectX Vertex
Shaders.
This section describes how using the vs_1_1 profile affects the Cg source
code that the developer writes.
Memory Restrictions
DirectX 8 vertex shaders have a limited amount of memory for instructions
and data.
Program Instruction Limits
The DirectX 8 vertex shaders are limited to 128 instructions. If the compiler
needs to produce more than 128 instructions to compile a program, it reports
an error.
Vector Register Limits
Likewise, there are limited numbers of registers to hold program parameters
and temporary results. Specifically, there are 96 read-only vector registers
and 12 read/write vector registers. If the compiler needs more registers to
compile a program than are available, it generates an error.
Language Constructs and Support
Data Types
This profile implements data types as follows:
float data types are implemented as IEEE 32-bit single precision.
half and double data types are treated as float.
The int data type is supported using floating point operations, which adds
extra instructions for proper truncation for divides, modulos, and casts
from floating point types.
The fixed and sampler* data types are not supported, but the profile does
provide the minimal partial support that is required for these data types
by the core language specification; that is, it is legal to declare variables
using these types, as long as no operations are performed on the
variables.
8. To understand the DirectX VS 1.1 Vertex Shaders and the code the compiler produces, see
the Vertex Shader Reference in the DirectX 8.1 SDK documentation.
Statements and Operators
The if, while, do, and for statements are allowed only if the loops they
define can be unrolled, because there is no branching in VS 1.1 shaders.
There are no subroutine calls either, so all functions are inlined. Comparison
operators are allowed (>, <, >=, <=, ==, !=) and Boolean operators (||, &&, ?:)
are allowed. However, the logic operators (&, |, ^, ~) are not allowed.
Using Arrays
Variable indexing of arrays is allowed as long as the array is a uniform
constant. For compatibility reasons, arrays indexed with variable expressions
need not be declared const, just uniform. However, writing to an array that is
later indexed with a variable expression yields unpredictable results.
Array data is not packed because vertex program indexing does not permit
it. Each element of the array takes a single 4-float program parameter
register. For example, float arr[10], float2 arr[10], float3 arr[10],
and float4 arr[10] all consume ten program parameter registers.
It is more efficient to access an array of vectors than an array of matrices.
Accessing a matrix requires a floor calculation, followed by a multiply by a
constant to compute the register index. Because vectors (and scalars) take
one register, neither the floor nor the multiply is needed. It is faster to do
matrix skinning using arrays of vectors with a premultiplied index than
using arrays of matrices.
Constants
Literal constants can be used with this profile, but it is not possible to store
them in the program itself. Instead, the compiler will issue, as comments, a
list of program parameter registers and the constants that need to be loaded
into them. The Cg run-time system will handle loading the constants, as
directed by the compiler.
Note: If the Cg run-time system is not used, it is the responsibility of the programmer to
make sure that the constants are loaded properly.
Bindings
Binding Semantics for Uniform Data
The valid binding semantics for uniform parameters in the vs_1_1 profile are
summarized in Table 45.
Binding Semantics for Varying Input/Output Data
Thevalidbindingsemanticsforuniformparametersinthevs_1_1profileare
summarized in Table 46.ThesemaptotheinputregistersinDirectX8.1vertex
shaders.
Table 45. vs_1_1 Uniform Input Binding Semantics
Binding Semantics Name        Corresponding Data
register(c0)–register(c95)    Constant register [0..95].
C0–C95                        The aliases c0–c95 (lowercase) are also
                              accepted.
                              If used with a variable that requires more than
                              one constant register (for example, a matrix),
                              the semantic specifies the first register that is
                              used.
Table 46. vs_1_1 Varying Input Binding Semantics
Binding Semantics Name     Corresponding Data
POSITION                   Vertex shader input register: v0
BLENDWEIGHT                Vertex shader input register: v1
BLENDINDICES               Vertex shader input register: v2
NORMAL                     Vertex shader input register: v3
PSIZE                      Vertex shader input register: v4
COLOR0, DIFFUSE            Vertex shader input register: v5
COLOR1, SPECULAR           Vertex shader input register: v6
TEXCOORD0–TEXCOORD7        Vertex shader input register: v7–v14
TANGENT (i)                Vertex shader input register: v14
BINORMAL                   Vertex shader input register: v15
i. TANGENT is an alias for TEXCOORD7.
The valid binding semantics for varying output parameters in the vs_1_1
profile are summarized in Table 47. These map to output registers in
DirectX 8.1 vertex shaders.
Table 47. vs_1_1 Varying Output Binding Semantics
Binding Semantics Name     Corresponding Data
POSITION                   Output position: oPos
PSIZE                      Output point size: oPts
FOG                        Output fog value: oFog
COLOR0–COLOR1              Output color values: oD0, oD1
TEXCOORD0–TEXCOORD7        Output texture coordinates: oT0–oT7
Options
When using the vs_1_1 profile under DirectX 9, it is necessary to tell the
compiler to produce dcl statements to declare varying inputs. The option
-profileopts dcls causes dcl statements to be added to the compiler
output.
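To make the vs_1_1 binding tables concrete, here is a minimal sketch of a vertex program that attaches semantics from Tables 45, 46, and 47. The structure names, parameter names, and the simple diffuse term are hypothetical; only the semantics themselves come from the tables.

```cg
// Hypothetical vertex program for the vs_1_1 profile.
struct AppData {
    float4 position : POSITION;    // input register v0
    float3 normal   : NORMAL;      // input register v3
    float4 color    : COLOR0;      // input register v5
    float2 uv       : TEXCOORD0;   // input register v7
};

struct VertexOut {
    float4 hpos  : POSITION;       // output register oPos
    float4 color : COLOR0;         // output register oD0
    float2 uv    : TEXCOORD0;      // output register oT0
};

VertexOut main(AppData IN,
               uniform float4x4 modelViewProj : C0,  // a matrix starts at c0 and spans c0-c3
               uniform float3 lightDir : C4)         // assumed normalized, object space
{
    VertexOut OUT;
    OUT.hpos  = mul(modelViewProj, IN.position);
    OUT.color = IN.color * max(dot(IN.normal, lightDir), 0);
    OUT.uv    = IN.uv;
    return OUT;
}
```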
DirectX Pixel Shader 1.x Profiles (ps_1_*)
The DirectX pixel shader 1_X profiles are used to compile Cg source code to
DirectX PS 1.1, PS 1.2, or PS 1.3 pixel shader assembly.
Profile names
    ps_1_1 (for DirectX PS 1.1 pixel shaders)
    ps_1_2 (for DirectX PS 1.2 pixel shaders)
    ps_1_3 (for DirectX PS 1.3 pixel shaders)
How to invoke: Use the compiler options
    -profile ps_1_1
    -profile ps_1_2
    -profile ps_1_3
The deprecated profile dx8ps is also available and is synonymous with
ps_1_1.
This document describes the capabilities and restrictions of Cg when using
the DirectX pixel shader 1_X profiles.
Overview
DirectX PS 1.4 is not currently supported by any Cg profile; all statements
about ps_1_X in the remainder of this document refer only to ps_1_1,
ps_1_2 and ps_1_3.
The underlying instruction set and machine architecture limit
programmability in these profiles compared to what is allowed by Cg
constructs (see footnote 9). Thus, these profiles place additional restrictions
on what can and cannot be done in a Cg program.
The main differences between these profiles from the Cg perspective are that
additional texture addressing operations are exposed in ps_1_2 and ps_1_3
and the depth value output is made available (in a limited form) in ps_1_3.
Operations in the DirectX pixel shader 1_X profiles can be categorized as
texture addressing operations and arithmetic operations. Texture addressing
operations are operations that generate texture addressing instructions;
arithmetic operations are operations that generate arithmetic instructions.
A Cg program in one of these profiles is limited to generating a maximum of
four texture addressing instructions and eight arithmetic instructions. Since
these numbers are quite small, users need to be very aware of this limitation
while writing Cg code for these profiles.
Footnote 9: For more details about the underlying instruction sets, their
capabilities, and their limitations, refer to the MSDN documentation of
DirectX pixel shaders 1.1, 1.2 and 1.3.
There are certain simple arithmetic operations that can be applied to inputs
of texture addressing operations and to inputs and outputs of arithmetic
operations without generating an arithmetic instruction. From here on, these
operations are referred to as input modifiers and output modifiers.
The ps_1_X profiles also restrict when a texture addressing operation or
arithmetic operation can occur in the program. A texture addressing
operation may not have any dependency on the output of an arithmetic
operation unless
• The arithmetic operation is a valid input modifier for the texture
  addressing operation.
• The arithmetic operation is part of a complex texture addressing
  operation (these are summarized in the section on Auxiliary Texture
  Functions).
Modifiers
Input and output modifiers may be used to perform simple arithmetic
operations without generating an arithmetic instruction. Instead, the
arithmetic operation modifies the assembly instruction or source registers to
which it is applied. For example, the following Cg expression:
    z = (x - 0.5 + y) / 2
could generate the following pixel shader instruction (assuming x is in t0, y
is in t1, and z is in r0):
    add_d2 r0, t0_bias, t1
How different DirectX pixel shader 1_X instruction set modifiers are
expressed in Cg programs is summarized in Table 48. For more details on
the context in which each modifier is allowed and ways in which modifiers
may be combined, refer to the DirectX pixel shader 1_X documentation.
Table 48. ps_1_x Instruction Set Modifiers
Instruction/Register Modifier    Cg Expression
instr_x2                         2*x
instr_x4                         4*x
instr_d2                         x/2
instr_sat                        saturate(x) (i.e. min(1, max(0, x)))
reg_bias                         x-0.5
1-reg                            1-x
-reg                             -x
reg_bx2                          2*(x-0.5)
Language Constructs and Support
Data Types
In the ps_1_X profiles, operations occur on signed clamped floating-point
values in the range -MaxPixelShaderValue to MaxPixelShaderValue, where
MaxPixelShaderValue is determined by the DirectX implementation. These
profiles allow all data types to be used, but all operations are carried out in
the above range. Refer to the DirectX pixel shader 1_X documentation for
more details.
Statements and Operators
The DirectX pixel shader 1_X profiles support all of the Cg language
constructs, with the following exceptions:
• Arbitrary swizzles are not supported (though arbitrary write masks are).
  Only the following swizzles are allowed:
      .x/.r .y/.g .z/.b .w/.a
      .xy/.rg .xyz/.rgb .xyzw/.rgba
      .xxx/.rrr .yyy/.ggg .zzz/.bbb .www/.aaa
      .xxxx/.rrrr .yyyy/.gggg .zzzz/.bbbb .wwww/.aaaa
• Matrix swizzles are not supported.
• Boolean operators other than <, <=, > and >= are not supported.
  Furthermore, <, <=, > and >= are only supported as the condition in the
  ?: operator.
• Bitwise integer operators are not supported.
• / is not supported unless the divisor is a nonzero constant or it is used
  to compute the depth output in ps_1_3.
• % is not supported.
• Ternary ?: is supported if the boolean test expression is a compile-time
  boolean constant, a uniform scalar boolean or a scalar comparison to a
  constant value in the range [0.5,1.0] (for example, a > 0.5 ? b : c).
• do, for, and while loops are supported only when they can be
  completely unrolled.
• Arrays, vectors, and matrices may be indexed only by compile-time
  constant values or index variables in loops that can be completely
  unrolled.
• The discard statement is not supported. The similar but less general
  clip() function is supported.
• The use of an allocation-rule-identifier for an input or output
  struct is optional.
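The following fragment program is a small sketch of how these restrictions and the modifiers of Table 48 might look in source. The parameter names and the particular blend are hypothetical, and whether each expression folds into a modifier depends on the compiler and on where the operands end up, so treat the comments as expectations rather than guarantees.

```cg
// Illustrative ps_1_X fragment program; all identifiers are hypothetical.
float4 main(float4 color : COLOR0,
            float4 uv    : TEXCOORD0,
            uniform sampler2D baseMap) : COLOR
{
    float4 texel = tex2D(baseMap, uv.xy);    // .xy is one of the allowed swizzles

    // 2*(x-0.5) has the reg_bx2 form and saturate() has the instr_sat
    // form, so this line is a candidate for a single modified add.
    float4 sum = saturate(2 * (texel - 0.5) + color);

    // A comparison appears only as the condition of ?:, and it compares
    // a scalar against a constant.
    return (color.a > 0.5) ? sum : texel;
}
```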
Standard Library Functions
Because the DirectX pixel shader 1_X profiles have limited capabilities, not
all of the Cg standard library functions are supported. Table 49 presents the
Cg standard library functions that are supported by these profiles. See the
standard library documentation for descriptions of these functions.
Table 49. Supported Standard Library Functions
    dot(floatN, floatN)
    lerp(floatN, floatN, floatN)
    lerp(floatN, floatN, float)
    tex1D(sampler1D, float)
    tex1D(sampler1D, float2)
    tex1Dproj(sampler1D, float2)
    tex1Dproj(sampler1D, float3)
    tex2D(sampler2D, float2)
    tex2D(sampler2D, float3)
    tex2Dproj(sampler2D, float3)
    tex2Dproj(sampler2D, float4)
    tex3D(sampler3D, float3)
    tex3Dproj(sampler3D, float4)
    texCUBE(samplerCUBE, float3)
    texCUBEproj(samplerCUBE, float4)
Note: The non-projective texture lookup functions are actually done as projective
lookups on the underlying hardware. Because of this, the w component of the
texture coordinates passed to these functions from the application or vertex
program must contain the value 1.
Texture coordinate parameters for projective texture lookup functions must
have swizzles that match the swizzle done by the generated texture
addressing instruction. While this may seem burdensome, it is intended to
allow ps_1_X profile programs to behave correctly under other pixel shader
profiles.
The swizzles required on the texture coordinate parameter to the projective
texture lookup functions are listed in Table 50.
Table 50. Required Projective Texture Lookup Swizzles
Texture Lookup Function    Texture Coordinate Swizzle
tex1Dproj                  .xw/.ra
tex2Dproj                  .xyw/.rga
texRECTproj                .xyw/.rga
tex3Dproj                  .xyzw/.rgba
texCUBEproj                .xyzw/.rgba
Bindings
Manual Assignment of Bindings
The Cg compiler can determine bindings between texture units and uniform
sampler parameters/texture coordinate inputs automatically. This automatic
assignment is based on the context in which uniform sampler parameters
and texture coordinate inputs are used together.
To specify bindings between texture units and uniform parameters/texture
coordinates to match their application, all sampler uniform parameters and
texture coordinate inputs that are used in the program must have matching
binding semantics; that is, TEXUNIT<n> may only be used with
TEXCOORD<n>.
Partially specified binding semantics may not work in all cases.
Fundamentally, this restriction is due to the close coupling between texture
samplers and texture coordinates in DirectX pixel shaders 1_X.
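A minimal sketch of fully specified manual bindings follows. The structure, sampler names, and the modulate operation are hypothetical; the point is only that each sampler bound to TEXUNIT<n> is sampled with the input bound to TEXCOORD<n>.

```cg
// Hypothetical ps_1_X program with every binding given explicitly.
struct FragmentIn {
    float4 color : COLOR0;
    float4 uv0   : TEXCOORD0;   // used only with the sampler on TEXUNIT0
    float4 uv1   : TEXCOORD1;   // used only with the sampler on TEXUNIT1
};

float4 main(FragmentIn IN,
            uniform sampler2D baseMap   : TEXUNIT0,
            uniform sampler2D detailMap : TEXUNIT1) : COLOR
{
    float4 base   = tex2D(baseMap,   IN.uv0.xy);
    float4 detail = tex2D(detailMap, IN.uv1.xy);
    return base * detail * IN.color;
}
```

Leaving one of the two samplers without a semantic would be a partially specified binding, which the text above warns may not work in all cases.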
Binding Semantics for Uniform Data
If a binding semantic for a uniform parameter is not specified, then the
compiler will allocate one automatically. Scalar uniform parameters may be
allocated to either the xyz or the w portion of a constant register, depending
on how they are used within the Cg program. When using the output of the
compiler without the Cg runtime, you must set all values of a scalar uniform
to the desired scalar value, not just the x component.
The valid binding semantics for uniform parameters in the ps_1_X profiles
are summarized in Table 51.
Table 51. ps_1_x Uniform Input Binding Semantics
Binding Semantics Name       Corresponding Data
register(s0)–register(s3)    Texture unit N, where N is in range [0..3].
TEXUNIT0–TEXUNIT3            May be used only with uniform inputs with
                             sampler* types.
register(c0)–register(c7)    Constant register [0..7]
C0–C7
Binding Semantics for Varying Input/Output Data
The varying input binding semantics in the ps_1_X profiles are the same as
the varying output binding semantics of the vs_1_1 profile.
Varying input binding semantics in the ps_1_X profiles consist of COLOR0,
COLOR1, TEXCOORD0, TEXCOORD1, TEXCOORD2 and TEXCOORD3. These map to
output registers in DirectX vertex shaders.
The valid binding semantics for varying input parameters in the ps_1_X
profiles are summarized in Table 52.
Table 52. ps_1_x Varying Input Binding Semantics
Binding Semantics Name    Corresponding Data
COLOR, COLOR0             Input color value v0
COL, COL0
COLOR1                    Input color value v1
COL1
TEXCOORD0–TEXCOORD3       Input texture coordinates t0–t3
TEX0–TEX3
Additionally, the ps_1_X profiles allow POSITION, FOG, PSIZE, TEXCOORD4,
TEXCOORD5, TEXCOORD6, and TEXCOORD7 to be specified on varying inputs,
provided these inputs are not referenced. This allows Cg programs to have
the same structure specify the varying output of a vs_1_1 profile program
and the varying input of a ps_1_X profile program.
The valid binding semantics for varying output parameters in the ps_1_X
profiles are summarized in Table 53.
Table 53. ps_1_x Varying Output Binding Semantics
Binding Semantics Name    Corresponding Data
COLOR, COLOR0             Output color (float4)
COL, COL0
DEPTH                     Output depth (float)
DEPR
The output depth value is special in that it may only be assigned a value in
the ps_1_3 profile, and must be of the form
    ...
    float4 t = <texture addressing operation>;
    float z = dot(texCoord<n>, t.xyz);
    float w = dot(texCoord<n+1>, t.xyz);
    depth = z / w;
    ...
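The shared-structure arrangement described above could look like the sketch below. The structure and parameter names are hypothetical; the fragment program reads only members whose semantics appear in Table 52 and leaves the others unreferenced.

```cg
// Hypothetical structure used both as the varying output of a vs_1_1
// vertex program and as the varying input of a ps_1_X fragment program.
struct VertexToFragment {
    float4 hpos  : POSITION;    // accepted on a ps_1_X input, but never read below
    float4 color : COLOR0;
    float4 uv    : TEXCOORD0;
    float  fog   : FOG;         // likewise present but unreferenced
};

float4 main(VertexToFragment IN,
            uniform sampler2D baseMap) : COLOR
{
    // Only COLOR0 and TEXCOORD0 are referenced.
    return IN.color * tex2D(baseMap, IN.uv.xy);
}
```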
Auxiliary Texture Functions
Because the capabilities of the texture addressing instructions are limited in
DirectX pixel shader 1_X, a set of auxiliary functions is provided in these
profiles that express the functionality of the more complex texture
addressing instructions. These functions are provided merely as a
convenience for writing ps_1_X Cg programs. The same result can be
achieved by writing the expanded form of each function directly. The
expanded form has the added advantage of being supported on other
profiles.
These functions are summarized in Table 54.
Table 54. ps_1_x Auxiliary Texture Functions
Texture Function
Description
offsettex2D(uniform sampler2D tex, float2 st,
float4 prevlookup, uniform float4 m)
Performs the following:
float2 newst = st + m.xy * prevlookup.xx + m.zw * prevlookup.yy;
return tex2D(tex, newst);
where
st are texture coordinates associated with sampler tex,
prevlookup is the result of a previous texture operation, and
m is the 2-D bump environment mapping matrix.
This function can generate the texbem instruction in all ps_1_X profiles.
offsettex2DScaleBias(uniform sampler2D tex, float2 st,
float4 prevlookup, uniform float4 m,
uniform float scale, uniform float bias)
Performs the following:
float2 newst = st + m.xy * prevlookup.xx + m.zw * prevlookup.yy;
float4 result = tex2D(tex, newst);
return result * saturate(prevlookup.z * scale + bias);
where
st are texture coordinates associated with sampler tex,
prevlookup is the result of a previous texture operation,
m is the 2-D bump environment mapping matrix,
scale is the 2-D bump environment mapping scale factor, and
bias is the 2-D bump environment mapping offset.
This function can generate the texbeml instruction in all ps_1_X profiles.
tex1D_dp3(sampler1D tex, float3 str, float4 prevlookup)
Performs the following:
return tex1D(tex, dot(str, prevlookup.xyz));
where
str are texture coordinates associated with sampler tex, and
prevlookup is the result of a previous texture operation.
This function can be used to generate the texdp3tex instruction in the
ps_1_2 and ps_1_3 profiles.
tex2D_dp3x2(uniform sampler2D tex, float3 str,
float4 intermediate_coord, float4 prevlookup)
Performs the following:
float2 newst = float2(dot(intermediate_coord.xyz, prevlookup.xyz),
dot(str, prevlookup.xyz));
return tex2D(tex, newst);
where
str are texture coordinates associated with sampler tex,
prevlookup is the result of a previous texture operation, and
intermediate_coord are texture coordinates associated with the previous
texture unit.
This function can be used to generate the texm3x2pad/texm3x2tex
instruction combination in all ps_1_X profiles.
tex3D_dp3x3(sampler3D tex, float3 str,
float4 intermediate_coord1,
float4 intermediate_coord2, float4 prevlookup)
texCUBE_dp3x3(samplerCUBE tex, float3 str,
float4 intermediate_coord1,
float4 intermediate_coord2, float4 prevlookup)
Performs the following:
float3 newst = float3(dot(intermediate_coord1.xyz, prevlookup.xyz),
dot(intermediate_coord2.xyz, prevlookup.xyz),
dot(str, prevlookup.xyz));
return tex3D/CUBE(tex, newst);
where
str are texture coordinates associated with sampler tex,
prevlookup is the result of a previous texture operation,
intermediate_coord1 are texture coordinates associated with the n-2
texture unit, and
intermediate_coord2 are texture coordinates associated with the n-1
texture unit.
This function can be used to generate the texm3x3pad/texm3x3pad/
texm3x3tex instruction combination in all ps_1_X profiles.
texCUBE_reflect_dp3x3(uniform samplerCUBE tex, float4 strq,
float4 intermediate_coord1,
float4 intermediate_coord2,
float4 prevlookup)
Performs the following:
float3 E = float3(intermediate_coord1.w, intermediate_coord2.w,
strq.w);
float3 N = float3(dot(intermediate_coord1.xyz, prevlookup.xyz),
dot(intermediate_coord2.xyz, prevlookup.xyz),
dot(strq.xyz, prevlookup.xyz));
return texCUBE(tex, 2 * dot(N, E) / dot(N, N) * N - E);
where
strq are texture coordinates associated with sampler tex,
prevlookup is the result of a previous texture operation,
intermediate_coord1 are texture coordinates associated with the n-2
texture unit, and
intermediate_coord2 are texture coordinates associated with the n-1
texture unit.
This function can be used to generate the texm3x3pad/texm3x3pad/
texm3x3vspec instruction combination in all ps_1_X profiles.
texCUBE_reflect_eye_dp3x3(uniform samplerCUBE tex,
float3 str, float4 intermediate_coord1,
float4 intermediate_coord2,
float4 prevlookup, uniform float3 eye)
Performs the following:
float3 N = float3(dot(intermediate_coord1.xyz, prevlookup.xyz),
dot(intermediate_coord2.xyz, prevlookup.xyz),
dot(str, prevlookup.xyz));
return texCUBE(tex, 2 * dot(N, eye) / dot(N, N) * N - eye);
where
str are texture coordinates associated with sampler tex,
prevlookup is the result of a previous texture operation,
intermediate_coord1 are texture coordinates associated with the n-2
texture unit,
intermediate_coord2 are texture coordinates associated with the n-1
texture unit, and
eye is the eye-ray vector.
This function can be used to generate the texm3x3pad/texm3x3pad/
texm3x3spec instruction combination in all ps_1_X profiles.
tex_dp3x2_depth(float3 str, float4 intermediate_coord,
float4 prevlookup)
Performs the following:
float z = dot(intermediate_coord.xyz, prevlookup.xyz);
float w = dot(str, prevlookup.xyz);
return z / w;
where
str are texture coordinates associated with the nth texture unit,
intermediate_coord are texture coordinates associated with the n-1
texture unit, and
prevlookup is the result of a previous texture operation.
This function can be used with the DEPTH varying out semantic to generate the
texm3x2pad/texm3x2depth instruction combination in ps_1_3.
Examples
The following examples illustrate how a developer can use Cg to achieve
DirectX pixel shader 1_X functionality.
Example 1
    struct VertexOut {
        float4 color : COLOR0;
        float4 texCoord0 : TEXCOORD0;
        float4 texCoord1 : TEXCOORD1;
    };
    float4 main(VertexOut IN,
                uniform sampler2D diffuseMap,
                uniform sampler2D normalMap) : COLOR
    {
        float4 diffuseTexColor = tex2D(diffuseMap, IN.texCoord0.xy);
        float4 normal = 2 * (tex2D(normalMap, IN.texCoord1.xy)-0.5);
        float3 light_vector = 2 * (IN.color.rgb - 0.5);
        float4 dot_result = saturate(dot(light_vector,
                                         normal.xyz).xxxx);
        return dot_result * diffuseTexColor;
    }
Example 2
    struct VertexOut {
        float4 texCoord0 : TEXCOORD0;
        float4 texCoord1 : TEXCOORD1;
        float4 texCoord2 : TEXCOORD2;
        float4 texCoord3 : TEXCOORD3;
    };
    float4 main(VertexOut IN,
                uniform sampler2D normalMap,
                uniform sampler2D intensityMap,
                uniform sampler2D colorMap) : COLOR
    {
        float4 normal = 2 * (tex2D(normalMap, IN.texCoord0.xy)-0.5);
        float2 intensCoord = float2(
            dot(IN.texCoord1.xyz, normal.xyz),
            dot(IN.texCoord2.xyz, normal.xyz));
        float4 intensity = tex2D(intensityMap, intensCoord);
        float4 color = tex2D(colorMap, IN.texCoord3.xy);
        return color * intensity;
    }
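Example 2 writes the two dot products and the dependent lookup in expanded form. As a sketch of the alternative, the same lookup can be expressed with the tex2D_dp3x2() auxiliary function from Table 54, matching its parameters against the expanded form given there (str is texCoord2, intermediate_coord is texCoord1). This variant is only expected to compile under the ps_1_X profiles, since the auxiliary functions are specific to them.

```cg
struct VertexOut {
    float4 texCoord0 : TEXCOORD0;
    float4 texCoord1 : TEXCOORD1;
    float4 texCoord2 : TEXCOORD2;
    float4 texCoord3 : TEXCOORD3;
};

float4 main(VertexOut IN,
            uniform sampler2D normalMap,
            uniform sampler2D intensityMap,
            uniform sampler2D colorMap) : COLOR
{
    float4 normal = 2 * (tex2D(normalMap, IN.texCoord0.xy) - 0.5);
    // Equivalent to:
    //   float2(dot(IN.texCoord1.xyz, normal.xyz),
    //          dot(IN.texCoord2.xyz, normal.xyz))
    // followed by tex2D(intensityMap, ...).
    float4 intensity = tex2D_dp3x2(intensityMap, IN.texCoord2.xyz,
                                   IN.texCoord1, normal);
    float4 color = tex2D(colorMap, IN.texCoord3.xy);
    return color * intensity;
}
```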
Appendix C
Nine Steps to High-Performance Cg
Writing Cg code that compiles to efficient programs requires techniques and
approaches that are different from efficient programming in C, C++, or Java.
While some of the basic lessons are the same (such as using efficient
underlying algorithms), the hardware programming model of modern GPUs
is substantially different from that of modern CPUs. This can lead to
pitfalls, where you may be disappointed by your shader’s performance, as
well as to opportunities, where you can push the GPU to its limits through
careful programming.
The Cg language shields you from the majority of the low-level details of
GPU hardware, enabling you to think about your shaders at a higher level
than the low-level GPU instruction sets. However, just as an understanding
of modern computer architecture (such as cache and memory hierarchy
issues) is important for writing fast C and C++ code, understanding a bit
about the GPU can help you write better Cg code. This appendix focuses on
techniques for maximizing performance from vertex and fragment programs
written in Cg and running on the NVIDIA GeForce FX architecture
(specifically the vp30, fp30, arbfp1, ps_2_0, ps_2_x, vs_2_0, and vs_2_x
profiles), although many of the principles are more broadly applicable.
1. Program for Vectorization
The GPU can generally perform four arithmetic operations as quickly as it
can perform a single operation. Therefore, if you have two vectors of four
floating-point values,
    float4 a, b;
you can add the two vectors together
    float4 c = a+b;
with no more computational expense than adding together two of their
elements
    float d = a.x + b.x;
This has two implications for efficient programming. First, you should try to
write code that naturally maps to these vector operations. If you want to add
two float4 variables together, it may be substantially less efficient to write it
this way:
    float4 c = float4(a.x + b.x, a.y + b.y, a.z + b.z,
                      a.w + b.w);
than to write it this way:
    float4 c = a+b;
The compiler does its best to find vectorization in your programs, but the
more vectorized your original code is, the better starting place it has to work
from.
A more specific example comes from a common computation done for
tangent-space bump mapping. Given a texture map that encodes a bump
map by storing the offset along the tangent direction in x, the offset along the
binormal in y, and the offset along the normal in z, the bump-mapped
normal is computed by scaling the tangent, binormal, and normal
appropriately. In C or C++, the natural way to write this computation is as
shown:
    // Tangent, binormal, normal. Passed in from vertex program.
    float3 T, B, N;
    float3 Nbump; // Bump-mapped normal
    float3 bump = tex2D(bumpSampler, uv).xyz;
    Nbump.x = bump.x * T.x + bump.y * B.x + bump.z * N.x;
    Nbump.y = bump.x * T.y + bump.y * B.y + bump.z * N.y;
    Nbump.z = bump.x * T.z + bump.y * B.z + bump.z * N.z;
However, here we have written a series of computations that add and
multiply single pairs of floating-point values at a time. After a little algebra,
we can rewrite this as three multiplies of a float3 and a float and two
float3 additions, which runs several times faster than the original!
    Nbump = bump.x * T + bump.y * B + bump.z * N;
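The same idea applies beyond simple additions. As a minimal sketch, assume four independent distance values that each need a linear falloff. The function names and the falloff formula are illustrative only and do not come from the Cg Standard Library. The per-component version and the vectorized version compute the same result:

```cg
// Scalar style: four separate multiplies, subtractions, and clamps.
float4 falloffScalar(float4 dist, float4 invRange)
{
    float4 result;
    result.x = saturate(1 - dist.x * invRange.x);
    result.y = saturate(1 - dist.y * invRange.y);
    result.z = saturate(1 - dist.z * invRange.z);
    result.w = saturate(1 - dist.w * invRange.w);
    return result;
}

// Vector style: the same arithmetic written as whole-vector operations,
// which gives the compiler vectorized code to start from.
float4 falloffVector(float4 dist, float4 invRange)
{
    return saturate(1 - dist * invRange);
}
```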
2. Use Swizzles to Make the Most of Vectorization
The GPU can swizzle the values in vectors with no performance penalty
(recall that a swizzle can be used to rearrange the elements of a vector).
Given a vector:
    float3 a = float3(0, 1, 2);
swizzles construct new vectors:
    a.xxx    // float3(0, 0, 0)
    a.yzz    // float3(1, 2, 2)
    a.zy     // float2(2, 1)
and so forth. By swizzling your data carefully, you can still take advantage of
vectorization, even when you don’t want to use the same component of both
vectors on both sides of your computation. For example, consider the
computation of the cross product. Given two three-dimensional vectors, the
cross product returns a new vector that is perpendicular to the given vectors.
It is computed by
    float3 a, b;
    float3 c = float3(a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z,
                      a.x*b.y - a.y*b.x);
Here we’ve again got a lot of arithmetic operations, each using a single pair
of float values. Some cleverness lets us turn this into a vectorized operation.
Below is the implementation of the cross() function from the Cg Standard
Library, requiring just two vector multiply operations and one vector
subtraction operation:
    float3 cross(float3 a, float3 b) {
        return a.yzx * b.zxy - a.zxy * b.yzx;
    }
Confirm for yourself that this computes the same value as the first section of
code for the cross product; note that it exposes much more vectorized
computation for the GPU to efficiently process.
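The swizzle technique carries over to other small formulas. As a sketch, the hypothetical helper below (not a Standard Library function) multiplies two complex numbers stored as float2(real, imaginary) using only vector multiplies and one vector addition:

```cg
// Hypothetical helper: complex multiplication with swizzles.
float2 complexMul(float2 a, float2 b)
{
    // Component form, for comparison:
    //   real = a.x*b.x - a.y*b.y
    //   imag = a.x*b.y + a.y*b.x
    // a.xx * b.xy supplies the first term of each component; a.yy * b.yx
    // supplies the second, with float2(-1, 1) setting the signs.
    return a.xx * b.xy + float2(-1, 1) * a.yy * b.yx;
}
```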
3. Use the Cg Standard Library
The functions in the Cg Standard Library have been carefully written for
both efficiency and correctness. By using Standard Library functions when
appropriate, you can automatically take advantage of the work that went
into making sure they compile to fast code on GPUs while you concentrate
on the hard problems you’re solving in your own shaders.
Particularly fast Standard Library functions include dot(), which computes
the dot product of two vectors, abs(), which computes the absolute value of
a variable, saturate(), which clamps a value to be between zero and one,
and min() and max(), which return the minimum and maximum of a pair of
values. You won’t be able to write more efficient implementations of these
functions than the Standard Library provides because many of them compile
directly to GPU assembly language instructions. Writing a dot product
function of your own,
    float mydot(float3 a, float3 b) {
        return a.x*b.x + a.y*b.y + a.z*b.z;
    }
compiles to a handful of instructions, while the built-in dot() function
compiles to a single specialized dot product instruction. There’s no way
to get to this instruction other than by using the Standard Library.
Two functions deserve particular attention. The abs() function usually has
no cost in either vertex or fragment programs because the GPU can evaluate
the function while executing other instructions. Similarly, the saturate()
function usually has no cost in fragment programs. Do not hesitate to use
these functions when appropriate.
4. Use Texture Maps to Encode Complex Functions
For profiles that support texture maps, filtered texture map lookups are
extraordinarily efficient. If you have a complex function that takes more than
a handful of arithmetic operations to evaluate, you might want to encode the
function in a texture map. Say that you have written a function f(x, y) that
is a bottleneck in your shader. Assume for now that it is always called with
values of x and y between zero and one, and that the value that f(x, y)
computes is always between zero and one. If the function is reasonably
smooth and you don’t need to compute it at extremely high precision, you
can precompute the function in your application and store it in a texture
map, replacing calls like
    float val = f(x, y);
with code like
    float val = tex2D(fSampler, float2(x, y)).x;
This method can also be applied to one- and three-dimensional functions,
using 1D and 3D texture maps.
More generally, the values you pass to the function may not be in the range
[0, 1], and the values your function returns may not be in the range [0, 1].
In this case, the following two utility functions can serve as a base:
    float4 remapTo01(float4 v, float4 low, float4 high) {
        return saturate((v - low)/(high-low));
    }
    float4 remapFrom01(float4 v, float4 low, float4 high) {
        return lerp(low, high, v);
    }
remapTo01() remaps the range [low, high] into [0, 1]; remapFrom01()
does the opposite.
Don’t forget vectorization here as well. If two float-valued functions have
the same domain and range, you can pack them into two texture components
of the same texture. Only one texture lookup is needed to load them both,
and vectorized versions of the remap*() functions can be used to do the
remapping more efficiently as well.
5. Use Data Types with Minimum Sufficient Precision
For profiles that support multiple precisions, a general rule of thumb is that
if you can do a computation with fixed precision variables, the computation
is faster than if you use half; and if you use half, the computation is faster
than if you use float. Although sometimes you need the range and extra
precision that half and float offer, you should avoid using them unless
necessary.
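Steps 4 and 5 can be combined. The sketch below assumes a profile that supports the half type and a texture, prepared by the application, whose red channel stores one function and whose green channel stores another, both remapped to [0, 1]. The names lookupFG, fgSampler, low, and high are hypothetical.

```cg
// Hypothetical lookup of two precomputed functions, f and g, that share
// the same domain. low and high hold the (min, max) of f in .x and of g
// in .y, as used when the application encoded the texture.
half2 lookupFG(float2 xy,
               uniform sampler2D fgSampler,
               uniform half2 low,
               uniform half2 high)
{
    // One texture fetch returns both function values.
    half2 encoded = (half2) tex2D(fgSampler, xy).xy;
    // Two-component version of remapFrom01(): undoes the [0, 1] encoding
    // for both values in a single lerp.
    return lerp(low, high, encoded);
}
```

Using half here is a judgment call: it is only appropriate if the function values do not need the range and precision of float.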
6. Use the Right Standard Library Routines for Shading
Computations
If you’re implementing a shading model (such as Lambertian, Blinn, or
Phong), you’ll generally be performing some dot product routines, clamping
negative results to zero, and raising some of the values to a power to
apply a specular exponent. There are a few tricks that can speed up this
process:
• Be sure to use the dot() function when computing dot products.
• If you need to clamp the result of a dot product computation to the range
  [0, 1] in a fragment program, use the saturate() function instead of
  max(). This is often written as max(0, dot(N, L)), but as long as the N
  and L vectors are normalized, this can be written equivalently as
  saturate(dot(N, L)) because the dot product of two normalized
  vectors is never greater than one. Given that saturate() is free in
  fragment programs (see “3. Use the Cg Standard Library” on page 324),
  this compiles to more efficient code.
• Use the lit() Standard Library function, if appropriate. The lit()
  function implements a diffuse-glossy Blinn shading model. It takes three
  parameters:
    ◦ The dot product of the normalized surface normal and the light
      vector
    ◦ The dot product of a half-angle vector and the normal
    ◦ The specular exponent
  It returns a 4-vector, where
    ◦ The x and w components are always one.
    ◦ The y component is equal to the diffuse dot product or to zero if the
      product is less than zero.
    ◦ The z component is equal to the specular dot product raised to the
      given exponent or to zero if the diffuse dot product was less than
      zero.
  All this is done substantially more efficiently than if the corresponding
  operations were written out in Cg code.
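A minimal sketch of the lit() advice follows. The function shadeBlinn and its parameter names are hypothetical; N, L, and H are assumed to be normalized by the caller.

```cg
// Hypothetical shading helper built on the Standard Library lit() function.
float4 shadeBlinn(float3 N,            // surface normal
                  float3 L,            // direction to the light
                  float3 H,            // half-angle vector
                  float  shininess,    // specular exponent
                  float4 diffuseColor,
                  float4 specularColor)
{
    // lit() returns (1, diffuse, specular, 1), with the diffuse term
    // clamped at zero and the specular term zeroed when the diffuse
    // dot product is negative.
    float4 coeffs = lit(dot(N, L), dot(N, H), shininess);
    return diffuseColor * coeffs.y + specularColor * coeffs.z;
}
```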
7. Take Advantage of the Different Levels of
Computation Frequency
Always keep in mind the fact that fragment programs generally are executed
many more times than vertex programs. Therefore, move computation from
fragment programs into vertex programs whenever possible. Recall that
varying outputs from vertex programs are automatically linearly
interpolated before being passed to the fragment program.
There are three main cases where you can move computation from a
fragment program into a vertex program:
• The result is constant over all fragments.
  If the vertex shader computes a value that is the same for all vertices, so
  that all fragments receive the same value after interpolation, any
  computation that the fragment shaders do that is based solely on such
  values can be moved to the vertex shader (as long as it doesn’t require
  texture map lookups or other fragment-only operations).
• The result is linear across a triangle.
  If the fragment shader is computing a value that varies linearly over the
  face of the triangle (for example, the distance from the fragment to a light
  source, to be used for attenuation), the value can be computed in the
  vertex shader at each vertex, passed to the fragment shader, and
  automatically interpolated by the GPU along the way.
• The result is nearly linear across a triangle.
  When a value computed by a fragment shader varies slowly over
  triangles, it may be an acceptable approximation to compute its value at
  each vertex and use its linearly interpolated value in the fragment
  shader. For example, the usual Gouraud shading algorithm takes
  advantage of this situation to compute lighting per vertex, rather than
  per pixel.
In a similar manner, it may be advantageous to move any vertex shader
computation that is solely dependent on the values of uniform parameters to
the CPU and then to pass the result of the computation into the vertex shader
with different uniform parameters. For example, if the vertex shader is
passed a float3 vector giving the direction of a distant light source, the
vector should be normalized on the CPU and passed to the vertex shader.
This avoids the need to repeatedly and unnecessarily recompute
normalize(lightvector) in the vertex shader.
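The sketch below illustrates both ideas with a Gouraud-style diffuse term. The entry point names vertexMain and fragmentMain and all parameter names are hypothetical; the light direction is assumed to be normalized by the application on the CPU and given in object space.

```cg
struct VertexOut {
    float4 hpos    : POSITION;
    float4 diffuse : COLOR0;      // lighting computed once per vertex
    float2 uv      : TEXCOORD0;
};

// Vertex program: does the lighting, so it runs per vertex, not per fragment.
VertexOut vertexMain(float4 position : POSITION,
                     float3 normal   : NORMAL,
                     float2 uv       : TEXCOORD0,
                     uniform float4x4 modelViewProj,
                     uniform float3 lightDir)   // pre-normalized on the CPU
{
    VertexOut OUT;
    OUT.hpos    = mul(modelViewProj, position);
    OUT.diffuse = saturate(dot(normal, lightDir)).xxxx;
    OUT.uv      = uv;
    return OUT;
}

// Fragment program: only modulates the interpolated lighting by a texture.
float4 fragmentMain(float4 diffuse : COLOR0,
                    float2 uv      : TEXCOORD0,
                    uniform sampler2D baseMap) : COLOR
{
    return diffuse * tex2D(baseMap, uv);
}
```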
8. Avoid Matrix Transposes Just for Multiplication
Computing the transpose of a matrix can often be avoided. If you would like
to multiply a transposed float3x3 matrix m by a float3 v,
    mul(v, m);
is equivalent to and more efficient than
    mul(transpose(m), v);
9. Minimize Conditional Code in Fragment Programs
GPUs don’t currently support branching in fragment programs; a program
with a large amount of code that is conditionally executed (for example, in
an if/else expression) tends to run at the same speed as if all of it were
executed. Therefore, if you have a large amount of conditional code and it is
possible to evaluate the condition on the CPU, it may be advantageous to
have multiple versions of the shader source code and to bind the one with
the appropriate code path at run time.
An example of this situation would be a fragment shader that supported a
generic light source model for shading. Depending on how its parameters
were set, it might implement a point light, a spotlight, or a light source that
projected a texture map to determine the light distribution. Rather than
having a series of if/else tests to determine which light model to use,
having a separate version of the shader for each light type is generally more
efficient.
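One way to organize separate versions, sketched under the assumption that the application compiles each function as its own entry point (for example with the -entry compiler option) and binds the matching program for the active light type, is shown below. The function and parameter names are hypothetical.

```cg
// Version bound when the active light is a point light.
float4 pointLightMain(float3 normal  : TEXCOORD0,
                      float3 toLight : TEXCOORD1,   // unnormalized vector to the light
                      uniform float4 lightColor) : COLOR
{
    float3 N = normalize(normal);
    float3 L = normalize(toLight);
    return lightColor * saturate(dot(N, L));
}

// Version bound when the active light is a spotlight.
float4 spotLightMain(float3 normal  : TEXCOORD0,
                     float3 toLight : TEXCOORD1,
                     uniform float4 lightColor,
                     uniform float3 spotDir,     // cone axis, assumed normalized
                     uniform float  cosCutoff) : COLOR
{
    float3 N = normalize(normal);
    float3 L = normalize(toLight);
    // step() yields 1 inside the cone and 0 outside, without an if/else.
    float inCone = step(cosCutoff, dot(-L, spotDir));
    return lightColor * saturate(dot(N, L)) * inCone;
}
```

Neither version contains a test for the light type; that decision is made once on the CPU when the program is bound.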
Appendix D
Cg Compiler Options
This appendix describes the command-line options for the Cg compiler,
cgc.exe:
-profile prof
    Compile for the prof profile.
-profileopts profopts
    Specify a comma-separated list of profile-specific options. See the profile
    specification for valid options.
-entry fname
    Specify the main function name as fname.
-o fname
    Write the output to file fname.
-Dmacro[=value]
    Define a macro, with optional value.
-Ipathname
    Specify path to an include directory.
-l filename
    Write compiler messages to filename rather than to standard output.
-strict
    Enforce strict type checking.
-nofx
    Do not treat CgFX keywords as reserved words.
-quiet
    Suppress printing the header to stdout.
-nocode
    Compile, but do not generate any code.
-nostdlib
    Do not include the stdlib.h header file before compilation.
-longprogs
Allow code generation that is longer than a profile’s limit.
-debug
Activate the debug() function.
-v
Print the compiler’s version to stdout.
-h
Print a short help message.
-maxunrollcount N
Set the maximum loop unroll count to N. Loops with greater than N
iterations are not unrolled. Defaults to 256.
-posinv
Generate a position invariant vertex program if position invariance is
supported by the current profile.
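As a rough illustration of how these options combine, the C sketch below assembles a compiler command line from the -profile, -entry, and -o options listed above. The helper build_cgc_command() is hypothetical, and the sketch assumes that the source file is passed as the last argument, which this appendix does not state. The executable is written as cgc; on Windows it is cgc.exe.

```c
#include <stdio.h>

/* Format a compiler command line into buf and return the value that
   snprintf returns (the length the full string needs, excluding the
   terminating null). Only options listed in this appendix are used.
   Assumption: the source file is given as the last argument. */
int build_cgc_command(char *buf, size_t size,
                      const char *profile, const char *entry,
                      const char *output, const char *source)
{
    return snprintf(buf, size, "cgc -profile %s -entry %s -o %s %s",
                    profile, entry, output, source);
}
```

For example, with the arbfp1 profile, an entry function named main, and the hypothetical file names shader.fp and shader.cg, the helper produces "cgc -profile arbfp1 -entry main -o shader.fp shader.cg". A return value greater than or equal to size means the buffer was too small and the string was truncated.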
Index
A
abs() for performance 324
animation of geometry 202
anisotropic lighting
sample shader 190
vertex shader code example 191
Annotation 118
ANSI C
differences from Cg 222
relation to Cg 221
arbfp1 profile 263
arbvp1 profile 256
arithmetic operators 20, 248
arithmetic precision 246
arithmetic range 246
array type, specification 230
arrays
declaration and use of 238
support of 14
B
binding semantics 242
defined 6
overview 241
Blinn-Phong Bump-Mapping 175
bool data type 11
bool type, specification 229
boolean operators 21, 248
built-in functions 33
bump dot3x2 diffuse and specular
pixel shader code example 194
sample shader 192
vertex shader code example 193
bump-reflection mapping
pixel shader code example 199
sample shader 196
vertex shader code example 197
C
C preprocessor
supporting 241
C++, relation to Cg 221
Car Paint 9
pixel shader code example 186
vertex shader code example 184
cfloat type, specification 229
Cg
brief tutorial 145
defined 1
language, introduction 1
necessity for xiv
standard library functions 33
Cg compiler
cgc.exe 329
command-line options 329
Cg runtime
API specific 72
benefits 44
compiling 46
context creation 46
Direct3D 85
cgD3D9GetLastError() 115
CGerror 114
debugging mode 112
error callbacks 116
error testing 115
error types 114
Direct3D
cgD3D9EnableDebugTracing() 114
Direct3D
cgD3D9TranslateHRESULT() 116
Direct3D expanded interface 98
cgD3D8LoadProgram() 103
cgD3D8SetSamplerState() 102
cgD3D9BindProgram() 105
cgD3D9EnableParameterShadowing() 103
cgD3D9GetDevice() 98
cgD3D9GetLatestPixelProfile() 105
cgD3D9GetLatestVertexProfile() 105
cgD3D9GetOptimalOptions() 105
cgD3D9IsParameterShadowingEnabled() 103
cgD3D9IsProgramLoaded() 104
cgD3D9LoadProgram() 103
cgD3D9SetDevice() 98
cgD3D9SetSamplerState() 102
cgD3D9SetTexture() 102
cgD3D9SetTextureWrapMode() 102
cgD3D9SetUniform() 100
cgD3D9SetUniformArray() 101
cgD3D9SetUniformMatrix() 101
cgD3D9SetUniformMatrixArray() 101
cgD3D9UnloadProgram() 104
Direct3D 8 application 109
Direct3D 9 application 106
Direct3D device 98
fragment program 106
lost devices 98
parameters 100
array 101
sampler 102
uniform 100
profile support 105
program execution 103
vertex program 106
Direct3D HRESULT 114
Direct3D minimal interface 85
cgD3D8ResourceToDeclUsage() 90
cgD3D8ValidateVertexDeclaration() 88
cgD3D9ResourceToDeclUsage() 90
cgD3D9ValidateVertexDeclaration() 88
Direct3D 8 application 95
Direct3D 9 application 92
fragment program 92
type retrieval 91
vertex declaration 85
vertex declaration for Direct3D 8 86
vertex declaration for Direct3D 9 86
vertex program 91
header files 46
loading 47
modifying parameters 47
OpenGL 73
error reporting 85
OpenGL application 82
OpenGL parameter setting 74
parameter shadowing 73
program execution 48
releasing resources 49
Cg Runtime Library
overview 45
Cg standard library 33
Cg_Simple file 145
cgc.exe, Cg compiler 329
cgD3D9EnableParameterShadowing() 103
CGerror
Direct3D 114
OpenGL 85
cint type, specification 229
command-line options, Cg compiler 329
comparison operators 248
introduction 21
compilation profiles, use of 225
compiler options
command-line 329
-debug 330
-Dmacro 329
-entry 329
-h 330
-Ipathname 329
-l filename 329
-longprogs 330
-maxunrollcount 330
-nocode 329
-nofx 329
-nostdlib 329
-o 329
-profile 329
-profileopts 329
-quiet 329
-strict 329
-v 330
compile-time type category 232
computation frequency for performance 327
concrete type category 232
conditional code in fragment programs and performance 328
conditional operator 248
conditional operators 22
constants, typing of 232
construction operator, described 244
context
core Cg 50
control constructs used 19
core Cg context 50
Core Cg error reporting 71
Core Cg parameter 54
Core Cg program 50
core Cg runtime 49
D
data types
bool 11
fixed 11
float 11
half 11
int 11
sampler 11
supported 11
data types for performance 325
debugging function 41
declaration, Cg definition 224
definition, as used in Cg 224
derivative functions 41
Direct3D Cg runtime 85
cgD3D9EnableDebugTracing() 114
cgD3D9GetLastError() 115
cgD3D9TranslateHRESULT() 116
CGerror 114
debugging mode 112
error callbacks 116
error testing 115
error types 114
expanded interface 98
cgD3D8LoadProgram() 103
cgD3D8SetSamplerState() 102
cgD3D9BindProgram() 105
cgD3D9EnableParameterShadowing() 103
cgD3D9GetDevice() 98
cgD3D9GetLatestPixelProfile() 105
cgD3D9GetLatestVertexProfile() 105
cgD3D9GetOptimalOptions() 105
cgD3D9IsParameterShadowingEnabled() 103
cgD3D9IsProgramLoaded() 104
cgD3D9LoadProgram() 103
cgD3D9SetDevice() 98
cgD3D9SetSamplerState() 102
cgD3D9SetTexture() 102
cgD3D9SetTextureWrapMode() 102
cgD3D9SetUniform() 100
cgD3D9SetUniformArray() 101
cgD3D9SetUniformMatrix() 101
cgD3D9SetUniformMatrixArray() 101
cgD3D9UnloadProgram() 104
Direct3D 8 application 109
Direct3D 9 application 106
Direct3D device 98
fragment program 106
lost devices 98
parameters 100
array 101
sampler 102
uniform 100
profile support 105
program execution 103
vertex program 106
HRESULT 114
minimal interface 85
cgD3D8ResourceToDeclUsage() 90
cgD3D8ValidateVertexDeclaration() 88
cgD3D9ResourceToDeclUsage() 90
cgD3D9ValidateVertexDeclaration() 88
Direct3D 8 application 95
Direct3D 9 application 92
fragment program 92
type retrieval 91
vertex declaration 85
vertex declaration for Direct3D 8 86
vertex declaration for Direct3D 9 86
vertex program 91
Direct3D debug DLL, using 113
DirectX pixel shader 1.x profiles 308
DirectX pixel shader 2.x profile 300
DirectX vertex shader 1.1 profile 304
DirectX vertex shader 2.x profile 296
dot() for performance 324
dx8ps profile, deprecated 308
E
effect 117
Effect parameter 118
effect parameters 121
evaluating Cg programs 127
explicit casts
compile-time 235
numeric 236
numeric matrix 236
numeric vector 236
F
fixed data type 11
fixed type, specification 229
float data type 11
float type, specification 229
floating type category 232
for statements 244
fp20 profile 283
fp30 profile 274
fragment profiles
texture lookups 23
fragment program 121
predefined output structures 42
varying output 9
fragment program profiles 252
OpenGL ARB 263
OpenGL NV_fragment_program 274
fragment program, defined 3
fresnel 200
sample shader 200
vertex shader code example 200
function
calls 228
multiplying 20
open profile 227
function definitions
introduction 19
function overloading 240
introduction 19
functions
debugging 41
declaring 226
derivative 41
geometric 38
mathematical 33
overloading by profile 226
standard library 33
texture map 38
G
geometric functions 38
GL_ARB_vertex 256
global variables 241
graphics hardware, evolution of xiii
grass
sample shader 202
vertex shader code example 202
H
half data type 11
half type, specification 229
I
if statements 244
inputs
uniform 5
varying 5, 6
int data type 11
int type, specification 229
integral type category 232
interfaces 125
J
Java, relation to Cg 221
L
language profiles
concept of 3
M
mathematical functions 33
matrices, multiplying 20
matrices, support of 12
matrix palette skinning 217
sample shader 217
vertex shader code example 218
matrix transposes and performance 328
melting paint
pixel shader code example 163
sample shader 161
vertex shader code example 161
min() for performance 324
miscellaneous operators 249
modifiable function parameters, passing 19
multipaint
pixel shader code example 167
sample shader 165
vertex shader code example 166
N
namespaces 237
numeric type category 232
O
object, Cg definition 224
open profile functions 227
OpenGL Cg runtime 73
error reporting 85
OpenGL application 82
parameter setting 74
OpenGL CGerror 85
OpenGL profiles
ARB fragment program 263
ARB vertex program 256
NV_fragment_program 274
NV_register_combiners 283
NV_texture_shader 283
NV_vertex_program 279
NV_vertex_program 2.0 270
operations
expressed differently from C 222
operator
enhancements 247
precedence 247
operators
arithmetic 20
boolean 21
conditional 22
introduction 18
swizzle 22
write-mask 22
P
packed, type modifier 230
parameter shadowing 73
parameters
modifiable function, passing 19
parameters in function definitions, syntax 227
pass 117, 120
pass state 120
performance techniques
abs() 324
avoiding matrix transposes 328
computation frequency 327
conditional code in fragment programs 328
data types 325
dot() 324
min() 324
saturate() 324
shading computations 326
swizzle 323
texture maps 324
vectorization 321
pixel program, defined 3
pixel shader, defined 3
position invariance 250
profile
arbfp1 263
arbvp1 256
fp20 283
fp30 274
ps_1_1, ps_1_2, ps_1_3 308
ps_2_0, ps_2_x 300
vp20 279
vp30 270
vs_1_1 304
vs_2_0, vs_2_x 296
profile, defined 3
program
declaring 5
kinds of inputs 5
program profiles
fragment 252
vertex 250
programming model, GPU 2
ps_1_x profile 308
ps_2_0 profile 300
ps_2_x profile 300
R
ray-traced refraction
pixel shader code example 172
sample shader 170
vertex shader code example 171
recursion, function 19
reflection vector 200
refraction
pixel shader code example 207
sample shader 205
vertex shader code example 206
release notes xvi
Renderman, relation to Cg 221
reserved words 249
runtime
core Cg 49
S
sampler data type 11
sampler type, specification 230
samplers 123
saturate() for performance 324
scalar type category 232
semantics
aliasing 243
restrictions 243
shader sample
anisotropic lighting 190
bump dot 3x2 diffuse and specular 192
bump-reflection mapping 196
fresnel 200
grass 202
improved skinning 154
improved water 157
matrix palette skinning 217
melting paint 161
multipaint 165
ray-traced refraction 170
refraction 205
shadow mapping 208
shadow volume extrusion 211
sine wave demo 214
skin 175
shader, simple.cg example 146
shaders
advanced profile samples 153
basic profile samples 189
shading computations for performance 326
shadow mapping 208
pixel shader code example 210
sample shader 208
vertex shader code example 209
shadow volume extrusion
sample shader 211
vertex shader code example 212
shadow volumes 211
silent incompatibilities with C 221
simple.cg
basic transformations 149
passing arguments 149
Sine function 202, 214
sine wave demo
sample shader 214
vertex shader code example 215
sinh(x) 37
skin
pixel shader code example 175
sample shader 175
skinning, improved
sample shader 154
vertex shader code example 155
smearing, scalar to vector 237
Stanford shading language, relation to Cg 221
State assignment 118
statements
introduction 18
statements, in Cg 244
structures
introduction 13
swizzle
for performance 323
swizzle operator 22
swizzle operator, described 245
T
technique 117
technique validation 120
texture lookups 23
texture map functions 38
texture maps for performance 324
textures 123
thin film effect
pixel shader code example 182
vertex shader code example 180
tutorial 145
type conversions 12, 234
array 235
matrix 234
scalar 234
structure 235
vector 234
type equivalency 236
type promotion 236
assignment 237
smearing 237
type qualifiers 233
const 233
in 233
out 233
types
general discussion 229
partial support 231
U
uniform inputs 5
uniform modifier, use of 225
uninitialized variables, use of 241
unsized arrays 125
V
variables
global 241
uninitialized, use of 241
varying inputs 5, 6
vector data types 12
vector operators, new 244
vectorization
for performance 321
vectors, constructing 21
vertex color 149
vertex position 149
vertex program 121
varying output 7
vertex program profiles 250
vertex programs, defined 3
virtual machine 127
void type, specification 229
vp20 profile 279
vp30 profile 270
vs_1_1 profile 304
vs_2_0 profile 296
vs_2_x profile 296
W
water, improved
pixel shader code example 160
sample shader 157
vertex shader code example 158
web site, NVIDIA xvi
while statements 244
workspace, loading 145
write-mask operator 22
described 246