NVIDIA CUDA Programming Guide 1.1
Page Count: 143
- Chapter 1. Introduction to CUDA
- 1.1 The Graphics Processor Unit as a Data Parallel Computing Device
- 1.2 CUDA: A New Architecture for Computing on the GPU
- 1.3 Document’s Structure
- Chapter 2. Programming Model
- Chapter 3. Hardware Implementation
- Chapter 4. Application Programming Interface
- 4.1 An Extension to the C Programming Language
- 4.2 Language Extensions
- 4.3 Common Runtime Component
- 4.4 Device Runtime Component
- 4.5 Host Runtime Component
- 4.5.1 Common Concepts
- 4.5.2 Runtime API
- 4.5.3 Driver API
- 4.5.3.1 Initialization
- 4.5.3.2 Device Management
- 4.5.3.3 Context Management
- 4.5.3.4 Module Management
- 4.5.3.5 Execution Control
- 4.5.3.6 Memory Management
- 4.5.3.7 Stream Management
- 4.5.3.8 Event Management
- 4.5.3.9 Texture Reference Management
- 4.5.3.10 OpenGL Interoperability
- 4.5.3.11 Direct3D Interoperability
- Chapter 5. Performance Guidelines
- Chapter 6. Example of Matrix Multiplication
- 6.1 Overview
- 6.2 Source Code Listing
- 6.3 Source Code Walkthrough
- 6.3.1 Mul()
- 6.3.2 Muld()
- Appendix A. Technical Specifications
- A.1 General Specifications
- A.2 Floating-Point Standard
- Appendix B. Mathematical Functions
- Appendix C. Atomic Functions
- Appendix D. Runtime API Reference
- D.1 Device Management
- D.2 Thread Management
- D.3 Stream Management
- D.4 Event Management
- D.5 Memory Management
- D.5.1 cudaMalloc()
- D.5.2 cudaMallocPitch()
- D.5.3 cudaFree()
- D.5.4 cudaMallocArray()
- D.5.5 cudaFreeArray()
- D.5.6 cudaMallocHost()
- D.5.7 cudaFreeHost()
- D.5.8 cudaMemset()
- D.5.9 cudaMemset2D()
- D.5.10 cudaMemcpy()
- D.5.11 cudaMemcpy2D()
- D.5.12 cudaMemcpyToArray()
- D.5.13 cudaMemcpy2DToArray()
- D.5.14 cudaMemcpyFromArray()
- D.5.15 cudaMemcpy2DFromArray()
- D.5.16 cudaMemcpyArrayToArray()
- D.5.17 cudaMemcpy2DArrayToArray()
- D.5.18 cudaMemcpyToSymbol()
- D.5.19 cudaMemcpyFromSymbol()
- D.5.20 cudaGetSymbolAddress()
- D.5.21 cudaGetSymbolSize()
- D.6 Texture Reference Management
- D.7 Execution Control
- D.8 OpenGL Interoperability
- D.9 Direct3D Interoperability
- D.10 Error Handling
- Appendix E. Driver API Reference
- E.1 Initialization
- E.2 Device Management
- E.3 Context Management
- E.4 Module Management
- E.5 Stream Management
- E.6 Event Management
- E.7 Execution Control
- E.8 Memory Management
- E.8.1 cuMemGetInfo()
- E.8.2 cuMemAlloc()
- E.8.3 cuMemAllocPitch()
- E.8.4 cuMemFree()
- E.8.5 cuMemAllocHost()
- E.8.6 cuMemFreeHost()
- E.8.7 cuMemGetAddressRange()
- E.8.8 cuArrayCreate()
- E.8.9 cuArrayGetDescriptor()
- E.8.10 cuArrayDestroy()
- E.8.11 cuMemset()
- E.8.12 cuMemset2D()
- E.8.13 cuMemcpyHtoD()
- E.8.14 cuMemcpyDtoH()
- E.8.15 cuMemcpyDtoD()
- E.8.16 cuMemcpyDtoA()
- E.8.17 cuMemcpyAtoD()
- E.8.18 cuMemcpyAtoH()
- E.8.19 cuMemcpyHtoA()
- E.8.20 cuMemcpyAtoA()
- E.8.21 cuMemcpy2D()
- E.9 Texture Reference Management
- E.9.1 cuTexRefCreate()
- E.9.2 cuTexRefDestroy()
- E.9.3 cuTexRefSetArray()
- E.9.4 cuTexRefSetAddress()
- E.9.5 cuTexRefSetFormat()
- E.9.6 cuTexRefSetAddressMode()
- E.9.7 cuTexRefSetFilterMode()
- E.9.8 cuTexRefSetFlags()
- E.9.9 cuTexRefGetAddress()
- E.9.10 cuTexRefGetArray()
- E.9.11 cuTexRefGetAddressMode()
- E.9.12 cuTexRefGetFilterMode()
- E.9.13 cuTexRefGetFormat()
- E.9.14 cuTexRefGetFlags()
- E.10 OpenGL Interoperability
- E.11 Direct3D Interoperability
- Appendix F. Texture Fetching