Get Started With CUDA Programming Model
CUDA (Compute Unified Device Architecture) is a parallel computing platform and application programming interface (API) model created by NVIDIA. It enables developers to utilize the immense computational power of NVIDIA GPUs (Graphics Processing Units) for general-purpose processing tasks beyond graphics rendering.
Key features of CUDA programming include:
- Parallelism: CUDA enables developers to exploit parallelism at multiple levels, including thread-level, instruction-level, and data-level parallelism, allowing for efficient computation on GPUs.
- CUDA C/C++ Language Extensions: CUDA extends the C/C++ programming languages with additional keywords and constructs to facilitate programming for GPU architectures, making it easier to write parallel code.
- CUDA Runtime API: The CUDA Runtime API provides a set of functions for managing GPU devices, memory allocation, data transfer between CPU and GPU, and launching kernel functions (the functions executed on the GPU).
- CUDA Libraries: NVIDIA provides a collection of libraries optimized for GPU computing tasks, such as cuBLAS for linear algebra, cuFFT for Fast Fourier Transforms, cuDNN for deep neural networks, and more.
- CUDA Toolkit: The CUDA Toolkit includes compilers, debuggers, profilers, and other development tools necessary for CUDA programming.
CUDA programming allows developers to harness the massive parallel processing power of GPUs to accelerate a wide range of computational tasks, including scientific simulations, image and video processing, machine learning, and more.
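To give a first taste of what a CUDA program looks like before diving into prerequisites, here is a minimal sketch of the classic vector-addition example. The file name and kernel name are illustrative, and it assumes a CUDA version recent enough to support unified memory (cudaMallocManaged, available since CUDA 6):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Kernel: each GPU thread adds one pair of elements.
__global__ void add(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1024;
    const size_t bytes = n * sizeof(float);
    float *a, *b, *c;
    // Unified (managed) memory is accessible from both CPU and GPU.
    cudaMallocManaged(&a, bytes);
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    // Launch enough 256-thread blocks to cover all n elements.
    add<<<(n + 255) / 256, 256>>>(a, b, c, n);
    cudaDeviceSynchronize();  // wait for the GPU to finish

    printf("c[0] = %f\n", c[0]);
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

Saved as, say, add.cu, this would be compiled and run with `nvcc add.cu -o add && ./add`. The details (kernels, thread indexing, memory transfers) are covered later; this is only to show the overall shape of a CUDA program.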
Prerequisites
Before you get started with CUDA programming, please make sure:
- You have at least one NVIDIA GPU (Pascal or newer) in your computer
- NVIDIA Driver is installed
- The CUDA Toolkit is installed. When installing CUDA on Windows, make sure you also select the Nsight components; on Linux, install cuda-tools (which includes Nsight Systems) separately if needed.
The nvcc compiler (NVIDIA CUDA Compiler, a proprietary compiler by NVIDIA intended for use with CUDA) should be available once the CUDA Toolkit is installed. To verify that nvcc is ready, type nvcc -V in a terminal and check the output:
> nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Wed_Nov_22_10:17:15_PST_2023
Cuda compilation tools, release 12.3, V12.3.107
Build cuda_12.3.r12.3/compiler.33567101_0
The exact version of your nvcc compiler may vary depending on your environment. Don't worry if your GPU is old; this tutorial should work on most NVIDIA GPU microarchitectures, dating back to Pascal.
Besides, you should have a basic, fundamental knowledge of C/C++ programming.
CUDA programming does support C++, and you can use C++ features and syntax in CUDA kernels. CUDA C/C++ is an extension of the C/C++ programming languages, allowing developers to write code for execution on NVIDIA GPUs.
However, regarding the C++ standard library, CUDA does support some components of it, but not all. In particular, many features of the C++ standard library that are dependent on host-side functionality may not be available for device code (code that runs on the GPU).
Some parts of the C++ standard library that are commonly used in CUDA device code include:
- Math functions: CUDA provides math functions similar to those in <cmath>, such as sqrt, sin, cos, etc., which can be used in device code.
- Utilities: Some utilities from the C++ standard library, such as std::numeric_limits, std::tuple, std::pair, etc., are available for use in device code.
- Memory management: CUDA provides memory management functions (malloc, free) which can be used in device code.
However, many other components of the C++ standard library, such as I/O operations (iostream), containers (vector, map, etc.), and algorithms (sort, find, etc.), are not directly usable in CUDA device code because they rely heavily on host-side functionality that is not available on the GPU.
CUDA developers often use CUDA-specific libraries and APIs for device-side operations, and they typically write host-side code in C or C++ to manage data transfers between the host and the GPU.
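To make the device-side support described above concrete, here is a small sketch of a kernel that uses two of the usable pieces: sqrtf from the CUDA math library and std::numeric_limits from <limits>. The kernel name and guard logic are illustrative; note that calling constexpr standard-library functions like std::numeric_limits<float>::quiet_NaN() from device code may require passing nvcc the --expt-relaxed-constexpr flag, depending on your CUDA version:

```cuda
#include <limits>
#include <cuda_runtime.h>

// Device code may use CUDA math functions (sqrtf) and some
// standard-library utilities (std::numeric_limits), but not
// iostream, std::vector, std::sort, etc.
__global__ void root_or_nan(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    // Return a quiet NaN explicitly for negative inputs.
    out[i] = (in[i] >= 0.0f)
                 ? sqrtf(in[i])
                 : std::numeric_limits<float>::quiet_NaN();
}
```

By contrast, a line like std::cout << out[i] inside this kernel would fail to compile, since iostream is host-only.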
What is CUDA programming for
CUDA (Compute Unified Device Architecture) programming is not about game development. While you may use DX12 (DirectX 12, a graphics API developed by Microsoft primarily for rendering graphics in Windows-based applications and games) for game development, which also runs on the GPU, CUDA enables developers to harness the power of NVIDIA GPUs for general-purpose computing tasks, not just graphics rendering. It allows programmers to write code in CUDA C/C++ or use CUDA-accelerated libraries to perform tasks such as scientific simulations, data processing, machine learning, and more on the GPU.
One of the most important tasks for data scientists is matrix multiplication, since most of the computation in machine learning is carried out this way. Consider a situation where you want to perform matrix multiplication on two matrices A and B: