Course Outline
Introduction
- What is GPU programming?
- Why use GPU programming?
- What challenges and trade-offs are associated with GPU programming?
- Which frameworks are available for GPU programming?
- Selecting the appropriate framework for your application
OpenCL
- What is OpenCL?
- What are the pros and cons of OpenCL?
- Configuring the development environment for OpenCL
- Developing a basic OpenCL program for vector addition
- Using the OpenCL API to query device details, manage memory, transfer data, launch kernels, and synchronize threads
- Writing OpenCL C kernels for device execution and data manipulation
- Utilizing OpenCL built-in functions, variables, and libraries for common tasks
- Applying OpenCL memory spaces (global, local, constant, private) to optimize data transfers and memory access
- Using the OpenCL execution model to manage work-items, work-groups, and ND-ranges for parallelism
- Debugging and testing OpenCL programs with tools like CodeXL
- Optimizing OpenCL programs via techniques such as coalescing, caching, prefetching, and profiling
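The OpenCL bullets above can be sketched end to end as one small host program plus an OpenCL C kernel. This is a condensed illustration, assuming an OpenCL 1.2 runtime and headers are installed (link with -lOpenCL); error checking is omitted for brevity, and it requires an OpenCL-capable device to run.

```
// Vector addition via the OpenCL host API: query a device, build the
// kernel, manage global-memory buffers, launch an ND-range, synchronize.
#include <CL/cl.h>
#include <stdio.h>

static const char *src =
    "__kernel void vadd(__global const float *a,\n"
    "                   __global const float *b,\n"
    "                   __global float *c) {\n"
    "    size_t i = get_global_id(0);\n"   // one work-item per element
    "    c[i] = a[i] + b[i];\n"
    "}\n";

int main(void) {
    float a[4] = {1, 2, 3, 4}, b[4] = {10, 20, 30, 40}, c[4];
    cl_platform_id plat; cl_device_id dev;
    clGetPlatformIDs(1, &plat, NULL);
    clGetDeviceIDs(plat, CL_DEVICE_TYPE_DEFAULT, 1, &dev, NULL);
    cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, NULL);
    cl_command_queue q = clCreateCommandQueue(ctx, dev, 0, NULL);

    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
    clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
    cl_kernel k = clCreateKernel(prog, "vadd", NULL);

    // Global-memory buffers; COPY_HOST_PTR uploads host data at creation.
    cl_mem da = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                               sizeof a, a, NULL);
    cl_mem db = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                               sizeof b, b, NULL);
    cl_mem dc = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, sizeof c, NULL, NULL);

    clSetKernelArg(k, 0, sizeof da, &da);
    clSetKernelArg(k, 1, sizeof db, &db);
    clSetKernelArg(k, 2, sizeof dc, &dc);

    size_t global = 4;  // the 1-D ND-range: one work-item per element
    clEnqueueNDRangeKernel(q, k, 1, NULL, &global, NULL, 0, NULL, NULL);
    // Blocking read doubles as synchronization with the device.
    clEnqueueReadBuffer(q, dc, CL_TRUE, 0, sizeof c, c, 0, NULL, NULL);

    for (int i = 0; i < 4; ++i) printf("%g ", c[i]);
    return 0;
}
```

In practice every `cl*` call returns or reports a `cl_int` status code that should be checked; the course's debugging and optimization bullets build on exactly this flow.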
CUDA
- What is CUDA?
- What are the pros and cons of CUDA?
- Configuring the development environment for CUDA
- Developing a basic CUDA program for vector addition
- Using the CUDA API to query device details, manage memory, transfer data, launch kernels, and synchronize threads
- Writing CUDA C/C++ kernels for device execution and data manipulation
- Utilizing CUDA built-in functions, variables, and libraries for common tasks
- Applying CUDA memory spaces (global, shared, constant, local) to optimize data transfers and memory access
- Using the CUDA execution model to manage threads, blocks, and grids for parallelism
- Debugging and testing CUDA programs with tools like CUDA-GDB, CUDA-MEMCHECK, and NVIDIA Nsight
- Optimizing CUDA programs via techniques such as coalescing, caching, prefetching, and profiling
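The CUDA bullets above map onto the same vector addition with notably less boilerplate, since CUDA compiles kernels ahead of time. A condensed sketch, assuming the CUDA toolkit (compile with nvcc) and an NVIDIA GPU; error checking is omitted for brevity.

```
// Vector addition in CUDA: allocate device memory, copy data, launch a
// grid of thread blocks, synchronize, and copy the result back.
#include <cstdio>

__global__ void vadd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) c[i] = a[i] + b[i];                  // guard against overrun
}

int main() {
    const int n = 4;
    float a[n] = {1, 2, 3, 4}, b[n] = {10, 20, 30, 40}, c[n];
    float *da, *db, *dc;
    cudaMalloc(&da, sizeof a);
    cudaMalloc(&db, sizeof b);
    cudaMalloc(&dc, sizeof c);
    cudaMemcpy(da, a, sizeof a, cudaMemcpyHostToDevice);
    cudaMemcpy(db, b, sizeof b, cudaMemcpyHostToDevice);

    vadd<<<1, 256>>>(da, db, dc, n);  // 1 block of 256 threads covers n = 4
    cudaDeviceSynchronize();          // wait for the kernel to finish

    cudaMemcpy(c, dc, sizeof c, cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i) printf("%g ", c[i]);
    cudaFree(da); cudaFree(db); cudaFree(dc);
    return 0;
}
```

The `<<<blocks, threads>>>` launch configuration is the execution-model material from the bullets above: grids of blocks of threads, with the index computation inside the kernel tying a thread to its data element.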
ROCm
- What is ROCm?
- What are the pros and cons of ROCm?
- Configuring the development environment for ROCm
- Developing a basic ROCm program for vector addition
- Using the ROCm API to query device details, manage memory, transfer data, launch kernels, and synchronize threads
- Writing HIP C/C++ kernels for device execution and data manipulation
- Utilizing ROCm built-in functions, variables, and libraries for common tasks
- Applying ROCm memory spaces (global, shared, constant, local) to optimize data transfers and memory access
- Using the ROCm execution model to manage threads, blocks, and grids for parallelism
- Debugging and testing ROCm programs with tools like ROCm Debugger and ROCm Profiler
- Optimizing ROCm programs via techniques such as coalescing, caching, prefetching, and profiling
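Under ROCm the kernels are written in HIP, which deliberately mirrors the CUDA API so that code ports between AMD and NVIDIA GPUs. The HIP version of the running example, assuming ROCm is installed (compile with hipcc) and with error checking omitted for brevity:

```
// Vector addition in HIP under ROCm: the structure is identical to the
// CUDA version, with hip* calls replacing cuda* calls.
#include <hip/hip_runtime.h>
#include <cstdio>

__global__ void vadd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 4;
    float a[n] = {1, 2, 3, 4}, b[n] = {10, 20, 30, 40}, c[n];
    float *da, *db, *dc;
    hipMalloc(&da, sizeof a);
    hipMalloc(&db, sizeof b);
    hipMalloc(&dc, sizeof c);
    hipMemcpy(da, a, sizeof a, hipMemcpyHostToDevice);
    hipMemcpy(db, b, sizeof b, hipMemcpyHostToDevice);

    // HIP's launch macro: kernel, grid dim, block dim, shared mem, stream.
    hipLaunchKernelGGL(vadd, dim3(1), dim3(256), 0, 0, da, db, dc, n);
    hipDeviceSynchronize();

    hipMemcpy(c, dc, sizeof c, hipMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i) printf("%g ", c[i]);
    hipFree(da); hipFree(db); hipFree(dc);
    return 0;
}
```

Setting the CUDA and HIP versions side by side previews the Comparison section: the execution model and memory spaces correspond almost one-to-one, which is the basis for porting tools such as HIPIFY.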
Comparison
- Comparing the features, performance, and compatibility of OpenCL, CUDA, and ROCm
- Evaluating GPU programs using benchmarks and metrics
- Learning best practices and tips for GPU programming
- Exploring current and future trends and challenges in GPU programming
Summary and Next Steps
Requirements
- Proficiency in C/C++ programming and understanding of parallel programming concepts
- Fundamental knowledge of computer architecture and memory hierarchy
- Experience using command-line tools and code editors
Audience
- Developers seeking to learn how to use different GPU programming frameworks and compare their features, performance, and compatibility
- Developers aiming to write portable and scalable code compatible with various platforms and devices
- Programmers interested in exploring the trade-offs and challenges inherent in GPU programming and optimization
28 Hours