Course Outline

Introduction

  • What is GPU programming?
  • Why use GPU programming?
  • What challenges and trade-offs are associated with GPU programming?
  • Which frameworks are available for GPU programming?
  • Selecting the appropriate framework for your application

OpenCL

  • What is OpenCL?
  • What are the pros and cons of OpenCL?
  • Configuring the development environment for OpenCL
  • Developing a basic OpenCL program for vector addition
  • Using the OpenCL API to query device details, manage memory, transfer data, launch kernels, and synchronize threads
  • Writing OpenCL C kernels for device execution and data manipulation
  • Utilizing OpenCL built-in functions, variables, and libraries for common tasks
  • Applying OpenCL memory spaces (global, local, constant, private) to optimize data transfers and memory access
  • Using the OpenCL execution model to manage work-items, work-groups, and ND-ranges for parallelism
  • Debugging and testing OpenCL programs with tools like CodeXL
  • Optimizing OpenCL programs via techniques such as coalescing, caching, prefetching, and profiling
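The vector-addition exercise above can be sketched as a minimal OpenCL C kernel (kernel and parameter names here are illustrative): each work-item computes one element of the result, using get_global_id to find its position in the ND-range.

```c
// OpenCL C kernel: one work-item per output element (illustrative sketch).
__kernel void vec_add(__global const float *a,
                      __global const float *b,
                      __global float *c,
                      const unsigned int n)
{
    size_t i = get_global_id(0);   // this work-item's global index
    if (i < n)                     // guard: the ND-range may be rounded up
        c[i] = a[i] + b[i];
}
```

On the host side, such a kernel would typically be compiled with clCreateProgramWithSource and clBuildProgram, have its buffers bound with clSetKernelArg, and be launched over an ND-range with clEnqueueNDRangeKernel.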

CUDA

  • What is CUDA?
  • What are the pros and cons of CUDA?
  • Configuring the development environment for CUDA
  • Developing a basic CUDA program for vector addition
  • Using the CUDA API to query device details, manage memory, transfer data, launch kernels, and synchronize threads
  • Writing CUDA C/C++ kernels for device execution and data manipulation
  • Utilizing CUDA built-in functions, variables, and libraries for common tasks
  • Applying CUDA memory spaces (global, shared, constant, local) to optimize data transfers and memory access
  • Using the CUDA execution model to manage threads, blocks, and grids for parallelism
  • Debugging and testing CUDA programs with tools like CUDA-GDB, CUDA-MEMCHECK, and NVIDIA Nsight
  • Optimizing CUDA programs via techniques such as coalescing, caching, prefetching, and profiling
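The CUDA topics above (device memory management, host-device transfers, kernel launch, and synchronization) fit into one small vector-addition program. This is a minimal sketch without error checking; names are illustrative.

```cuda
// Minimal CUDA vector addition (illustrative sketch; no error checking).
#include <cstdio>
#include <cstdlib>

__global__ void vec_add(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)                                      // guard for the partial last block
        c[i] = a[i] + b[i];
}

int main()
{
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    float *h_a = (float *)malloc(bytes), *h_b = (float *)malloc(bytes),
          *h_c = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    float *d_a, *d_b, *d_c;                         // device buffers
    cudaMalloc(&d_a, bytes); cudaMalloc(&d_b, bytes); cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    int threads = 256;                              // threads per block
    int blocks = (n + threads - 1) / threads;       // grid size, rounded up
    vec_add<<<blocks, threads>>>(d_a, d_b, d_c, n);
    cudaDeviceSynchronize();                        // wait for the kernel to finish

    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", h_c[0]);

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return 0;
}
```

Built with nvcc, this exercises the thread/block/grid execution model and the global-memory transfer path covered in the bullets above.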

ROCm

  • What is ROCm?
  • What are the pros and cons of ROCm?
  • Configuring the development environment for ROCm
  • Developing a basic ROCm program for vector addition
  • Using the ROCm API to query device details, manage memory, transfer data, launch kernels, and synchronize threads
  • Writing ROCm C/C++ kernels for device execution and data manipulation
  • Utilizing ROCm built-in functions, variables, and libraries for common tasks
  • Applying ROCm memory spaces (global, shared, constant, local) to optimize data transfers and memory access
  • Using the ROCm execution model to manage threads, blocks, and grids for parallelism
  • Debugging and testing ROCm programs with tools like ROCm Debugger and ROCm Profiler
  • Optimizing ROCm programs via techniques such as coalescing, caching, prefetching, and profiling
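Under ROCm, the same vector-addition exercise is typically written in HIP, whose API deliberately mirrors CUDA's. The sketch below is illustrative (no error checking; function and variable names are assumptions):

```cpp
// HIP (ROCm) vector-add sketch; the API mirrors CUDA (illustrative, no error checks).
#include <hip/hip_runtime.h>

__global__ void vec_add(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)
        c[i] = a[i] + b[i];
}

void run_vec_add(const float *h_a, const float *h_b, float *h_c, int n)
{
    size_t bytes = n * sizeof(float);
    float *d_a, *d_b, *d_c;                         // device buffers
    hipMalloc(&d_a, bytes); hipMalloc(&d_b, bytes); hipMalloc(&d_c, bytes);
    hipMemcpy(d_a, h_a, bytes, hipMemcpyHostToDevice);
    hipMemcpy(d_b, h_b, bytes, hipMemcpyHostToDevice);

    int threads = 256, blocks = (n + threads - 1) / threads;
    vec_add<<<blocks, threads>>>(d_a, d_b, d_c, n); // hipcc accepts CUDA-style launches
    hipDeviceSynchronize();                         // wait for the kernel

    hipMemcpy(h_c, d_c, bytes, hipMemcpyDeviceToHost);
    hipFree(d_a); hipFree(d_b); hipFree(d_c);
}
```

Compiled with hipcc, the same source can target AMD GPUs natively or NVIDIA GPUs through HIP's CUDA backend, which is the portability argument explored in the Comparison section below.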

Comparison

  • Comparing the features, performance, and compatibility of OpenCL, CUDA, and ROCm
  • Evaluating GPU programs using benchmarks and metrics
  • Learning best practices and tips for GPU programming
  • Exploring current and future trends and challenges in GPU programming

Summary and Next Steps

Requirements

  • Proficiency in C/C++ programming and understanding of parallel programming concepts
  • Fundamental knowledge of computer architecture and memory hierarchy
  • Experience using command-line tools and code editors

Audience

  • Developers seeking to learn how to use different GPU programming frameworks and compare their features, performance, and compatibility
  • Developers aiming to write portable and scalable code compatible with various platforms and devices
  • Programmers interested in exploring the trade-offs and challenges inherent in GPU programming and optimization

Duration: 28 Hours
