CUDACasts Episode #3: Your First OpenACC Program

In the last episode of CUDACasts, we wrote our first accelerated program using CUDA C. In this episode, we will explore an alternative method of accelerating code: OpenACC directives. These directives are hints that tell the compiler how to accelerate sections of code, without requiring you to write CUDA code or change the underlying source.

The algorithm we’ll be accelerating is the Jacobi iteration; you can get a copy of the OpenACC accelerated code from GitHub.

The video presents the typical process for accelerating code with OpenACC.

  1. Identify the computationally intensive sections of code you want to offload to the massively parallel GPU.
  2. Using OpenACC directives, move parallel loop execution to the GPU and verify it is functionally correct.
  3. Optimize any data movement between the host and device.

For a more in-depth look at OpenACC and the example shown here, you might want to read these past Parallel Forall blog posts (#1, #2, #3). If you would like to try OpenACC for yourself, you can get a free 30-day trial of the PGI compiler with OpenACC support here.

To request a topic for a future episode of CUDACasts, or if you have any other feedback, please use the contact form or leave a comment to let us know!


About Mark Ebersole

As CUDA Educator at NVIDIA, Mark Ebersole teaches developers and programmers about the NVIDIA CUDA parallel computing platform and programming model, and the benefits of GPU computing. With more than ten years of experience as a low-level systems programmer, Mark has spent much of his time at NVIDIA as a GPU systems diagnostics programmer, a role in which he developed a tool to test, debug, validate, and verify GPUs from pre-emulation through bring-up and into production. Before joining NVIDIA, he worked for IBM developing Linux drivers for the IBM iSeries server. Mark holds a BS degree in math and computer science from St. Cloud State University. Follow @cudahamster on Twitter.