The key to the power of GPUs is their thousands of parallel processors that execute threads. Anyone who has worked with even a handful of threads knows how easy it can be to introduce race conditions, and how difficult it can be to debug and fix them. Because a modern GPU can have thousands of simultaneously executing threads, NVIDIA engineers felt it was imperative to create an incredibly powerful tool for detecting and debugging race conditions.
The racecheck tool comes as part of the cuda-memcheck command-line utility. In CUDA 5.5, a new racecheck analysis mode presents a much more human-readable analysis of your code, even reporting which source lines conflict with other lines. In this episode of CUDACasts we use a simple version of Conway’s Game of Life to show the new racecheck features of cuda-memcheck. We’ll start with a few race condition bugs, and then use the analysis tool to find and fix them.
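To give a flavor of the kind of bug racecheck catches, here is a minimal sketch (not the Game of Life code from the episode) of a kernel with a deliberate shared-memory hazard: each thread writes one element of a shared array and then reads its neighbor’s element without a barrier in between.

```cuda
// Hypothetical example: a shared-memory race that racecheck will flag.
__global__ void raceKernel(int *out)
{
    __shared__ int s[64];

    // Each thread writes its own slot in shared memory.
    s[threadIdx.x] = threadIdx.x;

    // BUG: a __syncthreads() barrier is missing here, so the read
    // below races with the neighboring thread's write above.
    out[threadIdx.x] = s[(threadIdx.x + 1) % 64];
}
```

Running the application under `cuda-memcheck --tool racecheck --racecheck-report analysis ./myapp` (`myapp` being a placeholder for your binary) produces the new human-readable report, pointing at the conflicting write and read lines; inserting `__syncthreads()` between them resolves the hazard.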
Visual tools offer a very efficient method for developing and debugging applications. When working on massively parallel codes built on the CUDA Platform, this visual approach is even more important because you could be dealing with tens of thousands of parallel threads.
With the free NVIDIA Nsight Eclipse Edition IDE, you can quickly and easily examine the GPU memory state in a running CUDA C or C++ application. In today’s CUDACast, we continue our CUDA 5.5 series with a look at this new feature available to Eclipse users.
In the next few weeks, we’ll take a break from the CUDA 5.5 new feature series and explore some other topics, such as writing CUDA applications in pure Python. Stay tuned!
GPU libraries provide an easy way to accelerate applications without writing any GPU-specific code. With the new CUDA 5.5 version of the NVIDIA CUFFT Fast Fourier Transform library, FFT acceleration gets even easier, with new support for the popular FFTW API. It is now extremely simple for developers to accelerate existing FFTW library calls on the GPU, sometimes with no code changes: by changing the linker command line to link the CUFFT library instead of the FFTW library, you can take advantage of the GPU with only a re-link. In today’s CUDACast, we take a simple application that uses the standard FFTW library, and accelerate its function calls on the GPU simply by changing which library we link. In fact, the only code change we will make is to use the cufftw.h header file. This ensures that, at compile time, we are not calling any unsupported functions.
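As a hypothetical illustration (not the application from the episode), the source-side change amounts to swapping one header; the FFTW plan/execute calls below are the standard FFTW 3 API and stay untouched.

```c
/* The only source change: include cufftw.h instead of fftw3.h so that
 * unsupported FFTW functions are caught at compile time. */
#include <cufftw.h>   /* was: #include <fftw3.h> */

void forward_fft(fftw_complex *in, fftw_complex *out, int n)
{
    /* Plan and run a 1D complex-to-complex forward transform; with the
     * CUFFT FFTW interface these calls execute on the GPU. */
    fftw_plan p = fftw_plan_dft_1d(n, in, out, FFTW_FORWARD, FFTW_ESTIMATE);
    fftw_execute(p);
    fftw_destroy_plan(p);
}
```

The rest of the work happens on the link line: replace the FFTW library with the CUFFT libraries (for example, swapping `-lfftw3` for `-lcufftw -lcufft`; check the CUFFT documentation for the exact flags on your system).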
In CUDACast #5, we saw how to use the new NVIDIA RPM and Debian packages to install the CUDA toolkit, samples, and driver on a supported Linux OS with a standard package manager. With CUDA 5.5, it is now possible to compile and run CUDA applications on ARM-based systems such as the Kayla development platform. In addition to native compilation on an ARM-based CPU system, it is also possible to cross-compile for ARM systems, allowing for greater development flexibility.
NVIDIA’s next-generation Logan system on a chip will contain a Kepler GPU supporting CUDA along with a multicore ARM CPU. The combination of ARM support in CUDA 5.5 and the Kayla platform gives developers a powerful toolset to prepare for the next step in the mobile visual computing revolution.
What amazing applications will you be able to create with a small and power-efficient CPU combined with a massively parallel Kepler GPU—the same GPU architecture powering some of the most powerful supercomputers in the world?
Even if you’ve already watched CUDACasts episode 3 on creating your first OpenACC program, you’ll want to watch the new version, which includes a clearer, animated introduction. So check it out!
In the next few CUDACasts we’ll be exploring some of the new features available in CUDA 5.5, which is available as a release candidate now and will be officially released very soon. Episode 4 kicks it off by demonstrating single-GPU debugging using Nsight Eclipse Edition on Linux. With this feature, it is now possible to debug a CUDA application on the same NVIDIA GPU that is driving your active display. In fact, you can debug multiple CUDA applications even while others are actively running.