CUDA Pro Tip: Clean Up After Yourself to Ensure Correct Profiling

NVIDIA’s profiling and tracing tools, including the NVIDIA Visual Profiler, NSight Eclipse and Visual Studio editions, cuda-memcheck, and the nvprof command line profiler are powerful tools that can give you deep insight into the performance and correctness of your GPU-accelerated applications. These tools gather data while your application is running, and use it to create profiles, application API traces, automatic optimization guidance, and in the case of cuda-memcheck, memory leak and race checking.


To improve tracing performance and reduce overhead in the target application, these tools internally buffer the data they gather, and flush it to disk at various points, including stream synchronization, context synchronization, context destruction, and when the internal buffer is full. For technical reasons, it is not always possible to automatically flush the data on application exit. Therefore, you should clean up your application’s CUDA objects properly to make sure that the profiler is able to store all gathered data. This means not only freeing memory allocated on the GPU, but also resetting the device Context.

If your application uses the CUDA Runtime API, call cudaDeviceReset() just before exiting, or when the application finishes making CUDA calls and using device data. If your application uses the CUDA Driver API, call cuProfilerStop() on each context to flush the profiling buffers before destroying the context with cuCtxDestroy().

Without resetting the device, applications that don’t synchronize before they exit may produce incomplete profile traces. With this simple clean-up step, you can be sure you get an accurate profile.


About Mark Harris

Mark is Chief Technologist for GPU Computing Software at NVIDIA. Mark has fifteen years of experience developing software for GPUs, ranging from graphics and games, to physically-based simulation, to parallel algorithms and high-performance computing. Mark has been using GPUs for general-purpose computing since before they even supported floating point arithmetic. While a Ph.D. student at UNC he recognized this nascent trend and coined a name for it: GPGPU (General-Purpose computing on Graphics Processing Units), and started to provide a forum for those working in the field to share and discuss their work. Follow @harrism on Twitter