About Cliff Woolley

Cliff Woolley is a senior developer technology engineer with NVIDIA. He received his master's degree in Computer Science from the University of Virginia in 2003, where he was among the earliest academic researchers to explore the use of GPUs for general purpose computing. Today he works with developers of high-performance computing applications to fine-tune their algorithms for the CUDA Platform, and he is one of the lead authors of developer documentation in the CUDA Toolkit for application tuning and best practices.

CUDA Pro Tip: Improve NVIDIA Visual Profiler Loading of Large Profiles

Some applications launch many tiny kernels, which makes them prone to very large nvprof timeline dumps (hundreds of megabytes or more), even for application runs of only a handful of seconds.

Such nvprof files may fail to load at all when you try to import them into the NVIDIA Visual Profiler (NVVP). One symptom is that when you click “Finish” on the import screen, NVVP “thinks” for a minute or so and then returns to the import screen, asking you to click Finish again. In other cases, NVVP can spend many hours “thinking” about a large file.

This problem is caused by the Java maximum heap size setting specified in the libnvvp/nvvp.ini file of the CUDA Toolkit installation: the profiler configures the Java VM to cap the heap size at 1GB so that it works even on systems with minimal physical memory. While this 1GB value is already an improvement over the 512MB setting used in earlier CUDA versions, it is still not enough for some applications, because the profiler's memory footprint can be at least four to five times the input file size.
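To make the arithmetic concrete, here is a minimal sketch that estimates how much heap a given profile might need. It assumes the "four to five times" footprint figure above, taking the upper value of 5 as a default, and formats the result as a standard JVM `-Xmx` option. The function names, the default factor, and the 1GB floor are illustrative assumptions, not part of the CUDA Toolkit.

```python
import math

MB = 1024 * 1024


def suggested_heap_mb(profile_bytes: int,
                      footprint_factor: float = 5.0,
                      floor_mb: int = 1024) -> int:
    """Estimate a JVM max heap size in MB for loading a profile.

    footprint_factor is an assumption based on the rough
    "four to five times the input file size" estimate; floor_mb
    keeps the result from dropping below the 1GB default.
    """
    needed_mb = math.ceil(profile_bytes * footprint_factor / MB)
    return max(floor_mb, needed_mb)


def xmx_flag(heap_mb: int) -> str:
    """Format a heap size as the standard JVM max-heap option."""
    return f"-Xmx{heap_mb}m"


if __name__ == "__main__":
    # Hypothetical 300MB nvprof output file.
    profile_size = 300 * MB
    print(xmx_flag(suggested_heap_mb(profile_size)))
```

For a 300MB profile this estimate comes out above the 1GB cap, which is consistent with the load failures described above. Any value chosen this way should still fit comfortably within the machine's physical memory.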
