Some applications launch many tiny kernels, making them prone to very large (100s of megabytes or larger) nvprof timeline dumps, even for application runs of only a handful of seconds.
Such nvprof files may fail to even load when you try to import them into the NVIDIA Visual Profiler (NVVP). One symptom of this problem is that when you click “Finish” on the import screen, NVVP “thinks” for a minute or so, but then just goes right back to the import screen asking you to click Finish again. In other cases, attempting to load a large file can result in NVVP “thinking” about it for many hours.
It turns out that this problem is because of the Java max heap size setting specified in the libnvvp/nvvp.ini file of the CUDA Toolkit installation: the profiler configures the Java VM to cap the heap size at 1GB in order to work even on systems with minimal physical memory. While this 1GB value is already an improvement over the 512MB setting used in earlier CUDA versions, it is still not enough for some applications, considering that the memory footprint of the profiler can be at least four to five times larger than the input file size.