High-level GPU programming languages like CUDA C offer abstraction, convenience, and maintainability, but they inherently hide some details of how code executes on the hardware. It is sometimes helpful to dig into the underlying assembly code that the hardware is executing, either to explore performance problems or to make sure the compiler is generating the code you expect. Reading assembly language is tedious and challenging; thankfully, Nsight Visual Studio Edition can help by showing you the correlation between lines in your high-level source code and the executed assembly instructions.
As Mark Harris explained in the previous CUDA Pro Tip, two compilation stages are required before a kernel written in CUDA C can be executed on the GPU. The first stage compiles the high-level C code into PTX, the virtual GPU ISA. The second stage compiles PTX into the actual ISA of the hardware, called SASS (details of SASS can be found in the cuobjdump.pdf installed in the doc folder of the CUDA Toolkit). The hardware ISA generally differs between GPU architectures. To allow forward compatibility, the second compilation stage can be done either as part of the normal compilation using nvcc or at runtime using the JIT compiler integrated in the driver.
It is possible to manually extract the PTX or SASS from a cubin or executable using the cuobjdump tool included with the CUDA Toolkit. Nsight Visual Studio Edition makes it easier by showing the correlation between lines of CUDA C, PTX, and SASS.
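To make the manual route concrete, here is a minimal sketch of a kernel together with the command lines that would produce and dump its PTX and SASS. The file name `saxpy.cu`, the kernel itself, and the `sm_70` target are assumptions chosen for illustration and do not come from the article; substitute the architecture of your own GPU.

```cuda
// saxpy.cu: hypothetical example file (name, kernel, and sm_70 target are
// illustrative assumptions, not taken from the article).
//
// Stage 1 only (CUDA C -> PTX):
//   nvcc -ptx saxpy.cu -o saxpy.ptx
// Both stages for one architecture (CUDA C -> PTX -> SASS, stored in a cubin):
//   nvcc -cubin -arch=sm_70 saxpy.cu -o saxpy.cubin
// Executable with source line information for correlation tools:
//   nvcc -lineinfo -arch=sm_70 saxpy.cu -o saxpy
// Manual extraction with cuobjdump:
//   cuobjdump -sass saxpy.cubin    (disassemble the SASS)
//   cuobjdump -ptx  saxpy          (print the PTX embedded in the executable)

#include <cstdio>
#include <cuda_runtime.h>

// y = a * x + y, one element per thread.
__global__ void saxpy(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        y[i] = a * x[i] + y[i];
    }
}

int main()
{
    const int n = 1 << 20;
    float *x = nullptr;
    float *y = nullptr;

    // Managed memory keeps the host code short; explicit copies work as well.
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; ++i) {
        x[i] = 1.0f;
        y[i] = 2.0f;
    }

    saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("y[0] = %f\n", y[0]);
    cudaFree(x);
    cudaFree(y);
    return 0;
}
```

Comparing the `-ptx` output with the `-sass` dump for even a kernel this small shows how much the second stage rewrites the code, which is why a tool that lines up CUDA C, PTX, and SASS side by side saves time.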