Remote application development using NVIDIA® Nsight™ Eclipse Edition

NVIDIA® Nsight™ Eclipse Edition (NSEE) is a full-featured unified CPU+GPU integrated development environment (IDE) that lets you easily develop CUDA applications for either your local (x86_64) system or a remote (x86_64 or ARM) target system. In my last post on remote development of CUDA applications, I covered NSEE’s cross-compilation mode. In this post I will focus on using NSEE’s synchronized project mode.

For remote development of CUDA applications using synchronized-project mode, you can edit code on the host system and synchronize it with the target system. In this scenario, the code is compiled natively on the target system as Figure 1 shows.

CUDA application development usage scenarios with Nsight Eclipse Edition
Figure 1: CUDA application development usage scenarios with Nsight Eclipse Edition

In synchronized project mode the host system does not need an ARM cross-compilation tool chain, so you have the flexibility to use Mac OS X or any of the CUDA supported x86_64 Linux platforms as the host system. The remote target system can be a CUDA-supported x86_64 Linux target or an ARM-based platform like the Jetson TK1 system. I am using Mac OS X 10.8.5 on my host system (with Xcode 5.1.1 installed) and 64-bit Ubuntu 12.04 on my target system.

CUDA Toolkit Setup

To install the CUDA toolkit on the Mac OS X host system, first make sure the Xcode command line tools are installed on your system. Then download the latest 64-bit CUDA 6.5 package for Mac (I’m using cuda_6.5.14_mac_64.pkg) and double-click to install the package.

On the 64-bit Ubuntu 12.04 target system, download the latest 64-bit CUDA 6.5 installer for your Linux distribution (I’m using cuda-repo-ubuntu1204_6.5-14_amd64.deb). After downloading, update the repo and install the CUDA 6.5 toolkit as follows:

> sudo dpkg -i cuda-repo-ubuntu1204_6.5-14_amd64.deb
> sudo apt-get update
> sudo apt-get install cuda
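Once the packages are installed, it is worth a quick sanity check that nvcc landed where Nsight expects it. The sketch below assumes the default CUDA 6.5 install location; adjust CUDA_HOME if you installed elsewhere.

```shell
# Sanity check: confirm nvcc exists at the (assumed) default 6.5 location.
CUDA_HOME=${CUDA_HOME:-/usr/local/cuda-6.5}
if [ -x "$CUDA_HOME/bin/nvcc" ]; then
    NVCC_STATUS=found
    "$CUDA_HOME/bin/nvcc" --version
else
    NVCC_STATUS=missing
    echo "nvcc not found under $CUDA_HOME; check the toolkit install"
fi
echo "nvcc: $NVCC_STATUS"
```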

To synchronize CUDA projects between host and target systems, you need to configure git on both the host and the target systems using these commands.

> git config --global user.name <your_name>
> git config --global user.email <your_email>

That’s all for the setup. Please note that if you have a Jetson TK1 as your target system, the current Jetson TK1 OS image for L4T (Linux for Tegra) does not contain the latest CUDA 6.5 toolkit.  This support will be available in a future release (Rel21.2), but in the meantime you can use the CUDA 6.0 toolkit archive.  You can check the L4T version with the following command.

> head -1 /etc/nv_tegra_release

Importing a CUDA Sample

Let’s launch Nsight Eclipse Edition on the Mac OS X host system. You can find Nsight with a Finder search if the system has indexed it, navigate with the Finder to the /Developer/NVIDIA/CUDA-6.5/libnsight folder, or open a Terminal window and launch ./nsight from the /usr/local/cuda/bin folder. Click on File->New->CUDA C/C++ Project to launch the project creation wizard. Enter the project name “particles”, select “Import CUDA Sample” as the project type, and select “CUDA Toolkit 6.5” from the available tool chains.

Next, select the CUDA sample by applying “Simulations” as the samples filter type, which will populate a short list of available simulation samples. Select “Particles” and click Next. The remaining options in the wizard let you choose which GPU and CPU architectures to generate code for. First, we will choose the GPU code that the nvcc compiler should generate. Nsight defaults to the GPU architecture that it detects on the host system. On my Mac OS X system I have a GeForce GT 650M, so Nsight defaults PTX (virtual ISA) and GPU SASS code generation to SM 3.0. My target Linux system also has a Kepler GPU, so SM 3.0 is the correct architecture for Nsight to target. But if your target system has an older GPU, check the SM 1.1 or SM 2.0 PTX check box to generate PTX code that the CUDA driver will JIT (just-in-time) compile to the architecture on the target system.
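Under the hood, these wizard choices translate into nvcc -gencode flags. A hypothetical command line for the SM 3.0 selection described above (the file names are placeholders, not from the sample’s real build):

```shell
# Sketch: -gencode flags implied by the wizard's SM 3.0 choices.
# code=sm_30 emits SASS for Kepler; code=compute_30 embeds PTX that
# the driver can JIT for other architectures. File names are placeholders.
GENCODE="-gencode arch=compute_30,code=sm_30 -gencode arch=compute_30,code=compute_30"
echo "nvcc $GENCODE -o particles particleSystem_cuda.cu"
```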

Next, select the CPU architecture. Since my target is an x86_64 system, I can leave it as native or select “x86 (64-bit)” in the CPU architecture drop-down menu. Note that if you have an ARM target, you can also select ARM as the target CPU architecture. Click “Finish” and you should see the particles sample opened in Nsight and ready to use. Nsight will index all the headers when it opens the project for the first time, so let it complete this operation before creating a remote build on the target system.

Creating A Remote Synchronized-Project Build

In the Nsight project explorer select “particles”, then click on “File->Properties” to bring up the project settings for the particles project. In the properties UI click on “Build->Target Systems”, which shows the UI for selecting the remote connection settings. Click on “Manage…” and then click “Add” to enter the IP address (or hostname) and user name of the target system. When you click “Finish” you will see the entry for the new target system that you just added. Next, click on “Browse…” to choose a project path on the target system and click on “Manage…” to choose the toolkit path on the target system. In the dialog that pops up you can click on “Detect” to let Nsight auto-detect the installed toolkit path for you. Choose the target CPU architecture “x86 (64-bit)” and click “Apply” so you can also update the libraries next. The remote target system setup should look like Figure 2.

Remote target system setup with Nsight Eclipse Edition
Figure 2: Remote target system setup with Nsight Eclipse Edition

Based on your remote target architecture and remote OS, a couple of library settings need to be adjusted. First select [All configurations] to update library settings for all build targets. For a 64-bit Linux target, in the project properties click on “Settings->Tool Settings->NVCC Linker->Miscellaneous”, delete GLUT and -framework from the -Xlinker option, and change the libGLEW.a path in “Other objects” to point to the 64-bit Linux samples common lib: /usr/local/cuda-6.5/samples/common/lib/linux/x86_64/libGLEW.a (see Figure 3).

Figure 3: Target linker settings

Next, click on “Settings->Tool Settings->NVCC Linker->Libraries”, add glut in the Libraries section, and change the “Library search path” to the target Linux toolkit path: /usr/local/cuda-6.5/samples/lib/linux/x86_64 (see Figure 4).
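Taken together, these two changes amount to dropping the Mac OS X framework flags and linking against the target’s own GLUT plus the static libGLEW shipped with the Linux samples. A rough command-line equivalent (a sketch; only the paths come from the settings above, and the object files are placeholders):

```shell
# Rough equivalent of the adjusted link step on the Linux target.
# Paths follow the project settings described in the text; the *.o
# inputs stand in for the project's compiled objects.
SAMPLES=/usr/local/cuda-6.5/samples
GLEW="$SAMPLES/common/lib/linux/x86_64/libGLEW.a"
echo "nvcc -o particles *.o -L$SAMPLES/lib/linux/x86_64 -lglut $GLEW"
```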

Figure 4: Target settings for libGLEW.

That’s it for the target library path updates. Next, click OK to save the project settings, then click on the build “hammer” icon in the toolbar to drop down the build menu. You will see the target system entry there for debug and release builds. Choose the debug build; this will create a native build on the Linux target system from Nsight running on your Mac OS X host system.

Running Your Remote Application

Figure 5: The particles sample application running on the target system

Since your target system settings are already in place with remote build creation, running your application remotely is straightforward. In the Nsight project explorer left pane, click on the top-level Particles project. Then click on the Run icon in the toolbar to pull down the Run menu and select “Run As->Remote C/C++ Application”. Enter the password for the remote system if Nsight prompts for one. Nsight will launch the remote binary that it created on the target system and you will see the Particles application running on the target system’s display, as in Figure 5.
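Behind the scenes, Nsight launches the binary over ssh with the remote toolkit’s library directory prepended to LD_LIBRARY_PATH. Roughly like the following (a reconstruction, not Nsight’s verbatim command; the workspace path is a placeholder):

```shell
# Reconstruction of the remote launch Nsight performs over ssh.
# REMOTE_DIR is a placeholder for your synchronized project path.
REMOTE_DIR=/home/ubuntu/cuda-workspace/particles/Debug
LAUNCH="cd $REMOTE_DIR; export LD_LIBRARY_PATH=/usr/local/cuda-6.5/lib:\$LD_LIBRARY_PATH; ./particles"
echo "/bin/sh -c \"$LAUNCH\""
```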

Debugging Your Remote Application

The particles CUDA sample uses the C++ Thrust library. To avoid hitting breakpoints in the Thrust library, let’s open the file particleSystem_cuda.cu in the editor view and search for the collideD kernel. Press Function-F3 to open the kernel declaration in the file particles_kernel_impl.cuh and double-click on line #308 to set your first breakpoint.

__global__
void collideD(float4 *newVel,            // output: new velocity
              float4 *oldPos,            // input: sorted positions
              float4 *oldVel,            // input: sorted velocities
              uint   *gridParticleIndex, // input: sorted particle indices
              uint   *cellStart,
              uint   *cellEnd,
              uint    numParticles)
{
    uint index = __mul24(blockIdx.x, blockDim.x) + threadIdx.x;

    if (index >= numParticles) return;

    // read particle data from sorted arrays
    float3 pos = make_float3(FETCH(oldPos, index));
    float3 vel = make_float3(FETCH(oldVel, index));
    // ...
}

To debug the application, click on the “Debug” icon to the left of the “Run” icon and select “Debug As->Remote C/C++ Application”. When Nsight asks if it’s okay to switch to the “Debug Perspective”, select “Yes” and check the box to remember that choice. Because Nsight Eclipse Edition allows seamless debugging of both CPU and GPU code, it will stop at the first instruction executing on the CPU, which is the first line in the main function of particles.cpp. You can single-step a bit there to see the execution on the CPU and watch the variables and registers as they get updated. In the breakpoint tab on the top right, you can see the breakpoint set at line #308 of particles_kernel_impl.cuh.

You can now resume the application, which will run until it hits the breakpoint we set in the collideD kernel. Once at the first breakpoint, you can browse the CPU and GPU call stacks in the top-left pane. You can also view the variables, registers, and hardware state in the top-right pane. You will see that the target GK110 GPU is executing 208 of 256 total blocks, occupying all 13 SMs of the GK110 GPU.

You can also switch to disassembly view and watch the register values being updated by clicking on the “i->” icon to do GPU instruction-level single-stepping, as Figure 6 shows.

Nsight Eclipse Edition Debugger UI perspective, showing assembly code stepping.
Figure 6: Nsight Eclipse Edition Debugger UI perspective, showing assembly code stepping.

Please note that when debugging a remote application that uses OpenGL-CUDA interop like this one, do not use the remote desktop keyboard or mouse: the desktop will not be interactive during the debug session unless the target has multiple GPUs. If you have a GK110 (SM 3.5) or higher GPU, then interacting with a single GPU is possible by enabling software preemption when Nsight prompts you to switch to the debugger perspective. When you finish exploring the debug view, click on the red icon to stop the debugging session. Next, click on the C/C++ perspective icon on the right-hand side of the toolbar to switch back to the editor mode.

Profiling Your Remote Application

We need to create a release build before we launch the profiler. You can enable the -lineinfo option in the compile options to generate source-to-SASS instruction correlation information. To do this, first go to the project settings by right-clicking on the project in the left pane. Then navigate to Properties->Build->Settings->Tool Settings->Debugging, check the box that says “Generate line-number…”, and click Apply. Back in the main window, click on the build hammer drop-down menu to create a release build. Once the build is ready, profile your remote application by clicking on the “Profile” icon to the right of the Run icon. In the drop-down menu select “Profile As->Remote C/C++ Application”. Nsight will prompt you to select the binaries; choose the release binary so it runs on the target system.
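For reference, that checkbox maps to nvcc’s -lineinfo switch, which embeds the source-to-SASS line mapping without the optimization-disabling overhead of a -G debug build. A hypothetical release compile line (flags and file names are illustrative, not the project’s actual build command):

```shell
# -lineinfo adds source line correlation to an optimized release build;
# unlike -G it does not turn off device code optimization. Sketch only.
PROFILE_FLAGS="-lineinfo -O3"
echo "nvcc $PROFILE_FLAGS -gencode arch=compute_30,code=sm_30 -o particles particleSystem_cuda.cu"
```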

Unlike debugger runs (with GPU compute capability 3.5 and lower), during profiler runs you can use your keyboard or mouse to interact with the active desktop on the target system. So use the mouse to close the particles application on the target. Once the application terminates, within seconds Nsight on the host system will process all the gathered records from the run and display the timeline in the “Profiler Perspective” view, as in Figure 7.

Figure 7: Nsight Eclipse Edition Profiler UI perspective

You can roll the mouse over the timeline to see all the properties of the API calls and the CUDA kernels in the property panel to the right. This application launches many kernels, so we will continue to focus on the collideD kernel. In the lower pane you will see the Analysis tab buttons “Examine GPU usage” and “Examine individual kernels”. Click on the latter, which will cause the application to run again. Close the application on the target system again and you will see all the performance-critical kernels ranked in order of importance in the right pane (see Figure 7). The higher the rank, the more bang for the buck you will get from tuning that kernel’s performance.

Let’s select the collideD kernel and then click on the “unguided analysis” icon under the “Analysis” tab. Then scroll down to click on “Kernel Profile”, which will analyze the source code of the collideD kernel and map it to the executed instructions. Once the analysis finishes, the pane on the right shows the kernel name. Double-click on collideD and Nsight will bring up the source-to-SASS assembly instructions view, which shows all the hot spots at the instruction level. The kernel profile, shown in Figure 8, provides the execution count, inactive threads, and predicated threads for each source and assembly line of the kernel. Using this information you can pinpoint portions of your kernel that use compute resources inefficiently due to divergence or predication.

Figure 8: Nsight Eclipse Edition source to SASS code correlation

As you can see, using NVIDIA® Nsight™ Eclipse Edition for remote development in “synchronized-project” mode is as simple as remote development in the “cross compilation” mode described in my earlier post. So go ahead and give it a shot by downloading the recently announced CUDA 6.5 toolkit. Use cross-compilation mode if you have an ARM target and want faster compilation on your x86 Ubuntu host system, or use the remote synchronized-project mode if you want to use other Linux distributions or Mac OS X as the host system. Check out the CUDA documentation for Getting Started Guides on CUDA toolkit installation and Nsight, or read more about CUDA 6.5 here.


About Satish Salian

Satish Salian
Satish Salian is a Sr. Software Engineering Manager at NVIDIA responsible for the software stack and developer experience of the world’s fastest deskside deep learning machine, the DIGITS DevBox. Satish has over 13 years of experience at NVIDIA; prior projects include building CUDA developer tools, display control UI tools, and SDKs at NVIDIA. He has a Bachelor’s degree in Computer Engineering from the University of Pune, India.
  • Damir

    I followed the above instructions up to trying to run the particles example.

    Here is what I get:

    Last login: Sun Aug 31 21:22:47 2014 from

    echo $PWD'>'

    /bin/sh -c "cd "/home/ubuntu/cuda-wrokspace";export LD_LIBRARY_PATH="/usr/local/cuda-6.0/lib":${LD_LIBRARY_PATH};"/home/ubuntu/cuda-wrokspace/particles"";exit

    ubuntu@tegra-ubuntu:~$ echo $PWD'>'


    ubuntu@tegra-ubuntu:~$ /bin/sh -c "cd "/home/ubuntu/cuda-wrokspace";export LD_LIBRARY_PATH="/usr/local/cuda-6.0/lib":${LD_LIBRARY_PATH};"/home/ubuntu/cuda-wrokspace/particles"";exit

    /bin/sh: 1: /home/ubuntu/cuda-wrokspace/particles: Permission denied


    Do you have any idea what is the source of this problem?

    • Satish

      Sorry about the late response. Nsight 6.0 had a flaky connection bug that may cause such file permission issues on the Jetson TK1. Nsight 6.5 has this bug fixed. Since your target is a Jetson TK1, you are doing the right thing by continuing to use the 6.0 toolkit. To work around this issue, you may want to try updating the file permissions manually on the target by enabling execute and write permissions using “chmod 777 particles”.

  • http://dysco.imtlucca.it/sopasakis Pantelis Sopasakis

    Thanks a lot for the excellent tutorial. In my case the host is Mac OS X and the remote target is a Jetson TK1, and I managed to successfully compile and run some CUDA projects. I would just like to add that it was necessary to change the (remote) compiler’s path to /usr/bin/arm-linux-gnueabihf-g++-4.8 (the default was g++ version 4.6 for the ARM architecture, while my target had version 4.8 already installed). To do so, go to “Project name>Properties>Build>Settings>Build stages>Compiler path” and do the same at “Project name>Properties>Build>Settings>NVCC Linker>Miscellaneous>Compiler path” (see the attached figures).

    • Satish

      Excellent! Great to know that you were successful in creating CUDA applications for the Jetson TK1 using your Mac OS X host system. For Jetson TK1 targets with default g++-4.8, yes, your proposed change in Nsight Eclipse is necessary. One could also create a g++-4.6 symlink on the target as follows: sudo ln -sf `which arm-linux-gnueabihf-g++` /usr/bin/arm-linux-gnueabihf-g++-4.6

  • peepo

    bug #1566745
    MacOSX host Jetson target builds and runs particles fine, however:

    Debugging Your Remote Application
    set break point then
    “Debug As->Remote C/C++ Application”

    error message:

    Error in final launch sequence
    Failed to execute MI command:
    -target-select remote
    Error message from debugger back end:
    cuda-gdb version (6.5.121) is not compatible with cuda-gdbserver version (6.0.116). Please use the same version of cuda-gdb and cuda-gdbserver.

    NB “Debug Perspective” dialogue not raised

    please note on Jetson target:

    /usr/local/cuda-6.0/bin$ ./cuda-gdbserver --version
    NVIDIA (R) CUDA gdbserver
    6.5 release

    /usr/local/cuda-6.0/bin$ ./cuda-gdb -v
    NVIDIA (R) CUDA Debugger
    6.0 release

    • peepo

      seems for Jetson one must install CUDA 6.0 toolkit on MacOSX

      • Satish

        That’s right. If you are using the Jetson TK1 as a target device, please continue to use the 6.0 toolkit as mentioned in the CUDA Toolkit Setup section. The CUDA 6.5 toolkit will be available in a future Jetson TK1 OS image (Rel21.2). You can check your Jetson TK1 release as follows:

        > head -1 /etc/nv_tegra_release

  • peepo

    add “/usr/local/cuda-6.0/samples/common/inc”
    to Properties Settings NVCC compiler Includes

    wfm cuda-6.0 on OSX with same on Jetson

    not needed for some reason on cuda 6.5 but see my note below

  • peepo

    can step through CPU code, but having set breakpoint and clicked resume, Registers Value are all Error: Target not available and Disassembly No debug context

    please advise further


    OSX host Jetson target

  • Scott Jordan

    I followed this sample using a 64-bit Linux host and the Jetson TK1 remote. It worked just fine, but the remote application only runs at 6-7 frames per second. The one built on the Jetson from the CUDA samples directory runs at 60 fps. I was wondering what caused the slowdown and how to fix it?

    • Satish

      Great, good to know you are running code on your Jetson TK1. On the perf issue, note that the Jetson TK1 has a Kepler-class GPU, so make sure you check SM32 (3.2) in the “Generate GPU code” option under Project>Build>Settings>CUDA.

      • Scott Jordan

        I set the GPU code to 3.2 and the PTX code to 3.0 and still didn’t see a performance increase.

        • Satish

          Make sure you are not running in the debugger and that the “Enable CUDA memcheck” box is unchecked under debug configurations->Debugger tab.

          • Scott Jordan

            Ok it worked. I just needed to run it from the release build.

  • Cosmin

    Is it possible to have Nsight index also header files on the remote machine? For example, I have project which uses the OpenCV library, which is installed only on the Jetson TK1 system. Locally I don’t have an OpenCV installation…

    • Satish

      Nope, you can only index project files on the host system. In sync project mode, files you maintain on the host will get sync’d with the target.

      • Cosmin

        I’ve read that newer Eclipse versions provide this functionality. AFAIK, Nsight is currently based on Eclipse Juno.
        Are there any plans to upgrade to a newer version of Eclipse?

        • Satish

          Yeah sometime later 2015 to 4.4 Luna.

  • Maxim Semenov

    I’m using a freshly installed JetPack (CUDA 6.5) and trying to remotely debug the particles example (Jetson target). I’m able to build and run the example, but in the debugger, kernel breakpoints are not hit. It stops once in main, and after I click resume I can see that the application starts on the target (the application window opens), but it never stops in the kernel and nothing is going on in the application window.

    Any suggestions?

  • Raj

    While running, it shows
    ubuntu@tegra-ubuntu:~$ echo $PWD'>'
    ubuntu@tegra-ubuntu:~$ /bin/sh -c "cd "/home/ubuntu/NSIGHT_WORKSPACE/one/Debug";export LD_LIBRARY_PATH="/usr/local/cuda-6.5/lib":${LD_LIBRARY_PATH};"/home/ubuntu/NSIGHT_WORKSPACE/one/Debug/one"";exit
    /bin/sh: 1: cd: can't cd to /home/ubuntu/NSIGHT_WORKSPACE/one/Debug
    /bin/sh: 1: /home/ubuntu/NSIGHT_WORKSPACE/one/Debug/one: not found
    then logout in the console.
    I am using Nsight 6.5 with L4T R21.3.

    Please respond asap

    • Satish

      You seem to be missing your debug binary named “one”. Please make sure you have built the debug binary and it exists prior to launching it.

      • Raj

        Yes, it exists in Nsight (host PC, Ubuntu 14) after the build, but it couldn't upload to the Jetson kit. No directories are created under NSIGHT_WORKSPACE/. !!

        • Satish

          Make sure you are able to access ssh into JetsonTK1 from your host. Also recreate the connection in NsightEE.

  • Wilson


    I want to develop a program with OpenCV on the Jetson TK1 by creating a new sync-mode project. After trying several methods, I still fail to run my program on the Jetson TK1 and have no idea how to resolve the problems. I wrote my configuration steps in a file, download link:


    Hope anyone who knows how to run “Jetson TK1 Nsight program development+opencv” can tell me how to resolve the problems I met


    • Satish

      Your issue is unrelated to OpenCV. From your Dropbox description, you seem to be building the project on the host system instead of the target Jetson TK1. There is a build menu for the target system that appears when you click on the build hammer (reverse triangle); please select that to build on the target system. Read my note under Figure 4: “click on the build “hammer” icon in the toolbar to drop down the build menu. You will see the target system entry there for debug and release builds. Choose the debug build; this will create a native build on the Linux target system from Nsight running on your Mac OS X host system.”

      • Wilson

        Firstly thanks for your reply very much. I followed the steps you mentioned, and it worked! REALLY thank you very much!!! Pretty helpful!

  • lee

    Hi all
    With all the new updates that have happened since July, would it be possible to update some of the how-to instructions for newbies like myself, especially for the Mac setups? It seems there are some differences in the versions that the Jetson and Mac downloads use, especially for the CUDA toolkits. Almost everything has switched to 6.5.