Easy Multi-GPU Deep Learning with DIGITS 2

DIGITS is an interactive deep learning development tool for data scientists and researchers, designed for rapid development and deployment of an optimized deep neural network. NVIDIA introduced DIGITS in March 2015, and today we are excited to announce the release of DIGITS 2, which includes automatic multi-GPU scaling. Whether you are developing an optimized neural network for a single data set or training multiple networks on many data sets, DIGITS 2 makes it easier and faster to develop optimized networks in parallel with multiple GPUs.

Deep learning uses deep neural networks (DNNs) and large datasets to teach computers to detect recognizable concepts in data, translate or understand natural languages, interpret information from input data, and more. Deep learning is being used in the research community and in industry to help solve many big data problems such as similarity searching, object detection, and localization. Practical examples include vehicle, pedestrian and landmark identification for driver assistance; image recognition; speech recognition; natural language processing; neural machine translation; and mitosis detection.

This is a short sample clip promoting a 7-minute introduction to the DIGITS 2 deep learning training system. Watch the full-length video.

DNN Development and Deployment with DIGITS

Developing an optimized DNN is an iterative process. A data scientist may start from a popular network configuration such as “AlexNet” or create a custom network, and then iteratively modify it into a network that is well-suited for the training data. Once they have developed an effective network, data scientists can deploy it and use it on a variety of platforms, including servers or desktop computers as well as mobile and embedded devices such as Jetson TK1 or Drive PX. Figure 1 shows the overall process, broken down into two main phases: development and deployment.

Figure 1: Deep Learning Neural Network Development and Deployment Workflow Process

DIGITS makes it easy to rapidly develop an optimized DNN, by providing interactive adjustments for the network parameters needed to develop and train the best DNN for your dataset. With DIGITS it is easy to create new datasets and select them for training; during the DNN development process, DIGITS lets you append new data to a dataset or inflate the data to account for variations in object orientation or other distortions that may occur in the model’s deployed environment.
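As a rough illustration of what data inflation means, the following minimal sketch (using the Pillow imaging library; file names are placeholders, and DIGITS performs this kind of augmentation for you) writes out rotated and mirrored variants of a single training image:

from PIL import Image

# Generate a few orientation variants of one training image
# (placeholder file names; DIGITS handles this step internally).
img = Image.open("example.jpg")
variants = [img.rotate(angle, expand=True) for angle in (90, 180, 270)]
variants.append(img.transpose(Image.FLIP_LEFT_RIGHT))
for i, variant in enumerate(variants):
    variant.save("example_aug_%d.jpg" % i)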

Solver parameters such as learning rate and policy are also easy to adjust, and the batch size and cadence of the accuracy test can be quickly modified. DIGITS provides the flexibility to train with a standard network, to modify or fine tune an existing network, or to create a custom network from scratch. Once configuration is complete, you’re ready to start training. While training, DIGITS displays the accuracy of the network, helping you make decisions about its performance in real time, and if needed, terminate the training and reconfigure the network parameters.

Once you have developed an effective network, DIGITS can bundle all of the network files into a single download. This makes it easy to deploy an optimized network to any device. If there are misclassifications or a new category needs to be added in the future, it is easy to adjust the network, retrain and then redeploy it.

Let’s take a look at the new features in DIGITS 2.

Train Networks Faster with Multiple GPUs

DIGITS 2 enables automatic multi-GPU scaling. With just a few clicks, you can select multiple GPUs. As datasets get larger, training with more GPUs allows networks to ingest more data in less time, saving precious time during the development process. This easy-to-use feature is visible near the bottom of the New Image Classification Model page, shown in Figure 2.

Figure 2: DIGITS Model Creation Interface

DIGITS can train multiple networks on the same data set in parallel, or train the same network on multiple datasets in parallel. With the GPU selection option, you can select the GPUs to use for training each data set, making it easier to multi-task with your hardware.
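Under the hood, the training itself is performed by Caffe, so selecting several GPUs in DIGITS is conceptually similar to passing a list of device IDs to a multi-GPU-capable Caffe build. The sketch below is only an illustration of that idea, not DIGITS' internal code:

import subprocess

# Illustration only: launch a Caffe training run across four GPUs.
# Assumes a multi-GPU-capable Caffe build and an existing solver.prototxt.
subprocess.check_call([
    "caffe", "train",
    "--solver=solver.prototxt",
    "--gpu=0,1,2,3",   # comma-separated list of GPU device IDs
])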

Figure 3 shows how using multiple GPUs can reduce training time. The graph plots the speedup for training GoogLeNet on 1, 2 and 4 GPUs with a batch size of 128. These results were obtained with a DIGITS DevBox using GeForce TITAN X GPUs and the Caffe framework.

Figure 3: Training Speedup Achieved with DIGITS on Multiple GeForce TITAN X GPUs in a DIGITS DevBox. These results were obtained with the Caffe framework and a batch size of 128.

New Solvers

DIGITS 2 adds two new solvers: adaptive gradient descent (ADAGRAD) and Nesterov’s accelerated gradient descent (NESTEROV). These are selectable along with standard stochastic gradient descent from the Solver Type drop-down menu on the left hand side of the New Image Classification Model window.

The Solver Options pane, shown in Figure 2, lets you configure the snapshot interval, validation interval, batch size, and learning rate policies for the solver.
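As a hedged sketch of how these options map onto the underlying Caffe solver definition (this is illustrative pycaffe code, not what DIGITS runs internally; field names follow caffe.proto):

from caffe.proto import caffe_pb2

# Build a solver definition similar to what the Solver Options pane configures.
solver = caffe_pb2.SolverParameter()
solver.net = "train_val.prototxt"                         # network definition
solver.solver_type = caffe_pb2.SolverParameter.NESTEROV   # or ADAGRAD, SGD
solver.base_lr = 0.01                                     # initial learning rate
solver.lr_policy = "step"                                 # learning rate policy
solver.stepsize = 10000                                   # iterations between LR drops
solver.gamma = 0.1                                        # LR multiplier at each drop
solver.test_interval = 1000                               # validation cadence (iterations)
solver.test_iter.append(100)                              # validation batches per pass
solver.snapshot = 5000                                    # snapshot interval (iterations)
solver.max_iter = 45000
# Note: batch size is set on the network's data layers, not in the solver.

with open("solver.prototxt", "w") as f:
    f.write(str(solver))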

GoogLeNet Standard Network

Figure 4: Example Network Configuration for LeNet

GoogLeNet won the classification and detection challenges in the 2014 ImageNet LSVRC competition. This standard network is listed with the two others, LeNet and AlexNet, in the Standard Networks pane shown in Figure 2. Some users like to begin their network optimization process with a standard network, and then customize it based on results. Like LeNet and AlexNet, GoogLeNet is a great starting place for developing the optimum DNN for a data set.

The Custom Network edit box (Figure 2) has settings for the layers, activation function (ReLU, TANH, or sigmoid), and bias value. Selecting the Visualization button on the Custom Network tab is a quick and easy way to view modifications before training. Figure 4 shows the visualization of the standard LeNet network.
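If you would like to see what such a definition looks like in code, here is a minimal sketch using pycaffe's NetSpec API. It is only an illustration of a small custom network with a ReLU activation, not the exact prototxt DIGITS generates:

import caffe
from caffe import layers as L, params as P

def small_custom_net(lmdb_path, batch_size):
    # A tiny network: data -> convolution -> ReLU -> fully connected -> softmax loss.
    n = caffe.NetSpec()
    n.data, n.label = L.Data(batch_size=batch_size, backend=P.Data.LMDB,
                             source=lmdb_path,
                             transform_param=dict(scale=1.0 / 255), ntop=2)
    n.conv1 = L.Convolution(n.data, kernel_size=5, num_output=20,
                            weight_filler=dict(type="xavier"))
    n.relu1 = L.ReLU(n.conv1, in_place=True)
    n.ip1 = L.InnerProduct(n.conv1, num_output=10,
                           weight_filler=dict(type="xavier"))
    n.loss = L.SoftmaxWithLoss(n.ip1, n.label)
    return n.to_proto()

with open("custom_train_val.prototxt", "w") as f:
    f.write(str(small_custom_net("path/to/train_lmdb", 64)))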

Improved Visualization and Monitoring

During training, DIGITS 2 now shows the utilization of all GPUs in use in the training window, as Figure 5 shows. The utilization, memory, and temperature of each GPU are posted in the training window next to the network performance plot. This allows you to monitor GPU usage in real time during training, even without direct access to the host machine DIGITS is running on. You can easily halt training if you find that the GPUs are under-utilized, and go back to the New Image Classification Model window and adjust network parameters such as batch size.

Figure 5: Example training performance and GPU utilization information shown while training.
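If you do have shell access and want the same numbers outside of DIGITS, a small polling script like the sketch below (which uses nvidia-smi's query interface and is not part of DIGITS) reports per-GPU utilization, memory, and temperature:

import subprocess
import time

# Poll the same per-GPU metrics DIGITS displays: utilization, memory, temperature.
QUERY = ["nvidia-smi",
         "--query-gpu=index,utilization.gpu,memory.used,temperature.gpu",
         "--format=csv,noheader"]

while True:
    print(subprocess.check_output(QUERY).decode().strip())
    time.sleep(5)  # sample every few seconds while training runs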

Classification During Training

DIGITS can quickly perform classification during the training process with the Classify One Image button at the bottom of the training window (Figure 5). With Show Visualizations and Statistics selected, DIGITS plots the weights and responses of the network from the input image. Figure 6 shows example output from the first layer. In addition to the network responses, DIGITS now plots statistical information alongside each layer parameter, including the frequency, mean, and standard deviation. This helps you understand the overall response of the network from the input image. The classification results displayed along the left hand side show the input image and response from the first convolutional layer including the weights, activations, and statistical information.

Figure 6: Example DIGITS 2 classification results. DIGITS 2 presents statistical information for each layer.
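Similar per-layer statistics can be computed by hand with pycaffe and NumPy; the hedged sketch below (placeholder file names) prints a mean and standard deviation for each blob and builds a frequency histogram like the ones DIGITS plots:

import numpy as np
import caffe

# Load a trained network (placeholder file names) and inspect its responses.
net = caffe.Net("deploy.prototxt", "network_snapshot.caffemodel", caffe.TEST)
# ... fill net.blobs["data"] with a preprocessed image and call net.forward() ...

for name, blob in net.blobs.items():
    values = blob.data.flatten()
    hist, edges = np.histogram(values, bins=50)   # frequency distribution
    print("%-10s mean=%.4f std=%.4f" % (name, values.mean(), values.std()))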

Deploying with DIGITS

It’s easy to download a trained network and deploy it on another system. Use the Download button near the bottom of the trained model window to get a copy of all the network files needed to deploy a model to new hardware. There are two new example scripts provided with DIGITS 2 under ${DIGITS_ROOT}/examples/classification. One works directly with the .tar.gz file downloaded from DIGITS, and the other lets you specify the network files to use. Example commands for these scripts are shown below.

./use_archive.py DIGITS_Network_files.tar.gz path/to/image.jpg

./example.py network_snapshot.caffemodel deploy.prototxt image.jpg -l labels.txt -m mean.npy
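For reference, the second script's behavior is roughly equivalent to the following pycaffe sketch. This is illustrative code, not the script's actual implementation, and it assumes the mean file is stored channel-first:

import numpy as np
import caffe

# Load the network definition and trained weights downloaded from DIGITS.
net = caffe.Net("deploy.prototxt", "network_snapshot.caffemodel", caffe.TEST)

# Preprocess the input the same way the training data was prepared.
transformer = caffe.io.Transformer({"data": net.blobs["data"].data.shape})
transformer.set_transpose("data", (2, 0, 1))                       # HWC -> CHW
transformer.set_mean("data", np.load("mean.npy").mean(1).mean(1))  # per-channel mean
transformer.set_raw_scale("data", 255)                             # [0,1] -> [0,255]
transformer.set_channel_swap("data", (2, 1, 0))                    # RGB -> BGR

image = caffe.io.load_image("image.jpg")
net.blobs["data"].data[...] = transformer.preprocess("data", image)
output = net.forward()

# Report the top five predicted labels.
labels = [line.strip() for line in open("labels.txt")]
scores = output[list(output.keys())[0]][0]
for i in scores.argsort()[::-1][:5]:
    print("%5.2f%%  %s" % (100 * scores[i], labels[i]))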

Get Started with DIGITS 2 Today

A preview build of DIGITS 2 that includes all capabilities described in this post is available today. Convenient installation packages are available on the NVIDIA developer website.

The full list of features and changes is in the DIGITS release notes. To learn more about DIGITS, sign up for the upcoming webinar, “Introducing DIGITS 2”. You can also read my post about the first DIGITS release.

To access the DIGITS source code or contribute to the DIGITS project, please visit the open source repository on GitHub. For questions, feedback and interaction with the DIGITS user community, please visit the DIGITS mailing list or email digits-users@googlegroups.com.

34 Comments
  • Gonzalo Vaca

    Thanks for the post.
    We just got the DIGITS box this week and it comes with DIGITS v1 pre-installed. What is the best procedure to update to DIGITS v2?

  • Alberto

    I am a researcher running many different models. Is there a way to script the model set-up (maybe through a command line interface, a la DIGITS 1)?

    • Allison Gray

      Are you interested in quickly visualizing each network’s performance? It is easy to toggle between network results via the main console, and it is relatively easy to launch a training on each of your GPUs or multiple ones via the web interface. If you don’t care about the visualization part you could just write a shell script that will launch all of your trainings for you, assigning them to the GPUs.

      • Alberto

        Thanks for your quick reply Allison! I am interested in the latter. Is there documentation that I can follow to set this up? The downside of a beautiful GUI is that one loses contact with the commands.

        • Allison Gray

          I just reviewed our API commands and I don’t see anything for training with DIGITS this way, https://github.com/NVIDIA/DIGITS/blob/digits-2.0/docs/API.md

          If you just want to launch trainings you can use Caffe directly

          ./path/to/caffe/build/tools/caffe train --gpu=0 --solver=solver.prototxt

          If you downloaded DIGITS with the web installer (https://developer.nvidia.com/digits), Caffe will likely be in path/to/digits-1.0/caffe.

          I just realized you mention “model setup” above. Do you also want to create a variety of different NN configurations via the command line?

          • Alberto

            Hi again,

            Yes, basically I want to create many different kinds of models using the command line. I suppose I can do that via Caffe; I just assumed there was something in DIGITS (performance-related) that was not part of the master branch of Caffe.

            If this is indeed not the case, is there still a way to load trained Caffe models into DIGITS and visualise them, or do I need to create them via the DIGITS GUI for that? I just find it very time-consuming to have to navigate through the buttons and menus to set up multiple models.

            Thanks,

          • Allison Gray

            I have not seen an easy way to do that with DIGITS.

            I have heard of folks using tools (or creating them themselves) for this but I don’t recall their names now. I tried to do a quick search on github but didn’t find anything.

            I stumbled across this tool, https://github.com/Chasvortex/caffe-gui-tool, but it doesn’t seem to be the kind of network generator it sounds like you are looking for.

            If I were going to do this myself, I would create it in python (just because it is a language I am comfortable with), allowing for patching layer components together and generating/saving the new train_val.prototxt files.

            Have you tried posting this question on one of the framework user groups?

            Yes, you can create a model outside of DIGITS and then use it for classification. There are some instructions here – https://github.com/NVIDIA/DIGITS/issues/49. This allows you to load a pretrained network, but it won’t show you the change in accuracy and loss as a function of epoch.

  • J Ehnes

    How do you extend Digits (1 or 2) to support speech recognition tasks?

    In the standard setup only image datasets can be imported, no audio datasets.

    Also all models you can create are for image classification only.

    How could you add functionality to do something as described in “Deep Speech: Accurate Speech Recognition with GPU-Accelerated Deep Learning” in this web server based platform with “Improved Visualization and Monitoring” and “classification during training” for sound files etc.?

    • Allison Gray

      If you can insert your data as a linear array such as [1 x N], you can upload it into DIGITS. It will treat the input data like an image with a single line of pixels. Right now DIGITS supports classification; would you like to perform classification with your audio data?
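      As a rough illustration of that idea (with hypothetical file names), something like the following would turn a clip into a one-pixel-tall grayscale image that DIGITS can ingest:

      import numpy as np
      from PIL import Image

      # Rescale raw audio samples to 0-255 and save them as a [1 x N] grayscale image.
      samples = np.load("clip.npy")   # 1-D array of audio samples (hypothetical file)
      row = np.interp(samples, (samples.min(), samples.max()), (0, 255)).astype(np.uint8)
      Image.fromarray(row.reshape(1, -1)).save("clip.png")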

  • Rick Dell

    Any plans of having a Windows version of Digits in the near future?

    • Allison Gray

      It looks like there has been some activity from users on this: https://github.com/NVIDIA/DIGITS/pull/199

  • pato lobos

    NVIDIA, please release a Windows version! You have a large base of Windows users (the gaming community, etc.), so I think it is the natural evolution of ML to take this market. I myself have a Windows Server 2012 machine with 2 Maxwell 980 GPUs waiting for this… and if you really launch it, I will swap those 2 for 3 TITAN Xs…. Your move, NVIDIA :)

  • Luciano

    I am trying to run it on a virtual machine but I am receiving the error message: “CUDA driver version is insufficient for CUDA runtime version”. Is it possible to run on VM?

    • Allison Gray

      You should be able to run DIGITS on a VM. What OS and driver are you using? Are you able to run any of the CUDA samples?

      • JP

        I’m getting a similar problem (Ubuntu 14.04 on a VM), when I run ./runme.sh I get “cudaRuntimeGetVersion() failed with error #35”, the server loads and the interface is reachable but I’m assuming DIGITS doesn’t get GPU assistance here. When I try to install the Linux drivers on my virtual machine (Ubuntu) I get an error saying it does not recognize a compatible card. I have a GeForce GTX 460. Any ideas? Thanks!

        • Allison Gray

          I haven’t tried using a GeForce card for pass-through on a VM. I assume you are able to run nvidia-smi and any CUDA sample program, is this correct?

          • JP

            When I run nvidia-smi I get “NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running”. And if I try installing the drivers I downloaded from the NVIDIA website (Linux 64-bit) I also get an error saying it couldn’t find a CUDA-compatible card…. My card is CUDA compliant according to the NVIDIA website, as far as I understand. I much appreciate your help!

          • Allison Gray

            Do you know if your VM can see the GPU? If you run “lspci | grep NVIDIA” do you see any NVIDIA devices?

            How did you install your NVIDIA driver? Are you using the CUDA .deb file from https://developer.nvidia.com/cuda-downloads?

          • JP

            I don’t see an NVIDIA device after running that command. I’m using VMware Workstation with an i7 processor. And I tried installing the NVIDIA driver with “sh “

          • Allison Gray

            I suspect you effectively configured your VM with the GPU (https://blogs.vmware.com/euc/2015/03/nvidia-grid-vgpu-vmware-horizon.html), right? I asked some of our GRID folks about using a GeForce card with VMware and found out that GeForce is not supported for pass-through. I think this is the root cause of the issue here. Sorry about this. Here is a link to supported GPUs with VMware: https://www.vmware.com/resources/compatibility/search.php?deviceCategory=vdga

      • Tong

        Hi Allison, thanks for your video. I already set up DIGITS 2.0 on my VirtualBox VM. Now I get the same error message when I reach the model step, and I am not using NVIDIA devices on my laptop. Is that the reason why? Thank you

        • Allison Gray

          Sounds like you are on the right track. You have to rebuild Caffe without the GPU. I have not had this issue before. After you rebuilt caffe, did you get any errors when you ran “make runtest”?

          Are you using the DIGITS version from our developer zone (https://developer.nvidia.com/digits) or did you get DIGITS from GitHub?

          • Tong

            Hi Allison, I think I registered as a member with NVIDIA and followed the download link to get my DIGITS 2.0. I also followed your other reply in the DIGITS Google user group, saying to enable CPU_ONLY in Makefile.config and then rebuild Caffe with “make clean” and “make all -j8”. After that I got the error message “fatal error: cnmem.h: no such file or directory”… I went into the lib folder, and under cnmem there is only a libcnmem.so. Does that mean I got DIGITS 2.0 from the wrong source? Thank you

          • Tong

            Now I have turned off the “USE_CNMEM=1” option in the Makefile.config file; it is running fine so far. Thanks a lot

  • roland j

    Hi!

    I’ve set up digits-2.0 and when I run ./runme.sh, I get:
    cudaRuntimeGetVersion() failed with error #30, which means “unknown error”.
    Did you forget to “make pycaffe”?
    Traceback (most recent call last):
    File "digits-devserver", line 39, in
    config.load_config('quiet')

    #system info:
    $ nvcc -V
    Cuda compilation tools, release 6.5, V6.5.16
    $ nvidia-smi
    Mon Nov 16 13:09:45 2015
    +------------------------------------------------------+
    | NVIDIA-SMI 346.96 Driver Version: 346.96 |
    ...
    $ lspci | grep VGA
    VGA compatible controller: NVIDIA Corporation GM107 [GeForce GTX 750] (rev a2)

    So, what might be the problem? Is DIGITS2 incompatible with CUDA 6.5?

    Kind regard,
    Roland

    PS: I updated my graphics driver from v343 to v346, as I got the error #35 before…

  • Songyue Qian

    Hi!
    I plan to use DIGITS for some experiments. It works fine on my laptop. However, when I try to install Ubuntu 14.04 on my office desktop, which has two graphics cards, it can’t be installed correctly. The error that pops up is “ACPI PCC Probe failed”. The solution I found says there is a compatibility issue between Ubuntu 14 and multiple display adapters. I know this is really a question about the Ubuntu OS, but have you met the same situation before? If so, how did you fix it? BTW, both display adapters are NVIDIA Quadro K620s.

    Thanks,

  • John

    Could you quickly explain the accuracy or give me a pointer to better understand it? Especially, is the accuracy for AlexNet Top-1 or Top-5?

    • Allison Gray

      Great question. Are you asking about the Top-1 and Top-5 accuracy from the ImageNet competition? If so, you can review the ImageNet website for information on how they applied this metric to competitors in 2012 when this network won, http://image-net.org/challenges/LSVRC/2012/index, or check out the ImageNet citation here too – http://arxiv.org/pdf/1409.0575v3.pdf. I think there is a brief discussion of this on page 15.

      • John

        Thanks for the reply! Will have a look at it.

        However, I was wondering about the accuracy/loss graph within DIGITS. Does the “Accuracy” for AlexNet represent the Top-1 accuracy? And is that for a random subset of the validation set or the whole validation set? Thanks!

  • Alex

    Good day! Is it possible to use the multiple GPUs option during the decision phase or only during the training phase? thanks

  • Michael Holm

    Will you please share a more detailed English explanation/tutorial/post on the meaning of the distributions under the “Show visualizations and statistics” option? I’m not asking for a lesson in statistics (I have a graduate degree in math), but there seems to be substantial explanation missing from this results page, as I’m not able to connect the dots between the pictures and the distributions.

    Thank you.