Image recognition and GPUs go hand-in-hand, particularly when using deep neural networks (DNNs). The strength of GPU-based DNNs for image recognition has been unequivocally demonstrated by their success over the past few years in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), and DNNs have recently achieved classification accuracy on par with trained humans, as Figure 1 shows. The new Low-Power Image Recognition Challenge (LPIRC) highlights the importance of image recognition on mobile and embedded devices.
DNNs with convolutional layers are biologically inspired artificial neural networks. These networks may have five or more layers with many neurons in each layer. Links similar to synapses connect the layers, forwarding information to the next layer. The training process adjusts weights on the links, improving the network’s ability to classify the information presented to it. In general, the more data used to train a DNN, the better its classification performance. This big data requirement has resulted in heavy GPU use, because GPUs are designed for high throughput on highly parallel computations like those used in deep learning.
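To make the idea of training as weight adjustment concrete, here is a minimal sketch in plain Python. It trains a single artificial neuron, not a full DNN, and everything in it (the function names, the toy logical-OR data, the learning rate and epoch count) is an illustrative assumption rather than anything taken from the ILSVRC entries.

```python
import math

def sigmoid(x):
    """Squash a weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def predict(weights, bias, inputs):
    """Weighted sum of the inputs followed by a sigmoid activation."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(total)

def train(samples, epochs=2000, learning_rate=0.5):
    """Fit one neuron to (inputs, label) pairs with plain gradient descent."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for inputs, label in samples:
            error = predict(weights, bias, inputs) - label
            # Nudge each weight against the gradient of the cross-entropy loss.
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias -= learning_rate * error
    return weights, bias

# Hypothetical toy data: the logical OR of two binary inputs.
OR_SAMPLES = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]

weights, bias = train(OR_SAMPLES)
for inputs, label in OR_SAMPLES:
    print(inputs, label, round(predict(weights, bias, inputs)))
```

A real DNN repeats this kind of update across many neurons and layers, which is the highly parallel arithmetic that GPUs handle well.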
ImageNet is a great resource for imagery, hosting a large database of images organized according to a hierarchy of descriptive nouns. Each year, ImageNet hosts the ILSVRC, for which entrants develop algorithms for accurately recognizing objects in the images. ImageNet provides a large image set of over 1.2 million images from 1000 different object categories for training recognition algorithms. Academic as well as industrial participants have performed strongly, with competitors from Google, Stanford University, University of California, Berkeley, and Adobe (among many others) in recent years.
A Low-Power Challenge
To motivate improved image recognition on low-power devices, Yung-Hsiang Lu, Associate Professor of Electrical and Computer Engineering at Purdue University, and Alex Berg, Assistant Professor of Computer Science at UNC Chapel Hill, are organizing the Low-Power Image Recognition Challenge (LPIRC), a competition focused on identifying the best technology in both image recognition and energy conservation. Registration for the LPIRC is now open.
Achieving high performance while maintaining low power can be challenging, as these two parameters often increase together. Last year NVIDIA released the Jetson TK1 Development Kit, a low-power GPU-accelerated computing platform that is well-suited for image processing and computer vision applications. Jetson TK1’s low power requirements and image processing capabilities will make it a popular platform for LPIRC competitors.
Deep learning models are making great strides in research papers and industrial deployments alike, but it’s helpful to have a guide and toolkit to join this frontier. This post serves to orient researchers, engineers, and machine learning practitioners on how to incorporate deep learning into their own work. This orientation pairs an introduction to model structure and learned features for general understanding with an overview of the Caffe deep learning framework for practical know-how. References highlight recent and historical research for perspective on current progress. The framework survey points out key elements of the Caffe architecture, reference models, and worked examples. Through collaboration with NVIDIA, drop-in integration of the cuDNN library accelerates Caffe models. Follow this post to join the active deep learning community around Caffe.
Automating Perception by Deep Learning
Deep learning is a branch of machine learning that is advancing the state of the art for perceptual problems like vision and speech recognition. We can pose these tasks as mapping concrete inputs such as image pixels or audio waveforms to abstract outputs like the identity of a face or a spoken word. The “depth” of deep learning models comes from composing functions into a series of transformations from input, through intermediate representations, and on to output. The overall composition gives a deep, layered model, in which each layer encodes progress from low-level details to high-level concepts. This yields a rich, hierarchical representation of the perceptual problem. Figure 1 shows the kinds of visual features captured in the intermediate layers of the model between the pixels and the output. A simple classifier can recognize a category from these learned features while a classifier on the raw pixels has a more complex decision to make.
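The notion of depth as function composition can be sketched in a few lines of Python. The three layers below are hand-written stand-ins chosen for illustration (a deep model learns its layers from data), and the names `compose`, `edges`, `strongest`, and `label`, as well as the threshold of 50, are assumptions of this sketch.

```python
from functools import reduce

def compose(*layers):
    """Chain layers so that the output of each one feeds the next."""
    return lambda x: reduce(lambda value, layer: layer(value), layers, x)

def edges(pixels):
    """Low-level features: brightness change between neighboring pixels."""
    return [abs(b - a) for a, b in zip(pixels, pixels[1:])]

def strongest(responses):
    """Intermediate representation: the largest edge response in the row."""
    return max(responses)

def label(score, threshold=50):
    """Simple classifier on the high-level feature."""
    return "edge" if score >= threshold else "flat"

# Input pixels -> edge responses -> strongest response -> category.
model = compose(edges, strongest, label)

print(model([10, 12, 11, 200, 205]))
print(model([10, 12, 11, 13]))
```

The final classifier only has to compare one number against a threshold, which mirrors the point that a simple classifier suffices once the earlier layers have extracted useful features.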
In the previous CUDACasts episode, we saw how to flash your Jetson TK1 to the latest release of Linux4Tegra, and install both the CUDA toolkit and OpenCV SDK. We’ll continue exploring the power efficiency the Jetson TK1 Kepler-based GPU brings to computer vision by porting a simple OpenCV sample to run on the GPU. We’ll explore computer vision further in a future CUDACast when we look at the VisionWorks toolkit from NVIDIA.
The Jetson TK1 development kit has fast become a must-have for mobile and embedded parallel computing due to the amazing level of performance packed into such a low-power board. In this and the following CUDACast, you’ll learn how to get started building computer vision applications on your Jetson TK1 using CUDA and the OpenCV library.
In an earlier post we showed how MATLAB® can support CUDA kernel prototyping and development by providing an environment for quick evaluation and visualization using the CUDAKernel object. In this post I will show you how to integrate an existing library of both host and device code implemented in C++ or another CUDA-accelerated language using MEX. With MEX you can extend and customize MATLAB, or use MATLAB as a test environment for your production code.
The MATLAB MEX compiler allows you to expose your libraries to the MATLAB environment as functions. You write your entry point in C, C++ or Fortran as a modified main() function which MATLAB invokes. MEX provides a framework for compiling this code, as well as an API for interacting with MATLAB and MATLAB data in your source code.
MATLAB’s Parallel Computing Toolbox™ provides constructs for compiling CUDA C and C++ with nvcc, and new APIs for accessing and using the gpuArray datatype which represents data stored on the GPU as a numeric array in the MATLAB workspace.
Feature Detection Example
I am going to use a feature detection example from MATLAB’s documentation for Computer Vision System Toolbox™. This uses tracked features to remove camera shake from an in-car road video. You will need MATLAB®, Parallel Computing Toolbox™, Image Processing Toolbox™ and Computer Vision System Toolbox™ to run the code. You can request a trial of these products at www.mathworks.com/trial. This example also depends on the OpenCV Computer Vision library, compiled with CUDA support.
Features are an essential prerequisite for many Computer Vision tasks; in this case, for instance, they might also be used to determine the motion of the car or to track other cars on the road.
To set up the example environment, I am using the following MATLAB code to load the video and display the first two frames superimposed (Figure 1). Continue reading →
NVIDIA’s Tegra K1 (TK1) is the first ARM system-on-chip (SoC) with integrated CUDA. With 192 Kepler GPU cores and four ARM Cortex-A15 cores delivering a total of 327 GFLOPS of compute performance, TK1 has the capacity to process lots of data with CUDA while typically drawing less than 6W of power (including the SoC and DRAM). This brings game-changing performance to low-SWaP (Size, Weight and Power) and small form factor (SFF) applications in the sub-10W domain, all the while supporting a developer-friendly Ubuntu Linux software environment that delivers an experience more like that of a desktop than that of an embedded SoC.
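As a rough back-of-the-envelope check on what these figures imply, the sketch below divides the quoted 327 GFLOPS by the quoted 6W ceiling. The function name is illustrative, and pairing a peak throughput figure with a typical power figure is a simplifying assumption, so the result is only a ballpark number.

```python
# Figures quoted in the text above.
PEAK_GFLOPS = 327.0        # total compute performance
TYPICAL_POWER_WATTS = 6.0  # ceiling on typical draw (SoC plus DRAM)

def gflops_per_watt(gflops, watts):
    """Compute throughput delivered per watt of power drawn."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    return gflops / watts

efficiency = gflops_per_watt(PEAK_GFLOPS, TYPICAL_POWER_WATTS)
print(f"roughly {efficiency:.1f} GFLOPS per watt")
```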
Tegra K1 is plug-and-play and can stream high-bandwidth peripherals, sensors, and network interfaces via built-in USB 3.0 and PCIe gen2 x4/x1 ports. TK1 is geared for sensor processing and offers additional hardware-accelerated functionality asynchronous to CUDA, like H.264 encoding and decoding engines and dual MIPI CSI-2 camera interfaces and image signal processors (ISPs). There are many exciting embedded applications for TK1 which leverage its natural ability as a media processor and low-power platform for quickly integrating devices and sensors.
As GPU acceleration is particularly well-suited for data-parallel tasks like imaging, signal processing, autonomy and machine learning, Tegra K1 extends these capabilities into the sub-10W domain. Code portability is now maintained from NVIDIA’s high-end Tesla HPC accelerators and the GeForce and Quadro discrete GPUs, all the way down through the low-power TK1. A full build of the CUDA 6 toolkit is available for TK1, including samples, math libraries such as cuFFT, cuBLAS, and NPP, and NVIDIA’s NVCC compiler. Developers can compile CUDA code natively on TK1 or cross-compile from a Linux development machine. Because the same CUDA libraries and development tools are available on both, CUDA applications can move between discrete GPUs and Tegra with little effort. OpenCV4Tegra and NVIDIA’s VisionWorks toolkit are also available. Additionally, the Ubuntu 14.04 repository is rich in pre-built packages for the ARM architecture, minimizing time spent tracking down and building dependencies. In many instances applications can simply be recompiled for ARM with little modification, as long as the source is available and doesn’t explicitly call out x86-specific instructions like SSE, AVX, or x86-ASM. NEON is ARM’s version of SIMD extensions for Cortex-A series CPUs.
Today, cars are learning to see pedestrians and road hazards; robots are becoming higher functioning; complex medical diagnostic devices are becoming more portable; and unmanned aircraft are learning to navigate autonomously. As a result, the computational requirements for these devices are increasing exponentially, while their size, weight, and power limits continue to decrease. Aimed at these and other embedded parallel computing applications, last week at the 2014 GPU Technology Conference NVIDIA announced an awesome new developer platform called Jetson TK1.
Jetson TK1 is a tiny but full-featured computer designed for development of embedded and mobile applications. Jetson TK1 is exciting because it incorporates Tegra K1, the first mobile processor to feature a CUDA-capable GPU. Jetson TK1 brings the capabilities of Tegra K1 to developers in a compact, low-power platform that makes development as simple as developing on a PC.
Jetson TK1 is aimed at two groups of people. The first are OEMs, including robotics, avionics, and medical device companies, who would like to develop new products that use Tegra K1 SoCs, and need a development platform that makes it easy to write software for these products. Once these companies are ready to move to production, they can work with one of our board partners to design the exact board that they need for their product. The second group is the large number of independent developers, researchers, makers, and hobbyists who would like a platform that will enable them to create amazing technology such as robots, security devices, or anything that needs substantial parallel computing or computer vision in a small, flexible and low-power platform. For this group, Jetson TK1 offers the size and adaptability of Raspberry Pi or Arduino, with the computational capability of a desktop computer. We’re excited to see what developers create with Jetson TK1!
Tegra K1 is NVIDIA’s latest mobile processor. It features a Kepler GPU with 192 cores, Continue reading →
Artefacto Estudio is a developer of interactive applications and games. The company’s projects include a real-time virtual shoe fitting kiosk that allows people to “try on” shoes using augmented reality powered by Microsoft Kinect and GPU computing (see the video).
NVIDIA: Néstor, tell us a bit about Artefacto Estudio.
Néstor: Artefacto is an independent development studio. We integrate solutions using cutting-edge technologies like Microsoft Kinect, Oculus Rift and Leap Motion.
NVIDIA: How did you become involved in the shoe industry?
Néstor: An ad agency, Kempertrautmann, was seeking a technology partner to work on a prototype for a virtual shoe fitting exhibit for Goertz, the German shoe company.