About Dustin Franklin

Dustin Franklin
Dustin is a Developer Evangelist on the Jetson team at NVIDIA. With a background in robotics and embedded systems, Dustin enjoys helping out in the community and working on projects with Jetson. You can find him on Devtalk.
Figure 4. Jetson TX1 Developer Kit, including module, reference carrier and camera board.

NVIDIA® Jetson™ TX1 Supercomputer-on-Module Drives Next Wave of Autonomous Machines

Figure 1. The 50x87mm embedded Jetson TX1 module and thermal plate, featuring integrated Maxwell GPU, ARMv8 CPU, and H.265 video processor.

Today NVIDIA introduced Jetson TX1, a small form-factor Linux system-on-module destined for demanding embedded applications in visual computing. Designed for developers and makers everywhere, the miniature Jetson TX1 (Figure 1) deploys teraflop-level supercomputing performance onboard platforms in the field. Backed by the Jetson TX1 Developer Kit, a premier developer community, and a software ecosystem including JetPack, Linux for Tegra R23.1, CUDA Toolkit 7, cuDNN, and VisionWorks, Jetson gives machines everywhere the proverbial brains required to achieve advanced levels of autonomy in today’s world.

Aimed at developers interested in computer vision and on-the-fly sensing, Jetson TX1’s credit-card footprint and low power consumption make it well suited for deployment onboard embedded systems with constrained size, weight, and power (SWaP). Jetson TX1 exceeds the performance of Intel’s high-end Core i7-6700K Skylake in deep learning classification with Caffe and, while drawing only a fraction of the power, achieves more than ten times the performance per watt.

Jetson provides superior efficiency while maintaining a developer-friendly environment for agile prototyping and product development, removing extra legwork typically associated with deploying power-limited embedded systems. Jetson TX1’s small form-factor module enables developers everywhere to deploy Tegra into embedded applications ranging from autonomous navigation to deep learning-driven inference and analytics.

Figure 4: MMTI and trainable HoG pedestrian/vehicle detectors extract dynamic obstacles from HD video at runtime

Low-Power Sensing and Autonomy With NVIDIA Jetson TK1

Figure 1: simple TK1 block diagram

NVIDIA’s Tegra K1 (TK1) is the first ARM system-on-chip (SoC) with integrated CUDA. With 192 Kepler GPU cores and four ARM Cortex-A15 cores delivering a total of 327 GFLOPS of compute performance, TK1 has the capacity to process large volumes of data with CUDA while typically drawing less than 6W of power (including the SoC and DRAM). This brings game-changing performance to low-SWaP (Size, Weight and Power) and small form factor (SFF) applications in the sub-10W domain, while supporting a developer-friendly Ubuntu Linux software environment that feels more like a desktop than an embedded SoC.

Tegra K1 is plug-and-play and can stream high-bandwidth peripherals, sensors, and network interfaces via built-in USB 3.0 and PCIe Gen2 x4/x1 ports. TK1 is geared for sensor processing and offers additional hardware-accelerated functionality that runs asynchronously to CUDA, such as H.264 encoding and decoding engines, dual MIPI CSI-2 camera interfaces, and image signal processors (ISPs). There are many exciting embedded applications for TK1 that leverage its natural ability as a media processor and low-power platform for quickly integrating devices and sensors.

GPU acceleration is particularly well suited to data-parallel tasks like imaging, signal processing, autonomy, and machine learning, and Tegra K1 extends these capabilities into the sub-10W domain. Code portability is now maintained from NVIDIA’s high-end Tesla HPC accelerators and the GeForce and Quadro discrete GPUs all the way down to the low-power TK1. A full build of the CUDA 6 toolkit is available for TK1, including samples, math libraries such as cuFFT, cuBLAS, and NPP, and NVIDIA’s NVCC compiler. Developers can compile CUDA code natively on TK1 or cross-compile from a Linux development machine. Because the same CUDA libraries and development tools are available on both, CUDA applications scale between discrete GPUs and Tegra with little extra effort. OpenCV4Tegra and NVIDIA’s VisionWorks toolkit are also available. Additionally, the Ubuntu 14.04 repository is rich in pre-built packages for the ARM architecture, minimizing time spent tracking down and building dependencies. In many instances applications can simply be recompiled for ARM with little modification, as long as the source is available and doesn’t explicitly call out x86-specific instructions like SSE, AVX, or x86-ASM. On ARM, the equivalent SIMD extensions for Cortex-A series CPUs are provided by NEON.
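To illustrate the point about x86-specific instructions, here is a minimal sketch of one common way to keep SIMD code portable between x86 and ARM: guard the intrinsics behind compiler-defined macros and keep a scalar fallback. The function names `add_arrays` and `simd_backend` are hypothetical and chosen only for this example; the macros (`__ARM_NEON`, `__SSE__`) are the ones GCC and Clang typically define, which is an assumption about the toolchain rather than anything specific to TK1.

```cpp
#include <cstddef>

// Pick a SIMD backend based on what the compiler reports for the target.
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
  #include <arm_neon.h>      // NEON intrinsics (ARM Cortex-A)
  #define SIMD_BACKEND "NEON"
#elif defined(__SSE__)
  #include <xmmintrin.h>     // SSE intrinsics (x86)
  #define SIMD_BACKEND "SSE"
#else
  #define SIMD_BACKEND "scalar"
#endif

// Hypothetical helper: element-wise sum, out[i] = a[i] + b[i].
void add_arrays(const float* a, const float* b, float* out, std::size_t n) {
    std::size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    // Process four floats per iteration with 128-bit NEON registers.
    for (; i + 4 <= n; i += 4) {
        vst1q_f32(out + i, vaddq_f32(vld1q_f32(a + i), vld1q_f32(b + i)));
    }
#elif defined(__SSE__)
    // Process four floats per iteration with 128-bit SSE registers
    // (unaligned loads and stores, so no alignment requirement on the inputs).
    for (; i + 4 <= n; i += 4) {
        _mm_storeu_ps(out + i,
                      _mm_add_ps(_mm_loadu_ps(a + i), _mm_loadu_ps(b + i)));
    }
#endif
    // Scalar tail; also the complete implementation when no SIMD is available.
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}

// Hypothetical helper: reports which code path was compiled in.
const char* simd_backend() { return SIMD_BACKEND; }
```

Calling `add_arrays(a, b, out, n)` gives the same results on either architecture; only the compiled-in code path differs. Code written this way can be recompiled for ARM without source changes, whereas code that calls SSE or AVX intrinsics unconditionally needs porting first.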