Today we’re excited to announce the release of the CUDA Toolkit version 6.5. CUDA 6.5 adds a number of features and improvements to the CUDA platform, including support for CUDA Fortran in developer tools, user-defined callback functions in cuFFT, new occupancy calculator APIs, and more.
CUDA on ARM64
Last year we introduced CUDA on ARM, and in March we released the Jetson TK1 developer board, which enables development of CUDA on the NVIDIA Tegra K1 system-on-a-chip which includes a quad-core 32-bit ARM CPU and an NVIDIA Kepler GPU. There is a lot of excitement about developing mobile and embedded parallel computing applications on Jetson TK1. And this week at the Hot Chips conference, we provided more details about our upcoming 64-bit Denver ARM CPU architecture. CUDA 6.5 takes the next step, enabling CUDA on 64-bit ARM platforms. The heritage of ARM64 is in low-power, scale-out data centers and microservers, while GPUs are built for ultra-fast compute performance. When we combine the two, we have a compelling solution for HPC. ARM64 provides power efficiency, system configurability, and a large, open ecosystem. GPUs bring to the table high-throughput, power-efficient compute performance, a large HPC ecosystem, and hundreds of CUDA-accelerated applications. For HPC applications, ARM64 CPUs can offload the heavy lifting of computational tasks to GPUs. CUDA and GPUs make ARM64 competitive in HPC from day one. Development platforms available now for CUDA on ARM64 include the Cirrascale RM1905D HPC Development Platform and the E4 ARKA EK003. Eurotech has announced a system available later this year. These platforms are built on Applied Micro X-Gene 8-core 2.4GHz ARM64 CPUs, Tesla K20 GPU Accelerators, and CUDA 6.5. As Figure 1 shows, performance of CUDA-accelerated applications on ARM64+GPU systems is competitive with x86+GPU systems.