
Fast Multi-GPU collectives with NCCL

Today many servers contain 8 or more GPUs. In principle, then, scaling an application from one to many GPUs should provide a tremendous performance boost. But in practice, this benefit can be difficult to obtain. There are two common culprits behind poor multi-GPU scaling. The first is that the application does not expose enough parallelism to saturate the processors efficiently. The second is that the processors exchange too much data and spend more time communicating than computing. To avoid such communication bottlenecks, it is important to make the most of the available inter-GPU bandwidth, and this is what NCCL is all about.

NCCL (pronounced “Nickel”) is a library of multi-GPU collective communication primitives that are topology-aware and can be easily integrated into your application. Initially developed as an open-source research project, NCCL is designed to be light-weight, depending only on the usual C++ and CUDA libraries. NCCL can be deployed in single-process or multi-process applications, handling required inter-process communication transparently. Finally, the API will be very familiar to anyone with experience using MPI’s collectives.

Figure 1: Illustration of the All-Reduce collective.

Collective Communication

Collective communication routines are common patterns of data transfer among many processors. If you have experience with MPI, then you are probably already familiar with several collective operations. For example, all-reduce starts with independent arrays $V_k$ of N values, one on each of K processors, and ends with identical arrays $S$ of N values, where $S[i] = V_0[i] + V_1[i] + \cdots + V_{K-1}[i]$ for each element index $i$, on every processor. See Figure 1. Continue reading
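To make this concrete, here is a hedged sketch of a single-process, multi-GPU all-reduce using the NCCL C API from around the time of this post (ncclCommInitAll, ncclAllReduce); error checking is omitted, and newer NCCL releases expect per-device calls issued from a single thread to be wrapped in ncclGroupStart()/ncclGroupEnd().

```c++
#include <nccl.h>
#include <cuda_runtime.h>

int main() {
    const int nDev = 4;                  // number of GPUs participating
    const int N    = 1 << 20;            // elements per GPU
    int devs[nDev] = {0, 1, 2, 3};

    ncclComm_t comms[nDev];
    ncclCommInitAll(comms, nDev, devs);  // one communicator per GPU, all in this process

    float *sendbuf[nDev], *recvbuf[nDev];
    cudaStream_t streams[nDev];
    for (int i = 0; i < nDev; ++i) {
        cudaSetDevice(i);
        cudaMalloc(&sendbuf[i], N * sizeof(float));
        cudaMalloc(&recvbuf[i], N * sizeof(float));
        cudaStreamCreate(&streams[i]);
        // ... fill sendbuf[i] with this GPU's values V_i ...
    }

    // Element-wise sum across all GPUs; every GPU ends up with the same array S.
    for (int i = 0; i < nDev; ++i) {
        cudaSetDevice(i);
        ncclAllReduce(sendbuf[i], recvbuf[i], N, ncclFloat, ncclSum,
                      comms[i], streams[i]);
    }

    for (int i = 0; i < nDev; ++i) {
        cudaSetDevice(i);
        cudaStreamSynchronize(streams[i]);
        ncclCommDestroy(comms[i]);
    }
    return 0;
}
```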


Optimizing Recurrent Neural Networks in cuDNN 5

Figure 1: cuDNN 5 + Torch speedup vs. Torch-rnn implementation, M40, Intel® Xeon® Processor E5-2698. Network A: RNN size 2560, input size 2560, 1 layer, Seq length 200, batch size 64. Network B: RNN size 256, input size 64, 3 layers, batch size 64. Network C: RNN size 256, input size 256, 1 layer, batch size 32, Seq length 1000.

This week at GTC 2016, we announced the latest update to NVIDIA Deep Learning SDK, which now includes cuDNN 5. Version 5 offers new features, improved performance and support for the latest generation NVIDIA Tesla P100 GPU. New features in cuDNN 5 include:

  • Faster forward and backward convolutions using the Winograd convolution algorithm;
  • 3D FFT Tiling;
  • Spatial Transformer Networks;
  • Improved performance and reduced memory usage with FP16 routines on Pascal GPUs;
  • Support for LSTM recurrent neural networks for sequence learning that deliver up to 6x speedup.

One of the new features we’ve added in cuDNN 5 is support for Recurrent Neural Networks (RNN). RNNs are a powerful tool used for sequence learning in a number of fields, from speech recognition to image captioning. For a brief high-level introduction to RNNs, LSTMs and sequence learning, I recommend you check out Tim Dettmers’ recent post Deep Learning in a Nutshell: Sequence Learning, and for more depth, Soumith Chintala’s post Understanding Natural Language with Deep Neural Networks Using Torch.

I’m excited about the RNN capabilities in cuDNN 5; we’ve put a lot of effort into optimizing their performance on NVIDIA GPUs, and I’ll go into some of the details of these optimizations in this blog post.

cuDNN 5 supports four RNN modes:  ReLU activation function, tanh activation function, Gated Recurrent Units (GRU), and Long Short-Term Memory (LSTM). In this case study I’ll look at the performance of an LSTM network, but most of the optimizations can be applied to any RNN. Continue reading
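For reference in this case study, here is a sketch of the standard LSTM cell recurrence; cuDNN’s exact parameterization (for example, how the bias terms are split) may differ slightly in notation:

$i_t = \sigma(W_i x_t + R_i h_{t-1} + b_i)$
$f_t = \sigma(W_f x_t + R_f h_{t-1} + b_f)$
$o_t = \sigma(W_o x_t + R_o h_{t-1} + b_o)$
$\tilde{c}_t = \tanh(W_c x_t + R_c h_{t-1} + b_c)$
$c_t = f_t \circ c_{t-1} + i_t \circ \tilde{c}_t$
$h_t = o_t \circ \tanh(c_t)$

Each time step therefore performs eight matrix multiplications, four against the input $x_t$ and four against the recurrent state $h_{t-1}$, plus comparatively cheap element-wise work; those GEMMs dominate the runtime and are the natural target for optimization.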


CUDA 8 Features Revealed

This week at the GPU Technology Conference, developers are getting a preview of some powerful new features coming in CUDA 8 later this year. In this post I’ll give a quick overview of the major new features of CUDA 8:

  • Support for the Pascal GPU architecture, including the new Tesla P100 accelerator;
  • New Unified Memory capabilities;
  • The new nvGRAPH GPU-Accelerated Graph Analytics library;
  • Powerful new profiling capabilities; and
  • Improved compiler performance and heterogeneous lambda support.
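As a taste of the last item above, here is a minimal sketch (not code from the post) of a heterogeneous lambda, that is, a lambda annotated __host__ __device__ so it can be passed to GPU algorithms such as Thrust; it assumes CUDA 8’s experimental --expt-extended-lambda nvcc flag.

```c++
#include <thrust/device_vector.h>
#include <thrust/transform.h>

int main() {
    thrust::device_vector<float> x(1 << 20, 1.0f);
    thrust::device_vector<float> y(1 << 20);

    const float a = 2.0f;
    // A heterogeneous lambda: callable on both the host and the device.
    auto axpb = [=] __host__ __device__ (float xi) { return a * xi + 1.0f; };

    // Runs on the GPU; compile with: nvcc --expt-extended-lambda ...
    thrust::transform(x.begin(), x.end(), y.begin(), axpb);
    return 0;
}
```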

To learn more, you can check out my GTC 2016 talk, “CUDA 8 and Beyond” (GTC on-demand site registration required).

CUDA 8 Supports the new NVIDIA Pascal Architecture

A crucial goal for CUDA 8 is to provide support for the powerful new Pascal architecture, the first incarnation of which was launched at GTC today: Tesla P100. For full details on P100 and the Pascal GP100 GPU architecture, check out the blog post “Inside Pascal”.

In a nutshell, Tesla P100 provides massive double-, single- and half-precision computational performance, 3x the memory bandwidth of Maxwell GPUs via HBM2 stacked memory, and, with its support for NVLink, up to 5x the GPU-GPU communication performance of PCI Express. Pascal also improves support for Unified Memory thanks to a larger virtual address space and a new page migration engine.

CUDA 8 will enable CUDA applications to get high performance on Tesla P100 out of the box. Moreover, improvements in CUDA 8 enable developing efficient code for new Tesla P100 features such as NVLink and improved Unified Memory. Continue reading
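As a small illustration of the Unified Memory programming model just mentioned, here is a minimal sketch (not code from the post): it allocates managed memory with cudaMallocManaged and uses cudaMemPrefetchAsync, added in CUDA 8, to migrate the data to the GPU before a kernel launch. It assumes a CUDA 8 toolkit and a GPU that supports managed memory.

```c++
#include <cuda_runtime.h>
#include <cstdio>

__global__ void scale(float *data, int n, float s) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= s;
}

int main() {
    const int n = 1 << 20;
    float *data = nullptr;
    cudaMallocManaged(&data, n * sizeof(float));   // one pointer, accessible from CPU and GPU

    for (int i = 0; i < n; ++i) data[i] = 1.0f;    // initialize the data on the host

    int device = 0;
    cudaGetDevice(&device);
    // New in CUDA 8: migrate managed pages to the GPU ahead of the kernel,
    // avoiding demand-paging faults on first access.
    cudaMemPrefetchAsync(data, n * sizeof(float), device);

    scale<<<(n + 255) / 256, 256>>>(data, n, 2.0f);
    cudaDeviceSynchronize();                       // after this, the host can safely read data again

    printf("data[0] = %f\n", data[0]);             // expect 2.0
    cudaFree(data);
    return 0;
}
```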


Developing New Materials with GPU-Accelerated Supercomputers

Dr. Joshua A. Anderson is a Research Area Specialist at the University of Michigan who was an early user of GPU computing technology. He began his career developing software on the first CUDA-capable GPU and now runs simulations on one of the world’s most powerful supercomputers.

Anderson’s “contributions to the development and dissemination of the open source, GPU-enabled molecular simulation software, HOOMD-blue, which enables scientific computations with unprecedented speed” earned him the 2015 CoMSEF Young Investigator Award for Modeling and Simulation.

Brad Nemire: Can you talk a bit about your current research?

Joshua Anderson: I work with the Glotzer Group at the University of Michigan. We use computer simulation to discover the fundamental principles of how nanoscale systems of building blocks self-assemble, and to discover how to control the assembly process to engineer new materials. Specifically, we focus on the role of particle shape and how changing the shape can result in different material properties.

Figure 1: Example system configuration from the shape allophiles project: Eric S. Harper, Ryan Marson, Joshua A. Anderson, Greg van Anders, and Sharon C. Glotzer. Shape Allophiles Improve Entropic Assembly. Soft Matter, 2015. (doi:10.1039/C5SM01351H).

Over the past few years, I have been focusing on two-dimensional systems, using large scale simulations to study hexatic phase transitions for hard disks, and how patterning surfaces of polygons can create shape allophiles that improve self-assembly. The hexatic phase is an intermediate between the fluid and hexagonally ordered solid. In the hexatic phase, the orientation of bonds between particles has long range order, but translational order is short range and there is no crystal lattice. Shape allophiles are polygonal shapes cut so they fit together like puzzle pieces. These research projects are computationally demanding and could not have been run on any existing code. So before I could even begin the science research, I needed to develop, implement, and optimize the parallel algorithms necessary for these studies. Continue reading


Open, Reproducible Computational Chemistry with Python and CUDA

Increasingly, computational chemistry researchers use GPUs to push the boundaries of discovery. This motivated Christopher Cooper, an Instructor at Universidad Técnica Federico Santa María in Chile, to move to a Python-based software stack.

Cooper’s paper, “Probing protein orientation near charged nanosurfaces for simulation-assisted biosensor design,” was recently accepted in J. Chemical Physics.

Brad: Can you talk a bit about your current research?

Christopher: I am interested in developing fast and accurate algorithms to study the effect of electrostatics in protein systems. We use continuum models to represent the solvent around the protein (water with salt) via the Poisson-Boltzmann equation, and solve it with an accelerated boundary element method. We call the resulting code PyGBe, which is open-source software with an MIT license, and is available to download via the Github account of the research group where I did my Ph.D. at Boston University.

Figure 1: Electrostatic potential around a peptide derived from an HIV-1 capsid.

Continue reading


Combine OpenACC and Unified Memory for Productivity and Performance

The post Getting Started with OpenACC covered four steps to progressively accelerate your code with OpenACC. It’s often necessary to use OpenACC directives to express both loop parallelism and data locality in order to get good performance with accelerators. After expressing available parallelism, excessive data movement generated by the compiler can be a bottleneck, and correcting this by adding data directives takes effort. Sometimes expressing proper data locality is more effort than expressing parallelism with loop directives.

Wouldn’t it be nice if programs could manage data locality automatically? Well, this is possible today with Unified Memory (on Kepler and newer GPU architectures). In this post I demonstrate how to combine OpenACC with Unified Memory to GPU-accelerate your existing applications with minimal effort. You can download the source code for the example in this post from the Parallel Forall GitHub repository.

Jacobi Iteration with Heap Memory

I’ll use the popular Jacobi iteration example code, which is representative of many real-world stencil computations. In contrast to the previous OpenACC post, I modified the array data allocation to use heap memory instead of automatic stack-allocated arrays. This is a more common scenario for real applications, since real-world data arrays are often too large for stack memory. This change also makes it a more challenging case for OpenACC, since the compiler no longer knows the size of the arrays. The following excerpt shows the main loop of the Jacobi iteration with 2D index computation. Continue reading
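The excerpt ends before the listing, so here is a hedged sketch of what such a main loop typically looks like with heap-allocated arrays, flattened 2D indexing, and OpenACC parallel loop directives. The listing in the full post may differ in details; with a managed-memory build (for example PGI’s -ta=tesla:managed), no explicit data directives are needed because Unified Memory handles the data movement.

```c++
#include <math.h>

// Hypothetical Jacobi sweep over heap-allocated n x m arrays A and Anew (flattened 2D indexing).
double jacobi_step(double *A, double *Anew, int n, int m) {
    double error = 0.0;

    // Compute the new value of each interior point from its four neighbors.
    #pragma acc parallel loop collapse(2) reduction(max:error)
    for (int j = 1; j < n - 1; ++j)
        for (int i = 1; i < m - 1; ++i) {
            Anew[j * m + i] = 0.25 * (A[j * m + i + 1] + A[j * m + i - 1] +
                                      A[(j - 1) * m + i] + A[(j + 1) * m + i]);
            error = fmax(error, fabs(Anew[j * m + i] - A[j * m + i]));
        }

    // Copy the updated interior back into A for the next iteration.
    #pragma acc parallel loop collapse(2)
    for (int j = 1; j < n - 1; ++j)
        for (int i = 1; i < m - 1; ++i)
            A[j * m + i] = Anew[j * m + i];

    return error;
}
```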


Increasing the Luminosity of Beam Dynamics with GPUs

What is dark matter? We can neither see it nor detect it with any instrument. CERN is upgrading the LHC (Large Hadron Collider), the largest and most powerful particle accelerator ever built, to explore the new high-energy frontier.

The most technically challenging aspects of the upgrade cannot be done by CERN alone and require collaboration and external expertise. There are 7,000 scientists from over 60 countries working to extend the LHC’s discovery potential; the accelerator will need a major upgrade around 2020 to increase its luminosity by a factor of 10 beyond the original design value.

Ph.D. student Adrian Oeftiger attends EPFL (École Polytechnique Fédérale de Lausanne) in Switzerland, which is one of the High Luminosity LHC beneficiaries. His research group is working to parallelize their algorithms to create software that will enable new kinds of beam dynamics studies that have not been possible with current technology.

Brad: How is your research related to the upgrade of the LHC?

Adrian: My world is all about luminosity; increasing the luminosity of particle beams. It is all about making ultra-high-energy collisions of protons possible, and at the same time providing enough collisions to enable fundamental particle physics research. That means increasing the luminosity. I’m doing my Ph.D. in beam dynamics in the field of accelerator physics.

These days, high-energy particle accelerators are the tools of choice to analyze and understand the fundamental building blocks of our universe. The huge detectors at the Large Hadron Collider (LHC) at CERN, buried about a hundred meters underground in the countryside near Geneva, need ever-increasing collision rates (hence luminosity!): they gather statistics of collision events to explore new realms of physics, to detect extremely rare interaction combinations and the tiniest quantities of new particles, and to find explanations for some of the numerous wonders of the universe we live in. What is the dark matter that makes up 27% of our universe? Why is the symmetry between antimatter and ordinary matter broken, and why do we find only the latter in the universe?

CERN is preparing for the High Luminosity LHC, a powerful upgrade of the present accelerator to increase the chances to answer some of these fundamental questions. Increasing the chances translates to: we need more collisions, so we need higher luminosity. Continue reading


Mocha.jl: Deep Learning for Julia

Deep learning is becoming extremely popular due to several breakthroughs in various well-known tasks in artificial intelligence. For example, the introduction of deep learning algorithms into the ImageNet Large Scale Visual Recognition Challenge reduced the top-5 error by 10% in 2012. Every year since then, deep learning models have dominated the challenge, significantly reducing the top-5 error rate (see Figure 1). In 2015, researchers trained very deep networks (for example, the Google “inception” model has 27 layers) that surpass human performance.

Figure 1: The top-5 error rate in the ImageNet Large Scale Visual Recognition Challenge has been rapidly reducing since the introduction of deep neural networks in 2012.

Moreover, at this year’s Computer Vision and Pattern Recognition (CVPR) conference, deep neural networks (DNNs) were being adapted to increasingly complicated tasks. For example, in semantic segmentation, instead of predicting a single category for a whole image, a DNN is trained to classify each pixel in the image, essentially producing a semantic map indicating every object and its shape and location in the given image (see Figure 2).

Figure 2: An example of generating a semantic map by classifying each pixel in a source image. Source: Shuai Zheng et al. Conditional Random Fields as Recurrent Neural Networks.

Continue reading


High-Performance MATLAB with GPU Acceleration

In this post, I will discuss techniques you can use to maximize the performance of your GPU-accelerated MATLAB® code. First I explain how to write MATLAB code that is inherently parallelizable. This technique, known as vectorization, benefits all your code whether or not it uses the GPU. Then I present a family of function wrappers (bsxfun, pagefun, and arrayfun) that take advantage of GPU hardware, yet require no specialist parallel programming skills. The most advanced function, arrayfun, allows you to write your own custom kernels in the MATLAB language.

If these techniques do not provide the performance or flexibility you were after, you can still write custom CUDA code in C or C++ that you can run from MATLAB, as discussed in our earlier Parallel Forall posts on MATLAB CUDA Kernels and MEX functions.

All of the features described here are available out of the box with MATLAB and Parallel Computing Toolbox™.

Mobile phone signal strength example

Throughout this post, I will use an example to help illustrate the techniques. A cellular phone network wants to map its coverage to help plan for new antenna installations. We imagine an idealized scenario with M = 25 cellphone masts, each H = 20 meters in height, evenly spaced on an undulating 10km x 10km terrain. Figure 1 shows what the map looks like.

Figure 1: 3D plot of the map data and antenna positions

On the GPU, in the following listing we define a number of variables including:

  • map: An N x 3 height field in a 10km x 10km grid (N = 10,000);
  • masts: An M x 3 array of antenna positions, at height H;
  • AntennaDirection: A 3 x M array of vectors representing the orientation of each antenna.

Continue reading


New Features in CUDA 7.5

Today I’m happy to announce that the CUDA Toolkit 7.5 Release Candidate is now available. The CUDA Toolkit 7.5 adds support for FP16 storage, enabling up to 2x larger data sets and reduced memory bandwidth usage, along with cuSPARSE GEMVI routines, instruction-level profiling and more. Read on for full details.

16-bit Floating Point (FP16) Data

CUDA 7.5 expands support for 16-bit floating point (FP16) data storage and arithmetic, adding new half and half2 datatypes and intrinsic functions for operating on them. 16-bit “half-precision” floating point types are useful in applications that can process larger datasets or gain performance by choosing to store and operate on lower-precision data. Some large neural network models, for example, may be constrained by available GPU memory; and some signal processing kernels (such as FFTs) are bound by memory bandwidth.

Many applications can benefit by storing data in half precision, and processing it in 32-bit (single) precision. At GTC 2015 in March, NVIDIA CEO Jen-Hsun Huang announced that future Pascal architecture GPUs will include full support for such “mixed precision” computation, with FP16 (half) computation at higher throughput than FP32 (single) or FP64 (double).
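Here is a minimal sketch of that store-as-half, compute-as-float pattern, assuming the cuda_fp16.h header from CUDA 7.5 with its half type and the __half2float/__float2half device conversion intrinsics (exact names and availability depend on the toolkit and GPU):

```c++
#include <cuda_fp16.h>

// Store the data as FP16 (half the footprint and bandwidth), but do the math in FP32.
__global__ void scale_fp16(const half *in, half *out, float alpha, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float x = __half2float(in[i]);   // load as half, convert to float
        float y = alpha * x;             // compute in single precision
        out[i] = __float2half(y);        // store the result back as half
    }
}
```

On GPUs with native FP16 arithmetic, such as Tegra X1, cuda_fp16.h also exposes intrinsics that operate directly on half2 pairs, which is where the 2x compute speedup mentioned below comes from.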

With CUDA 7.5, applications can benefit by storing up to 2x larger models in GPU memory. Applications that are bottlenecked by memory bandwidth may get up to 2x speedup. And applications on Tegra X1 GPUs bottlenecked by FP32 computation may benefit from 2x faster computation on half2 data.

CUDA 7.5 provides 3 main FP16 features: Continue reading