Calling CUDA-accelerated Libraries from MATLAB: A Computer Vision Example

In an earlier post we showed how MATLAB® can support CUDA kernel prototyping and development by providing an environment for quick evaluation and visualization using the CUDAKernel object. In this post I will show you how to integrate an existing library of both host and device code implemented in C++ or another CUDA-accelerated language using MEX. With MEX you can extend and customize MATLAB, or use MATLAB as a test environment for your production code.

The MATLAB MEX compiler allows you to expose your libraries to the MATLAB environment as functions. You write your entry point in C, C++ or Fortran as a gateway function (named mexFunction in C and C++) that takes the place of main() and that MATLAB invokes. MEX provides a framework for compiling this code, as well as an API for interacting with MATLAB and MATLAB data in your source code.

MATLAB’s Parallel Computing Toolbox™ provides constructs for compiling CUDA C and C++ with nvcc, and new APIs for accessing and using the gpuArray datatype, which represents data stored on the GPU as a numeric array in the MATLAB workspace.

Feature Detection Example

Figure 1: Color composite of frames from a video feature tracking example. (Frame A = red, frame B = cyan)

I am going to use a feature detection example from MATLAB’s documentation for Computer Vision System Toolbox™. The example uses tracked features to remove camera shake from an in-car road video. You will need MATLAB®, Parallel Computing Toolbox™, Image Processing Toolbox™ and Computer Vision System Toolbox™ to run the code. You can request a trial of these products from MathWorks. This example also depends on the OpenCV computer vision library, compiled with CUDA support.

Features are an essential prerequisite for many computer vision tasks; in this case, they could also be used to determine the motion of the car or to track other cars on the road.

To set up the example environment, I am using the following MATLAB code to load the video and display the first two frames superimposed (Figure 1).


Prototyping Algorithms and Testing CUDA Kernels in MATLAB

This guest post by Daniel Armyr and Dan Doherty from MathWorks describes how you can use MATLAB to support your development of CUDA C and C++ kernels. You will need MATLAB, Parallel Computing Toolbox™, and Image Processing Toolbox™ to run the code. You can request a trial of these products from MathWorks. For a more detailed description of this workflow, refer to the MATLAB for CUDA Programmers webinar and associated demo files.

NVIDIA GPUs are becoming increasingly popular for large-scale computations in image processing, financial modeling, signal processing, and other applications—largely due to their highly parallel architecture and high computational throughput. The CUDA programming model lets programmers exploit the full power of this architecture by providing fine-grained control over how computations are divided among parallel threads and executed on the device. The resulting algorithms often run much faster than traditional code written for the CPU.

While algorithms written for the GPU are often much faster, the process of building a framework for developing and testing them can be time-consuming. Many programmers write CUDA kernels integrated into C or Fortran programs for production. For this reason, they often use these languages to iterate on and test their kernels, which requires writing significant amounts of “glue code” for tasks such as transferring data to the GPU, managing GPU memory, initializing and launching CUDA kernels, and visualizing kernel outputs. This glue code is time-consuming to write and may be difficult to change if, for example, you want to run the kernel on different input data or visualize kernel outputs using a different type of plot.

Using an image white balancing example, this article describes how MATLAB® supports CUDA kernel development by providing a language and development environment for quickly evaluating kernels, analyzing and visualizing kernel results, and writing test harnesses to validate kernel results.
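To make the idea of validating kernel results concrete, here is a minimal C++ sketch of the kind of CPU reference a test harness could compare a GPU kernel against. It assumes a gray-world white balance (each channel is scaled so that its mean matches the average of the three channel means), which may differ from the algorithm used in the article. The RgbImage type and the function names whiteBalanceReference, maxAbsDifference and matchesReference are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical image type for this sketch: interleaved R,G,B floats per pixel.
struct RgbImage {
    std::size_t width;
    std::size_t height;
    std::vector<float> data;  // expected size: width * height * 3
};

// CPU reference, assuming a gray-world white balance: scale each channel so
// that its mean equals the average of the three channel means.
RgbImage whiteBalanceReference(const RgbImage& in) {
    const std::size_t pixels = in.width * in.height;
    assert(in.data.size() == pixels * 3);

    RgbImage out = in;
    if (pixels == 0) {
        return out;  // nothing to balance
    }

    // Accumulate per-channel sums in double to limit rounding error.
    double sum[3] = {0.0, 0.0, 0.0};
    for (std::size_t p = 0; p < pixels; ++p) {
        for (int c = 0; c < 3; ++c) {
            sum[c] += in.data[3 * p + c];
        }
    }

    double mean[3];
    double gray = 0.0;
    for (int c = 0; c < 3; ++c) {
        mean[c] = sum[c] / static_cast<double>(pixels);
        gray += mean[c] / 3.0;
    }

    for (int c = 0; c < 3; ++c) {
        // Leave an all-zero channel unchanged instead of dividing by zero.
        const double gain = mean[c] > 0.0 ? gray / mean[c] : 1.0;
        for (std::size_t p = 0; p < pixels; ++p) {
            out.data[3 * p + c] = static_cast<float>(in.data[3 * p + c] * gain);
        }
    }
    return out;
}

// Largest absolute element-wise difference; infinity if the sizes differ.
double maxAbsDifference(const RgbImage& a, const RgbImage& b) {
    if (a.data.size() != b.data.size()) {
        return std::numeric_limits<double>::infinity();
    }
    double worst = 0.0;
    for (std::size_t i = 0; i < a.data.size(); ++i) {
        const double diff =
            std::fabs(static_cast<double>(a.data[i]) - static_cast<double>(b.data[i]));
        if (diff > worst) {
            worst = diff;
        }
    }
    return worst;
}

// True when the candidate (e.g. a kernel's output copied back to the host)
// agrees with the reference to within the given absolute tolerance.
bool matchesReference(const RgbImage& candidate, const RgbImage& reference, double tol) {
    return maxAbsDifference(candidate, reference) <= tol;
}
```

In a harness along these lines, the kernel output would be copied back from the device into an RgbImage and passed to matchesReference together with the reference result. The tolerance is a judgment call: a GPU kernel may accumulate sums in a different order or precision, so a small nonzero tolerance is usually more realistic than exact equality.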