Joss Knight is a Senior Developer in the MathWorks UK office, working on accelerating MATLAB functionality on GPU hardware. His background is in robot navigation and visual geometry, which he studied at Oxford University’s Robotics Research Group. He also spent eight years working in physics-based video game character animation, and had a brief foray with space exploration including the Beagle 2 Mars probe.

# High-Performance MATLAB with GPU Acceleration

In this post, I will discuss techniques you can use to maximize the performance of your GPU-accelerated MATLAB® code. First I explain how to write MATLAB code which is inherently parallelizable. This technique, known as vectorization, benefits all your code whether or not it uses the GPU. Then I present a family of function wrappers—bsxfunpagefun, and arrayfun—that take advantage of GPU hardware, yet require no specialist parallel programming skills. The most advanced function, arrayfun, allows you to write your own custom kernels in the MATLAB language.

If these techniques do not provide the performance or flexibility you were after, you can still write custom CUDA code in C or C++ that you can run from MATLAB, as discussed in our earlier Parallel Forall posts on MATLAB CUDA Kernels and MEX functions.

All of the features described here are available out of the box with MATLAB and Parallel Computing Toolbox™.

## Mobile phone signal strength example

Throughout this post, I will use an example to help illustrate the techniques. A cellular phone network wants to map its coverage to help plan for new antenna installations. We imagine an idealized scenario with M = 25 cellphone masts, each H = 20 meters in height, evenly spaced on an undulating 10km x 10km terrain. Figure 1 shows what the map looks like.

On the GPU, in the following listing we define a number of variables including:

• map: An N x 3 height field in a 10km x 10km grid (N = 10,000);
• masts: An M x 3 array of antenna positions, at height H;
• AntennaDirection: A 3 x M array of vectors representing the orientation of each antenna.

# Calling CUDA-accelerated Libraries from MATLAB: A Computer Vision Example

In an earlier post we showed how MATLAB® can support CUDA kernel prototyping and development by providing an environment for quick evaluation and visualization using the `CUDAKernel` object. In this post I will show you how to integrate an existing library of both host and device code implemented in C++ or another CUDA-accelerated language using MEX. With MEX you can extend and customize MATLAB, or use MATLAB as a test environment for your production code.

The MATLAB MEX compiler allows you to expose your libraries to the MATLAB environment as functions. You write your entry point in C, C++ or Fortran as a modified `main()` function which MATLAB invokes. MEX provides a framework for compiling this code, as well as an API for interacting with MATLAB and MATLAB data in your source code.

MATLAB’s Parallel Computing Toolbox™ provides constructs for compiling CUDA C and C++ with `nvcc`, and new APIs for accessing and using the `gpuArray` datatype which represents data stored on the GPU as a numeric array in the MATLAB workspace.

## Feature Detection Example

I am going to use a feature detection example from MATLAB’s documentation for Computer Vision System Toolbox™. This uses tracked features to remove camera shake from an in-car road video. You will need MATLAB®, Parallel Computing Toolbox™, Image Processing Toolbox™ and Computer Vision System Toolbox™ to run the code. You can request a trial of these products at www.mathworks.com/trial. This example also depends on the OpenCV Computer Vision library, compiled with CUDA support.

Features are an essential prerequisite for many Computer Vision tasks; in this case, for instance, they might also be used to determine the motion of the car or to track other cars on the road.

To set up the example environment, I am using the following MATLAB code to load the video and display the first two frames superimposed (Figure 1). Continue reading