In this post, I will discuss techniques you can use to maximize the performance of your GPU-accelerated MATLAB® code. First I explain how to write MATLAB code that is inherently parallelizable. This technique, known as *vectorization*, benefits all your code whether or not it uses the GPU. Then I present a family of function wrappers (`bsxfun`, `pagefun`, and `arrayfun`) that take advantage of GPU hardware, yet require no specialist parallel programming skills. The most advanced function, `arrayfun`, allows you to write your own custom kernels in the MATLAB language.

If these techniques do not provide the performance or flexibility you are after, you can still write custom CUDA code in C or C++ and run it from MATLAB, as discussed in our earlier Parallel Forall posts on MATLAB CUDA Kernels and MEX functions.

All of the features described here are available out of the box with MATLAB and Parallel Computing Toolbox™.
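
As a small taste of the vectorization idea introduced above, here is a minimal sketch that computes the distance from every point in one set to every point in another, first with nested loops and then with a single `bsxfun` call. The data, sizes, and variable names (`pts`, `refs`, `dLoop`, `dVec`) are hypothetical and chosen only for illustration; they are not the variables used in the example later in this post.

```matlab
% Hypothetical data (an assumption for illustration, not from the post):
% 1000 points and 5 reference positions in 3-D.
rng(0);
pts  = rand(1000, 3);   % N x 3 points
refs = rand(5, 3);      % M x 3 reference positions
N = size(pts, 1);
M = size(refs, 1);

% Loop version: one point/reference pair at a time.
dLoop = zeros(N, M);
for i = 1:N
    for j = 1:M
        dLoop(i, j) = sqrt(sum((pts(i, :) - refs(j, :)).^2));
    end
end

% Vectorized version: place the two sets along different dimensions,
% then let bsxfun expand the singleton dimensions.
P = reshape(pts,  [N, 1, 3]);                   % N x 1 x 3
R = reshape(refs, [1, M, 3]);                   % 1 x M x 3
dVec = sqrt(sum(bsxfun(@minus, P, R).^2, 3));   % N x M distances

% Largest disagreement between the two versions.
maxDiff = max(abs(dLoop(:) - dVec(:)));
```

The sketch runs on the CPU as written. If a supported GPU and Parallel Computing Toolbox are available, creating `pts` and `refs` as `gpuArray` objects should let the vectorized version run on the GPU without further changes.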

## Mobile phone signal strength example

Throughout this post, I will use an example to help illustrate the techniques. A cellular phone network wants to map its coverage to help plan for new antenna installations. We imagine an idealized scenario with *M* = 25 cellphone masts, each *H* = 20 meters in height, evenly spaced on an undulating 10km x 10km terrain. Figure 1 shows what the map looks like.

The following listing defines a number of variables on the GPU, including:

- `map`: An *N* x 3 height field in a 10km x 10km grid (*N* = 10,000);
- `masts`: An *M* x 3 array of antenna positions, at height *H*;
- `AntennaDirection`: A 3 x *M* array of vectors representing the orientation of each antenna.