This post is a GPU program chrestomathy. What’s a chrestomathy, you ask?

In computer programming, a program chrestomathy is a collection of similar programs written in various programming languages, for the purpose of demonstrating differences in syntax, semantics and idioms for each language. [Wikipedia]

There are several good examples of program chrestomathies on the web, including Rosetta Code and NBabel, which demonstrates gravitational N-body simulation in multiple programming languages. In this post I demonstrate six ways to implement a simple SAXPY computation on the CUDA platform. Why is this interesting? Because it demonstrates the breadth of options you have today for programming NVIDIA GPUs, and it covers the three main approaches to GPU computing: GPU-accelerated libraries, GPU compiler directives, and GPU programming languages.

SAXPY stands for “Single-Precision A·X Plus Y”. It is a function in the standard Basic Linear Algebra Subprograms (BLAS) library. SAXPY is a combination of scalar multiplication and vector addition, and it’s very simple: it takes as input two vectors of 32-bit floats `X` and `Y` with `N` elements each, and a scalar value `A`. It multiplies each element `X[i]` by `A` and adds the result to `Y[i]`, storing the sum back in `Y[i]`. A simple C implementation looks like this:

    void saxpy(int n, float a, float *x, float *y)
    {
      for (int i = 0; i < n; ++i)
        y[i] = a*x[i] + y[i];
    }

    // Perform SAXPY on 1M elements
    saxpy(1<<20, 2.0, x, y);
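The snippet above leaves out how `x` and `y` are allocated and filled. As a minimal sketch of one way to exercise the loop on the CPU, the code below wraps it in a helper that is not part of the original post: `saxpy_max_error` is a hypothetical name, and the fill values (`x[i] = 1`, `y[i] = 2`, `a = 2`) are assumptions chosen so that every output element should equal exactly 4.

```c
#include <assert.h>
#include <stdlib.h>

/* Same loop as in the post: y[i] = a * x[i] + y[i] for i in [0, n). */
void saxpy(int n, float a, float *x, float *y)
{
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Hypothetical helper (an assumption for illustration, not from the post).
   Fills x with 1.0f and y with 2.0f, runs saxpy with a = 2.0f, and returns
   the largest absolute deviation of y[i] from the expected 4.0f.
   Returns -1.0f if memory allocation fails. */
float saxpy_max_error(int n)
{
    assert(n > 0);

    float *x = malloc((size_t)n * sizeof *x);
    float *y = malloc((size_t)n * sizeof *y);
    if (x == NULL || y == NULL) {
        free(x);
        free(y);
        return -1.0f;
    }

    for (int i = 0; i < n; ++i) {
        x[i] = 1.0f;
        y[i] = 2.0f;
    }

    saxpy(n, 2.0f, x, y);

    /* 2 * 1 + 2 is exactly representable in single precision,
       so the comparison against 4.0f needs no tolerance. */
    float max_err = 0.0f;
    for (int i = 0; i < n; ++i) {
        float err = y[i] - 4.0f;
        if (err < 0.0f)
            err = -err;
        if (err > max_err)
            max_err = err;
    }

    free(x);
    free(y);
    return max_err;
}
```

Calling `saxpy_max_error(1 << 20)` from a `main` function runs the same 1M-element case as the post. A return value of zero indicates that every element matched, which gives a simple CPU baseline to compare any GPU version against.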