Finite Difference Methods in CUDA C++, Part 2

In the last CUDA C++ post we dove in to 3D finite difference computations in CUDA C/C++, demonstrating how to implement the derivative part of the computation. In this post, let’s continue by exploring how we can write efficient kernels for the y and derivatives. As with the previous post, code for the examples in this post is available for download on Github.

Y and Z Derivatives

We can easily modify the derivative code to operate in the other directions. In the derivative each thread block calculates the derivatives in an x, y tile of 64 × sPencils elements. For the derivative we can have a thread block calculate the derivative on a tile of sPencils × 64 elements in x, y, as depicted on the left in the figure below.

Likewise, for the derivative a thread block can calculate the derivative in a x, z tile of sPencils × 64 elements. The kernel below shows the derivative kernel using this approach. Continue reading

Finite Difference Methods in CUDA Fortran, Part 2

In the last CUDA Fortran post we dove in to 3D finite difference computations in CUDA Fortran, demonstrating how to implement the derivative part of the computation. In this post, let’s continue by exploring how we can write efficient kernels for the y and derivatives. As with the previous post, code for the examples in this post is available for download on Github.

Y and Z Derivatives

We can easily modify the derivative code to operate in the other directions. In the derivative each thread block calculates the derivatives in an x, y tile of 64 × sPencils elements. For the derivative we can have a thread block calculate the derivative on a tile of sPencils × 64 elements in x, y, as depicted on the left in the figure below.

Likewise, for the derivative a thread block can calculate the derivative in a x, z tile of sPencils × 64 elements. The kernel below shows the derivative kernel using this approach. Continue reading

Finite Difference Methods in CUDA C/C++, Part 1

In the last CUDA C/C++ post we investigated how we can use shared memory to optimize a matrix transpose, achieving roughly an order of magnitude improvement in effective bandwidth by using shared memory to coalesce global memory access. The topic of today’s post is to show how to use shared memory to enhance data reuse in a finite difference code. In addition to shared memory, we will also discuss constant memory, which is a cached read-only memory optimized for uniform access across threads in a block (or warp).

Problem Statement: 3D Finite Difference

Our example uses a three-dimensional grid of size 643. For simplicity we assume periodic boundary conditions and only consider first-order derivatives, although extending the code to calculate higher-order derivatives with other types of boundary conditions is straightforward.

The finite difference method essentially uses a weighted summation of function values at neighboring points to approximate the derivative at a particular point. For a (2N+1)-point stencil with uniform spacing ∆x in the direction, the following equation gives a central finite difference scheme for the derivative in x. Equations for the other directions are similar. 

Continue reading

Finite Difference Methods in CUDA Fortran, Part 1

In the last CUDA Fortran post we investigated how shared memory can be used to optimize a matrix transpose, achieving roughly an order of magnitude improvement in effective bandwidth by using shared memory to coalesce global memory access. The topic of today’s post is to show how to use shared memory to enhance data reuse in a finite difference code. In addition to shared memory, we will also discuss constant memory, which is a read-only memory that is cached on chip and is optimized for uniform access across threads in a block (or warp).

Problem Statement: 3D Finite Difference

Our example uses a three-dimensional grid of size 643. For simplicity we assume periodic boundary conditions and only consider first-order derivatives, although extending the code to calculate higher-order derivatives with other types of boundary conditions is straightforward.

The finite difference method essentially uses a weighted summation of function values at neighboring points to approximate the derivative at a particular point. For a (2N+1)-point stencil with uniform spacing ∆x in the x-direction, the following equation gives a central finite difference scheme for the derivative in x. Equations for the other directions are similar. 

Continue reading