High-Performance Geometric Multi-Grid with GPU Acceleration

Linear solvers are probably the most common tool in scientific computing applications. There are two basic classes of methods that can be used to solve an Ax=b equation: direct and iterative. Direct methods are usually robust, but have additional computational complexity and memory capacity requirements. Unlike direct solvers, iterative solvers require minimal memory overhead and feature better computational complexity. However, these solvers are still super-linear in the number of variables and often have a slow rate of convergence of low-frequency errors. Finally, there are multi-grid iterative methods that can deliver linear complexity by solving a problem at different resolutions and smoothing low frequency errors using coarser grids.

Figure 1: The V-cycle is the core of any multi-grid solver. In each V-cycle, the grid is smoothed and a residual is computed and propagated to the coarser grid. At the coarsest level, a direct solver is applied, and the solution is then iteratively interpolated to finer grids.

Broadly speaking, multi-grid methods can be differentiated into more general algebraic multi-grid (AMG) and the specialized geometric multi-grid (GMG). AMG is a perfect “black-box” solver for problems with unstructured meshes, where elements or volumes can have different numbers of neighbors, and it is difficult to identify a subproblem. There is an interesting blog post demonstrating that GPU accelerators show good performance in AMG using the NVIDIA AmgX library. GMG methods are more efficient than AMG on structured problems, since they can take advantage of the additional information from the geometric representation of the problem. GMG solvers have significantly lower memory requirements, deliver higher computational throughput and also show good scalability. Moreover, these methods require less tuning in general and have a simpler setup than AMG. Let’s take a closer look at GMG and see how well it maps to GPU accelerators. Continue reading


AmgX: Multi-Grid Accelerated Linear Solvers for Industrial Applications

Many industries use Computational Fluid Dynamics (CFD) to predict fluid flow forces on products during the design phase, using only numerical methods. A famous example is Boeing’s  777 airliner, which was designed and built without the construction (or destruction) of a single model in a wind tunnel, an industry first. This approach dramatically reduces the cost of designing new products for which aerodynamics is a large part of the value add. Another good example is Formula 1 racing, where a fraction of a percentage point reduction in drag forces on the car body can make the difference between a winning or a losing season.

FluentUsers of CFD models crave higher accuracy and faster run times. The key enabling algorithm for realistic models in CFD is Algebraic Multi-Grid (AMG). This algorithm allows solution times to scale linearly with the number of unknowns in the model; it can be applied to arbitrary geometries with highly refined and unstructured numerical meshes; and it can be run efficiently in parallel. Unfortunately, AMG is also very complex and requires specialty programming and mathematical skills, which are in short supply. Add in the need for GPU programming skills, and GPU-accelerated AMG seems a high mountain to climb. Existing GPU-accelerated AMG implementations (most notably the one in CUSP) are more proofs of concept than industrial strength solvers for real world CFD applications, and highly tuned multi-threaded and/or distributed CPU implementations can outperform them in many cases. Industrial CFD users had few options for GPU acceleration, so NVIDIA decided to do something about it.

NVIDIA partnered with ANSYS, provider of the leading CFD software Fluent to develop a high-performance, robust and scalable GPU-accelerated AMG library. We call the library AmgX (for AMG Accelerated). Fluent 15.0 uses AmgX as its default linear solver, and it takes advantage of a CUDA-enabled GPU when it detects one.  AmgX can even use MPI to connect clusters of servers to solve very large problems that require dozens of GPUs.  The aerodynamics problem in Figure 1 required 48 NVIDIA K40X GPUs, and involved 111million cells and over 440 million unknowns. Continue reading

Accelerated Solution of Sparse Linear Systems

DAGFresh from the NVIDIA Numeric Libraries Team, a white paper illustrating the use of the CUSPARSE and CUBLAS libraries to achieve a 2x speedup of incomplete-LU- and Cholesky-preconditioned iterative methods. The paper focuses on the Bi-Conjugate Gradient and stabilized Conjugate Gradient iterative methods that can be used to solve large sparse non-symmetric and symmetric positive definite linear systems, respectively. The paper also comments on the parallel sparse triangular solver, which is an essential building block in these algorithms.

Read the technical review on our NVIDIA Research Site

Download the white paper or the webinar recording discussing this white paper: www.nvidia.com/webinars.