AmgX: Multi-Grid Accelerated Linear Solvers for Industrial Applications

Many industries use Computational Fluid Dynamics (CFD) to predict fluid flow forces on products during the design phase, using only numerical methods. A famous example is Boeing’s  777 airliner, which was designed and built without the construction (or destruction) of a single model in a wind tunnel, an industry first. This approach dramatically reduces the cost of designing new products for which aerodynamics is a large part of the value add. Another good example is Formula 1 racing, where a fraction of a percentage point reduction in drag forces on the car body can make the difference between a winning or a losing season.

FluentUsers of CFD models crave higher accuracy and faster run times. The key enabling algorithm for realistic models in CFD is Algebraic Multi-Grid (AMG). This algorithm allows solution times to scale linearly with the number of unknowns in the model; it can be applied to arbitrary geometries with highly refined and unstructured numerical meshes; and it can be run efficiently in parallel. Unfortunately, AMG is also very complex and requires specialty programming and mathematical skills, which are in short supply. Add in the need for GPU programming skills, and GPU-accelerated AMG seems a high mountain to climb. Existing GPU-accelerated AMG implementations (most notably the one in CUSP) are more proofs of concept than industrial strength solvers for real world CFD applications, and highly tuned multi-threaded and/or distributed CPU implementations can outperform them in many cases. Industrial CFD users had few options for GPU acceleration, so NVIDIA decided to do something about it.

NVIDIA partnered with ANSYS, provider of the leading CFD software Fluent to develop a high-performance, robust and scalable GPU-accelerated AMG library. We call the library AmgX (for AMG Accelerated). Fluent 15.0 uses AmgX as its default linear solver, and it takes advantage of a CUDA-enabled GPU when it detects one.  AmgX can even use MPI to connect clusters of servers to solve very large problems that require dozens of GPUs.  The aerodynamics problem in Figure 1 required 48 NVIDIA K40X GPUs, and involved 111million cells and over 440 million unknowns. Continue reading