Accelerating ANSYS Fluent Simulations with NVIDIA GPUs

By Vijay Sellappan, Applied Engineer, and Bhushan Desam, Senior Alliances and Marketing Manager, NVIDIA Corporation, Santa Clara, U.S.A.

ANSYS Fluent supports GPUs so engineers can meet project schedules and get robust products to market faster.

External aerodynamic flow simulation

ANSYS Fluent software supports solver computation on NVIDIA® graphics processing units (GPUs), helping engineers reduce the time required to explore many design variables, optimize product performance and meet design deadlines. Integration of AmgX, a library of GPU-accelerated solvers developed by NVIDIA, within Fluent makes this possible. By adding GPUs to existing clusters and workstations, engineers can reduce time to solution by up to half. In addition to speeding up simulation, a GPU-equipped system can consume less energy to complete the same job than a CPU-only system.

Activating the GPU feature is straightforward, but will all Fluent simulations benefit from employing GPUs? Read on.


GPUs are supported in the most recent release of all ANSYS HPC license products including ANSYS HPC, ANSYS HPC Pack and ANSYS HPC Workgroup. Further, each GPU is treated as a CPU core in terms of licensing, so users can gain higher productivity through GPU simulations.


The algebraic multigrid (AMG) solver for Fluent simulations can be computationally intense, and computing requirements grow as the number of cells in the domain increases. Problems that contain fewer than a few million cells do not gain speed from GPUs because of the communication overhead incurred in transferring matrices between CPUs and GPUs. However, speedup is significant for meshes that contain tens or hundreds of millions of cells because the overhead is relatively small compared to the computing time in the AMG solver.

Would a coupled solver or a segregated solver benefit most from GPUs? In flow-only problems, typically the coupled solver spends about 60 percent to 70 percent of its time solving the linear system using AMG, making GPUs a good choice. Since the segregated solver spends only 30 percent to 40 percent of its time in AMG, GPUs may not be advantageous because of memory transfer overhead costs.

You can determine the AMG portion in a Fluent calculation (and therefore whether it is a good candidate for GPU employment) by adding the following command to the journal file for a CPU run:


The information is reported near the end of the output file after successful completion of calculations. In the sample shown, the AMG portion is nearly 75 percent, so it is a good candidate for GPU implementation.

LE wall-clock time per iteration: 12.299 sec (74.8%)
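As a rough illustration, the reported share could be pulled out of a saved output file with a small script. This is a minimal sketch: the regular expression is modeled only on the single sample line above, the function name `amg_fraction` is hypothetical, and the real report layout may differ between Fluent versions.

```python
import re

# Pattern modeled only on the sample line quoted in this article;
# the actual report format is an assumption and may vary by version.
LE_LINE = re.compile(
    r"LE wall-clock time per iteration:\s*([0-9.]+)\s*sec\s*\(([0-9.]+)%\)"
)


def amg_fraction(report_text):
    """Return (seconds_per_iteration, fraction_of_total), or None if absent."""
    match = LE_LINE.search(report_text)
    if match is None:
        return None
    seconds = float(match.group(1))
    fraction = float(match.group(2)) / 100.0
    return seconds, fraction


sample = "LE wall-clock time per iteration: 12.299 sec (74.8%)"
seconds, fraction = amg_fraction(sample)
print(f"Linear-solver share of each iteration: {fraction:.1%} ({seconds} s)")
```

Returning `None` when the line is missing keeps the script from silently reporting a zero share for a run that never printed timing data.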

In addition, stiff matrices are difficult to solve and therefore require more iterations in the AMG solver, which makes them well suited to GPUs. External aerodynamic calculations over automobiles and airplanes combine all of these factors, so they can benefit significantly from using GPUs with ANSYS Fluent.

CPU vs CPU+GPU cost/benefit
A Fluent truck benchmark model consisting of 14 million cells was used but reconfigured as a steady-state pressure-based coupled solver problem. When running on 64 Intel® Xeon® E5-2680 CPU cores on a four-node cluster, the number of jobs completed to full convergence was about 16 per day. The number of jobs increased to 25 per day when eight NVIDIA Tesla® K40 GPUs were added to the system.
CPU vs CPU+GPU cost/benefit
To examine the GPU performance and value for a large-scale CFD simulation, a generic Formula 1 car model with 140 million cells was run in a steady-state mode with the pressure-based coupled solver. Performance was evaluated based on the time taken per iteration over a period of 1,000 iterations. Adding GPUs decreased the time to solution by a factor of 2.1, while delivering 110 percent additional productivity at 55 percent additional system cost.


A critical performance metric to consider when evaluating GPUs is job throughput per day or speedup factor based on wall-clock time.

Fluent Speedup Factor

These metrics depend on the AMG portion of the total solution and associated speedup of that portion on GPUs.
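One common way to reason about this dependence is Amdahl's law, in which only the AMG share of the run is accelerated. The sketch below is an assumption about how such an estimate could be made, not necessarily the formula behind the article's speedup figure; the function name and the 3x AMG speedup are hypothetical, and the estimate ignores CPU-GPU transfer overhead.

```python
def overall_speedup(amg_fraction, amg_speedup):
    """Amdahl's-law estimate: only the AMG share of the run gets faster."""
    if not 0.0 <= amg_fraction <= 1.0:
        raise ValueError("amg_fraction must be between 0 and 1")
    if amg_speedup <= 0.0:
        raise ValueError("amg_speedup must be positive")
    return 1.0 / ((1.0 - amg_fraction) + amg_fraction / amg_speedup)


# Hypothetical 3x acceleration of the AMG portion, chosen for illustration.
# The AMG shares are midpoints of the ranges quoted earlier in the article.
cases = [("coupled, 65% in AMG", 0.65), ("segregated, 35% in AMG", 0.35)]
for label, share in cases:
    print(f"{label}: {overall_speedup(share, 3.0):.2f}x overall")
```

Under these assumed numbers the coupled solver gains noticeably more than the segregated solver, which matches the article's qualitative guidance.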

GPU performance on external aerodynamics problems along with its value proposition is explored in the graphics.

To accurately account for the value proposition of GPUs, you must consider the system cost of both hardware and software, as well as the overall productivity improvements. Take the CPU-only system (including memory, high-speed interconnect and the associated licenses) as the baseline: its cost is 100 percent, and the 16 jobs per day it delivers in the truck benchmark count as 100 percent benefit. Adding eight GPUs increases total system cost by 25 percent, while the GPUs deliver 56 percent additional throughput per day. This demonstrates the value of GPUs in Fluent for aerodynamic calculations.
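The comparison can be reproduced with a few lines of arithmetic. This is a minimal sketch using only figures quoted in the article; the helper name `value_ratio` and the "jobs per unit cost" measure are illustrative choices rather than a metric the authors define.

```python
def value_ratio(throughput_gain, cost_increase):
    """Throughput per unit cost relative to the CPU-only baseline.

    Both arguments are fractional increases, e.g. 0.25 for 25 percent.
    """
    return (1.0 + throughput_gain) / (1.0 + cost_increase)


# Truck benchmark: 16 -> 25 jobs per day for 25 percent more system cost.
truck_gain = 25 / 16 - 1
print(f"Truck: {truck_gain:.0%} more throughput, "
      f"{value_ratio(truck_gain, 0.25):.2f}x jobs per unit cost")

# Formula 1 model: 110 percent more productivity for 55 percent more cost.
print(f"F1: {value_ratio(1.10, 0.55):.2f}x jobs per unit cost")
```

A ratio above 1.0 means the GPU-equipped system completes more work per unit of total system cost than the CPU-only baseline.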

GPU acceleration of single-phase coupled flow problems is not just limited to aerodynamics simulations; it also includes internal flows. However, the Fluent 15.0 GPU capability is not yet offered for modeling other physical phenomena such as detailed chemical kinetics, radiation modeling with discrete ordinates and multiphase flows. Some of these features will be available in future versions along with performance improvements for the AmgX library through ongoing collaboration between ANSYS and NVIDIA.

Power draw - CPU vs CPU+GPU
The instantaneous power drawn by a 24-core CPU-only system was compared with that of a CPU plus GPU system doing the same job. The CPU system drew 471 watts on average over a period of 2,651 seconds, which totals roughly 350 watt-hours. Although the CPU plus GPU system drew an average of 600 watts, acceleration let it complete the job in 1,302 seconds, so it consumed only 217 watt-hours. Compared with the CPU-only system, the GPU system delivered about 38 percent energy savings, which can help organizations achieve energy-efficiency objectives.
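The energy figures follow from multiplying average power by run time. The sketch below redoes that arithmetic from the quoted measurements. Without intermediate rounding the CPU-only run comes to about 347 watt-hours and the savings to about 37 percent, so the article's 350 watt-hours and 38 percent appear to be rounded values.

```python
def watt_hours(average_watts, seconds):
    """Energy in watt-hours from average power and elapsed time."""
    return average_watts * seconds / 3600.0


cpu_only = watt_hours(471, 2651)   # CPU-only run
cpu_gpu = watt_hours(600, 1302)    # CPU plus GPU run
savings = 1.0 - cpu_gpu / cpu_only

print(f"CPU only: {cpu_only:.0f} Wh")
print(f"CPU+GPU:  {cpu_gpu:.0f} Wh")
print(f"Savings:  {savings:.0%}")
```

The higher average power draw of the GPU system is more than offset by the shorter run time, which is why total energy falls.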


Big enterprises running simulations in large CAE clusters want to drive down energy consumption to reduce costs and/or meet broader corporate sustainability initiatives. At the same time, researchers and engineers demand high levels of computing power to model complex simulations and explore large design spaces. GPUs can fill this gap as they are optimized for higher throughput and performance per watt. In fact, large installations of GPUs are typically included in supercomputers to manage energy costs. The same benefits also apply to ANSYS Fluent simulations.


  • Use NVIDIA Tesla GPUs for servers and workstations; use Quadro® GPUs for workstations.
  • Choose a Tesla GPU such as the K40 or K80, or a high-end Quadro such as the K6000.
  • Cards with 12 GB to 24 GB of memory per GPU and high double-precision capacity are recommended.
  • GeForce® GPUs, gaming class cards, are not recommended.


GPU acceleration of ANSYS Fluent, the result of an innovative GPU-based AMG solver developed by NVIDIA in collaboration with ANSYS, supports multiple GPUs to benefit those who are solving coupled flow problems that demand high solver computing power. GPUs demonstrated considerable acceleration in external aerodynamic benchmarks. This capability allows engineers to complete more simulations in the same time, and to simulate larger and more complex models without project schedules slipping. Furthermore, a GPU-equipped system consumes less energy to do the same job in ANSYS Fluent than a CPU-only system, which saves energy for large enterprises. GPU-accelerated simulations can reduce product development times, so companies can deliver higher-quality products and decrease time to market, resulting in a competitive advantage for these businesses.
