Accelerating Mechanical Solutions Using the Latest Intel Technologies

By Wim Slagter, Lead Product Manager, and Jeff Beisheim, Lead Software Developer, ANSYS

Advances in ANSYS 16.0 and Xeon technology address the high-performance computing needs of Windows users.

ANSYS 16.0 benchmark
The benchmark suite was run using ANSYS 16.0 on two very similar systems: one containing two Intel Xeon E5-2670 (Sandy Bridge, 2.6 GHz, 16 total cores) processors, and one with two Intel Xeon E5-2697 v3 (Haswell, 2.6 GHz, 28 total cores) processors. The geometric mean of the total elapsed times for each benchmark run with 1, 2, 4, 8 and 16 cores was used to generate the times shown in the table. The Haswell system is on average 20 percent faster than E5 v2 for the iterative solver benchmarks and 40 percent faster than E5 v2 for the direct solver benchmarks.
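
As an illustration of how several elapsed times can be combined into one summary figure, the sketch below computes a geometric mean. The timing values and the names geometric_mean and elapsed_seconds are made-up placeholders, not data or code from the benchmark suite.

```python
import math

def geometric_mean(values):
    """Return the geometric mean of a sequence of positive numbers."""
    if not values:
        raise ValueError("need at least one value")
    if any(v <= 0 for v in values):
        raise ValueError("values must be positive")
    # Average the logarithms, then exponentiate; this avoids the overflow
    # that a direct product could cause for long lists.
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Hypothetical elapsed times in seconds (not measured results).
elapsed_seconds = [100.0, 400.0]
print(geometric_mean(elapsed_seconds))
```

A geometric mean is less dominated by the longest-running cases than an arithmetic mean, which is one common reason to use it when summarizing benchmark times.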

Organizations using ANSYS structural mechanics simulation software expect the accuracy, efficiency and throughput required to generate reliable designs as quickly as possible. ANSYS has worked with Intel to make sure that these companies can leverage the latest Intel® Xeon® E5 v3 processors and Xeon Phi™ coprocessors for their simulation workloads.

Structural mechanics simulations often require large amounts of computing resources, including memory, disk space and I/O, which affects the time spent computing. Because CPU clock rates are not increasing as quickly as they were a decade ago, faster CPUs cannot be counted on to maintain the computing pace. The new performance paradigm is parallel computing, which leverages the number of CPU cores, a figure that grows every couple of years, to deliver more computations per clock cycle. This has resulted in significant performance gains for structural simulation software. But engineers are always trying to minimize simulation times so that they can increase the complexity of their models (for example, by increasing mesh density or adding nonlinear behavior) or simply run more simulations in a given period of time.

One way to speed up structural mechanics simulations is to make full use of the latest available hardware. The computer industry has delivered enormous increases in computing performance with continued platform advancements, including more compute cores per CPU, an integrated I/O processor (yielding higher memory bandwidth), additional and faster memory (channels), larger L3 cache, faster disk storage (such as solid-state drives for ANSYS Mechanical), faster interconnects, and Intel Advanced Vector Extensions 2 (AVX2) support. Intel and ANSYS continue to work together so that ANSYS solutions can take advantage of these hardware advances.

LEVERAGING INTEL XEON E5 V3 PROCESSORS

ANSYS structural mechanics products have supported parallel processing for more than two decades, allowing engineers to effectively use multi-core processors and/or clusters to speed up their simulations. With the launch of release 16.0, ANSYS continues its sustained investment by adding capabilities to exploit the latest Intel processor technologies.

With Intel’s latest Xeon E5 v3 processors, ANSYS users will see a significant reduction in simulation runtimes, mainly due to the additional cores (up to 18), Intel AVX2 support, larger L3 cache (up to 35 MB), and higher memory speed (up to 2,133 MHz). ANSYS Mechanical 16.0 shows improved performance on the E5 v3 generation of Intel processors, code-named Haswell. The E5 v3 system is on average 20 percent faster than E5 v2 for iterative solver benchmarks (usually good measures of memory bandwidth) and on average 40 percent faster than E5 v2 for direct solver benchmarks (usually good measures of raw compute speed).

LEVERAGING INTEL XEON PHI COPROCESSORS

To deliver faster engineering simulation technology using cutting-edge hardware, ANSYS has worked with NVIDIA since the release of ANSYS 13.0 to develop and release parallel solver execution on general-purpose graphics processing units (GPUs). GPUs can now speed up fluids, structural and electromagnetic simulations to increase the value of ANSYS high-performance computing (HPC) capabilities.

Recently, Intel released the Xeon Phi series of coprocessors, which are similar in design to high-end GPUs. They are full-height cards that plug into a PCI Express slot and require at least 200 watts of additional power. However, the coprocessors are not meant for graphics and have no connections for graphical display output (for example, HDMI or DisplayPort). Each Xeon Phi coprocessor contains roughly 60 cores that can perform computations at just over 1 teraflop and has 8 GB to 16 GB of GDDR5 memory to provide significant memory bandwidth. This new hardware accelerator can potentially speed up structural mechanics simulations.

IMPLEMENTATION

Simulation speedup factors
Overall simulation speedup factors using Intel Xeon Phi coprocessors with ANSYS Mechanical 16.0

Benchmark setup

Before starting the implementation to support Xeon Phi coprocessors in structural mechanics products, ANSYS required that:

  • The user experience must be straightforward and simple.
  • Xeon Phi hardware must never slow down the simulation and, when applicable, should accelerate it.
  • Xeon Phi must not compromise the accuracy of the solution.

To utilize the Xeon Phi coprocessor to speed up ANSYS structural mechanics simulations, the software uses the GPU accelerator capability. Although Xeon Phi allows for other execution models, the GPU accelerator was a natural fit to introduce this coprocessor. Because the sparse direct solver is the default solver and is commonly used for all types of analyses, this linear equation solver was the best place to start.

ANSYS Mechanical 15.0 supported Xeon Phi coprocessors with shared-memory parallelism on Linux® platforms only. However, distributed-memory parallelism typically provides more significant speedup than shared-memory parallelism, and ANSYS structural mechanics software is often run on the Windows® platform. ANSYS Mechanical 16.0 supports both shared-memory and distributed-memory parallelism on both Linux and Windows platforms. Virtually all ANSYS users, including those who have access to clusters in which each compute node contains one or more coprocessors, can accelerate structural mechanics simulations using Xeon Phi coprocessors.

ANSYS 16.0 — Efficiency and Robustness

With the launch of release 16.0, ANSYS continues its sustained investment to improve efficiency and robustness for structural mechanics simulations.

Key improvements in solver numerics allow faster and more robust simulations.

  • Numerous enhancements improve the convergence of nonlinear analyses.
  • Sparse solver improvements allow more jobs to run in-core, leading to better solver performance.

Numerous improvements were made in the area of distributed memory parallel computing.

  • Domain decomposition has been further improved, leading to faster performance and better scaling, particularly at higher core counts.
  • Newly added capabilities include support for inertia relief, the QRDAMP eigenvalue extraction method (in modal analysis) and the mode-superposition method (in harmonic and transient analysis).

USING XEON PHI ACCELERATION

To enable the use of Xeon Phi hardware within ANSYS Mechanical, activate the GPU accelerator capability when launching the software by adding the -acc intel option to the list of command line arguments. You can also select how many Xeon Phi coprocessors to use with -na N, where N is an integer greater than 0. (The software defaults to 1, a single coprocessor.)
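
A minimal sketch of assembling these two options in a launch script is shown below, with the options written using plain ASCII hyphens. The helper name xeon_phi_args and the executable name ansys_launcher are placeholders invented for illustration; substitute the actual launcher command for your installation.

```python
def xeon_phi_args(num_coprocessors=1):
    """Build the command line arguments that request Xeon Phi acceleration."""
    is_int = isinstance(num_coprocessors, int) and not isinstance(num_coprocessors, bool)
    if not is_int or num_coprocessors < 1:
        raise ValueError("num_coprocessors must be an integer greater than 0")
    # -acc intel activates the accelerator capability for Xeon Phi;
    # -na N sets how many coprocessors to use.
    return ["-acc", "intel", "-na", str(num_coprocessors)]

# "ansys_launcher" is a placeholder, not the real executable name.
command = ["ansys_launcher"] + xeon_phi_args(2)
print(" ".join(command))
```

Keeping the arguments as a list rather than a single string makes it straightforward to pass them to a process-launching call without shell quoting issues.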

ANSYS Workbench users can easily enable this feature during solution by modifying the GPU acceleration option on the Advanced Properties page of the Solve Process Settings. Select INTEL in the associated drop-down box and then choose the number of Xeon Phi coprocessors to enable during the simulation. Activating this capability requires one additional HPC license for each coprocessor.

Once activated, this capability will accelerate the solution, when possible, by automatically using the Xeon Phi hardware. No user input is required. In cases in which acceleration is not possible, the CPU core(s) will continue to be used, and the Xeon Phi feature will have no effect on the progress of the solution.


PERFORMANCE

ANSYS conducted a series of standard benchmarks for ANSYS Mechanical to obtain performance data. The benchmarking used a workstation running Windows 7 x64 SP1 with 128 GB of RAM and two Intel Xeon E5-2670 (2.6 GHz) processors with a total of 16 CPU cores. Two Xeon Phi 7120A coprocessors were installed in the workstation.

The results showed that using a Xeon Phi always provides some level of acceleration. However, the amount of acceleration achieved varies greatly from benchmark to benchmark, and it also depends on the number of CPU cores involved. With two CPU cores and a single Xeon Phi coprocessor, an average speedup of 2.1 times is achieved for the entire simulation, compared to using only two CPU cores. With 16 CPU cores, the addition of the two Xeon Phis provides on average 1.4 times speedup for the overall simulation. Because the performance varies for each benchmark, some guidelines are required to understand which structural mechanics models are expected to achieve the most acceleration when using a Xeon Phi coprocessor.
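
To put these speedup factors in terms of elapsed time, the sketch below converts a speedup into the fraction of baseline runtime saved, assuming speedup is defined as baseline time divided by accelerated time. The function name runtime_fraction_saved is illustrative, and applying the conversion to an average speedup only approximates what any individual benchmark would show.

```python
def runtime_fraction_saved(speedup):
    """Fraction of the baseline elapsed time removed by a given speedup."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    # speedup = t_baseline / t_accelerated, so
    # t_accelerated / t_baseline = 1 / speedup.
    return 1.0 - 1.0 / speedup

cases = [("2 CPU cores + 1 Xeon Phi", 2.1), ("16 CPU cores + 2 Xeon Phis", 1.4)]
for label, speedup in cases:
    print(f"{label}: {runtime_fraction_saved(speedup):.0%} less elapsed time")
```

Under that definition, a 2.1 times speedup corresponds to roughly 52 percent less elapsed time, and a 1.4 times speedup to roughly 29 percent less.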

USAGE GUIDELINES

The amount of acceleration gained from using the Xeon Phi coprocessor varies greatly with the hardware used and the model simulated. These guidelines can help to determine whether the coprocessor will provide a performance boost.

Using newer, faster CPU hardware typically decreases the amount of speedup achieved when using a Xeon Phi card. Using more CPU cores per Xeon Phi coprocessor also decreases the amount of speedup achieved. If one or more coprocessors are requested, all available coprocessors are used. However, for performance reasons, the number of processes per Xeon Phi coprocessor is limited to a maximum of eight.

Certain classes of simulation are expected to achieve more acceleration when using a Xeon Phi. For ANSYS Mechanical simulations, more acceleration is achieved when:

  • The sparse solver is running in the in-core memory mode.
  • The assembled matrix size is greater than 2 million equations.
  • Models are three-dimensional, have bulkier or thicker geometry, contain higher-order element types or include certain types of boundary conditions (for example, constraint equations).
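
The guidelines above can be turned into a simple pre-run checklist, sketched below. This is only an illustration: the function and parameter names are hypothetical, it covers only some of the listed conditions, and it does not predict an actual speedup.

```python
# Threshold taken from the guideline above: more than 2 million equations.
EQUATION_THRESHOLD = 2_000_000

def favorable_factors(in_core, num_equations, three_dimensional,
                      higher_order_elements):
    """List which of the guideline conditions a model satisfies."""
    factors = []
    if in_core:
        factors.append("sparse solver runs in-core")
    if num_equations > EQUATION_THRESHOLD:
        factors.append("more than 2 million equations")
    if three_dimensional:
        factors.append("three-dimensional model")
    if higher_order_elements:
        factors.append("higher-order elements")
    return factors

# Hypothetical model description (not one of the benchmark models).
print(favorable_factors(in_core=True, num_equations=3_500_000,
                        three_dimensional=True, higher_order_elements=False))
```

The more conditions a model satisfies, the more likely it is to benefit, but the only reliable measure remains a timed run with and without the coprocessor.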

INCREASING VALUE THROUGH ONGOING COLLABORATION

As the computing power provided by hardware vendors increases, ANSYS will continue to harness the full potential of this new technology. As vendors provide more parallel hardware, ANSYS developers continue to parallelize more algorithms in the software. For structural mechanics simulations, these efforts are critical to ensure that companies can meet competitive demands to deliver innovative and robust products to market by performing increasingly complex simulations in a short time.

Intel and ANSYS will continue to work together to provide optimized and tested solutions that deliver value. For these new types of hardware accelerators, like Xeon Phi coprocessors, the main limitation is the number of computations that can be offloaded onto the accelerator device. Future Xeon Phi products aim to offer the ability to accelerate more computations as well as to remove the limitation of transferring data to the device (via the PCI Express channel).
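
Why the offloadable share of the work matters can be illustrated with Amdahl's law, a general model that is not taken from the ANSYS benchmarks. The sketch below assumes a fraction of the runtime is offloaded and runs some factor faster on the device, and it ignores data-transfer cost entirely; the function name and the numbers are illustrative.

```python
def overall_speedup(offload_fraction, device_speedup):
    """Amdahl's law: overall speedup when only part of the work is accelerated."""
    if not 0.0 <= offload_fraction <= 1.0:
        raise ValueError("offload_fraction must be between 0 and 1")
    if device_speedup <= 0:
        raise ValueError("device_speedup must be positive")
    # The non-offloaded part keeps its original cost; the offloaded part
    # shrinks by the device speedup factor.
    return 1.0 / ((1.0 - offload_fraction) + offload_fraction / device_speedup)

# Illustrative values only: half the runtime offloaded, 10 times faster on the device.
print(round(overall_speedup(0.5, 10.0), 2))
```

With only half of the runtime offloaded, even a very fast device cannot double the overall speed, which is why accelerating more of the computation is as important as making the device itself faster.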
