
Case Study

Ansys and the University of Virginia: Rapid Data Generation for Machine Learning

“PyMAPDL is a powerful and versatile tool that serves as a bridge between engineering simulation and modern data science. It allowed us to generate random samples with various areas of damage, check for yielding, and export results within the same software environment. Its Python foundation facilitates dataset creation and preparation specifically tailored for machine learning workflows, combining advanced engineering simulation with the extensive capabilities of Python to solve the complex research problems of today.”

Aya Yehia
Postdoctoral Research Associate, University of Virginia


Rapid Data Generation for Machine Learning with PyAnsys

Machine learning (ML) requires extensive data for training and validation, and experimental data alone is often insufficient. To detect unseen structural damage, engineers use simulation to train graph neural networks (GNNs) on patterns from finite element (FE) models, then feed surface-level strain and displacement information into the trained model. The result is a rapid prediction instead of hours of computation.

Challenges

Structural health monitoring relies on non-destructive evaluation tools to detect damage in structures. Non-contact methods like digital image correlation can infer damage from behavioral changes, but subsurface damage remains invisible. ML shows promise for damage localization, yet it requires thousands of training instances that are impractical to obtain experimentally. Simulations generate the required data by automating randomized subsurface damage scenarios, producing surface strain and displacement data that integrates seamlessly with Python for GNN training.

Technical Decisions

The research required simulations with randomly generated subsurface voids (i.e., voids that do not extend to the surface) within a specified region of interest. This was efficiently handled in Python by defining allowable spatial boundaries and generating randomized coordinates within those constraints. This approach dramatically reduced time and effort compared to manually setting up each variation in a traditional simulation workflow.
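The coordinate-generation step can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the team's actual script: the function name `sample_void_centers`, the spherical void shape, the box-shaped region of interest, and all dimensions are hypothetical. Shrinking the sampling box by the void radius is one simple way to keep each void from reaching the surface.

```python
import numpy as np


def sample_void_centers(n_voids, lower, upper, radius, rng):
    """Draw random void centers inside a box-shaped region of interest.

    Hypothetical helper: the sampling box is shrunk by `radius` on every
    side so a spherical void of that radius never touches the box faces.
    """
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    lo = lower + radius
    hi = upper - radius
    if np.any(lo > hi):
        raise ValueError("void radius is too large for the region of interest")
    # One row per void, columns are the x, y, z coordinates of the center.
    return rng.uniform(lo, hi, size=(n_voids, 3))


# Assumed region of interest and void radius (illustrative values only).
rng = np.random.default_rng(seed=42)
centers = sample_void_centers(
    n_voids=5,
    lower=(0.0, 0.0, 0.0),
    upper=(50.0, 10.0, 3.0),
    radius=0.5,
    rng=rng,
)
print(centers)
```

Seeding the generator makes a given batch of samples reproducible, and no overlap check is applied here, so voids may intersect.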

Engineering Solutions 

  • PyAnsys software was used to generate the training and testing datasets needed to train the GNN to detect subsurface damage in steel.
  • It automated large-scale data generation by systematically simulating partial subsurface voids, producing thousands of randomized damage scenarios that would be impractical to create through physical experimentation alone.
  • Because PyAnsys runs in Python, simulation failures could be caught with robust exception handling, and results passed directly to libraries such as Pandas and NumPy, making it easy to export them in CSV format for downstream ML workflows.
  • Pandas also streamlined post-processing and data management. Node displacement results were automatically captured, labeled with the correct column headers, and exported in a single line of code. A manual workflow would require multiple steps (saving files, editing headers, and reformatting data), introducing both inefficiency and potential for error.
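
The exception handling and export pattern described above might look roughly like the following sketch. It makes no PyAnsys calls: `run_case` is a hypothetical stand-in that returns made-up displacements and fails on purpose for one case, and the column names `case`, `node`, `UX`, `UY`, and `UZ` are assumptions chosen for illustration.

```python
import io

import numpy as np
import pandas as pd


def run_case(case_id, rng):
    """Hypothetical stand-in for one solver run.

    Returns an (n_nodes, 3) array of fake nodal displacements and raises
    for one case to imitate a failed simulation.
    """
    if case_id == 2:
        raise RuntimeError("solver did not converge")
    return rng.normal(scale=1e-3, size=(4, 3))


rng = np.random.default_rng(0)
frames, failed = [], []

for case_id in range(4):
    try:
        disp = run_case(case_id, rng)
    except RuntimeError:
        # Record the failed case and move on instead of stopping the batch.
        failed.append(case_id)
        continue
    df = pd.DataFrame(disp, columns=["UX", "UY", "UZ"])
    df.insert(0, "node", np.arange(1, len(df) + 1))
    df.insert(0, "case", case_id)
    frames.append(df)

results = pd.concat(frames, ignore_index=True)

# The export itself is a single call; a file path could replace the buffer.
buffer = io.StringIO()
results.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

print("failed cases:", failed)
print(csv_text.splitlines()[0])
```

Writing to an in-memory buffer keeps the sketch free of side effects. Passing a file name to `to_csv` instead writes the labeled table to disk.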

Shear strain (XY) in the front view (left) and back view (right) of a randomized dogbone sample with unseen damage

Benefits

Simulation enabled the team to scale up data generation quickly, ensure consistency across datasets, and accelerate analysis for ML applications. It offered flexibility beyond physical experiments: complex and intersecting void geometries, easy adjustment of void size and depth, and the ability to discard failed cases without the time and cost of manufacturing and testing physical samples. Simulation generated thousands of representative dogbone samples within days, work that would have required months in the laboratory. Samples were efficiently randomized to simulate subsurface damage that is invisible during physical inspection. The simulation provided surface-level strain and displacement data equivalent to full-field methods such as digital image correlation. By serving as a surrogate for physical tensile experiments, engineering simulation saved thousands of dollars in materials and testing costs while reducing the research timeline by several months.