October 25, 2023
Design failure mode and effect analysis (DFMEA) is a process that helps engineers understand the impact of potential risks associated with a design. Introducing FMEA in the design phase is a best practice that helps answer questions like:
To understand what DFMEA is, we must start with a clear understanding of failure mode and effects analysis (FMEA). FMEA is a systematic approach to recognize and evaluate potential failures of systems, products, or processes. FMEA identifies the effects and outcomes of failures or actions and helps product developers eliminate or mitigate the impact of failures.
Every product has modes of failure at different levels of integration, from component to system and everything in between. Each failure mode leads to potential impacts on the efficacy, reliability, and safety of the product and presents challenges of detection, mitigation, and prevention. FMEA is a tool that helps address those challenges by:
First developed in the late 1940s by the U.S. military, FMEA techniques were adopted by NASA in the 1960s and later by the automotive industry in the 1970s. Throughout the 1980s, efforts led by the automotive industry helped consolidate and standardize FMEA best practices to optimize the approach as a quality improvement and risk assessment tool at various stages of product development. Today, major industries such as energy and healthcare, in which high reliability and product safety are critical considerations, have adopted the practice of FMEA to meet industry standards (e.g., SAE J1739-FMEA) for supplier and product qualifications.
FMEA is broadly divided into design and process approaches, depending on whether it is applied toward a design of a system/product or a process/workflow. This article focuses specifically on FMEA of a design, which is known as DFMEA.
While DFMEA is valuable in any design process, it is especially critical in industries in which the pace of new product introduction (NPI) and new technology integration is high. New products and technologies inherently have little to no failure history. While assessing similarity to previous products or technologies can be useful, a disciplined approach to identifying likely failure modes and mechanisms based on reliability physics principles and the DFMEA process is critical to risk mitigation. Failure to make DFMEA a critical design-stage tool can result in costly failures showing up in production, qualification testing, or even in the field.
DFMEA helps product teams understand the potential failure modes of designs early in product development so that they can be designed out. It enables the impacts of those failures to be mitigated through elements of the design, methods of detection, or the overall operational and logistics support concept for the product. Some of the industries that have embraced the DFMEA concept include:
DFMEAs can be used throughout the product life cycle, from prototype design until the production phase. The primary objective is to detect potential failures that affect reliability or safety before the product goes to production. The cost of unreliability can be significant, and it increases exponentially the later a failure is detected in the product life cycle, as estimated in Figure 1.
Although the DFMEA process requires certain key resources and time commitments, it is straightforward compared to many other reliability assessment methods that require complex statistical analysis and interpretation. To get maximum benefit from the DFMEA process:
Although the detailed steps of a DFMEA may vary slightly from standard to standard, the core processes of any DFMEA are scoping, failure mode and effects definition, risk assessment, and risk mitigation (Figure 2).
The scope of a DFMEA addresses the level of detail to be considered. For example, a DFMEA scoped at the component level considers failure modes and risk mitigation strategies for each component in the design. A component-level DFMEA might consider failure modes like short, open, loss of capacitance, or high leakage current for a capacitor.
While component-level DFMEAs can be useful, extending the results of component-level DFMEA to system-level impacts and risks is more difficult. As an alternative, a DFMEA can be scoped to the subsystem or even the functional block diagram level. Doing so allows the analysis to begin at a much earlier stage of the design and establishes a baseline for DFMEAs with more refined scopes as the design progresses.
With the scope confirmed, the initial work of the DFMEA team focuses on breaking down the system (consistent with the scope definition), identifying the potential failure modes of each part of the system, and identifying the effect of each mode on the function of the product as perceived by the user. For example, if a user turns a power switch to the “ON” position and the corresponding indicator light does not turn on, the user might state the failure mode as, “Indicator light does not turn on.” The effects of this failure could be stated as, “Inaccurate indication of power-on state” or “High voltage hazard to user due to inaccurate indication of power state.” Each failure mode can have multiple effects and should be considered carefully from all perspectives on the DFMEA team.
Ultimately, the team will quantify the severity of the failure based on the most severe effect by using a numerical severity rating or index (usually between 1 and 10), reflecting the impact on product performance perceived by the user. The DFMEA team should agree on the rating scale to be used prior to initiating the failure definition process. Figure 3 shows an example rating scale given in the SAE J1739 standard.
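The rule that a failure mode takes the rating of its most severe effect can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the `FailureMode` class is hypothetical, and the ratings of 6 and 10 are assumed values chosen for the example rather than figures taken from SAE J1739.

```python
from dataclasses import dataclass, field


@dataclass
class FailureMode:
    """Hypothetical record for one failure mode and its effects."""
    description: str
    # Maps each effect description to its agreed severity rating (1 to 10).
    effects: dict[str, int] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject ratings outside the scale the team agreed on.
        for effect, rating in self.effects.items():
            if not 1 <= rating <= 10:
                raise ValueError(f"severity for {effect!r} must be 1-10")

    def severity(self) -> int:
        """Return the rating of the most severe effect."""
        if not self.effects:
            raise ValueError("no effects recorded for this failure mode")
        return max(self.effects.values())


# Ratings below are illustrative assumptions, not values from a standard.
indicator = FailureMode(
    description="Indicator light does not turn on",
    effects={
        "Inaccurate indication of power-on state": 6,
        "High voltage hazard to user due to inaccurate indication "
        "of power state": 10,
    },
)

print(indicator.severity())  # prints 10
```

Because the hazard to the user carries the higher rating, it sets the severity of the failure mode as a whole, even though the other effect is rated lower.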
The goal of risk assessment is to quantify the overall risk of a failure in terms of severity, probability of occurrence, and the ability to be detected. As you might suspect, severe failures that are likely to occur and hard to detect present the highest risk. Minimally severe failures that are not likely to occur and are easy to detect receive the lowest risk ratings.
The risk rating is a composite score called the risk priority number (RPN), conventionally calculated as the product of the severity, occurrence, and detection ratings, and it is used to rank failure risk from highest to lowest. As with the definition of severity, defining the probability of occurrence and the likelihood of detection requires input from members of the DFMEA team with a broad range of experience across the product life cycle. With the help of the facilitator, a DFMEA team should establish common definitions of probability of occurrence and detection, along with a rating scale, before getting into the definitions of failure.
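As a rough sketch of how such a ranking works, the example below assumes the conventional formulation in which the RPN is the product of the severity, occurrence, and detection ratings, each on a 1 to 10 scale where a higher detection rating means the failure is harder to detect. The `Rating` type is hypothetical, and the numbers assigned to the capacitor failure modes are made up for illustration.

```python
from typing import NamedTuple


class Rating(NamedTuple):
    """Hypothetical row of a DFMEA worksheet."""
    failure_mode: str
    severity: int    # 1 (negligible) to 10 (most severe)
    occurrence: int  # 1 (unlikely) to 10 (almost certain)
    detection: int   # 1 (easily detected) to 10 (hard to detect)

    @property
    def rpn(self) -> int:
        # Assumed formula: RPN = severity x occurrence x detection.
        return self.severity * self.occurrence * self.detection


# Illustrative ratings only; a real team would agree on these values.
ratings = [
    Rating("Capacitor short", severity=9, occurrence=3, detection=4),
    Rating("Capacitor open", severity=7, occurrence=2, detection=5),
    Rating("Loss of capacitance", severity=5, occurrence=4, detection=6),
    Rating("High leakage current", severity=4, occurrence=5, detection=7),
]

# Rank failure risk from highest to lowest RPN.
ranked = sorted(ratings, key=lambda r: r.rpn, reverse=True)

for r in ranked:
    print(f"{r.rpn:4d}  {r.failure_mode}")
```

With these assumed ratings, the ranking runs from high leakage current (140) down to capacitor open (70).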
The final phase in the DFMEA process is concerned with risk mitigation and prevention through execution of a control plan. The control plan details the areas of ownership and accountability, as well as a timeline of completion for each individual prevention or mitigation task. As the product design, characteristics, and specifications change, the actions taken are recorded and the risk ratings are re-evaluated to reflect the implemented changes. This process serves as a communication tool for both upstream and downstream supply chain groups to ensure the potential risk factors are not only identified, but also eliminated or reduced.
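One way to picture a control plan entry is as a record that carries an owner, a due date, and the ratings before and after the action, so that the risk can be re-rated once the change is implemented. The sketch below is hypothetical throughout: the `ControlAction` fields, the threshold of 100, the owners, dates, and ratings are all assumptions made for illustration, and the risk score is assumed to be the product of severity, occurrence, and detection.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical threshold; a team would set its own acceptance criteria.
RPN_THRESHOLD = 100


def rpn(ratings: tuple[int, int, int]) -> int:
    """Assumed formula: severity x occurrence x detection."""
    severity, occurrence, detection = ratings
    return severity * occurrence * detection


@dataclass
class ControlAction:
    """Hypothetical control plan entry for one mitigation task."""
    failure_mode: str
    task: str
    owner: str
    due: date
    before: tuple[int, int, int]  # ratings before the action
    after: tuple[int, int, int]   # ratings re-assessed after the action


def still_open(actions: list[ControlAction],
               threshold: int = RPN_THRESHOLD) -> list[ControlAction]:
    """Return actions whose re-rated risk still meets or exceeds the threshold."""
    return [a for a in actions if rpn(a.after) >= threshold]


# Illustrative entries; all names, dates, and ratings are made up.
actions = [
    ControlAction("ESD damage at external connector",
                  "Add external protection device", "Design",
                  date(2024, 1, 15), before=(8, 5, 6), after=(8, 2, 3)),
    ControlAction("Solder joint fatigue",
                  "Revise board-level interconnect design", "Production",
                  date(2024, 2, 1), before=(7, 4, 6), after=(7, 3, 5)),
]

for a in actions:
    print(f"{a.failure_mode}: RPN {rpn(a.before)} -> {rpn(a.after)} "
          f"(owner: {a.owner}, due: {a.due})")

print([a.failure_mode for a in still_open(actions)])
```

In this example the first action brings the score from 240 to 48, below the assumed threshold, while the second only drops from 168 to 105 and therefore stays on the open list for further mitigation.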
A major telecommunications products company serving the defense and military sector requested that the Ansys Reliability Engineering Services (RES) team facilitate a DFMEA on a printed circuit board assembly (PCBA) for a next-generation GPS product. The RES team scoped and facilitated the analysis at the block level, taking into consideration all components that make up the circuit of each block.
A team representing design, production, supplier quality, and supply chain management performed the analysis and identified key risk factors against the established ranking and threshold criteria. Drawing on experience with similar systems, the RES team also contributed opportunities for improvement in the design, including PCB manufacturing guidelines and best practices to ensure high reliability, selection of the right quality grade for components, and protection strategies to mitigate electrostatic discharge and electrical overstress (ESD/EOS) failures.
The customer followed through on the control plan, which resulted in significant cost savings and prevention of field failures by implementing second-level interconnect improvements at board level, as well as adopting external protection techniques to mitigate ESD and EOS failures in the field.
If you have an impending product launch and are looking to assess reliability of the product and are unsure of the potential risks prior to manufacturing, please submit your request here or contact the Ansys Reliability Engineering Services Sales team at 301-640-5831 to schedule a session to discuss opportunities.