FMEA Explained: What Is It and How Do You Implement It?

Jonathan Trout, Noria Corporation

FMEA is an analytical tool used in the design phase to help mitigate risk and failures during processes. Below we'll discuss the main types of FMEA and how to implement it into your maintenance processes.

FMEA: What Is It?

Failure mode and effects analysis (FMEA) is a step-by-step process for anticipating things that could go wrong during the design stage by identifying all possible failures in the design, manufacturing and assembly processes. In other words, it's a structured approach to discover ways in which a process can fail (failures) and ways those failures lead to waste, defects or dangerous outcomes (effects). As part of the root cause analysis process, FMEA helps you minimize and limit these failures.

Failure Modes: The ways in which something could fail. Failures refer to any errors or defects in a process or product that affect the customer or overall outcome. These errors can be potential or actual.
Effects Analysis: The process of studying the consequences of the discovered failures.

FMEA is broken down into two fairly broad categories: Design FMEA and Process FMEA. Each category addresses failures in a different scenario.

Design FMEA

As the name implies, Design FMEA looks at potential risks in the new or changed design of a product or service. Design FMEA assigns severity or danger rankings to design functions, failure modes, and the effects those failures could have on the customer.

Once failure modes and severity rankings are identified, causes of the failures are sought out and identified to help with preventive maintenance schedules. For example, the occurrence ranking in the Design FMEA process (which we will discuss later) helps determine high-probability causes, initiating action to prevent failures from occurring.

So, when should you use Design FMEA?

When there's a new design with new processes or new content
When a current design has been modified, which could also include changes from past failures
When a current design is being used in a new environment (no physical design change)

Design FMEA is great for identifying risks on a program as early as possible and mitigating failures as proactively as possible.

Process FMEA

Process FMEA looks at potential failures impacting product quality, diminished process reliability, customer dissatisfaction, and environmental safety and hazards due to human error, materials and machines used, environmental factors, and more.

Once potential failures are identified, severity rankings are assigned to each one. Process FMEA dissects all the steps in a current process and analyzes them individually to identify risks and possible errors.

Use Process FMEA when:

New technology or processes are being introduced
Your current process has been modified due to updated processes or continuous improvement
Your current process is being used in a new environment or location (no physical change made to the process itself)

Identifying risks of new technology and processes helps proactively prevent failure through preventive maintenance scheduling.

When to Use FMEA

Performing FMEA is a good idea in several circumstances to ensure risk is mitigated and your operation is running smoothly, safely, and at maximum capacity.

Most people use Design and Process FMEA when:

A process, product or service is new and being designed, or an old product or service is being redesigned
A current process is being used in a new way
You are preparing to develop control plans for a new or modified process
You're about to implement improvement goals to an existing process, product, or service
You are looking into failures of an existing process, product, or service

It's also a good idea to do FMEA periodically throughout the life of the process. Consistently examining quality and reliability ensures processes are improved, giving you optimal results.

FMEA Criteria for Analysis

Now that you know what FMEA is, let's take a look at the three criteria it uses to analyze potential failure modes:

Severity: The severity ranking helps you determine and rank what is most important to your operation. Things like safety standards, environment, legal, production consistency, waste, and even damaged reputation can be considered. When evaluating severity, your team ranks a failure's severity on a scale from 1 to 10, with 1 being "low impact" and 10 being "high impact." Impact encompasses risk to the customer and/or the manufacturing process.
Occurrence: The occurrence ranking shows the probability of a failure happening during the lifetime of your process, product, or service. Your team ranks the occurrence probability on a scale of 1 to 10, with 1 being "not likely to occur" and 10 being "inevitable."
Detection: How likely are you to detect a problem before it occurs? The detection ranking helps you quantify the probability of a failure being caught early and action taken to prevent it from happening altogether. Your team determines the detection ranking on a scale of 1 to 10, with 1 being "very likely to be detected" and 10 being "not likely to be detected."

These criteria (and their ranking numbers) make up the equation to calculate a risk priority number (RPN) for each failure mode. The RPN calculation looks like this:

RPN = Severity (S) x Occurrence (O) x Detection (D)

Let's take a look at a simple example to see how to calculate RPN. We've performed FMEA for a new espresso machine for our coffee shop. The management and staff determined if the espresso machine failed, it would greatly impact business by not producing coffee, which would lead to lost sales, angry customers, and eventually loss of business from those angry customers.

They assign the espresso machine a severity rating of 10. The team decides the frequency with which a failure could occur on the espresso machine is fairly low, so they rank the occurrence factor as a 4.

Finally, the team determines that by properly cleaning and maintaining the espresso machine each day, listening for abnormal noises, and monitoring production quality, they can detect a potential failure fairly easily. They rank the detection part of the equation as a 2.

This gives an RPN of 80 (10 (s) x 4 (o) x 2 (d) = 80).
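The espresso machine calculation can be sketched in a few lines of Python. This is only an illustration: the function name risk_priority_number and the range check are assumptions added here, not part of any FMEA standard or tool.

```python
def risk_priority_number(severity: int, occurrence: int, detection: int) -> int:
    """Multiply the three 1-10 rankings to get the RPN."""
    rankings = {"severity": severity, "occurrence": occurrence, "detection": detection}
    for name, value in rankings.items():
        # Each ranking is scored on the 1-10 scale described above.
        if not 1 <= value <= 10:
            raise ValueError(f"{name} must be between 1 and 10, got {value}")
    return severity * occurrence * detection


# Espresso machine example: severity 10, occurrence 4, detection 2.
espresso_rpn = risk_priority_number(severity=10, occurrence=4, detection=2)
print(espresso_rpn)  # 80
```

Because each ranking runs from 1 to 10, the RPN always falls between 1 and 1,000.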

In most cases, severity can't be reduced, so concentrate on decreasing occurrence and improving detection (which lowers the detection ranking). It may be tempting to focus only on high RPN scores, but be sure you address anything with a high severity score, regardless of the total RPN. For example, you could have an equation like this:

10 (s) x 2 (o) x 2 (d) = 40

Even though the overall RPN is fairly low, the severity is at the maximum and should be assessed.
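One way to make sure high-severity items aren't overlooked is to check severity separately from the RPN. The sketch below assumes a severity cutoff of 9 (matching the "9 or 10" guidance in step two of the process) and an RPN cutoff of 100. The RPN cutoff and the function name needs_action are purely illustrative, so your team should set its own values.

```python
SEVERITY_ALERT = 9  # assumption: flag severity rankings of 9 or 10
RPN_ALERT = 100     # assumption: illustrative cutoff, not an industry standard


def needs_action(severity: int, occurrence: int, detection: int) -> bool:
    """Flag a failure mode if its severity OR its RPN is high."""
    rpn = severity * occurrence * detection
    return severity >= SEVERITY_ALERT or rpn >= RPN_ALERT


print(needs_action(10, 2, 2))  # True: RPN is only 40, but severity is 10
print(needs_action(4, 2, 2))   # False: low severity and an RPN of 16
```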

FMEA Process

The FMEA process is conducted in a step-by-step fashion because each step builds on the previous one as you work through the analysis process. FMEA is done in seven steps, each one carefully designed to make the analysis quick and effective.

Step 1: Assemble the FMEA Team and Review the Process

FMEA is a team effort, and a team approach is vital to its success. Your FMEA team should be led by a responsible manufacturing engineer or lead technician. Team members can include design and process engineers, materials suppliers, and even customers.

Once your team is assembled, it's time to do some pre-work; that is, to collect and create key prep documents. Prep documents should include:

Information on past failures: This could be from past FMEA experiences or one-off incidents.
Boundary diagrams (helpful for Design FMEA): A boundary diagram is a graphical representation of the relationships between subsystems, assemblies and components, as well as how they work with other systems and environments.
Parameter diagram (helpful for Design FMEA): A parameter diagram takes the inputs from a system or customer and relates them to the desired outputs of the design the engineer is creating, all while considering non-controllable or outside influences, according to J.M. Juran in his book Quality Planning and Analysis.
Process flow diagram (helpful for Process FMEA): A process flow diagram is used in process engineering to indicate the general flow of plant processes and equipment. It only shows the relationships between major equipment in a plant and not minor equipment or details like piping or wiring.
Characteristics matrix (helpful for Process FMEA): The characteristics matrix is a tool to illustrate the relationship between product characteristics and process operations.

Engineering consultant Quality-One International also recommends putting together a pre-work checklist to maximize FMEA efficiency. Your checklist can include things like:

Requirements
Preliminary bill of materials/components
Baseline FMEA (past FMEA)
Previous tests and control methods used on similar products
Known causes from surrogate products
Potential causes from interfaces
Potential causes from design choices
Potential causes from noises and environments

Step one is a great time to consider all the ways each component could fail. Reviewing existing documentation and data likely will reveal several potential failures for each component in question. Brainstorm an exhaustive list initially and then pare down or combine items generated by the original list.

Murphy's law states, "Anything that can go wrong will go wrong." During the step-one brainstorming session, keep this in mind when identifying functions, processes, systems and components that have the potential to fail.

Step 2: Determine the Severity Ranking

Using an FMEA template or outline, it's time to add the functions, failure modes and effects to determine severity rankings. When listing the functions, make sure each one can be measured in some way. Functions may include:

Design specifications
Government regulations
Program requirements
Characteristics of the component or product being analyzed

Let's use the example of installing driver-side airbags in a car manufacturing assembly line. In this scenario, the function would be to properly orient and place the airbag into the assembly fixture.

Next, you'll list possible failures for each function. Think of failures as "anti-functions" and consider things like:

Total failure of the function
Partial failure of the function
Intermittent failure of the function
Over-function failure (the function is delivered beyond its intended level)
Unintended failure of the function

Using our example of installing the driver-side airbag, one possible failure mode could be not receiving the correct airbag for installation.

Now we need to list the possible effects our failure mode could have, making sure to give each effect a severity ranking (1-10). If the severity level is a 9 or 10 at this stage, actions should be considered.

Potential effects for receiving the incorrect airbag for installation could be a delay in assembly while waiting for the correct airbag to arrive, or the wrong airbag being installed, causing a deployment malfunction in the event of a crash and leading to driver injury. Your team might decide to give this effect a severity ranking of 9 or 10.

Step 3: Determine the Occurrence Ranking

Step three involves determining potential causes and prevention controls using an occurrence ranking. You can brainstorm causes using past failure data or by getting ideas and input from the design team. For example, why would we receive an incorrect airbag? One possible cause could be human error. An assembly-line stop prior to ours may have gotten the driver- and passenger-side airbags mixed up, or there may have been a discrepancy in the total number of airbags ordered, leading to more passenger-side airbags coming down the line. You and your team might decide there is a moderate chance this potential cause could occur and give it an occurrence ranking of 4.

Step 4: Determine the Detection Ranking

Step four requires you to brainstorm and discuss the controls or processes that ensure the design meets the requirements (Design FMEA) or that keep a failure mode from reaching the customer undetected (Process FMEA). You can split this into two columns in your template: current process controls (prevention) and current process controls (detection).

Using our example, a prevention control currently in place might be a set of airbag assembly instructions. A current detection control may be a visual check of the airbags performed by the operator. You and your team may determine the likelihood of this failure being detected is moderate and give it a detection ranking of 6 (recall that lower numbers mean easier detection).

Step 5: Prioritize Action and Assign an RPN

Remember the RPN equation from earlier? This is where you assign an RPN to each failure mode, cause and control combination established in steps two through four. The RPN helps prioritize and assign follow-up action items. As discussed previously, RPN is calculated by multiplying the severity, occurrence and detection rankings for each combination.

After you've assigned RPNs for additional follow-up, assign the actions to the appropriate employees and make sure due dates are set for the completed actions.

Based on the assigned rankings in our example, the RPN is 240 (10 x 4 x 6).

Step 6: Take Action and Review the Design

Since the whole purpose of performing FMEA is to discover and mitigate risk, an action is only complete once it has been determined that it successfully reduces risk. In this step, failures should be listed in descending RPN order, so you can concentrate your efforts on the most critical areas.
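Listing failures in descending RPN order is a simple sort. The sketch below reuses the three sets of rankings that appear in this article. Putting them in a single list is only for illustration, and the dictionary keys are assumptions.

```python
# Rankings (severity, occurrence, detection) taken from the examples in this article.
failure_modes = [
    {"name": "Espresso machine fails", "s": 10, "o": 4, "d": 2},
    {"name": "High severity, low RPN example", "s": 10, "o": 2, "d": 2},
    {"name": "Incorrect airbag received", "s": 10, "o": 4, "d": 6},
]

# Calculate the RPN for each failure mode.
for fm in failure_modes:
    fm["rpn"] = fm["s"] * fm["o"] * fm["d"]

# Highest RPN first, so the most critical items are at the top.
ranked = sorted(failure_modes, key=lambda fm: fm["rpn"], reverse=True)

for fm in ranked:
    print(f'{fm["rpn"]:>4}  {fm["name"]}')
```

All three items share a severity of 10, so each would still deserve attention no matter where it lands in the list.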

You may have heard of the Pareto principle, which states that 80% of issues come from 20% of the causes. Sorting by RPN is a good starting point for finding those few critical causes, but the decision on where to focus your attention shouldn't be based strictly on the RPN alone. The FMEA team leader should ensure actions are taken by the pre-determined due dates so a design review can occur.

Step 7: Re-Ranking RPN

Finally, you want to see if your actions truly did mitigate risk. Once all risk-mitigating actions have been implemented, the FMEA team should meet and re-rank each value (severity, occurrence and detection) and calculate a new RPN.

The old and new RPNs should be compared. If the new RPN is lower, the actions reduced risk and can be implemented into the design or process going forward.
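The comparison itself is basic arithmetic. In the sketch below, the original airbag rankings come from steps two through four, while the re-ranked detection value of 3 is a hypothetical number chosen only to show the calculation.

```python
def rpn(severity: int, occurrence: int, detection: int) -> int:
    """Risk priority number: the product of the three rankings."""
    return severity * occurrence * detection


old_rpn = rpn(10, 4, 6)  # airbag rankings from steps two through four
new_rpn = rpn(10, 4, 3)  # hypothetical re-ranking: only detection improved

reduction = old_rpn - new_rpn
percent_lower = 100 * reduction / old_rpn

print(f"RPN {old_rpn} -> {new_rpn} ({percent_lower:.0f}% lower)")
# RPN 240 -> 120 (50% lower)
```

Note that severity stays at 10 in this sketch, so the failure mode would still warrant attention even with the lower RPN.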

FMEA Template Example

Below is an FMEA template using the airbag example from above. Listed topics are defined as follows:

Item: This refers to the item being analyzed, otherwise known as the function.
Failure mode: This describes what has gone wrong.
Failure effects: This describes the potential impact of the failure.
Severity (S): This ranking shows how severely this failure will impact the customer.
Causes: List potential causes of how the failure could occur.
Occurrence (O): This ranking shows the likelihood of the failure happening and the frequency at which it may occur.
Controls: What controls are already in place that could prevent the failure from happening or detect it should it happen?
Detection (D): This ranking shows how easy the failure is to detect.
RPN: Calculate the RPN and place the number in this column for reference.
Recommended actions: List actions that will mitigate the risk of this failure occurring.
Responsibility and due date: Assign who should implement the recommended actions and assign a due date.
Actions taken: Once the assigned party has implemented the recommended actions, list them here.
New RPN: Calculate the new RPN, taking into consideration the newly implemented actions, and compare the new RPN with the old RPN.

Many people add columns to the template after the RPN column to help keep track of action items and track improvement. In the case of our airbag scenario, these extra columns might look like this:

Recommended actions: Add manual/visual inspection as the first step in the airbag installation process
Responsibility and Due Date: Airbag installation technician is responsible (11/15/2019)
Action Taken: Step added - manual and visual inspection on the assembly line

Now, you can add new columns for Severity, Occurrence, and Detection to see if the changes made a difference in the RPN.
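If you keep your FMEA in code or a script rather than a spreadsheet, one row of the template could be modeled roughly as follows. The class name FmeaRow and its field names are assumptions that mirror the column list above, and the wording of the text fields is paraphrased from the airbag example. The re-ranked values are left empty because the team hasn't re-scored the failure yet.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FmeaRow:
    """One row of the FMEA template (field names are illustrative)."""
    item: str
    failure_mode: str
    failure_effects: str
    severity: int
    causes: str
    occurrence: int
    controls: str
    detection: int
    recommended_actions: str = ""
    responsibility: str = ""
    due_date: str = ""
    actions_taken: str = ""
    new_severity: Optional[int] = None
    new_occurrence: Optional[int] = None
    new_detection: Optional[int] = None

    @property
    def rpn(self) -> int:
        return self.severity * self.occurrence * self.detection

    @property
    def new_rpn(self) -> Optional[int]:
        # Only available once all three rankings have been re-scored.
        if None in (self.new_severity, self.new_occurrence, self.new_detection):
            return None
        return self.new_severity * self.new_occurrence * self.new_detection


airbag_row = FmeaRow(
    item="Orient and place the airbag into the assembly fixture",
    failure_mode="Incorrect airbag received for installation",
    failure_effects="Assembly delay, or wrong airbag installed and driver injury",
    severity=10,
    causes="Human error upstream, or a discrepancy in airbags ordered",
    occurrence=4,
    controls="Airbag assembly instructions; operator visual check",
    detection=6,
    recommended_actions="Add manual/visual inspection as the first step",
    responsibility="Airbag installation technician",
    due_date="11/15/2019",
    actions_taken="Step added - manual and visual inspection on the assembly line",
)

print(airbag_row.rpn)      # 240
print(airbag_row.new_rpn)  # None until the team re-ranks S, O and D
```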

