Root cause analysis is a systematic approach for uncovering the root cause of problems. Below we'll take a look at how to conduct a root cause analysis, including the tools and methods used and the phases of the RCA process.
Root cause analysis (RCA) is defined as a systematic process for identifying the root causes of problems or events and an action plan for responding to them. Many organizations tend to focus on or single out one factor when trying to identify a cause, which leads to an incomplete resolution. Root cause analysis helps avoid this tendency and looks at the event as a whole. Another common occurrence is for companies to treat the symptoms rather than the actual underlying problems contributing to the issue, leading to recurrence.
Using root cause analysis to analyze problems or events should help you tackle the primary goal of determining:
In the end, root cause analysis boils down to three goals. The first goal is just as the name implies: to discover the root cause of a problem or event. The second goal is to understand how to fix, compensate for or learn from issues derived from the root cause. The third and most important goal is to apply what you learn from the analysis to prevent issues in the future.
Root cause analysis can be used in a variety of settings across multiple industries. Each industry might conduct the analysis in a slightly different way, but most follow the same general five-step process when investigating issues involving heavy machinery. This process was laid out by the United States Department of Energy (DOE-NE-STD-1004-92) back in 1992. Root cause analysis is commonly referred to as detective work at its finest. You’ll see similarities between how a detective works to solve a case and how manufacturers can figure out the root cause of an issue in the five-step process.
Phase 1 - Data Collection
Just as detectives preserve a crime scene and meticulously collect evidence for review, you should treat data collection as probably the most important step in the root cause analysis process. It’s best practice to collect data immediately after a failure happens or, if possible, while the failure is occurring. In addition to data, be sure to note any physical evidence of the failure as well.
Examples of data you should collect include conditions before, during and after the occurrence; employee involvement (actions taken); and any environmental factors. When machinery is involved, collect data and samples on things like lubrication systems, filters and separators, byproduct deposits (gums, varnish or sludge), oil analysis, and tank and sump conditions.
Phase 2 - Assessment
During the assessment phase, analyze all collected data to identify possible causal factors until one or more root causes are determined. According to the DOE’s process, the assessment phase incorporates four steps:
Common assessment conclusions for manufacturers include things like contaminated lubricant, using the wrong lubricant, using too much or too little lubricant, and abnormal wear debris.
Later we will discuss common root cause analysis methods and tools to help with the assessment phase of this process. Common methods include Pareto charts, determining the “5 Whys,” fishbone diagrams and more.
Phase 3 - Corrective Action
Implementing corrective action once a root cause has been established lets you improve your process and make it more reliable. First, identify the corrective action for each cause. Then, ask these five questions or criteria laid out by the DOE and apply them to your corrective actions to make sure they are practical.
Before taking corrective action, your company as a whole should discuss and weigh the pros and cons of implementing these actions. Consider the cost of carrying out these changes. The costs may include training, engineering, risk-based and operational expenses among others. Weigh these costs against the benefits of eliminating the failure(s) and the probability that the corrective action(s) will work. In addition to cost, your team should discuss questions like:
Phase 4 - Inform
Communication is key. Ensure all affected parties are informed of the pending correction or implementation. In the manufacturing setting, these parties may include supervisors, managers, engineers, and operations and maintenance staff. It’s also a good idea to communicate any corrective actions with suppliers, consultants and subcontractors. Many companies inform all departments of any changes so they can be aware and determine if or how the changes apply to their unique situation as it relates to the overall manufacturing process.
Phase 5 - Follow-up
The follow-up phase is where you establish if your corrective action is effective in resolving the issues.
Following up regularly lets you see how well your corrective actions are working and helps you identify new issues that could lead to future failures. For a more detailed look at how to conduct root cause analysis specifically for lubrication professionals and manufacturers, check out “Root Cause Analysis Techniques for the Lubrication Professional.”
As discussed earlier, the data collection and assessment phases in the RCA process are perhaps the two most important aspects when it comes to properly determining the root cause of a particular failure. There are many root cause analysis tools to choose from when you’re assessing data. Each one can be used to evaluate different information or provide another way to look at similar data. Below are eight common root cause analysis tools and methods:
Read more about how you can create a Pareto chart in eight easy steps.
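To make the idea behind a Pareto chart concrete, here is a minimal Python sketch that ranks failure causes by frequency and adds a running cumulative percentage, which is the data a Pareto chart plots. The failure log and the name pareto_table are made up for illustration; the cause labels simply reuse the lubrication examples mentioned earlier.

```python
from collections import Counter

# Hypothetical failure log: one entry per recorded failure, labeled by cause.
failure_log = [
    "contaminated lubricant", "wrong lubricant", "contaminated lubricant",
    "abnormal wear debris", "contaminated lubricant", "too little lubricant",
    "wrong lubricant", "contaminated lubricant", "too much lubricant",
    "contaminated lubricant",
]


def pareto_table(causes):
    """Return (cause, count, cumulative_percent) rows, most frequent first."""
    counts = Counter(causes)
    total = sum(counts.values())
    rows = []
    running = 0
    for cause, count in counts.most_common():
        running += count
        rows.append((cause, count, round(100 * running / total, 1)))
    return rows


rows = pareto_table(failure_log)
for cause, count, cumulative in rows:
    print(f"{cause:<25} {count:>3} {cumulative:>6}%")
```

In this made-up log, the first two causes account for 70 percent of the failures, which is the kind of concentration a Pareto chart is meant to expose.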
You may need more or fewer than five questions to get to the root of your problem, but as long as your questions keep peeling away surface issues, you move closer to uncovering your root cause.
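One simple way to record a 5 Whys session is as an ordered list of question and answer pairs. The sketch below is only an illustration: the pump scenario, the answers and the helper name candidate_root_cause are all hypothetical, and the last answer is only a candidate until the team verifies it.

```python
# Hypothetical 5 Whys chain for an illustrative pump failure.
# Each entry pairs a "why" question with the answer the team agreed on.
why_chain = [
    ("Why did the pump stop?", "The bearing seized."),
    ("Why did the bearing seize?", "It was not lubricated adequately."),
    ("Why was it not lubricated adequately?", "The lubricant was contaminated."),
    ("Why was the lubricant contaminated?", "The tank breather filter had failed."),
    ("Why had the filter failed?", "Filter replacement was not on the maintenance schedule."),
]


def candidate_root_cause(chain):
    """Return the answer to the last 'why' asked, the deepest cause found so far."""
    if not chain:
        raise ValueError("ask at least one 'why' before naming a cause")
    return chain[-1][1]


for depth, (question, answer) in enumerate(why_chain, start=1):
    print(f"{depth}. {question} {answer}")
print("Candidate root cause:", candidate_root_cause(why_chain))
```

The chain can be shorter or longer than five entries; the helper just returns whatever the deepest answer currently is.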
Work the diagram right to left, having your team brainstorm possible causes of the problem and placing each idea in the appropriate category. Once the team is done brainstorming, rate the potential causes by level of importance and likelihood of contributing to the problem. From here, select which causes to investigate further.
In the example above, the fishbone diagram includes a main problem, six factors contributing to the main problem and potential causes of those factors branching off.
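The rating step of a fishbone session can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the category names, the 1 to 5 rating scale, and the scoring rule (importance multiplied by likelihood) are one plausible choice, not a prescribed method.

```python
from dataclasses import dataclass


@dataclass
class PotentialCause:
    category: str     # fishbone category the idea was placed under
    description: str
    importance: int   # team rating, 1 (low) to 5 (high)
    likelihood: int   # team rating, 1 (low) to 5 (high)

    @property
    def score(self) -> int:
        # Assumed scoring rule: importance times likelihood.
        return self.importance * self.likelihood


def shortlist(causes, top_n=2):
    """Return the top_n highest-scoring causes to investigate further."""
    return sorted(causes, key=lambda c: c.score, reverse=True)[:top_n]


# Hypothetical brainstorming output.
causes = [
    PotentialCause("Materials", "Wrong lubricant grade in stock", 4, 3),
    PotentialCause("Methods", "No relubrication interval defined", 5, 4),
    PotentialCause("Machines", "Worn seal lets in contaminants", 4, 4),
    PotentialCause("People", "New operator not yet trained", 2, 3),
]

for cause in shortlist(causes):
    print(f"{cause.score:>2}  {cause.category}: {cause.description}")
```

A different team might add the two ratings or weight them differently; the point is only that the ranking is explicit and repeatable.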
You can think of FMEA as a proactive tool rather than a reactive one.
A fault tree can be used to build a safety program, discover what went wrong in a process or determine why employees may not be meeting company standards. For example, you can take a hypothetical incident like a lubrication spill, break down the contributing factors and see the chain of events or failures along the way. You can then choose safety procedures that help minimize these outcomes.
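The lubrication spill example can be sketched as a tiny fault tree in Python. The contributing factors and the way they are combined are hypothetical: this sketch assumes a spill needs both a release of lubricant (from any one of three sources) and a failure of secondary containment.

```python
def or_gate(*inputs):
    """Output event occurs if any input event occurs."""
    return any(inputs)


def and_gate(*inputs):
    """Output event occurs only if every input event occurs."""
    return all(inputs)


def spill_occurs(hose_ruptured, tank_overfilled, drain_valve_left_open,
                 containment_failed):
    """Evaluate the top event of a hypothetical lubrication-spill fault tree."""
    # Any one of these basic events releases lubricant.
    release = or_gate(hose_ruptured, tank_overfilled, drain_valve_left_open)
    # Assumed structure: a release becomes a spill only if containment also fails.
    return and_gate(release, containment_failed)


# A ruptured hose with working containment does not reach the top event.
print(spill_occurs(True, False, False, containment_failed=False))
# The same rupture with failed containment does.
print(spill_occurs(True, False, False, containment_failed=True))
```

Reading the tree this way shows where a safety procedure has the most leverage: in this assumed structure, keeping containment intact blocks the spill regardless of which release event happens.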
For example, let’s say you have an abnormally good sales day and want to figure out why so you can replicate it. You’d start by considering every possible internal and external factor, such as whether a new sales training was implemented the day before or if it was the last day of the month and people were trying to hit their goals. Next, examine each event to see if it was an unrelated factor, contributing factor, correlated factor or the probable root cause. This is where all your analysis is done and where you can loop in other methods like the 5 Whys. Finally, see how the cause can be replicated.
You can perform root cause analysis to help solve day-to-day problems using brainstorming techniques or the 5 Whys. Employ RCA routinely as a proactive tool to analyze safety and environmental data, evaluate asset utilization, and identify trends that point to chronic losses or systematic defects. High-level RCAs are costly, so you need a process to help decide when one is appropriate. If you’re considering a high-level RCA, you’ll want to define triggers that determine the point at which a formal RCA should be conducted. Below are some ideas for forming trigger criteria:
It’s important to spend time preparing for a root cause analysis by doing some initial investigation, identifying the appropriate personnel and anticipating problems that could arise during the RCA meeting. A common example of preparing for an RCA is that of a puzzle builder. Even the most experienced puzzle builder, who may know tips and tricks for efficient puzzle-building, can’t be successful if a puzzle piece is missing or there is no place to build the puzzle.
Likewise, a team can’t complete a root cause analysis if it is missing important evidence, team members are absent, or the facilities are dysfunctional. So, make sure you collect evidence, identify key team members and prepare for the unexpected prior to your RCA meeting.
In most cases, RCA is used after an event or failure has occurred. The longer-term goal is to move from reactive to proactive use of root cause analysis.
The time required for a root cause analysis will depend on certain factors, such as the complexity of the incident, the availability of employees to be interviewed, whether there is regulatory interference and how far you want to dig into the causes. Most RCAs can be completed in a couple of weeks or a few months.
Examining internal and external factors in the weeks and months leading up to a failure event can help you obtain a snapshot of what happened. Let’s say you want to find out why revenue dipped last quarter in your food-processing company. Examples of internal and external factors might include: