Root Cause Analysis: Rooting for Reliability

Drew Troyer, Noria Corporation
Tags: root cause analysis, maintenance and reliability

It's fall. The leaves are or will soon be turning colors and then falling to the ground. Football season is in full swing, and I'm sure you're rooting on your favorite pro, college and/or high school team to victory. It's a good time to start thinking about rooting your plant's reliability on to victory, too - but in this case, I'm not referring to cheering; I'm talking about root cause analysis (RCA).

Reliable Plant magazine recently conducted an extensive survey about the application of root cause analysis in industrial plants. Some of the results were predictable, but others were very surprising.

For example, safety was not identified as the leading motivation to trigger an RCA event. Of almost 600 respondents, we found that 77.5 percent perform some root cause analysis in their organization.

Granted, the readers of Reliable Plant are probably among the upper echelon of reliability practitioners, but the results suggest that this important tool is here to stay - and I believe that we've only just begun to unlock its potential for improving plant reliability.

I'll be providing a full report with detailed analysis on our survey findings at Reliable Plant's "Root Cause Analysis: Successful Applications for Plant Reliability" conference, which will be held December 11-13 in Houston.

Rooting is an interesting verb. In its intransitive form, rooting is "to wish the success of or lend support to someone or something." In its transitive form, rooting is "to remove altogether by or as if by pulling out by the roots."

Certainly both are required to achieve excellence in plant reliability management, but the latter definition is the business of RCA. What a wonderful statement, "to remove altogether." The business of RCA is to remove problems altogether by addressing them at their roots.

Unfortunately, when a reliability problem arises, most organizations either address it at the symptomatic level, seek immediately to lay blame on a person or group, or, regrettably in many instances, both. Root cause analysis is a systematic process enabling you to understand and address the underlying causes of a problem.

There are many techniques and approaches to root cause analysis, but they share many similarities. I'd like to share some of my thoughts and philosophies about this important reliability improvement tool.

1) It's not the objective of RCA to find someone to blame. I should repeat that … RCA is not about finding someone to blame. The only time we should seek to lay blame on a person or group is when that individual or group takes intentional action to undermine plant reliability.

While applying root cause analysis to solve reliability problems is similar in concept to solving crimes - in real life or on your favorite TV show about forensic detective work - the difference is that, in a criminal investigation, there is a perpetrator, or perpetrators, who intentionally committed the crime; otherwise, the event is deemed an accident.

While people are involved in most plant reliability problems, in very, very few instances do the people exhibit what lawyers call mens rea, or criminal intent. As such, an investigation, be it root cause-oriented or so-called shallow cause analysis, that is focused on finding someone to blame is destined to fail.

2) We rarely find a smoking gun. Often, organizations enter into a root cause event intending to find THE root cause of the problem. In fact, the process is more about eliminating causes that we believe did not contribute to the failure than about actually finding the cause.

At the end, we settle on what we believe is a manageable set of addressable contributing causes (see the figure below for the cause categories defined in the DOE-NE-STD-1004-92 standard). Root cause analysis employs abductive reasoning, which doesn't afford us the controls associated with deductive reasoning applied to experiments utilizing the scientific method. We've got to do our best, which often requires some leaps of faith.

3) It is imperative to connect RCA to your failure modes and effects analysis (FMEA or FMECA) log. Unfortunately, when plants complete an FMEA, they take actions and then bury the FMEA log in a file folder or computer file. Stop it! The FMEA is the manifestation of your risk assessment for a plant, system or machine.

Root cause analysis is a continuous improvement tool that should be employed for reliability growth. If a failure mode exhibits a high risk priority number (RPN), you may elect to initiate an RCA event to better understand the failure mode and to develop possible solutions to reduce the severity of the possible failure and/or the likelihood of occurrence, or increase your ability to detect and control the failure.
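To make the RPN idea concrete, here is a minimal sketch in Python. It assumes the conventional FMEA calculation, in which the RPN is the product of severity, occurrence and detection ratings, each scored from 1 to 10. The failure modes, their ratings and the trigger threshold of 200 are hypothetical values chosen only for illustration; your own FMEA log and cutoff will differ.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    severity: int    # 1 (negligible) to 10 (catastrophic)
    occurrence: int  # 1 (rare) to 10 (frequent)
    detection: int   # 1 (almost certain to be detected) to 10 (undetectable)

    def rpn(self) -> int:
        # Conventional FMEA risk priority number: S x O x D.
        return self.severity * self.occurrence * self.detection


def rca_candidates(modes, threshold=200):
    """Return failure modes whose RPN meets the (assumed) threshold, highest first."""
    flagged = [m for m in modes if m.rpn() >= threshold]
    return sorted(flagged, key=lambda m: m.rpn(), reverse=True)


# Hypothetical FMEA log entries, for illustration only.
fmea_log = [
    FailureMode("Pump bearing seizure", severity=8, occurrence=6, detection=5),
    FailureMode("Coupling misalignment", severity=5, occurrence=4, detection=3),
    FailureMode("Seal leak", severity=6, occurrence=7, detection=6),
]

for mode in rca_candidates(fmea_log):
    print(f"{mode.name}: RPN {mode.rpn()}")
```

After an RCA, the same log entry would be re-scored: a fix that lowers severity or occurrence, or improves detection, shows up directly as a lower RPN.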

Likewise, the results of RCA, irrespective of the reason for initiating the RCA event, must be incorporated into the FMEA. In some instances, the RCA will clarify already identified failure modes/causes.

In other instances, it will uncover new ones. In any event, FMEA and RCA must be linked to gain maximum effectiveness from both. While it's beyond the scope of this column, this is an important enough subject that I plan to address it in a future technical article in Reliable Plant magazine.

Figure 1. Cause categories defined in Standard DOE-NE-STD-1004-92.

4) Don't forget bad actors. It's common to initiate a root cause analysis event when a major failure occurs - one that has safety, environmental or significant financial implications. In fact, these failures often get more attention than they can handle (particularly from interested senior management), which often disrupts the investigation team. It's equally important to initiate RCA events for bad actors - failures that occur with a high level of frequency.

While the impact of the single events may be relatively small, the cumulative effect can be quite significant. The cumulative effect of small bad actor failures often far exceeds that of any single event deemed significant enough to warrant RCA.

Finding bad actors, of course, requires that you institute a rigorous protocol and system to report failures. This is a tool that most organizations lack (another topic for a future technical article in Reliable Plant).
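As a rough illustration of how a failure reporting system exposes bad actors, the sketch below totals the cost of every recorded failure by asset and ranks the assets by cumulative cost. The asset names, costs and event counts are hypothetical, and a real failure log would carry far more detail (dates, failure modes, downtime); this only shows the arithmetic.

```python
from collections import defaultdict


def rank_bad_actors(failure_log):
    """Total events and cost per asset, ranked by cumulative cost (highest first)."""
    totals = defaultdict(lambda: {"events": 0, "cost": 0.0})
    for asset, cost in failure_log:
        totals[asset]["events"] += 1
        totals[asset]["cost"] += cost
    return sorted(totals.items(), key=lambda item: item[1]["cost"], reverse=True)


# Hypothetical failure records as (asset, cost of that single event).
failure_log = (
    [("Compressor C-1", 90_000.0)]        # one major, high-visibility failure
    + [("Conveyor CV-7", 4_000.0)] * 30   # thirty small, repetitive failures
    + [("Fan F-3", 1_500.0)] * 4
)

for asset, summary in rank_bad_actors(failure_log):
    print(f"{asset}: {summary['events']} events, total cost {summary['cost']:,.0f}")
```

In this made-up data set, the conveyor's 30 small failures add up to more than the single compressor failure that would normally draw all the attention, which is exactly the pattern a bad-actor review is meant to catch.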

So, stop just cheering for plant reliability and start seriously employing root cause analysis. More than 80 percent of the respondents to our survey rated RCA as one of their better - or their best - plant reliability management tools.