Improving machinery reliability is everyone’s job. Plantwide participation in reliability improvement efforts can significantly increase equipment run times and reduce maintenance costs. One facet of machinery reliability that never gets enough notice is the work done in the trenches by shop mechanics.
Mechanics possess in-depth knowledge, experience, and intuition about the machines they repair. They need to be engaged when repeat or premature failures occur to help identify the underlying causal issues using failure analysis methods such as root cause analysis (RCA) and root cause failure analysis (RCFA).
Root cause analysis (RCA) is an investigative process employed to determine the underlying event(s) responsible for an unwanted condition, such as a drop in the production rate, off-spec products, or a high-temperature condition. When a machine failure occurs, the exercise to uncover the cause is called a root cause failure analysis (RCFA). Root causes are latent, or hidden, causes that begin the chain of events leading to an undesired condition or failure.
A partial list of common root cause categories includes:
1. System, Equipment, or Component Issue
- Design error
- Misapplication
- Defective part
- Improper machine repair
2. Staff, People, or Training Issue
- Inadequate skills
- Lack of training
- Morale issues
3. Rules, Policies, or Procedural Issue
- Lack of procedures
- Inconsistent application of policies
- Outdated procedures
4. Organizational Issue
- A system, process, or policy used to make decisions or complete work is faulty (Example: No one person was responsible for vehicle maintenance, and everyone assumed someone else had filled the brake fluid).
Root Cause Analysis Example
For this example, an analyst performs an RCA on a centrifugal compressor using the “5 Why” method of questioning. With this method, the analyst repeatedly asks “why” until they arrive at the true event catalyst that created the undesired condition.
Problem: The flow from a critical centrifugal compressor dropped off suddenly, resulting in an unplanned unit outage. To better understand the event, an analyst was asked to discover the root cause.
Line of questioning:
- Why was there an unplanned unit outage? Because the centrifugal compressor speed dropped below the rated speed.
- Why? Because the compressor’s steam turbine driver could not maintain the rated compressor speed.
- Why? Because the steam turbine was not getting enough steam into its steam chest.
- Why? The steam turbine’s inlet strainer was found to be plugged.
- Why? The startup inlet strainer was never replaced with a larger, permanent mesh strainer as required. This is the root cause.
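The line of questioning above can be captured as a simple ordered record, which makes the chain easy to store and review later. The following is a minimal sketch, not a standard tool: the `WhyStep` class and `root_cause` function are hypothetical names chosen for illustration, and the answers are paraphrased from the compressor example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WhyStep:
    """One question-and-answer pair in a 5 Why chain."""
    question: str
    answer: str


def root_cause(chain: list[WhyStep]) -> str:
    """Return the answer to the final "why", which the analyst treats as the root cause."""
    if not chain:
        raise ValueError("the chain needs at least one why/answer pair")
    return chain[-1].answer


# Answers paraphrased from the compressor example in the text.
compressor_chain = [
    WhyStep("Why was there an unplanned unit outage?",
            "The centrifugal compressor speed dropped below the rated speed."),
    WhyStep("Why?",
            "The steam turbine driver could not maintain the rated compressor speed."),
    WhyStep("Why?",
            "The steam turbine was not getting enough steam into its steam chest."),
    WhyStep("Why?",
            "The steam turbine's inlet strainer was plugged."),
    WhyStep("Why?",
            "The startup inlet strainer was never replaced with the permanent strainer."),
]

for depth, step in enumerate(compressor_chain, start=1):
    print(f"{depth}. {step.question} {step.answer}")
print("Root cause:", root_cause(compressor_chain))
```

The sketch only records the chain; deciding when the questioning has truly reached a root cause remains the analyst's judgment.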
Most reliability groups employ a multilayered approach to RCAs and RCFAs. The actual analysis level is determined by the economic value of the unwanted event(s). Three commonly used analysis levels include:
Level “A” RCFA: Required for 1) A major event, such as a safety event, an environmental release event, an extended unplanned outage, or a major equipment failure costing more than $250,000 or 2) Multiple major failures costing more than $50,000 occurring in a 12-month period.
With Level A investigations:
- A team effort among engineers, technicians, and operators is required and results in a detailed report.
- The team may take weeks to gather and analyze data and finalize their findings and recommendations.
- Findings are transmitted to the regional management level.
Level “B” RCFA: Conducted by a Reliability Engineer or Technician and results in a shorter, less detailed report than a Level “A” RCFA report. A Level “B” investigation is usually prompted by 1) A single, costly (>$100,000) equipment failure or 2) Two or more equipment failures costing more than $20,000 in a 12-month period.
With Level B investigations:
- Analysis rarely takes more than one week to be finalized.
- Findings will be transmitted to the site or area management level.
Level “C” RCFA: One- or two-page failure investigations conducted by craftsmen and entered in the equipment file. Typically, Level “C” analyses are conducted whenever smaller (<250 hp), less critical rotating machinery fails, such as pumps, induction motors, and general-purpose steam turbines.
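The cost thresholds for the three levels can be expressed as a small decision function. This is a rough sketch under several assumptions that the text does not spell out: "multiple" is read as two or more failures, each failure must individually exceed the stated cost, the 12-month period is treated as the 365 days ending on the date of the latest failure, and any failure that does not meet a Level A or Level B trigger falls to Level C. The names `FailureEvent`, `count_recent`, and `rcfa_level` are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class FailureEvent:
    occurred_on: date
    cost_usd: float
    # True for a safety event, environmental release, or extended unplanned outage.
    major_event: bool = False


def count_recent(events, as_of, min_cost):
    """Count failures in the 365 days ending on as_of that cost more than min_cost."""
    window_start = as_of - timedelta(days=365)
    return sum(
        1 for e in events
        if window_start <= e.occurred_on <= as_of and e.cost_usd > min_cost
    )


def rcfa_level(event, history):
    """Pick an analysis level for event, given earlier failures on the same equipment."""
    all_events = [*history, event]
    as_of = event.occurred_on
    # Level A: major event, or a single failure above $250,000.
    if event.major_event or event.cost_usd > 250_000:
        return "A"
    # Level A: multiple (assumed: two or more) failures above $50,000 in 12 months.
    if count_recent(all_events, as_of, 50_000) >= 2:
        return "A"
    # Level B: a single failure above $100,000.
    if event.cost_usd > 100_000:
        return "B"
    # Level B: two or more failures above $20,000 in 12 months.
    if count_recent(all_events, as_of, 20_000) >= 2:
        return "B"
    # Assumption: everything else gets a Level C investigation.
    return "C"


latest = FailureEvent(date(2024, 6, 1), 25_000)
earlier = [FailureEvent(date(2024, 1, 10), 30_000)]
print(rcfa_level(latest, earlier))  # B
```

A site would substitute its own thresholds and event definitions; the point of the sketch is only that the level follows mechanically from cost and recent history once those rules are written down.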
Pump Seal Level “C” Analysis Example
In this example, a mechanic is attempting to perform a Level “C” analysis on a centrifugal pump seal that has failed several times. The mechanic removes the most recently failed seal and inspects its components, discovering that the silicon carbide carbon face has signs of heat checking – the formation of surface cracks.
So, the mechanic begins asking “why”.
- Why has heat checking occurred? Because of a lack of seal flush.
- Why? Because there was a plugged seal flush strainer.
- Why? Because a large quantity of debris was found to be blocking off strainer flow.
- Why? Because a recent thermal excursion shocked the upstream piping, releasing large amounts of pipe scale, and the strainer was not sized to hold this much scale. This is the root cause.
Final recommendation: Since process upsets will continue to occur occasionally, the mechanic recommends installing duplex strainers with larger capacities for solids along with a differential pressure alarm that signals a plugging issue.
Once this Level “C” report is added to the pump’s repair history, it becomes part of the equipment’s overall story. The next investigator will know what was found upon disassembly and inspection and what was recommended the last time the pump failed. If the recommendations were implemented, the next investigator can also judge how effective those improvements were.
[Figure: Database example for tracking equipment failures.]
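As one possible shape for such a failure-tracking database, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table name, column names, equipment tag `P-101A`, and failure date are all hypothetical and chosen for illustration; the findings and recommendation are paraphrased from the pump seal example.

```python
import sqlite3

# In-memory database for illustration; a real system would use a file or server.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE level_c_reports (
        id             INTEGER PRIMARY KEY,
        equipment_tag  TEXT NOT NULL,
        failure_date   TEXT NOT NULL,      -- ISO 8601 date, e.g. 2024-03-14
        findings       TEXT NOT NULL,      -- what was found on disassembly
        root_cause     TEXT NOT NULL,      -- most likely root cause
        recommendation TEXT NOT NULL,
        implemented    INTEGER NOT NULL DEFAULT 0  -- 0 = not yet, 1 = done
    )
""")

# Hypothetical tag and date; text paraphrased from the pump seal example.
conn.execute(
    "INSERT INTO level_c_reports "
    "(equipment_tag, failure_date, findings, root_cause, recommendation) "
    "VALUES (?, ?, ?, ?, ?)",
    (
        "P-101A",
        "2024-03-14",
        "Heat checking on seal face; seal flush strainer plugged with pipe scale",
        "Strainer not sized for scale released by a thermal excursion",
        "Install larger-capacity duplex strainers with a differential pressure alarm",
    ),
)


def repair_history(conn, tag):
    """Return all reports for one piece of equipment, newest first."""
    return conn.execute(
        "SELECT failure_date, findings, root_cause, recommendation, implemented "
        "FROM level_c_reports WHERE equipment_tag = ? ORDER BY failure_date DESC",
        (tag,),
    ).fetchall()


for row in repair_history(conn, "P-101A"):
    print(row)
```

The `implemented` flag is included so that a later investigator can tell whether the previous recommendation was acted on before judging its effectiveness.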
Benefits of First Line Level “C” RCFAs
It is recommended that Level “C” RCFAs be conducted by plant craftsmen on all machinery failures. In a short period of time, valuable data can be gathered that can be used to correct hidden issues in the plant. Benefits of utilizing a Level “C” collection program include:
- Having a more detailed record of machine failures.
- Mechanics will:
  - Learn about reliability engineering, such as cause and effect thinking and criticality thinking.
  - Feel like they are part of the reliability effort.
  - Improve their troubleshooting skills.
  - Learn more about the equipment, systems, and processes.
  - Uncover previously unknown root causes and make recommendations based on actual failure data.
  - Gain the satisfaction of solving plant problems.
Final Considerations
Before starting a Level “C” RCFA program, there are some final points of advice to consider.
- Simplify the Level “C” reporting process by either developing paper forms with checklists or using computer-based forms with pull-down selections for failure codes. Talk to your mechanics about the options to identify a system that works for the company’s goals as well as for the employees who will use it frequently.
- Provide basic RCFA training to help mechanics become comfortable with the process.
- Provide any special training required for the mechanics to understand the cause-and-effect relationships pertinent to their machines. For example, they should know that when the discharge of a centrifugal pump is restricted, the discharge pressure rises.
- Demand consistency. Every report must state what was found, the most likely root cause, and the mechanic’s recommendations.
- Monitor the Level “C” reports to ensure quality.
- Provide feedback to mechanics so they know their information is being reviewed and used.
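The advice on pull-down failure codes and consistent report content could be enforced by a computer-based form along the following lines. This is only a sketch: the failure codes listed are examples rather than a recommended coding scheme, and `FailureCode`, `LevelCReport`, and `missing_sections` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum


class FailureCode(Enum):
    """Example pull-down choices; a real plant would define its own list."""
    SEAL = "mechanical seal"
    BEARING = "bearing"
    COUPLING = "coupling"
    OTHER = "other"


@dataclass
class LevelCReport:
    equipment_tag: str
    failure_code: FailureCode
    findings: str = ""
    root_cause: str = ""
    recommendations: str = ""


def missing_sections(report):
    """List the required sections that are still blank."""
    required = {
        "findings": report.findings,
        "root_cause": report.root_cause,
        "recommendations": report.recommendations,
    }
    return [name for name, text in required.items() if not text.strip()]


# Hypothetical equipment tag; findings taken from the pump seal example.
draft = LevelCReport("P-101A", FailureCode.SEAL,
                     findings="Heat checking on seal face")
print(missing_sections(draft))  # ['root_cause', 'recommendations']
```

A form built this way can refuse to file a report until `missing_sections` returns an empty list, which supports consistency without adding review work.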