The Power of "T"

Daniel Daley
Tags: condition monitoring, maintenance and reliability


All too frequently, the need for improved reliability becomes apparent. While the most common strategy for addressing this is to make physical changes that increase an asset’s Mean Time Between Failure (MTBF), that approach requires time, engineering and money, all things typically in short supply.

Mean Time Between Failure

A maintenance metric, represented in hours, predicting how long a piece of equipment can go during normal operating hours before it will fail.
Source: Reliable Plant

A useful alternative is to take advantage of currently available resources and characteristics by applying them differently. Although an asset’s current MTBF may be too short under the present way of operating and maintaining it, that MTBF is a characteristic that can be leveraged into improved performance.

Calculating the Likelihood of Survival

The following simplified equation is useful in identifying an asset’s reliability, or likelihood of survival, based on the current MTBF and the interval at which the asset’s health is verified by performing simple health checks:

$$R = e^{-t/\mathrm{MTBF}}$$

R = The likelihood of survival

t = Time between health verifications

MTBF = Currently measured mean time between failure (in the same time units as t)

The following table shows R (the likelihood of survival) for typical values of MTBF and t (the health verification interval). Note that even with a short MTBF, it is possible to substantially increase the likelihood of survival by performing health checks more frequently.

For instance, where the MTBF is only one year, a semi-annual health check delivers a likelihood of survival of 60.65%, while weekly health checks increase the likelihood of survival to 98.10%.
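The figures above can be reproduced with a short Python sketch. It assumes the exponential form R = exp(-t/MTBF), which corresponds to a constant failure rate, and expresses both t and MTBF in years. The interval names and the MTBF values in the usage section are illustrative choices, not the values from the article's table.

```python
import math

# Health-check intervals expressed in years (illustrative set; t must use
# the same time unit as MTBF).
INTERVALS_YEARS = {
    "semi-annual": 0.5,
    "quarterly": 0.25,
    "monthly": 1 / 12,
    "weekly": 1 / 52,
    "daily": 1 / 365,
}


def likelihood_of_survival(t: float, mtbf: float) -> float:
    """R = exp(-t / MTBF): chance the asset survives one health-check interval."""
    if t < 0 or mtbf <= 0:
        raise ValueError("t must be >= 0 and MTBF must be > 0")
    return math.exp(-t / mtbf)


def survival_table(mtbfs_years):
    """Map each MTBF (years) to {interval name: R} for every interval above."""
    return {
        mtbf: {name: likelihood_of_survival(t, mtbf) for name, t in INTERVALS_YEARS.items()}
        for mtbf in mtbfs_years
    }


if __name__ == "__main__":
    table = survival_table([1, 2, 5])  # hypothetical MTBF values, in years
    print(f"{'MTBF (yr)':>10}" + "".join(f"{name:>13}" for name in INTERVALS_YEARS))
    for mtbf, row in table.items():
        print(f"{mtbf:>10}" + "".join(f"{r:>13.2%}" for r in row.values()))
```

With an MTBF of one year, the semi-annual and weekly entries come out at 60.65% and 98.10%, matching the example above.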

It is useful to think of this new characteristic as “apparent reliability.” While the physical capability of the asset remains unchanged, the apparent capability has been improved through increased scrutiny.

Using this analysis, it is also possible to calculate the value of these improvements by determining the value of the risk reduction. The value of risk is the likelihood of failure, L, multiplied by the cost of a single failure, C:

$$\text{Risk} = L \times C$$

Since R equals the likelihood of survival, the likelihood of failure is L = 1 – R.

Using the example provided above, where the initial likelihood of survival is 60.65%, the likelihood of failure would be 39.35%.

Similarly, where frequent observations increase the apparent likelihood of survival to 98.10%, the likelihood of failure would be 1.9%.

If a failure were to produce an outage lasting ten days and the value of lost production was $10,000 per day, the total loss caused by the failure would be $100,000.

The value of risk in the first situation would then be $100,000 times 39.35%, or $39,350.

The value of risk in the improved situation would be $100,000 times 1.9%, or $1,900.

In other words, the value of the risk reduction would be $39,350 - $1,900 = $37,450 per annum.
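The risk arithmetic can be checked with a few lines of Python. This is a minimal sketch: the function names are hypothetical, the 60.65% and 98.10% survival figures and the $100,000 consequence come from the example, and, like the example, it treats the result as an annual value of risk.

```python
def likelihood_of_failure(r: float) -> float:
    """L = 1 - R, where R is the likelihood of survival."""
    return 1.0 - r


def value_of_risk(r: float, consequence: float) -> float:
    """Value of risk = likelihood of failure x cost of one failure."""
    return likelihood_of_failure(r) * consequence


OUTAGE_DAYS = 10
LOST_PRODUCTION_PER_DAY = 10_000  # dollars per day of outage
CONSEQUENCE = OUTAGE_DAYS * LOST_PRODUCTION_PER_DAY  # $100,000 per failure

risk_semi_annual = value_of_risk(0.6065, CONSEQUENCE)  # R with semi-annual checks
risk_weekly = value_of_risk(0.9810, CONSEQUENCE)       # R with weekly checks
risk_reduction = risk_semi_annual - risk_weekly

if __name__ == "__main__":
    print(f"Likelihood of failure, semi-annual: {likelihood_of_failure(0.6065):.2%}")
    print(f"Value of risk, semi-annual checks:  ${risk_semi_annual:,.0f}")
    print(f"Value of risk, weekly checks:       ${risk_weekly:,.0f}")
    print(f"Value of risk reduction:            ${risk_reduction:,.0f}")
```

Running it prints a likelihood of failure of 39.35% and a risk reduction of $37,450.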

It is useful to always remember that risk is real money. If your estimates of the likelihood of failure and of its impact are accurate, the calculated value of risk will, sooner or later, turn into an actual cost.

The results of risk-taking are much like the results of flipping a coin. At any point in the experiment there may be more heads than tails, or vice versa, but over many flips the results even out, with the proportion of heads settling toward one half.
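The coin-flip argument can be illustrated with a small Monte Carlo simulation. This sketch assumes each simulated period is an independent trial in which a failure occurs with probability L and costs a fixed amount; the function name and seed are illustrative. Any single period costs either nothing or the full $100,000, but the average loss per period approaches L × $100,000 as the number of periods grows.

```python
import random


def simulate_average_loss(p_fail: float, consequence: float, periods: int, seed: int = 0) -> float:
    """Average loss per period over `periods` independent trials.

    In each period a failure occurs with probability p_fail and costs `consequence`.
    """
    rng = random.Random(seed)  # seeded so runs are repeatable
    total = 0.0
    for _ in range(periods):
        if rng.random() < p_fail:
            total += consequence
    return total / periods


if __name__ == "__main__":
    p_fail = 1 - 0.6065  # likelihood of failure: semi-annual checks, one-year MTBF
    expected = p_fail * 100_000
    for periods in (10, 100, 10_000, 1_000_000):
        average = simulate_average_loss(p_fail, 100_000, periods)
        print(f"{periods:>9} periods: average loss ${average:,.0f} (expected ${expected:,.0f})")
```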

It is also useful to keep in mind that, while applying this strategy to a single asset produces the results described above, applying it broadly across many assets produces much more impressive combined results.

(Conversely, while shortened health-check intervals can substantially improve the apparent reliability of assets with a short MTBF, longer inspection intervals can reduce the apparent reliability of assets, even those with a long MTBF.)
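Because R depends only on the ratio t/MTBF, a longer MTBF buys nothing if the inspection interval grows in proportion. The sketch below, again assuming R = exp(-t/MTBF) with both values in years, compares a hypothetical one-year-MTBF pump checked monthly with a hypothetical five-year-MTBF pump checked every three years.

```python
import math


def likelihood_of_survival(t: float, mtbf: float) -> float:
    """R = exp(-t / MTBF), with t and MTBF in the same unit (years here)."""
    return math.exp(-t / mtbf)


# Hypothetical assets (all values in years).
short_mtbf_checked_monthly = likelihood_of_survival(t=1 / 12, mtbf=1.0)
long_mtbf_checked_rarely = likelihood_of_survival(t=3.0, mtbf=5.0)

# The same t/MTBF ratio gives the same R, whatever the absolute MTBF.
same_ratio_short = likelihood_of_survival(t=0.5, mtbf=1.0)
same_ratio_long = likelihood_of_survival(t=2.5, mtbf=5.0)

if __name__ == "__main__":
    print(f"1-yr MTBF, monthly checks:       R = {short_mtbf_checked_monthly:.2%}")
    print(f"5-yr MTBF, checks every 3 years: R = {long_mtbf_checked_rarely:.2%}")
    print(f"t/MTBF = 0.5 at 1 yr vs 5 yr:    {same_ratio_short:.4f} vs {same_ratio_long:.4f}")
```

The frequently checked short-MTBF pump comes out near 92%, while the rarely checked long-MTBF pump comes out near 55%.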

Determining Asset Health

Now, let’s discuss the kinds of checks that can quickly and easily determine the health of an asset. Most of these can be performed to the required degree of accuracy using only human senses and observation, at a reasonable pace, in the time available during normal operating rounds.

For a typical pump containing a mechanical seal, the following observations should be completed at interval t during structured operator rounds, moving from asset to asset along an efficient path through the plant or facility on which the assets have been marked in sequence.

Discharge pressure and stability
Oil color
Oil level
Bearing housing temperature by touch or feel
Seal pot fluid level
Seal pot fluid color
Bearing housing vibration level by touch or feel
Condition of surroundings (looking for leaks, coupling spacer pieces, bolts, nuts, shims, etc.)
Filter pressure differential

To make readings such as pressure, pressure differential or level as quick as possible, mark the acceptable range with a piece of tape and record each reading only as the presence or absence of a check mark rather than as a specific number.

Other kinds of health checks may entail simply verifying the operation or functionality of various kinds of instruments or controls. This can be done by contacting the inside control operator by radio and asking them to make minor changes to ensure everything functions as intended.

For example:

Slightly open or close valves.
Check level alarms or controls by creating simple scenarios that cause them to function as intended.
When testing spare pumps or redundant equipment, allow auto start or transfer switches to function as intended.
Perform function tests during daylight hours and when there is enough help available to respond to any problems they uncover.

Additional health checks using either observations or function tests are left to the creativity of the reader. The main objective is to identify instances where short MTBF can be aided by short interval testing and where available resources can be thoughtfully applied to reduce risks of failure.

All observations made as part of a single structured round should be recorded on a single spreadsheet or electronic tablet form that includes the date and time the round was completed and the initials of the person who completed the observations.
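A round sheet of this kind maps naturally onto a small data structure. The following Python sketch is hypothetical: the class, field names, initials and asset tags are illustrative, the check names abbreviate the pump list above, and each check is stored as a pass/fail flag to mirror the check-mark approach. It also lists items that need follow-up and checks that were skipped.

```python
from dataclasses import dataclass, field
from datetime import datetime

# Abbreviated names for the pump checks listed above.
PUMP_CHECKS = (
    "discharge pressure",
    "oil color",
    "oil level",
    "bearing temperature",
    "seal pot level",
    "seal pot color",
    "bearing vibration",
    "surroundings",
    "filter differential",
)


@dataclass
class RoundRecord:
    """One structured round: when it was completed, by whom, and what was found."""
    completed_at: datetime
    initials: str
    # asset tag -> {check name: True if checked off as acceptable}
    results: dict[str, dict[str, bool]] = field(default_factory=dict)

    def record(self, asset: str, check: str, ok: bool) -> None:
        if check not in PUMP_CHECKS:
            raise ValueError(f"unknown check: {check!r}")
        self.results.setdefault(asset, {})[check] = ok

    def needs_follow_up(self) -> list[tuple[str, str]]:
        """(asset, check) pairs that were not checked off as acceptable."""
        return [
            (asset, check)
            for asset, checks in self.results.items()
            for check, ok in checks.items()
            if not ok
        ]

    def skipped(self, route: list[str]) -> list[tuple[str, str]]:
        """(asset, check) pairs on the route that were never recorded."""
        return [
            (asset, check)
            for asset in route
            for check in PUMP_CHECKS
            if check not in self.results.get(asset, {})
        ]


if __name__ == "__main__":
    round_sheet = RoundRecord(completed_at=datetime(2024, 1, 15, 8, 30), initials="AB")
    for check in PUMP_CHECKS:
        round_sheet.record("P-101", check, ok=True)
    round_sheet.record("P-101", "oil level", ok=False)  # level outside the taped range
    print(round_sheet.needs_follow_up())
    print(len(round_sheet.skipped(["P-101", "P-102"])), "checks skipped")
```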

Conclusion

It is important to stress the need for complete health checks when scheduled and for timely follow-up when aberrant or unexplained conditions are identified.

It is also important to routinely audit these activities to ensure they are completed as intended.

In the short term, it is important to compare the cost of operator time spent performing these checks more frequently and with greater discipline against the cumulative value of the risk reductions for all assets included in the structured rounds. This comparison will justify the effort relative to other ways the time could be spent.
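The short-term comparison can be roughed out as follows. This is a sketch under stated assumptions: the asset list, failure costs, minutes per check and labor rate are all hypothetical placeholders, the likelihood of failure uses 1 - exp(-t/MTBF), and, like the worked example earlier, each asset's risk figure is treated as an annual value.

```python
import math
from dataclasses import dataclass


@dataclass
class Asset:
    tag: str             # hypothetical asset tag
    mtbf_years: float    # currently measured MTBF
    failure_cost: float  # dollars lost per failure


def likelihood_of_failure(t: float, mtbf: float) -> float:
    """L = 1 - R = 1 - exp(-t / MTBF)."""
    return 1.0 - math.exp(-t / mtbf)


def risk_reduction(asset: Asset, old_t: float, new_t: float) -> float:
    """Drop in value of risk when the check interval shrinks from old_t to new_t (years)."""
    old = likelihood_of_failure(old_t, asset.mtbf_years) * asset.failure_cost
    new = likelihood_of_failure(new_t, asset.mtbf_years) * asset.failure_cost
    return old - new


def added_labor_cost(n_assets: int, minutes_per_asset: float,
                     old_t: float, new_t: float, hourly_rate: float) -> float:
    """Extra operator cost per year from checking every asset more often."""
    extra_rounds = 1 / new_t - 1 / old_t
    return extra_rounds * n_assets * minutes_per_asset / 60 * hourly_rate


if __name__ == "__main__":
    assets = [  # hypothetical route
        Asset("P-101", mtbf_years=1.0, failure_cost=100_000),
        Asset("P-102", mtbf_years=2.0, failure_cost=50_000),
        Asset("P-103", mtbf_years=0.5, failure_cost=20_000),
    ]
    old_t, new_t = 0.5, 1 / 52  # semi-annual -> weekly
    benefit = sum(risk_reduction(a, old_t, new_t) for a in assets)
    cost = added_labor_cost(len(assets), minutes_per_asset=2, old_t=old_t,
                            new_t=new_t, hourly_rate=60)
    print(f"Cumulative risk reduction: ${benefit:,.0f} per year")
    print(f"Added operator time cost:  ${cost:,.0f} per year")
```

Summing the benefit across every asset on the route, rather than judging assets one at a time, reflects the point that the combined results of broad application are what justify the effort.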

In the long term, it is important to compare the overall cost of this program to the value of improved asset availability. Again, this will show the value of performing this low-cost program when compared to more costly programs or to simply doing nothing until resources are available to implement more costly strategies.

Time and experience will show that this is a highly sustainable method that will achieve tangible improvements using currently available resources at an intensity and complexity within realistic limits.