In manufacturing, reliability is the product of maintenance. However, many factors can limit the degree of reliability that maintenance can deliver. One of the most significant is the inherent ability of the operating equipment to perform the required function.
The equipment’s design and selection set the bar for maintenance. If maintenance does all the right things and reliability is still unsatisfactory, an equipment redesign or replacement may be necessary.
Some dramatic examples of failure have resulted from inadequate equipment/system design and selection. These could have been avoided if a thorough design review, using any one of several standard processes, had been performed prior to plant construction.
One example with which I was involved was the failure of a 25,000-horsepower pulp refiner. Shortly after start-up, a 250-milliamp control fuse failed, shutting down the main oil pump and resulting in the destruction of the main refiner bearings. From a control point of view, the main oil pump was quite remote from the equipment it was protecting.
Compare this to the large steam turbine in the same plant where the main turbine oil pump was mounted directly to the end of the turbine shaft. As long as the turbine was turning, so was the oil pump.
There were auxiliary pumps for start-up and emergency backup, but the underlying design principle is what matters: keeping the “control connection” between the operating equipment and the services that protect it as simple and direct as possible provides the highest level of reliability. The refiner's oil supply had far more potential failure modes than the turbine's.
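The effect of a longer protection chain can be illustrated with a simple series-reliability calculation, in which every link must work for the oil supply to be maintained. This is only a minimal sketch: the component lists and the 0.99 reliability figures are hypothetical values chosen for illustration, not data from either installation.

```python
from math import prod


def series_reliability(component_reliabilities):
    """Probability that every component in a series chain works.

    Assumes independent failures, which is a simplification.
    """
    return prod(component_reliabilities)


# Hypothetical chains; names and values are illustrative assumptions only.
refiner_oil_supply = {
    "control fuse": 0.99,
    "control circuit": 0.99,
    "motor starter": 0.99,
    "electric motor": 0.99,
    "oil pump": 0.99,
}
turbine_oil_supply = {
    "shaft drive": 0.99,
    "oil pump": 0.99,
}

refiner = series_reliability(refiner_oil_supply.values())
turbine = series_reliability(turbine_oil_supply.values())

print(f"Refiner oil supply: {len(refiner_oil_supply)} links, reliability {refiner:.4f}")
print(f"Turbine oil supply: {len(turbine_oil_supply)} links, reliability {turbine:.4f}")
```

Even with identical component quality, the chain with more links comes out less reliable, which is the point of keeping the connection short and direct.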
Two other examples relate to the failure of emergency backup systems. One involved a power failure at Hartsfield-Jackson Atlanta International Airport in December 2017. The other was a failure of the emergency generating system at the Fukushima Daiichi Nuclear Power Plant in March 2011. In Atlanta, the cables for the primary and backup power supplies shared the same service tunnel and were both damaged by the same fire.
At Fukushima, the emergency generators were inundated by a tsunami, leading to the partial destruction and complete closure of the nuclear power station. The generators had been installed at a low level even though the area was recognized as having the potential for a tsunami. A thorough design review of both installations, based on their operating context and including emergency situations, should have exposed these vulnerabilities and allowed both failures to be avoided.
Primary and backup systems must be kept separate to prevent any foreseeable event from rendering both unserviceable at the same time. In particular, isolating devices (valves or electrical breakers) that separate primary and backup systems should be located so they can be accessed and operated under any emergency situation.
These critical devices should be carefully selected and of much higher quality than normal plant component standards dictate. They must be completely reliable even if they go unoperated for many years.
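One way a design review might screen for this kind of common-cause exposure is to list the locations and hazards each system depends on and look for overlap. The sketch below assumes a very simple model in which each system is described by a set of labels; the function name and every label are hypothetical and do not describe the actual layout of any site.

```python
def shared_exposures(primary, backup):
    """Return the locations or hazards that both systems are exposed to.

    A non-empty result means a single event could disable both systems.
    """
    return set(primary) & set(backup)


# Hypothetical exposure lists, for illustration only.
primary_supply = {"service tunnel 1", "substation A"}
backup_supply = {"service tunnel 1", "substation B"}

overlap = shared_exposures(primary_supply, backup_supply)
if overlap:
    print("Common-cause exposure:", sorted(overlap))
else:
    print("No shared exposure found in this model")
```

A review based on a list like this is only as good as the list itself: any hazard left out of the model, such as flooding at a given elevation, goes undetected.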
Of course, there are many other considerations when designing and selecting equipment. Nevertheless, a critical and logical analysis of the effects of failure and possible emergencies, such as fires, earthquakes and floods, along with a plan to mitigate the resulting damage, should be fundamental components of the design process.