Shutdown Risk Management Best Practices

Andrew Levitt; Ben Wurtmann
Tags: maintenance and reliability

The highest-pressure situation your maintenance department might ever encounter is a planned downtime. A large amount of work is scheduled into a small amount of time, and the deadline for resuming production is always just around the corner.

Great gains can be made by increasing reliability or installing new equipment. However, there are risks. New problems can arise, and costs can mount. How do you make shutdowns a safer bet?

In a project management context, the word risk is simply shorthand for “deviation from the project plan.” Encountering at least some risk is unavoidable. The impact of that risk depends on how a shutdown has been planned.

Uncertainty about the magnitude of repairs needed, over-aggressive estimates, lack of experience, and a number of other issues can contribute to delays, cost overruns and lost productivity. Some of these factors can be eliminated, but most risks can only be managed.

Seeing Risk

The critical task here is to accurately project the magnitude of the risks involved in a shutdown and respond accordingly. As the project is outlined, several factors should be examined to develop a better view of the situation.

The basic rule is that the complexity of a task is directly related to the likelihood of encountering difficulty. In planning for a shutdown, the following factors may flag a task as a likely source of delay.

Critical Path – By definition, a Critical Path task has the potential to cause serious delays. Any delay in activity on the Critical Path has the potential of delaying the whole project.

Predecessors – A task that depends on multiple tasks being completed first is subject to more possibilities for delay.

Aggressive Estimates – Setting high standards for productivity doesn’t mean expecting the impossible. Unrealistic estimates can cause serious bottlenecks when later tasks get delayed by overruns in the initial stages of a shutdown.

Unfamiliar Tasks – Have workers performed this task before? New equipment and turnover could create a situation where workers would be learning on the fly. Identifying training needs and calling in outside resources may be the difference between on time and behind schedule.

Final Work – At the end of a shutdown, your workforce has been under pressure and the finishing work can present a stumbling block. Proper load leveling can minimize this problem.

Rarity – Are the materials or labor needed for a specific task hard to obtain? Delays in supplying them can hold up the whole project. Project management reports should include a filter for specialists and resource availability.
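The factors above can be turned into a simple filter over a task list. What follows is a minimal sketch, not a feature of any particular project management package: the Task fields, the task names, and the min_predecessors threshold are all hypothetical and would need to be mapped to whatever fields your own software exposes.

```python
from dataclasses import dataclass


@dataclass
class Task:
    # All field names are hypothetical; map them to your own task fields.
    name: str
    on_critical_path: bool = False
    predecessor_count: int = 0
    aggressive_estimate: bool = False
    unfamiliar: bool = False
    final_work: bool = False
    rare_resources: bool = False


def risk_flags(task, min_predecessors=2):
    """Return the risk factors from the list above that apply to a task."""
    checks = [
        ("critical path", task.on_critical_path),
        ("predecessors", task.predecessor_count >= min_predecessors),
        ("aggressive estimate", task.aggressive_estimate),
        ("unfamiliar task", task.unfamiliar),
        ("final work", task.final_work),
        ("rarity", task.rare_resources),
    ]
    return [label for label, applies in checks if applies]


# Illustrative tasks only.
tasks = [
    Task("Replace pump seal", on_critical_path=True, rare_resources=True),
    Task("Repaint guard rails"),
    Task("Commission new drive", predecessor_count=4, unfamiliar=True, final_work=True),
]

# Keep only flagged tasks, most flags first.
flagged = sorted(
    ((t.name, risk_flags(t)) for t in tasks if risk_flags(t)),
    key=lambda item: len(item[1]),
    reverse=True,
)

for name, flags in flagged:
    print(f"{name}: {', '.join(flags)}")
```

Counting flags is only a rough first pass. A task with a single flag can still be the one that matters most, so the shortened list is a starting point for discussion rather than a ranking.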

Measuring Risk

Not all problems are created equal. Some might be fairly easy to solve; others might bankrupt a company. Determining which need your most urgent attention comes down to comparing three values: tolerance level, cost, and probability.

Tolerance: Tolerance is the capacity of your company to respond to risk without unacceptable consequences. Cash reserves, overdue orders, production goals, and regulatory requirements can all inform this discussion.

What might be an acceptable possible cost for an international operation with multiple plants could be catastrophic to a smaller custom manufacturer. Risk tolerance is not just a fiscal calculation, even if money is a major consideration here. It is a yardstick of the magnitude of various shutdown risks.

Parallel to the financial accounting needs to be an assessment of environmental, health, and safety concerns. Human lives can’t be counted in the same way, but this doesn’t mean that quantifying the potential for problems isn’t important. Assume the worst, brainstorm for possibilities, identify consequences, and plan out scenarios in detail.

Cost: The additional cost of risk can be estimated by comparing the worst possible scenario against the planned outlay. If things begin to go wrong, what will it take to get things back under control? Costs aren’t limited to the immediate expense of fixing a problem; they include everything that follows from the interruption. Will orders go unfulfilled? Contracts or customers lost? Will specialists need to be brought in? New equipment ordered?

Broad brainstorming is crucial here. Expertise from all corners helps build a complete picture of possible consequences and tally up the costs.

Precise calculation is not possible for every risk you will have identified, and some may simply need to be estimated. The priority here is to carefully consider the risks that have the greatest potential to cause disruption and delay.

Probability: The likelihood of an event is most accurately predicted from prior data. Records from previous shutdowns and experienced employees can help guide this kind of analysis. But since shutdowns tend to be rare, not every kind of risk will have hard data associated with it.

The key here will be to make the best possible estimate based on experience. A reasoned estimate is more useful than a wild guess or no prediction at all. Thinking through the possible chains of events will help identify likely trouble spots based on the criteria presented above.

Remember, the more complex a task, the greater the possibility of failure. Multiple inputs, critical supplies and talents, and time sensitivity all drive risk.

Let’s not overlook PERT and Monte Carlo duration estimate methods – these allow you to enter worst case, expected, and best case scenarios (dollars and durations) for each task.
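As a rough illustration of how the three inputs are used, here is a minimal sketch of the conventional PERT weighting, in which the expected (most likely) value counts four times as much as either extreme. The function name and the sample hours are assumptions made for the example, and your project management software may apply a different weighting.

```python
def pert_estimate(best, expected, worst):
    """Combine a three-point estimate using the conventional PERT weighting.

    `expected` is what PERT texts usually call the "most likely" value.
    Returns (weighted mean, approximate standard deviation).
    """
    if not best <= expected <= worst:
        raise ValueError("estimates must satisfy best <= expected <= worst")
    mean = (best + 4 * expected + worst) / 6
    std_dev = (worst - best) / 6
    return mean, std_dev


# Hypothetical task: 8 hours at best, 12 expected, 28 at worst.
mean, std_dev = pert_estimate(best=8, expected=12, worst=28)
print(f"weighted duration: {mean:.1f} h, spread: {std_dev:.1f} h")
```

A wide gap between best and worst case produces a large spread, which is a signal that the planner has little confidence in the estimate.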

Project management software can then extrapolate the likelihood of a task’s starting on a particular date, and Monte Carlo calculations can help give such results more detail and accuracy.

Tasks that are not very likely to begin on their planned date are, of course, more likely to fall further behind. Think of PERT as a way to quantify the confidence that a planner has in their duration estimates, and Monte Carlo calculations as a way to figure the cumulative effects of these uncertainties throughout the project.
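The cumulative effect can be sketched with a small simulation. This is an illustration under stated assumptions rather than the method any particular package uses: the three tasks run strictly one after another, each duration is drawn from a triangular distribution over its best, expected, and worst case, and the task figures and function name are made up for the example.

```python
import random

# Hypothetical (best, expected, worst) durations in hours for tasks done in sequence.
chain = [(4, 6, 12), (8, 10, 20), (2, 3, 6)]

# The plan assumes every task takes exactly its expected duration.
planned_finish = sum(expected for _, expected, _ in chain)


def on_time_probability(chain, deadline, runs=10_000, seed=1):
    """Estimate the chance that the whole chain finishes by `deadline` hours."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same answer
    on_time = 0
    for _ in range(runs):
        # random.triangular takes (low, high, mode)
        total = sum(
            rng.triangular(best, worst, expected)
            for best, expected, worst in chain
        )
        if total <= deadline:
            on_time += 1
    return on_time / runs


p = on_time_probability(chain, planned_finish)
print(f"chance the next task can start as planned: {p:.0%}")
```

Because each worst case sits much further from the expected value than the best case does, the simulated chain finishes late more often than the expected durations alone would suggest. That is the kind of cumulative effect a single-point schedule hides.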

Combine the probability that a risk will occur with the cost of the risk and compare that with your tolerance for acceptable costs and delays. If you feel that the task involves risk your operation cannot afford, note this in a field in your list or project management software.

These risks must be managed. Risks that fall within acceptable limits can be left alone. Prioritize your list of risks, separating those that must be managed from those that need not be.
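One common way to combine probability and cost is to multiply them into a single exposure figure and compare it with a tolerance threshold. The sketch below assumes that approach. The risk names, probabilities, costs, and the tolerance value are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    probability: float  # estimated chance of occurring, 0.0 to 1.0
    cost: float         # estimated added cost if it does occur

    @property
    def exposure(self):
        """Probability-weighted cost (an assumed way of combining the two)."""
        return self.probability * self.cost


def prioritize(risks, tolerance):
    """Split risks into those above tolerance (largest exposure first) and the rest."""
    must_manage = sorted(
        (r for r in risks if r.exposure > tolerance),
        key=lambda r: r.exposure,
        reverse=True,
    )
    acceptable = [r for r in risks if r.exposure <= tolerance]
    return must_manage, acceptable


# Illustrative figures only.
risks = [
    Risk("Late delivery of gearbox", 0.30, 200_000),
    Risk("Scaffold crew unavailable", 0.10, 40_000),
    Risk("Boiler tubes worse than expected", 0.50, 150_000),
]

must_manage, acceptable = prioritize(risks, tolerance=25_000)
for r in must_manage:
    print(f"MANAGE  {r.name}: exposure {r.exposure:,.0f}")
for r in acceptable:
    print(f"ACCEPT  {r.name}: exposure {r.exposure:,.0f}")
```

A single multiplied figure can understate rare but catastrophic events, so it is worth reviewing any risk with a very high cost by hand even when its exposure falls under the threshold.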

Response Development

There are two fronts on which to act in responding to risk. Risks can be avoided or they can be made less costly when they do occur.

Avoidance is a first step. Now that your team is looking for risks, some problems can be entirely bypassed. Delays with the arrival of materials or the lack of information on the repair of a piece of equipment can be remedied with foresight.

Some potential issues can be absorbed into the plan if they can’t be prevented. One of the biggest possibilities for risk and delay comes out of unplanned repairs. Issues that arise during the shutdown tend to be prioritized because of the surprise factor.

While this might be necessary, it is far better to start out knowing the magnitude of work to be done. Have non-invasive tests been performed on equipment? Infra-red scans done to look for overheating? Vibration checks performed? Sound checks for compressed air leakage?

Think of all the ways that equipment can be assessed before shutdown and disassembly, so that you enter the shutdown with a clear picture of what needs to be done and with supplies and labor ready.

Knowing about an issue in advance can mean the difference between a major delay while waiting for a part and having it arrive right on time. Once a problem is identified, planned and prepared for, it’s no longer a risk, but a regular part of your planned maintenance.

But what about risks that aren’t necessarily avoidable? Some issues can arise during the shutdown that can’t be known in advance. Diagnostics have limits, and some equipment may not be possible to check until it is offline. Needed supplies might be delayed. Repair work may turn out to be more complicated than originally thought. The key here is mitigation.

Mitigation is the process of taking steps that reduce the impact of risks. This might mean building some extra worker hours into a schedule so that new issues can be addressed while planned work is still done on time.

It might mean having more spare parts on hand, should they be required as refurbishing takes place. Or it could be writing damages into a contract with an outside supplier to reimburse your company if materials are not delivered on time.

The goal is to make a risk less costly if it occurs and make the impact be within your tolerance. The amount that is reasonable to spend on mitigation efforts needs to be related to both the cost and the probability of the risk.

Some low cost measures can take care of some small risks, while a major outlay might be prudent to protect against a catastrophic risk.
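One rule of thumb for sizing mitigation spending, offered here as an assumption rather than a fixed rule, is to compare the price of a measure with the probability-weighted saving it buys. The function name and all figures below are hypothetical.

```python
def mitigation_net_benefit(probability, cost_before, cost_after, mitigation_price):
    """Expected saving from a mitigation, minus the price of putting it in place.

    probability      -- estimated chance the risk occurs (0.0 to 1.0)
    cost_before      -- cost of the risk with no mitigation
    cost_after       -- cost of the risk once the mitigation is in place
    mitigation_price -- what the mitigation itself costs, paid either way
    """
    expected_saving = probability * (cost_before - cost_after)
    return expected_saving - mitigation_price


# Hypothetical: stocking a spare bearing set for 3,000 cuts a 50,000 delay to 10,000.
net = mitigation_net_benefit(
    probability=0.25,
    cost_before=50_000,
    cost_after=10_000,
    mitigation_price=3_000,
)
print(f"net expected benefit: {net:,.0f}")
```

A positive result suggests the measure pays for itself on average. For a catastrophic risk that exceeds your tolerance outright, a measure may still be prudent even when this figure comes out negative.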

Draw up contingency plans for what to do in the event that the risk does indeed take place – these minimize the cost and consequence of the risk by minimizing reaction time and maximizing response efficiency.

If a piece of equipment is in far worse shape than expected, the tasks required to bring it up to operational condition should already be drawn up and saved for quick insertion into the project plan file. Work packets should be prepared in advance, and parts should be either on hand or ready to order quickly, without hunting down the relevant information.

This allows work to begin as soon as possible. Since most risks in shutdowns come from unexpected emergent work, contingency planning is a great source of progress for managing shutdown risk.

For each risk that remains significant after these preparations have been made, identify a trigger or set of triggers that indicate that the risk has occurred or is about to occur. By identifying triggers, you minimize your reaction time for the implementation of contingency plans.

To determine triggers, call another brainstorming meeting with experts who are familiar with the risk. Find out how they would know that the risk has occurred, and then work back from there to the earliest indicator.

Try to find indicators that would be apparent in the project plan during updates, such as a particular pattern of overtime or the heavy use of a certain type of specialist resource. Build filters in your project management software that represent this behavior, so that during the project you can check once a day to see if these patterns are happening.
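A trigger filter of this kind can be very simple. The sketch below checks for one assumed pattern: sustained overtime on a crew over several consecutive days. The window length, the hour threshold, and the sample log are hypothetical and would come from your own brainstorming session.

```python
def overtime_trigger(daily_overtime_hours, window=3, threshold=10.0):
    """Return True if each of the last `window` days exceeded `threshold` overtime hours."""
    recent = daily_overtime_hours[-window:]
    return len(recent) == window and all(hours > threshold for hours in recent)


# Hypothetical overtime log for one crew, oldest day first.
overtime_log = [2.0, 4.5, 11.0, 12.5, 14.0]

if overtime_trigger(overtime_log):
    print("Trigger hit: investigate whether the risk has occurred")
```

Run once a day after the plan is updated, a check like this turns a vague worry into a yes-or-no question that anyone on the team can answer.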

Be sure to assign responsibility for monitoring the risk if you don’t do it yourself – supervisors and contractors also can be given access to project data. Each risk can only happen during the period that the relevant tasks are in progress, so build a risk-watch schedule.
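A risk-watch schedule can be derived directly from the task dates. In this minimal sketch, each risk is watched from the start of its earliest related task to the finish of its latest one. The task names, dates, risk names, and owners are all hypothetical.

```python
from datetime import date

# Hypothetical task windows: task name -> (start, finish).
task_windows = {
    "Remove gearbox": (date(2024, 5, 6), date(2024, 5, 7)),
    "Install gearbox": (date(2024, 5, 9), date(2024, 5, 10)),
    "Retube boiler": (date(2024, 5, 6), date(2024, 5, 12)),
}

# Each risk lists the tasks it can affect and who is responsible for watching it.
watch_list = [
    {"name": "Late gearbox delivery",
     "tasks": ["Remove gearbox", "Install gearbox"],
     "owner": "Planner"},
    {"name": "Tube damage worse than expected",
     "tasks": ["Retube boiler"],
     "owner": "Boiler supervisor"},
]


def watch_window(risk):
    """Earliest start and latest finish across the tasks a risk can affect."""
    starts, finishes = zip(*(task_windows[t] for t in risk["tasks"]))
    return min(starts), max(finishes)


def risks_to_watch(day):
    """List (risk name, owner) pairs that need monitoring on a given day."""
    active = []
    for risk in watch_list:
        start, finish = watch_window(risk)
        if start <= day <= finish:
            active.append((risk["name"], risk["owner"]))
    return active


for name, owner in risks_to_watch(date(2024, 5, 8)):
    print(f"{owner} watches: {name}")
```

Note that the window here spans any gap between related tasks, on the assumption that a delivery risk, for example, remains live between removal and installation.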

In order to closely track possible indicators, you may need to gather more information than you would otherwise. For example, if the risk is the production and delivery of a critical material, you might request that the manufacturer notify you at each step of its production, just to make sure that it is on track.

The idea is not to eliminate all sources of risk, but to decide ahead of time what preparations to make. High-cost, high-likelihood events obviously need to be considered first and treated as part of the main planning process.

Risk quantification can be an exact science or an exercise in vague wizardry, depending mostly on how much past information you have available.

The difficult judgment call is in separating out events based on a combination of the potential disruption and the chance that it will occur.

Protecting against long shot events with devastating consequences may be more worthwhile than spending excessive advance time dealing with minor issues that are more likely. The object of risk analysis is to plan for significant risks.

Risk Response Control

After you’ve implemented any risk elimination and mitigation measures and begun your shutdown, you need to monitor for two things: the triggers that you’ve already determined for expected risks, and the occurrence of unexpected risks.

Unexpected risks should of course be responded to quickly, and their causes well documented to assist in future shutdowns. Expected risks should be monitored using the risk schedule you developed.

Analyze project indicators, the plant floor, and communications from supervisors for evidence of triggers. If any triggers have happened, investigate further to see if the risk has indeed occurred, and if so then import your contingency plan into the project plan and rearrange task schedules as necessary to accommodate the extra work.

For major contingencies, save a new baseline to reflect the change in plans. Make note of the risk occurrence in reports as well.

If you put in the energy to complete these steps, you will have shorter, more tightly controlled shutdowns with fewer incidents – in short, the cost of planning to this degree is more than returned through improved project performance.

To summarize the steps in a comprehensive risk management program for shutdowns:

Determine your tolerance for cost, customer relations, safety, and environmental risks.
Filter for high-risk tasks.
Using your shortened list, identify the environmental, health, and safety issues, as well as the financial costs, for each risk.
Determine the probability that each risk would occur.
Prioritize risks based on your tolerance and the combination of each risk’s probability and the magnitude of its consequence.
Come up with mitigation plans or contingency plans or both.
For tasks with contingency plans, brainstorm a list of triggers that signify that a risk is turning sour.
Monitor the project during execution for triggers and unexpected risks.
Collect data and debrief after the shutdown. A history of previous problems encountered might help locate potential trouble spots in the future.