How overlooking the importance of strategy execution almost created months of work with no practical benefit to asset reliability.
The following case study illustrates the point made in our article on Simplifying Root Cause Analysis.
A dozer track pin failed in service and the track separated. Below are photos of the pin that failed.
The reliability engineer from the site, who was trained in the methods that we consider appropriate, organised and conducted a failure investigation. He involved all the necessary people and utilised a 5-Why method. The (very extensive) outputs – cause tree and defect elimination plan – are shown below.
| No. | Action | Responsible | |
|---|---|---|---|
| 1 | Include track warm up procedures on dozers | Reliability Engineer | ??? |
| 2 | Develop procedure for detecting hot pins and include in SOP | E&M Manager (delegate) | ??? |
| 3 | Investigate alternate tracks/design | E&M Projects Super | ??? |
| 4 | OEM presentation to operators on track care and operation | E&M Manager (delegate) | ??? |
| 5 | More communication between workshop and operators on potential problems | E&M Manager (delegate) | ??? |
| 6 | Better training for operators on preventative maintenance | E&M Manager (delegate) | ??? |
| 7 | Maintenance dept. to pass on information of runup procedures and management to be accountable | E&M Manager | ??? |
| 8 | CTS inspections at service intervals (change management strategy to capture in services) | Reliability Engineer | ??? |
| 9 | Inspect tracks for hot joints more frequently | Reliability Engineer | ??? |
| 10 | Task to inspect track joints more descriptive | Reliability Engineer | ??? |
| 11 | Frequent heat checks while operating – 100hrs thermal imaging | Reliability Engineer | ??? |
| 12 | Include tools and training package for removing packing | Production Super | ??? |
| 13 | Better use of equipment through forward planning (use float for > 1km relocation) | Mine Manager | ??? |
| 14 | Management to understand importance of floating dozers | Mine Manager | ??? |
| 15 | Train at least two staff per crew per shift in float operation | Mine Manager | ??? |
After the RCA was completed, the reliability engineer (RE) discussed the findings with the Maintenance (E&M) Manager, since the manager had been assigned many of the actions arising from the RCA.
The manager asked, “What was the failure mode?”
The RE replied, “It was a hot pin that then wore out and failed.”
The manager then asked, “What is our current task or tactic to manage hot pins?”
The RE replied, “That is a good question, I will find out!”
After two days the RE returned to the maintenance manager and said that the current tactic to manage hot pins was to do hot pin checks and action the results.
The manager asked, “Why did that not happen on this occasion?”
The RE replied, “The person from the OEM who conducted the hot pin checks realised that no-one was reading the reports, so he stopped doing them.”
The manager then asked, “So, how do we stop this from recurring?”
The RE replied, “We just have to execute the hot pin checks properly and action the findings.”
Once we realised that we only needed to understand how the failure mode should have been managed, it became clear that 14 of the 15 actions resulting from the RCA were unnecessary. All we had to do was make sure that our maintenance execution quality was adequate. Then, to extend (turbo-charge) this learning, we could ask ourselves, “What other condition monitoring tasks are not being actioned?” This is clearly an exercise that the maintenance execution team could have performed without the reliability engineer needing to get involved.
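The question “what other condition monitoring tasks are not being actioned?” lends itself to a simple routine check. The following is a minimal Python sketch of such an audit, not a description of any system used on the site. The `ConditionMonitoringTask` fields, the task names, the intervals and the dates are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class ConditionMonitoringTask:
    """One routine condition monitoring task (all fields are hypothetical)."""
    name: str
    interval_days: int             # planned interval between checks
    last_done: Optional[date]      # None if the check has never been performed
    open_findings: int = 0         # findings reported but not yet actioned


def audit_tasks(tasks: List[ConditionMonitoringTask], today: date) -> List[str]:
    """List tasks that are overdue, never performed, or have unactioned findings."""
    issues = []
    for task in tasks:
        if task.last_done is None:
            issues.append(f"{task.name}: never performed")
        elif today - task.last_done > timedelta(days=task.interval_days):
            overdue = (today - task.last_done).days - task.interval_days
            issues.append(f"{task.name}: overdue by {overdue} days")
        if task.open_findings > 0:
            issues.append(f"{task.name}: {task.open_findings} finding(s) not actioned")
    return issues


if __name__ == "__main__":
    # Illustrative data only; not taken from the case study.
    today = date(2024, 6, 1)
    tasks = [
        ConditionMonitoringTask("Dozer hot pin check", 14, date(2024, 3, 1)),
        ConditionMonitoringTask("Oil sampling", 30, date(2024, 5, 20), open_findings=2),
        ConditionMonitoringTask("Thermal imaging", 30, date(2024, 5, 15)),
    ]
    for issue in audit_tasks(tasks, today):
        print(issue)
```

A check like this covers both halves of the tactic described in the case: whether the inspection is being done at all, and whether its findings are being actioned.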
By focusing on just the failure mode and the current tasks that manage it, we can simplify the RCA process and significantly reduce the work that these processes tend to create. In this case, it also saved the reliability engineer from wasting months of his time.
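The manager's line of questioning can be written down as a short triage, sketched here in Python under stated assumptions. Only the middle branch (a tactic exists but was not executed) reflects what happened in this case. The other two branches, and the names `NextStep` and `triage_failure`, are hypothetical additions to make the sketch complete.

```python
from enum import Enum
from typing import Optional


class NextStep(Enum):
    # Assumed outcome: no tactic currently manages this failure mode.
    DEFINE_TACTIC = "No tactic exists for this failure mode: define one"
    # The outcome in the dozer track pin case.
    FIX_EXECUTION = "Tactic exists but was not executed: fix execution"
    # Assumed outcome: the tactic was carried out and the failure still occurred.
    REVIEW_TACTIC = "Tactic was executed but the failure still occurred: review the tactic"


def triage_failure(current_tactic: Optional[str], tactic_was_executed: bool) -> NextStep:
    """Ask the two questions in order: is there a tactic, and was it executed?"""
    if not current_tactic:
        return NextStep.DEFINE_TACTIC
    if not tactic_was_executed:
        return NextStep.FIX_EXECUTION
    return NextStep.REVIEW_TACTIC


if __name__ == "__main__":
    # The case above: hot pin checks existed but had stopped being done.
    step = triage_failure("Hot pin checks, action the results", tactic_was_executed=False)
    print(step.value)
```

Asking these two questions first is what made most of the original action list unnecessary in this case.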
By Gerard Wood