When a failure happens, identifying the root cause becomes imperative. There are many ways to determine why a failure occurred, but it is the strategy behind those methods that leads to strong solutions. A strategy leads to strong processes, which in turn lead to strong testing and solution methods. Engineers and engineering groups rely on failure analysis in their business, but what kind of controls do they lean on? How do they make sure their analysis is complete? How do they know the problem won’t arise again? Let’s find out!
What is the goal of failure analysis?
One of the most prominent goals of failure analysis is to collect data to determine the root cause of a failure. This is often referred to as root cause failure analysis, or RCFA. There are many forms of RCFA, including 5 Whys, Ishikawa diagrams, and Causal Factor Trees, to name a few. Failure mode and effects analysis, also referred to as FMEA, attempts to predict failures before they happen. However, the goal that ties all of these methods together is data collection.
Another goal of root cause failure analysis is to prevent similar failures from happening repeatedly. Failure analysis wants to know not only why something failed but also how to mitigate further risk. When a failure happens, determining how to better prevent the next one is key.
Data collection and risk mitigation also lead to another goal: safety. Whenever a failure occurs, there could be more underlying causes at work. So, with safety in mind, it is a must to check whether a root cause carries additional potential risks.
Lastly, cost saving is both a benefit and a goal. Whether it is on unnecessary preventative maintenance or on safety-related costs, failure analysis can provide savings.
When should you perform a failure analysis?
As simplistic as it may sound, failure analysis should be performed whenever a failure occurs. A failure could be anything from a crack in a support structure to a chemical spill. This might seem like a generalization, but what counts as a failure depends on the business using failure analysis.
For example, say you are a company that performs maintenance on roller coasters, and during a routine inspection, the inspector identifies a crack in a support structure. The need to stop using the ride and perform failure analysis would be high, because a crack like that is easily identifiable as a safety-critical issue.
However, failure analysis could also apply to a company that provides lighting and sound systems for events when its lighting software doesn’t seem to be working. The need for failure analysis may not be as apparent here, but performing it could save a lot of resources down the line.
This is because performing failure analysis gets problems solved, which saves money, time, and resources in the future.
How do you do a failure analysis?
Failure analysis can be done in many ways, but the approaches share some similarities. They all tend to follow the same pattern.
The first step of failure analysis is to collect relevant data. This is generally done by several people who specialize in failure analysis or are experts in the area of the failure. Using the roller coaster example from earlier, once a failure is identified, the next steps could involve people from Quality Management, Engineering, and Compliance, as well as people associated with the roller coaster, such as the technician who found the failure, the operating managers, and others who can provide background information. Outside engineers may even be called in if there is a special need, such as chemical analysis.
Data Analysis to Determine Root Cause
Once the data has been collected, the responsible analyst will perform root cause failure analysis, using the provided data to determine the cause of the failure. Depending on the situation, this may involve both non-destructive and destructive testing to evaluate different potential root causes, and the analysis may identify more than one cause. After a root cause is determined, the responsible person will present their analysis to the stakeholders.
Determining Corrective Actions
The last common step is to determine corrective actions. These depend on the failure’s root cause, but they generally amount to a detailed plan for addressing the failure and any other underlying factors. Take the lighting software failure from before: if the investigation showed that the root cause was both environmental and production-related, we can then propose an effective approach for each.
So, if the issue was that the lighting software would not start up because the lights were out in the sun and overheated, a corrective action might be to replace the existing lights with ones that can handle outdoor temperatures. If the problem was caused by the manufacturer providing the wrong type of equipment, the solution might be to reach out to the vendor to rectify the issue.
Strategic steps to your failure analysis process
Now that you know the benefits and steps of failure analysis, how do you make yours more strategic?
Firstly, the best thing a business owner can do for their failure analysis is to define a process. A defined process leads to a more concise and deliberate analysis. It also helps those new to the work understand why each test and step matters. When there is a process in place, things are much less likely to fall through the cracks.
Secondly, it is important to keep records of failure analyses and their solutions. As other failures come up, it is important to look back and see what actions were performed on a specific asset. It is also important to know how maintenance methods have changed over the years, because this will influence the proposed solution.
Lastly, the most important way to be strategic is to use the correct tools. How do you determine the correct tools? Well, it depends on your business, but many businesses trust IBM Maximo for this kind of work. This is because, as a tool, Maximo is already built to do all of the things we have talked about. Maximo is powerful at record keeping. It is a system that prides itself on maintaining records of what was performed, from the components used to the maintenance carried out. However, the best part about Maximo, in this case, is that it already has integrated support for RCFA, called Failure Reporting. Failure Reporting allows users to associate a failure type with the work being done, along with the failure’s problem, cause, and remedy. This way, an event can be captured and recorded so that techniques can be improved in the future.
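To make the idea of recording a failure’s problem, cause, and remedy more concrete, here is a minimal Python sketch of a failure log that can be searched by asset. It is only an illustration and is not Maximo’s data model or API. The names FailureRecord, FailureLog, history, and repeat_causes, the asset ID, and the dates are all hypothetical. The example entries borrow the lighting scenario from earlier.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FailureRecord:
    """One failure event (hypothetical structure, not a Maximo schema)."""
    asset_id: str
    reported_on: date
    failure_type: str
    problem: str
    cause: str
    remedy: str


class FailureLog:
    """In-memory record keeping, grouped by asset."""

    def __init__(self):
        self._by_asset = defaultdict(list)

    def add(self, record):
        self._by_asset[record.asset_id].append(record)

    def history(self, asset_id):
        """Return an asset's failure records, oldest first."""
        records = self._by_asset.get(asset_id, [])
        return sorted(records, key=lambda r: r.reported_on)

    def repeat_causes(self, asset_id):
        """Return causes recorded more than once for an asset."""
        counts = defaultdict(int)
        for record in self._by_asset.get(asset_id, []):
            counts[record.cause] += 1
        return sorted(cause for cause, n in counts.items() if n > 1)


# Hypothetical entries based on the lighting example; dates are made up.
log = FailureLog()
log.add(FailureRecord(
    "LIGHT-RIG-1", date(2023, 7, 4), "lighting",
    "software will not start", "overheating in direct sun",
    "replace with outdoor-rated lights",
))
log.add(FailureRecord(
    "LIGHT-RIG-1", date(2023, 6, 1), "lighting",
    "software will not start", "overheating in direct sun",
    "move lights into shade",
))

for record in log.history("LIGHT-RIG-1"):
    print(record.reported_on, record.cause, "->", record.remedy)
print(log.repeat_causes("LIGHT-RIG-1"))
```

Looking up an asset’s history before proposing a remedy is the point of the sketch. A cause that shows up more than once suggests the earlier remedy did not address the root cause.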
Effective methodologies come from planning and analyzing the process. The development and management of a methodology will provide more groundwork in the case of a complex issue.
Fail me once; shame on you. Fail me twice, and I need better strategies!
Failure comes from many places: materials, manufacturing, mechanical issues, and even physical constraints, but with solidified strategies for failure analysis, your search for the causes won’t take as long as you might think.
Determining what is going to benefit your organization most is the first step. However, it is only the first step toward a complete process.
Now that you’ve learned about failure analysis strategies, what are you going to test first? Are you looking to improve your testing procedures? Are you looking to figure out whether a fault lies in a component or in manufacturing? Or are you looking to try out some of the more robust failure reporting features of IBM Maximo, now that you know how it can make this work easier?
Let us know what you’re going to figure out or test for your business!