“…the messy interior of engineering practice, which after the accident investigation looks like ‘an accident waiting to happen’ is nothing more than ‘normal technology.’ Normal technology…is unruly.”
– Dr. Diane Vaughan, The Challenger Launch Decision (Chapter 6: Engineering Culture)
I just read a very interesting article that you should also read:
“Launch failures: the Predictables” by Wayne Eleazer, The Space Review, Monday, December 14, 2015: http://www.thespacereview.com/article/2884/1
However, I think we need to examine the subject with a little more rigor.
It seems to me to be a little too glib to say that people make mistakes because they get in a hurry (launch pressure) and ignore information.
That seems awfully shallow in a complex real world. And misleading – because you can begin to think that you are smarter than those folks who got in a hurry and made that mistake: “I would never make such a mistake.”
That was certainly the mode that many of us at NASA were in after the Challenger accident. After all, an incredibly stupid middle manager let launch pressure force him into making a dumb decision that cost the lives of seven astronauts. Right?
Well, er, no. You need to read Dr. Vaughan’s book. It’s a little more complex than that.
If you settle for a simple, easy cause for an accident, you might miss an important lesson that – learned early enough – just might keep you from making the same mistake later in life.
“Most accidents originate in actions committed by reasonable, rational individuals who were acting to achieve an assigned task in what they perceived to be a responsible and professional manner.”
— Peter Harle, Director of Accident Prevention, Transportation Safety Board of Canada, and former RCAF pilot, “Investigation of human factors: The link to accident prevention,” in Johnston, N., McDonald, N., & Fuller, R. (Eds.), Aviation Psychology in Practice, 1994
After Challenger, almost all of us at NASA failed to learn the complex, rich, and ultimately effective lessons there because we accepted the glib, easy answer: all we had to do was avoid ‘launch fever’ and keep those middle managers from succumbing to it.
That was a necessary step for flight safety, but hardly a sufficient method to avoid future mistakes.
Doing an accident investigation, I have been told, involves asking “why?” seven times. Such and such happened. Why? Because somebody did so and so. Why? Because they were trained wrong. Why? Because nobody thought that could happen. Why?
You get the picture. If you want to find a root cause for an accident you have to go deep. If you want to keep from having an accident, it follows, you must go equally deep.
The problem with most engineering projects – particularly complex, highly coupled, high-performance, extreme-environment engineering projects – is that there are too many issues to deal with. A great leader will organize the team to look at all the possible problems and triage them into what needs the most attention. There are never enough resources (time, people, money) to get to the depths of all the issues that are out there. By their very nature, complex problems require priority setting and resource allocation.
“Absolute certainty can never be attained for many reasons, one of them being that even without limits on time and other resources, engineers can never be sure they have foreseen all possible contingencies, asked and answered every question, played out every scenario.” – Dr. Diane Vaughan
In the months leading up to the Columbia accident, the Space Shuttle program staff – engineers, safety specialists, managers, operators – worked through more than two dozen potentially fatal issues, some arising from in-flight anomalies, some from new technical analysis or ground test. All major issues. Worked hard. Delayed flights to solve them. Put in place new equipment, new procedures, and new safety checks to ensure that the probability of success was maximized.
Nothing was ignored. Nothing. Nothing. Not even the foam.
But issues were evaluated, ranked, and resources applied as was felt appropriate.
Nothing was ignored. But some things were mis-evaluated.
“Judgments are always made under conditions of imperfect knowledge.” – Dr. Diane Vaughan
In High Reliability Organization Theory, one of the postulates is “a reluctance to simplify interpretations.” This makes sense: the deeper the understanding of a subject, the more likely it is that a proper judgment can be made.
One of the leading causes of the Columbia accident was clearly the oversimplification of one issue, which led to its mis-categorization.
In a larger sense, a simplistic understanding of how accidents occur in complex engineering systems will prevent learning and lead to more accidents.
A really stupid organization is one that ignores critical issues. Those organizations are not in business very long. A smarter, but still accident-prone, organization addresses critical issues – but improperly. A truly smart organization addresses all issues with the best possible judgment applied. A successful organization is very smart and always worried that something has been missed – or improperly evaluated. “Preoccupation with failure” is the term.
Or you can just remember to think “I’m not as smart as I think I am.” Properly applied, that can work too.