Messy Accidents

Messy Failure

“…the messy interior of engineering practice, which after the accident investigation looks like ‘an accident waiting to happen’ is nothing more than ‘normal technology.’ Normal technology…is unruly.”
– Dr. Diane Vaughan, “Challenger Launch Decision” (Chapter 6: Engineering Culture)

I just read a very interesting article that you should also read:
“Launch failures: the Predictables” by Wayne Eleazer, Monday, December 14, 2015: http://www.thespacereview.com/article/2884/1

 
However, I think we need to examine the subject with a little more rigor.
It seems to me to be a little too glib to say that people make mistakes because they get in a hurry (launch pressure) and ignore information.

 

That seems awfully shallow in a complex real world. And misleading – because you can begin to think that you are smarter than those folks who got in a hurry and made that mistake. “I would never make such a mistake.”

 
That was certainly the mode that many of us at NASA were in after the Challenger accident. After all, an incredibly stupid middle manager let launch pressure force him into making a dumb decision that cost the lives of seven astronauts. Right?

 
Well, er, no. You need to read Dr. Vaughan’s book. It’s a little more complex than that.

 
If you settle for a simple, easy cause for an accident, you might just miss an important lesson that – if learned early enough – just might help you keep from making the same mistake later in life.

 
“Most accidents originate in actions committed by reasonable, rational individuals who were acting to achieve an assigned task in what they perceived to be a responsible and professional manner.”
– Peter Harle, Director of Accident Prevention, Transportation Safety Board of Canada and former RCAF pilot, “Investigation of human factors: The link to accident prevention.” In Johnston, N., McDonald, N., & Fuller, R. (Eds.), Aviation Psychology in Practice, 1994

After Challenger, almost all of us at NASA failed to learn the complex, rich, and ultimately effective lessons there because we accepted the glib, easy answer that all we had to do was avoid ‘launch fever’ and keep those middle managers from doing the same.

 

A necessary step for flight safety but hardly a sufficient method to avoid future mistakes.

 
Doing an accident investigation, I have been told, involves asking “why?” seven times. Such and such happened. Why? Because somebody did so and so. Why? Because they were trained wrong. Why? Because nobody thought that could happen. Why?

 
You get the picture. If you want to find a root cause for an accident you have to go deep. If you want to keep from having an accident, it follows, you must go equally deep.

 
The problem with most engineering projects – particularly complex, highly coupled, high performance, extreme environment engineering projects – is that there are too many issues to deal with. A great leader will organize the team to look at all the possible problems and issues and triage them according to what needs the most attention. There are never enough resources (time, people, money) to get to the depths on all the issues that are out there. By their very nature, complex problems require priority setting and resource allocation.

 
“Absolute certainty can never be attained for many reasons, one of them being that even without limits on time and other resources, engineers can never be sure they have foreseen all possible contingencies, asked and answered every question, played out every scenario.” – Dr. Diane Vaughan

In the months leading up to the Columbia accident, the Space Shuttle program staff – engineers, safety specialists, managers, operators – worked through more than two dozen potentially fatal issues; some arising from inflight anomalies, some arising from new technical analysis or ground test. All major issues. Worked hard. Delayed flights to solve them. Put in place new equipment, new procedures, and new safety checks to ensure that the probability of success was maximized.

 
Nothing was ignored. Nothing. Nothing. Not even the foam.

 
But issues were evaluated and ranked, and resources were applied as was felt appropriate.

 
Nothing was ignored. But some things were mis-evaluated.

 
“Judgments are always made under conditions of imperfect knowledge” – Dr. Diane Vaughan

In High Reliability Organization Theory, one of the postulates is “a reluctance to simplify interpretations”. This makes sense. The deeper an understanding of a subject, the more likely it is that a proper judgement can be made.

 
One of the leading causes of the Columbia accident is clearly the simplification of one issue that led to a mis-categorization.

 
In a larger sense, having a simplistic understanding of how accidents occur in complex engineering systems will prevent learning and lead to a continuation of the accidents.

 
A really stupid organization is one that ignores critical issues. Those organizations are not in business very long. A smarter but still accident-prone organization addresses critical issues, but improperly. A truly smart organization addresses all issues with the best possible judgement applied. A successful organization is very smart and always worried that something has been missed – or improperly evaluated. “Preoccupied with Failure” is the term.

Or you can just remember to think “I’m not as smart as I think I am.” Properly applied, that can work too.

About waynehale

Wayne Hale is retired from NASA after 32 years. In his career he was the Space Shuttle Program Manager or Deputy for 5 years and a Space Shuttle Flight Director for 40 missions. He has also retired from consulting and is currently a full-time grandpa. He might be available for speaking engagements for the right incentives (coffee and donuts work!).

9 Responses to Messy Accidents

  1. jedswift says:

    Hindsight is a strong influence on evaluation. We tend to move failure modes from the past to the head of the list, even if that failure was statistically remote. Maintaining rigor is difficult in the heat of battle.
    Great article, I enjoyed Mr. Eleazer’s discussion too.

  2. “Pride goeth before the fall”? Your analyses are invaluable, Wayne, please keep them coming.

  3. Vince says:

    Reductionism: bulwark of ignorance, domain of inability to imagine, acme of superficiality.

    Thank you for this conversation Wayne.

  4. Dave H. says:

    Wayne,

    With the perfect vision of hindsight, please permit my curmudgeonly analysis of your statements…

    “However, I think we need to examine the subject with a little more rigor.
    It seems to me to be a little too glib to say that people make mistakes because they get in a hurry (launch pressure) and ignore information.
    That seems awfully shallow in a complex real world. And misleading – because you can begin to think that you are smarter than those folks who got in a hurry and made that mistake. ‘I would never make such a mistake.’ ”

    Agreed. Assigning a simplistic cause as the root cause is lazy and precludes finding what the actual contributing factors really were.

    “That was certainly the mode that many of us at NASA were in after the Challenger accident. After all, an incredibly stupid middle manager let launch pressure force him into making a dumb decision that cost the lives of seven astronauts. Right?”

    For many years I added NASA’s desire to be able to have President Reagan’s State of the Union address contain “Tonight, America has a teacher in space.” as a contributing factor, but in the end, it was ignoring the laws of physics as they applied to the O-ring booster seals. Not much to say there.

    “Nothing was ignored. Nothing. Nothing. Not even the foam.
    But issues were evaluated, ranked, and resources applied as it was felt appropriate.
    Nothing was ignored. But some things were mis-evaluated.”

    Once again, mis-directed anger on my part. If something had never been an issue in 22 years of operation why should it be a problem now? In the “In God we trust, all others bring data” environment the data to justify investigation of the results of the foam strike simply didn’t exist. Columbia and her crew weren’t doomed by a lack of pre-flight due diligence, they were doomed by in-flight complacency.

    Complacency can and will kill you. You need a system that values maximum input from anyone and everyone involved, inside the system and outside. Sometimes you don’t see the forest for the trees and someone with a different vantage point can see what you can’t.

    It’s not about being “smart”…it’s about not being afraid to be labeled a “chronic bitcher” for standing on safety. When safety is involved there is no substitute for maximum input.

  5. Charley S says:

    Something has haunted me for years. The same engineer asked the same question during both the Challenger and the Columbia investigations. In the same precise way each time. Basically his question was why, when the Thermal Protection System specification was set at 0% launch debris damage, a hundred hits or more became acceptable. The data seemed to say that TPS damage wasn’t a risk but a maintenance issue. Columbia’s first flight seemed to set that path when it returned safely with damaged tiles. It’s so clear now after reading this.

  6. Cem PAYZIN says:

    Hi,

    I know you are busy on your new assignment. However, I would really like to get your opinion on SpaceX landing on land and the reusability of the boosters. You have spent so much time on the Shuttle and its engines, so I wonder how real the reusability promises are. Also, please keep posting; we need more people to talk about the Space Transport Systems…

    Thanks.

  7. afganblues says:

    Wayne,
    I enjoy reading your articles from time to time. This one hits me right in the gut. My greatest fear in life at this moment is that we are not doing our due diligence. It keeps me awake at night. I have not slept well for several months. It seems to me that we are not keeping our eye on the ball. I fear what you call “Preoccupied with Failure” has been replaced by “Preoccupied with Cost Cutting”. Safety has somehow taken a backseat to the delivery of a hollow paper product which I fear will lead to tragic results.
