“The Space Shuttle is an experimental vehicle with an operational mission” – NASA Deputy Associate Administrator Michael Kostelnik, 2004
The Space Shuttle system was under development for 13 years and then actually flew in space for over 30 years. Its main engines were the most tested in history, with over one million seconds of firing time in ground test stands. Equipment testing and inspections continued unabated from the early development days right up until the final flight, and even sometimes beyond. Tons of documents contained the data from millions of qualification and certification tests for every system, each subsystem component, every piece part, every bolt, every nut, every electronic chip that ever flew as part of the shuttle. Computer simulations of every conceivable operation of the shuttle system consumed uncounted terabytes of memory. Computational Fluid Dynamics analyses were run with the highest fidelity; finite element structural models were tested with every possible loading condition; Monte Carlo simulations for the navigation and flight control operations combined every possible variation in systems performance, environmental conditions, and trajectory offset. If there was ever a mature space system, the space shuttle was it.
So, after the turn of the millennium, when individuals or groups would suggest that perhaps we did not know enough about how the shuttle worked, you can understand why senior management brushed concerns aside with the rejoinder: ‘after all, the shuttle is a mature system, we know how it operates.’
Seen from another angle, the view might be different. All those parts were tested to the environment (loads, pressures, temperatures, etc.) that we thought they might see in actual use. Mostly we got that right; sometimes we found out that we didn’t really know. And the real, total system, every time it flew, flew right down the center of the envelope. Because of cost, risk, focus on mission objectives, whatever, we avoided those tests that might really stress the system. Mostly those tests were flown in simulations, with computers, based on assumptions and models; not in the real world.
And sometimes real flights, right down the middle of the envelope, had puzzling results. Because those results didn’t fit with the cultural view that the shuttle was a well understood, mature system, sometimes not enough attention was paid to those pesky anomalies. So we just made a convoluted definition of what was an anomaly and ruled out some events as if they had never happened.
Lesson point: don’t let the great mass of information that says that everything is going well blind you to the uncomfortable evidence that something isn’t right. Really high-reliability organizations are obsessed with the possibility of failure.
Space flight is not like commercial aviation, but consider this sticky story. Commercial airplanes, being certified by the FAA to gain an airworthiness certificate, fly not just 4 times, or even just 135 times, but thousands of times before being declared “operational.” Not that the FAA doesn’t appreciate analysis and simulation, but those pesky regulators require that the airplane really fly: the whole airplane, in the real environment, flying every test, every time. And not only that, the FAA requires testing what the engineers call ‘the corners of the envelope’: conditions where things might not go well, like shutting down an engine during takeoff, or landing at maximum weight and maximum speed on the shortest allowable runway to see if the brakes can really stop the plane. All of these tests and many more are the standard stuff of aircraft certification. They are virtually unheard of in space systems testing; not with the real flight vehicle, not in the real flight environment.
But those of us in the space shuttle program, though we knew about it, rarely talked about our immaturity. So the conventional wisdom grew that the Space Shuttle was a “mature” system with few surprises left to learn. The new hires and the marginally informed were easily won over. The old guys, who had lived through the Apollo 1 fire and Apollo 13, they were not fooled. And those of us in the middle remembered Challenger and were vaguely, inarticulately uncomfortable. We remembered Challenger, but with the wrong lens: a bad manager made one bad decision, that was what caused Challenger. But it really wasn’t that simple. It was a whole group of people who were blind because their culture taught them to be blind.
So start with a culture where the conventional wisdom confirms that the operations are mature, well understood, with few surprises left to encounter. Mix in a little schedule pressure. Compress it all with a huge amount of financial squeeze. What do you get?
Who did the Monte Carlo run on that simulation? And where are the results documented?
A lot of people think NASA is risk averse; or even that our country is risk averse. I think the opposite is true; we are willing to take great risks. It’s just that sometimes we are not very smart about taking risks.
The number one lesson in taking risks? Don’t fool yourself.