Pilot Error is Never Root Cause

Most accidents originate in actions committed by reasonable, rational individuals who were acting to achieve an assigned task in what they perceived to be a responsible and professional manner.
– Peter Harle, Director of Accident Prevention, Transportation Safety Board of Canada and former RCAF pilot, ‘Investigation of human factors: The link to accident prevention.’ In Johnston, N., McDonald, N., & Fuller, R. (Eds.), Aviation Psychology in Practice, 1994

Recent news stories have made me think about the STS-28 landing. That flight is special to me because it was the first shuttle flight on which I sat in the big chair in the center of mission control. I was the Flight Director on the planning shift. New flight directors always start on the night shift when the crew is asleep. You get in less trouble that way. But your first time is always special and I won’t forget that flight.

STS-28 was a ‘classified’ flight that carried a national security payload. Someday, perhaps a long time from now, they will declassify it and let me know what exactly it was we were carrying. But for now, all I know is that they told me it was ‘important’. Important enough, in that post-Challenger era, to put a flight crew at risk. Because every shuttle flight is risky.

Brewster Shaw was the commander of STS-28, his first time in that role. Brewster is a remarkable pilot, one of the best, and went on to demonstrate significant skills as Program Manager for the Space Shuttle and later a leader in the Boeing Space and Defense organization. Not all astronauts make good managers but Brewster certainly did. But in those days Brewster was best known for his piloting ability.

Immediately prior to the flight of STS-28 a problem was uncovered with the way the flight software worked in connection with the small sensors on the landing gear. These so-called ‘squat switches’ made contact as the landing gear was compressed, and the software moded from flying control to rolling-on-the-wheels control. I’ve forgotten the particulars, but there was a failure mode in which, if the switches made contact in a certain way, the computers would put the flight control system into the wrong mode – steering with the nose wheels when steering should be controlled by the rudder and elevons, or something like that. That could lead to catastrophic loss of control.

It was too late to modify the software, and the switches were inaccessible with the shuttle attached to the external tank. A manual workaround by the pilot was required to ensure safety. So the Commander and Pilot got briefed – multiple times – in the last few days before flight about the need to land very softly – with a low ‘sink rate’ at touchdown – so the switches and software would work properly.

On the last night of the flight, I supervised the team as we prepared the entry messages for the crew. One of those was a reminder to land ‘softly’. The Entry flight control team came on and I went home hoping for a good landing. One of the first calls that the Capcom made – the crew was waking up as I was leaving the MCC – was a reminder to land softly.

So we set Brewster up.

Nominal deorbit burn, nominal entry, TAEM and HAC acquisition all normal, Commander took over flying manually as planned just as the orbiter decelerated to subsonic speeds. A perfect final glideslope. And now for the moment of truth, would the landing be soft enough to prevent the software glitch?

Normally an orbiter lands with a heavyweight payload in the bay at 205 kts – that is really fast for airplanes, but those stubby delta wings on the shuttle don’t create a lot of lift. With the payload bay empty – as it was for STS-28 – the lightweight landing speed is targeted at 195 kts. Under special circumstances, the pilots were allowed to land as slow as 185 kts. Brewster kept working and working to get the landing sink rate low and the speed kept dropping and dropping. At some point, as any fixed wing aircraft slows down, the wings will ‘stall’ and the aircraft will drop like a rock. Also, as the speed goes down, the pilot has to adjust the nose higher and higher – increasing the ‘angle of attack’ – to maintain lift. At some point with a high angle of attack at low altitude, the tail will scrape on the runway – always considered to be a catastrophic event for the shuttle.

The shuttle touched down at 154 kts. It is still the record for the slowest shuttle touchdown speed by a wide margin. It was less than 5 kts above stall speed. The tail avoided scraping by inches.

Oh, and by the way, the squat switches and software worked perfectly. No issues.

The post flight debriefings were all very positive and constructive – except for the entry and landing analysis. You can look back in my posts for the one called ‘Hockstein’s Law’ for a flavor.

I’ve never seen Brewster so embarrassed. In trying to avoid one hazard he nearly created another. In colorful pilot language (which I won’t repeat) he told us all that ‘on any given day the pilot can foul things up’. And it’s true. But I never blamed Brewster. We had set him up.

By concentrating on one issue to the exclusion of all others, and not reminding him of the training – probably years earlier – about very slow landing hazards – we, the flight control team, the program office, the NASA management – we set him up.

When doing an accident (or close call) investigation, I’ve been told to ask ‘why’ seven times before getting to root cause. The root cause, for example, can never be “the bolt broke”; a good accident investigator would ask “why did the bolt break”. Otherwise, the corrective action would not prevent the next problem. Simply putting another bolt in might lead to the same failure again. Finding out the bolt was not strong enough for the application and putting in a stronger bolt, that is the better solution – and so on.

The Russians had a spectacular failure of a Proton rocket a while back – check out the video on YouTube of a huge rocket lifting off and immediately flipping upside down to rush straight into the ground. The announced ‘root cause’ was that some poor technician had installed the guidance gyro upside down. Reportedly the tech was fired. I wonder if they still send people to the gulag over things like that. But that is not the root cause: better to ask why the tech installed the gyro upside down. Were the blueprints wrong? Did the gyro box come from the manufacturer with the ‘this side up’ decal in the wrong spot? Then ask why the prints were wrong, or why the decal was in the wrong place. If you want to fix the problem you have to dig deeper. And a real root cause is always a human, procedural, or cultural issue. Never ever hardware.

So it is with pilot error. Pilot error is never ever a root cause. Better to ask: was the training wrong? Were the controls wrong? Did the pilot get briefed on some other problem that caused a distraction and made him or her fly the plane badly?

Corrective actions must go to root causes, not intermediate causes. Really fixing the problem requires more work than simply blaming the pilot.


Peeking Behind the Curtain

When I worked for the government, I never really understood what industry was doing; it was all behind a curtain. They gave only glimpses of what they wanted the government to see. Those of us in the civil service always had theories about what was going on in the corporate boardrooms or in the private research labs. But we really didn’t know and it was always the cause of puzzlement.
Nowadays, I’m retired from the government and work as a consultant, mostly to private industry. I find that the industry guys don’t have a lot of insight into how the government works, which surprises me. I always thought we had been fully open and transparent. Now I know better; the government and its decision making processes are pretty impenetrable from the outside. In fact, a lot of the leaders and workers out in the aerospace industry have established theories about how the government works internally, about what the government leadership wants, etc. I find most of these theories incredibly funny and terribly inaccurate, and I am astounded that otherwise knowledgeable people have some very odd ideas about what goes on behind the walls of government offices.
Thus my consulting work is very busy. Now that I’ve had a foot in both camps, I find I do a lot of theory correcting. Interpretation of what is motivating this party or that. Understanding what they want so needs can be efficiently met. Really keeps me busy.
Oh yeah, there is that technical work, too. Lots of that.
One of the reasons that I don’t get around to updating this blog as often as I used to is that my clients keep me busy. And my old government colleagues are always asking for my time, too.
A lot of what I do – make that almost all of what I do – is covered by ‘Non-Disclosure Agreements’. In other words, I can’t tell anybody about what anybody else is doing. There are a lot of times when I have to bite my tongue, but that is the nature of the job. Reminds me of the old days when I worked on ‘classified’ shuttle flights. In the name of national security I had to keep a lot of things from a lot of people. Kept me busy trying to remember who I could say what to and who I couldn’t. Interesting mental exercise to partition your memory and thoughts like that. Good training for my current work.
So, while I’d like to blog about what my clients are doing, well, you will just have to wait for them to tell you themselves; I’m not authorized.
But what I can tell you is that it’s amazing. There are so many organizations working on so many aspects of space flight: new vehicles, new engines, new capabilities. Whew. I don’t know if they are all going to make it but I’m sure at least some of them will.
There is a renaissance coming in space travel. Some of it is from the government, yes, but a lot of it is not. Some of it is coming from garage shop inventors and some of it is coming from the biggest industrial corporations, and a lot of it is coming from folks in between.
Many of the really interesting advances won’t come from the big jobs programs that the politicians like. If you are a politician and want to help the space program – you can send money, but better to open doors to private industry, remove barriers, reduce red tape.
Now that made me sound like I read the Wall Street Journal too much. Lest you think I’ve gone over to the ‘anything goes’ camp, I will quickly say that there is a very real place for the government to make sure that adequate safety precautions are followed. Not exactly like what is done for airliners, but something more fitted to this new, higher risk, higher energy field.
Anyway, I’ve got to say it’s been a great ride: all those years working on the forefront of the big government space programs, and now helping all the industry geniuses break through to the future.
Just stand by.
You will be amazed.


February first, again

No matter what is going on with the world, no matter what is happening in my life, when the calendar turns to February 1 I have to stop, remember, and rededicate.

This year we have another gold star on the wall honoring those who have given the ultimate sacrifice in the conquest of space.  Michael Alsbury died last October trying to reach for the edge of space.  I trust that the NTSB will have several lessons for all of us to pay attention to when their report comes out.


It is a day to honor those brave souls.  I would particularly point out the last phrase on the Apollo 1 plaque at LC 34:  “Remember them not for how they died, but for those ideals for which they lived.”

It is not adequate to get emotional, and think about our losses; what is required is that we actually do something – make spaceflight safer, more reliable, and more common.  It has been too long to get maudlin.  It is time to get busy.

Over the next several blog posts I intend to visit the work we had to do to return the space shuttle to flight after Columbia.  It was much more than just technical.  Oh yes, much more than technical.

In the meantime, think on these words from the American author Jack London:

“I would rather be ashes than dust; I would rather that my spark should burn out in a brilliant blaze than it should be stifled by dry-rot; I would rather be a superb meteor, every atom of me in magnificent glow, than in a sleepy and permanent planet; the proper function of man is to live, not to exist.”


Space Thanksgiving

Today is the day in America that we set aside to count our blessings and give thanks for all the good that is in our lives.  So I will set aside my curmudgeonly ways and ignore the future and all the imperfections to concentrate on what is good and right.

Like most of you, but mindful that not everyone had these, I give thanks for faith, family, friends, food, freedom, and finances.  I wish that everyone could be as fortunate as I have been.  Good health, too, is a blessing that not everyone can share.

But today, as is the theme of this location, I would like to think about space.

I am thankful for the opportunity to live in such a time; with all its challenges and imperfections this is a unique time in history.  My generation, born before Sputnik, living yet, is the only generation that has gone from wondering to knowing.  Before, we all thought there were canals on Mars and it was full of intelligent life; now we know that it is a marvelous place but not like that.  Before, we thought Venus was a swampy, wet, and warm world; now we know it is not like that.  Before, we had no idea what the far side of the moon even looked like; now we know.  The mysteries of Saturn’s rings, the icy moons of Jupiter, and much, much more, we could only guess about; now we know, at least in part.  This is the great age of solar system exploration.  We are fortunate to live during the excitement of these days.  And there is more to come.

I am thankful to personally know and interact with the giants of this age of space exploration.  To work alongside many of the heroes of Apollo, to talk with Nobel laureates in cosmology and astronomy; to meet with the leaders of the organizations which carried out all these works, is truly amazing.  Better than meeting the Hollywood stars — and I’ve been fortunate to meet many of those, too.

I’m thankful that I have been able to contribute at least in a small way to the advancement of humanity into the cosmos.  To sit in the big chair in Mission Control, with all the power and responsibility, is an opportunity that few will have.  To be responsible for an organization which regularly launched humans into space to accomplish great tasks – repairing Hubble, building the ISS, and so much more; that has been a blessing, too.

To know those who put their lives on the line to fly on the fiery rockets and plunge into the unknown, that has been an awesome blessing and lesson in courage.

To have good work to do even in these days, helping new organizations build the next generation of rockets and spacecraft – safer and more efficient than before – that is a blessing.  To contribute to the nation’s policy discussions and shape a more hopeful future, that is a blessing.

Today I pray for blessings on all our spacefarers at work on the ISS, bring them home safely after long and productive work high above us.  I pray for blessings for those building new spacecraft and rockets, help them be diligent and creative so that we may all be successful.  I pray for the leaders of our nation and other nations who make space exploration a priority; grant them wisdom and vision to use resources wisely to make our lives better here on earth and in the future in the cosmos.

Finally, I am thankful for the dreamers who inspire us.  Ideas can be outlandish or practical but they challenge all of us to do more to bring the future into focus.

So I really have quite a lot to be thankful for today.  I hope you do, too.





Unfamiliar Terms

During the summer of 2009, we were working on the history book about the Space Shuttle, ‘Wings in Orbit’. We had hired three summer interns, college students, to help with the book. Their primary assignments were to build the appendices, check references, make tables, and the like. Unusual for NASA interns, these three were not technical majors, but history, English, and social science majors. They did a great job for us.

One of the ‘other duties as assigned’ that we gave the interns was to read a chapter a week and report on it to the editorial board. All the writing was done by the engineers and scientists who worked on the shuttle program, and, sadly, engineers are not always known for excellence in written communications. We asked the interns to provide a critical look at the readability of each chapter and to be especially on the lookout for unfamiliar terms or acronyms that would make the text hard to follow for the general public.

One memorable week, one of the interns drew the assignment of reading the draft chapter on the historical setting of the space shuttle. I thought this might be unnecessary since Dennis Webb is a great writer and there was almost nothing technical in the chapter. At the end of the week, we convened the editorial board for a number of topics and had the interns’ reports at the end of the agenda. First up was a review of the chapter on the APUs and hydraulics. As you might expect, the intern pointed out several instances of very technical jargon and a number of undefined acronyms, all of which would have to be cleared up by rewrite. Then we covered a chapter on another technical subject with similar results. Finally it was time for the report on the history chapter. The young lady, probably a sophomore-level college student, said that the chapter was very well written, easy to understand, and she had no recommendations for rewrite except for one term that she was unfamiliar with. ‘What was that?’ we inquired. She replied that the term she did not understand was:

‘Cold War’


I was thunderstruck. For someone of my generation, the idea that anyone would not know about the cold war is unthinkable. A short discussion ensued to make sure we had communicated correctly, but the bottom line was that she really hadn’t heard the term before and was unfamiliar with the concept.

More recently I had a conversation with a friend whose children are early high schoolers. He and his son were home alone one evening and decided to watch a movie together. On the schedule that evening was “The Hunt for Red October” based on the book by Tom Clancy. My friend reports that his teenage son didn’t quite get it. He kept asking why there was a big deal; after all the Russians are our friends, right?

Both of these young people were born after the fall of the Berlin Wall, after the dissolution of the Soviet Union. Those were ‘current events’ shortly before they were born. Given the lag in grade school history texts, those events were too recent to be covered.

When I was their age, nuclear annihilation stared us in the face. I can remember, probably when I was in 4th or 5th grade, when my parents came home from a Civil Defense meeting with plans of how to build a fallout shelter (we didn’t build one). In middle school the civics teacher showed us the AEC (Atomic Energy Commission – forerunner to the Department of Energy) films with the bomb tests in Nevada. You know the ones where they set up houses, mannequins with clothes, cars, household goods, etc., to see how a nearby bomb blast would affect those things. The message from those documentaries was that it was unsurvivable. I got to practice the ‘duck and cover’ method of hiding under our classroom desks if there was a bright flash in the sky. Again, the message came down that we were on the edge of annihilation and not likely to survive.

In high school history we studied the Cuban Missile Crisis and how close we came to the end in October of 1962. There were B-52 bombers on armed standby at air force bases near my home. I can remember as a high school student, plotting the likely fallout path from ‘targets’ near my home based on prevailing winds so that we would know which way to travel to survive following a nuclear exchange. That wouldn’t have helped either.


How that affected the psychology of two generations has been studied by sociologists.  It motivated people in so very many unusual ways.
And today’s kids don’t know what the term “cold war” means. I think that is a good thing. Not that they don’t need to understand history, but that those days are behind us. Hopefully for good, notwithstanding some burbles in the geopolitics these days.
There are a lot fewer nuclear weapons in the world these days – still too many – but the finger on the trigger seems to be a lot more relaxed. For our grandchildren’s sake I hope it stays that way.


We worry about a lot of things these days; there are serious problems all around us. But I think the level of worry is at a much lower intensity than it was 30 or 40 years ago.



And that is a good thing.


Significant Conversations

Just about exactly 10 years ago I was serving as the Deputy Program Manager for the Space Shuttle Program.  Columbia had been lost a year and a half earlier and we were all trying our best to return the Space Shuttle to flight.  I got assigned to present the program status at a conference on Probabilistic Risk Assessment which occurred on the same day that NASA had chosen for our annual Safety Day – when everybody was supposed to take a day off from their normal work to engage in classes, discussions, etc., about safety and how to improve it.  I wrote the following email to the folks working in the Shuttle Program.

With the events of the last few days, it seems appropriate to reprint it now.

Near the end, I refer to K.C.  In case you don’t remember, Kalpana Chawla was one of the crew members on Columbia’s last flight.

This is a little long, but you should read it to the end.


This week, while most of you will be taking a day to study and think of ways to be safer, I will be in Cleveland attending the NASA Risk Management Conference. It is my intention to understand this process better and apply it in program decision making as my personal contribution to improved safety in the Space Shuttle Program.

Many of us are struggling with the concept of accepting risk – how much is too much, how much is an inherent part of what we do – and I would ask that each of you give this topic some consideration during your Safety Day activities. The following are some of my thoughts on this subject – not direction nor even an exhaustive process on how to come to the best answer – but just some thoughts based on my 26 years of shuttle experience. I can only write to you about what I know, and you must know by now that when I write to you it is from the heart.

Much of my experience comes from the 15 years I spent as a Flight Director. The Entry Flight Director is assigned the deorbit decision. After the payload bay doors are closed, the burn targets loaded in the computer, all the systems checked, and the last weather report is in, the clock counts down to ignition and the crew waits to hear the Flight Director’s decision: Go or No-Go. The Flight Control Room is always silent. Usually the orbiter is in flawless condition, rarely there is a minor systems problem, nothing significant. But there is always the weather forecast. The weather forecast – for a precise place at a precise time just about 2 hours in the future. The orbiter flies like a brick with handling qualities that would make a Mack truck proud. The commander has one shot at landing, there are no do-overs. It’s the Flight Director’s duty to make sure that one shot is a good shot.

Funny thing, it is never a black or white decision; it is always gray. There are always concerns, the chance for an adverse change, indicators of what might go wrong. When you look at weather under a microscope, it is never perfect. In fact, the harder you look the more little counter-indicators you see. The real question is not whether the weather will be perfect, but will it be good enough. The indicators may be gray but the decision is black and white. Binary. Go or No-Go. And if the decision is No-Go, everybody knows the shuttle can’t go around forever. We will be back to make the same decision tomorrow. So the managers in the viewing room watch and wait and second guess. The flight controllers strain to listen for an indication of what the answer will be. The convoy team fidgets. One person must make a decision; give the word for the record, live with the consequences.

I have given the Go 28 times. Every time was the toughest thing I have ever done. And I have never ever been 100% certain, it has always been gray, never a sure thing. But the team needs to have confidence that the decision was good. It is almost a requirement to speak the words much bolder than you feel, like it is an easy call. Then you pray that you were right.

We have done everything practical to mitigate the risk. The meteorologists train every day making real forecasts at the landing sites and checking two hours later to see if they were right. Their statistics are impressive: if they forecast a Go, it turns out to be Go about 97% of the time. But there is that 3% to consider. The weather rules are reasonably conservative, based on conditions that the astronauts have been able to handle in the simulators, in the shuttle training aircraft. But we cannot simulate the transition from zero gravity to one-g; and the stress of doing it for real with no chance to do it again, and with the whole world watching; all that is tough to get past. Even in the aerodynamics, proving that the machine is controllable, and in structural loads, how well the vehicle will hold together, there are limits to the uncertainties that are acceptable. If we have analyzed wrong, or the aerodynamics exceeds what we expect, . . . well.
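To get a feel for why that 3% deserves attention, here is a rough sketch of how a small miss rate compounds over repeated calls. It assumes, purely for illustration, that each forecast is an independent event with the same 97% hit rate; real weather forecasts are not that tidy, and a missed forecast is not the same thing as a bad landing. The function name is made up for this example.

```python
def chance_of_at_least_one_miss(hit_rate: float, decisions: int) -> float:
    """Probability that at least one of `decisions` calls is wrong,
    assuming every call is independent and shares the same hit rate."""
    return 1.0 - hit_rate ** decisions


# 0.97 is the forecast hit rate quoted in the text; the counts are illustrative.
for n in (1, 10, 28):
    p = chance_of_at_least_one_miss(0.97, n)
    print(f"{n:>2} Go calls: {p:.1%} chance that at least one forecast missed")
```

Under those assumptions, a single call carries the 3% everyone talks about, but over a couple of dozen calls the odds of at least one miss climb past even. That is the arithmetic behind never being 100% certain.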

When our predecessors invented the shuttle, based on their aircraft test experience and previous space programs, they set up a standard that everything should work properly in the face of 3 sigma environmental deviations and 3 sigma systems dispersions. Any basic statistics course will tell you that mean + 3 sigma covers 99.7% of the cases. But there are 3 chances in 1000 that aren’t covered. Why not? Because to try to cover everything – worst on worst on worst – would require a vehicle design that probably is too heavy to get off the ground, and would require a set of proof testing that would take a lifetime to accomplish, and would cost, well, way more than we can afford. So inherently there is risk in using this system. And don’t forget, that is the risk that we understand, that we have designed against, that we have good numbers for. What we don’t recognize or cannot quantify is out there as an “unknown unknown”.
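The 99.7% figure can be checked in a few lines. This is only a sketch, and it assumes the dispersions follow a normal (bell curve) distribution and that "3 sigma" means within three standard deviations on either side of the mean; the function name is invented for the example.

```python
import math


def coverage_within(k_sigma: float) -> float:
    """Fraction of a normal distribution lying within +/- k_sigma of the mean."""
    return math.erf(k_sigma / math.sqrt(2.0))


inside = coverage_within(3.0)
outside = 1.0 - inside
print(f"within 3 sigma:  {inside:.2%}")
print(f"outside 3 sigma: about {outside * 1000:.1f} cases per 1000")
```

The result lands at roughly 99.7% covered, which leaves about 3 cases in 1000 outside the design envelope. Real environments and systems do not have to be normally distributed, so treat this as the textbook number and nothing more.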

There is nobody in the Shuttle Program Management, or the Agency management that has any delusion that we can reach perfection. Our collective job is to understand the risk, mitigate it as much as possible, communicate accurately all round about the risk remaining, and then decide if we can go on with that risk.

Right now we are working through many issues that are not black and white. There are many options, many shades of gray. There is always a debate about whether we have done enough, whether we have done too much, whether it is good enough. It has always been part of the engineer’s job to determine when enough has been done; not to overdo or to make it so conservative that it takes forever or is impractically uneconomical, or too heavy to get off the ground. Knowing when we have done enough is the art of engineering.

I have heard a well respected, retired senior NASA official speak on several occasions lately. One of his themes is that we are in an inherently risky business, but accepting risk does not mean not testing or not doing analysis. That is not risk acceptance, that is gambling. The real art is knowing when the testing is adequate and it is time to decide and move on. The newly reinvigorated engineering culture at NASA is to return to our roots and make decisions based on knowing — not perfectly knowing, but adequately knowing — what we face.

Every day people make little Go/No-Go decisions. The chief of the wind tunnel team signs off the latest test data to be complete and accurate. He has to decide if there have been enough runs to validate the data. Once he does, we use that data to make critical decisions. An engineer, knowing that there is never unlimited time nor unlimited resources, makes a decision about how many tests it takes to prove a new design. Another Go/No-Go decision is made.

When the OPF technician stamps the work document that the bolt has been tightened to its torque specification, a Go has been given. When the torque specification was written into the work document and the tech writer signed the document, the signature says Go. When the engineer who designed and tested the part calculates the proper torque value and signs the drawing, there is another Go. The engineer, the writer, and the tech may never meet face to face, but they have to trust each other that each one has done his job right. The Shuttle program is a big organization; there are over 20,000 of us. These days the whole agency is working to help us return to flight. It is impossible to know everyone, but we have to trust that everyone is doing their job right, making sure no mistakes have been made anywhere. Your signature or stamp will be rolled up into your manager’s signature on the Certification of Flight Readiness. Next spring, Bill Readdy, the chairman of the Flight Readiness Review, will read the poll, and all the managers will say Go. Their Go is based entirely on what decisions you have made.
There are jobs in the world where the calls don’t have much consequence. Nobody in this agency has one of those inconsequential jobs. It may not seem that a financial call on a budget line item could be a Go/No-Go decision. But frequently that becomes the critical decision that determines the difference between success and failure. It may not seem that a personnel action could be a Go/No-Go decision. But having the right person in the right place with the right training and experience is paramount. These are critical decisions in our profession, equal to the more obvious engineering decisions.

And behind every decision, everyone knows that we have neither the time nor the resources to do anything that is not absolutely critical to the safe return to flight. We have no time for fluff, no resources for nice-to-haves. Choices have to be made daily, between what must be done, what has to be done, and what can be eliminated because it is not required to be done.

The Flight Directors and CapComs always visit the crew in quarantine just after the last sim, just before the crew flies to the cape. We have trained together, sat in hours of meetings next to each other, laughed at each other, gotten angry with each other, and have undergone the great testing of simulated flight with each other. At the meeting, we cover the last minute changes and reminders, check on the wakeup calls and when to send the morning mail, and make bad jokes. At the end, we have the Ritual of The Handshake. Everybody has to shake everybody else’s hand before we leave. We look each other in the eye and say ‘Good luck’. They always say ‘We’re looking forward to a great flight’. Nobody ever talks about . . . you know.

But we all know.

There is risk about to be taken, serious risk that can have ultimate consequences. Humankind collectively does not know enough to scientifically drive the risk of space flight to zero. A hundred years would not provide enough time for all of us working together to positively eliminate any risk. Ten thousand small decisions throughout the preparation for the flight have been made, each with underlying risk calculations, and that total risk has been accumulated and communicated upward. Everybody has done their best to make it perfect, but there is a limit to what can be done. That is what we know. And we also know that the risk of not going is infinitely worse; the consequences would be worse if we didn’t try than if we try and fail.

Sometimes it is exquisitely clear when you are having a Significant Conversation. After we close the quarantine door and walk to the parking lot there is never any conversation. It is always a silent walk. I’ve had that walk 40 times. Flight Directors know too much about the risks.

Senator McCain has written a book entitled “Why Courage Matters”. You may not agree with his politics, but the senator’s credentials concerning courage are beyond dispute. He says that we have watered down the meaning of courage. An athlete’s prowess on the field of play is not courage, he says. Suffering an illness or injury without complaint is not courage. Being outspoken in a culture of silent acquiescence to certain wrongs is not courage. These are all evidence of virtue, the senator argues, but they are not examples of courage. The former POW defines courage as acts that risk life and limb to uphold a virtue. And he quotes Martin Luther King, Jr.: “If a man hasn’t discovered something he will die for, he isn’t fit to live.”

Everybody knows that there are ultimate risks in space flight. Some among us believe so strongly in the benefits that they put their lives on the line. Others of us believe so strongly that we do something harder to live with: we send our colleagues into danger. Why should we do it? Because the consequences of not taking the risk are unthinkable. The choice of turning back and giving up would affect the rest of history in ways that are immeasurable.


Somebody recently said that what we are engaged in is like high stakes poker. That comment trivializes space flight to a parlor game where the only risk is money or pride or career or other cheap consideration. To push back the frontier incurs a price that sometimes must be paid in a currency more dear than mere dollars. It takes courage.

It was Christmas break and the parking lot at JSC was almost deserted. After 15 years as a Flight Director, I was days away from moving to my new job at KSC and had come in to finish up loose ends in my old Building 4 office. It was dusk as I walked out into the nearly empty parking lot. K. C. was leaving work too. She greeted me with that megawatt smile she always had. I asked if she was ready to go fly. Her response was enthusiastic: Yes, a couple of weeks to launch and her crew is trained and anxious to fly after long months of delay. Our brief conversation consisted of only happy words. We didn’t talk about risk or danger, only of the rewards to be expected from a successful flight. I wished her good luck and turned for my car and drove home. I didn’t give the conversation any thought until the first day of February.

Sometimes, you don’t know when you are having a Significant Conversation. I know now that K. C. did not understand all the infinite detail of risk that lay ahead of her; clearly none of us did. But I can say without a doubt that she felt what was to be accomplished outweighed the risk that she understood, outweighed it by a lot.

Recently a reporter asked if it would be difficult for me, as chairman of the MMT, to give Mike Leinbach, the Launch Director, the Go For Launch at T-9 minutes. I told him no, that by launch day our procedures and processes would be well polished, the decision criteria all agreed to and documented, and all the really difficult decisions would be behind us. We would just be executing from the checklist and the final Go would be a matter of making sure all the squares were checked. It would be easy.

After thinking about that for a few days, I realized that that answer is, of course, a lie: under the microscope, nothing looks perfect, and the call will be hard because . . . you know. Life is full of gray choices. Deciding the work completed is good enough because more will not make it perfect. Ten thousand gray choices; doing what we must do, and not a bit more because that would take away from other work that is absolutely critical to be done right. When we have done what we can do, when we have driven the risk to the lowest practical level where it can be driven, then we have to accept the fact that it is time to make a decision and move on. Because history is waiting for us. But history will not wait forever, and it will judge us mercilessly if we fail to face tough choices and move ahead.

During the countdown, Steve Altemus, the launch NTD, will give many folks the challenge: ‘Say Go or No-Go’. You need to imagine Steve standing at your elbow each day asking that question to you because it all rolls up. Each of us has a part. Nobody can be sloppy or careless. Nobody can take forever trying to get it perfect, blocked by indecision or the fear of making a decision. Nobody in our business gets an easy choice. Yours will be a gray decision, too. We owe it to some courageous people to get it right. Don’t waste your time on things that don’t count. Focus on what must be done and do it right and then move on to the next problem to solve. There will be some risk that we cannot control, that we cannot solve, that we cannot eliminate. That risk, we will have to accept. If we have done our job right, it will be worth it.

At the end of the countdown, Mike Leinbach will wish Eileen and her crew ‘Good Luck.’

You will know what that conversation means. That will be significant.


STS-93: We don’t need any more of those

It was about a quarter past midnight on July 23, 1999 when Ralph Roe, the Shuttle Launch Director, told Eileen’s crew that they were go for launch and wished them good luck. The launch, which had been scrubbed late in the countdown on two previous attempts, was going to be about 7 minutes late due to some transient communications system problems at the Merritt Island Launch Annex (MILA) tracking station.


In Houston, I was sitting on the back row of mission control acting as the Mission Operations Director. Sitting next to me was Chuck Knarr, former flight director and executive in the United Space Alliance. In the hot seat at the Flight Director position was John Shannon with LeRoy Cain sitting beside him to watch the weather and keep track of the checklists. Capcom was Scott “Scooter” Altman.


The old joke was: “which F-14 pilot do you want as your wingman? Cobra, Maverick, or Scooter?” The correct answer was Scooter – Cobra and Maverick were just Hollywood movie creations for ‘Top Gun’; in that movie all the real flying was done by Scooter.


Down in the trench was Flight Dynamics Officer (FDO) Lisa Shore with Carson Sparks sitting next to her as TRAJ officer. On the Electrical General Instrumentation and Lighting (EGIL) console was Tim North – about to get the ride of his life – and on the back row was Booster Officer Jon Reding, one of the coolest heads to sit at that position. A well seasoned crew to watch over the launch phase for Eileen Collins, the first woman to command a Space Shuttle mission, her pilot Jeff Ashby, and the rest of the unusually small crew of Columbia.


STS-93 carried the heaviest payload the shuttle ever launched: the Chandra X-ray observatory (formerly known as the Advanced X-ray Astronomy Facility or AXAF) and its IUS booster. The Boeing-built solid rocket upper stage called the Inertial Upper Stage was extremely reliable, but a heavy load. It had an interesting history; originally called the Interim Upper Stage (same acronym), its more powerful replacement was cancelled for budgetary reasons and the IUS lived on.


Everything appeared quiet, but problems lurked just beneath the surface.
Post flight we calculated that the LOX load was less than planned; due to slight temperature variations, tank volume, placement of the ‘full’ level sensors and other factors, Columbia was going to launch this night with 897 lbs less LOX in her tank than was intended. Given everything else working normally, there should have been a performance reserve still in excess of 3,000 lbs. But everything else did not work as precisely as planned.


Columbia, the oldest shuttle, had flown 25 missions by this time and the tiles and thermal blankets showed it; scars and stains over almost all of them. And hidden, deep down inside, were the flaws and insults that any real world flying machine always carries: some minor, some major, some that never had a consequence, and some that ultimately contributed to her loss.


One flaw lived in a stretch of 22 gauge kapton insulated wiring, nearly half way down the payload bay, where a single strand of AC current carrying wire had rubbed and chafed against a screw head which had a minor burr where the tech had over tightened it long ago, and which other techs had probably stepped on during turnaround refurbishment. In the Right SRB, a hydraulic pressure sensor was not completely connected to its wiring and even though it was showing accurate pressure now, the connection could shake loose under vibration just enough to open the circuit and give a false low pressure reading. In the center main engine the two chamber pressure measurements A&B were reading exactly the same with the engines off, but the B channel had a bias that would only show up when the engine reached full throttle; it would read 12 psi high. This was outside the allowable error but still small compared to the 6,000 psi operating pressure. On the right engine, a deactivated LOX post in the main injector had a gold coated pin that wasn’t quite as tightly seated as it should have been; whether it was installed slightly off or worked loose over a number of firings, in the end it didn’t matter. Columbia had a lot of really minor flaws that didn’t affect today’s story, but some of which would show up later with other consequences. Viewgraph PowerPoint spaceships are always flawless; real spaceships are made and maintained by fallible human beings and are less perfect.


Somebody posted the NASA TV video of the launch on YouTube – you can watch it at:

So the countdown started up at T-9 minutes as it always did, seemingly uneventful on the surface, but we all knew what was about to happen and the adrenalin built up as we waited. Ashby started the APUs right on time, pumping high pressure hydraulic fluid to all the valves in the main engines among other things. At T-31 seconds the onboard computers took over, calculating the exact launch parameters, allowing the IMU gimbals to turn freely, opening vent doors so that pressures could equalize during the climb to space. At T-10 seconds the ground computers sent the last command onboard – the electronic equivalent of ‘go for main engine start’. The onboard computers reacted immediately by firing off the ‘sparklers’ called ROFIs (never can remember what that stands for) to burn off the excess hydrogen down by the main engines. At T-6.6 seconds the onboard computer commanded the first main engine to start, and staggered the second start by 140 milliseconds, and then the third by the same delay. The SSME controllers commanded the spark igniters on, and started the complex choreography of valve openings to bring the three main engines safely to roaring life.
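
For readers who like to see the arithmetic, the staggered start described above can be laid out in a few lines. This is only a sketch built from the two numbers in the paragraph (first start command at T-6.6 seconds, 140 milliseconds between commands); the function name is made up for illustration, and it says nothing about which engine was commanded first.

```python
# Sketch of the staggered start timeline described above.
# Times are seconds relative to liftoff (negative means before T-0).
FIRST_START_S = -6.6   # first main engine start command, from the text
STAGGER_S = 0.140      # delay between successive start commands, from the text


def start_command_times(n_engines=3):
    """Return the start-command time for each engine, in command order."""
    return [round(FIRST_START_S + i * STAGGER_S, 3) for i in range(n_engines)]


if __name__ == "__main__":
    for order, t in enumerate(start_command_times(), start=1):
        print(f"engine start command #{order}: T{t:+.2f} s")
```

The last start command goes out 280 milliseconds after the first, so all three commands are issued well before the T-3 second mark.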


By about T-3 seconds, all engines were up and operating at 100% of rated power level. Exactly when it happened is not clear, but on the right engine, the gold plated pin from LOX post 32 in row 13 came shooting out. Just like a bullet it went through the narrow part of the converging nozzle and flew out into the nozzle extension.
Now two scary things could have happened. First, the LOX post, which was pinned for a reason, could have failed allowing LOX to pour into the engine cavity where the hot hydrogen was introduced. Remember from an earlier post that this could have caused an explosion, or melting of other LOX posts, or other really bad things to happen. Failure of the LOX post was considered a CRIT 1 failure – loss of vehicle and crew ‘promptly’. Fortunately – luckily – the LOX post held together for the next eight and one half minutes. How close we were to disaster has never been determined.


Second, the nozzle extension could have failed. Early on in ground testing, one of them did fail – spectacularly – generating a huge explosion and fire. Those nozzle extensions were one of the weak points of the entire complex engine system. Made of 1080 long narrow stainless steel tubes braze welded together, the nozzle extension was cooled by liquid hydrogen flowing down its length. So cold on the outside that a layer of frost condensed out of the moist Florida air in seconds over the outside of the nozzle extension, the temperature inside was hot enough to melt any metal. These tubes were a bear to maintain and the entire extension had to be replaced periodically. The tubes were prone to split which required complex weld repairs until there were just too many repairs and a new nozzle extension had to be installed. The next upgrade to the SSMEs was to build a more robust channel wall nozzle extension. The shuttle program ended before that was done. Someone had calculated that if 5 adjacent cooling tubes split or were otherwise ruptured, there would not be enough local area cooling and a burn through would occur, causing a cascading failure of the nozzle and . . . a CRIT 1 failure.


The bullet shaped LOX post pin hit the side of the right engine nozzle extension about two thirds of the way to the end with great force. Just by sheer luck, three nozzle tubes were breached. Three adjacent nozzle tubes lost cooling and started leaking hydrogen into the hot stream of gas coming out of the engine. Three tubes, not five. The adjacent cooling tubes kept the nozzle from failing during the eight and a half minutes the engines operated.
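
The "three tubes, not five" point is really a statement about the longest run of adjacent damaged tubes. A minimal sketch of that check follows, assuming the tubes are numbered consecutively around the nozzle so that the last tube sits next to the first. The tube numbers in the example are hypothetical; only the counts (1080 tubes, 5 adjacent for burn through, 3 breached) come from the text.

```python
NOZZLE_TUBES = 1080     # tube count, from the text
BURN_THROUGH_RUN = 5    # adjacent ruptured tubes said to cause a burn through


def longest_adjacent_run(ruptured, n_tubes=NOZZLE_TUBES):
    """Longest run of neighbouring ruptured tubes.

    Assumption: tubes are numbered 0..n_tubes-1 around the nozzle, so tube
    n_tubes-1 is treated as adjacent to tube 0.
    """
    hit = {t % n_tubes for t in ruptured}
    if len(hit) >= n_tubes:
        return n_tubes
    best = 0
    for t in hit:
        if (t - 1) % n_tubes in hit:
            continue  # not the first tube of its run
        length = 1
        while (t + length) % n_tubes in hit:
            length += 1
        best = max(best, length)
    return best


def burn_through_expected(ruptured):
    """True if the damage reaches the five-adjacent-tube threshold."""
    return longest_adjacent_run(ruptured) >= BURN_THROUGH_RUN


if __name__ == "__main__":
    sts93_damage = [700, 701, 702]  # hypothetical tube numbers, three adjacent
    print(longest_adjacent_run(sts93_damage), burn_through_expected(sts93_damage))
```

Two more adjacent tubes and the same check comes back the other way.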


At the time, the LOX post pin and nozzle tube leak went undetected. Look closely in the video and see a blue streak on the side of the nozzle. Nobody in Mission Control noticed it until afterwards during the photo review. The nozzle was leaking 3.5 lbs of hydrogen every second.


Several things happened as a result. One might think that this would lead to running out of hydrogen, which, as an earlier post noted, would have been very bad. But the engine reacts to the loss of hydrogen in this area in a very interesting manner. This leak was downstream of the hydrogen flow meter so to all observers – human and digital – it appeared that the right amount of hydrogen was going into the engine. The real effect of the lost hydrogen flow to the turbines driving the pumps on the right engine was an off-nominal mixture ratio – the mixture ratio now closer to stoichiometric – and the turbine temperatures increased by about 100 degrees. Normally there is about a 200 degree margin between the normal turbine temperatures and the redline value that will automatically shut an engine down. The nozzle leak immediately used up half the operating margin to the redline.
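
The margin bookkeeping in that last sentence is simple enough to write down. A small sketch, using only the two round numbers quoted above (about 200 degrees of normal margin, about 100 degrees of rise from the leak); the function names are illustrative.

```python
NOMINAL_MARGIN_DEG = 200.0  # normal margin to the redline, from the text
LEAK_RISE_DEG = 100.0       # turbine temperature rise from the leak, from the text


def remaining_margin(nominal_margin, rise):
    """Degrees left before the redline after a temperature rise."""
    return nominal_margin - rise


def fraction_used(nominal_margin, rise):
    """Share of the normal margin consumed by the rise."""
    return rise / nominal_margin


if __name__ == "__main__":
    left = remaining_margin(NOMINAL_MARGIN_DEG, LEAK_RISE_DEG)
    used = fraction_used(NOMINAL_MARGIN_DEG, LEAK_RISE_DEG)
    print(f"{left:.0f} degrees left, {used:.0%} of the margin used")
```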


Since the chamber pressure dropped slightly due to the loss of fuel for the fire in the main combustion chamber, the SSME controller commanded more oxygen be sent to the MCC. That may sound strange but it is the way the SSME computer controls the chamber pressure. Again, the mixture ratio is off and not only is the fire hotter in the MCC, but more oxygen is being consumed. Due to both of these processes, post flight calculations showed that the LOX tank should have been short about 3,000 lbs – translating to about 200 fps short at MECO. That large velocity shortfall did not happen because we were lucky: what happened to the Center SSME corrected that. Hold that thought.


Meanwhile, about a minute after launch, the booster officer and his team recognized the fact that the right engine turbine temperatures and speeds were higher than normal. They correctly identified that this might be due to a nozzle leak, but there was another potential anomaly that also had the same signature. If the oxidizer pump started to lose ‘efficiency’ (blades rubbing, pump clogging, etc.) it would look the same. As the SSME controller commanded mixture ratio changes to keep up with the loss of efficiency on the pump, the turbines would reach their temp limit and the engine would have to throttle down to prevent a shutdown: this was called ‘thrust limiting’. Until the SSME went into thrust limiting, the Booster team could not tell the difference between an oxidizer turbine/pump efficiency loss and a nozzle leak. The instrumentation just wasn’t precise enough to know what was going on. Jon and his team correctly identified that the engine was running off nominally (‘off tags’) but could not quantify it. Later on, when the FDO asked him, the Booster officer had to report that none of the engines were ‘suspect’. All these terms were precisely defined in the flight rules and had specific actions for the flight controllers and crew to take to maximize safety. But this leak was too small for any of that.


The reason the booster officer took over a full minute to identify problems on the right engine was because of distractions: the AC 1 short – which was real, and had real consequences – and a big red light that went off on his console saying the Right SRB hydraulic system pressure had dropped to catastrophic levels – which was not real. Each of the solid rocket boosters put out over twice the thrust of the three SSMEs put together. The steering mechanism was critical and each SRB had two separate hydraulic systems for redundancy. If both hydraulic systems on one SRB failed there would be no steering on that side; probable loss of control, and another one of those ‘prompt’ CRIT 1 failures – loss of vehicle and crew. So having that red light go off right in front of his nose took a few seconds to sort out. It clearly was a transducer failure, not a real hydraulic system failure since all the other parameters were OK. But it surely got the heart rate higher. Good thing it wasn’t displayed to the crew; they never knew about it.


Failed SRB TVC could, in the worst case, result in the shuttle flying directly back to the Launch Control Center. Given worst case reaction time by the Range Safety officer (7 seconds), large chunks of SRB – probably with the propellant still ignited – could land on the LCC. I used to think about that when I sat in the front row of the LCC, up near the glass. Those big louvers in front of the windows? They were for shade; not only no protection from flying debris but likely to become lethal missiles themselves.


As Columbia lifted off, the KSC public affairs officer, Lisa Malone, made her little speech about how this flight advanced X-ray astronomy and women’s authority in the world. In the MCC, we didn’t listen to the PAO commentary so we were shocked to hear Eileen report “Fuel Cell PH”.


I’ve written about fuel cell PH failure mode before: https://blogs.nasa.gov/waynehalesblog/2009/01/07/post_1231342021582/


To make a long story short, it means that one of the fuel cells might be failing: “It’s the Kaboom Case, Flight”. On board Columbia, the Master Alarm klaxon had gone off and there were two messages on the failure summary page: H2O pump, FC 1 PH. As it turns out, these were symptoms, not the cause of the problem. The Fuel Cell was not breaking down, ready to explode, and the water pump (used for cooling) was not shut down. Instead it was one of the alternating current buses – AC 1 Phase A – which had shorted. About 5 seconds after liftoff, for about half a second, it sputtered along with a short of up to 72 amps. That is a lot of current!


The automatic protection cut in and shut down the affected part of the circuit. The Fuel Cell instrumentation that monitored for the escape of potassium hydroxide (a strong base, high pH) used this bus, and this unpowered instrumentation gave an erroneous alarm. The water pump slowed and then resumed after the circuit breaker popped and normal power was returned on all the other AC1 equipment. An avionics bay fan (providing air cooling) had slowed but not enough to trip an alarm. What remained was loss of power to the SSME controllers.


The A computer on the Center SSME lost power, never to be recovered. The B computer (DCU B) immediately took control and the engine ran on normally. Well, almost – back to that in a moment. The Right Engine lost its B computer, but the A computer stayed in control and the engine, with its hydrogen leak, hung in there.


Loss of DCU A on the center engine caused a couple of interesting consequences. First of all, the MCC lost almost all the telemetry on the center engine; valve positions, turbine speed, some temperatures, almost everything was gone with the loss of DCU A. The folks that programmed the SSME computers put everything on the A side and very little – just Main Combustion Chamber pressure and a hexadecimal word indicating any problems in the engine – on the B side. We were on ‘backup data’. The B computer also lost all the sensors that were being read by the A computer. Most importantly, the A chamber pressure transducer went away. The B computer no longer averaged the A and B transducers to get its chamber pressure for the basis of engine control but only had the B transducer – and it was reading 12 psi high! This may not seem like a lot, but it caused the center DCU B computer to command a throttle down on the center engine. All of a sudden, less LOX (and hydrogen) was being consumed. Almost invisibly, the Center engine was making up for the large shortfall of LOX that had been created by the nozzle leak on the right engine.
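
Why a 12 psi bias on one transducer turns into a throttle down is easier to see with numbers. The sketch below is a deliberately simplified model, not the actual SSME controller logic: it assumes the A channel read true, that the controller simply drives its sensed chamber pressure to a setpoint, and that the sensed value is the average of A and B when both are available and B alone otherwise. The function names are hypothetical, and the setpoint in the example is just the operating pressure figure quoted earlier in the post.

```python
B_BIAS_PSI = 12.0  # B-channel bias at full throttle, from the text
A_BIAS_PSI = 0.0   # assumption: the A channel read true


def sensed_pressure(true_psi, a_available):
    """Chamber pressure the controller believes it has (simplified model)."""
    b = true_psi + B_BIAS_PSI
    if not a_available:
        return b                  # single-channel fallback: B only
    a = true_psi + A_BIAS_PSI
    return (a + b) / 2.0          # two-channel average


def true_pressure_at_setpoint(setpoint_psi, a_available):
    """True pressure once the sensed value has been driven to the setpoint."""
    sensing_error = sensed_pressure(0.0, a_available)  # error does not depend on pressure here
    return setpoint_psi - sensing_error


if __name__ == "__main__":
    setpoint = 6000.0  # operating pressure figure quoted earlier in the post
    before = true_pressure_at_setpoint(setpoint, a_available=True)
    after = true_pressure_at_setpoint(setpoint, a_available=False)
    print(f"true pressure with A and B: {before:.0f} psi")
    print(f"true pressure with B only:  {after:.0f} psi")
    print(f"drop when A is lost:        {before - after:.0f} psi")
```

Under these assumptions the averaged reading hides half of the bias, so losing the A transducer makes the controller see a pressure that looks too high and it throttles the engine down to compensate.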


How lucky we were. Instead of being 200 or more fps short at MECO, possibly leading to an abort landing or requiring two tons of OMS propellant to make up, we wound up being only 15 fps short, well within the capability of the OMS budget.


Less than a minute off the pad and multiple failures. What were you thinking Sim Sup? Surely that is not a realistic case. Except it was real.


During a shuttle ascent, the crew normally has very little to do other than monitor the automated systems. As the velocity builds up, the MCC – the FDO – keeps the crew informed of what ‘abort mode’ they might have to use if one or more main engines were to turn off prematurely. As long as nothing happens, the crew happily gets to sit quietly until they are in orbit.


For STS-93, the big action for the crew was to take the AC bus sensors to off. Since the inverters – the devices that change the direct current electricity into alternating current – had a failure mode that could possibly make them over volt and fry their associated equipment, the automated circuit protection was enabled to swiftly react to any voltage issues. For the record, that never occurred in 135 shuttle flights. After losing two SSME computers, another AC bus dropping offline would cause one of the two affected main engines to shut down, probably requiring an abort. There was a slight chance the automated voltage control equipment could erroneously trip off an AC bus. Given the situation, the flight rules directed the crew to disable the automated shutdown. So Scooter told Ashby to take the AC Bus Sensors to off. That was the only action that the crew had to take the entire ascent.
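
The reasoning behind that flight rule can be sketched as a tiny redundancy model. It assumes an engine keeps running as long as at least one of its two controllers is alive, and that the left engine still had both (the post does not say it lost either). Which AC bus feeds which controller is not modeled, since that mapping is not given here; the names are illustrative only.

```python
# Live SSME controllers (DCUs) per engine after the AC 1 Phase A short.
# Center lost A and right lost B, per the post; the left engine is assumed
# untouched because the post does not mention it.
AFTER_SHORT = {
    "left": {"A", "B"},
    "center": {"B"},
    "right": {"A"},
}


def lose_controller(live, engine, dcu):
    """Return a new state with one controller removed; the input is not modified."""
    updated = {name: set(dcus) for name, dcus in live.items()}
    updated[engine].discard(dcu)
    return updated


def running(live):
    """Assumption: an engine keeps running while at least one DCU is alive."""
    return sorted(name for name, dcus in live.items() if dcus)


def one_failure_from_shutdown(live):
    """Engines with exactly one live controller left."""
    return sorted(name for name, dcus in live.items() if len(dcus) == 1)


if __name__ == "__main__":
    print("one failure away:", one_failure_from_shutdown(AFTER_SHORT))
    print("still running if center B drops:",
          running(lose_controller(AFTER_SHORT, "center", "B")))
```

With two engines each down to a single controller, a spurious bus trip becomes a bigger threat than the over-voltage case the sensors were guarding against, which is the trade the flight rule makes.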


They were blissfully unaware of the erroneous R SRB HYD pressure transducer. They did not get the word that the FDO was seeing the effects of . . . something . . . which turned out to be the combined effects of the right and center engine anomalies: a ‘6 fps thrust update in the ARD’ – not huge, not entirely atypical, but something going on. Not enough to change any calls: ‘no suspect engines’ ‘no under speed predicted’ ‘nominal shutdown plan’ were phrases – all carefully scripted by flight rules – that were passed between the flight controllers.


And of course, the flight controllers were unaware of all of what transpired inside Columbia.  That may be a good thing.


As it turns out, there was a LOX shortfall of 405 lbs. The LOX low level sensors detected depletion of the LOX, commanded main engine shutdown approximately 0.15 seconds earlier than guidance would have wanted, and the final shuttle velocity was about 15 feet per second short of what was desired; out of more than 25,500 feet per second. There was enough margin in the Orbital Maneuvering System load to make up about 300 lbs, and the flight proceeded normally.
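
Those numbers hang together in a way that is worth a quick back-of-the-envelope check. The sketch below assumes the whole 15 fps shortfall came from shutting down 0.15 seconds early at a constant acceleration, and it treats "more than 25,500" as a round 25,500 fps; both are simplifications for illustration.

```python
SHORTFALL_FPS = 15.0    # velocity shortfall at MECO, from the text
EARLY_CUTOFF_S = 0.15   # how early the engines shut down, from the text
TARGET_FPS = 25_500.0   # "more than 25,500" in the text; used as a round figure
G_FPS2 = 32.174         # standard gravity in ft/s^2


def implied_acceleration(shortfall_fps, early_s):
    """Acceleration implied if the shortfall is entirely due to the early cutoff."""
    return shortfall_fps / early_s


def shortfall_percent(shortfall_fps, target_fps):
    """Shortfall as a percentage of the target velocity."""
    return 100.0 * shortfall_fps / target_fps


if __name__ == "__main__":
    accel = implied_acceleration(SHORTFALL_FPS, EARLY_CUTOFF_S)
    print(f"implied acceleration: {accel:.0f} ft/s^2 ({accel / G_FPS2:.1f} g)")
    print(f"shortfall: {shortfall_percent(SHORTFALL_FPS, TARGET_FPS):.3f}% of target velocity")
```

The implied acceleration comes out at roughly 100 feet per second squared, a little over 3 g, which is in line with the 3 g limit the shuttle held late in ascent. The shortfall itself is about six hundredths of one percent of the final velocity.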


Except Mr. Shannon, when informed of the situation shortly after main engine cut off said loud and clear on the Flight loop: “Yikes. We don’t need another one of those.”


If you look at the video of NASA TV just after those words, you can see me on the telephone, talking to the program manager down in Florida, about what this all meant.


It was just a couple of minutes later that one of the projectors hanging from the ceiling in Mission Control – the projectors that put up the displays on the front screens – overheated and started smoking. Quick action by the Ground Control officer to shut it off probably prevented a fire in the MCC, which would have led to an evacuation.


That’s really over the top, isn’t it Sim Sup?


How much more exciting can you get? Give me a nominal, boring launch any day.

From friends at NASASpaceFlight.com watch the annotated video:



Now for the lesson: Be prepared. Spacecraft are complex and can fail in complex ways. Never, ever let your guard down. Practice for disaster all the time.


And remember:  Murphy does not play by the rules.
