Unfamiliar Terms

During the summer of 2009, we were working on the history book about the Space Shuttle, ‘Wings in Orbit’. We had hired three summer interns, college students, to help with the book. Their primary assignments were to build the appendices, check references, make tables, and the like. Unusual for NASA interns, these three were not technical majors, but history, English, and social science majors. They did a great job for us.

 
One of the ‘other duties as assigned’ that we gave the interns was to read a chapter a week and report on it to the editorial board. All the writing was done by the engineers and scientists who worked on the shuttle program, and, sadly, engineers are not always known for excellence in written communications. We asked the interns to provide a critical look at the readability of each chapter and to be especially on the lookout for unfamiliar terms or acronyms that would make the text hard to follow for the general public.

 
One memorable week, one of the interns drew the assignment of reading the draft chapter on the historical setting of the space shuttle. I thought this might be unnecessary since Dennis Webb is a great writer and there was almost nothing technical in the chapter. At the end of the week, we convened the editorial board for a number of topics and had the interns’ reports at the end of the agenda. First up was a review of the chapter on the APUs and hydraulics. As you might expect, the intern pointed out several instances of very technical jargon and a number of undefined acronyms, all of which would have to be cleared up by rewrite. Then we covered a chapter on another technical subject with similar results. Finally, it was time for the report on the history chapter. The young lady, probably a sophomore level college student, said that the chapter was very well written, easy to understand, and she had no recommendations for rewrite except for one term that she was unfamiliar with. ‘What was that?’ we inquired. She replied that the term she did not understand was:

 
‘Cold War’

 

 
I was thunderstruck. For someone of my generation, the idea that anyone would not know about the cold war is unthinkable. A short discussion ensued to make sure we had communicated correctly, but the bottom line was that she really hadn’t heard the term before and was unfamiliar with the concept.

 
More recently I had a conversation with a friend whose children are early high schoolers. He and his son were home alone one evening and decided to watch a movie together. On the schedule that evening was “The Hunt for Red October” based on the book by Tom Clancy. My friend reports that his teenage son didn’t quite get it. He kept asking why there was a big deal; after all the Russians are our friends, right?

 
Both of these young people were born after the fall of the Berlin Wall, after the dissolution of the Soviet Union. Those were ‘current events’ shortly before they were born. Given the lag in grade school history texts, those events were too recent to be covered.

 
When I was their age, nuclear annihilation stared us in the face. I can remember, probably when I was in 4th or 5th grade, when my parents came home from a Civil Defense meeting with plans of how to build a fallout shelter (we didn’t build one). In middle school the civics teacher showed us the AEC (Atomic Energy Commission – forerunner to the Department of Energy) films with the bomb tests in Nevada. You know the ones where they set up houses, mannequins with clothes, cars, household goods, etc., to see how a nearby bomb blast would affect those things. The message from those documentaries was that it was unsurvivable. I got to practice the ‘duck and cover’ method of hiding under our classroom desks if there was a bright flash in the sky. Again, the message came down that we were on the edge of annihilation and not likely to survive.

 
In high school history we studied the Cuban Missile Crisis and how close we came to the end in October of 1962. There were B-52 bombers on armed standby at air force bases near my home. I can remember as a high school student, plotting the likely fallout path from ‘targets’ near my home based on prevailing winds so that we would know which way to travel to survive following a nuclear exchange. That wouldn’t have helped either.

 

How that affected the psychology of two generations has been studied by sociologists.  It motivated people in so very many unusual ways.
And today’s kids don’t know what the term “cold war” means. I think that is a good thing. Not that they don’t need to understand history, but that those days are behind us. Hopefully for good, notwithstanding some burbles in the geopolitics these days.
There are a lot fewer nuclear weapons in the world these days – still too many – but the finger on the trigger seems to be a lot more relaxed. For our grandchildren’s sake I hope it stays that way.

 

We worry about a lot of things these days; there are serious problems all around us. But I think the level of worry is at a much lower intensity than it was 30 or 40 years ago.

 

 

And that is a good thing.


Significant Conversations

Just about exactly 10 years ago I was serving as the Deputy Program Manager for the Space Shuttle Program. Columbia had been lost a year and a half earlier and we were all trying our best to return the Space Shuttle to flight. I got assigned to present the program status at a conference on Probabilistic Risk Assessment which occurred on the same day that NASA had chosen for our annual Safety Day – when everybody was supposed to take a day off from their normal work to engage in classes, discussions, etc., about safety and how to improve it. I wrote the following email to the folks working in the Shuttle Program.

With the events of the last few days, it seems appropriate to reprint it now.

Near the end, I refer to K.C.  In case you don’t remember, Kalpana Chawla was one of the crew members on Columbia’s last flight.

This is a little long, but you should read it to the end.

——————————————————

This week, while most of you will be taking a day to study and think of ways to be safer, I will be in Cleveland attending the NASA Risk Management Conference. It is my intention to understand this process better and apply it in program decision making as my personal contribution to improved safety in the Space Shuttle Program.

 
Many of us are struggling with the concept of accepting risk – how much is too much, how much is an inherent part of what we do – and I would ask that each of you give this topic some consideration during your Safety Day activities. The following are some of my thoughts on this subject – not direction nor even an exhaustive process on how to come to the best answer – but just some thoughts based on my 26 years of shuttle experience. I can only write to you about what I know, and you must know by now that when I write to you it is from the heart.

 
Much of my experience comes from the 15 years I spent as a Flight Director. The Entry Flight Director is assigned the deorbit decision. After the payload bay doors are closed, the burn targets loaded in the computer, all the systems checked, and the last weather report is in, the clock counts down to ignition and the crew waits to hear the Flight Director’s decision: Go or No-Go. The Flight Control Room is always silent. Usually the orbiter is in flawless condition, rarely there is a minor systems problem, nothing significant. But there is always the weather forecast. The weather forecast – for a precise place at a precise time just about 2 hours in the future. The orbiter flies like a brick with handling qualities that would make a Mack truck proud. The commander has one shot at landing, there are no do-overs. It’s the Flight Director’s duty to make sure that one shot is a good shot.

 
Funny thing, it is never a black or white decision; it is always gray. There are always concerns, the chance for an adverse change, indicators of what might go wrong. When you look at weather under a microscope, it is never perfect. In fact, the harder you look the more little counter-indicators you see. The real question is not whether the weather will be perfect, but will it be good enough. The indicators may be gray but the decision is black and white. Binary. Go or No-Go. And if the decision is No-Go, everybody knows the shuttle can’t go around forever. We will be back to make the same decision tomorrow. So the managers in the viewing room watch and wait and second guess. The flight controllers strain to listen for an indication of what the answer will be. The convoy team fidgets. One person must make a decision; give the word for the record, live with the consequences.

 
I have given the Go 28 times. Every time was the toughest thing I have ever done. And I have never ever been 100% certain, it has always been gray, never a sure thing. But the team needs to have confidence that the decision was good. It is almost a requirement to speak the words much bolder than you feel, like it is an easy call. Then you pray that you were right.

 
We have done everything practical to mitigate the risk. The meteorologists train every day making real forecasts at the landing sites and checking two hours later to see if they were right. Their statistics are impressive: if they forecast a Go, it turns out to be Go about 97% of the time. But there is that 3% to consider. The weather rules are reasonably conservative, based on conditions that the astronauts have been able to handle in the simulators, in the shuttle training aircraft. But we cannot simulate the transition from zero gravity to one-g; and the stress of doing it for real with no chance to do it again, and with the whole world watching; all that is tough to get past. Even in the aerodynamics, proving that the machine is controllable, and in structural loads, how well the vehicle will hold together, there are limits to the uncertainties that are acceptable. If we have analyzed wrong, or the aerodynamics exceeds what we expect, . . . well.

 
When our predecessors invented the shuttle, based on their aircraft test experience and previous space programs, they set up a standard that everything should work properly in the face of 3 sigma environmental deviations and 3 sigma systems dispersions. Any basic statistics course will tell you that mean ± 3 sigma covers 99.7% of the cases. But there are 3 chances in 1000 that aren’t covered. Why not? Because to try to cover everything – worst on worst on worst – would require a vehicle design that probably is too heavy to get off the ground, and would require a set of proof testing that would take a lifetime to accomplish, and would cost, well, way more than we can afford. So inherently there is risk in using this system. And don’t forget, that is the risk that we understand, that we have designed against, that we have good numbers for. What we don’t recognize or cannot quantify is out there as an “unknown unknown”.
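For anyone who wants to check the 3 sigma arithmetic, here is a minimal sketch in Python. It assumes the dispersions follow a normal (bell curve) distribution and that the limit is two-sided; the function name `two_sided_coverage` is made up for this illustration and is not taken from any shuttle analysis tool.

```python
import math


def two_sided_coverage(k: float) -> float:
    """Fraction of a normal distribution within k standard deviations of the mean.

    Assumption: the quantity is normally distributed and the limit is symmetric.
    """
    return math.erf(k / math.sqrt(2.0))


coverage = two_sided_coverage(3.0)
uncovered_per_thousand = (1.0 - coverage) * 1000.0

print(f"within 3 sigma: {coverage:.2%}")
print(f"outside 3 sigma: about {uncovered_per_thousand:.1f} cases per 1000")
```

Under those assumptions the uncovered fraction comes out a little under 3 in 1000, which is the rounded figure quoted above. Real dispersions are not guaranteed to be normal, so treat this as a rule of thumb.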

 
There is nobody in the Shuttle Program Management, or the Agency management that has any delusion that we can reach perfection. Our collective job is to understand the risk, mitigate it as much as possible, communicate accurately all round about the risk remaining, and then decide if we can go on with that risk.

 
Right now we are working through many issues that are not black and white. There are many options, many shades of gray. There is always a debate about whether we have done enough, whether we have done too much, whether it is good enough. It has always been part of the engineer’s job to determine when enough has been done; not to overdo or to make it so conservative that it takes forever or is impractically uneconomical, or too heavy to get off the ground. Knowing when we have done enough is the art of engineering.

 
I have heard a well respected, retired senior NASA official speak on several occasions lately. One of his themes is that we are in an inherently risky business, but accepting risk does not mean not testing or not doing analysis. That is not risk acceptance, that is gambling. The real art is knowing when the testing is adequate and it is time to decide and move on. The newly reinvigorated engineering culture at NASA is to return to our roots and make decisions based on knowing — not perfectly knowing, but adequately knowing — what we face.

 
Every day people make little Go/No-Go decisions. The chief of the wind tunnel team signs off the latest test data to be complete and accurate. He has to decide if there have been enough runs to validate the data. Once he does, we use that data to make critical decisions. An engineer, knowing that there is never unlimited time nor unlimited resources, makes a decision about how many tests it takes to prove a new design. Another Go/No-Go decision is made.

 
When the OPF technician stamps the work document that the bolt has been tightened to its torque specification, a Go has been given. When the torque specification was written into the work document and the tech writer signed the document, the signature says Go. When the engineer who designed and tested the part calculates the proper torque value and signs the drawing, there is another Go. The engineer, the writer, and the tech may never meet face to face, but they have to trust each other that each one has done his job right. The Shuttle program is a big organization; there are over 20,000 of us. These days the whole agency is working to help us return to flight. It is impossible to know everyone, but we have to trust that everyone is doing their job right, making sure no mistakes have been made anywhere. Your signature or stamp will be rolled up into your manager’s signature on the Certification of Flight Readiness. Next spring, Bill Readdy, the chairman of the Flight Readiness Review, will read the poll, and all the managers will say Go. Their Go is based entirely on what decisions you have made.
There are jobs in the world where the calls don’t have much consequence. Nobody in this agency has one of those inconsequential jobs. It may not seem that a financial call on a budget line item could be a Go/No-Go decision. But frequently that becomes the critical decision that determines the difference between success and failure. It may not seem that a personnel action could be a Go/No-Go decision. But having the right person in the right place with the right training and experience is paramount. These are critical decisions in our profession, equal to the more obvious engineering decisions.

 
And behind every decision, everyone knows that we have neither the time nor the resources to do anything that is not absolutely critical to the safe return to flight. We have no time for fluff, no resources for nice-to-haves. Choices have to be made daily, between what must be done, what has to be done, and what can be eliminated because it is not required to be done.

 
The Flight Directors and CapComs always visit the crew in quarantine just after the last sim, just before the crew flies to the cape. We have trained together, sat in hours of meetings next to each other, laughed at each other, gotten angry with each other, and have undergone the great testing of simulated flight with each other. At the meeting, we cover the last minute changes and reminders, check on the wakeup calls and when to send the morning mail, and make bad jokes. At the end, we have the Ritual of The Handshake. Everybody has to shake everybody else’s hand before we leave. We look each other in the eye and say ‘Good luck’. They always say ‘We’re looking forward to a great flight’. Nobody ever talks about . . . you know.

 
But we all know.

 
There is risk about to be taken, serious risk that can have ultimate consequences. Humankind collectively does not know enough to scientifically drive the risk of space flight to zero. A hundred years would not provide enough time for all of us working together to positively eliminate any risk. Ten thousand small decisions throughout the preparation for the flight have been made, each with underlying risk calculations, and that total risk has been accumulated and communicated upward. Everybody has done their best to make it perfect, but there is a limit to what can be done. That is what we know. And we also know that the risk of not going is infinitely worse; the consequences would be worse if we didn’t try than if we try and fail.

 
Sometimes it is exquisitely clear when you are having a Significant Conversation. After we close the quarantine door and walk to the parking lot there is never any conversation. It is always a silent walk. I’ve had that walk 40 times. Flight Directors know too much about the risks.

 
Senator McCain has written a book entitled “Why Courage Matters”. You may not agree with his politics, but the senator’s credentials concerning courage are beyond dispute. He says that we have watered down the meaning of courage. An athlete’s prowess on the field of play is not courage, he says. Suffering an illness or injury without complaint is not courage. Being outspoken in a culture of silent acquiescence to certain wrongs is not courage. These are all evidence of virtue, the senator argues, but they are not examples of courage. The former POW defines courage as acts that risk life and limb to uphold a virtue. And he quotes Martin Luther King, Jr.: “If a man hasn’t discovered something he will die for, he isn’t fit to live.”

 
Everybody knows that there are ultimate risks in space flight. Some among us believe so strongly in the benefits that they put their lives on the line. Others of us believe so strongly that we do something harder to live with: we send our colleagues into danger. Why should we do it? Because the consequences of not taking the risk are unthinkable. The choice of turning back and giving up would affect the rest of history in ways that are immeasurable.

 

Somebody recently said that what we are engaged in is like high stakes poker. That comment trivializes space flight to a parlor game where the only risk is money or pride or career or other cheap consideration. To push back the frontier incurs a price that sometimes must be paid in a currency more dear than mere dollars. It takes courage.

 
It was Christmas break and the parking lot at JSC was almost deserted. After 15 years as a Flight Director, I was days away from moving to my new job at KSC and had come in to finish up loose ends in my old Building 4 office. It was dusk as I walked out into the nearly empty parking lot. K. C. was leaving work too. She greeted me with that megawatt smile she always had. I asked if she was ready to go fly. Her response was enthusiastic: Yes, a couple of weeks to launch and her crew is trained and anxious to fly after long months of delay. Our brief conversation consisted of only happy words. We didn’t talk about risk or danger, only of the rewards to be expected from a successful flight. I wished her good luck and turned for my car and drove home. I didn’t give the conversation any thought until the first day of February.

 
Sometimes, you don’t know when you are having a Significant Conversation. I know now that K. C. did not understand all the infinite detail of risk that lay ahead of her; clearly none of us did. But I can say without a doubt that she felt what was to be accomplished outweighed the risk that she understood, outweighed it by a lot.

 
Recently a reporter asked if it would be difficult for me, as chairman of the MMT, to give Mike Leinbach, the Launch Director, the Go For Launch at T-9 minutes. I told him no, that by launch day our procedures and processes would be well polished, the decision criteria all agreed to and documented, and all the really difficult decisions would be behind us. We would just be executing from the checklist and the final Go would be a matter of making sure all the squares were checked. It would be easy.

 
After thinking about that for a few days, I realized that that answer is, of course, a lie: under the microscope, nothing looks perfect, and the call will be hard because . . . you know. Life is full of gray choices. Deciding the work completed is good enough because more will not make it perfect. Ten thousand gray choices; doing what we must do, and not a bit more because that would take away from other work that is absolutely critical to be done right. When we have done what we can do, when we have driven the risk to the lowest practical level where it can be driven, then we have to accept the fact that it is time to make a decision and move on. Because history is waiting for us. But history will not wait forever, and it will judge us mercilessly if we fail to face tough choices and move ahead.

 
During the countdown, Steve Altemus, the launch NTD, will give many folks the challenge: ‘Say Go or No-Go’. You need to imagine Steve standing at your elbow each day asking that question to you because it all rolls up. Each of us has a part. Nobody can be sloppy or careless. Nobody can take forever trying to get it perfect, blocked by indecision or the fear of making a decision. Nobody in our business gets an easy choice. Yours will be a gray decision, too. We owe it to some courageous people to get it right. Don’t waste your time on things that don’t count. Focus on what must be done and do it right and then move on to the next problem to solve. There will be some risk that we cannot control, that we cannot solve, that we cannot eliminate. That risk, we will have to accept. If we have done our job right, it will be worth it.

 
At the end of the countdown, Mike Leinbach will wish Eileen and her crew ‘Good Luck.’

 
You will know what that conversation means. That will be significant.


STS-93: We don’t need any more of those

It was about a quarter past midnight on July 23, 1999 when Ralph Roe, the Shuttle Launch Director, told Eileen’s crew that they were go for launch and wished them good luck. The launch, which had been scrubbed late in the countdown on two previous attempts, was going to be about 7 minutes late due to some transient communications system problems at the Merritt Island Launch Annex (MILA) tracking station.

 

In Houston, I was sitting on the back row of mission control acting as the Mission Operations Director. Sitting next to me was Chuck Knarr, former flight director and executive in the United Space Alliance. In the hot seat at the Flight Director position was John Shannon with LeRoy Cain sitting beside him to watch the weather and keep track of the checklists. Capcom was Scott “Scooter” Altman.

 

The old joke was: “which F-14 pilot do you want as your wingman? Cobra, Maverick, or Scooter?” The correct answer was Scooter – Cobra and Maverick were just Hollywood movie creations for ‘Top Gun’; in that movie all the real flying was done by Scooter.

 

Down in the trench was Flight Dynamics Officer (FDO) Lisa Shore with Carson Sparks sitting next to her as TRAJ officer. On the Electrical General Instrumentation and Lighting (EGIL) console was Tim North – about to get the ride of his life – and on the back row was Booster Officer Jon Reding, one of the coolest heads to sit at that position. A well seasoned crew to watch over the launch phase for Eileen Collins, the first woman to command a Space Shuttle mission, her pilot Jeff Ashby, and the rest of the unusually small crew of Columbia.

 

STS-93 carried the heaviest payload the shuttle ever launched: the Chandra X-ray observatory (formerly known as the Advanced X-ray Astronomy Facility or AXAF) and its IUS booster. The Boeing-built solid rocket upper stage called the Inertial Upper Stage was extremely reliable, but a heavy load. It had an interesting history: it was originally called the Interim Upper Stage (same acronym), but its more powerful replacement was cancelled for budgetary reasons and the IUS lived on.

 

Everything appeared quiet, but problems lurked just beneath the surface.
Post flight we calculated that the LOX load was less than planned; due to slight temperature variations, tank volume, placement of the ‘full’ level sensors and other factors, Columbia was going to launch this night with 897 lbs less LOX in her tank than was intended. Given everything else working normally, there should have been a performance reserve still in excess of 3,000 lbs. But everything else did not work as precisely as planned.

 

Columbia, the oldest shuttle, had flown 25 missions by this time and the tiles and thermal blankets showed it; scars and stains over almost all of them. And hidden, deep down inside, were the flaws and insults that any real world flying machine always carries: some minor, some major, some that never had a consequence, and some that ultimately contributed to her loss.

 

One flaw lived in a stretch of 22 gauge Kapton insulated wiring, nearly half way down the payload bay, where a single strand of AC current carrying wire had rubbed and chafed against a screw head which had a minor burr where the tech had overtightened it long ago, and which other techs had probably stepped on during turnaround refurbishment. In the Right SRB, a hydraulic pressure sensor was not completely connected to its wiring and even though it was showing accurate pressure now, the connection could shake loose under vibration just enough to open the circuit and give a false low pressure reading. In the center main engine the two chamber pressure measurements A&B were reading exactly the same with the engines off, but the B channel had a bias that would only show up when the engine reached full throttle; it would read 12 psi high. This was outside the allowable error but still small compared to the 6,000 psi operating pressure. On the right engine, a deactivated LOX post in the main injector had a gold coated pin that wasn’t quite as tightly seated as it should have been; whether it was installed slightly off or worked loose over a number of firings, in the end it didn’t matter. Columbia had a lot of really minor flaws that didn’t affect today’s story, but some of which would show up later with other consequences. Viewgraph PowerPoint spaceships are always flawless; real spaceships are made and maintained by fallible human beings and are less perfect.

 

Somebody posted the NASA TV video of the launch on YouTube – you can watch it at:

So the countdown started up at T-9 minutes as it always did, seemingly uneventful on the surface, but we all knew what was about to happen and the adrenalin built up as we waited. Ashby started the APUs right on time, pumping high pressure hydraulic fluid to all the valves in the main engines among other things. At T-31 seconds the onboard computers took over, calculating the exact launch parameters, allowing the IMU gimbals to turn freely, opening vent doors so that pressures could equalize during the climb to space. At T-10 seconds the ground computers sent the last command onboard – the electronic equivalent of ‘go for main engine start’. The onboard computers reacted immediately by firing off the ‘sparklers’ called ROFIs (never can remember what that stands for) to burn off the excess hydrogen down by the main engines. At T-6.6 seconds the onboard computer commanded the first main engine to start, and staggered the second start by 140 milliseconds, and then the third by the same delay. The SSME controllers commanded the spark igniters on, and started the complex choreography of valve openings to bring the three main engines safely to roaring life.
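The staggered engine start is easy to lay out on a timeline. The sketch below uses only the two numbers given above (first start command at T-6.6 seconds, 140 milliseconds between commands); the function and variable names are invented for illustration, and the real start sequence involves far more than three timestamps.

```python
T_FIRST_START_S = -6.6   # first engine start command, seconds relative to T-0
STAGGER_S = 0.140        # delay between successive start commands
NUM_ENGINES = 3


def start_command_times(first: float, stagger: float, count: int) -> list[float]:
    """Return the start command time for each engine, in commanded order."""
    # Round to the millisecond to hide floating point noise in the sums.
    return [round(first + i * stagger, 3) for i in range(count)]


times = start_command_times(T_FIRST_START_S, STAGGER_S, NUM_ENGINES)
for label, t in zip(("first", "second", "third"), times):
    print(f"{label} engine start command: T{t:+.2f} s")
```

So all three start commands go out within about three tenths of a second of each other.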

 

By about T-3 seconds, all engines were up and operating at 100% of rated power level. Exactly when it happened is not clear, but on the right engine, the gold plated pin from LOX post 32 in row 13 came shooting out. Just like a bullet it went through the narrow part of the converging nozzle and flew out into the nozzle extension.
Now two scary things could have happened. First, the LOX post, which was pinned for a reason, could have failed allowing LOX to pour into the engine cavity where the hot hydrogen was introduced. Remember from an earlier post that this could have caused an explosion, or melting of other LOX posts, or other really bad things to happen. Failure of the LOX post was considered a CRIT 1 failure – loss of vehicle and crew ‘promptly’. Fortunately – luckily – the LOX post held together for the next eight and one half minutes. How close we were to disaster has never been determined.

 

Second, the nozzle extension could have failed. Early on in ground testing, one of them did fail – spectacularly – generating a huge explosion and fire. Those nozzle extensions were one of the weak points of the entire complex engine system. Made of 1080 long narrow stainless steel tubes braze welded together, the nozzle extension was cooled by liquid hydrogen flowing down its length. So cold on the outside that a layer of frost condensed out of the moist Florida air in seconds over the outside of the nozzle extension, the temperature inside was hot enough to melt any metal. These tubes were a bear to maintain and the entire extension had to be replaced periodically. The tubes were prone to split which required complex weld repairs until there were just too many repairs and a new nozzle extension had to be installed. The next upgrade to the SSMEs was to build a more robust channel wall nozzle extension. The shuttle program ended before that was done. Someone had calculated that if 5 adjacent cooling tubes split or were otherwise ruptured, there would not be enough local area cooling and a burn through would occur, causing a cascading failure of the nozzle and . . . a CRIT 1 failure.

 

The bullet shaped LOX post pin hit the side of the right engine nozzle extension about two thirds of the way to the end with great force. Just by sheer luck, three nozzle tubes were breached. Three adjacent nozzle tubes lost cooling and started leaking hydrogen into the hot stream of gas coming out of the engine. Three tubes, not five. The adjacent cooling tubes kept the nozzle from failing during the eight and a half minutes the engines operated.

 

At the time, the LOX post pin and nozzle tube leak went undetected. Look closely in the video and see a blue streak on the side of the nozzle. Nobody in Mission Control noticed it until afterwards during the photo review. The nozzle was leaking 3.5 lbs of hydrogen every second.
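To get a feel for what 3.5 lbs per second adds up to, here is a back-of-the-envelope sketch. It assumes the leak rate stayed constant for the full eight and a half minutes the engines ran, which is a simplification made here for illustration (the engines throttle during ascent); it is not a figure from the post flight analysis.

```python
LEAK_RATE_LB_PER_S = 3.5      # hydrogen leak rate quoted above
BURN_TIME_S = 8.5 * 60        # eight and a half minutes of engine operation

# Assumption: constant leak rate over the whole burn.
total_leaked_lb = LEAK_RATE_LB_PER_S * BURN_TIME_S

print(f"burn time: {BURN_TIME_S:.0f} s")
print(f"hydrogen leaked (constant-rate estimate): {total_leaked_lb:.0f} lbs")
```

Under that assumption the leak amounts to well over a thousand pounds of hydrogen by main engine cutoff.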

[Image: STS-93 mission status briefing slide]

Several things happened as a result. One might think that this would lead to running out of hydrogen, which, as an earlier post noted, would have been very bad. But the engine reacts to the loss of hydrogen in this area in a very interesting manner. This leak was downstream of the hydrogen flow meter so to all observers – human and digital – it appeared that the right amount of hydrogen was going into the engine. The real effect of the loss of hydrogen flow to the turbines driving the pumps on the right engine was a shifted mixture ratio – the mixture ratio was now closer to stoichiometric – and the turbine temperatures increased by about 100 degrees. Normally there is about a 200 degree margin between the normal turbine temperatures and the redline value that will automatically shut an engine down. The nozzle leak immediately used up half the operating margin to the redline.

 

Since the chamber pressure dropped slightly due to the loss of fuel for the fire in the main combustion chamber, the SSME controller commanded more oxygen be sent to the MCC. That may sound strange but it is the way the SSME computer controls the chamber pressure. Again, the mixture ratio is off and not only is the fire hotter in the MCC, but more oxygen is being consumed. Due to both of these processes, post flight calculations showed that the LOX tank should have been short about 3,000 lbs – translating to about 200 fps short at MECO. That large velocity shortfall did not happen because we were lucky: what happened to the Center SSME corrected that. Hold that thought.

 

Meanwhile, about a minute after launch, the booster officer and his team recognized the fact that the right engine turbine temperatures and speeds were higher than normal. They correctly identified that this might be due to a nozzle leak, but there was another potential anomaly that also had the same signature. If the oxidizer pump started to lose ‘efficiency’ (blades rubbing, pump clogging, etc.) it would look the same. As the SSME controller commanded mixture ratio changes to keep up with the loss of efficiency on the pump, the turbines would reach their temp limit and the engine would have to throttle down to prevent a shutdown: this was called ‘thrust limiting’. Until the SSME went into thrust limiting, the Booster team could not tell the difference between an oxidizer turbine/pump efficiency loss and a nozzle leak. The instrumentation just wasn’t precise enough to know what was going on. Jon and his team correctly identified that the engine was running off nominally (‘off tags’) but could not quantify it. Later on, when the FDO asked him, the Booster officer had to report that none of the engines were ‘suspect’. All these terms were precisely defined in the flight rules and had specific actions for the flight controllers and crew to take to maximize safety. But this leak was too small for any of that.

 

The booster officer took over a full minute to identify problems on the right engine because of distractions: the AC 1 short – which was real, and had real consequences – and a big red light that went off on his console saying the Right SRB hydraulic system pressure had dropped to catastrophic levels – which was not real. Each of the solid rocket boosters put out over twice the thrust of the three SSMEs put together. The steering mechanism was critical and each SRB had two separate hydraulic systems for redundancy. If both hydraulic systems on one SRB failed there would be no steering on that side; probable loss of control, and another one of those ‘prompt’ CRIT 1 failures – loss of vehicle and crew. So having that red light go off right in front of his nose took a few seconds to sort out. It clearly was a transducer failure, not a real hydraulic system failure since all the other parameters were OK. But it surely got the heart rate higher. Good thing it wasn’t displayed to the crew; they never knew about it.

 

Failed SRB TVC could, in the worst case, result in the shuttle flying directly back to the Launch Control Center. Given worst case reaction time by the Range Safety officer (7 seconds), large chunks of SRB – probably with the propellant still ignited – could land on the LCC. I used to think about that when I sat in the front row of the LCC, up near the glass. Those big louvers in front of the windows? They were for shade; not only no protection from flying debris but likely to become lethal missiles themselves.

 

As Columbia lifted off, the KSC public affairs officer, Lisa Malone, made her little speech about how this flight advanced X-ray astronomy and women’s authority in the world. In the MCC, we didn’t listen to the PAO commentary so we were shocked to hear Eileen report “Fuel Cell PH”.

 

I’ve written about fuel cell PH failure mode before: https://blogs.nasa.gov/waynehalesblog/2009/01/07/post_1231342021582/

 

To make a long story short, it means that one of the fuel cells might be failing: “It’s the Kaboom Case, Flight”. On board Columbia, the Master Alarm klaxon had gone off and there were two messages on the failure summary page: H2O pump, FC 1 PH. As it turns out, these were symptoms, not the cause of the problem. The Fuel Cell was not breaking down, ready to explode, and the water pump (used for cooling) was not shut down. Instead it was one of the alternating current buses – AC 1 Phase A – which had shorted. About 5 seconds after liftoff, for about half a second, it sputtered along with a short of up to 72 amps. That is a lot of current!

93 short

The automatic protection cut in and shut down the affected part of the circuit. The Fuel Cell instrumentation that monitored for the escape of potassium hydroxide (a strong base, high pH) used this bus, and this unpowered instrumentation gave an erroneous alarm. The water pump slowed and then resumed after the circuit breaker popped and normal power was returned on all the other AC1 equipment. An avionics bay fan (providing air cooling) had slowed but not enough to trip an alarm. What remained was loss of power to the SSME controllers.

 

The A computer on the Center SSME lost power, never to be recovered. The B computer (DCU B) immediately took control and the engine ran on normally. Well, almost – back to that in a moment. The Right Engine lost its B computer, but the A computer stayed in control and the engine, with its hydrogen leak, hung in there.

 

Loss of DCU A on the center engine caused a couple of interesting consequences. First of all, the MCC lost almost all the telemetry on the center engine; valve positions, turbine speed, some temperatures, almost everything was gone with the loss of DCU A. The folks that programmed the SSME computers put everything on the A side and very little – just Main Combustion Chamber pressure and a hexadecimal word indicating any problems in the engine – on the B side. We were on ‘backup data’. The B computer also lost all the sensors that were being read by the A computer. Most importantly, the A chamber pressure transducer went away. The B computer no longer averaged the A and B transducers to get its chamber pressure for the basis of engine control but only had the B transducer – and it was reading 12 psi high! This may not seem like a lot, but it caused the center DCU B computer to command a throttle down on the center engine. All of a sudden, less LOX (and hydrogen) was being consumed. Almost invisibly, the Center engine was making up for the large shortfall of LOX that had been created by the nozzle leak on the right engine.

 

How lucky we were. Instead of being 200 or more fps short at MECO, possibly leading to an abort landing or requiring two tons of OMS propellant to make up, we wound up being only 15 fps short, well within the capability of the OMS budget.

 

Less than a minute off the pad and multiple failures. What were you thinking Sim Sup? Surely that is not a realistic case. Except it was real.

 

During a shuttle ascent, the crew normally has very little to do other than monitor the automated systems. As the velocity builds up, the MCC – the FDO – keeps the crew informed of what ‘abort mode’ they might have to use if one or more main engines were to turn off prematurely. As long as nothing happens, the crew happily gets to sit quietly until they are in orbit.

 

For STS-93, the big action for the crew was to take the AC bus sensors to off. Since the inverters – the devices that change the direct current electricity into alternating current – had a failure mode that could possibly make them over volt and fry their associated equipment, the automated circuit protection was enabled to swiftly react to any voltage issues. For the record, that never occurred in 135 shuttle flights. After losing two SSME computers, another AC bus dropping offline would cause one of the two affected main engines to shut down, probably requiring an abort. There was a slight chance the automated voltage control equipment could erroneously trip off an AC bus. Given the situation, the flight rules directed the crew to disable the automated shutdown. So Scooter told Ashby to take the AC Bus Sensors to off. That was the only action that the crew had to take the entire ascent.

 

They were blissfully unaware of the erroneous R SRB HYD pressure transducer. They did not get the word that the FDO was seeing the effects of . . . something . . . which turned out to be the combined effects of the right and center engine anomalies: a ‘6 fps thrust update in the ARD’ – not huge, not entirely atypical, but something going on. Not enough to change any calls: ‘no suspect engines’ ‘no under speed predicted’ ‘nominal shutdown plan’ were phrases – all carefully scripted by flight rules – that were passed between the flight controllers.

 

And of course, the flight controllers were unaware of all of what transpired inside Columbia.  That may be a good thing.

 

As it turns out, there was a LOX shortfall of 405 lbs. The LOX low level sensors detected depletion of the LOX, commanded main engine shutdown approximately 0.15 seconds earlier than guidance would have wanted, and the final shuttle velocity was about 15 feet per second short of what was desired; out of more than 25,500 feet per second. There was enough margin in the Orbital Maneuvering System load to make up about 300 lbs, and the flight proceeded normally.

 

Except Mr. Shannon, when informed of the situation shortly after main engine cut off said loud and clear on the Flight loop: “Yikes. We don’t need another one of those.”

 

If you look at the video of NASA TV just after those words, you can see me on the telephone, talking to the program manager down in Florida, about what this all meant.

 

It was just a couple of minutes later that one of the projectors hanging from the ceiling in Mission Control – the projectors that put up the displays on the front screens – overheated and started smoking. Quick action by the Ground Control officer to shut it off probably prevented a fire in the MCC, which would have led to an evacuation.

 

That’s really over the top, isn’t it Sim Sup?

 

How much more exciting can you get? Give me a nominal, boring launch any day.

From friends at NASASpaceFlight.com watch the annotated video:

https://www.youtube.com/watch?v=O9WjCyWq-iA

 

Now for the lesson: Be prepared. Spacecraft are complex and can fail in complex ways. Never, ever let your guard down. Practice for disaster all the time.

 

And remember:  Murphy does not play by the rules.


Practicing for Disaster

Even though we sometimes hated them, the training teams that prepared mission control and the astronauts for every flight are real heroes.  Without their efforts, all of us flight controllers would have believed we knew everything there was to know about everything and would have tripped over our own shoelaces at the first sign of trouble.

I can remember after about a year of integrated training for STS-1, just when we were feeling like we were hot stuff, a new capability drop for the simulators came on line. That first day of training with the new capability, the EGIL sang out “Control Bus AB1 is down” on the Flight Director loop. We all looked blankly at each other and asked, ‘what the heck is a control bus’. Turns out that failures in the shuttle electrical system were not simulated before, and now they were. All of us learned an awful lot about the shuttle electrical system very quickly.

Training flight controllers and astronauts was a complex job especially when we had “integrated training” which was the closest thing to real space flight available.   There were certain ‘cases’ that each flight controller and crew member were required to experience and master.  These malfunction cases were practiced over and over and over again until they became so routine that we were bored by them; identification and reactions were automatic and swift.

Then there were new cases for new crew members and new flight controllers; hundreds of potential malfunctions that could possibly occur.  Each new person had to demonstrate familiarity and confidence to identify and rectify hundreds of potential shuttle systems failures.  Little failures, big failures, complex failures, simple failures.  Hundreds, if not thousands.

Then the gain was raised with complex interacting multiple failures: this computer and that electrical bus, this IMU and that GPS, this hydraulic system and that aero data set with wind shear, this TAL abort and that leaking tire, and on and on and on.  A seemingly infinite set of dual complications.

Master all the dual combinations? Get ready for triples! Sheesh.

On a busy day the sim team had to make sure that multiple flight controllers saw multiple failures in each eight and a half minute shuttle launch profile. We generally did six launches in one day. Or in the entry sims that simulated the last 15 minutes of entry – four of those cases filled up a day. The nexus for all problems of course was the Flight Director. Sometimes it felt like the Flight Director was wading through a class of excited grade school students all calling for his attention at once. The Flight Director had to recognize and prioritize failure responses very quickly. Some things just had to wait (“Flight, the cockpit voice recorder just failed, have the crew switch to recorder #2.” “OK INCO, we will do that as soon as we get the cabin air leak stopped, the fire out, and the abort mode selected”).

Flight Directors – including this one – tended to get testy on days like that. ‘Not realistic Sim Sup’ or ‘We will never have an ascent with that many failures Sim Sup’ or other brief communications that we cannot reproduce in a family oriented publication. In the old MCC of Apollo heritage which we used for early shuttle missions, the Sim Control team area overlooked the Flight Control Room – but there were curtains on the MCC side. When the Flight Director really had enough, he would have the curtains closed, blocking off the sim control team’s view. Later on, the sim team used the remote controlled TV cameras in the Flight Control room to observe their victims, er, trainees during tense moments.

An eight and a half minute ascent simulation with a full malfunction count could leave you breathless and heart racing.  Sometimes the debrief took two hours to discuss what had happened and action items would be assigned that might take weeks of research to finally answer.

When we got to real flight, it was so calm by comparison as to be boring. After many a nominal ascent I would look back and wonder what we trained so hard for.

Then came STS-93.

After that one, the Flight Directors complained a lot less about busy training runs.

 

 


STS-93: Dualing computers

In the early days of rocketry, when subsystems reliability was low, hard experience led designers to add redundancy for critical functions where they could. Redundancy comes at a cost: increased weight, increased complexity, unintended interactions, complex schemes to manage the redundancy, etc. Nowadays, subsystem reliability is much higher, especially electronic parts. So today we have sophisticated discussions about dispensing with redundancy and living with single string high reliability systems, even in critical areas.

Whenever I am engaged in one of these discussions I remember STS-93 where a 2 cent screw and a 10 cent length of wire demonstrated the vulnerability of an otherwise highly reliable critical system.  “Dualing computers” is not a misspelling, it is a safety concept.

Each Space Shuttle Main Engine has a computer mounted right to its side to run the complicated functions required for safe operation of that very complex, high energy device.  And every SSME controller is made up of two redundant computers:  DCU A and DCU B (Data Control Unit).  The A computer is always in control while the B listens along – until and unless the A computer fails; then the B computer takes over.  Each computer has its own way to control every valve and its own set of instrumentation required to run the engine:  pressures, temperatures, valve positions, turbine speeds.

However, when both A and B are working, they share data.  So there is a pressure measurement for the main combustion chamber wired to DCU A and another one wired to DCU B, but when both computers are working they share data and make computations based on the average of the two chamber pressure measurements.  If one of the computers fails, the other carries on but then has only one measurement to make calculations from – no more averaging.
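The averaging scheme is simple enough to sketch in a few lines of Python. This is only an illustration of the idea, not the actual SSME controller logic: the function name, the 3,000 psi "true" pressure, and the assumption that the A transducer reads exactly true are all made up for the example. The 12 psi bias on the B transducer is the figure from the STS-93 account.

```python
from typing import Optional


def chamber_pressure_for_control(pc_a: Optional[float], pc_b: Optional[float]) -> float:
    """Pressure used for engine control: the average when both channels
    report, otherwise whatever the surviving channel reads."""
    readings = [p for p in (pc_a, pc_b) if p is not None]
    if not readings:
        raise ValueError("no chamber pressure measurement available")
    return sum(readings) / len(readings)


TRUE_PC = 3000.0  # psi, illustrative value only
B_BIAS = 12.0     # psi, the B transducer reading high as on STS-93

# Both computers alive: the bias is diluted by averaging.
both_channels = chamber_pressure_for_control(TRUE_PC, TRUE_PC + B_BIAS)

# A side lost: the B computer sees the full bias.
b_only = chamber_pressure_for_control(None, TRUE_PC + B_BIAS)

print(both_channels - TRUE_PC)  # error while averaging
print(b_only - TRUE_PC)         # error on the single remaining transducer
```

With both channels the error is half the bias; with only B left the controller believes the chamber pressure is the full 12 psi higher than it really is, and a controller that thinks pressure is high will throttle down to compensate.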

Almost all the telemetry that is sent from the engine to Mission Control comes from the A computer; if it fails the B computer sends only a few data points, not nearly as many as the A side.

When STS-93 had its little problem, every engine kept working just fine even though two computers on two separate engines went silent.   There was never another case of SSME computer loss in the entire suite of shuttle flights.  These computers were highly reliable. The computers never failed because of an electronic part problem or a software error.

But in systems design, like warfare, defense has to be made at the most vulnerable point.  For the SSME controllers this was the power source.

If the shuttle designers hadn’t built in redundancy, two of the three engines would have shut down just after lift off.  The results from that would not have been good.  The crew has a procedure to run called “2 out First Stage”.  It is one of those procedures that Capt. Young used to describe as “keeping busy while you wait to die.”

The next time someone tells me they have a highly reliable system that doesn’t need redundancy, I will remember STS-93. I hope you do too.

 


STS-93: Dodging Golden Bullets

 

Calling it Rocket Science is, of course, a misnomer. Science provided the background but today it is definitely Rocket Engineering. Scientists and Engineers mix together like, well, cats and dogs. Friendly détente some days, not so much other days.

But there is one part of building liquid rocket engines that can still be called Rocket Art – making the injectors work. All combustion engines mix fuel and oxygen to make a fire; but for really complete combustion – really good gas mileage – that mixing can be very tricky indeed.

In a large, high pressure, high thrust liquid rocket engine like the SSME, a great deal of art is involved in designing the injector plate at the top of the combustion chamber. In a poor design, spray patterns from the injector can create hot spots on the wall of the chamber, defeating the cooling mechanisms and melting out the side – like letting loose a welder’s cutting torch. In other cases poor mixing can lead to combustion instability. Think of an overloaded out of balance washing machine, but much more powerful. The mighty F-1 engines of Apollo were plagued by combustion instability which was never really quite solved. That made each moon launch more of a gamble at the very start than you probably realized.

In the world of technical and economic secrets, injector design is protected by ITAR and economic espionage laws. We won’t go to that level of course.

In the SSME, the very hot partially burned Hydrogen gas must be mixed with the still super cold liquid Oxygen in just the right way to protect the engine through the start transient, main stage with its varying throttle settings, and the shutdown transient. Did I tell you that the propellant flow is over a half a ton a second? And it burns at over 3,000 degrees F? And passes through the throat of the nozzle – about the size of a dinner plate – at the speed of sound? And in all this, the engine is 99.9% of the maximum theoretical efficiency for this type of heat engine? And that’s not all, it’s reusable, too. Only one in the world.

LOX posts and pin

Looking at the picture, the liquid oxygen is introduced through a forest of stainless steel tubes called LOX posts. Cooled by the LOX inside, heated by the hot Hydrogen outside, the tubes are both robust and at the same time frighteningly fragile. Huge forces work on the LOX posts, especially during start up. Vibration forces are high throughout. And if one of those posts breaks off at the root, well, very bad things can happen. In the cool NASA parlance, a LOX post failure is CIL Crit 1. Loss of vehicle, loss of crew ‘promptly’ upon just one failure.

To eliminate the potential of a LOX post failure, inspection of the hundreds of LOX posts is performed with ultrasound. If a post shows signs of ‘fatigue’, the remedy is to plug it at the base. They use a gold pin, about the size and shape of a bullet. In the history of the program, over 200 LOX posts were pinned in this way, and only a couple ever worked loose. STS-93 was one of those.

Back to the case of STS-78 discussed in the last edition of this blog; the ‘overboard mixture ratio’ for the vehicle was changed because of the number of LOX posts that were pinned on the engines of that flight. Instead of being the expected 6.03-ish, it turned out to be more like 6.002-ish. That difference of .028 meant more fuel was used, less oxygen, and, in combination with other factors, resulted in fuel depletion just at the guidance commanded MECO. A bigger difference would have cut the engines off early – safe except for the trajectory implications. We got lucky.
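To get a feel for what a shift of .028 in mixture ratio means, here is a rough Python sketch. It assumes only that mixture ratio is oxygen mass divided by hydrogen mass, so hydrogen burned per pound of oxygen is 1/MR. The function name is made up for illustration, and no actual tank loads are used since none are given here.

```python
def extra_fuel_fraction(mr_expected: float, mr_actual: float) -> float:
    """Fractional increase in hydrogen burned per pound of oxygen when the
    mixture ratio (oxygen mass / hydrogen mass) comes in below expectation."""
    # Hydrogen burned per unit of oxygen is 1 / MR, so the ratio of the two
    # consumption rates is mr_expected / mr_actual.
    return mr_expected / mr_actual - 1.0


MR_EXPECTED = 6.03   # "6.03-ish" from the text
MR_ACTUAL = 6.002    # "6.002-ish" from the text

mr_shift = MR_EXPECTED - MR_ACTUAL
extra = extra_fuel_fraction(MR_EXPECTED, MR_ACTUAL)

print(f"mixture ratio shift: {mr_shift:.3f}")
print(f"extra hydrogen per pound of oxygen: {extra:.2%}")
```

The result is a bit under half a percent more hydrogen for every pound of oxygen burned. That sounds tiny, but applied to a full tank load it is the kind of number that can eat into a fuel bias of roughly a thousand pounds.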

On STS-93, a different set of circumstances was in play and the results were worse. Manageable, but that is due to being luckier than we deserved.

As my old boss used to say: “It’s better to be lucky than smart.” I hate that but it’s true.

Stay tuned.


Understanding STS-93: the key is Mixture Ratio

Some time back I started to tell the story of the most interesting shuttle launch:  STS-93.  I think it is time to return to that topic.  To understand what happened, some background is necessary.

If this is too engineering-geeky for you, well, what are you doing thinking about rockets and space travel?  Consider this part of your education.

Consider the following graph – I certainly spent many hours studying it and its relatives.  I would tell you frankly, I am sure I never completely understood it.  So don’t feel bad if you don’t either.  But it gives a summary of some very complex interactions.

FPR

Flight Performance Reserve (FPR) is the mass of fuel (Hydrogen) and Oxygen left in the External Tank when it is jettisoned just short of orbital velocity. Minimizing FPR is a good thing – every ounce thrown away is an ounce that could have been payload – food, water, experiment, satellite – something useful. FPR thrown away is . . . .wasted.

At the same time, keeping too little FPR, or making the mistake of not keeping any reserve at all, means that one likely comes up short. Short of energy, short of velocity, not in orbit, but on a ballistic trajectory that re-enters the earth’s atmosphere very soon. Too soon. STS-93 was nearly that case.

If you look at the burning of Hydrogen and Oxygen – the second highest energy release possible on the Periodic Table – you would find that the stoichiometric mass ratio for complete combustion and maximum energy release is 16:2 – two hydrogen atoms (atomic mass 1 each) attached to one oxygen atom (atomic mass 16) – for a complete combustion MR of 8.0. But the space shuttle’s main engines have a mixture ratio – reminiscent of Avogadro’s number without the exponent – of 6.02. If you look at the chart above, you will find at that mixture ratio, the unusable masses are just about at minimum. But anything that drives combustion in the engines away from that optimum point will lead to increased unusable mass.
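For anyone who wants to check the arithmetic, here is a small Python sketch. It uses the same rounded atomic masses as the text, and the function names are made up for illustration.

```python
# Atomic masses rounded as in the text.
H_MASS = 1.0
O_MASS = 16.0


def stoichiometric_mixture_ratio(h_atoms: int = 2, o_atoms: int = 1) -> float:
    """Oxygen-to-hydrogen mass ratio for complete combustion to water (H2O)."""
    return (o_atoms * O_MASS) / (h_atoms * H_MASS)


def hydrogen_mass_fraction(mixture_ratio: float) -> float:
    """Share of the total propellant mass that is hydrogen at a given ratio."""
    return 1.0 / (1.0 + mixture_ratio)


stoich = stoichiometric_mixture_ratio()
ssme_mr = 6.02  # the SSME mixture ratio quoted in the text

print(stoich)                           # 16 / 2
print(hydrogen_mass_fraction(stoich))   # hydrogen share at stoichiometric
print(hydrogen_mass_fraction(ssme_mr))  # hydrogen share at 6.02
```

At 8.0, hydrogen is one ninth of the propellant mass; at 6.02 it is closer to one seventh. In other words the engine deliberately carries and burns more hydrogen than complete combustion requires, which is what keeps the fire cooler than stoichiometric.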

The reason for such a wretchedly low mixture ratio is that the closer the MR is driven toward stoichiometric, the hotter the fire.  The turbine blades in the turbines that power the pumps feeding the engines can’t take a much hotter fire than results from 6.02.  Blades would melt, casings too, bad things indeed would happen.  Temperature sensors in the turbines should trip the engine to shut itself down before that happened. On STS-51 F in July of 1985, both voting temperature measurements failed and started reading higher than the turbine temps actually were:  51F became the only case of an SSME shutdown in flight and it was caused by faulty temperature readings.  But back to our story:   6.02 is ‘just right’. At least to two decimal points.  We spent 30 years arguing about what the next decimal point should be.

Important Safety Note: if propellant depletion occurs, it must occur first on the oxygen side. If the hydrogen runs out first, the last sputters at the turbine will be much closer to stoichiometric, and, well, bad things will happen. Did occasionally happen early in ground testing. Big mess in the bottom of the flame trench at Stennis. Not what anybody wanted in flight.

So STS-78 was  a real wake up call.  On that flight, the low level sensors in the ET flashed ‘dry’ just a fraction of a second before the engines shut down.  No problem ensued, there was enough fuel in the line to shut down safely, but it scared the bejabbers out of everybody.  At least everybody that understood what that meant.

Bet you never heard about that close call.

There is supposed to be a ‘fuel bias’ (extra hydrogen) of almost 1000 lbs.  But on STS-78, due to another instrumentation failure and some funny mixture ratio business, the engine burned right through that extra thousand pounds of hydrogen and all the other ‘dispersion’ allowances that were loaded in the tank.

It all started with plugged LOX posts.  LOX post plugs played a part in STS-93, too.

All space geeks need to stay tuned.  It really is rocket science.

 

 
