It was about a quarter past midnight on July 23, 1999 when Ralph Roe, the Shuttle Launch Director, told Eileen’s crew that they were go for launch and wished them good luck. The launch, which had been scrubbed late in the countdown on two previous attempts, was going to be about 7 minutes late due to some transient communications system problems at the Merritt Island Launch Annex (MILA) tracking station.
In Houston, I was sitting on the back row of mission control acting as the Mission Operations Director. Sitting next to me was Chuck Knarr, former flight director and executive in the United Space Alliance. In the hot seat at the Flight Director position was John Shannon with LeRoy Cain sitting beside him to watch the weather and keep track of the checklists. Capcom was Scott “Scooter” Altman.
The old joke was: “which F-14 pilot do you want as your wingman? Cobra, Maverick, or Scooter?” The correct answer was Scooter – Cobra and Maverick were just Hollywood movie creations for ‘Top Gun’; in that movie all the real flying was done by Scooter.
Down in the trench was Flight Dynamics Officer (FDO) Lisa Shore with Carson Sparks sitting next to her as TRAJ officer. On the Electrical General Instrumentation and Lighting (EGIL) console was Tim North – about to get the ride of his life – and on the back row was Booster Officer Jon Reding, one of the coolest heads to sit at that position. A well seasoned crew to watch over the launch phase for Eileen Collins, the first woman to command a Space Shuttle mission, her pilot Jeff Ashby, and the rest of the unusually small crew of Columbia.
STS-93 carried the heaviest payload the shuttle ever launched: the Chandra X-ray observatory (formerly known as the Advanced X-ray Astronomy Facility or AXAF) and its IUS booster. The Boeing-built solid rocket upper stage called the Inertial Upper Stage was extremely reliable, but a heavy load. It had an interesting history; originally called the Interim Upper Stage (same acronym), its more powerful replacement was cancelled for budgetary reasons and the IUS lived on.
Everything appeared quiet, but problems lurked just beneath the surface.
Post flight we calculated that the LOX load was less than planned; due to slight temperature variations, tank volume, placement of the ‘full’ level sensors and other factors, Columbia was going to launch this night with 897 lbs less LOX in her tank than was intended. Given everything else working normally, there should have been a performance reserve still in excess of 3,000 lbs. But everything else did not work as precisely as planned.
Columbia, the oldest shuttle, had flown 25 missions by this time and the tiles and thermal blankets showed it; scars and stains over almost all of them. And hidden, deep down inside, were the flaws and insults that any real world flying machine always carries: some minor, some major, some that never had a consequence, and some that ultimately contributed to her loss.
One flaw lived in a stretch of 22 gauge kapton insulated wiring, nearly half way down the payload bay, where a single strand of AC current carrying wire had rubbed and chafed against a screw head which had a minor burr where the tech had over tightened it long ago, and which other techs had probably stepped on during turnaround refurbishment. In the Right SRB, a hydraulic pressure sensor was not completely connected to its wiring and even though it was showing accurate pressure now, the connection could shake loose under vibration just enough to open the circuit and give a false low pressure reading. In the center main engine the two chamber pressure measurements A&B were reading exactly the same with the engines off, but the B channel had a bias that would only show up when the engine reached full throttle; it would read 12 psi high. This was outside the allowable error but still small compared to the 6,000 psi operating pressure. On the right engine, a deactivated LOX post in the main injector had a gold coated pin that wasn’t quite as tightly seated as it should have been; whether it was installed slightly off or worked loose over a number of firings, in the end it didn’t matter. Columbia had a lot of really minor flaws that didn’t affect today’s story, but some of which would show up later with other consequences.
Viewgraph PowerPoint spaceships are always flawless; real spaceships are made and maintained by fallible human beings and are less perfect.
Somebody posted the NASA TV video of the launch on YouTube – you can watch it at:
So the countdown started up at T-9 minutes as it always did, seemingly uneventful on the surface, but we all knew what was about to happen and the adrenalin built up as we waited. Ashby started the APUs right on time, pumping high pressure hydraulic fluid to all the valves in the main engines among other things. At T-31 seconds the onboard computers took over, calculating the exact launch parameters, allowing the IMU gimbals to turn freely, opening vent doors so that pressures could equalize during the climb to space. At T-10 seconds the ground computers sent the last command onboard – the electronic equivalent of ‘go for main engine start’. The onboard computers reacted immediately by firing off the ‘sparklers’ called ROFIs (never can remember what that stands for) to burn off the excess hydrogen down by the main engines. At T-6.6 seconds the onboard computer commanded the first main engine to start, and staggered the second start by 140 milliseconds, and then the third by the same delay. The SSME controllers commanded the spark igniters on, and started the complex choreography of valve openings to bring the three main engines safely to roaring life.
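For readers who like to see the start timing written out, here is a minimal sketch of the staggered start commands described above. The T-6.6 second first command and the 140 millisecond stagger come from the text; the function name and the rounding are illustrative choices of mine, not anything from the flight software.

```python
# Sketch only: start-command times relative to liftoff (T-0), in seconds.
FIRST_START_S = -6.6   # from the text: first engine commanded at T-6.6 s
STAGGER_S = 0.140      # from the text: 140 ms between successive commands


def start_command_times(n_engines=3):
    """Return the start-command time of each engine, in command order."""
    return [round(FIRST_START_S + i * STAGGER_S, 3) for i in range(n_engines)]


if __name__ == "__main__":
    for order, t in enumerate(start_command_times(), start=1):
        print(f"start command {order}: T{t:+.2f} s")
```

Under those numbers the three commands are all issued within 0.28 seconds of each other.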
By about T-3 seconds, all engines were up and operating at 100% of rated power level. Exactly when it happened is not clear, but on the right engine, the gold plated pin from LOX post 32 in row 13 came shooting out. Just like a bullet it went through the narrow part of the converging nozzle and flew out into the nozzle extension.
Now two scary things could have happened. First, the LOX post, which was pinned for a reason, could have failed allowing LOX to pour into the engine cavity where the hot hydrogen was introduced. Remember from an earlier post that this could have caused an explosion, or melting of other LOX posts, or other really bad things to happen. Failure of the LOX post was considered a CRIT 1 failure – loss of vehicle and crew ‘promptly’. Fortunately – luckily – the LOX post held together for the next eight and one half minutes. How close we were to disaster has never been determined.
Second, the nozzle extension could have failed. Early on in ground testing, one of them did fail – spectacularly – generating a huge explosion and fire. Those nozzle extensions were one of the weak points of the entire complex engine system. Made of 1080 long narrow stainless steel tubes braze welded together, the nozzle extension was cooled by liquid hydrogen flowing down its length. So cold on the outside that a layer of frost condensed out of the moist Florida air in seconds over the outside of the nozzle extension, the temperature inside was hot enough to melt any metal. These tubes were a bear to maintain and the entire extension had to be replaced periodically. The tubes were prone to split which required complex weld repairs until there were just too many repairs and a new nozzle extension had to be installed. The next upgrade to the SSMEs was to build a more robust channel wall nozzle extension. The shuttle program ended before that was done. Someone had calculated that if 5 adjacent cooling tubes split or were otherwise ruptured, there would not be enough local area cooling and a burn through would occur, causing a cascading failure of the nozzle and . . . a CRIT 1 failure.
The bullet shaped LOX post pin hit the side of the right engine nozzle extension about two thirds of the way to the end with great force. Just by sheer luck, three nozzle tubes were breached. Three adjacent nozzle tubes lost cooling and started leaking hydrogen into the hot stream of gas coming out of the engine. Three tubes, not five. The adjacent cooling tubes kept the nozzle from failing during the eight and a half minutes the engines operated.
At the time, the LOX post pin and nozzle tube leak went undetected. Look closely in the video and see a blue streak on the side of the nozzle. Nobody in Mission Control noticed it until afterwards during the photo review. The nozzle was leaking 3.5 lbs of hydrogen every second.
Several things happened as a result. One might think that this would lead to running out of hydrogen, which, as an earlier post noted, would have been very bad. But the engine reacts to the loss of hydrogen in this area in a very interesting manner. This leak was downstream of the hydrogen flow meter so to all observers – human and digital – it appeared that the right amount of hydrogen was going into the engine. The real effect of the lost hydrogen flow was on the turbines driving the pumps on the right engine: their mixture ratio shifted closer to stoichiometric, and the turbine temperatures increased by about 100 degrees. Normally there is about a 200 degree margin between the normal turbine temperatures and the redline value that will automatically shut an engine down. The nozzle leak immediately used up half the operating margin to the redline.
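The arithmetic behind those numbers is simple enough to sketch. The leak rate, the roughly eight and a half minutes of engine operation, and the temperature figures are from the text; treating the leak rate as constant for the whole burn is my assumption, so the total below is a rough illustration, not a post flight result.

```python
# Sketch only: rough totals from the figures quoted in the text.
LEAK_RATE_LB_PER_S = 3.5      # from the text
BURN_TIME_S = 8.5 * 60        # from the text: about eight and a half minutes
NORMAL_MARGIN_DEG = 200.0     # from the text: normal margin to redline
LEAK_TEMP_RISE_DEG = 100.0    # from the text: rise caused by the leak


def total_leaked_lb(rate_lb_per_s, duration_s):
    """Hydrogen lost overboard, ASSUMING a constant leak rate."""
    return rate_lb_per_s * duration_s


def remaining_margin_fraction(margin_deg, rise_deg):
    """Fraction of the normal redline margin still available."""
    return (margin_deg - rise_deg) / margin_deg


if __name__ == "__main__":
    print(total_leaked_lb(LEAK_RATE_LB_PER_S, BURN_TIME_S), "lb of hydrogen")
    print(remaining_margin_fraction(NORMAL_MARGIN_DEG, LEAK_TEMP_RISE_DEG))
```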
Since the chamber pressure dropped slightly due to the loss of fuel for the fire in the main combustion chamber, the SSME controller commanded more oxygen be sent to the MCC. That may sound strange but it is the way the SSME computer controls the chamber pressure. Again, the mixture ratio is off and not only is the fire hotter in the MCC, but more oxygen is being consumed. Due to both of these processes, post flight calculations showed that the LOX tank should have been short about 3,000 lbs – translating to about 200 fps short at MECO. That large velocity shortfall did not happen because we were lucky: what happened to the Center SSME corrected that. Hold that thought.
Meanwhile, about a minute after launch, the booster officer and his team recognized the fact that the right engine turbine temperatures and speeds were higher than normal. They correctly identified that this might be due to a nozzle leak, but there was another potential anomaly that also had the same signature. If the oxidizer pump started to lose ‘efficiency’ (blades rubbing, pump clogging, etc.) it would look the same. As the SSME controller commanded mixture ratio changes to keep up with the loss of efficiency on the pump, the turbines would reach their temp limit and the engine would have to throttle down to prevent a shutdown: this was called ‘thrust limiting’. Until the SSME went into thrust limiting, the Booster team could not tell the difference between an oxidizer turbine/pump efficiency loss and a nozzle leak. The instrumentation just wasn’t precise enough to know what was going on. Jon and his team correctly identified that the engine was running off nominally (‘off tags’) but could not quantify it. Later on, when the FDO asked him, the Booster officer had to report that none of the engines were ‘suspect’. All these terms were precisely defined in the flight rules and had specific actions for the flight controllers and crew to take to maximize safety. But this leak was too small for any of that.
The reason the booster officer took over a full minute to identify problems on the right engine was because of distractions: the AC 1 short – which was real, and had real consequences – and a big red light that went off on his console saying the Right SRB hydraulic system pressure had dropped to catastrophic levels – which was not real. Each of the solid rocket boosters put out over twice the thrust of the three SSMEs put together. The steering mechanism was critical and each SRB had two separate hydraulic systems for redundancy. If both hydraulic systems on one SRB failed there would be no steering on that side; probable loss of control, and another one of those ‘prompt’ CRIT 1 failures – loss of vehicle and crew. So having that red light go off right in front of his nose took a few seconds to sort out. It clearly was a transducer failure, not a real hydraulic system failure since all the other parameters were OK. But it surely got the heart rate higher. Good thing it wasn’t displayed to the crew; they never knew about it.
Failed SRB TVC could, in the worst case, result in the shuttle flying directly back to the Launch Control Center. Given worst case reaction time by the Range Safety officer (7 seconds), large chunks of SRB – probably with the propellant still ignited – could land on the LCC. I used to think about that when I sat in the front row of the LCC, up near the glass. Those big louvers in front of the windows? They were for shade; not only no protection from flying debris but likely to become lethal missiles themselves.
As Columbia lifted off, the KSC public affairs officer, Lisa Malone, made her little speech about how this flight advanced X-ray astronomy and women’s authority in the world. In the MCC, we didn’t listen to the PAO commentary so we were shocked to hear Eileen report “Fuel Cell PH”.
I’ve written about fuel cell PH failure mode before: https://blogs.nasa.gov/waynehalesblog/2009/01/07/post_1231342021582/
To make a long story short, it means that one of the fuel cells might be failing: “It’s the Kaboom Case, Flight”. On board Columbia, the Master Alarm klaxon had gone off and there were two messages on the failure summary page: H2O pump, FC 1 PH. As it turns out, these were symptoms, not the cause of the problem. The Fuel Cell was not breaking down, ready to explode, and the water pump (used for cooling) was not shut down. Instead it was one of the alternating current buses – AC 1 Phase A – which had shorted. About 5 seconds after liftoff, for about half a second, it sputtered along with a short of up to 72 amps. That is a lot of current!
The automatic protection cut in and shut down the affected part of the circuit. The Fuel Cell instrumentation that monitored for the escape of potassium hydroxide (a strong base, high pH) used this bus, and this unpowered instrumentation gave an erroneous alarm. The water pump slowed and then resumed after the circuit breaker popped and normal power was returned on all the other AC1 equipment. An avionics bay fan (providing air cooling) had slowed but not enough to trip an alarm. What remained was loss of power to the SSME controllers.
The A computer on the Center SSME lost power, never to be recovered. The B computer (DCU B) immediately took control and the engine ran on normally. Well, almost – back to that in a moment. The Right Engine lost its B computer, but the A computer stayed in control and the engine, with its hydrogen leak, hung in there.
Loss of DCU A on the center engine caused a couple of interesting consequences. First of all, the MCC lost almost all the telemetry on the center engine; valve positions, turbine speed, some temperatures, almost everything was gone with the loss of DCU A. The folks that programmed the SSME computers put everything on the A side and very little – just Main Combustion Chamber pressure and a hexadecimal word indicating any problems in the engine – on the B side. We were on ‘backup data’. The B computer also lost all the sensors that were being read by the A computer. Most importantly, the A chamber pressure transducer went away. The B computer no longer averaged the A and B transducers to get its chamber pressure for the basis of engine control but only had the B transducer – and it was reading 12 psi high! This may not seem like a lot, but it caused the center DCU B computer to command a throttle down on the center engine. All of a sudden, less LOX (and hydrogen) was being consumed. Almost invisibly, the Center engine was making up for the large shortfall of LOX that had been created by the nozzle leak on the right engine.
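A toy model shows why losing the A transducer mattered. The 12 psi bias on the B channel and the averaging of A and B are from the text; everything else here is my simplification: I assume the A channel reads true, and that the controller simply drives whatever pressure it senses to the commanded value. The real SSME control law is far more involved.

```python
# Sketch only: how a biased sensor shifts the pressure actually held.
BIAS_A_PSI = 0.0    # ASSUMPTION: the A channel reads the true pressure
BIAS_B_PSI = 12.0   # from the text: the B channel reads 12 psi high


def sensed_offset_psi(bias_a, bias_b, a_available):
    """Error in the controller's sensed chamber pressure (sensed - true)."""
    if a_available:
        return (bias_a + bias_b) / 2.0   # A and B averaged
    return bias_b                        # only the B transducer remains


def true_offset_psi(bias_a, bias_b, a_available):
    """True pressure minus commanded pressure, once the controller has
    driven its SENSED pressure to the commanded value."""
    return -sensed_offset_psi(bias_a, bias_b, a_available)


if __name__ == "__main__":
    print("both transducers:", true_offset_psi(BIAS_A_PSI, BIAS_B_PSI, True))
    print("B transducer only:", true_offset_psi(BIAS_A_PSI, BIAS_B_PSI, False))
```

In this simplified picture the engine settles about 6 psi below the commanded pressure with both transducers and about 12 psi below with B alone, which is the direction of the throttle down described above.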
How lucky we were. Instead of being 200 or more fps short at MECO, possibly leading to an abort landing or requiring two tons of OMS propellant to make up, we wound up being only 15 fps short, well within the capability of the OMS budget.
Less than a minute off the pad and multiple failures. What were you thinking, Sim Sup? Surely that is not a realistic case. Except it was real.
During a shuttle ascent, the crew normally has very little to do other than monitor the automated systems. As the velocity builds up, the MCC – the FDO – keeps the crew informed of what ‘abort mode’ they might have to use if one or more main engines were to turn off prematurely. As long as nothing happens, the crew happily gets to sit quietly until they are in orbit.
For STS-93, the big action for the crew was to take the AC bus sensors to off. Since the inverters – the devices that change the direct current electricity into alternating current – had a failure mode that could possibly make them over volt and fry their associated equipment, the automated circuit protection was enabled to swiftly react to any voltage issues. For the record, that never occurred in 135 shuttle flights. After losing two SSME computers, another AC bus dropping offline would cause one of the two affected main engines to shut down, probably requiring an abort. There was a slight chance the automated voltage control equipment could erroneously trip off an AC bus. Given the situation, the flight rules directed the crew to disable the automated shutdown. So Scooter told Ashby to take the AC Bus Sensors to off. That was the only action that the crew had to take the entire ascent.
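The redundancy picture behind that flight rule can be sketched in a few lines. The text only tells us that the AC 1 short took out the center engine's A computer and the right engine's B computer; the rest of the bus assignment in the table below is a hypothetical layout I filled in for illustration, and the model treats a bus as simply on or off, ignoring individual phases.

```python
# Sketch only: which engines keep running after AC bus losses.
# HYPOTHETICAL assignment: only the two AC1 entries are implied by the text.
CONTROLLER_POWER = {
    "center": {"A": "AC1", "B": "AC2"},
    "left":   {"A": "AC2", "B": "AC3"},
    "right":  {"A": "AC3", "B": "AC1"},
}


def engines_running(failed_buses):
    """An engine keeps running while at least one of its two controller
    computers still has power."""
    failed = set(failed_buses)
    return {
        engine: any(bus not in failed for bus in controllers.values())
        for engine, controllers in CONTROLLER_POWER.items()
    }


if __name__ == "__main__":
    print(engines_running(["AC1"]))          # all three engines survive
    print(engines_running(["AC1", "AC2"]))   # center engine is lost
    print(engines_running(["AC1", "AC3"]))   # right engine is lost
```

With one bus already gone, either remaining bus takes an engine with it, which is why an erroneous automatic trip was the bigger risk at that point.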
They were blissfully unaware of the erroneous R SRB HYD pressure transducer. They did not get the word that the FDO was seeing the effects of . . . something . . . which turned out to be the combined effects of the right and center engine anomalies: a ‘6 fps thrust update in the ARD’ – not huge, not entirely atypical, but something going on. Not enough to change any calls: ‘no suspect engines’ ‘no under speed predicted’ ‘nominal shutdown plan’ were phrases – all carefully scripted by flight rules – that were passed between the flight controllers.
And of course, the flight controllers were unaware of all of what transpired inside Columbia. That may be a good thing.
As it turns out, there was a LOX shortfall of 405 lbs. The LOX low level sensors detected depletion of the LOX, commanded main engine shutdown approximately 0.15 seconds earlier than guidance would have wanted, and the final shuttle velocity was about 15 feet per second short of what was desired; out of more than 25,500 feet per second. There was enough margin in the Orbital Maneuvering System load to make up about 300 lbs, and the flight proceeded normally.
Except Mr. Shannon, when informed of the situation shortly after main engine cut off, said loud and clear on the Flight loop: “Yikes. We don’t need another one of those.”
If you look at the video of NASA TV just after those words, you can see me on the telephone, talking to the program manager down in Florida, about what this all meant.
It was just a couple of minutes later that one of the projectors hanging from the ceiling in Mission Control – the projectors that put up the displays on the front screens – overheated and started smoking. Quick action by the Ground Control officer to shut it off probably prevented a fire in the MCC, which would have led to an evacuation.
That’s really over the top, isn’t it, Sim Sup?
How much more exciting can you get? Give me a nominal, boring launch any day.
From friends at NASASpaceFlight.com watch the annotated video:
Now for the lesson: Be prepared. Spacecraft are complex and can fail in complex ways. Never, ever let your guard down. Practice for disaster all the time.
And remember: Murphy does not play by the rules.