I am told that there is a principle that the President and Vice President never ride on the same aircraft, for a pragmatic if ghoulish reason: we can’t afford to lose both of them in one accident.
During a space shuttle flight, the entry flight director – specially trained and certified for the unique demands required of the decision maker during that critical operation – was always near at hand, only a phone call and quick drive away, night or day, all hours. If something were to happen to that critical individual, it was comforting to know that there was always at least one other trained and certified entry flight director available; maybe not assigned to the flight at hand, but ready willing and able to take over in a pinch. Usually there were up to about four flight directors so certified but various travel duties, vacations, etc., might take some of them out of the picture at least for a quick call up.
Only one time . . . well, that is this story. It starts with a very technical debate.
A long and contentious debate over some of the fine points of main engine limits management. That is as arcane and complex a subject as there ever was in space shuttle operations. It was a subject that many felt passionately about. Unfortunately, there were passions on both sides of the argument.
The argument is highly technical so I have postponed that discussion to a post script rather than delay the main story. But both the main story here and the technical post script that follows are about risk management in very unusual circumstances and how NASA decides what to do in such cases.
Those who persevere to read and perhaps understand the technical details in the post script will be awarded the space cadet nerd award.
Suffice it to say that the debate had been going on for literally years as improvements were made to the engines and various launch abort procedures; the risk was always a balance between the two. It became essentially an argument between the Flight Directors and the Astronauts. Strong discussions frequently marked the process of arriving at the best possible resolution of an issue at NASA. This was more contentious than most but not the worst I ever saw.
We in the Flight Director’s office were smug in our confidence that we understood both sides of the risk trade; after all the developer of the main engine was our primary source of information. The JSC engineers responsible for the space shuttle orbiter were extremely nervous about the contingency abort profiles that the other side proposed using. The JSC engineers were little use in the discussion because they would not commit to allowing flight conditions to go one iota past the limits that were tested and certified (and which had safety factors built in).
Finally, we in the Flight Director’s office were suspicious of the arguments the Astronauts made. Most of the pilot astronauts were fighter pilots and test pilots. They were supremely confident in their ability to fly any vehicle under any circumstances. Probably too confident, we suspected. The Astronauts had practiced flying convoluted contingency abort scenarios in the Shuttle Mission Simulator. The SMS was great for near normal flight characteristics, but its computer models did not cover extreme cases where, for example, the wings could melt off. We believed the Astronauts were being deceived by a simulator that did not accurately represent the consequences of their flying techniques.
By 1998 the impasse had reached a boiling point: each side was convinced it was right, the other wrong, and the senior management confused about the arcane details of the arguments. There were numerous ‘white papers’ written with varying degrees of statistical calculations based on limited data and limited modeling and testing. It grew very heated.
Finally, the Space Shuttle Program Manager, Tommy Holloway, had enough. He directed the entire contentious contingent to travel to the Marshall Space Flight Center in Alabama where the engine experts resided. A face-to-face meeting with no holds barred. We were instructed to not come home until a consensus was reached.
The problem was getting a time when all the parties could make the trip. All of our schedules were overbooked. The astronauts and their management could fly their T-38’s, but the flight directors could only make the trip in one day by using the JSC management airplane, an ancient Gulfstream I turboprop nicknamed ‘NASA 2’. The only date ‘open’ was during a shuttle mission: STS-90 in April of 1998. Columbia was on an extended 16-day SpaceLab mission. All the astronauts not in space could go. The A/E Flight Directors were free.
All the certified Ascent/Entry flight directors plus those in training were sent. That’s right, all of us. The mission was going well, so the management felt it was OK for us to be gone for a day – up in the morning, back in the evening. After all, the orbit certified flight directors had some training in how to execute an emergency deorbit if necessary.
Ha. An emergency deorbit is not like a regular deorbit. Most orbit flight directors were overly confident from their one (1) simulation during training. The A/E flight directors had sweated over the real thing multiple times and knew it was not easy.
No emergency deorbit was required. The trip was stormy both literally and figuratively, but a consensus was hammered out.
That day of the trip there were thunderstorms forecast along the return flight path of our airplane as there often are that time of year. We were really in turbulence for much of the return trip; the Gulfstream bounced around significantly. Added to the old turboprop’s normal waddling tendency to dutch roll, this herky-jerky made all the flight directors wish they were on the ground. The T-38s could fly high enough to get out of the worst of the turbulence but the old Gulfstream was limited to a lower ceiling where the weather was stronger.
Later on, we talked about how stupid that trip had been. Clearly the odds of the plane crashing were low, but they were not zero. If all of us had been wiped out at once, there would have been a heck of a mess to deal with, and it would have put the crew of STS-90 at significant risk.
Of course, that would not have been our problem!
Such an event never happened again in our shuttle history: putting all the certified decision makers on the same plane at the same time, and especially not during a mission. I wonder if the STS-90 crew knew. I don’t think we told them.
Oh, the upshot of the meeting? A very bright young MSFC engineer named Mike Kynard gave a very convincing presentation that swayed us all to the side of the astronauts. It was a hard pill to swallow because I was the leader of the other side.
As a matter of fact, it took me a couple weeks of fuming to come around to the new way of thinking. Pride, I guess. But before the next shuttle launch, we had put the new procedures, rules, and other documents into effect.
Risk? Of course. Necessary? Maybe. Maybe not. The Program Manager got what he wanted: a decision that (eventually) all of us agreed to.
Late addition and thought: After all the effort: years of study, analysis, calculations, and most of all the contentious debates which affected interpersonal team relationships, what was the true outcome?
We practiced over and over again the procedures for these cases. Thousands of hours were spent in simulations reacting to multiple engine failures.
And the bottom line: Never used in real life. No crew member ever again touched the limits switch in actual flight.
Was it worth it?
At least we were prepared.
Postscript
So, for those interested in the technical details, here it comes. Non-technical folks should leave now. Abandon All Hope Ye Who Enter Here.
Large liquid-fueled rocket engines are extremely complex, lightweight, low margin, and temperamentally subject to coming apart in spectacular ways. The Space Shuttle Main Engines (SSME) were definitely a worrisome product and the most worrisome parts were the turbopumps rotating at 35,000 revolutions per minute. Every SSME had a computer built right onto the engine that monitored and controlled it. The main shuttle orbiter computers – the General Purpose Computers (GPC) – would issue the ‘big’ commands to the engines: start, shutdown, throttle up/down to x%, etc. The SSME controllers (Main Engine Controllers – MECs) would translate those commands into smoke and fire. For example, when receiving the ‘start’ command, the MEC would send a choreographed, time-sequenced set of valve position commands to spool up the engine, each step exquisitely timed to build up thrust without overtaxing any part of the engine. And then the MEC would continuously monitor the ‘redline’ parameters: some critical temperature measurements, some well placed pressure measurements, and a few other parameters such as turbine shaft speeds. Each measurement was checked to ensure the instrumentation was reading within reasonable values, which ‘qualified’ that measurement. Then, if all the qualified measurements for a monitored parameter exceeded or fell below a safe value (‘the redline’), the MEC would issue a Major Component Fail (MCF), which had an associated Failure Identification code (FID). The MEC would then check its instructions from the GPC before doing anything else.
One of the most worrisome values which was monitored was the temperature of gas exiting the high-speed turbines that powered the fuel pumps, normally about 1,300 degrees Fahrenheit. If something went wrong in the turbine it would quickly show up in a dramatic temperature rise; hopefully tripping the safety limit before the metal case melted through. Early versions of the engines had two sensors in the turbine exhaust gas manifold; later models had four sensors. If all were working properly (‘qualified’) and all measurements exceeded the programmed limit, the redline software would issue the MCF and FID. If the MEC had received a shutdown limit enabled command from the GPCs, the MEC would start shutting down the affected engine. This is what happened on STS-51F when both sensors on one of the turbines actually failed and indicated erroneously high temperatures. That was the only in-flight shutdown of an SSME in the entire program. That flight has a story of its own.
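The qualify-then-vote logic described above can be sketched in a few lines. This is only an illustration of the idea, not the MEC flight software: the class and function names, the plausibility range, and the 1,900 degree redline are all made-up placeholders, only an upper redline is modeled, and the choice to declare no violation when no sensor qualifies is an assumption the text does not address.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RedlineCheck:
    """One monitored parameter (all numeric values here are hypothetical)."""
    name: str
    plausible_min: float  # readings below this are treated as a broken sensor
    plausible_max: float  # readings above this are treated as a broken sensor
    redline: float        # upper safety limit for the parameter


def qualified(readings: List[float], check: RedlineCheck) -> List[float]:
    """Keep only readings that fall within reasonable instrument values."""
    return [r for r in readings
            if check.plausible_min <= r <= check.plausible_max]


def redline_exceeded(readings: List[float], check: RedlineCheck) -> bool:
    """True only if ALL qualified readings are past the redline."""
    good = qualified(readings, check)
    if not good:
        # Assumption: with no qualified sensors, no violation is declared.
        return False
    return all(r > check.redline for r in good)


def engine_action(readings: List[float], check: RedlineCheck,
                  limits_enabled: bool) -> str:
    """What the engine does once the vote is in."""
    if not redline_exceeded(readings, check):
        return "run"
    # An MCF/FID is issued either way; shutdown only if limits are enabled.
    return "shutdown" if limits_enabled else "run_with_mcf"


# Hypothetical turbine discharge temperature check, degrees Fahrenheit.
check = RedlineCheck("turbine discharge temp", 0.0, 3000.0, 1900.0)
print(engine_action([1300.0, 1310.0], check, limits_enabled=True))
print(engine_action([2000.0, 2100.0], check, limits_enabled=True))
print(engine_action([2000.0, 2100.0], check, limits_enabled=False))
```

Note what the vote cannot protect against: two sensors that both fail high while still reading within plausible values look exactly like a real problem, which is the STS-51F situation.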
Back to the hypothetical story. As the engine shuts down and the chamber pressure – a direct correlation to engine thrust – drops below the 30% level, the MEC sends a software notice to the GPCs that the engine is in ‘shutdown mode’. In a benign shutdown, the time from the start of the MEC sequence to shut down the engine – the issuance of an MCF – to 30% thrust could be as long as 3 seconds. Keep that in mind. The GPC is not going to change commands to the other engines until it gets that ‘shutdown mode’ indication from a failing engine. In the rocket engine business, 3 seconds is a long time.
Once the GPCs received the ‘L, C, or R engine in shutdown’ bit, a red light lit up on the cockpit dashboard – something we never wanted to see but often practiced in simulated missions. Simultaneously telemetry went out to the flight controllers on the ground and the red light on several consoles in the Mission Control Center would come on as well. Flight controller heart rates are not monitored by the flight surgeons, a good thing.
Next, depending on the position of a certain switch in the cockpit, another command could go out from the GPCs – or not. On the orbiter flight deck center console, between the commander and pilot, were a number of switches, one of which was the SSME Limits switch. The position and actuation of that switch was the center of the controversy. It had three positions: Inhibit, Auto, and Enable. If the Limits switch was in the Enable position and an ‘SSME in shutdown mode’ bit was received, the GPCs would take no action, leaving the enable bit out there for the remaining engines. The automatic redline shutdown function of the two other engines which were running would continue as if nothing had happened. If a second or third engine MEC detected a redline violation, their MECs would execute shutdown as well.
If the Limits switch were in the Auto position, it meant that the automatic redline shutdown function on all three engines, which had been enabled until one of them entered shutdown mode, would be changed. At that point the GPC software would send the redline limits shutdown inhibit command to the two running engines. That means that if either or both of the two still running engines had a violation of any of their redline sensors, those engines would continue to run no matter what – or at least try.
But remember that there was a three second or so delay from the time that an SSME with Limits enabled would be spinning down before it issued the ‘I’m in shutdown’ bit. A bad problem that affected either or both of the other two engines could lead to them shutting down in that little time window. Good news/bad news depending on how you looked at it.
If the Main Engine Limits switch was in the Inhibit position, all three engines would have already received the redline shutdown inhibit command back whenever the switch got set. With redline shutdown inhibit set, any sick or failing engine would continue to try to run no matter what the MCF/FID status was. The MEC would not start the shutdown sequence no matter how bad it got.
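For those who did have to read it twice, here is the three-position logic as a small sketch. It is a simplification under stated assumptions: the names are invented for illustration, the ‘shutdown mode’ bit is assumed to arrive exactly 3 seconds after the first engine starts shutting down (the text says it could take as long as that), and the GPC inhibit command is assumed to take effect instantly.

```python
from enum import Enum
from typing import Optional


class Limits(Enum):
    INHIBIT = "inhibit"
    AUTO = "auto"
    ENABLE = "enable"


# Worst-case lag from the start of a shutdown to the 'shutdown mode' bit.
SHUTDOWN_LAG_S = 3.0


def redline_shutdown_enabled(switch: Limits,
                             first_shutdown_start: Optional[float],
                             t: float) -> bool:
    """Is automatic redline shutdown active on the still-running engines?

    first_shutdown_start is the time (s) the first engine began shutting
    down, or None if no engine has failed yet.
    """
    if switch is Limits.INHIBIT:
        return False  # all engines try to keep running no matter what
    if switch is Limits.ENABLE:
        return True   # GPCs take no action after the first shutdown
    # AUTO: enabled until the GPCs see the first engine's shutdown-mode bit.
    if first_shutdown_start is None:
        return True
    return t < first_shutdown_start + SHUTDOWN_LAG_S


# First engine starts shutting down at t = 100 s with the switch in Auto.
print(redline_shutdown_enabled(Limits.AUTO, 100.0, 101.5))  # inside window
print(redline_shutdown_enabled(Limits.AUTO, 100.0, 104.0))  # after the bit
```

The first call is the little time window from the paragraph above: a second engine that trips a redline 1.5 seconds after the first can still shut itself down, even in Auto.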
Read those last few paragraphs again if they went by too fast.
The only SSME premature (unplanned) shutdown in shuttle flight history occurred in July of 1985 on mission STS-51F. A pair of faulty temperature sensors erroneously indicated a problem with the engine and the MEC shut that engine down. This occurred late enough during the boost phase that the mission continued to a completely successful conclusion. On STS-51F, those temperature measurement devices had a generic flaw, and in addition to the one engine that shut down, there were indications of measurement failures on other engines. The booster officer, Jenny Stein, saved the day by telling the Flight Director to instruct the crew to put the Main Engine Limits switch to inhibit to preempt the redline shutdown software. None of the engines actually had a problem; it was just sensor failures. After that flight, the temperature sensors were upgraded to a more robust, more reliable sensor type.
And remember that the turbine exhaust temperatures were not the only sensors being monitored by the redline detection software; there were other pressures, temperatures, and turbine shaft speeds being monitored. Any of those could shut an engine down if the software was enabled.
High performance liquid rocket engine turbopumps have a tendency to come apart in a hurry when something goes wrong. The SSMEs (now RS-25s) have been extensively instrumented and tested. Their computer control brain – the MEC – has a number of ways to detect an impending failure and turn the engine off before it comes apart.
The system was not completely foolproof, but should prevent an explosive catastrophe in most cases.
Mr. Bob Biggs was the Rocketdyne SSME Program Manager all through development and into the middle of the shuttle flight program. He wrote a white paper describing the philosophy behind setting up that redline software. He knew from history that liquid rocket engines can fail very spectacularly in an incredibly short time period, or they can run in a degraded manner for some significant amount of time under some circumstances. The result of a degrading engine which had some redline exceedance was described by Mr. Biggs as having three equally probable outcomes. First, the engine could fail catastrophically in spite of the redline detection; that is to say the mechanism would violently come apart too fast for the redline detection software to react: ‘Inevitable catastrophe’. Second, the engine which would have failed catastrophically in a slightly longer time would be successfully shut down by the redline software and thus avoid the catastrophic failure. These were termed ‘intact engine failures’. The engine might not be reusable but it would not blow the back end of the vehicle off. Third, the engine was running in a degraded mode that exceeded the redline limit but would continue to operate – produce thrust – for an ‘extended’ period of time if the redline shutdown software were inhibited.
Mr. Biggs noted that the cutoff values for each redline were chosen so that equal probabilities would result in each case.
So, for an engine flashing a red light, 1/3 of the time it would blow up before the safety mechanisms could prevent it – always resulting in the loss of the vehicle; 1/3 of the time the safety software would properly shut the engine down before it came apart and save the day; and 1/3 of the time the engine could have continued to run if the safety redline software were inhibited.
Clear?
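If not, the arithmetic can be spelled out. The sketch below is my reading of the three cases, with hypothetical names: with the redline software enabled, the second and third cases both end in an engine shutdown; with it inhibited, the second case is assumed to go on to the catastrophic failure it was headed for, and only the third case keeps producing thrust.

```python
from fractions import Fraction


def outcomes(p_fast, p_savable, p_limper, limits_enabled):
    """Split a redline exceedance into outcomes for one engine.

    p_fast    : comes apart too fast for the software to react
    p_savable : would come apart, but can be shut down in time
    p_limper  : degraded, but would keep running if left alone
    """
    assert p_fast + p_savable + p_limper == 1
    if limits_enabled:
        return {
            "catastrophic": p_fast,
            "engine_shut_down": p_savable + p_limper,
            "keeps_running": Fraction(0),
        }
    return {
        "catastrophic": p_fast + p_savable,  # nothing stops the savable case
        "engine_shut_down": Fraction(0),
        "keeps_running": p_limper,
    }


third = Fraction(1, 3)
for enabled in (True, False):
    print("enabled" if enabled else "inhibited",
          outcomes(third, third, third, enabled))
```

With the equal thirds, enabling the limits halves the chance of an engine blowing up (1/3 instead of 2/3) at the price of always losing the thrust of the engine in question.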
On the other side of the risk discussion are the abort black zones.
The shuttle had a remarkable capability that most other rockets do not. In virtually all expendable rockets, if any one of the booster engines shuts down prematurely – even if that shutdown is benign – the mission is over, the payload is going into the ocean somewhere, and the Flight Control Officer (FCO) at the Range Safety console is going to “send functions” (aka kaboom). On the other hand, the shuttle was designed – required – to be able to safely return the orbiter, crew, and payload to a runway landing following the premature benign shutdown of any one of the three SSMEs.
So, if any single SSME shuts down (fails) at any point in the launch phase, a safe return of the shuttle and crew is certified (read ‘guaranteed’) to result. All the various conditions have been examined, analyzed, simulated, and verified by computer analysis, wind tunnel testing, etc.
From launch to about 4 minutes into flight the shuttle can perform the scariest type of abort – a Return to Launch Site abort (RTLS). Prior to the first shuttle flight, somebody proposed that we do an RTLS on purpose as a test — they called it the “Sub-Orbital Flight Test (SOFT)”. Capt. John Young, the chief of the astronaut office and the commander of STS-1 was noted for his colorful memos that he would regularly send on topics of the day. The SOFT proposal drew a classic response: “RTLS requires continuous miracles interspersed by Acts of God to be successful” John wrote in 1980. And in fact, on STS-1, a trajectory bug lofted the shuttle trajectory higher than expected and an RTLS probably would not have been successful. John was right at least for those days.
Since those early days, RTLS was significantly improved and later in the program would most likely have worked if required — but I’m just as glad we never found out for real.
From about 2 1/2 minutes into flight until almost orbital insertion, the premature shutdown of an SSME could result in a Trans-Atlantic Landing abort (TAL). The shuttle keeps going forward but aims for Europe or Africa rather than orbit. The entry is very similar to a normal end-of-mission entry and the landing would occur at a prepared runway in Spain, France, or western Africa.
Later in flight, from about 4 1/2 minutes on, the premature shutdown of an SSME would result in an Abort To Orbit (ATO). In this case the shuttle presses forward and the system is designed to scavenge all the reserve propellant in the External Tank to get to orbit. Sometimes a dump of propellant from the Orbital Maneuvering System is required, sometimes other adjustments to the trajectory are required, but ATO missions can range from landing after a few orbits on launch day to having a fully successful mission depending on many variables. The longer the remaining main engines run, the closer to normal the shuttle can get. STS-51F, the only real-life example of ATO, performed a basically nominal mission.
The Abort Once Around (AOA) mission – which is exactly what it sounds like – was not used except for ‘systems’ problems like a big air leak from the crew cabin. From a trajectory standpoint, ATO and AOA looked very similar until the crew executed a ‘deorbit’ burn well after the main engines were done.
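The overlapping windows for a single engine failure can be laid out as a simple lookup. The boundary times are just the ‘about’ figures quoted above converted to seconds; real boundaries were mission specific. The 510-second main engine cutoff time used to stand in for ‘almost orbital insertion’ is an assumed round number, and the function name is invented for illustration.

```python
def intact_abort_modes(t_s, meco_s=510.0):
    """Abort modes nominally available if one SSME quits t_s seconds after launch.

    Window edges are approximate; meco_s is an assumed cutoff time.
    """
    modes = []
    if 0 <= t_s <= 240:        # launch to about 4 minutes
        modes.append("RTLS")
    if 150 <= t_s < meco_s:    # about 2 1/2 minutes until almost insertion
        modes.append("TAL")
    if 270 <= t_s < meco_s:    # about 4 1/2 minutes on
        modes.append("ATO")
    return modes


for t in (60, 200, 250, 300):
    print(t, intact_abort_modes(t))
```

The overlaps are the point: for part of the ascent more than one intact abort mode is available, and the scariest one drops out first.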
All of that is fine as long as two of the three SSMEs continue to operate and the shuttle remains under control. If control is lost, then all is lost. The shuttle does not fly sideways very well. A capsule might right itself, but the shuttle will break up.
The speeds and energy required to achieve earth orbit are almost beyond conventional understanding. To maintain a low earth orbit, a satellite must travel at nearly 5 miles each second. At even a fraction of those speeds in the “lower” atmosphere (below say, 80 miles high), air friction converts that vast kinetic energy into tremendous heat. Meteors or re-entering space junk are vaporized in a flash.
For those who may have forgotten their high school physics class: getting to high speed is critical to establish an orbit. A comparison with commercial air travel may be helpful. Typical airline travel is around 6 miles high (30,000 feet or higher). Typical airline speed at cruise is around 500 miles per hour. To be in a safe orbit, a satellite needs to be at least 20 times higher (120 miles is a safe orbit for a few weeks) but must also be going about 35 times faster (17,500 mph). Energy, the real measure of the difference, is directly related to height (altitude) and the square of the speed. To achieve earth orbit requires roughly 1,000 times the energy that an airliner has at cruise.
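For anyone who wants to check that comparison, here is a back-of-the-envelope calculation of energy per kilogram. It is a sketch under simple assumptions: speeds are measured relative to the ground, earth rotation is ignored, potential energy is measured from the surface, and the helper name is invented for illustration.

```python
G0 = 9.80665           # standard gravity, m/s^2
R_EARTH = 6_371_000.0  # mean earth radius, m
MPH = 0.44704          # metres per second in one mile per hour
MILE = 1609.344        # metres in one statute mile


def specific_energy(speed_mph, altitude_miles):
    """Return (kinetic, potential) energy per kilogram, in J/kg."""
    v = speed_mph * MPH
    h = altitude_miles * MILE
    kinetic = 0.5 * v ** 2
    # Potential energy relative to the surface with inverse-square gravity.
    potential = G0 * R_EARTH * h / (R_EARTH + h)
    return kinetic, potential


airliner_ke, airliner_pe = specific_energy(500, 6)
orbit_ke, orbit_pe = specific_energy(17_500, 120)

print("kinetic energy ratio:", orbit_ke / airliner_ke)
print("total energy ratio:  ",
      (orbit_ke + orbit_pe) / (airliner_ke + airliner_pe))
```

The kinetic term alone scales as the square of the speed ratio, 35 squared or 1,225, which is in line with the ‘roughly 1,000 times’ figure. Adding the potential energy, which is a relatively larger share of the airliner's total, brings the ratio of totals down to a few hundred under these assumptions. Either way, speed dominates and altitude is the cheap part.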
This explains why war-surplus V-2 rockets with WAC Corporal second stages could reach orbital altitude in the late 1940’s but it took another decade to develop rockets that could not only get that high but propel a payload to the extreme velocity required for earth orbit.
Satellite launchers seek the most efficient way to get to orbit – they want to use the least energy (total impulse is the correct term) to get the maximum payload to orbit. Simplistically, one would want to go straight up to altitude first, then pitch over horizontally and accelerate, accelerate, accelerate. Expendable satellite launch vehicles go high early and then pitch over toward the horizontal for the largest part of the rocket burn. Frequently, the trajectory goes higher than it needs to and the rocket accelerates horizontally while simultaneously falling back toward earth! An expendable rocket sending a satellite on a one-way trip to orbit optimizes its trajectory by lofting high early on. If an engine fails, the mission would be lost no matter what the trajectory; abort modes and crew rescue are not a consideration.
Unfortunately, this does not work well if you want to protect a crew from a failure of the rocket engines. In a planned re-entry, the shuttle flew through the upper atmosphere at a fairly shallow angle so that it encountered thicker atmosphere gradually and the lifting body plus its stubby wings created lift. As the re-entry proceeded, the speed (or more precisely, kinetic energy) was bled off gradually limiting the maximum heating temperature and using lift generated by the wings to hold structural loads relatively low. For a suborbital ballistic re-entry, the trajectory is quite steep, encountering the denser parts of the atmosphere while the speed and energy are still quite high leading to a high heat impulse and very high structural loads.
For the space shuttle, a steep re-entry would result in structural demise. This is called a ‘black zone’. Unsurvivable by the crew.
The use of the ‘expendable’ Atlas booster for the Boeing CST-100 Starliner and the Falcon 9 for the Dragon capsule was only made possible by reshaping their launch trajectories lower than those used by the same boosters in an expendable mode. This eliminated what would otherwise be black zones, even for those capsules, following a premature engine failure. The cost of this ‘abort shaped’ trajectory is performance: the amount of payload to orbit is decreased by flying a safer, more depressed trajectory.
The trajectories for manned spacecraft try to avoid these steep re-entries even on an emergency case. The two real life cases using capsules turned out moderately well. On April 5, 1975 the crew of what would be known later as Soyuz 18A, Vasili Lazarev and Oleg Makarov, were more than half way to orbit – at orbital altitude and traveling at over 10,000 mph – when their second stage refused to be jettisoned. Separating the crew module from the malfunctioning rocket stage resulted in a very high altitude but ‘slow’ horizontal speed ballistic entry. During a Soyuz entry, decelerations of 5 g are normal. Due to the steep angle of the Soyuz 18A abort trajectory, the crew endured up to 21 g. Fortunately, they survived, the capsule did not break up, and they landed safely. But the two crew members never flew in space again. On October 11, 2018 the Soyuz MS-10 carrying NASA astronaut Nick Hague and Russian Cosmonaut Alexey Ovchinin suffered a booster failure at an altitude of just over 30 miles and a much lower speed than the 18A case. The spacecraft successfully executed an abort and the crew landed safely, experiencing only about 6 g, and were able to fly again another day.
The shuttle flew an ascent trajectory that is more depressed than expendable launch vehicles. This allows for potentially graceful abort trajectories following one premature engine shutdown. The program management never intended, never required, and generally never funded the studies or tests to define a capability to safely land with more than one SSME prematurely shut down. In the Flight Operations Directorate, that did not stop us from trying to figure out what the best way out was for those situations.
If two of the SSMEs quit but one remains running, there are some options. Early in the shuttle program there were no good options. However, capabilities were added over the years which slightly improved the possible outcomes in some parts of the trajectory. For missions headed to the International Space Station, plans for ‘single engine’ contingency cases called for steering toward the east coast of the United States with the potential to land at an emergency airfield somewhere on the Atlantic Coast of North America. However, many of these trajectories result in entry conditions that exceed the capability of the shuttle orbiter either thermally or structurally: the dreaded black zones. The possibility of executing a successful East Coast Abort Landing (ECAL) was far from guaranteed, but in that situation, it was felt to be worth a try. What is the other choice? If the shuttle doesn’t break up or burn up on the steep ballistic trajectory for an ECAL there is every reason to believe that a safe landing might occur. As Capt. John Young put it colorfully: “this gives you something to do while you’re waiting to die.”
If the shuttle could not get to a runway, the shuttle still possessed minimal crew escape capability. If the shuttle could achieve a straight and level glide (actually not very level since the shuttle glides like a rock), at around 30,000 feet the crew could jettison the side hatch and bail out with parachutes like some WWII bomber crew. This was considered better than ditching in the ocean or rough terrain. All studies show touchdown “off-runway” would likely not be survivable since the shuttle touched down at around 250 mph and would roll up in a ball. The subsonic, aircraft-in-control, bailout was really the only option. And in most cases the crew would have probably wound up sitting in a tiny inflatable rubber raft in the middle of the North Atlantic waiting for somebody to pick them up. We positioned C-130 crews with pararescue divers so that the wait should not exceed eight hours in the worst case.
Eight hours bobbing in a little rubber raft in the north Atlantic. Ever see the movie “Titanic”?
If all three SSMEs quit prematurely there is real trouble. Of course, many 3 engines out situations probably were caused by an ‘uncontained’ failure of one of the engines, possibly blowing the back end of the orbiter off, taking out the hydraulics, rudder, aileron control, etc. Those cases were not survivable. Assuming that all three SSMEs shut down benignly (not the most likely case) the situation would still be dire. There is little to no way to control the trajectory and the black zones for those cases were immense on the charts. For some few lucky cases a successful ECAL might result but then it’s not really a lucky day if all three engines quit, is it?
In the early shuttle days, the 1/3, 1/3, 1/3 statistics of safe engine shutdown with the redline software was pitted against the low likelihood of surviving a multiple engine out abort. The rule was, launch with the Main Engine Limits switch in auto. Then, if an engine fails prematurely – assuming it is not in the 33% case where it blows up – the system will inhibit auto shutdown of the two remaining engines. A sick engine might limp along but sensor failures would be precluded. The redline software was to be reenabled only very close to the end of the powered phase when the trajectory was acceptable if a second engine shut down.
As time went by, the safety features of flying one of those multi engine out trajectories were improved, the engines were improved and became more reliable. The statistics changed from 1/3, 1/3, 1/3 to something like 25%, 45%, 30%. The astronauts wanted to avoid engines blowing up so they proposed putting that dratted switch in the (hard) enable position. As the senior Ascent Flight Director, I was wedded to the idea that we should leave the limits inhibited until a safe reentry could be reasonably expected. White papers and presentations proliferated. Statistical studies that could make your head swim proliferated. Mathematical shenanigans multiplied. Tempers began to rise and positions became calcified.
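One way to see why the numbers mattered is a toy model of the trade, which is mine and not taken from any of the white papers. Assume a running engine trips a redline after another engine is already out. With limits enabled, any shutdown sends the crew into a multi-engine-out contingency abort that they survive with some probability; with limits inhibited, the limping engine is assumed to run long enough to reach a safe condition. All names are hypothetical.

```python
from fractions import Fraction


def loss_probability(p_fast, p_savable, p_limper, abort_survival,
                     limits_enabled):
    """Toy chance of losing the vehicle after a redline exceedance."""
    if limits_enabled:
        # Engine blows up, or shuts down and the contingency abort fails.
        return p_fast + (p_savable + p_limper) * (1 - abort_survival)
    # Inhibited: both the fast and the savable cases end in an explosion.
    return p_fast + p_savable


def breakeven_abort_survival(p_savable, p_limper):
    """Abort survival odds above which enabling the limits wins."""
    return p_limper / (p_savable + p_limper)


early = (Fraction(1, 3), Fraction(1, 3), Fraction(1, 3))
later = (Fraction(25, 100), Fraction(45, 100), Fraction(30, 100))

print("early break-even:", breakeven_abort_survival(early[1], early[2]))
print("later break-even:", breakeven_abort_survival(later[1], later[2]))
```

In this toy model the break-even contingency abort survival odds fall from 1/2 with the early statistics to 2/5 with the later ones. Better engine statistics and better abort capability both push the answer toward enabling the limits, which is the direction the astronauts were arguing.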
The debate came to a head when the Space Shuttle Program Manager, Tommy Holloway, directed the Flight Directors and the Astronauts to resolve the issue, period. He directed that the MSFC SSME guys get involved; after all, they were the experts.
When we got to MSFC on that special day, we heard a lot of presentations and had several heated arguments. That is the NASA way, it seems. But the one presentation that won the day was from a young SSME engineer named Mike Kynard. He argued, from ground test data, that engine failures followed a ‘bathtub curve’. That is, the probability of engine failure was highest very early, shortly after start up. If the engine got to a stable running state (which would occur before liftoff), then the probability of engine failure was low but gradually increased with time. The probability graph looked like a sideview slice through a bathtub: high on the ends and low in the middle. Mike further argued that the redline shutdown was very reliable in flight; explosive failures were rare in the middle part of the curve and the software (again based on ground tests) was very unlikely to cause an erroneous shutdown.
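For a picture of what a bathtub curve looks like in numbers, here is a generic sketch: a failure rate made of an early term that dies away quickly, a small constant floor, and a slow rise with run time. Every parameter is an arbitrary placeholder chosen only to give the shape; none of it comes from SSME test data.

```python
import math


def bathtub_hazard(t_s, infant=0.05, infant_tau_s=0.5,
                   floor=0.001, wear_rate=0.00001):
    """Illustrative failure rate versus seconds since engine start.

    All parameters are made up purely to produce a bathtub shape.
    """
    early = infant * math.exp(-t_s / infant_tau_s)  # start-up transient
    late = wear_rate * t_s                          # slow rise with run time
    return early + floor + late


for t in (0, 5, 60, 500):
    print(t, bathtub_hazard(t))
```

The shape is the whole argument: once the start transient has died away, the engine sits in the low part of the tub for the stretch of the ascent where the debate mattered.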
That argument changed most minds. Together with the improving capabilities to achieve a two-engine out contingency abort, the verdict was as follows: (1) launch with the Main Engine Limits switch in Auto; following a first engine failure the software will promptly issue redline inhibit commands to the remaining two engines. At that point (2) the booster officer makes a quick assessment of the two running engines and, if they are operating normally, recommends moving the Main Engine Limits switch to the Enable position (sending out the enable command) and then back to Auto (to allow for assessment if a second engine quits). And (3) if multiple redline sensor failures are detected (as in STS-51F), put the Limits switch in Inhibit (which was what was done on that flight).
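The three-part verdict reads naturally as a small decision table. The sketch below is one reading of it, with invented names. Two details are assumptions: that the sensor-failure rule takes precedence over the others, and that the switch is simply left in Auto when the remaining engines do not look healthy.

```python
def limits_switch_call(engines_out, remaining_nominal,
                       multiple_sensor_failures):
    """Return the sequence of Limits switch positions to call for."""
    if multiple_sensor_failures:
        # Part (3): the STS-51F case, preempt the redline software.
        return ["INHIBIT"]
    if engines_out == 0:
        # Part (1): launch position, nothing to do.
        return ["AUTO"]
    if engines_out == 1 and remaining_nominal:
        # Part (2): re-enable limits, then return to Auto.
        return ["ENABLE", "AUTO"]
    # Assumption: otherwise leave the switch in Auto, limits inhibited.
    return ["AUTO"]


print(limits_switch_call(0, True, False))
print(limits_switch_call(1, True, False))
print(limits_switch_call(1, True, True))
```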
That is how we flew out the remaining shuttle flights. Not everybody was completely happy, but everybody could live with it: the very definition of a consensus.
Tommy was happy. I think. The arguments subsided.
Are you ready to receive your space cadet nerd badge?
And for the record, I still think it was stupid to put all the A/E Flight Directors on the same plane.