Breaking the [Flight] Rules

The official NASA history of STS-109 can be found on the agency web page: 

The last part of that official account reads: 

“After a successful launch, flight controllers in Mission Control noticed a degraded flow rate in one of two freon cooling loops that help to dissipate heat from the orbiter. After reviewing the loop’s performance, mission managers gave the crew a “go” to proceed with normal operations. The problem had no impact on any of the crew’s activities. Both cooling loops performed normally on de-orbit and landing.”

The official NASA description of what happened on STS-109 is a lie.

I should know.  I was there. 

Marianne Dyson and I worked together in Mission Control in the early days of the Space Shuttle program.  She had been in touch with Jim Newman, a crew member on STS-109.  Jim asked her if she knew how she figured in STS-109 even though she was not working in MCC.  Marianne asked me to fill in the story. 

Marianne: ‘I was in the Flight Activities Branch and was the book manager for the Post Insertion (nominal) timeline, Launch Day Deorbit, Loss of FES Deorbit, Loss of 2 Freon Loops Deorbit, and ‘If BFS Omit’ procedures. I was responsible for developing and validating all those procedures for STS-1, 2 and 3.’

These were serious and complex checklist procedures for the astronauts to use in flight.  Post-Insertion covered the period just after launch when the crew was turning the Space Shuttle into an operational orbital outpost.  The Deorbit procedures were all failure responses.  Of all the checklists in the pantheon of Space Shuttle procedures, the very hardest to perform was the dreaded ‘Loss of Two Freon Coolant Loops’ power down and deorbit procedure. 

To understand, a short discussion of the Flight Rules is necessary.  By training and practice the adherence to Flight Rules was burned into the culture of Mission Control.  Careful consideration prior to any mission went into the development, review and approval of Flight Rules.  During a flight, it was considered a cardinal sin to break a Flight Rule.

Buried deep in the book, page 18-105 (or as my electronic version has it, page 1947 of 2053 total pages), the operative words are found in a table (background rationale following in italics): 

Space Shuttle Operational Flight Rules  Volume A   All flights

Rule A18-1001 Thermal Go/No-Go Criteria

                 Ascent Abort if:     Invoke MDF if:     Enter NPLS if:

FCL (2)          2 Lost               ---                1 Lost

Loss of one Freon loop requires a PLS because the next failure (loss of other Freon loop) could result in loss of crew/vehicle.

Nominal ascent is continued so that more time is available to reconfigure for a one Freon loop entry and also because loss of one Freon loop is not an emergency. If both loops are lost, an emergency entry (ascent abort) is required because all cooling to the vehicle is lost.

If both Freon loops are lost, an emergency entry is required because the FC stack temperatures will reach the specification operational limit of 250 deg F within approximately 50 minutes. In addition, the electrolyte will reach the 25 percent operational limit in approximately 75 minutes. At this point, continued operation of the FC’s is questionable. This assumes . . .

To decode:  this Flight Rule required an immediate abort during the launch phase if two Freon Coolant Loops failed:  either Return to Launch Site, Trans-Atlantic Abort, or Abort Once Around depending on when the failures occurred.  Loss of only one FCL during launch did not require an abort; the ascent continued and the ‘Enter NPLS’ column of the rule applied.

During the ‘on orbit’ phase of the mission, the failure of one of the two freon loops would end the mission, with a landing planned at the Next Planned Landing Site (NPLS), which by definition was within 24 hours.  A PLS landing was always to one of our three primary sites:  KSC’s Shuttle Landing Facility in Florida, Edwards Air Force Base in California, or the White Sands Space Harbor in New Mexico.  Weather and timing would determine which of the three to use.  NPLS balanced minimizing the time exposed to a failure of the remaining good loop against the safety of returning to one of the best landing sites.  A third category for some failures involved something called a Minimum Duration Flight but that was not an option for this equipment. 

As an aside, if both of the freon loops were to fail while on orbit, perhaps while waiting for NPLS, other rules mandated an emergency landing as soon as possible.  ELS sites were identified all around the world but did not have the equipment, long runways, and weather forecasting capability that the PLS sites did.
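For readers who think in code, the rule as decoded above can be sketched as a small lookup. This is only an illustration of the logic described in this post, not anything that ran in Mission Control; the names `Phase` and `fcl_response` and the wording of the returned strings are made up for the example.

```python
from enum import Enum


class Phase(Enum):
    """Mission phase, as the rule distinguishes them (names are illustrative)."""
    ASCENT = "ascent"
    ORBIT = "orbit"


def fcl_response(phase: Phase, loops_lost: int) -> str:
    """Return the response described in the text for lost Freon Coolant Loops."""
    if loops_lost not in (0, 1, 2):
        raise ValueError("the orbiter had exactly two Freon Coolant Loops")
    if loops_lost == 0:
        return "continue nominal mission"
    if loops_lost == 2:
        # Both loops lost: all cooling to the vehicle is gone.
        if phase is Phase.ASCENT:
            return "ascent abort"
        return "emergency landing as soon as possible"
    # One loop lost: not an emergency, but the mission ends at the next PLS opportunity.
    if phase is Phase.ASCENT:
        return "continue ascent, then enter NPLS"
    return "enter NPLS"


if __name__ == "__main__":
    for phase in Phase:
        for lost in (1, 2):
            print(phase.value, lost, "->", fcl_response(phase, lost))
```

Notice that nothing in the lookup asks how badly a loop is degraded; a loop is either lost or it is not.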

Since my copy of the Flight Rules dates from late in the program, it documents the history of STS-109. Section 18 contains a definition of ‘loss of a freon loop’ with 3 full pages of background ‘rationale’ (all in italic) describing what happened on STS-109 and the subsequent engineering analysis.

The Space Shuttle was an electric airplane; nothing happened without electricity.  There was no control, no anything without the electrical power generated by the three fuel cells.  Batteries were non-existent.  If the fuel cells did not make electricity, the shuttle was a rock. 

Fuel cells combine hydrogen and oxygen to produce electricity, water, and lots of heat.  That heat had to be removed to condense the water vapor in the fuel cells so it could be removed.  If the water was not removed, the fuel cells would ‘flood’ and the chemical process would stop working, electrical generation would cease.  The Freon Coolant Loops (FCL) circulated freon as a fluid to collect the heat generated in various parts of the orbiter and transport it to the radiators or the flash evaporators where the heat was dissipated out into space.  For redundancy the orbiter had been designed with two loops and each of those had two redundant circulating pumps. 

Mission Operations made sure that there was a crew checklist procedure for each and every single item that could break or otherwise fail on the orbiter.  Starting before the first Space Shuttle flight, the Mission Control team built step-by-step procedures which were documented, tested, practiced, and validated.  Which is to say, proven to work properly with either engineering tests or rigorous numerical analysis. 

In a very few cases, there were procedures written for two failures.  Since the loss of both freon loops could be catastrophic in a very short time, quick but complex action had to be taken by the crew.  This was one of the few checklist procedures to address two like-systems failures.  Marianne and a host of other folks worked diligently to provide a way out of that terrible situation.  The Loss of Two Freon Loops procedure required powering down much of the electrical equipment on the orbiter to both conserve electricity and reduce the heat generated which had to be removed.  The checklist was extremely complex, time consuming, and – worst of all – attempts to validate it were unsuccessful. In other words, working the checklist completely ‘right’ was unlikely to succeed. The probability of LOCV was high. 

LOCV – Loss Of Crew and Vehicle.

That is all background to what happened on March 1, 2002. 

STS-109 was a mission to service and repair the Hubble Space Telescope.  The crew and Mission Control team were well trained, excited about the mission, and dedicated to leaving the Hubble in perfect condition.  The Hubble Space Telescope Operations team was anxious to get their instrument fixed.  Prelaunch had been difficult with launch scrubs due to weather and technical issues.  When STS-109 finally left the ground all of us were pleased.

Ascent Flight Director for the mission was my good friend and colleague John Shannon.  The Lead Flight Director was Bryan Austin.  I was assigned to be the Mission Operations Director.  This was a replay of the team on STS-93, when I got launch fever.  I was determined not to fall into that again.  See https://waynehale.wordpress.com/2013/10/31/keeping-eileen-on-the-ground-part-ii-or-how-i-got-launch-fever/

My position as MOD was to coordinate with the other members of the Mission Management Team.  During the countdown and launch, everybody on the MMT except the MOD was in the Firing Room at the Launch Control Center in Florida.  The MMT included the Space Shuttle Program Manager, the JSC, KSC, MSFC, and SSC Center Directors, the Orbiter Project Managers (and project managers of all the other shuttle elements), the Head of the Astronaut Office, the Chief of Space Flight Safety, and almost all the other senior managers in the Space Shuttle Program.  The MMT was charged with making the most important decisions, if time were available, regarding any Space Shuttle flight. The MMT was the only body that was allowed, after deliberation, to change a Flight Rule.

When the Public Affairs Officer refers to ‘mission managers’ he means the MMT.  The MOD is not authorized to act without their direction.  Flight Rules are always to be followed unless the MMT rules otherwise. 

Since the countdown had gone so well, and the launch had been delayed, the MMT was really anxious to get home – back to JSC, MSFC, or SSC – as soon as the launch was over.  After nominal cutoff of the main engines (MECO), the management team had a few short speeches, took part in the ceremony of the beans and cornbread, and quickly headed to the Shuttle Landing Facility to board the Gulfstream II management aircraft for the flights home.  Very limited to no communication was available while they were in flight.

In short, those of us in Mission Control were without senior leadership direction for those hours. 

Mission Control never considered ‘ascent’ to be over until after the OMS-2 burn put the orbiter into a stable, non-re-entry orbit, and completed various other critical tasks.  Among those required was closing the ET umbilical doors on the belly of the orbiter; changing the onboard computer system from launch to on-orbit software configuration; opening the payload bay doors and establishing freon loop cooling through the radiators; checking out the star tracker navigation system.  When all those items were completed, the crew was given a ‘go for orbit ops’.  Their first step after that was usually to get out of the bulky launch/entry pressure suits, activate the toilet, and start putting away the chairs on the middeck.

Sometime after MECO, sometime after the MMT got on the airplanes, but before getting a ‘go for orbit ops’, the EECOM (Environmental, Electrical, Consumables Manager) spoke up.  Responsible for the cooling on the orbiter, he pointed out that one of the freon coolant loops was not operating at full flow.  The flowmeter in Freon Coolant Loop #1 was showing a flow of only 200 lbs./hour.

The failure limit defined in the Flight Rules was anything less than 211 lbs./hour. 

Technically, legally, analytically, FCL #1 was considered failed. 

Things got very quiet in the Flight Control Room. 

We all knew what that meant. 

At that point, theoretically, the Flight Director should declare a First Day PLS (Planned Landing) and the crew should start working the procedures to land at Edwards AFB on orbit 3.  The timing of the discussion made that dicey; starting down that path would have required a rush job to be ready to retrofire in about 90 minutes.  Also, theoretically, the crew should be directed to perform the Loss of One Freon Coolant Loop power down which was long and involved turning off quite a bit of the redundant equipment. That would leave the vehicle open to other failures. 

The Ascent Flight Director started doing what any good FD will do – asking a lot of questions of the EECOM.  It was the EECOM’s opinion that the Flight Rule was ‘conservative’, the flow rate was just below the limit, and there was enough flow to at least consider continuing.  Flight strongly wanted to get to a stable situation and sort options out. 

There was a short discussion with the crew about potential power down.  They were told to pull out the loss of one freon coolant loop checklist and review it, but take no action just yet.

Here is the crux of the situation:  if the other freon loop – the good one – were to fail, quit, leak, whatever; would the questionable freon loop provide enough cooling to avoid the dreaded 2 Freon Coolant Loop procedure? 

Maybe.

As John remembers it: “Definitely one of those “is it failed or not” cases and of course being stable on-orbit while you figure things out is not a bad idea.” 

The Flight Director turned around and leaned over the MOD console.  John looked at me and said: ‘better tell the MMT’.  But I couldn’t.  They were in the air. 

A decision was required. 

I punted.  I asked John what he recommended.  He was inclined to continue on rather than terminate.  I told him I concurred and the flight should continue.

Later on, I did get to have a long conversation with the MMT.  Much engineering analysis was turned on and worked on very hard during the entire mission. 

FCL #1 never regained full flow during flight.  So much for ‘Both cooling loops performed normally on de-orbit and landing.’

In the end, all the analysis indicated we made the right decision.  As John recently discussed: “This would be a good case for why you have a flight control team instead of just programming the flight rules into a computer. Human judgment and risk trades are critical to spaceflight operations.”

Indeed. 

In a pinch, the low-flowing freon loop would have provided just enough cooling by itself, with an appropriate powerdown, to avoid disaster.

But that does not change the fact that we broke the Flight rule that day.

Weeks after the flight, after all the engineering analysis was complete and double checked, the flight rule was revised.  The new limit at which a FCL is considered failed was 163 lbs./hour, less than the old limit of 211 lbs./hour. New procedures were written and passed validation.  Much work was done in case the situation should ever happen again. 

It never did.
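The arithmetic behind the old and new limits is simple enough to write down. The sketch below uses only the three numbers quoted in this post; the names are invented for the example, and treating both limits as a strict "less than" comparison is an assumption (the text says "anything less than 211" for the old limit and does not spell out the boundary for the new one).

```python
OLD_LIMIT_LB_HR = 211.0  # failure limit in force on STS-109 launch day
NEW_LIMIT_LB_HR = 163.0  # revised limit adopted weeks after the flight
OBSERVED_LB_HR = 200.0   # FCL #1 flowmeter reading quoted in the text


def loop_failed(flow_lb_hr: float, limit_lb_hr: float) -> bool:
    """A loop counts as failed when its flow is below the limit (assumed strict)."""
    return flow_lb_hr < limit_lb_hr


# How far below the old limit the observed flow was, as a percentage of that limit.
shortfall_pct = 100.0 * (OLD_LIMIT_LB_HR - OBSERVED_LB_HR) / OLD_LIMIT_LB_HR

print(loop_failed(OBSERVED_LB_HR, OLD_LIMIT_LB_HR))  # True: failed under the old rule
print(loop_failed(OBSERVED_LB_HR, NEW_LIMIT_LB_HR))  # False: not failed under the revised rule
print(f"{shortfall_pct:.1f}% below the old limit")   # 5.2% below the old limit
```

The same 200 lbs./hour reading is a failed loop under one rule and a working loop under the other, which is the whole story in three lines.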

But the decision on STS-109 launch day wasn’t made by ‘the mission managers’.  It was John, EECOM, and me. 

One final change:  when it came my turn to set the rules for the MMT, I added one more step after launch: the MMT had to stay on station at KSC, where there was data and good communications, until after the ‘go for orbit ops’.   

About waynehale

Wayne Hale is retired from NASA after 32 years. In his career he was the Space Shuttle Program Manager or Deputy for 5 years and a Space Shuttle Flight Director for 40 missions. He has since retired from consulting and is currently a full-time grandpa. He might be available for speaking engagements for the right incentives (coffee and donuts work!)

19 Responses to Breaking the [Flight] Rules

  1. Norbon Clay Jones says:

    Enjoyed reading this reflection story. Agreed that MMT departure change would close many possible gap scenarios. I am left with a question, though. Why the reduced flow in the one freon loop, that apparently continued thru out the mission?

    • waynehale says:

      Post flight, during orbiter turnaround and maintenance, it was found that a piece of braze material was stuck in part of the line. That restriction caused the reduced flow.

  2. Rob Spohr says:

    Thanks Wayne,

    I always enjoy your blogs!

  3. rangerdon says:

    It takes smart, ethical, courageous people to explore the universe. And many of them are on the ground. Most of them don’t look like movie heroes, either.

    Excellent piece of history to illustrate that.

    Thanks for doing good work, Wayne.

    Don Scott, NASA-AESP, ret

  4. Dan says:

    It was great to see you an the Shuttle series aired last week!

    Did they ever identify why there was a low flow rate? What caused it in the first place?

    • Dan says:

      Fixing my post. It was great to see you interviewed in the shuttle series aired on TV last week.

      • Dave H. says:

        While I take my hat off for everyone who was part of the documentary, it hasn’t changed my opinion as to what the root cause of the accident was: the “in God we trust all others bring data” paradigm.

        While it sounds really great in theory, it failed this time because it doesn’t allow for any “what ifs” to enter a discussion. Until they blew a hole in the RCC the data didn’t exist to support investigation of possible damage.

        It wasn’t anyone’s fault; it was following the paradigm.

        The solution is to abandon the paradigm in favor of a “no substitute for maximum input” one.

      • waynehale says:

        Hindsight is 20/20.

  5. Dave H. says:

      “The flowmeter in Freon Coolant Loop #1 was showing a flow of only 200 lbs./hour.

    The failure limit defined in the Flight Rules was anything less than 211 lbs./hour. 

    Technically, legally, analytically, FCL #1 was considered failed.”

    Maybe I read it too quickly, but I missed the part where the system engineers began to run tests to determine how much cooling efficiency was lost by a less-than-nominal coolant flow rate. Sure, the low flow limit was set for 211 lb/hr, but what was the *functional* effect on the efficiency of the system? Once that has been determined, then you have the data that you need to make a case for go/no go.

    Apparently, that work hadn’t been done or else the low flow rate would have been a hard no-go. If you aren’t pumping enough coolant you aren’t removing enough heat. Learned that lesson when the impeller on my car’s water pump spun off during a drag race.

    • waynehale says:

      Interesting question with a more complex answer than you might have expected. Since there was no complete ground test of the entire cooling system, most decisions regarding thermal constraints were analytical based on piece part testing, specification operating characteristics, etc. Heat dissipation by the radiators or flash evaporator system was tested, albeit somewhat imperfectly, in thermal vacuum chambers. All this data went into the thermal models.

      The thermal analysis models for the Space Shuttle Orbiter were very complex, and following the first flights early in the program we learned that many of them were not as accurate as we needed. The heat generated by various components which was transferred into the FCL had many components and the assumptions – or initial conditions – were also complex. Worst case analysis with these incomplete thermal models was often the best we could do before actual flight experience improved the thermal models. Updates to the thermal analysis models through the early years were common.

      Since the loss of 2 Freon Coolant Loops was considered catastrophic by the program, the senior management was not inclined to focus a lot of analysis time on that problem. Analysis time was a limited resource that decreased as the program became ‘mature’. We had a rough, worst-case analysis using the pre-first-flight thermal models to determine the flow rate necessary to possibly survive. By STS-109 there had been little review of that analysis.

      Within hours of recognizing the problem on STS-109, the Space Shuttle Program management authorized new resources (systems engineer’s time) to do a more detailed and updated analysis. That took much longer than the flight time of STS-109 to be completed, reviewed, vetted and documented. That was the basis for updating the flight rule some weeks after STS-109 landed.

      During the actual flight, since both loops were operating (standard configuration), it was difficult to assess how much heat was being removed by the ‘good’ loop and how much was being removed by the ‘degraded’ loop. Early analysis gave some confidence that operating on only the ‘degraded’ loop – and with a major powerdown – the orbiter could survive.

  6. kevincrusch says:

    Just curious – what’s the normal flow rate in the freon loops? Are coolant demands lower, higher, or the same during the first couple hours of flight (so you can settle into orbit and assess things such as an early landing)?

    • waynehale says:

      At this point, my memory of the normal flow rate is foggy. The cooling demands certainly vary by phase of flight but I don’t believe there is much latency; it doesn’t take much time to steady out.

  7. obelisktoucherd4f87dc29c says:

    Raises an interesting scenario. If FCL 1 is plugging along at an “unfailed” flow rate of 165 lb/hr (darn brazing material) and FCL 2 decides to buy a farm, which checklist gets pulled out – NPLS (because FCL 1 is unfailed) or Loss of Two Loops (because 165 lb/hr might quickly become 162)? Edge cases suck…

    Also, was the “Loss of Two Freon Loops” checklist ever revisited in an effort to make it “validatable”?

    • waynehale says:

      For STS-109 a special checklist update was prepared to make a severe powerdown but more benign than the Loss of Two Freon Coolant Loops that was in the checklist. Even after the analysis was completed after STS-109 landed, the Loss of Two Freon Coolant Loops procedure could not be validated. Programmatically the case was still considered catastrophic.

  8. Always a pleasure to read. Thank you!

  9. Spacebrat1 says:

    any going to Space is unforgiving, but we are meant to see how good we can do. Was telling a friend today that after seeing STS lift the bulk of over 1,000,000 lbs 200 miles up (to build the ISS), that it took a team of around 33,000, and after watching and shooting all the ISS missions (55+?), I think it was the most amazing engineering feat ever accomplished, imho. I know Atlantis came back from STS-135 w 0 tags (so said the Big Boss anyhow). I miss its Utility as a Space Transport. 

  10. cthulhu says:

    Fully agree with the value of human judgment in these situations, both in the near term (start the non-essential power down checklist now, or get the vehicle in a more steady-state situation first?) and the medium term (is the documented lower limit sacrosanct or is there 5% wiggle room?).

    And never underestimate the value of having smart and well-trained and experienced responsible engineers staffing the control room!

  11. Spacebrat1 says:

    good ole American know-how. for some reason, I think it springs from Freedom. You have nerves of steel…
