STS-93: Dualing computers

In the early days of rocketry, when subsystem reliability was low, hard experience led designers to add redundancy for critical functions where they could.  Redundancy comes at a cost:  increased weight, increased complexity, unintended interactions, complex schemes to manage the redundancy, etc.  Nowadays, subsystem reliability is much higher, especially for electronic parts.  So today we have sophisticated discussions about dispensing with redundancy and living with single-string, high-reliability systems, even in critical areas.

Whenever I am engaged in one of these discussions I remember STS-93 where a 2 cent screw and a 10 cent length of wire demonstrated the vulnerability of an otherwise highly reliable critical system.  “Dualing computers” is not a misspelling, it is a safety concept.

Each Space Shuttle Main Engine has a computer mounted right to its side to run the complicated functions required for safe operation of that very complex, high energy device.  And every SSME controller is made up of two redundant computers:  DCU A and DCU B (Data Control Unit).  The A computer is always in control while the B listens along – until and unless the A computer fails; then the B computer takes over.  Each computer has its own way to control every valve and its own set of instrumentation required to run the engine:  pressures, temperatures, valve positions, turbine speeds.

However, when both A and B are working, they share data.  So there is a pressure measurement for the main combustion chamber wired to DCU A and another one wired to DCU B, but when both computers are working they share data and make computations based on the average of the two chamber pressure measurements.  If one of the computers fails, the other carries on but then has only one measurement to make calculations from – no more averaging.
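As a rough illustration of the scheme described above, here is a minimal Python sketch. It is not the actual SSME controller logic: the names `Channel`, `channel_in_control`, and `chamber_pressure_estimate` are hypothetical, and the pressure values are made up for the example. It only shows the two ideas from the text: A is in control unless it fails, and the two chamber pressure measurements are averaged only while both computers are working.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Channel:
    """One side (A or B) of a hypothetical dual-redundant controller."""
    name: str
    healthy: bool = True
    chamber_pressure_psi: float = 0.0  # reading from this side's own sensor


def channel_in_control(a: Channel, b: Channel) -> Optional[Channel]:
    """A controls whenever it is healthy; B takes over only if A has failed."""
    if a.healthy:
        return a
    if b.healthy:
        return b
    return None  # both sides lost: nothing left to run the engine


def chamber_pressure_estimate(a: Channel, b: Channel) -> Optional[float]:
    """Average both readings while both sides work, else use the survivor's."""
    readings = [c.chamber_pressure_psi for c in (a, b) if c.healthy]
    if not readings:
        return None
    return sum(readings) / len(readings)


if __name__ == "__main__":
    # Illustrative numbers only, not real engine data.
    a = Channel("A", chamber_pressure_psi=3000.0)
    b = Channel("B", chamber_pressure_psi=3010.0)
    print(channel_in_control(a, b).name, chamber_pressure_estimate(a, b))

    a.healthy = False  # simulate losing the A computer
    print(channel_in_control(a, b).name, chamber_pressure_estimate(a, b))
```

With both sides healthy the sketch reports A in control and the average of the two readings; after A is marked failed, B is in control and the estimate is B's single measurement, with no more averaging.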

Almost all the telemetry that is sent from the engine to Mission Control comes from the A computer; if it fails the B computer sends only a few data points, not nearly as many as the A side.

When STS-93 had its little problem, every engine kept working just fine even though two computers on two separate engines went silent.   There was never another case of SSME computer loss in the entire suite of shuttle flights.  These computers were highly reliable. The computers never failed because of an electronic part problem or a software error.

But in systems design, like warfare, defense has to be made at the most vulnerable point.  For the SSME controllers this was the power source.

If the shuttle designers hadn’t built in redundancy, two of the three engines would have shut down just after liftoff.  The results from that would not have been good.  The crew has a procedure to run called “2 out First Stage”.  It is one of those procedures that Capt. Young used to describe as “keeping busy while you wait to die.”

The next time someone tells me they have a highly reliable system that doesn’t need redundancy, I will remember STS-93. I hope you do too.

 

About waynehale

Wayne Hale is retired from NASA after 32 years. In his career he was the Space Shuttle Program Manager or Deputy for 5 years, a Space Shuttle Flight Director for 40 missions, and is currently a consultant and full time grandpa. He is available for speaking engagements through Special Aerospace Services.

26 Responses to STS-93: Dualing computers

  1. Brian Dudenbostel says:

    Wayne, good to see you blogging again…

    Waiting for the next one on the edge of my seat. You should write a book!

  2. denniswingo says:

    During the return to flight after Challenger I worked in Huntsville for a company that built a data multiplexer for the SSME telemetry stream. It was called (I kid you not) The Kennedy-Marshall Redundant Transmission System (KMARTS). The SSME telemetry was actually transmitted down the exhaust plume of the engine during powered flight.

    The data came down to KSC first. As one might expect, it was fairly noisy due to the ducting from the plumes. It could come down at 64, 96, 128, and 192 kilobits/sec. Our multiplexer would take that data, dynamically reformat it, and then retransmit it at 256 kilobits/sec up to TDRSS. Then it would come down to Marshall and JSC and then on to you guys on the console.

    The folks in the firing room at KSC fought this tooth and nail because they did not trust microprocessor-based systems, and this was literally the first microprocessor-based instrumentation in the firing room. If it was not a MODCOMP it was not wanted! We finally got it installed and as far as I know it worked fine for many years.

    • denniswingo says:

      I hit return too quickly. My boss designed the systems and I debugged them, tested them, and certified them for shipping to the Cape.

      • waynehale says:

        That high rate stream came down on an S band FM link to MILA. It contained even more information than the orbiter S band PM link, which included the downlist from the SSMEs. But the FM link was not mandatory and we would have launched if it were inoperative. The orbiter PM was mandatory and we sometimes had to hold the launch to troubleshoot problems with that system.

  3. Burke says:

    Wife: go to bed honey.
    Me: I will, but just found out that Wayne is writing again.

    …welcome back…

  4. Chris Ramsay says:

    Yet another nice post, Wayne. I trust that you are counseling the Commercial Crew providers about redundancy. I was in Orion meetings where the vendor said “We don’t need no stinkin’ redundancy, we have DFMR (Design for Minimum Risk).” Fortunately, my co-workers in S&MA won the day and got redundant processors in Orion.

  5. waynehale says:

    I think there is a place for DFMR and reduction of redundancy in highly reliable systems. But the whole system has to be highly reliable and sometimes designers forget where the vulnerabilities lie.

  6. Jeff Spencer says:

    Wayne,
    DCU stands for Digital Controller Unit, not Data Control Unit. It’s really interesting reading your perspective on these events after all these years. I guess you’re going to follow up with the rest of the story on the lox post pin and the fuel leak? What were the chances of these two failures occurring on the same flight…

  7. Dave Moon says:

    I can guess where the 2 cent screw and the 10 cent wire belonged, but you didn’t actually say. Maybe got interrupted before you finished writing the post?

  8. M. Mraz says:

    Long ago when the shuttles were still flying, I worked for a large consumer software company. One day we were honored by a visit from and presentation by one of the JSC engineering managers assigned to the software in the general-purpose computers (GPCs). When he mentioned that to change just one bit of flight software code required many months of analysis, testing, and qual, several of the “software developers” in the room snickered and made snide comments under their breath. I thought to myself, “Thank God these dopes write windoze code and not flight software for Boeing.”

  9. Miles Archer says:

    Thanks. These are great articles.

  10. ppatin says:

    Wayne,

    I’d read that after Challenger there were some pretty significant changes made to the shuttle’s abort options so a crew should at least have been able to bail out if they lost multiple engines. It sounds like you didn’t have a lot of faith in those abort options?

    • waynehale says:

      None of the multiple engine out abort options were guaranteed to work. The abort procedures for first stage were particularly dicey. If the ship could get to a straight level glide at subsonic speeds, the crew bailout was still problematic; bailing out in the middle of the North Atlantic and waiting in a rubber raft for somebody to find you isn’t fun. But the biggest risk for the first stage abort was getting safely separated from those solid rocket boosters when they burned out and then surviving the g forces during pullout. No, I didn’t have a lot of faith in those abort options.

      • davidw says:

        If the orbiter did lose two main engines during first stage, would the stack have been able to hold together structurally? Would the thrust from the SRBs overpower the connection points without enough thrust from the MEs?

      • waynehale says:

        Early on the structural capability of the stack was not positive for 2 or 3 SSME out in first stage. Somewhere around the year 2000, some structural improvements were made in the ET (which transmitted all the loads) which made it more likely that the stack would hold together. Another problem never really overcome was stability and control as the SRBs burned out – sputtering along at Pc about 50 psi they still generated a lot of thrust, but not always equally. With only one SSME to control pitch and yaw (and orbiter RCS trying to control roll) separation at Mach 4 ish would have been very problematic.

  11. Dan B says:

    Love all your posts.

  12. Lewis Van Atta says:

    Like other folks said: waiting on the edge of my seat for the next chapter. Additionally, I always seem to enjoy the eloquence of Capt. John Young. 😉

  13. cthulhu says:

    I’m surprised that the A side of the DCU was always in charge. The dual master-slave systems I’ve been involved with would alternate the in-charge side (using not always easy to design criteria) on a per-power-cycle or per day or per something basis to ensure that everything was loaded equally over time.

    • waynehale says:

      Both A and B sides worked equally, and the command paths were checked out every flow. Remember that ascent for the space shuttle is only 8 and 1/2 minutes long. The SSME controllers ran for about a day prelaunch and were shut down on orbit. Not a lot of on time to accumulate.

  14. M. Mraz says:

    “sputtering along at Pc about 50 psi they still generated a lot of thrust, but not always equally.” I always wondered about that. I asked one of the shuttle pilot-astronauts whom I met whether there were yaw excursions as the SRBs tailed off but I don’t think he understood my question (or was politely not answering).

    • waynehale says:

      Given three full throttle SSMEs and the highly effective thrust vector control of the orbiter, the crew normally never felt any bobbles during SRB separation. Any thrust imbalances were fully overcome by the system. If two or all three of the SSMEs had been shut down (a case that never occurred during any shuttle flight) it would not have been so smooth.
