After a Lion Air 737 Max crashed in Indonesia last October, killing all 189 people aboard, Boeing anticipated that the plane’s flaw would be relatively straightforward to fix. A faulty sensor had caused an automated system to kick in and push the plane’s nose down; the startled flight crew struggled against the system and eventually lost control. Boeing set about rewriting the control-system software so that it wouldn’t misbehave in the same way again, and issued a directive telling pilots how to deal with such situations in the meantime.
When an Ethiopian Airlines 737 Max went down less than five months later under similar circumstances, killing all 157 people on board, it caused a worldwide furor—and ultimately the grounding of the entire 737 Max fleet. But Boeing still stood behind its plane. The assumption was that the Ethiopian Airlines pilots had lacked the skills and training to follow Boeing’s advice. (Much was made of the pilot’s “panicky” voice during the incident.) “The 737 Max is a safe airplane that was designed, built, and supported by our skilled employees who approach their work with the utmost integrity,” the company said in a prepared statement the day after the crash.
While the twin catastrophes were embarrassing in the short term, Boeing hoped that a few quick tweaks to the faulty automated system would suffice to win back public trust in the plane. These tweaks, however, may not account for the sheer complexity inherent to the software underlying the 737 Max. Lingering problems could slip by regulators and put passengers at risk.
Developments over the last few days have already undermined the notion that Boeing has the problem under control. Ethiopia’s Ministry of Transport released its preliminary accident report yesterday showing that, based on analysis of black box data, the pilots in fact had done just as Boeing’s directive had suggested, but to no avail. Ethiopian Airlines issued a statement declaring that its pilots “followed the Boeing-recommended and FAA-approved emergency procedures to handle the most difficult emergency situation created on the airplane. Despite their hard work and full compliance with the emergency procedures, it was very unfortunate that they could not recover the airplane from the persistence of nose diving.”
The fact that Boeing’s advice didn’t work raised questions about whether Boeing really has a firm grasp on the problem. So too did the fact that it had recently pushed back the estimated delivery date on its software patches. Back when the 737 Max fleet was grounded, Boeing said that it would have the control system fixed by the first week of April. On Monday, the FAA announced that the company would need several more weeks.
Even that prognosis, however, assumes Boeing finally understands the extent of the 737 Max’s troubles. Aeronautics experts say that complex control systems like those incorporated into the Max can interact in ways that are difficult to predict. “There are so many possibilities you’d have to model,” says Shem Malmquist, a 777 captain who teaches aeronautics at the Florida Institute of Technology. “There are a lot of opportunities for bad outcomes.”
The origin of the problem lies in Boeing’s attempt to create a state-of-the-art airliner by updating the venerable 737, a model that first flew in 1967. To maximize the plane’s fuel efficiency, Boeing needed to equip the plane with bigger engines, and to make them fit, it had to mount the engines farther forward on the wings. This caused the plane to be dynamically unstable, meaning that under certain conditions—such as might occur during takeoff—the nose would pitch up strongly. To address the problem, Boeing added an automated system called the Maneuvering Characteristics Augmentation System (MCAS) that would kick in if a sensor detected that the nose was too high.
The system works by engaging a motor to turn a jackscrew in the tail that cranks the movable horizontal part of the tail called the stabilizer. This is normally set so that the plane flies with its nose at a desired angle relative to the horizon, a position called the “trim.” A pilot can then adjust the angle of the nose up or down from its trimmed position by moving the plane’s control yoke, which is connected to a smaller control surface on the back of the stabilizer called an elevator. In the case of the Ethiopian Airlines accident, when MCAS unexpectedly commanded the stabilizer to trim nose-down, the pilots shut the system off, then used the yoke to command an opposite elevator movement to try to get the nose back up. The two control surfaces were essentially at odds with one another, a situation called “mistrim.”
The procedure that Boeing had recommended in the wake of the Lion Air crash, and which the Ethiopian pilots attempted to carry out, was to then manually crank a handle that would return the stabilizer to a neutral position. But former Boeing flight-controls engineer Peter Lemme has made the case on his blog that aerodynamic forces on the mistrimmed tail were so strong that it was difficult or impossible for the Ethiopian Airlines flight crew to manually trim the stabilizer. In desperation, they turned the malfunctioning system back on, but it only made the situation worse, putting the plane into a nosedive.
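The runaway behavior described above can be illustrated with a deliberately simplified sketch. This is not Boeing’s actual implementation; every name, threshold, and trim increment here is invented for illustration. The point is only to show how a single stuck sensor, fed into simple repeated logic, keeps driving the stabilizer nose-down each time the system activates:

```python
# Deliberately simplified, hypothetical sketch of MCAS-style logic.
# All names, thresholds, and increments are invented for illustration
# and do not reflect Boeing's actual software.

def mcas_step(angle_of_attack_deg, stabilizer_trim_deg,
              aoa_threshold_deg=10.0, trim_increment_deg=0.5):
    """If the (possibly faulty) angle-of-attack reading is too high,
    command the stabilizer one fixed increment nose-down."""
    if angle_of_attack_deg > aoa_threshold_deg:
        return stabilizer_trim_deg - trim_increment_deg
    return stabilizer_trim_deg

# A sensor stuck at a dangerously high reading drives the trim further
# nose-down on every activation -- the "runaway" scenario.
trim = 0.0
faulty_aoa_reading = 20.0  # sensor stuck high, regardless of actual attitude
for _ in range(5):
    trim = mcas_step(faulty_aoa_reading, trim)
print(trim)  # -2.5 after five activations
```

In this toy model, nothing is "broken" in the software itself: each step does exactly what it was designed to do, yet the accumulated effect of trusting one bad input is a progressively mistrimmed stabilizer.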
The software changes that Boeing is now working on will partially address the system flaws that triggered the Lion Air and Ethiopian Airlines crashes. For one thing, the updated system will less aggressively trim the horizontal stabilizer down. And, presumably, Boeing will use the newly reported findings to further refine its changes and minimize the chances that the same scenarios will unfold again.
What Boeing will never be able to erase, however, is the fact that it was taken by surprise not once but twice by previously unknown failure modes. And it insisted all the while that its planes were fundamentally sound.
Boeing’s lack of awareness about its own lack of awareness raises an inevitable question: What else don’t they know? What other failure modes have they failed to anticipate?
Elsewhere, I’ve faulted the FAA for gradually shifting responsibility for certification away from government regulators to the manufacturer. But Malmquist says that given the complexity of modern aircraft, there really is no alternative. “When you’re looking at a system that has 30 or 40 million lines of code, there is no way that you can spot-check like a police officer giving traffic tickets,” he says. “The only way a regulator is going to fully understand what’s going on would be to have someone working in the program office five days a week, year-round.”
Of greater concern, Malmquist says, is the fact that the procedures established for certifying airplanes were developed during the many decades when aircraft systems were electromechanical in nature. Such approaches can’t handle the huge number of ways that computerized systems can interact with one another.
And complexity is only going to increase. Aircraft designers are increasingly handing over responsibility to automated systems whose sophistication allows them to perform robustly, but which can also open up all sorts of possibilities for unexpected behavior. In trying to anticipate what problems might arise, engineers are limited by their ability to imagine all the possible outcomes. When designing MCAS, for instance, it seems that Boeing contemplated what would happen if the system stopped working, but didn’t game out all the consequences of it running away.
What can’t be imagined can’t be planned for. Failure modes that engineers don’t consider when designing aircraft aren’t going to be anticipated in flight testing either—or in subsequent crew training. “So you have a triple whammy going on,” Malmquist says, “and there’s no good way to escape.”
The problem isn’t limited to airplanes. As automation continues its breakneck expansion, we’re going to see more and more accidents that take place not because something breaks but because humans and complex machinery react in ways that we didn’t—and maybe can’t—expect. “We’re seeing the same accidents happening across different domains,” Malmquist says, from self-driving cars to the Deepwater Horizon catastrophe. “There are a few differences, but in the end, these accidents are almost identical from a systems-theory point of view.”
In a statement made on Thursday after Ethiopia’s release of the preliminary accident report, Boeing CEO Dennis Muilenburg acknowledged that software played some role in the disaster. “As pilots told us, erroneous activation of the MCAS function can add to what is already a high workload environment. It’s our responsibility to eliminate this risk. We own it and we know how to do it.”
The sheer complexity of those automated systems, however, makes it unclear whether Boeing or any company knows enough to eliminate the risk. Safety experts have begun developing new approaches better suited to the automated world. So far, unfortunately, their uptake hasn’t matched the rate at which these systems are spreading into every corner of human life.