Inadequate Systems Engineering and the Boeing 737 Max Catastrophes

Mike Haas
The Systems Engineering Scholar
6 min read · Mar 27, 2021


737 Max in a banked turn. Photograph by Uwe Deffner, The New Yorker

In October 2018, a Boeing 737 Max-8 crashed into the Java Sea off the coast of Indonesia, killing 189 people. Shortly afterward, in March 2019, another one met a similar fate, plummeting into the ground near Bishoftu, Ethiopia, just before the 737 Max fleet was grounded. Were the systems that aid the pilots in flying the plane safely to blame for these catastrophes? Or were the Concept of Operations (CONOPS) and emergency checklists woefully inadequate in preventing a mishap during an emergency? In either case, what systems engineering practices could have been used to prevent these accidents, and potential ones in the future?

The Story

The Boeing 737 Max was built from a proven Boeing 737 design, one that had been used successfully for over 50 years, which allowed development costs to be reduced by grandfathering in safety certifications and many pilot training procedures. One could argue that the 737 Max development was rushed; as Dominic Gates states: “Boeing committed to the MAX without even having a finalized design or engine supplier.”

As part of the FAA safety certification, airliners are required to smoothly execute what is called a wind-up turn. I have personally witnessed one of these maneuvers in a supersonic trainer jet called the T-38. Essentially, a constant airspeed is commanded and the aircraft is banked, allowing the angle of attack to increase to make the turn. Increasing G-force is commanded, corresponding to an increasing angle of attack. Even assuming this maneuver is executed by an expert pilot, the average airline passenger would not be happy to experience it the way we do at United States test pilot schools. The FAA requirement is to smoothly enter into and exit the wind-up turn. Because the 737 Max required a redesign of the engine location, the aircraft would pitch up too quickly during the turn. This would undoubtedly create an uncomfortable sensation (among other things) for passengers in the event of a fast-turning maneuver to avoid aircraft traffic. This deficiency created the need for the Maneuvering Characteristics Augmentation System (MCAS), a flight control system that automatically pitches the aircraft nose down to counteract a rapid increase in angle of attack.
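
To make the relationship between bank angle, G-force, and angle of attack concrete, here is a small illustrative calculation of my own (a textbook simplification, not drawn from any Boeing or FAA document): in a steady, level, coordinated turn the load factor grows with bank angle, and at constant airspeed the wing must fly at a higher angle of attack to produce that extra lift.

```python
import math

def load_factor(bank_deg: float) -> float:
    """Load factor n (in g) for a steady, level, coordinated turn: n = 1 / cos(bank)."""
    return 1.0 / math.cos(math.radians(bank_deg))

# At constant airspeed, lift must equal n * weight, so the wing is flown
# at a progressively higher angle of attack as the bank steepens.
for bank in (0, 30, 45, 60, 70):
    print(f"bank {bank:2d} deg -> {load_factor(bank):.2f} g")
# bank 60 deg -> 2.00 g; bank 70 deg -> 2.92 g
```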

Test Flight in a T-38

In October 2018, Lion Air flight 610, a 737 Max, took off from Soekarno–Hatta International Airport and immediately experienced an angle of attack (AOA) sensor failure, leading to erroneous angle of attack readings. These readings repeatedly triggered automatic MCAS pitch-down maneuvers. The pilots countered with continuous pull-up inputs but were unable to arrest the pitch-down commands from the MCAS, and the aircraft crashed into the sea. In retrospect, the pilots could have executed a checklist that instructed them to remove power from the horizontal tail, thereby preventing the control surface from being commanded by the MCAS. This motivated the procedural changes issued by Boeing shortly after the crash to counter this type of situation. However, just five months later, another 737 Max, belonging to Ethiopian Airlines, crashed in the same fashion. On this flight, a broken AOA sensor prompted automatic MCAS pitch-down commands. Summarizing Dominic Gates: when the pilots tried to raise the manual trim wheel as stated in Boeing’s checklist, the aircraft had already picked up too much airspeed and the horizontal tail could not be moved. The aircraft continued pitching down until it plummeted into the ground.

The Swiss cheese model

James Reason’s Swiss Cheese Model, retrieved from www.skybrary.aero

When looking back at the causes of aircraft mishaps, a Swiss cheese model is often used. The model is derived from a metaphor in which one visualizes multiple slices of Swiss cheese layered on top of one another. The holes represent problems that could potentially lead to (in the context of this article) an aircraft mishap. The cheese itself represents the defenses that prevent the mishap from getting through all of the layers. As stated on Skybrary.aero, an accident occurs “when holes in all of the slices momentarily align, permitting a trajectory of accident opportunity”.
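
As a purely illustrative back-of-the-envelope calculation (the layers and numbers below are invented for this article, not drawn from any accident data), the value of independent defensive layers can be expressed as a product: a hazard only becomes an accident when it slips through every layer, so each additional, independent defense multiplies down the odds.

```python
# Hypothetical, independent defensive layers and the probability that each
# one fails to stop a given hazard (illustrative numbers only).
layers = {
    "design redundancy":    0.05,
    "certification review": 0.10,
    "crew procedures":      0.10,
    "pilot intervention":   0.20,
}

p_accident = 1.0
for name, p_fail in layers.items():
    p_accident *= p_fail

print(f"P(hazard penetrates all layers) = {p_accident:.6f}")  # 0.000100
# Remove or weaken a layer and the holes line up far more often.
```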

Inadequate SE?

How could systems engineering have helped prevent the holes in the Swiss cheese layers from aligning? Could a more robust systems engineering approach have added layers of prevention, thereby averting the catastrophic 737 Max crashes?

On the right side of the systems engineering “V”, we have verification and validation of a system progressively from the component level all the way up to the fully implemented system of systems. In this case, the Boeing 737 Max took an original 737 and made some design changes to stay competitive in the airliner production industry. Justin Hayward writes: “The MAX uses new, and more efficient, CFM International LEAP engines, and includes several aerodynamic modifications, including distinctive winglets.” Generally, when you make changes to a system, you have to test (at a minimum) the changes that were made and any new interactions between the changed components or subsystems and the original system. In the case of the 737 Max, the MCAS was one such change. Both the MCAS itself and its interactions with the original system (the aircraft flight control system) should have been tested: for example, by modeling how a range of flight conditions and AOA sensor inputs affect the commands the MCAS sends through the flight computer, exercising those cases in simulation, and then confirming them in live flight test.
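
As a hedged sketch of what interaction-level testing could look like (the control law, thresholds, and function names below are hypothetical simplifications of my own, not Boeing’s actual MCAS logic), one could model the automated pitch command as a function of the AOA input and assert that an obviously faulty sensor value never produces a sustained, uncommanded nose-down command:

```python
# Hypothetical, greatly simplified stand-in for an MCAS-like control law.
# Thresholds and behavior are illustrative only.
AOA_STALL_THRESHOLD_DEG = 14.0
AOA_PHYSICALLY_PLAUSIBLE_DEG = 30.0

def pitch_trim_command(aoa_deg: float) -> float:
    """Return a nose-down trim increment (degrees) for a high AOA reading."""
    if aoa_deg > AOA_PHYSICALLY_PLAUSIBLE_DEG:
        return 0.0   # reject clearly faulty sensor data instead of acting on it
    if aoa_deg > AOA_STALL_THRESHOLD_DEG:
        return -2.5  # nose-down trim increment
    return 0.0

def test_faulty_sensor_does_not_trigger_nose_down():
    # A failed AOA vane reporting an absurd value should be ignored.
    assert pitch_trim_command(74.5) == 0.0

def test_normal_flight_is_untouched():
    assert pitch_trim_command(5.0) == 0.0

if __name__ == "__main__":
    test_faulty_sensor_does_not_trigger_nose_down()
    test_normal_flight_is_untouched()
    print("interaction tests passed")
```

Tests like these would only be the lowest rung of the right side of the “V”; the same cases would then need to be run in a full simulator and, finally, in flight test.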

After the Lion Air tragedy, Boeing released procedures that would specifically disable the MCAS if a similar situation were to occur. Either these procedures weren’t validated, or they were inadequate. In my experience as an integration and test engineer, when I write a test procedure, I validate it by testing the system using the procedure, or by running it on a system test bench that includes most or all of the system under test. The fact that, in the second crash, the pilots were unable to manually move the horizontal tail during high-speed flight (the condition an airliner is in for the vast majority of its operation) means that these procedures were inadequately validated.

For systems that directly affect aircraft safety, redundancies are often engineered into their operation, because the malfunction or failure of those systems could be catastrophic. In the case of the MCAS, it took inputs from only one AOA sensor and only one of the two flight control computers. This lack of redundancy meant that if the single AOA sensor feeding the MCAS failed, the MCAS could make pitch inputs the pilots could not counter. Most aircraft (including the Max) have two or more AOA sensors, yet the MCAS took input from just one. As Benjamin from EngineeringforHumans.com states: “the MCAS didn’t bother to cross-check the AOA sensors; the probability of the Max failing to recognize and adapt to a potential AoA sensor failure was 100%.” Combine that with the 14 or more 737 Max aircraft in the fleet, each flying hundreds of flights over their lifetimes, and the probability that the MCAS would eventually cause a safety failure is high.
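
A minimal sketch of what such a cross-check might look like, assuming two independent AOA vanes and a disagreement threshold chosen purely for illustration (this is my own simplification, not the actual pre- or post-fix MCAS logic):

```python
from typing import Optional

MAX_DISAGREEMENT_DEG = 5.5  # illustrative threshold, not Boeing's value

def voted_aoa(left_deg: float, right_deg: float) -> Optional[float]:
    """Cross-check the two AOA vanes; return None (inhibit automatic trim)
    if they disagree by more than the allowed threshold."""
    if abs(left_deg - right_deg) > MAX_DISAGREEMENT_DEG:
        return None  # sensors disagree: leave control with the pilots
    return (left_deg + right_deg) / 2.0

# A single failed vane reading 74.5 deg against a healthy 6.0 deg reading
# would be flagged instead of driving repeated nose-down trim commands.
print(voted_aoa(6.0, 74.5))  # None -> automatic trim inhibited
print(voted_aoa(6.0, 6.8))   # 6.4  -> normal operation
```

The design choice here is simply to fail safe: when the two sensors cannot be reconciled, the automated function stands down rather than acting on data it cannot trust.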

Moving forward

As of the writing of this article, the 737 Max is cleared to fly again. According to Boeing.com, the MCAS software design now provides additional layers of redundancy and will counteract an erroneous AOA sensor reading. In addition, “the software has been put through hundreds of hours of analysis, laboratory testing, verification in a simulator and numerous test flights.” The 737 Max crashes are an unfortunate learning experience for many areas of engineering, not least systems engineering, and a reminder that neglecting certain systems engineering processes can have catastrophic (and hugely expensive) consequences.

1) Boeing. (2020). Boeing: The 737 MAX MCAS software update. https://www.boeing.com/commercial/737max/737-max-software-updates.page.

2) Engineering for Humans. (2019, May 14). The Boeing 737 Max crashes represent a failure of systems engineering. https://www.engineeringforhumans.com/systems-engineering/737-max-a-failure-of-systems-engineering/

3) Gates, D. (2020, November 18). Q&A: What led to Boeing’s 737 MAX crisis? The Seattle Times. https://www.seattletimes.com/business/boeing-aerospace/what-led-to-boeings-737-max-crisis-a-qa/

4) Hayward, J. (2020, June 7). The Boeing 737: The original vs. the MAX - what’s the difference? Simple Flying. https://simpleflying.com/boeing-737-original-vs-max/

5) Skybrary. (2016, May 25). James Reason HF Model. Semantic MediaWiki. https://www.skybrary.aero/index.php/James_Reason_HF_Model

Picture References:

1) Uwe Deffner (2019). [How did the FAA allow the 737 Max to fly?] [Photograph]. The New Yorker

2) James Reason’s Swiss Cheese Model [Diagram]. www.skybrary.aero
