In 1982, Atomic Energy of Canada Limited (AECL) launched the Therac-25, a computer-controlled radiation therapy machine heralded as a breakthrough in cancer treatment. Instead, it became infamous for one of the deadliest software engineering failures in medical history. Between 1985 and 1987, six known accidents subjected patients to radiation doses up to 100 times the intended levels, resulting in severe injuries and fatalities. The root cause? A lethal cocktail of race conditions, dismissed user reports, and the catastrophic removal of hardware safeguards.

Linear accelerator design similar to the Therac-25. Source: Wikipedia

The Deadly Flaw: When Software Replaced Safety

Previous models (Therac-6 and Therac-20) used hardware interlocks to physically prevent radiation overdoses. The Therac-25 eliminated these, relying solely on software checks. Two critical bugs emerged:

  1. Race Conditions in Mode Switching: If an operator edited the treatment data within about 8 seconds (e.g., correcting an X-ray entry to electron therapy), the set-up task could miss the edit and fire the high-current electron beam intended for X-ray production without the target or scanning magnets in position (first sketch below).
  2. Arithmetic Overflow: A one-byte flag variable was incremented on every pass of a set-up test rather than set to a constant; every 256th pass it rolled over to zero, and the collimator-position safety check it was meant to trigger was silently skipped (second sketch below).
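
A minimal, hypothetical C sketch of the first bug's general shape: one task latches the treatment parameters the instant a completion flag is raised, so an operator edit made a moment later is never re-checked against the hardware set-up. The real code was PDP-11 assembly; every name here (beam_mode_t, setup_task, operator_task) is illustrative, not AECL's.

```c
/* Sketch of an unsynchronized "data entry complete" race.
 * Compile with: cc -pthread race.c                              */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

typedef enum { MODE_XRAY, MODE_ELECTRON } beam_mode_t;

static volatile int         entry_complete = 0;          /* "prescription entered" flag    */
static volatile beam_mode_t requested_mode = MODE_XRAY;  /* what the operator last typed   */
static volatile beam_mode_t hardware_mode  = MODE_XRAY;  /* what the machine was set up for */

/* Set-up task: positions the hardware once, as soon as entry is flagged complete. */
static void *setup_task(void *arg) {
    (void)arg;
    while (!entry_complete) { /* spin on the flag */ }
    hardware_mode = requested_mode;   /* latched once; later edits are never seen */
    return NULL;
}

/* Operator task: finishes data entry, then corrects the mode a moment later. */
static void *operator_task(void *arg) {
    (void)arg;
    requested_mode = MODE_XRAY;
    entry_complete = 1;               /* set-up begins immediately                */
    sleep(1);                         /* operator notices the mistake...          */
    requested_mode = MODE_ELECTRON;   /* ...and edits; nothing forces a re-check  */
    return NULL;
}

int main(void) {
    pthread_t s, o;
    pthread_create(&s, NULL, setup_task, NULL);
    pthread_create(&o, NULL, operator_task, NULL);
    pthread_join(s, NULL);
    pthread_join(o, NULL);

    if (hardware_mode != requested_mode)
        puts("HAZARD: prescription edited after hardware was already positioned");
    else
        puts("consistent (timing happened to mask the race)");
    return 0;
}
```

The hazard is exactly the mismatch the program usually prints: the hardware was positioned for one mode while the prescription now says another, and no interlock forces the two back into agreement.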

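The second bug can be rendered almost literally. Per Leveson's report, a one-byte shared variable (Class3) was incremented on each pass of a set-up test rather than set to a constant, so on every 256th pass it wrapped to zero and the collimator check it was supposed to trigger did not run. The C below is a hypothetical reconstruction of that mechanism, not the original assembly.

```c
/* Sketch of the one-byte counter overflow that skipped a safety check. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

static uint8_t class3 = 0;           /* one-byte flag, named after Leveson's report */
static bool collimator_check_ran;

static void check_collimator(void) { collimator_check_ran = true; }

/* One pass of the periodic set-up test. */
static void setup_test_pass(void) {
    class3++;                        /* BUG: wraps to 0 on every 256th pass        */
    /* Correct code would simply do: class3 = 1;                                   */
    if (class3 != 0)
        check_collimator();          /* silently skipped whenever the flag wrapped */
}

int main(void) {
    for (int pass = 1; pass <= 512; pass++) {
        collimator_check_ran = false;
        setup_test_pass();
        if (!collimator_check_ran)
            printf("pass %d: collimator check skipped (flag wrapped to 0)\n", pass);
    }
    return 0;
}
```
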
As Nancy Leveson noted in her definitive analysis:

"A naive assumption is often made that reusing software [...] will increase safety because the software will have been exercised extensively. Reusing software modules does not guarantee safety."

The Human Cost: A Timeline of Failure

  • June 1985 (Georgia): Katie Yarbrough received 15,000–20,000 rad (vs. 200 intended), requiring breast amputation.
  • March 1986 (Texas): A patient absorbed 25,000 rad in <1 second, leading to paralysis and death. Operators heard a "crackle"—radiation saturating ionization chambers.
  • April 1986 (Texas): A patient being treated for facial cancer died after a similar overdose; the hospital physicist reproduced the "Malfunction 54" error by rapidly editing the treatment data within the 8-second window.

Engineering Hubris and Institutional Failure

AECL’s missteps became textbook examples of what not to do:
- No Independent Review: A single developer wrote the PDP-11 assembly code; AECL did not test the full hardware-software integration before deployment.
- Dismissed Warnings: Operators’ reports of "burning" sensations were ignored; AECL insisted overdoses were "impossible."
- Opaque Errors: "Malfunction 54" codes lacked documentation, leading operators to habitually override pauses.

Legacy: The Birth of Modern Safety Standards

The Therac-25 catalyzed seismic shifts:
1. IEC 62304: This medical-device software standard now mandates rigorous lifecycle controls and legacy code vetting.
2. Defense-in-Depth: Safety-critical systems are now expected to layer independent hardware interlocks alongside software checks rather than trusting either alone (see the sketch after this list).
3. Human Factors: Error messages must be actionable, and user feedback treated as critical data.
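
As a conceptual illustration of the defense-in-depth point (not drawn from any particular device's code), the sketch below gates the beam on two independent layers: the software's own checks and a separately wired hardware interlock. read_hardware_interlock() and software_checks_pass() are hypothetical stand-ins for a real digital input line and real validation logic.

```c
/* Conceptual defense-in-depth gate: both independent layers must agree. */
#include <stdbool.h>
#include <stdio.h>

typedef enum { MODE_XRAY, MODE_ELECTRON } beam_mode_t;

/* Hypothetical: reads a dedicated interlock circuit that closes only when
 * the turntable is physically latched in the position required by `mode`. */
static bool read_hardware_interlock(beam_mode_t mode) {
    (void)mode;
    return false;                     /* stubbed out for the sketch */
}

/* Hypothetical: the software's own consistency checks on the prescription. */
static bool software_checks_pass(beam_mode_t mode) {
    (void)mode;
    return true;                      /* stubbed out for the sketch */
}

/* The beam is permitted only if BOTH independent layers agree. */
static bool beam_permitted(beam_mode_t mode) {
    return software_checks_pass(mode) && read_hardware_interlock(mode);
}

int main(void) {
    if (beam_permitted(MODE_XRAY))
        puts("beam on");
    else
        puts("beam inhibited: at least one protection layer refused");
    return 0;
}
```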

Animation showing medical linear accelerator operation. Source: Wikipedia

Decades later, the Therac-25 remains a grim reminder: In systems where lives hang in the balance, overconfidence in code is a vulnerability as dangerous as any bug. Its lessons resonate in autonomous vehicles, surgical robots, and AI diagnostics—where software isn’t just lines of code, but a covenant of trust.

Source: Adapted from Wikipedia's Therac-25 article (CC BY-SA)