Overview

In software, resilience engineering is the practice of designing systems that can gracefully handle and recover from failures. It assumes that failures are inevitable and focuses on minimizing their impact.

Key Strategies

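  • Graceful Degradation: Keeping the system running with reduced functionality during a partial failure, for example by serving cached or generic results when a downstream service is unavailable.
  • Self-healing: Automatically detecting common faults and recovering from them without human intervention, for example by restarting a crashed process.
  • Observability: Exposing the system's internal state through logs, metrics, and traces so that problems can be identified and resolved quickly.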
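The strategies above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production pattern: `get_recommendations`, `supervise`, and `DEFAULT_RECOMMENDATIONS` are hypothetical names, the static fallback list stands in for a real cache, and the standard `logging` module stands in for a full observability stack. The `supervise` loop is a deliberately simplified restart policy.

```python
import logging
import time
from typing import Callable, List

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("resilience-demo")

# Hypothetical static fallback used when the primary source is unavailable.
DEFAULT_RECOMMENDATIONS = ["bestseller-1", "bestseller-2"]


def get_recommendations(fetch: Callable[[], List[str]]) -> List[str]:
    """Graceful degradation: fall back to a generic list if the primary fetch fails."""
    try:
        return fetch()
    except Exception as exc:
        # Observability (minimal): record the failure so it shows up in the logs.
        log.warning("recommendation fetch failed, degrading: %s", exc)
        return list(DEFAULT_RECOMMENDATIONS)


def supervise(task: Callable[[], None], max_restarts: int = 3, delay: float = 0.0) -> int:
    """Self-healing (simplified): re-run a failing task up to max_restarts times.

    Returns how many restarts were needed before the task succeeded, and
    re-raises the last error if the task still fails after max_restarts.
    """
    restarts = 0
    while True:
        try:
            task()
            return restarts
        except Exception as exc:
            if restarts >= max_restarts:
                log.error("giving up after %d restarts: %s", restarts, exc)
                raise
            restarts += 1
            log.info("task failed (%s); restart %d/%d", exc, restarts, max_restarts)
            time.sleep(delay)


# Usage: a hypothetical service that is down, and a task that fails twice.
def broken_service() -> List[str]:
    raise ConnectionError("recommendation service unreachable")


counter = {"n": 0}


def flaky_task() -> None:
    counter["n"] += 1
    if counter["n"] < 3:
        raise RuntimeError("transient fault")


print(get_recommendations(broken_service))  # ['bestseller-1', 'bestseller-2']
print(supervise(flaky_task))  # 2
```

The degradation path trades feature quality for availability, while the supervisor trades a short delay for automatic recovery; real systems typically add backoff and a cap on restarts per time window.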

Resilience vs. Robustness

Robustness is the ability to resist failure, while resilience is the ability to recover from it.
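To make the distinction concrete, here is a hedged Python sketch built on two hypothetical helpers. `parse_port` is robust: it absorbs malformed input so that no failure occurs at all. `with_retry` is resilient: it accepts that a call can fail with a transient `ConnectionError` and recovers by retrying with exponential backoff. The default port 8080 and the retry parameters are arbitrary assumptions chosen for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def parse_port(raw: str, default: int = 8080) -> int:
    """Robustness: absorb malformed input so the caller never sees a failure.

    The default of 8080 is an arbitrary, hypothetical choice.
    """
    try:
        port = int(raw)
    except ValueError:
        return default  # non-numeric input: fall back instead of crashing
    return port if 1 <= port <= 65535 else default


def with_retry(op: Callable[[], T], attempts: int = 3, base_delay: float = 0.1) -> T:
    """Resilience: let the failure happen, then recover by retrying with backoff."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return op()
        except ConnectionError:
            if attempt == attempts:
                raise  # out of retries; let the caller decide what to do
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


# Usage: a hypothetical connection that drops twice before succeeding.
state = {"calls": 0}


def flaky_connect() -> str:
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("connection dropped")
    return "connected"


print(parse_port("not-a-port"))  # 8080
print(with_retry(flaky_connect, base_delay=0))  # connected
```

Only `ConnectionError` is retried here on the assumption that it signals a transient fault; retrying errors that will never go away only delays the inevitable failure.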

Related Terms