Overview

Reliability engineering is the discipline of ensuring that a system performs its intended function under specified conditions for a specified period of time. In software, it overlaps closely with Site Reliability Engineering (SRE), which applies these principles to operating production services.

Key Metrics

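  • MTBF (Mean Time Between Failures): The average operating time between consecutive failures of a repairable system.
  • MTTR (Mean Time To Repair): The average time needed to restore a system to service after a failure.
  • Availability: The fraction of time a system is operational, usually expressed as a percentage. In steady state it can be estimated as MTBF / (MTBF + MTTR).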
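
As an illustration of how these three metrics relate, the following minimal Python sketch estimates MTBF and MTTR from observed intervals and derives steady-state availability as MTBF / (MTBF + MTTR). The function names and the sample figures are hypothetical and chosen only for this example. It assumes all durations use the same unit (hours here).

```python
from statistics import mean
from typing import Sequence, Tuple


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability as a fraction between 0 and 1."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def metrics_from_intervals(
    uptimes_hours: Sequence[float],
    repair_times_hours: Sequence[float],
) -> Tuple[float, float, float]:
    """Return (MTBF, MTTR, availability) from observed intervals.

    uptimes_hours: operating time between consecutive failures.
    repair_times_hours: time taken to restore service after each failure.
    """
    mtbf = mean(uptimes_hours)
    mttr = mean(repair_times_hours)
    return mtbf, mttr, availability(mtbf, mttr)


# Hypothetical sample data: three failures over 600 operating hours.
mtbf, mttr, avail = metrics_from_intervals([100, 200, 300], [1, 2, 3])
print(f"MTBF={mtbf} h, MTTR={mttr} h, availability={avail:.2%}")
```

With this sample data the MTBF is 200 hours and the MTTR is 2 hours, giving an availability of roughly 99.01%. The same formula shows why both levers matter: availability rises either by failing less often (higher MTBF) or by recovering faster (lower MTTR).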

Techniques

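  • Redundancy and failover: duplicate critical components so that a standby can take over when the primary fails.
  • Rigorous testing and monitoring: find defects before release and detect failures quickly in operation.
  • Root cause analysis of failures: identify the underlying cause of an incident so that it does not recur.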
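
Failover in particular can be sketched in a few lines. The Python example below is a simplified illustration, not a production pattern: the function name call_with_failover and the idea of modelling each redundant replica as a zero-argument callable are assumptions made for this sketch. It tries each replica in order and returns the first successful result.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def call_with_failover(replicas: Sequence[Callable[[], T]]) -> T:
    """Try each replica in order and return the first successful result.

    Raises RuntimeError if every replica fails (or if none were given).
    """
    errors: List[Exception] = []
    for replica in replicas:
        try:
            return replica()
        except Exception as exc:  # record the failure and fall through to the next replica
            errors.append(exc)
    last_error = errors[-1] if errors else None
    raise RuntimeError(f"all {len(errors)} replicas failed") from last_error


# Hypothetical replicas: the primary is down, the standby answers.
def primary() -> str:
    raise ConnectionError("primary unreachable")


def standby() -> str:
    return "response from standby"


print(call_with_failover([primary, standby]))
```

Real systems usually add health checks, timeouts, and retry limits around this basic pattern. The sketch only shows the core idea that redundancy is useful when something routes work away from the failed component.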

Related Terms