Chaos Engineering

Overview

Chaos engineering, pioneered by Netflix with its Chaos Monkey tool, is the practice of intentionally introducing failures, such as killing a server or injecting network latency, to observe how a system reacts and to find weaknesses before they cause real outages.
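
As a rough illustration of failure injection, the sketch below wraps a function so that calls are delayed and sometimes fail. This is not how Chaos Monkey works (it terminates real instances); it is an in-process stand-in, and the names with_chaos, InjectedFailure, fetch_profile, and fetch_profile_with_fallback are hypothetical.

```python
import random
import time


class InjectedFailure(Exception):
    """Raised in place of a real outage (hypothetical name)."""


def with_chaos(func, failure_rate=0.0, added_latency_s=0.0, rng=None):
    """Wrap func so each call is delayed and fails with probability failure_rate."""
    rng = rng or random.Random()

    def wrapper(*args, **kwargs):
        if added_latency_s > 0:
            time.sleep(added_latency_s)  # simulate network latency
        if rng.random() < failure_rate:
            raise InjectedFailure("simulated dependency failure")
        return func(*args, **kwargs)

    return wrapper


def fetch_profile(user_id):
    # Stand-in for a real downstream call.
    return {"id": user_id, "name": "example"}


def fetch_profile_with_fallback(fetch, user_id):
    # The behavior under test: degrade gracefully instead of crashing.
    try:
        return fetch(user_id)
    except InjectedFailure:
        return {"id": user_id, "name": None, "degraded": True}
```

Calling fetch_profile_with_fallback(with_chaos(fetch_profile, failure_rate=1.0), 7) forces the fallback path every time, which is one way to check that the degraded response is acceptable before a real dependency goes down.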

Principles

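Build a Hypothesis: Define the system's normal (steady-state) behavior and predict how it should hold up when a failure is injected.
Vary Real-world Events: Introduce failures that mirror real incidents, such as server crashes or network latency spikes.
Run Experiments in Production: Test against real traffic and infrastructure, where the results are most realistic.
Automate Experiments: Script experiments so they run continuously rather than as one-off exercises.
Minimize Blast Radius: Limit the scope of each experiment so that it does not cause major disruption for users.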
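
The principles above can be sketched as a small experiment loop. This is a minimal simulation under stated assumptions, not a production tool: the Experiment fields, the run_experiment function, the 50-request warm-up, and the thresholds are all hypothetical, and the handler stands in for a real system.

```python
import random
from dataclasses import dataclass


@dataclass
class Experiment:
    hypothesis_min_success: float  # steady-state hypothesis: minimum success rate
    blast_radius: float            # fraction of requests exposed to the fault
    abort_below: float             # abort if the running success rate drops below this
    requests: int = 1000


def run_experiment(exp, handler, rng=None):
    """Send requests to handler, injecting the fault into a limited fraction."""
    rng = rng or random.Random()
    successes = 0
    for done in range(1, exp.requests + 1):
        # Minimize blast radius: only some requests see the fault.
        fault_active = rng.random() < exp.blast_radius
        if handler(fault_active):
            successes += 1
        rate = successes / done
        # Safety valve: stop early once enough samples show unacceptable impact.
        if done >= 50 and rate < exp.abort_below:
            return {"aborted": True, "success_rate": rate, "hypothesis_holds": False}
    rate = successes / exp.requests
    return {
        "aborted": False,
        "success_rate": rate,
        "hypothesis_holds": rate >= exp.hypothesis_min_success,
    }


def fragile_handler(fault_active):
    # Stand-in system with no resilience: any injected fault fails the request.
    return not fault_active
```

Because run_experiment is an ordinary function, a scheduler could call it on a recurring basis to automate the experiment. In a real setting the handler would be replaced by measurements taken from live traffic.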

Goal

To build resilient systems that can withstand the unpredictable failures of large-scale distributed environments.
