Overview

Many failures in distributed systems are temporary (e.g., a brief network glitch). The retry pattern involves automatically re-attempting a failed operation, often with a delay between attempts.

Best Practices

  • Exponential Backoff: Increasing the wait time between each retry to avoid overwhelming the failing service.
  • Jitter: Adding randomness to the backoff time to prevent 'thundering herd' problems.
  • Limit Retries: Ensuring the application doesn't retry indefinitely.

When to Use

Only for transient errors. Do not retry for permanent errors like '404 Not Found' or '401 Unauthorized'.

Related Terms