Overview
Many failures in distributed systems are transient (e.g., a brief network glitch) and clear up on their own. The retry pattern handles them by automatically re-attempting the failed operation, usually after a delay between attempts.
Best Practices
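- Exponential Backoff: Increase the wait time after each failed attempt (for example, doubling it) to avoid overwhelming a service that is already struggling.
- Jitter: Add randomness to the backoff time so that many clients do not retry at the same instant (the 'thundering herd' problem).
- Limit Retries: Cap the number of attempts so the application does not retry indefinitely, and surface the error once the cap is reached.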
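The three practices above can be combined in one small helper. The following is a minimal sketch, not a definitive implementation: the names `TransientError`, `backoff_delay`, and `retry`, and the default values for `base`, `cap`, and `max_attempts`, are assumptions chosen for illustration. The jitter used here is the "full jitter" variant, where the delay is drawn uniformly between zero and the exponential bound.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def backoff_delay(attempt, base=0.5, cap=30.0, rng=random.random):
    """Return a jittered delay in seconds for a zero-based attempt number.

    The upper bound doubles with each attempt (base * 2**attempt) and is
    capped at `cap`; the actual delay is a random fraction of that bound.
    """
    return rng() * min(cap, base * (2 ** attempt))


def retry(operation, max_attempts=5, base=0.5, cap=30.0, sleep=time.sleep):
    """Call operation() until it succeeds or max_attempts is reached."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # retry budget exhausted: surface the last error
            sleep(backoff_delay(attempt, base, cap))
```

Any other exception type propagates immediately, so only failures the caller has marked as transient are retried. Passing `sleep` as a parameter is a design choice that keeps the helper testable without real waiting.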
When to Use
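Retry only transient errors, such as timeouts, dropped connections, or '503 Service Unavailable'. Do not retry permanent errors like '404 Not Found' or '401 Unauthorized': repeating the same request will produce the same result.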
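One way to apply this rule to HTTP responses is to check the status code before deciding to retry. The sketch below is illustrative only: the function name `is_retryable` and the exact set of codes are assumptions, and the right set depends on the service being called.

```python
# Status codes treated as transient in this sketch (an assumption, not a standard):
# 408 Request Timeout, 429 Too Many Requests, 502 Bad Gateway,
# 503 Service Unavailable, 504 Gateway Timeout.
RETRYABLE_STATUS_CODES = frozenset({408, 429, 502, 503, 504})


def is_retryable(status_code):
    """Return True if the HTTP status code suggests a transient failure."""
    return status_code in RETRYABLE_STATUS_CODES
```

With this set, `is_retryable(503)` is true, while `is_retryable(404)` and `is_retryable(401)` are false, matching the permanent errors named above.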