When Retries Aren’t Free: Budget, Amplification, and Hidden Costs in Latency Percentiles
#Infrastructure


Backend Reporter
5 min read

A deep dive into the trade‑offs of adding retry logic, showing how each extra attempt consumes capacity, can amplify downstream load, and can distort latency metrics such as p95 and p99.


The problem – treating retry as a free fix

A common pattern in microservice stacks is to wrap a call with three attempts and exponential back‑off, then stare at a dashboard that shows a higher success rate. The code change looks trivial, the UI looks healthier, and the team declares the problem solved.
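
As a point of reference, here is a minimal sketch of that pattern with Resilience4j; the names downstreamRetry and callDownstream are illustrative, not taken from the experiment's code:

    import io.github.resilience4j.core.IntervalFunction;
    import io.github.resilience4j.retry.Retry;
    import io.github.resilience4j.retry.RetryConfig;

    import java.util.function.Supplier;

    public class RetryWrapper {

        // Three attempts in total (the first call plus two retries),
        // with exponentially growing waits between them.
        private static final RetryConfig CONFIG = RetryConfig.custom()
                .maxAttempts(3)
                .intervalFunction(IntervalFunction.ofExponentialBackoff(100, 2.0))
                .build();

        private static final Retry RETRY = Retry.of("downstreamRetry", CONFIG);

        static String callWithRetry(Supplier<String> callDownstream) {
            // Every retry runs the supplier again, i.e. one more real downstream request.
            return Retry.decorateSupplier(RETRY, callDownstream).get();
        }
    }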

What is often missed is the budget each retry consumes:

  • User‑visible latency – the caller waits longer for each extra attempt.
  • Downstream capacity – every retry becomes a real request to the service behind the façade.
  • Systemic pressure – additional traffic can push an already stressed service over the edge.

The retry‑resilience‑experiment repository (linked at the end of this article) quantifies these effects using Spring Boot 3.3.5, Resilience4j 2.2.0, Java 21, and k6 as the load generator.


Solution approach – measuring the real cost of retries

The author built three downstream scenarios:

  1. Random failures (35 % failure rate) – failures are independent of load.
  2. Jitter‑random failures – same failure probability but with random latency jitter.
  3. Progressive degradation – the downstream delay grows with the number of real calls it has received (delay = min(900 ms, 80 ms + callNumber × 3)), mimicking a service that slows down as it gets overloaded.
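
The degradation rule is simple enough to state in code; a sketch of the delay calculation described in scenario 3 (the method name is illustrative):

    // Delay grows with the number of real calls the downstream has already served,
    // capped at 900 ms: delay = min(900 ms, 80 ms + callNumber * 3).
    static long progressiveDelayMs(long callNumber) {
        return Math.min(900L, 80L + callNumber * 3L);
    }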

For each scenario the experiment records:

  • successRate
  • retryAmplificationFactor = downstream_calls / total_requests
  • latency percentiles for each attempt (all_attempt_p95_ms, all_attempt_p99_ms)
  • circuit‑breaker and bulkhead rejections

The key metric, retryAmplificationFactor, is a guard against self‑deception – it tells you how many downstream calls you actually generated per user request.
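
Computed directly from the two counters above (the method name is illustrative):

    // downstream_calls / total_requests: 1.0 means no extra traffic;
    // anything above 1.0 is load that retries added on top of user demand.
    static double retryAmplificationFactor(long downstreamCalls, long totalRequests) {
        return (double) downstreamCalls / totalRequests;
    }

    // Example from the load-sensitive run below: 8 699 / 2 939 ≈ 2.96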


Trade‑offs revealed by the data

1. Transient failures – a clear win, but never a zero‑cost operation

  Policy                          Success rate    Amplification factor
  No‑retry, standard timeout      0.6529          1.00
  Immediate retry (3 attempts)    0.955           1.47

The retry policy improves the success rate dramatically, yet it forces the downstream to handle 47 % more traffic. If the downstream is already near its capacity, that extra load can be the tipping point for a cascade failure.

2. Load‑sensitive degradation – retries become an accelerant

  Policy                          Total user requests    Downstream calls    Amplification
  No‑retry, standard timeout      7 720                  7 720               1.00
  Immediate retry                 2 939                  8 699               2.96

Even though fewer user requests were completed, the retry policy generated almost three times as many downstream calls. The growing delay in the downstream (PROGRESSIVE_DEGRADATION) means each extra call makes the service slower for everyone, creating a feedback loop often called a retry storm.

3. Time‑out handling masks true downstream latency

The caller aborts an attempt after STANDARD_TIMEOUT = 260 ms. When a timeout occurs, the recorded latency for that attempt is capped at 260 ms, regardless of how long the downstream actually ran. Consequently, p95/p99 values for attempts often exactly equal the timeout, giving a false impression that the downstream is fast enough.
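
A small sketch of why the capped measurement happens; the 260 ms timeout matches the experiment, everything else is illustrative:

    import java.time.Duration;
    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.TimeUnit;

    class TimeoutCappedLatency {

        static final Duration STANDARD_TIMEOUT = Duration.ofMillis(260);

        static long observedLatencyMs(CompletableFuture<String> downstreamCall) {
            long started = System.nanoTime();
            try {
                downstreamCall.get(STANDARD_TIMEOUT.toMillis(), TimeUnit.MILLISECONDS);
            } catch (Exception timedOutOrFailed) {
                // The caller stops waiting here, but the downstream may keep working.
            }
            // Whatever the downstream is really doing, the recorded value can never
            // exceed ~260 ms, so attempt-level p95/p99 pile up exactly at the timeout.
            return Duration.ofNanos(System.nanoTime() - started).toMillis();
        }
    }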

In production, HTTP connections, database queries, or message‑queue operations may continue running after the client has given up. The experiment cannot fully model that residual work, but the gap between observed latency and actual work is a real risk.

4. Circuit‑breaker vs. bulkhead – visible rejections as a protective signal

In the progressive‑degradation run with a circuit breaker:

  • total_requests = 44 777
  • circuit_breaker_rejected = 44 718
  • downstream_calls = 198
  • Amplification ≈ 0.004

The breaker quickly stops sending traffic to the downstream, protecting it at the cost of many client‑side rejections. A bulkhead shows a similar pattern (many rejections, fewer downstream calls) but limits concurrency instead of opening a circuit.
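
For orientation, a minimal Resilience4j sketch of both guards; the thresholds are illustrative, not the values used in the experiment:

    import io.github.resilience4j.bulkhead.Bulkhead;
    import io.github.resilience4j.bulkhead.BulkheadConfig;
    import io.github.resilience4j.circuitbreaker.CircuitBreaker;
    import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;

    import java.time.Duration;

    class DownstreamGuards {

        // Open the circuit when half of the recent calls fail; rejected calls
        // surface as client-side errors instead of downstream traffic.
        static final CircuitBreaker CIRCUIT_BREAKER = CircuitBreaker.of("downstream",
                CircuitBreakerConfig.custom()
                        .slidingWindowSize(50)
                        .failureRateThreshold(50.0f)
                        .waitDurationInOpenState(Duration.ofSeconds(10))
                        .build());

        // Cap concurrency instead: at most 25 in-flight calls, no queueing.
        static final Bulkhead BULKHEAD = Bulkhead.of("downstream",
                BulkheadConfig.custom()
                        .maxConcurrentCalls(25)
                        .maxWaitDuration(Duration.ZERO)
                        .build());
    }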


When to use retries – a pragmatic checklist

  • Purely transient errors (e.g., brief network hiccups) – enable a modest retry count with jittered back‑off (sketched below) and monitor retryAmplificationFactor to confirm the downstream can absorb the extra load.
  • Load‑dependent downstream latency (e.g., a CPU‑bound service or a queue backlog) – prefer a circuit breaker or bulkhead over blind retries; if retries are necessary, cap the total number of attempts aggressively and combine them with rate‑limiting.
  • Strict latency SLAs (p95/p99) – track latency per user request rather than per attempt, and treat timeout‑capped percentiles as a warning that the downstream may be doing hidden work.
  • Budget‑constrained environments (cost‑per‑call, third‑party API quotas) – treat each retry as a monetary cost and include it in capacity planning and cost forecasting.
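
The jittered back‑off recommended in the first item is a small variation on the earlier retry sketch (the initial interval, multiplier, and randomization factor are illustrative values):

    import io.github.resilience4j.core.IntervalFunction;
    import io.github.resilience4j.retry.RetryConfig;

    class JitteredRetryPolicy {

        // Exponential back-off starting at 100 ms, doubling per attempt, with ±50 % jitter
        // so that many callers failing at the same moment do not retry in lockstep.
        static final RetryConfig CONFIG = RetryConfig.custom()
                .maxAttempts(3)
                .intervalFunction(IntervalFunction.ofExponentialRandomBackoff(100, 2.0, 0.5))
                .build();
    }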

Bottom line – retries are a budget, not a free upgrade

  • Retries always raise the amplification factor above 1.0.
  • In load‑sensitive services, that extra traffic can accelerate degradation and trigger a retry storm.
  • Timeout‑capped latency percentiles hide the real work happening downstream.
  • Circuit breakers and bulkheads provide a controlled rejection path that protects the downstream at the expense of client‑visible errors.

The experiment is a sandbox; real systems will have network jitter, connection‑pool dynamics, and residual work after cancellations. The next step for teams is to instrument retryAmplificationFactor in production, correlate it with downstream health metrics, and adjust retry policies accordingly.
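
One low-effort way to get that signal into production, assuming a Micrometer registry is already available (the metric names are made up for illustration):

    import io.micrometer.core.instrument.Counter;
    import io.micrometer.core.instrument.MeterRegistry;

    class AmplificationMetrics {

        private final Counter userRequests;
        private final Counter downstreamCalls;

        AmplificationMetrics(MeterRegistry registry) {
            this.userRequests = Counter.builder("user.requests.total").register(registry);
            this.downstreamCalls = Counter.builder("downstream.calls.total").register(registry);
        }

        void onUserRequest()    { userRequests.increment(); }
        void onDownstreamCall() { downstreamCalls.increment(); }

        // retryAmplificationFactor is then rate(downstream.calls.total) / rate(user.requests.total),
        // computed in the dashboard so counter resets on restart do not skew the ratio.
    }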


Further reading & resources

  • Resilience4j documentation – Retry and CircuitBreaker
  • k6 load‑testing tool – k6.io
  • Article on retry storms – "Understanding and Mitigating Retry Storms" (link not provided here)


The experiment repository is available at github.com/JuanTorchia/retry-resilience-experiment. Feel free to run the scenarios, tweak the parameters, and share the results.
