Rate limiter demo exposes Redis race condition and state gaps

A distributed rate limiter can fail in the places engineers tend to skim: Redis command boundaries, rejected-request accounting, per-process circuit breakers, and demos that simulate outcomes instead of running the algorithm.

Engineers who build a distributed rate limiter face a harder problem than choosing token bucket, fixed window, or sliding window. They have to protect shared state under concurrency, decide which requests consume quota, keep resilience state consistent across replicas, and prove that demos run the same logic the service runs.

The project at the center of this analysis used Redis for shared rate-limit state, FastAPI for the API, nginx for load balancing, Prometheus for metrics, and Jaeger for traces. The code aimed at a production-grade distributed limiter. A closer read found four gaps: a token bucket race condition, a sliding window retry penalty, a circuit breaker with per-process state, and a React demo that generated plausible results with a random formula.

Those flaws teach the useful part of the rate limiter problem. Interviewers ask for an algorithm. Production asks whether engineers preserve correctness across processes and across time.

The three algorithms solve different problems

A token bucket lets each client spend from a bucket that refills over time. Engineers use it when they want burst tolerance. A client that stays quiet can save capacity and spend several tokens at once.

A fixed window counts requests inside a clock-aligned interval, such as one minute. Developers favor it because Redis can handle the core path with INCR and EXPIRE. That cost profile looks attractive, but users can exploit the boundary. A client can send its quota at the end of one window and send another quota at the start of the next.

A sliding window tracks request timestamps, often in a Redis sorted set. Engineers trim old entries, count the entries that remain, and compare that count with the limit. This model gives stronger accuracy than a fixed window, but operators pay with more Redis work and more memory. Each accepted request leaves a timestamp behind.

Those trade-offs matter, but they do not decide correctness. The implementation decides correctness.

Redis atomicity ends at the command boundary

The README claimed Redis operations had no race conditions. That claim holds for a narrow fixed-window path because Redis runs a single INCR command as one atomic operation. It fails for a token bucket implementation that reads, computes, and writes across several network round trips.

In the flawed path, application code reads the token count and last refill timestamp with separate GET calls. Python code calculates the refill and decides whether the request can proceed. The service writes the new state with SET after that decision.

Two requests from the same client can arrive close together. Both handlers can read the same token count before either writes a new value. Both handlers can decide to allow the request. The code can spend one token twice because the application split one logical operation into several Redis commands.

Redis kept each command atomic. The engineers built a non-atomic sequence around those commands.

Teams fix this class of bug by moving the read-compute-write sequence into one server-side script. Redis documents that pattern through EVAL and Lua scripting. The application sends the token bucket logic to Redis, and Redis runs the check and state update as one unit. Engineers still need to test edge cases, but they remove the race between separate client round trips.

That fix also changes API behavior under load. Without the script, a benchmark can report higher throughput because the limiter lets too many requests through. With the script, the system may reject more requests and show higher Redis CPU. That result should comfort the team. The limiter has started enforcing the contract it advertised.

Rejected requests need a policy

The sliding window path exposed a second bug. The implementation added the current timestamp to the sorted set before it decided whether to allow the request. A blocked request still entered the window.

A user who retries after a rejection can punish themselves. Each rejected retry adds a fresh timestamp. The window stays full. The user waits longer than the configured policy suggests.

Engineers need to choose one of two policies. They can count only accepted requests, which means rejected requests do not consume quota. Or they can count all attempts, which discourages retry storms and abusive clients. Both choices can fit a product. The code should make the choice visible.

For a public API, many teams count accepted requests against the main quota and track rejected attempts in a separate abuse signal. That split gives normal clients a clean recovery path after a rejection. It also gives operators a way to detect clients that ignore 429 responses and retry without backoff.

A clear API response helps clients cooperate. The service should return 429 Too Many Requests with headers such as Retry-After, X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset. The HTTP semantics specification defines 429; the limit headers need a documented house contract unless the team adopts a published convention.

Distributed systems need shared failure state

The architecture ran two FastAPI replicas behind nginx. Redis gave the rate limiter shared request state, so either replica could enforce the same client quota. The circuit breaker did not share state.

Each Python process stored breaker state in local instance fields: failure count, success count, and open or closed status. If Redis started failing, one replica could open its breaker after five failures. Another replica could keep sending traffic to Redis because nginx routed a different mix of requests to that process.

Users would see inconsistent behavior during the same backend incident. One request might fail fast through an open circuit. The next request might hit another replica and wait on Redis. Operators would see noisy metrics and a harder incident.

Engineers have several options, and each one costs something.

They can leave breaker state local and accept per-replica behavior. That choice keeps the breaker independent from Redis, which helps during a Redis outage. It also means the system can send uneven pressure during partial failure.

They can store breaker state in Redis. That gives all replicas the same view, but it puts the breaker’s brain inside the dependency it protects. A Redis outage can damage both the service path and the protection mechanism.

They can move breaker decisions to a sidecar, gateway, or service mesh. Envoy, for example, can enforce outlier detection and circuit-breaking policies closer to the network edge. That approach adds operational weight, but it gives teams one place to reason about failure policy across replicas.

The right answer depends on blast radius and team maturity. A demo can tolerate local breaker state. A production API that handles paid traffic should document the choice and alert on per-replica divergence.

A schema does not make a feature live

The project included SQLAlchemy models for rate-limit rules and historical metrics in PostgreSQL. Those tables looked like a path toward live rule management. The API still read from an in-memory dictionary that reset to hardcoded defaults after each restart.

That gap matters because readers treat a schema as evidence. Engineers see database tables, migrations, and model classes, then assume the request path uses them. A README checkbox can widen that misunderstanding.

A production rule path needs more than tables. Engineers have to define rule precedence, cache invalidation, hot reload semantics, audit history, and rollback behavior. They must decide whether a rule update takes effect on all replicas within milliseconds or after the next refresh interval.

Redis can help with fast reads, while PostgreSQL can hold durable policy records. A common design stores canonical rules in PostgreSQL, publishes rule-change events, and lets each API replica refresh a local cache. The team then measures propagation delay and exposes the active rule version in metrics.

That pattern gives operators an answer during incidents. They can ask which rule version a replica enforced, which user changed it, and whether the change reached all processes.

The demo simulated confidence

The React dashboard had an interactive page that claimed to show the three algorithms under different request volumes. The page did not run those algorithms. It used Math.random() against a block-rate formula that produced numbers that looked plausible.

A user could switch from token bucket to sliding window and see the chart change. The chart changed because the demo adjusted display values such as latency, not because it ran different limiter logic.

That failure carries more risk than a cosmetic bug. Demos shape design discussions. A team may choose an algorithm after watching a chart that never exercised the algorithm. A reviewer may trust a behavior because the UI looked consistent.

The fix replaced the random formula with client-side versions of all three algorithms: a refilling token count, a clock-aligned counter, and a timestamp array for sliding window. That change made the visual match the selected algorithm.

Teams can go further. They can run the demo against the real API and show both simulated and server-observed results. They can label synthetic mode. They can include a trace link for each demo request through Jaeger, so a reviewer can connect a chart point to backend code.

MongoDB Atlas image

The API contract matters as much as the algorithm

A rate limiter sits on an API boundary, so clients need predictable responses. Engineers should define the contract before they optimize Redis calls.

A clean contract answers several questions:

Which identity does the limiter use: API key, user ID, IP address, organization ID, or route-specific key?
Which endpoints share a quota, and which endpoints have separate budgets?
Which rejected requests count against abuse detection?
Which headers tell clients how to recover?
Which consistency guarantee should clients expect across regions?

Multi-region deployments raise another set of trade-offs. A single Redis primary gives a clear source of truth, but it adds latency for distant users and creates a regional dependency. Per-region counters reduce latency, but a client can exceed a global quota by spreading traffic across regions. Engineers can use approximate counters or periodic reconciliation, but then they must accept temporary overage.

Strong consistency costs latency. Lower latency costs precision. Product requirements should pick the side that fails in the least damaging way.

Observability should prove enforcement

Prometheus and Jaeger help only when engineers record the right facts. A useful limiter dashboard should show allowed requests, blocked requests, Redis latency, Lua script errors, per-replica breaker state, rule version, and quota key cardinality.

Cardinality deserves care. If engineers label metrics with raw user IDs or API keys, Prometheus can suffer. The service can export aggregate counters by route, algorithm, and decision, while logs or traces carry sampled key-level detail for investigations.

Traces should mark the limiter decision point. A span can include algorithm, rule ID, remaining quota, Redis command count, and the final allow or block decision. During an incident, that span tells an engineer whether the service rejected a client because of policy, Redis failure, stale rules, or local breaker state.

The production checklist changes after the code review

After this review, a production-minded team would make several changes before it trusted the limiter with customer traffic.

First, the team would move token bucket logic into a Redis Lua script and test concurrent requests against one quota key. Unit tests cannot prove this alone. The team needs a concurrency test that sends many requests at the same key and asserts that the allowed count matches the configured limit.

Second, the team would define rejected-request accounting for sliding windows. The code should either add timestamps only after allow decisions or label attempt tracking as an abuse-control feature.

Third, operators would choose a circuit breaker scope. If they accept per-process breakers, they should expose breaker state by replica. If they need shared behavior, they should move the policy to a shared control point or gateway.

Fourth, engineers would connect PostgreSQL-backed rules to the live path before they market database-backed configuration. Until then, the README should say the schema exists for future rule persistence.

Fifth, the demo should run real algorithms or call the service. A chart that invents outcomes can still help with layout, but it should carry a label that says synthetic.

The lesson for system design interviews

Rate limiter interviews often start with algorithm selection. Strong candidates move from that answer into state, concurrency, failure modes, and client contracts.

A token bucket can work well and still fail if engineers split the state update across client round trips. A sliding window can enforce precise limits and still trap clients if rejected requests refill the window. A distributed service can share quota state and still make failure decisions per process. A demo can look convincing and still run no limiter code.

The useful question for an engineer is concrete: if two replicas receive requests for the same key at the same time, who decides which request wins? The answer has to name the shared state, the atomic boundary, and the behavior clients see after rejection.

#Redis #rate-limiting #distributed systems #FastAPI #system-design