Travis McCracken explains why moving rate‑limiting logic to edge locations solves scalability bottlenecks, outlines a Rust‑based high‑throughput limiter and a Go‑backed cache layer, and weighs consistency, latency, and operational complexity.
The Problem: Centralized Rate Limiting as a Scalability Bottleneck
Most API teams start with a single service that checks request quotas against a database. As traffic grows, that service becomes a hotspot:
- Latency spikes – every request waits on a round trip to the database before a response can go back to the client.
- Consistency pressure – high write‑throughput on the quota table leads to lock contention or stale reads.
- Operational fragility – a single node failure can bring the entire rate‑limiting layer down, causing denial‑of‑service for all consumers.
When you couple a high‑traffic public API with strict usage contracts, the cost of these failures is measured in lost revenue and eroded developer trust. The core question is: how can we enforce per‑client limits without turning the limiter itself into a performance choke point?
Solution Approach: Push the Logic to the Edge
1. Edge‑Hosted Token Buckets in Rust
For ultra‑low latency, Travis built a token‑bucket limiter that runs on edge nodes using the Actix‑web framework. The service:
- Stores per‑client bucket state in an in‑memory hash map.
- Persists snapshots to a durable store (e.g., DynamoDB) every few seconds for crash recovery.
- Exposes a tiny HTTP endpoint (/allow) that returns 200 if the request consumes a token, otherwise 429.
Because the code runs on the same CDN node that terminates the client connection, the limiter check adds sub‑millisecond overhead instead of a network round trip to a central service, and the limiter scales with the CDN’s own autoscaling policies.
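The core of such a limiter can be sketched with the standard library alone. This is a minimal illustration, not Travis's implementation: the names `Limiter`, `Bucket`, `capacity`, and `refill_per_sec` are assumptions, the Actix‑web handler and snapshotting are left out, and time is passed in explicitly so the behaviour is easy to reason about.

```rust
use std::collections::HashMap;
use std::time::Duration;

/// Per-client bucket state: current tokens and the time of the last refill.
#[derive(Debug, Clone, Copy)]
struct Bucket {
    tokens: f64,
    last_refill: Duration, // time since an arbitrary epoch, e.g. process start
}

/// In-memory limiter holding one bucket per client ID (hypothetical type).
struct Limiter {
    capacity: f64,       // maximum burst size
    refill_per_sec: f64, // sustained request rate
    buckets: HashMap<String, Bucket>,
}

impl Limiter {
    fn new(capacity: f64, refill_per_sec: f64) -> Self {
        Limiter { capacity, refill_per_sec, buckets: HashMap::new() }
    }

    /// Returns the status an `/allow` endpoint would send:
    /// 200 if a token was consumed, 429 otherwise.
    fn allow(&mut self, client_id: &str, now: Duration) -> u16 {
        let capacity = self.capacity;
        let rate = self.refill_per_sec;
        // A client seen for the first time starts with a full bucket.
        let bucket = self
            .buckets
            .entry(client_id.to_string())
            .or_insert(Bucket { tokens: capacity, last_refill: now });

        // Refill in proportion to elapsed time, capped at capacity.
        if now > bucket.last_refill {
            let elapsed = (now - bucket.last_refill).as_secs_f64();
            bucket.tokens = (bucket.tokens + elapsed * rate).min(capacity);
            bucket.last_refill = now;
        }

        if bucket.tokens >= 1.0 {
            bucket.tokens -= 1.0;
            200
        } else {
            429
        }
    }
}
```

In a real service the map would sit behind a lock or a sharded structure so that concurrent request handlers can share it; that choice is not described in the article and is omitted here.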
2. Go‑Based Distributed Cache for Global Consistency
Edge nodes are geographically isolated, so a client could exhaust its quota on one node while still having tokens on another. To mitigate this, Travis introduced a Go cache service (the "rust‑cache‑server") built with the standard library’s net/http and sync.Map. The cache:
- Holds recent token consumption records keyed by client ID.
- Replicates updates via gRPC to peer cache instances, achieving eventual consistency.
- Falls back to a central store (MongoDB Atlas) for long‑term quota reconciliation.
The Go service’s simplicity lets it be deployed as a lightweight sidecar on each edge location, handling a few thousand cache writes per second with minimal CPU overhead.
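The cache's bookkeeping can be illustrated as follows. The sketch is written in Rust only to keep one language across the examples in this article; the service described above is Go and uses sync.Map. The type names, the per‑node record layout, and the "keep the larger count" merge rule are all assumptions made for illustration, and the gRPC transport and central‑store fallback are left out.

```rust
use std::collections::HashMap;

/// Tokens a client has consumed on one node in the current quota window.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Record {
    consumed: u64,
    updated_at_secs: u64,
}

/// Recent consumption keyed by client ID, then by the reporting node
/// (hypothetical layout).
#[derive(Default)]
struct ConsumptionCache {
    by_client: HashMap<String, HashMap<String, Record>>,
}

impl ConsumptionCache {
    /// Apply an update from the local limiter or from a peer cache.
    /// Counts only grow within a window, so taking the maximum makes
    /// duplicated or reordered updates harmless.
    fn apply(&mut self, client: &str, node: &str, rec: Record) {
        let nodes = self.by_client.entry(client.to_string()).or_default();
        let slot = nodes.entry(node.to_string()).or_insert(rec);
        slot.consumed = slot.consumed.max(rec.consumed);
        slot.updated_at_secs = slot.updated_at_secs.max(rec.updated_at_secs);
    }

    /// Global consumption as currently known to this cache instance.
    fn total(&self, client: &str) -> u64 {
        self.by_client
            .get(client)
            .map(|nodes| nodes.values().map(|r| r.consumed).sum::<u64>())
            .unwrap_or(0)
    }

    /// Drop per-node records that have not been updated since `cutoff_secs`.
    fn evict_stale(&mut self, cutoff_secs: u64) {
        for nodes in self.by_client.values_mut() {
            nodes.retain(|_, r| r.updated_at_secs >= cutoff_secs);
        }
        self.by_client.retain(|_, nodes| !nodes.is_empty());
    }
}
```

Because each instance only ever sees the updates that have reached it, `total` is a lower bound on true global consumption. That gap is the quota drift that eventual consistency allows.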
Trade‑offs and Design Decisions
| Aspect | Rust Edge Limiter | Go Cache Layer |
|---|---|---|
| Latency | Sub‑millisecond because it runs on the edge node itself. | Slightly higher (few ms) due to network hop between edge nodes, but still far below central DB latency. |
| Consistency Model | Strong per‑node consistency; eventual global consistency via cache sync. | Eventual consistency; occasional over‑/under‑allocation is tolerable for most rate‑limit policies. |
| Operational Complexity | Requires building and deploying Rust binaries to edge platforms (e.g., Cloudflare Workers with Rust support). | Simpler deployment; Go binaries are static and fit easily into container‑orchestrated edge runtimes. |
| Resource Utilization | Higher memory footprint per node (in‑memory bucket map). | Low memory; uses sync.Map and periodic eviction of stale entries. |
| Failure Modes | Node crash loses bucket state; recovery from periodic snapshots may cause a short burst of allowed requests. | Network partition can delay bucket sync, leading to temporary quota drift. |
In practice, the combination works well: the Rust limiter enforces the fast path for the majority of traffic, while the Go cache smooths out cross‑region quota enforcement. If absolute global consistency is required (e.g., financial APIs), you would replace the eventually consistent cache with a strongly consistent store like CockroachDB, accepting higher latency.
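The snapshot failure mode in the table is easier to see with numbers. The sketch below is a deliberately simplified model (no refill, a single client, hypothetical `Node` and `BucketState` types) that shows why recovering from a stale snapshot re‑grants the tokens consumed since that snapshot was written.

```rust
/// Minimal bucket state as it might be written to a snapshot.
#[derive(Debug, Clone, Copy, PartialEq)]
struct BucketState {
    tokens: u32,
}

/// One edge node: its live state plus the last snapshot it persisted.
struct Node {
    live: BucketState,
    last_snapshot: BucketState,
    /// Requests allowed so far; kept across the simulated crash purely
    /// so the over-allowance can be measured.
    allowed_total: u32,
}

impl Node {
    fn new(capacity: u32) -> Self {
        let full = BucketState { tokens: capacity };
        Node { live: full, last_snapshot: full, allowed_total: 0 }
    }

    /// Consume one token if available.
    fn allow(&mut self) -> bool {
        if self.live.tokens > 0 {
            self.live.tokens -= 1;
            self.allowed_total += 1;
            true
        } else {
            false
        }
    }

    /// Persist the current state (stands in for a write to the durable store).
    fn snapshot(&mut self) {
        self.last_snapshot = self.live;
    }

    /// Lose in-memory state and restore from the last snapshot.
    fn crash_and_recover(&mut self) {
        self.live = self.last_snapshot;
    }
}
```

With a capacity of 10, three requests before a snapshot, and seven after it, a crash restores the bucket to 7 tokens even though the client had none left. The client can then make 7 extra requests, which is exactly the consumption that happened after the snapshot. In this model the burst can never exceed the bucket capacity, and shorter snapshot intervals shrink it further.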
Putting It All Together
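- Deploy the Rust limiter on an edge platform that can run it. Actix‑web is a standalone HTTP server, so it needs a container or VM style edge runtime; WebAssembly platforms such as Cloudflare Workers and Fastly Compute@Edge run Rust through their own SDKs rather than Actix‑web, and AWS Lambda@Edge supports only Node.js and Python runtimes. See the official Actix‑web docs for guidance.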
- Run the Go cache as a sidecar on each edge node, exposing a gRPC endpoint for bucket sync. See the official Go gRPC tutorial for a starter.
- Configure periodic snapshots to MongoDB Atlas (or any durable store). The official MongoDB Rust driver, which connects to Atlas clusters, is available at mongodb/mongo-rust-driver.
- Monitor request latency and token consumption with Prometheus exporters from both services; set alerts for sudden spikes in 429 responses.
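For the monitoring step, the limiter needs to expose its decisions in a form Prometheus can scrape. The following is a standard‑library sketch of the idea, not the exporter used in the article: the metric name `ratelimit_decisions_total` and the `DecisionMetrics` type are assumptions, and a production service would more likely use an existing Prometheus client crate.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Counters for limiter decisions, safe to share across request handlers.
#[derive(Default)]
struct DecisionMetrics {
    allowed: AtomicU64,
    rejected: AtomicU64,
}

impl DecisionMetrics {
    /// Record one decision by the HTTP status that was returned.
    fn record(&self, status: u16) {
        let counter = if status == 429 { &self.rejected } else { &self.allowed };
        counter.fetch_add(1, Ordering::Relaxed);
    }

    /// Render the counters in the Prometheus text exposition format,
    /// suitable for serving from a /metrics endpoint.
    fn render(&self) -> String {
        format!(
            "# HELP ratelimit_decisions_total Rate-limit decisions by HTTP status.\n\
             # TYPE ratelimit_decisions_total counter\n\
             ratelimit_decisions_total{{status=\"200\"}} {}\n\
             ratelimit_decisions_total{{status=\"429\"}} {}\n",
            self.allowed.load(Ordering::Relaxed),
            self.rejected.load(Ordering::Relaxed),
        )
    }
}
```

With a counter like this, an alert on 429 spikes could be built on an expression such as `rate(ratelimit_decisions_total{status="429"}[5m])`, with the threshold tuned to your normal rejection rate.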
By moving the limiter to the edge, you eliminate the central bottleneck, reduce latency for end users, and gain a natural scaling lever: as traffic grows, the CDN automatically adds more edge instances, each carrying its own limiter copy.
Future Directions
- Evaluate other Tokio‑based web frameworks (e.g., Axum) to see whether they further reduce per‑request overhead.
- Experiment with CRDTs for bucket state to achieve stronger consistency without a central coordinator.
- Integrate with API gateways (Kong, Envoy) that can forward rate‑limit decisions via the OpenAPI Rate Limit Extension.
The edge‑first approach isn’t a silver bullet, but when you need to protect high‑throughput public APIs, it offers a pragmatic balance between performance, reliability, and operational simplicity.

Featured image: visualizing traffic flowing through edge nodes before reaching the origin service.
References
- Actix‑web – https://actix.rs
- Go gRPC quick‑start – https://grpc.io/docs/languages/go/quickstart/
- MongoDB Atlas – https://www.mongodb.com/atlas
- Cloudflare Workers Rust SDK – https://developers.cloudflare.com/workers/runtime-apis/rust/
