Rate Limiting at the Edge: A Backend Engineer’s Playbook with Rust and Go

Travis McCracken explains why moving rate‑limiting logic to edge locations solves scalability bottlenecks, outlines a Rust‑based high‑throughput limiter and a Go‑backed cache layer, and weighs consistency, latency, and operational complexity.

The Problem: Centralized Rate Limiting as a Scalability Bottleneck

Most API teams start with a single service that checks request quotas against a database. As traffic grows, that service becomes a hotspot:

Latency spikes – every request incurs a round trip to the database before a response can go back to the client.
Consistency pressure – high write‑throughput on the quota table leads to lock contention or stale reads.
Operational fragility – a single node failure can bring the entire rate‑limiting layer down, causing denial‑of‑service for all consumers.

When you couple a high‑traffic public API with strict usage contracts, the cost of these failures is measured in lost revenue and eroded developer trust. The core question is: how can we enforce per‑client limits without turning the limiter itself into a performance choke point?

Solution Approach: Push the Logic to the Edge

1. Edge‑Hosted Token Buckets in Rust

For ultra‑low latency, Travis built a token‑bucket limiter that runs on edge nodes using the Actix‑web framework. The service:

Stores per‑client bucket state in an in‑memory hash map.
Persists snapshots to a durable store (e.g., DynamoDB) every few seconds for crash recovery.
Exposes a tiny HTTP endpoint (/allow) that returns 200 if the request consumes a token, otherwise 429.
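
The bucket logic behind these three points can be sketched without the web framework. This is a minimal illustration under stated assumptions, not Travis's actual code: the names `Bucket`, `Limiter`, `allow_at`, and `status_for`, as well as the `capacity` and `refill_per_sec` parameters, are hypothetical, and the Actix‑web wiring and snapshot persistence are left out.

```rust
use std::collections::HashMap;
use std::time::Instant;

/// Per-client bucket state (hypothetical layout, not taken from the article).
#[derive(Debug, Clone)]
struct Bucket {
    tokens: f64,
    last_refill: Instant,
}

/// In-memory token-bucket limiter keyed by client ID.
struct Limiter {
    capacity: f64,       // maximum burst size (assumed parameter)
    refill_per_sec: f64, // sustained rate (assumed parameter)
    buckets: HashMap<String, Bucket>,
}

impl Limiter {
    fn new(capacity: f64, refill_per_sec: f64) -> Self {
        Limiter { capacity, refill_per_sec, buckets: HashMap::new() }
    }

    /// Returns true if `client` may proceed at time `now`, consuming one token.
    /// `now` is a parameter so the logic can be exercised without sleeping.
    fn allow_at(&mut self, client: &str, now: Instant) -> bool {
        let capacity = self.capacity;
        let rate = self.refill_per_sec;
        // New clients start with a full bucket.
        let bucket = self.buckets.entry(client.to_string()).or_insert(Bucket {
            tokens: capacity,
            last_refill: now,
        });
        // Refill in proportion to elapsed time, capped at capacity.
        let elapsed = now.saturating_duration_since(bucket.last_refill).as_secs_f64();
        bucket.tokens = (bucket.tokens + elapsed * rate).min(capacity);
        bucket.last_refill = now;
        if bucket.tokens >= 1.0 {
            bucket.tokens -= 1.0;
            true
        } else {
            false
        }
    }

    /// Convenience wrapper that uses the current time.
    fn allow(&mut self, client: &str) -> bool {
        self.allow_at(client, Instant::now())
    }
}

/// Status code an /allow handler would return for a decision.
fn status_for(allowed: bool) -> u16 {
    if allowed { 200 } else { 429 }
}
```

A handler for /allow would call `allow` with the client ID from the request and map the result through `status_for`. In a multi‑threaded server the `Limiter` would also need to sit behind a lock or a sharded map, which this sketch does not show.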

Because the code runs on the same CDN node that terminates the client connection, the limit check adds well under a millisecond, and the limiter scales with the CDN’s own autoscaling policies.

2. Go‑Based Distributed Cache for Global Consistency

Edge nodes are geographically isolated, so a client could exhaust its quota on one node while still having tokens on another. To mitigate this, Travis introduced a Go cache service (the "rust‑cache‑server") built with the standard library’s net/http and sync.Map. The cache:

Holds recent token consumption records keyed by client ID.
Replicates updates via gRPC to peer cache instances, achieving eventual consistency.
Falls back to a central store (MongoDB Atlas) for long‑term quota reconciliation.
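
The cache's bookkeeping is independent of the implementation language. The sketch below shows it in Rust (the limiter's language) rather than Go, so it illustrates the idea and is not a port of the service itself. The record layout, the `Update` message, the delta‑based `apply` method, and the `ttl` parameter are all assumptions made for illustration; gRPC replication and the central‑store fallback are omitted.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// One client's recent consumption (hypothetical record layout).
#[derive(Debug, Clone)]
struct Consumption {
    tokens_used: u64,
    last_seen: Instant,
}

/// An update as it might arrive from the local limiter or from a peer.
struct Update {
    client_id: String,
    tokens_used: u64,
}

/// Recent consumption records keyed by client ID.
struct ConsumptionCache {
    ttl: Duration, // how long an untouched record is kept (assumed parameter)
    records: HashMap<String, Consumption>,
}

impl ConsumptionCache {
    fn new(ttl: Duration) -> Self {
        ConsumptionCache { ttl, records: HashMap::new() }
    }

    /// Adds the delta carried by a local or replicated update.
    fn apply(&mut self, update: &Update, now: Instant) {
        let rec = self
            .records
            .entry(update.client_id.clone())
            .or_insert(Consumption { tokens_used: 0, last_seen: now });
        rec.tokens_used = rec.tokens_used.saturating_add(update.tokens_used);
        rec.last_seen = now;
    }

    /// Tokens this node knows the client has used; 0 for unknown clients.
    fn used(&self, client_id: &str) -> u64 {
        self.records.get(client_id).map_or(0, |r| r.tokens_used)
    }

    /// Drops records not touched within `ttl`; returns how many were removed.
    fn evict_stale(&mut self, now: Instant) -> usize {
        let ttl = self.ttl;
        let before = self.records.len();
        self.records
            .retain(|_, r| now.saturating_duration_since(r.last_seen) < ttl);
        before - self.records.len()
    }
}
```

Because `apply` adds deltas, a peer that delivers the same update twice would double‑count it. A real replication path would need deduplication or an idempotent merge, which is one reason occasional quota drift is expected under this design.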

The Go service’s simplicity lets it be deployed as a lightweight sidecar on each edge location, handling a few thousand cache writes per second with minimal CPU overhead.

Trade‑offs and Design Decisions

Aspect	Rust Edge Limiter	Go Cache Layer
Latency	Sub‑millisecond because it runs on the edge node itself.	Slightly higher (few ms) due to network hop between edge nodes, but still far below central DB latency.
Consistency Model	Strong per‑node consistency; eventual global consistency via cache sync.	Eventual consistency; occasional over‑/under‑allocation is tolerable for most rate‑limit policies.
Operational Complexity	Requires building and deploying Rust binaries to edge platforms; Wasm‑based runtimes such as Cloudflare Workers cannot host an Actix‑web server directly, so the bucket logic would need porting to the platform's Rust SDK.	Simpler deployment; Go binaries are static and fit easily into container‑orchestrated edge runtimes.
Resource Utilization	Higher memory footprint per node (in‑memory bucket map).	Low memory; uses `sync.Map` and periodic eviction of stale entries.
Failure Modes	Node crash loses bucket state; recovery from periodic snapshots may cause a short burst of allowed requests.	Network partition can delay bucket sync, leading to temporary quota drift.
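
The snapshot failure mode in the last row can be made concrete with a small sketch. It assumes a hypothetical one‑line‑per‑client text format and whole‑number token counts; the article does not specify a snapshot format, and the functions `snapshot` and `restore` are illustrative names only.

```rust
use std::collections::HashMap;

/// Serializes bucket levels as one "client_id<TAB>tokens" line per client.
/// (Hypothetical format chosen for illustration.)
fn snapshot(buckets: &HashMap<String, u32>) -> String {
    let mut lines: Vec<String> = buckets
        .iter()
        .map(|(id, tokens)| format!("{}\t{}", id, tokens))
        .collect();
    lines.sort(); // deterministic output regardless of map iteration order
    lines.join("\n")
}

/// Rebuilds the bucket map from a snapshot, skipping malformed lines.
fn restore(data: &str) -> HashMap<String, u32> {
    data.lines()
        .filter_map(|line| {
            let (id, tokens) = line.split_once('\t')?;
            Some((id.to_string(), tokens.parse::<u32>().ok()?))
        })
        .collect()
}
```

If a client holds 5 tokens when the snapshot is taken, spends all of them, and the node then crashes, `restore` hands the client 5 tokens again. Those are the extra allowed requests, and the snapshot interval bounds how large that burst can get.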

In practice, the combination works well: the Rust limiter enforces the fast path for the majority of traffic, while the Go cache smooths out cross‑region quota enforcement. If absolute global consistency is required (e.g., financial APIs), you would replace the eventually consistent cache with a strongly consistent store like CockroachDB, accepting higher latency.

Putting It All Together

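Deploy the Rust limiter on your edge provider. Actix‑web needs a platform that can run a native server process, such as a container or VM at the edge; Wasm‑based runtimes like Cloudflare Workers and Fastly Compute@Edge would require porting the bucket logic to their Rust SDKs, and AWS Lambda@Edge supports only Node.js and Python. Use the official Actix‑web docs for guidance.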
Run the Go cache as a sidecar on each edge node, exposing a gRPC endpoint for bucket sync. See the official Go gRPC tutorial for a starter.
Configure periodic snapshots to a durable store such as DynamoDB or MongoDB Atlas. For Atlas, the official MongoDB Rust driver is available at mongodb/mongo-rust-driver.
Monitor request latency and token consumption with Prometheus exporters from both services; set alerts for sudden spikes in 429 responses.

By moving the limiter to the edge, you eliminate the central bottleneck, reduce latency for end users, and gain a natural scaling lever: as traffic grows, the CDN automatically adds more edge instances, each carrying its own limiter copy.

Future Directions

Evaluate Axum, a web framework built on the Tokio async runtime, as an alternative to Actix‑web, and measure whether it reduces per‑request overhead.
Experiment with CRDTs for bucket state so that replicas converge deterministically without a central coordinator.
Integrate with API gateways such as Kong or Envoy; Envoy, for example, can delegate rate‑limit decisions to an external gRPC rate limit service.
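
To make the CRDT idea above concrete, here is a minimal sketch of a grow‑only counter (G‑Counter) that tracks tokens consumed per edge node. It is an illustration under assumptions, not part of the system described: the `GCounter` type, the node names, and the `remaining` helper are hypothetical, and the counter models consumption within a single fixed quota window rather than a refilling bucket.

```rust
use std::collections::HashMap;

/// Grow-only counter: each node increments only its own slot.
#[derive(Debug, Clone, Default, PartialEq)]
struct GCounter {
    per_node: HashMap<String, u64>,
}

impl GCounter {
    /// Records tokens consumed on `node` (called only by that node).
    fn record(&mut self, node: &str, tokens: u64) {
        *self.per_node.entry(node.to_string()).or_insert(0) += tokens;
    }

    /// Merges another replica's state by taking the per-node maximum.
    /// The merge is commutative, associative, and idempotent, so replicas
    /// converge no matter how often or in what order states are exchanged.
    fn merge(&mut self, other: &GCounter) {
        for (node, &count) in &other.per_node {
            let slot = self.per_node.entry(node.clone()).or_insert(0);
            *slot = (*slot).max(count);
        }
    }

    /// Total consumption across all nodes seen so far.
    fn total(&self) -> u64 {
        self.per_node.values().sum()
    }
}

/// Tokens left in the current window for a client with the given quota.
fn remaining(quota: u64, used: &GCounter) -> u64 {
    quota.saturating_sub(used.total())
}
```

Two replicas that each record local consumption and then exchange state in either order end up with identical counters, and repeating a merge changes nothing. Convergence does not prevent temporary over‑allocation: between merges, each node still sees only part of the total.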

The edge‑first approach isn’t a silver bullet, but when you need to protect high‑throughput public APIs, it offers a pragmatic balance between performance, reliability, and operational simplicity.