What Building Ferrox Reveals About API Gateway Design

Backend Reporter

A Rust API gateway is less about proxying HTTP and more about deciding where failure, latency, identity, and shared policy should live.

Problem

Most backend systems do not start with an API gateway. They start with one service, a few handlers, and a routing table that fits in a developer's head. That phase is productive because every request path is obvious. A client calls the service, the service authenticates, reads or writes data, and returns a response.

The shape changes once the system adds more services. Authentication logic appears in several codebases. Rate limiting gets copied into middleware stacks that drift over time. Logs use different fields. Metrics become inconsistent. One service retries a failing dependency aggressively while another gives up too early. A temporary upstream outage can turn into a full incident because every caller keeps sending traffic into a dependency that is already failing.

That is the pressure behind Ferrox, a self-hosted API gateway written in Rust. The project sits in front of backend services and centralizes dynamic routing, JWT and API key validation, Redis-backed rate limiting, circuit breaking, response caching, request logging, Prometheus metrics, and live observability.

This is a familiar pattern, but it is still worth examining because gateways are where distributed systems stop being diagrams and start becoming policy. A gateway is not just a reverse proxy. It becomes the first place where a system decides who can call what, how much traffic is acceptable, which failures should be isolated, and which responses can be reused.

Solution Approach

Ferrox follows the classic gateway model: clients send requests to one front door, and the gateway routes those requests to upstream services. The immediate benefit is operational consistency. Instead of each service implementing authentication, request logging, rate limiting, and failure handling independently, those concerns move into one shared layer.

That trade is attractive because cross-cutting behavior is easy to get slightly wrong in each service. Rate limits may use different windows. Authentication failures may produce inconsistent status codes. Some services may log request IDs while others do not. These differences look small during development, then become expensive during incidents.

The Rust choice is practical for this workload. A gateway runs on the critical path of every request, so any latency it adds becomes user-visible latency. Rust's memory safety, absence of garbage-collection pauses, and strong concurrency support make it a reasonable fit for infrastructure that should behave predictably under load. Ferrox uses Actix Web for HTTP serving, reqwest for forwarding, SQLx with PostgreSQL for persisted configuration and logs, Redis for shared counters and caching, Prometheus for metrics, tracing for structured logs, and thiserror for application errors.

The interesting part is not that these components exist. The interesting part is how they compose under failure.

Middleware Order Is A System Design Decision

A gateway's middleware order is not a refactoring detail. It defines cost, security posture, and failure behavior.

Ferrox puts cheap rejection paths early: request logging, rate limiting, authentication, route lookup, circuit breaker checks, cache lookup, request transformation, proxy forwarding, cache storage, then final metrics and logs.

That order reflects an important principle: reject bad or excessive traffic before spending expensive resources. A gateway under abusive traffic should not run expensive authentication flows, hit databases, or open upstream connections for requests that already exceed a limit. Rate limiting before authentication can be a good choice when the rate key is derived from the IP address or another cheap identifier. If the limit must be keyed on authenticated user identity, authentication has to run first, and that cost should be made explicit.
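To make the ordering concrete, here is a minimal, std-only Rust sketch of a three-stage pipeline. It is not Ferrox's code. `Request`, `Gateway`, and `Rejection` are hypothetical types, and an in-memory counter stands in for Redis. The point is that each `?` returns early, so rejected traffic never reaches the later, costlier stages.

```rust
use std::collections::HashMap;

// Hypothetical request shape; Ferrox's real types are not shown in the article.
#[derive(Debug)]
struct Request {
    client_ip: String,
    api_key: Option<String>,
    path: String,
}

#[derive(Debug, PartialEq)]
enum Rejection {
    TooManyRequests, // would map to 429
    Unauthorized,    // would map to 401
    NotFound,        // would map to 404
}

struct Gateway {
    counts: HashMap<String, u32>,  // in-memory stand-in for Redis counters
    limit: u32,
    valid_keys: Vec<String>,
    routes: Vec<(String, String)>, // (path prefix, upstream base URL), checked in order
}

impl Gateway {
    // Stage 1: cheapest check, keyed only on the client IP.
    fn rate_limit(&mut self, req: &Request) -> Result<(), Rejection> {
        let n = self.counts.entry(req.client_ip.clone()).or_insert(0);
        *n += 1;
        if *n > self.limit {
            Err(Rejection::TooManyRequests)
        } else {
            Ok(())
        }
    }

    // Stage 2: credential check, reached only by in-limit traffic.
    fn authenticate(&self, req: &Request) -> Result<(), Rejection> {
        match &req.api_key {
            Some(k) if self.valid_keys.contains(k) => Ok(()),
            _ => Err(Rejection::Unauthorized),
        }
    }

    // Stage 3: route lookup, reached only by authenticated traffic.
    fn route(&self, req: &Request) -> Result<String, Rejection> {
        self.routes
            .iter()
            .find(|(prefix, _)| req.path.starts_with(prefix.as_str()))
            .map(|(_, upstream)| format!("{}{}", upstream, req.path))
            .ok_or(Rejection::NotFound)
    }

    // Each `?` returns early, so a rejected request never reaches later stages.
    fn handle(&mut self, req: &Request) -> Result<String, Rejection> {
        self.rate_limit(req)?;
        self.authenticate(req)?;
        self.route(req)
    }
}

fn main() {
    let mut gw = Gateway {
        counts: HashMap::new(),
        limit: 1,
        valid_keys: vec!["k1".to_string()],
        routes: vec![("/users".to_string(), "http://users:8080".to_string())],
    };
    let req = Request {
        client_ip: "10.0.0.1".into(),
        api_key: Some("k1".into()),
        path: "/users/42".into(),
    };
    println!("{:?}", gw.handle(&req)); // Ok("http://users:8080/users/42")
    println!("{:?}", gw.handle(&req)); // Err(TooManyRequests)
}
```

Swapping the first two calls in `handle` would model the identity-keyed variant. Every over-limit request would then pay for authentication before being rejected.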

Route lookup also has consistency implications. If routes are stored in PostgreSQL, then gateway instances need a coherent view of configuration. The simplest approach is to query the database during request handling, but that adds latency and introduces database availability into the request path. A cached route table reduces latency and protects the database, but introduces propagation delay. In practice, gateway configuration usually accepts eventual consistency. A new route taking a few seconds to reach every gateway instance is often acceptable. A payment authorization decision being stale for a few seconds may not be.

That distinction matters. Not all gateway state has the same consistency requirement.
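A route cache that accepts bounded staleness can be sketched as follows. `RouteCache` is hypothetical. Its `load` closure stands in for whatever query reads routes from PostgreSQL, and `ttl` is the propagation delay operators agree to tolerate. Time is passed in explicitly so the behavior is easy to test.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Hypothetical cached route table: `load` stands in for a query against the
// routes table in PostgreSQL; `ttl` bounds how stale a lookup may be.
struct RouteCache<F: FnMut() -> HashMap<String, String>> {
    routes: HashMap<String, String>,
    loaded_at: Instant,
    ttl: Duration,
    load: F,
}

impl<F: FnMut() -> HashMap<String, String>> RouteCache<F> {
    fn new(ttl: Duration, mut load: F) -> Self {
        let routes = load();
        RouteCache { routes, loaded_at: Instant::now(), ttl, load }
    }

    // Serve from memory; reload only when the snapshot is older than `ttl`.
    fn lookup(&mut self, path: &str, now: Instant) -> Option<String> {
        if now.saturating_duration_since(self.loaded_at) >= self.ttl {
            self.routes = (self.load)();
            self.loaded_at = now;
        }
        self.routes.get(path).cloned()
    }
}

fn main() {
    let mut version = 0;
    let mut cache = RouteCache::new(Duration::from_secs(5), move || {
        version += 1; // pretend an operator adds /orders after the first load
        let mut m = HashMap::new();
        m.insert("/users".to_string(), "http://users:8080".to_string());
        if version > 1 {
            m.insert("/orders".to_string(), "http://orders:8080".to_string());
        }
        m
    });
    let t0 = Instant::now();
    println!("{:?}", cache.lookup("/orders", t0)); // None: still the old snapshot
    println!("{:?}", cache.lookup("/orders", t0 + Duration::from_secs(6))); // Some("http://orders:8080")
}
```

The trade-off is explicit. A lookup touches the database only after the TTL has elapsed, and a new route becomes visible on an instance at its first lookup after that point. In this sketch the reload happens inline on a request. A background refresh would keep even that one query off the request path.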

Redis Rate Limiting Works Because The State Is Shared

The Ferrox article describes a Redis-backed fixed-window limiter using INCR and EXPIRE. Conceptually, each request increments a key such as rl:{route_id}:{client_ip}. The first request sets an expiry, usually 60 seconds. If the counter exceeds the configured limit, the gateway returns 429 Too Many Requests.

That design is simple, and its simplicity is a strength. Because INCR is atomic in Redis, concurrent increments from several gateway instances never lose updates. The instances can therefore enforce one shared limit without coordinating with each other. Horizontal scaling remains viable because the counter does not live inside any single gateway instance.
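The fixed-window logic fits in a few lines. To keep the sketch self-contained and testable, `FakeRedis` is a hypothetical in-memory stand-in that mimics the semantics of `INCR` (a missing key counts as 0, then increments) and `EXPIRE` (attach a TTL), with time passed in as seconds. Against a real server, the same two commands would be issued through a Redis client. The key format `rl:{route_id}:{client_ip}` and the 60-second window come from the article.

```rust
use std::collections::HashMap;

const WINDOW_SECS: u64 = 60;

// Hypothetical in-memory stand-in for Redis. Real Redis runs each command
// atomically; here `&mut self` serializes access in the same way.
struct FakeRedis {
    data: HashMap<String, (u64, Option<u64>)>, // key -> (counter, expires_at)
}

impl FakeRedis {
    // Mimics INCR: a missing (or expired) key counts as 0 before incrementing.
    fn incr(&mut self, key: &str, now: u64) -> u64 {
        let expired = matches!(self.data.get(key), Some((_, Some(exp))) if now >= *exp);
        if expired {
            self.data.remove(key);
        }
        let entry = self.data.entry(key.to_string()).or_insert((0, None));
        entry.0 += 1;
        entry.0
    }

    // Mimics EXPIRE: attach a TTL to an existing key.
    fn expire(&mut self, key: &str, secs: u64, now: u64) {
        if let Some(entry) = self.data.get_mut(key) {
            entry.1 = Some(now + secs);
        }
    }
}

// Fixed-window check: true means forward, false means respond 429.
fn allow(redis: &mut FakeRedis, route_id: &str, client_ip: &str, limit: u64, now: u64) -> bool {
    let key = format!("rl:{}:{}", route_id, client_ip);
    let count = redis.incr(&key, now);
    if count == 1 {
        // First request in the window starts the expiry clock.
        redis.expire(&key, WINDOW_SECS, now);
    }
    count <= limit
}

fn main() {
    let mut redis = FakeRedis { data: HashMap::new() };
    // With a limit of 2: allowed, allowed, rejected, then allowed again in a new window.
    for t in [0, 10, 20, 60] {
        println!("t={:>2}s allowed={}", t, allow(&mut redis, "users", "1.2.3.4", 2, t));
    }
}
```

Any real implementation should handle one detail: `INCR` and `EXPIRE` are separate commands. A gateway that crashes between them can leave a counter with no TTL, and that counter then rejects the client indefinitely once it passes the limit. A small Lua script that runs `INCR` and, on the first hit, `EXPIRE` as one atomic unit closes that gap.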

There are trade-offs. A fixed-window limiter can allow bursts at window boundaries. A client can send requests at the end of one minute and again at the beginning of the next, effectively doubling traffic in a short interval. Sliding-window counters, token buckets, or leaky buckets handle burst behavior better, but they require more state or more complex Redis scripts.
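For comparison, a token bucket refills continuously instead of resetting at a boundary. The sketch below is a swapped-in alternative, not something the Ferrox article describes, and it keeps its state in a single process. A fleet-wide version would move `tokens` and `last` into Redis, typically behind a script.

```rust
// Token bucket, a swapped-in alternative to the fixed window. Single-process
// sketch: the bucket state lives in this struct, not in Redis.
struct TokenBucket {
    capacity: f64,       // maximum burst size
    tokens: f64,         // tokens currently available
    refill_per_sec: f64, // steady-state allowed rate
    last: f64,           // time of the previous refill, in seconds
}

impl TokenBucket {
    fn new(capacity: f64, refill_per_sec: f64, now: f64) -> Self {
        TokenBucket { capacity, tokens: capacity, refill_per_sec, last: now }
    }

    // Refill for the elapsed time (capped at capacity), then spend one token if available.
    fn try_take(&mut self, now: f64) -> bool {
        let elapsed = (now - self.last).max(0.0);
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        self.last = now;
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}

fn main() {
    // 100 requests per minute with a burst allowance of 100.
    let mut bucket = TokenBucket::new(100.0, 100.0 / 60.0, 0.0);
    let before = (0..100).filter(|_| bucket.try_take(59.9)).count();
    let after = (0..100).filter(|_| bucket.try_take(60.0)).count();
    println!("admitted at 59.9s: {}, at 60.0s: {}", before, after); // 100, 0
}
```

In this run, a client that drains the bucket at t = 59.9 s gets nothing at t = 60.0 s. A 100-per-minute fixed window that opened at t = 0 would instead reset at t = 60 s and admit another 100 requests.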

For many APIs, fixed windows are acceptable because the goal is not perfect traffic shaping. The goal is to prevent obvious overload and abuse with a low-cost mechanism. For stricter APIs, especially public APIs with paid tiers, the limiter needs more nuance: per-user keys, per-API-key limits, route-specific quotas, and sometimes separate limits for read and write paths.
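Much of that nuance lives in how the limiter key is built. The scheme below is hypothetical; only `rl:{route_id}:{client_ip}` comes from the article. It separates read traffic from write traffic and lets the caller decide whether a bucket belongs to an IP address, an API key, or a user. That choice depends on where rate limiting sits relative to authentication.

```rust
// Hypothetical identity the limiter can key on.
enum Identity<'a> {
    Ip(&'a str),
    ApiKey(&'a str),
    User(&'a str),
}

// Builds a limiter key such as "rl:orders:write:key:abc123".
fn limit_key(route_id: &str, method: &str, who: &Identity) -> String {
    // Separate read and write buckets so heavy writes cannot exhaust the read quota.
    let class = match method {
        "GET" | "HEAD" | "OPTIONS" => "read",
        _ => "write",
    };
    let (kind, id) = match who {
        Identity::Ip(ip) => ("ip", *ip),
        Identity::ApiKey(key) => ("key", *key),
        Identity::User(user) => ("user", *user),
    };
    format!("rl:{}:{}:{}:{}", route_id, class, kind, id)
}

fn main() {
    println!("{}", limit_key("orders", "POST", &Identity::ApiKey("abc123"))); // rl:orders:write:key:abc123
    println!("{}", limit_key("orders", "GET", &Identity::Ip("203.0.113.7"))); // rl:orders:read:ip:203.0.113.7
}
```

Per-tier quotas then become a lookup from the key's identity to a limit. The counting logic itself does not change.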

The important architectural choice is that rate limit state is externalized. If each gateway instance maintained local counters, adding instances would multiply the effective limit: with N instances behind a load balancer, a client could send up to N times the configured limit. That is a classic distributed systems failure mode, where scaling out the data plane silently changes the policy. Redis avoids that specific problem, while introducing Redis availability as a dependency.

Circuit Breakers Turn Failure Into A State Machine

Circuit breakers are one of the most useful patterns in service-to-service systems because they stop a failing dependency from consuming more resources. A closed circuit allows requests through. An open circuit rejects immediately. A half-open circuit lets a small amount of traffic test whether the upstream has recovered.

Ferrox models this with a Rust enum: closed, open, half-open. That maps cleanly to the behavior operators actually need. After a threshold of consecutive failures, the circuit opens. After a timeout, it moves to half-open. A successful probe closes it again.
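A minimal version of that state machine might look like the following. The three variants follow the article. The field names, the `probe_in_flight` flag that limits half-open to one probe, and the rule that a failed probe reopens the circuit are assumptions. The last rule is the usual convention, but the article does not spell it out. Time is passed in explicitly so the transitions are deterministic.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy, PartialEq)]
enum CircuitState {
    Closed,
    Open { since: Instant },
    HalfOpen,
}

struct CircuitBreaker {
    state: CircuitState,
    consecutive_failures: u32,
    failure_threshold: u32,
    open_timeout: Duration,
    probe_in_flight: bool, // assumption: half-open admits one probe at a time
}

impl CircuitBreaker {
    fn new(failure_threshold: u32, open_timeout: Duration) -> Self {
        CircuitBreaker { state: CircuitState::Closed, consecutive_failures: 0,
                         failure_threshold, open_timeout, probe_in_flight: false }
    }

    // Ask before forwarding; `false` means fail fast without touching the upstream.
    fn allow(&mut self, now: Instant) -> bool {
        match self.state {
            CircuitState::Closed => true,
            CircuitState::Open { since } => {
                if now.saturating_duration_since(since) >= self.open_timeout {
                    self.state = CircuitState::HalfOpen;
                    self.probe_in_flight = true; // this request is the probe
                    true
                } else {
                    false
                }
            }
            CircuitState::HalfOpen => {
                if self.probe_in_flight {
                    false
                } else {
                    self.probe_in_flight = true;
                    true
                }
            }
        }
    }

    fn on_success(&mut self) {
        match self.state {
            // A successful probe closes the circuit again.
            CircuitState::HalfOpen => {
                self.state = CircuitState::Closed;
                self.consecutive_failures = 0;
                self.probe_in_flight = false;
            }
            // While closed, a success breaks the run of consecutive failures.
            CircuitState::Closed => self.consecutive_failures = 0,
            // Late successes while open are ignored; only the probe decides.
            CircuitState::Open { .. } => {}
        }
    }

    fn on_failure(&mut self, now: Instant) {
        match self.state {
            CircuitState::Closed => {
                self.consecutive_failures += 1;
                if self.consecutive_failures >= self.failure_threshold {
                    self.state = CircuitState::Open { since: now };
                }
            }
            // Assumption: a failed probe reopens the circuit and restarts the timeout.
            CircuitState::HalfOpen => {
                self.state = CircuitState::Open { since: now };
                self.probe_in_flight = false;
            }
            // Late failures from requests sent before opening do not extend the timeout.
            CircuitState::Open { .. } => {}
        }
    }
}

fn main() {
    let t0 = Instant::now();
    let mut cb = CircuitBreaker::new(3, Duration::from_secs(30));
    for _ in 0..3 {
        if cb.allow(t0) {
            cb.on_failure(t0);
        }
    }
    println!("{}", cb.allow(t0 + Duration::from_secs(5)));  // false: open, fail fast
    println!("{}", cb.allow(t0 + Duration::from_secs(31))); // true: half-open probe
    cb.on_success();
    println!("{:?}", cb.state); // Closed
}
```

This breaker is a plain in-process struct, so each gateway instance would hold its own copy. That is the in-memory placement weighed in the discussion of state placement.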

The value is not only lower latency for users. It is load reduction for the failing service. Without a breaker, every client keeps retrying. Those retries increase queue depth, consume connection pools, and make recovery harder. With a breaker, the gateway fails fast and gives the upstream room to recover.

The state placement is a real design question. If circuit state is in memory, each gateway instance makes independent decisions. That is fast and avoids adding Redis or PostgreSQL to the breaker path. It also means one instance may keep sending traffic while another has opened its circuit. If circuit state is shared, behavior is more consistent across the fleet, but the breaker now depends on shared storage and may become slower or more fragile.

There is no universal answer. In-memory breakers are often good enough because they are cheap and local. Shared breakers can make sense when upstream protection must be coordinated globally, but they should be designed carefully. A circuit breaker that cannot make decisions because its storage is unavailable has failed at its main job.

API Patterns And Consistency Boundaries

An API gateway sits at a boundary between external clients and internal services. That makes it tempting to add every policy there. Some policies belong in the gateway. Others must remain in the service.

Authentication at the gateway is useful when it verifies tokens, API keys, or coarse route access. Authorization can be more complicated. If a request asks to update invoice 123, the gateway usually does not know whether the caller owns that invoice. That check belongs closer to the domain data. A gateway can reject unauthenticated traffic and enforce broad scopes, but business authorization should not be guessed at the edge.
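The split can be expressed as two separate checks. In this hypothetical sketch, `Claims` represents a token that the gateway has already verified. The gateway checks only a route-level scope. The ownership check needs invoice data that only the invoice service holds.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical claims extracted from a token the gateway has already verified.
struct Claims {
    sub: String,
    scopes: HashSet<String>,
}

// Gateway: coarse, route-level check. It can require "invoices:write" for
// updates to /invoices/{id}, but it cannot see who owns a given invoice.
fn gateway_allows(claims: &Claims, required_scope: &str) -> bool {
    claims.scopes.contains(required_scope)
}

// Service: business authorization against domain data it owns.
fn service_allows_update(claims: &Claims, owners: &HashMap<u64, String>, invoice_id: u64) -> bool {
    owners.get(&invoice_id).map_or(false, |owner| *owner == claims.sub)
}

fn main() {
    let alice = Claims {
        sub: "alice".to_string(),
        scopes: vec!["invoices:write".to_string()].into_iter().collect(),
    };
    let mut owners = HashMap::new();
    owners.insert(123, "bob".to_string());
    // Passes the edge check, fails the ownership check.
    println!("gateway: {}", gateway_allows(&alice, "invoices:write")); // true
    println!("service: {}", service_allows_update(&alice, &owners, 123)); // false
}
```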

Caching has a similar boundary. Response caching can save upstream capacity for read-heavy endpoints, especially when routes have clear TTLs. It becomes risky when responses depend on identity, permissions, headers, or rapidly changing data. A cached response from the wrong key is worse than a slow response. Gateways need cache keys that include the right dimensions: route, query string, relevant headers, and sometimes user identity. They also need conservative defaults. If a route is not explicitly cacheable, it should not be cached.
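A cache key builder that follows those rules could look like this. `CachePolicy` and its fields are hypothetical. The sketch shows three things: caching is opt-in per route, non-GET requests are skipped, and every dimension that can change the response ends up in the key.

```rust
use std::collections::HashMap;

// Hypothetical per-route cache policy. Routes are not cacheable unless they opt in.
struct CachePolicy {
    cacheable: bool,
    vary_headers: Vec<&'static str>, // lowercase names of headers that change the response
    per_user: bool,                  // response depends on the caller's identity
}

// Returns None when the response must not be cached.
fn cache_key(
    policy: &CachePolicy,
    method: &str,
    route_id: &str,
    query: &str,
    headers: &HashMap<String, String>,
    user_id: Option<&str>,
) -> Option<String> {
    if !policy.cacheable || method != "GET" {
        return None;
    }
    // Sort query parameters so `b=2&a=1` and `a=1&b=2` share an entry
    // (assumes parameter order is not meaningful for this route).
    let mut params: Vec<&str> = query.split('&').filter(|p| !p.is_empty()).collect();
    params.sort_unstable();
    let mut key = format!("cache:{}:{}", route_id, params.join("&"));
    for name in &policy.vary_headers {
        let value = headers.get(*name).map(String::as_str).unwrap_or("");
        key.push_str(&format!(":{}={}", name, value));
    }
    if policy.per_user {
        // No identity means no safe key, so skip caching rather than share a response.
        key.push_str(&format!(":user={}", user_id?));
    }
    Some(key)
}

fn main() {
    let policy = CachePolicy { cacheable: true, vary_headers: vec!["accept-language"], per_user: true };
    let mut headers = HashMap::new();
    headers.insert("accept-language".to_string(), "en".to_string());
    // Some("cache:products:a=1&b=2:accept-language=en:user=u1")
    println!("{:?}", cache_key(&policy, "GET", "products", "b=2&a=1", &headers, Some("u1")));
    // None: a per-user response without a user is not cached
    println!("{:?}", cache_key(&policy, "GET", "products", "a=1", &headers, None));
}
```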

Dynamic routing introduces another consistency model. If routes can be updated without restarting the gateway, operators gain flexibility. Blue-green releases, canary routing, and emergency traffic shifts become easier. The cost is that route configuration becomes distributed state. Multiple gateway instances need to converge on the same routing rules, and operators need visibility into which version each instance is using.
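One common way to make route updates both atomic and observable is to swap in whole immutable snapshots, each stamped with a version. The sketch assumes hypothetical `RouteTable` and `Router` types built on `std::sync::{Arc, RwLock}`. A request that is already in flight keeps the snapshot it started with, and operators can compare `version` across instances.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

// Hypothetical immutable snapshot of routing rules, stamped with a version.
#[derive(Debug)]
struct RouteTable {
    version: u64,
    routes: HashMap<String, String>,
}

#[derive(Clone)]
struct Router {
    current: Arc<RwLock<Arc<RouteTable>>>,
}

impl Router {
    fn new(table: RouteTable) -> Self {
        Router { current: Arc::new(RwLock::new(Arc::new(table))) }
    }

    // Data-plane read: clone the inner Arc and release the lock right away,
    // so a request never observes a half-applied update.
    fn snapshot(&self) -> Arc<RouteTable> {
        let guard = self.current.read().unwrap();
        Arc::clone(&*guard)
    }

    // Control-plane write: replace the whole table, rejecting stale versions.
    fn apply(&self, table: RouteTable) -> Result<(), String> {
        let mut guard = self.current.write().unwrap();
        if table.version <= guard.version {
            return Err(format!("stale config version {} (serving {})", table.version, guard.version));
        }
        *guard = Arc::new(table);
        Ok(())
    }
}

fn main() {
    let mut blue = HashMap::new();
    blue.insert("/users".to_string(), "http://users-blue:8080".to_string());
    let router = Router::new(RouteTable { version: 1, routes: blue });

    let in_flight = router.snapshot(); // a request that started before the update
    let mut green = HashMap::new();
    green.insert("/users".to_string(), "http://users-green:8080".to_string());
    router.apply(RouteTable { version: 2, routes: green }).unwrap();

    println!("{}", in_flight.routes["/users"]);         // http://users-blue:8080
    println!("{}", router.snapshot().routes["/users"]); // http://users-green:8080
}
```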

A mature gateway design usually separates data plane and control plane. The data plane handles requests quickly. The control plane manages configuration, validation, deployment, and rollback. Ferrox is still evolving, but the planned gRPC admin API points in that direction. That separation matters because request forwarding and route management have different performance and consistency needs.

Trade-Offs

Centralizing cross-cutting concerns reduces duplication, but it also creates a critical dependency. If the gateway is down, the system is down from the client's perspective. That means the gateway itself needs health checks, horizontal scaling, careful deploys, backpressure, and boring operational behavior. A gateway should be one of the least surprising pieces of the stack.

The Rust implementation helps with predictable resource usage, but language choice does not remove distributed systems problems. Redis can fail. PostgreSQL can be slow. Upstreams can time out. Clients can retry too aggressively. Metrics can be missing exactly when they are needed. The design has to assume partial failure as the normal case.

Ferrox also raises the build-versus-buy question. Established options already solve many gateway problems, from managed services such as AWS API Gateway to self-hostable proxies and gateways such as Kong, Envoy, and NGINX. Building a gateway is still valuable when the goal is learning, local control, custom behavior, or a narrower operational model. For production adoption, the bar is higher: compatibility, observability, configuration safety, security review, load testing, and upgrade discipline all matter.

The project author's plan to use k6 for load testing is the right next step. Gateways need benchmarks that include more than happy-path throughput. Useful tests should measure p95 and p99 latency under rate limiting, Redis slowness, upstream timeouts, circuit transitions, cache hit and miss behavior, large request bodies, connection reuse, and concurrent route updates. A gateway that is fast only when every dependency is healthy has not been tested against the conditions that define gateway work.
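Tail percentiles are simple to compute but easy to misread. The sketch below uses the nearest-rank definition, one of several in use; load tools such as k6 may interpolate differently. The latency samples are hypothetical. They show how a slow path hit by a small fraction of requests, such as a slow Redis call, surfaces at p99 while the mean and p95 stay low.

```rust
// Nearest-rank percentile: the smallest sample such that at least p% of
// samples are less than or equal to it. Other definitions interpolate.
fn percentile(samples: &[f64], p: f64) -> Option<f64> {
    if samples.is_empty() || !(0.0..=100.0).contains(&p) {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    // 1-based rank; p = 0 is clamped to the minimum sample.
    let rank = (p * sorted.len() as f64 / 100.0).ceil() as usize;
    Some(sorted[rank.max(1) - 1])
}

fn main() {
    // Hypothetical run: 98 fast requests and 2 that hit a slow dependency.
    let mut latencies_ms = vec![5.0; 98];
    latencies_ms.extend([500.0, 500.0]);
    let mean = latencies_ms.iter().sum::<f64>() / latencies_ms.len() as f64;
    // mean=14.9 p95=Some(5.0) p99=Some(500.0)
    println!(
        "mean={:.1} p95={:?} p99={:?}",
        mean,
        percentile(&latencies_ms, 95.0),
        percentile(&latencies_ms, 99.0)
    );
}
```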

The bigger lesson from Ferrox is that API gateways are policy engines under load. The proxying code matters, but the hard decisions are about ordering, state placement, consistency, and failure isolation. Those are the decisions that determine whether a system degrades cleanly or turns one broken dependency into a wider outage.
