
Why Nix’s Substituter List Needs a Smart Router (and How ncro Solves It)

Tech Essays Reporter

Nix’s built‑in substituter logic walks a static, ordered list of binary caches, causing unnecessary latency when multiple caches are configured. The author created ncro, a lightweight Rust HTTP proxy that races cache lookups, remembers the fastest source, and streams NARs without storing them, dramatically improving fetch performance while preserving security.

Nix’s Substituter List Is Not a Routing Table

Published May 24, 2026 • 14 min read


The Core Argument

Nix’s substituter model is elegant in its simplicity: you list a few binary caches in nix.conf, the daemon queries them one after another, and if any cache knows the requested narinfo the build can avoid recompilation. In practice, however, this static ordered loop becomes a performance liability once you operate more than one cache. The daemon treats the list as a preference list, not a routing table; it never learns which cache is fastest for a particular hash, nor does it remember past successes. The result is a serialized scan across several geographically dispersed caches for every nix-shell -p … invocation.

How Nix Currently Works

The algorithm can be boiled down to:

  1. Iterate over the configured substituters in the order they appear.
  2. For each substituter, issue a HEAD /<hash>.narinfo request.
  3. If the response is 200, fetch the NAR and stop.
  4. If the response is 404, continue to the next substituter.
  5. Throughout, there is no concurrency, no latency tracking, and no state carried between requests.
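
The loop above can be sketched in a few lines of Rust. This is only an illustration of the behavior as described here, not Nix's actual code: the `probe` closure is a hypothetical stand-in for the HTTP narinfo request, and the function also counts requests to show how a hit on the last cache costs one round trip per earlier cache.

```rust
/// Outcome of asking one substituter for a narinfo (simulated, no network I/O).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Probe {
    Found,
    NotFound,
}

/// Walk the substituters in configured order. Returns the index of the first
/// one that reports the path, plus the number of requests that were issued.
/// `probe` is a hypothetical stand-in for the real HTTP request.
fn sequential_lookup<F>(substituters: &[&str], mut probe: F) -> (Option<usize>, usize)
where
    F: FnMut(&str) -> Probe,
{
    let mut requests = 0;
    for (index, &url) in substituters.iter().enumerate() {
        requests += 1;
        if probe(url) == Probe::Found {
            // First hit wins; later substituters are never consulted.
            return (Some(index), requests);
        }
    }
    (None, requests)
}
```

Each miss adds a full round trip before the next cache is even asked, which is the serialized cost the rest of this essay is about.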

Because the list is static, the daemon cannot adapt to conditions: it will not favor an on‑premises private cache over the public cache.nixos.org merely because the private cache is currently reachable and faster, nor can it skip a slow overseas cache when a local mirror would have answered instantly.

Introducing ncro – Nix Cache Route Optimizer

ncro (pronounced Necro) is a tiny HTTP proxy that sits between the nix-daemon and the configured substituters. It does exactly three things:

  1. Race all upstreams on every narinfo lookup, remembering which one answered first.
  2. Stream NAR bodies directly to the client without persisting them on disk.
  3. Persist routing decisions in a bounded SQLite table so that a restart does not erase the learned latency information.

The proxy is deliberately not a cache mirroring solution like ncps; it never stores payload data, thereby avoiding the operational overhead of disk space management and cache‑invalidation headaches.
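
To make "stream without storing" concrete, here is a minimal sketch using only the standard library. It is not ncro's code: `stream_body` is a hypothetical helper, and the generic `Read`/`Write` pair stands in for the upstream response body and the client connection. The point is that memory use is bounded by one small buffer regardless of NAR size.

```rust
use std::io::{self, ErrorKind, Read, Write};

/// Copy an upstream body to the client through a fixed-size buffer.
/// Nothing is written to disk and at most 8 KiB is held in memory at once.
fn stream_body<R: Read, W: Write>(mut upstream: R, mut client: W) -> io::Result<u64> {
    let mut buf = [0u8; 8192];
    let mut total = 0u64;
    loop {
        let n = match upstream.read(&mut buf) {
            Ok(0) => break, // end of body
            Ok(n) => n,
            // A read interrupted by a signal is retried rather than treated as fatal.
            Err(e) if e.kind() == ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        client.write_all(&buf[..n])?;
        total += n as u64;
    }
    client.flush()?;
    Ok(total)
}
```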

Architectural Highlights

Parallel Racing & Priority Tiers

ncro groups substituters by the static priority you configure. Within each tier it launches a FuturesUnordered collection of HEAD requests and stops as soon as the first 200 arrives. This tiered approach lets you express “prefer my private cache, but fall back to the public cache if it is unavailable” without touching Nix’s core.
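
The tiering logic can be illustrated without any async runtime. The sketch below swaps FuturesUnordered for plain threads and a channel, and simulates latency with sleeps instead of real HEAD requests, so it shows the shape of the idea rather than ncro's implementation. The `Upstream` struct and its fields are assumptions made for the example.

```rust
use std::collections::BTreeMap;
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical upstream description. `delay_ms` and `has_path` simulate
/// network latency and the narinfo answer instead of doing real HTTP.
#[derive(Clone, Debug)]
struct Upstream {
    name: &'static str,
    priority: u32, // lower value = preferred tier
    delay_ms: u64,
    has_path: bool,
}

/// Probe every upstream in one tier concurrently; the first hit wins.
fn race_tier(tier: &[Upstream]) -> Option<&'static str> {
    let (tx, rx) = mpsc::channel();
    for upstream in tier.iter().cloned() {
        let tx = tx.clone();
        thread::spawn(move || {
            thread::sleep(Duration::from_millis(upstream.delay_ms));
            // A send error only means the receiver already stopped listening.
            let _ = tx.send((upstream.name, upstream.has_path));
        });
    }
    drop(tx); // the channel closes once every probe thread has reported
    // Results arrive in completion order, so the first hit is the fastest hit.
    let winner = rx.iter().find(|(_, found)| *found).map(|(name, _)| name);
    winner
}

/// Try tiers in ascending priority order; fall through only on a full miss.
fn route(upstreams: &[Upstream]) -> Option<&'static str> {
    let mut tiers: BTreeMap<u32, Vec<Upstream>> = BTreeMap::new();
    for upstream in upstreams {
        tiers.entry(upstream.priority).or_default().push(upstream.clone());
    }
    tiers.values().find_map(|tier| race_tier(tier))
}
```

A slower cache in a preferred tier still beats a faster cache in a lower tier, which is the "prefer private, fall back to public" behavior described above.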

Deadline & Failure Classification

A per‑lookup deadline prevents a single hung upstream from stalling the whole request. Failures are split into three categories:

  • Not found (404 from every upstream)
  • Network error (TCP handshake failures)
  • Timeout (no response before the deadline)

These distinctions allow the client to receive a precise error instead of a generic failure.
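
One way to model the three categories is an enum plus a small fold over the per-upstream results. The type names, the precedence rule (timeout reported ahead of network error when both occur), and the HTTP status mapping are all assumptions chosen for this sketch; the article does not specify them.

```rust
/// Per-upstream result of one narinfo probe (hypothetical names).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ProbeOutcome {
    Found,
    NotFound,
    NetworkError,
    TimedOut,
}

/// The error reported to the client when no upstream had the path.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum LookupError {
    NotFound,
    Network,
    Timeout,
}

impl LookupError {
    /// Assumed mapping to HTTP status codes for the proxy's response.
    fn http_status(self) -> u16 {
        match self {
            LookupError::NotFound => 404,
            LookupError::Network => 502,
            LookupError::Timeout => 504,
        }
    }
}

/// Collapse per-upstream outcomes into a winner index or a single error.
fn classify(outcomes: &[ProbeOutcome]) -> Result<usize, LookupError> {
    if let Some(winner) = outcomes.iter().position(|o| *o == ProbeOutcome::Found) {
        return Ok(winner);
    }
    // Only a 404 from every upstream counts as a definitive "not found".
    if outcomes.iter().all(|o| *o == ProbeOutcome::NotFound) {
        return Err(LookupError::NotFound);
    }
    if outcomes.contains(&ProbeOutcome::TimedOut) {
        return Err(LookupError::Timeout);
    }
    Err(LookupError::Network)
}
```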

Two‑Layer Cache (Moka + SQLite)

  • Moka: an in‑memory LRU cache with a configurable size (default 1024 entries) that stores recent routing decisions and the raw narinfo bytes. The TTL mirrors the route‑TTL, ensuring stale entries are evicted automatically.
  • SQLite: persists the same data on disk, providing durability across restarts. Eviction is throttled to run once every hundred writes using an atomic counter. An off‑by‑one error in this logic once skewed latency metrics until a single‑character fix corrected it.
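
The write-count throttle is easy to get subtly wrong, so a small sketch may help. This is not ncro's code: `EvictionThrottle` and `EVICT_EVERY` are hypothetical names. The comment marks one plausible place where an off-by-one can creep in (`fetch_add` returns the value before the increment); whether that was the actual bug is not stated in the source.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Run eviction once per this many writes (value taken from the text above).
const EVICT_EVERY: u64 = 100;

/// Counts writes and signals when an eviction pass is due.
struct EvictionThrottle {
    writes: AtomicU64,
}

impl EvictionThrottle {
    fn new() -> Self {
        EvictionThrottle { writes: AtomicU64::new(0) }
    }

    /// Record one write. Returns true on the 100th, 200th, ... write.
    fn record_write(&self) -> bool {
        // fetch_add returns the value *before* the increment, so add 1 to
        // get the number of writes including this one.
        let count = self.writes.fetch_add(1, Ordering::Relaxed) + 1;
        count % EVICT_EVERY == 0
    }
}
```

Because the counter is atomic, many request handlers can share one throttle without taking a lock on every write.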

Health Tracking with EMA

Each upstream maintains an exponential moving average (EMA) of latency. The first successful probe seeds the EMA, after which subsequent measurements are smoothed. Upstreams are classified as Active, Degraded, or Down, and the probe interval is multiplied by ×1, ×4, or ×10 respectively, which reduces probing traffic to unhealthy caches.
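
A minimal sketch of this bookkeeping follows. The smoothing factor `ALPHA` and the failure thresholds that separate Active, Degraded, and Down are assumptions picked for illustration; only the seeding rule and the ×1/×4/×10 multipliers come from the description above.

```rust
/// Smoothing factor: weight given to the newest sample (assumed value).
const ALPHA: f64 = 0.25;

#[derive(Clone, Copy, Debug, PartialEq)]
enum Health {
    Active,
    Degraded,
    Down,
}

impl Health {
    /// Multiplier applied to the base probe interval.
    fn probe_interval_multiplier(self) -> u32 {
        match self {
            Health::Active => 1,
            Health::Degraded => 4,
            Health::Down => 10,
        }
    }
}

struct UpstreamHealth {
    ema_ms: Option<f64>, // None until the first successful probe
    consecutive_failures: u32,
}

impl UpstreamHealth {
    fn new() -> Self {
        UpstreamHealth { ema_ms: None, consecutive_failures: 0 }
    }

    fn record_success(&mut self, latency_ms: f64) {
        self.ema_ms = Some(match self.ema_ms {
            None => latency_ms, // the first sample seeds the average
            Some(prev) => ALPHA * latency_ms + (1.0 - ALPHA) * prev,
        });
        self.consecutive_failures = 0;
    }

    fn record_failure(&mut self) {
        self.consecutive_failures += 1;
    }

    /// Assumed thresholds: 1-2 consecutive failures degrade, 3 or more mark down.
    fn health(&self) -> Health {
        match self.consecutive_failures {
            0 => Health::Active,
            1..=2 => Health::Degraded,
            _ => Health::Down,
        }
    }
}
```

Seeding with the first sample avoids the slow ramp-up an EMA would otherwise show if it started from zero.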

In‑flight Deduplication

When multiple clients request the same narinfo concurrently, ncro ensures only one race is performed. The other callers wait on a mutex and then read the result from the LRU cache, eliminating redundant network traffic and preventing a classic TOCTOU bug that could have corrupted the deduplication map.
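
The following sketch shows one way to get this "single race per key" behavior with the standard library (it needs Rust 1.70 or newer for `OnceLock`). It is an assumption-laden illustration rather than ncro's design: `Inflight` is a hypothetical type, the `race` closure stands in for the upstream race, and completed entries are never removed here, whereas the text above says ncro serves later callers from its LRU cache.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, OnceLock};

/// Map from narinfo key to a slot that is filled exactly once.
struct Inflight {
    slots: Mutex<HashMap<String, Arc<OnceLock<String>>>>,
}

impl Inflight {
    fn new() -> Self {
        Inflight { slots: Mutex::new(HashMap::new()) }
    }

    /// Return the routing result for `key`, running `race` at most once no
    /// matter how many callers arrive concurrently.
    fn lookup<F: FnOnce() -> String>(&self, key: &str, race: F) -> String {
        // Find-or-insert happens under a single lock acquisition, so there is
        // no gap between "check" and "insert" for another caller to slip into.
        let slot = {
            let mut slots = self.slots.lock().unwrap();
            Arc::clone(slots.entry(key.to_string()).or_default())
        };
        // The map lock is released here. Exactly one caller runs `race`;
        // the others block in get_or_init until the value is available.
        let value = slot.get_or_init(race).clone();
        value
    }
}
```

Holding the map lock only for the find-or-insert step keeps lookups for unrelated keys from waiting on a slow race.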

Signature Verification

ncro validates ed25519 signatures on narinfo files using the ed25519‑dalek crate. Public keys are extracted from Nix’s name:base64(key) format, and the full signature payload (1;StorePath;NarHash;NarSize;refs) is reconstructed exactly as Nix expects. This step is mandatory; a proxy that skips verification would turn a compromised upstream into a universal attack vector.
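
The string handling around verification can be sketched without any cryptography. The two helpers below are hypothetical and cover only the parts described above: splitting a `name:base64(key)` entry and rebuilding the signed payload. Base64 decoding and the ed25519 check itself are left to the real crates, and the references are assumed to be passed in already as full store paths joined by commas.

```rust
/// Split a Nix-style `name:base64(key)` string into (name, base64 text).
/// Decoding the base64 part is out of scope for this sketch.
fn split_key(entry: &str) -> Option<(&str, &str)> {
    let (name, encoded) = entry.split_once(':')?;
    if name.is_empty() || encoded.is_empty() {
        None
    } else {
        Some((name, encoded))
    }
}

/// Rebuild the payload that the signature covers:
/// `1;StorePath;NarHash;NarSize;refs`, with refs comma-separated.
fn fingerprint(store_path: &str, nar_hash: &str, nar_size: u64, references: &[&str]) -> String {
    format!(
        "1;{};{};{};{}",
        store_path,
        nar_hash,
        nar_size,
        references.join(",")
    )
}
```

Any difference in field order, separators, or reference formatting produces a different byte string, so a valid signature would fail to verify; that is why the payload has to be reconstructed exactly.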

What ncro Does Not Do

  • No NAR caching – it never stores the actual closure data. Repeated requests for the same large closure will be streamed from the chosen upstream each time. This design keeps the proxy lightweight and avoids the operational complexity of a full cache.
  • No mesh networking by default – an optional gossip layer exists for trusted‑peer environments, but it is disabled out‑of‑the‑box because it introduces a separate trust model.
  • No automatic mirroring or warm‑up jobs – the proxy focuses solely on routing; any additional functionality would dilute its purpose and increase maintenance burden.

Implications for Nix Users

By inserting ncro into the toolchain, developers and CI pipelines can:

  • Reduce average fetch latency from hundreds of milliseconds (or seconds on VPN) to a few tens of milliseconds, because each lookup is routed to the cache that has been answering fastest.
  • Avoid the “DNS‑lookup‑then‑timeout” pattern that occurs when the first cache in the list never has the requested path.
  • Preserve the declarative purity of Nix without modifying the daemon; the proxy is a drop‑in replacement that respects existing configuration files.

The approach also demonstrates a broader principle: network‑level routing can be retrofitted onto a system that assumes static preferences, without needing upstream changes.

Counter‑Perspectives

Some might argue that adding a proxy introduces another point of failure. While ncro is deliberately small (≈3 kLoC) and written in safe Rust, any additional hop can increase complexity. However, the proxy is stateless with respect to payload data and persists only lightweight routing metadata, minimizing the attack surface. Moreover, the health‑tracking logic ensures that a failing upstream is quickly demoted, preventing the proxy itself from becoming a bottleneck.

Another viewpoint is that Nix could evolve its own substituter logic to incorporate latency awareness. This would indeed be ideal, but it would require changes to the core daemon, a much larger engineering effort and a longer release cycle. ncro offers an immediate, pragmatic solution that can be adopted today.


Conclusion

ncro exemplifies how a focused, well‑engineered proxy can transform a static, ordered cache list into a dynamic, latency‑aware router without altering the underlying Nix daemon. By racing upstreams, caching routing decisions, and preserving cryptographic guarantees, it delivers measurable performance gains while keeping operational overhead low. For anyone managing multi‑cache Nix deployments—especially those pulling large C++ or Rust projects—the proxy is a practical tool that aligns with Nix’s philosophy of simplicity and reproducibility.

Give it a try and see how much faster your builds become.


GitHub repository for ncro

Official Nix documentation on substituters
