The Postmodern Build System (Updated 2025)
#DevOps

Tech Essays Reporter

A deep examination of what a trustworthy, incremental, and distributed build system could look like, why existing tools fall short, and concrete paths forward that combine ideas from Nix, Bazel, Buck2, persistent workers, and emerging deterministic execution techniques.

“A build system that trusts nothing, reuses everything, and runs everywhere.”
That sentence tries to capture the paradox at the heart of today’s build infrastructure: we want the safety of hermetic, reproducible builds, yet we also demand the speed of incremental recompilation across many machines. The essay below unpacks the tension, surveys the state of the art, and sketches a roadmap for a system that reconciles these goals.


1. The Core Argument: Trustworthy Incremental Builds without Identity

Traditional incremental systems rely on a notion of build identity: the build engine remembers that “this version of foo.c produced foo.o last time” and reuses the artifact if the inputs appear unchanged. The danger is that “unchanged” is only as reliable as the engine’s change detection (usually timestamps or a hash of the file contents) and its dependency tracking. If a bug in the incremental logic causes a stale artifact to be reused, nothing in the build itself reveals it; the mistake surfaces only when a clean rebuild produces a different result. A hash collision, by contrast, is astronomically unlikely, so incremental bugs are the far more realistic failure mode. In safety‑critical environments even a remote risk of silently shipping a stale artifact is unacceptable.

A postmodern build system therefore discards the idea of a privileged previous version. Instead it treats every build step as a pure function that is re‑executed whenever any of its declared inputs change. The price is wasted CPU cycles because the system cannot recognise that two different inputs nevertheless produce the same output (semantic equivalence). The trade‑off is deliberate: for production pipelines, the cost of a stray incremental bug outweighs the extra compute.

1.1. Sandboxing and Pure Execve

To make a step a true pure function we must sandbox the execve call that runs the compiler or tool. The sandbox guarantees that the process sees only the inputs we declare: source files, environment variables, and read‑only system libraries. When the sandboxed process finishes, its output is stored under a name derived from a hash of all inputs. If the same hash appears later, the result can be reused safely because the sandbox rules out undeclared inputs. This assumes the tool itself is deterministic, that is, it does not embed timestamps, random values, or scheduling‑dependent ordering in its output.
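
As a rough illustration of naming an output by a hash of all declared inputs, the sketch below derives a key from the command line, the declared environment, and the input file contents. The function name action_key and the field tags are assumptions made for this example, not any tool's real format.

```python
import hashlib


def action_key(argv, env, input_files):
    """Derive a cache key from every declared input of one sandboxed step.

    argv: list of command-line strings.
    env: dict of declared environment variables.
    input_files: dict mapping sandbox-relative path -> file contents (bytes).
    """
    h = hashlib.sha256()

    def feed(tag, data):
        # Length-prefix each field so adjacent fields cannot run together.
        h.update(tag)
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)

    for arg in argv:
        feed(b"arg", arg.encode())
    for name in sorted(env):  # sorted, so dict ordering does not matter
        feed(b"env", name.encode())
        feed(b"val", env[name].encode())
    for path in sorted(input_files):
        feed(b"path", path.encode())
        feed(b"data", input_files[path])
    return h.hexdigest()


# Hypothetical inputs: identical declarations give identical keys.
key_a = action_key(["cc", "-c", "foo.c"], {"LANG": "C"}, {"foo.c": b"int x;\n"})
key_b = action_key(["cc", "-c", "foo.c"], {"LANG": "C"}, {"foo.c": b"int x;\n"})
key_c = action_key(["cc", "-c", "foo.c"], {"LANG": "C"}, {"foo.c": b"int y;\n"})
```

Here key_a and key_b are equal, so the second step could reuse the first step's output, while key_c differs because one input byte changed. Anything the sandbox lets the process see has to be fed into the key, otherwise the reuse is no longer safe.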


2. Building on Existing Ideas

2.1. Nix’s Derivation Model

Nix introduced the concept of a derivation: a description of a single execve together with a hash‑based store path. The store path is immutable, which eliminates interference between unrelated projects on the same machine. Nix’s two main addressing modes, input‑addressed (the path is derived from a hash of the derivation) and fixed‑output (the path is derived from a hash of the expected result), show how to handle both source builds and binary fetches.
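
The difference between the two addressing modes can be sketched in a few lines. This is a deliberately simplified model: the Derivation class, the /store prefix, and the hashing scheme are assumptions for illustration and do not reproduce Nix's actual store path algorithm.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional

STORE = "/store"  # hypothetical prefix, not Nix's real layout


@dataclass(frozen=True)
class Derivation:
    name: str
    argv: tuple                       # the single command to run
    env: tuple = ()                   # sorted (key, value) pairs
    inputs: tuple = ()                # store paths of dependencies
    fixed_output_hash: Optional[str] = None  # set only for fetch-like steps


def store_path(drv):
    if drv.fixed_output_hash is not None:
        # Fixed-output: the path depends only on the promised result hash,
        # so two different ways of fetching the same bytes share a path.
        seed = "fixed:" + drv.fixed_output_hash
    else:
        # Input-addressed: the path depends on the full build description.
        seed = "input:" + json.dumps([drv.name, drv.argv, drv.env, drv.inputs])
    digest = hashlib.sha256(seed.encode()).hexdigest()[:32]
    return f"{STORE}/{digest}-{drv.name}"


src = Derivation("hello-src", ("fetch", "https://example.invalid/a"),
                 fixed_output_hash="abc123")
mirror = Derivation("hello-src", ("fetch", "https://example.invalid/b"),
                    fixed_output_hash="abc123")
opt = Derivation("hello", ("cc", "-O2", "hello.c"), inputs=(store_path(src),))
dbg = Derivation("hello", ("cc", "-g", "hello.c"), inputs=(store_path(src),))
```

In this model src and mirror map to the same path even though they fetch from different URLs, whereas opt and dbg get different paths because their command lines differ. A changed dependency path would propagate in the same way, since it is part of the hashed description.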

However, Nix suffers from two shortcomings for a postmodern system:

  1. Coarse granularity – each derivation corresponds to one process, so a compiler that internally compiles many source files still appears as a single opaque step.
  2. Monadic evaluation – Nix’s language can express dynamic dependencies (the build plan depends on the result of a prior step), but the evaluator blocks while waiting for those steps, losing parallelism.

2.2. Dynamic Derivations and Import‑From‑Derivation

Dynamic derivations and import‑from‑derivation (IFD) both attempt to expose the dynamic part of the graph as a separate derivation that can be built later. The result is a placeholder in the overall graph, allowing the evaluator to continue constructing the rest of the plan. In practice these features have been removed from the mainstream Lix implementation because of engineering complexity, but they illustrate a path toward a build graph that can be serialized, shipped to another machine, and executed without a live evaluator.

2.3. Bazel, Buck2, and Persistent Workers

Bazel and Buck2 adopt an applicative model: the full action graph is known before any command runs. This enables fast incremental analysis because a file‑watcher can prune the graph to the minimal set of actions that need rerunning. Their persistent worker design mitigates the startup cost of compilers: a long‑lived RPC server receives compilation requests (e.g., javac, gcc -c) and reuses internal caches such as pre‑compiled headers. Workers do not read previous build products, so they avoid the trust problem that plagues compiler‑level incrementalism.
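
Because the whole graph is known up front, working out what to rerun after a file change is a plain graph traversal. The sketch below assumes a hypothetical deps mapping from each action to its inputs; it is not how Bazel or Buck2 represent their graphs internally.

```python
from collections import deque


def actions_to_rerun(deps, changed):
    """Return every action that transitively depends on a changed file.

    deps: dict mapping each action to the list of nodes it depends on.
    changed: iterable of source files reported by a file watcher.
    """
    # Invert the edges once: input -> actions that consume it.
    dependents = {}
    for node, inputs in deps.items():
        for inp in inputs:
            dependents.setdefault(inp, []).append(node)

    dirty = set()
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for consumer in dependents.get(current, []):
            if consumer not in dirty:
                dirty.add(consumer)
                queue.append(consumer)
    return dirty


# Hypothetical three-action C project.
deps = {
    "foo.o": ["foo.c", "util.h"],
    "bar.o": ["bar.c", "util.h"],
    "app": ["foo.o", "bar.o"],
}
after_foo_edit = actions_to_rerun(deps, {"foo.c"})
after_header_edit = actions_to_rerun(deps, {"util.h"})
```

Editing foo.c marks only foo.o and app as dirty, while editing the shared header marks all three actions. Everything outside the dirty set keeps its cached output.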

The worker model therefore offers a pragmatic middle ground: small, hermetic actions with low latency, without sacrificing the guarantee that each action is recomputed from scratch when any input changes.
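
A toy version of the worker idea is sketched below. The JSON-lines request format, the Worker class, and the token-splitting "parser" are all assumptions for illustration; real workers such as Bazel's use their own protocol. The point is that the in-memory cache is keyed by input content and the worker never reads a previous build product.

```python
import hashlib
import io
import json


class Worker:
    """Toy long-lived worker that answers one JSON request per line."""

    def __init__(self):
        self._parsed = {}     # content digest -> parsed form, kept across requests
        self.parse_count = 0  # how many times real parsing work was done

    def _parse(self, text):
        digest = hashlib.sha256(text.encode()).hexdigest()
        if digest not in self._parsed:
            self.parse_count += 1
            self._parsed[digest] = text.split()  # stand-in for real parsing
        return self._parsed[digest]

    def handle(self, request):
        # Every request names all of its inputs; nothing is read from disk.
        tokens = []
        for text in request["headers"] + [request["source"]]:
            tokens.extend(self._parse(text))
        return {"id": request["id"], "output": " ".join(tokens)}

    def serve(self, infile, outfile):
        for line in infile:
            if line.strip():
                reply = self.handle(json.loads(line))
                outfile.write(json.dumps(reply) + "\n")


requests = io.StringIO(
    json.dumps({"id": 1, "headers": ["#define A 1"], "source": "int a;"}) + "\n"
    + json.dumps({"id": 2, "headers": ["#define A 1"], "source": "int b;"}) + "\n"
)
replies = io.StringIO()
worker = Worker()
worker.serve(requests, replies)
results = [json.loads(line) for line in replies.getvalue().splitlines()]
```

The shared header is parsed once and reused for the second request, so worker.parse_count ends at 3 rather than 4. Because the cache key is the header's content, an edited header simply misses the cache instead of producing a stale result.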


3. Implications for Multi‑Language, Distributed Builds

3.1. Cross‑Language Incrementality

Languages differ in the granularity of their compilation units. Rust treats an entire crate as a single unit, and Haskell builds are usually orchestrated a whole package at a time, while C/C++ split at the source file level. A postmodern system cannot force a language to expose finer granularity, but where the compiler allows it, the system can wrap the compiler in a persistent worker and let the outer build system schedule each source file as a separate action. The worker then performs the heavy lifting (parsing, type‑checking) while still respecting the sandbox’s deterministic contract.

3.2. Remote Build Execution (RBE)

Both Bazel and Buck2 already support remote build execution (RBE) through the Remote Execution API. By storing each action’s hash‑derived output in a content‑addressable cache, the same artifact can be fetched from any machine that participates in the RBE pool. This satisfies the postmodern goal of maximising reuse across builds while keeping the system multitenant: the cache key contains the full input hash, so two unrelated projects can safely share the same cache without colliding.
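
The sharing argument can be made concrete with a small in-memory model. The RemoteCache class and build helper below are hypothetical stand-ins, loosely mirroring the split between an action cache and a blob store; they are not the Remote Execution API itself.

```python
import hashlib


class RemoteCache:
    """In-memory stand-in for a shared cache: action results plus a blob store."""

    def __init__(self):
        self.blobs = {}    # output digest -> output bytes
        self.results = {}  # action key (hash of all inputs) -> output digest

    def get(self, action_key):
        digest = self.results.get(action_key)
        return None if digest is None else self.blobs[digest]

    def put(self, action_key, output):
        digest = hashlib.sha256(output).hexdigest()
        self.blobs[digest] = output
        self.results[action_key] = digest


def build(cache, action_key, run):
    """Return (output, was_cache_hit); run the action only on a miss."""
    cached = cache.get(action_key)
    if cached is not None:
        return cached, True
    output = run()
    cache.put(action_key, output)
    return output, False


cache = RemoteCache()
runs = []


def compile_foo():
    runs.append("foo")  # record that real work happened
    return b"object code for foo"


# Two unrelated projects that happen to declare exactly the same inputs.
key = hashlib.sha256(b"cc -c foo.c\0int x;\n").hexdigest()
project_a = build(cache, key, compile_foo)
project_b = build(cache, key, compile_foo)

# A project whose inputs differ gets a different key and never sees that entry.
other_key = hashlib.sha256(b"cc -c foo.c\0int y;\n").hexdigest()
project_c = build(cache, other_key, compile_foo)
```

Project B gets a cache hit without running the compiler, and project C misses because its key differs. Sharing is safe only to the extent that the key really covers every input, which is what the sandbox is meant to enforce.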

3.3. Integration with Existing Ecosystems

A realistic migration path does not require rewriting every inner build system. Instead we can:

  1. Instrument compilers to expose a small RPC interface (the persistent worker).
  2. Replace the outer orchestrator with a Bazel‑like daemon that builds a static action graph from language‑specific BUILD files.
  3. Leverage Nix‑style derivations for external tools (e.g., clang‑format, protoc) that are naturally single‑process.

The result is a hybrid where Nix‑style purity guarantees deterministic execution of external tools, while Bazel‑style analysis provides fast incremental decisions for source compilation.


4. Counter‑Perspectives and Open Challenges

4.1. Determinism vs. Performance

Pure sandboxed execution can be slower than native runs because the sandbox must copy inputs and isolate the environment. Persistent workers alleviate the cost of process startup but cannot eliminate the overhead of copying large dependency trees. Future work may explore copy‑on‑write snapshots of the sandbox filesystem or overlayfs layers that share unchanged files between workers.

4.2. Build Identity for Human Reasoning

Developers are accustomed to naming builds (e.g., “the last successful CI run”). Removing build identity altogether can make debugging harder because there is no stable label to refer to. A practical compromise is to expose the hash of the action graph as a build fingerprint that can be logged and queried, without using it to decide whether to reuse artifacts.
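
One way to compute such a fingerprint is sketched below, assuming the build already has a mapping from target names to action keys. The function name and the 16-character truncation are illustrative choices, not an established convention.

```python
import hashlib


def build_fingerprint(action_keys_by_target):
    """Summarise a whole action graph as one short, loggable label.

    action_keys_by_target: dict mapping target name -> action key (hex string).
    The label is for humans and logs only; reuse decisions still rely on the
    individual action keys.
    """
    h = hashlib.sha256()
    for target in sorted(action_keys_by_target):  # order-independent
        h.update(f"{target}={action_keys_by_target[target]}\n".encode())
    return h.hexdigest()[:16]


# Hypothetical action keys for two builds of the same project.
monday = {"app": "aa11", "foo.o": "bb22", "bar.o": "cc33"}
tuesday = {"app": "aa99", "foo.o": "bb77", "bar.o": "cc33"}

fp_monday = build_fingerprint(monday)
fp_tuesday = build_fingerprint(tuesday)
```

The two builds get different fingerprints because some action keys changed, and rebuilding Monday's exact graph later would yield fp_monday again. That gives developers a stable name to put in CI logs and bug reports.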

4.3. Compatibility with Distribution Packaging

Linux distributions still rely on Make‑style packaging pipelines that expect a single monolithic build step per package. Introducing a fine‑grained, distributed system would require a thin compatibility layer that aggregates the many small actions into a single package artifact. Experiments that integrate Buck2 with Nix suggest that this is feasible, but the tooling ecosystem is not yet mature.


5. Concrete Path Forward

  1. Standardise a Persistent‑Worker API – a language‑agnostic protobuf definition that compilers can implement. The API should expose a deterministic execute(inputHash, payload) → outputHash call.
  2. Adopt a Content‑Addressable Store (CAS) – reuse the design of the Nix store for all action outputs, ensuring that any two identical actions map to the same path.
  3. Build a Minimal Evaluator – a lightweight daemon that reads BUILD files, constructs a static action graph, and hands each node to the worker pool. The evaluator need not understand the semantics of each language, only the declared dependencies.
  4. Provide a Migration Toolkit – scripts that wrap existing Make, CMake, or Stack projects in a thin layer that generates the required BUILD files and registers the compiler as a worker.
  5. Invest in RBE‑Ready Caches – distributed caches that store CAS objects across data‑centers, making cross‑machine reuse a first‑class feature.
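
Steps 1 to 3 can be sketched together in a few dozen lines. Everything here is a toy under stated assumptions: the Store class, the two payload names, and the graph format are invented for the example, and execute only imitates the shape of the execute(inputHash, payload) → outputHash call.

```python
import hashlib
from graphlib import TopologicalSorter


class Store:
    """In-memory content-addressable store: blobs are named by their SHA-256."""

    def __init__(self):
        self.blobs = {}

    def put(self, data):
        key = hashlib.sha256(data).hexdigest()
        self.blobs[key] = data
        return key

    def get(self, key):
        return self.blobs[key]


def execute(store, input_hash, payload):
    """Stand-in for the worker call execute(inputHash, payload) -> outputHash."""
    data = store.get(input_hash)
    if payload == "upper":
        out = data.upper()
    elif payload == "reverse":
        out = data[::-1]
    else:
        raise ValueError(f"unknown payload: {payload}")
    return store.put(out)


def evaluate(store, graph, sources):
    """Run a static action graph in dependency order.

    graph: dict mapping node -> (payload, list of dependency names).
    sources: dict mapping source name -> bytes.
    Returns a dict mapping every node to the hash of its output.
    """
    outputs = {name: store.put(data) for name, data in sources.items()}
    order = TopologicalSorter({node: deps for node, (_, deps) in graph.items()})
    for node in order.static_order():
        if node in outputs:  # sources are already in the store
            continue
        payload, deps = graph[node]
        joined = b"".join(store.get(outputs[dep]) for dep in deps)
        outputs[node] = execute(store, store.put(joined), payload)
    return outputs


store = Store()
sources = {"a.txt": b"ab", "b.txt": b"cd"}
graph = {
    "shout": ("upper", ["a.txt", "b.txt"]),
    "final": ("reverse", ["shout"]),
}
outputs = evaluate(store, graph, sources)
final_bytes = store.get(outputs["final"])
```

The evaluator never inspects what a payload means; it only follows declared dependencies and passes hashes around, which is the property step 3 asks for. In a real system execute would be an RPC to a sandboxed worker and the store would be the shared cache from step 5.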

6. Conclusion

The waste caused by rebuilding identical artifacts is not merely an inconvenience; it consumes energy, delays feedback, and obscures the true state of a codebase. By combining Nix’s pure execve sandboxing, Bazel’s fast static analysis, and the persistent‑worker model, we can construct a postmodern build system that trusts nothing, reuses everything, and scales across machines. The ingredients already exist; the remaining work is to stitch them together into a coherent, production‑ready toolchain.

If the community embraces these ideas, the next generation of build systems will finally deliver the promise of instantaneous, trustworthy feedback without sacrificing the reproducibility that modern software development demands.
