Bazel's Silent Saboteur: How Non-Determinism Erodes Build Trust
At the heart of Bazel’s promise of fast, reliable builds lies a fragile assumption: that every action—whether compiling code or linking binaries—produces identical outputs given identical inputs. When this determinism fails, it doesn’t just slow down builds; it fractures trust in the entire system. Developers encounter baffling cache misses, inconsistent artifacts, and the dreaded "works on my machine" paradox. This isn’t merely an inconvenience; it undermines reproducibility, security audits, and collaborative efficiency in complex software ecosystems.
The Anatomy of a Bazel Action
Bazel breaks builds into atomic units called actions, defined by three pillars:
1. Command-line invocation (e.g., cc -o lib1.o lib1.c)
2. Input file hashes (source files, tools, dependencies)
3. Environmental configuration (platform details, env variables)
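These elements form a fingerprint that serves as the action’s cache key: when the fingerprint matches an earlier run, Bazel reuses the stored outputs instead of rerunning the command. But as Julio Merino explains in his Blog System/5 article, the fingerprint describes an action only "quite precisely". Non-determinism slips through when an action depends on something Bazel does not track, such as the system clock.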
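To make the fingerprint idea concrete, here is a minimal sketch that folds the three ingredients listed above into a single digest. The function name action_key and the JSON-then-SHA-256 encoding are assumptions chosen for illustration; Bazel’s real key derivation differs in its details.

```python
import hashlib
import json


def action_key(argv, input_digests, env):
    """Combine command line, input hashes, and environment into one digest.

    Hypothetical helper for illustration; not Bazel's actual algorithm.
    """
    payload = json.dumps(
        {
            "argv": list(argv),                       # argument order matters
            "inputs": sorted(input_digests.items()),  # path -> content hash
            "env": sorted(env.items()),               # order-independent
        },
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


argv = ["cc", "-o", "lib1.o", "lib1.c"]
env = {"PATH": "/usr/bin"}
source_v1 = hashlib.sha256(b"int f(void) { return 1; }\n").hexdigest()
source_v2 = hashlib.sha256(b"int f(void) { return 2; }\n").hexdigest()

base = action_key(argv, {"lib1.c": source_v1}, env)
same = action_key(argv, {"lib1.c": source_v1}, env)
edited = action_key(argv, {"lib1.c": source_v2}, env)

print(base == same)    # identical ingredients give an identical key
print(base == edited)  # a changed input hash gives a different key
```

Anything the command reads that is not one of these ingredients, such as the clock, never reaches the key, which is why the cache cannot notice it.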
Why Non-Determinism Breaks Everything
Consider a simple genrule:
genrule(
name = "date",
outs = ["date.txt"],
cmd = "date >$@",
)
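Each run writes the current timestamp, so the output differs every time the action executes. It is a trivial example, but it shows the problem: a rebuild after bazel clean yields a different artifact, which voids the benefit of caching. Worse, non-determinism can propagate. Take a target //:copy2 that depends, directly or indirectly, on date.txt: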
$ bazel build //:copy2 # First run
$ rm -f bazel-bin/date.txt
$ bazel build //:copy2 # Forces partial rebuild
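Deleting date.txt forces Bazel to rerun the date action, and because the regenerated file has new contents, every action downstream of date reruns as well, wasting resources. Had the action been deterministic, the regenerated file would have matched the old one and the downstream work could have been skipped. Some targets (like line counters) can "absorb" non-determinism because their own output does not change; others pass it along or amplify it.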
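The difference between a target that absorbs non-determinism and one that passes it on can be sketched in a few lines. The functions copy_action and line_count_action are hypothetical stand-ins for downstream targets, and the two timestamps are made-up sample outputs of the date genrule.

```python
import hashlib


def digest(data: bytes) -> str:
    """Content hash, standing in for the digest a build tool records."""
    return hashlib.sha256(data).hexdigest()


def copy_action(data: bytes) -> bytes:
    """Downstream target that copies its input verbatim."""
    return data


def line_count_action(data: bytes) -> bytes:
    """Downstream target that only reports how many lines the input has."""
    count = data.count(b"\n")
    return str(count).encode("ascii") + b"\n"


# Two runs of the non-deterministic action (sample values).
run1 = b"Mon Jan  1 10:00:00 UTC 2024\n"
run2 = b"Mon Jan  1 10:00:05 UTC 2024\n"

copies_differ = digest(copy_action(run1)) != digest(copy_action(run2))
counts_match = digest(line_count_action(run1)) == digest(line_count_action(run2))

print(copies_differ)  # the copy carries the changing timestamp forward
print(counts_match)   # the line count is 1 both times, so the change stops here
```

Anything that consumes the line count sees an unchanged input, so the non-determinism goes no further along that branch of the graph.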
Common Culprits: Beyond Timestamps
Non-determinism isn’t always obvious. Key sources include:
- Timestamps & IDs: Compilers embedding build times or PIDs.
- Unordered Data: Tools that iterate over hash tables or directory listings and emit the entries in whatever order they happen to come out.
- Network Access: Fetching external resources mid-build.
- Host Tool Contamination: Reliance on system tools (e.g., /usr/bin/ar) with hidden dependencies.
- Randomness: Tools sampling /dev/random.
Sandboxing helps by restricting file and network access, but it isn’t foolproof, and its strength varies by platform: the Linux sandbox can hide host PIDs from an action, while the macOS sandbox cannot. As Merino notes: "Sandboxing requires kernel support... Bazel’s capabilities depend on the host machine."
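Of the culprits above, unordered data is often the easiest to fix inside a tool you control: impose an order before writing. The sketch below assumes a hypothetical write_manifest helper that a build step might use to list object files; the file names are made up.

```python
def write_manifest(names):
    """Return manifest text that does not depend on the order of `names`.

    Iterating a set of strings (or a raw directory listing) can produce a
    different order from one process to the next; sorting removes that
    degree of freedom.
    """
    return "".join(f"{name}\n" for name in sorted(set(names)))


# The same logical contents, supplied in different orders and containers.
from_set = write_manifest({"b.o", "a.o", "c.o"})
from_list = write_manifest(["c.o", "a.o", "b.o", "a.o"])

print(from_set == from_list)
```

Sorting costs little and makes the output a pure function of its contents, which is the property the action cache relies on.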
Diagnosing the Invisible
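Detect non-determinism by building the same target twice from a clean state and diffing the two execution logs (--noremote_accept_cached keeps a remote cache from satisfying the second build without running anything):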
bazel clean
bazel build --noremote_accept_cached --execution_log_json_file=before //:target
bazel clean
bazel build --noremote_accept_cached --execution_log_json_file=after //:target
diff -u before after
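The diff shows output files whose hashes changed between the two builds, such as a different digest for date.txt. Each changed output points back to the action that produced it, which pinpoints the rogue actions.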
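A raw diff of two large logs can be noisy, so it may help to reduce each log to a path-to-digest map first. The sketch below assumes the JSON log is a sequence of concatenated JSON objects, each with an actualOutputs list whose items carry a path and a digest.hash; treat those field names as assumptions and check them against the logs your Bazel version actually writes. The sample log strings and paths are invented.

```python
import json


def iter_json_objects(text):
    """Yield each top-level JSON value from a string of concatenated values."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def output_digests(log_text):
    """Map each output path to its digest hash (field names are assumed)."""
    digests = {}
    for entry in iter_json_objects(log_text):
        for out in entry.get("actualOutputs", []):
            digests[out["path"]] = out.get("digest", {}).get("hash")
    return digests


def changed_outputs(before_text, after_text):
    """Return sorted paths present in both logs whose digests differ."""
    before = output_digests(before_text)
    after = output_digests(after_text)
    common = before.keys() & after.keys()
    return sorted(path for path in common if before[path] != after[path])


# Invented sample logs: date.txt changes, stable.txt does not.
before_log = (
    '{"actualOutputs": [{"path": "bazel-out/bin/date.txt", "digest": {"hash": "aaa"}}]}\n'
    '{"actualOutputs": [{"path": "bazel-out/bin/stable.txt", "digest": {"hash": "ccc"}}]}\n'
)
after_log = (
    '{"actualOutputs": [{"path": "bazel-out/bin/date.txt", "digest": {"hash": "bbb"}}]}\n'
    '{"actualOutputs": [{"path": "bazel-out/bin/stable.txt", "digest": {"hash": "ccc"}}]}\n'
)

changed = changed_outputs(before_log, after_log)
print(changed)
```

Against real logs, read the before and after files into strings and pass them to changed_outputs; a non-empty result is a reasonable condition for failing a CI job.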
Fighting Back: Strategies for Determinism
Prevention demands vigilance:
1. CI Guardrails: Automate log-diff checks to catch regressions.
2. Hermetic Toolchains: Isolate compilers; never use host-provided tools.
3. Strict Sandboxing: Pass --nosandbox_default_allow_network so that sandboxed actions get no network access by default.
4. Env Sanitization: Use --action_env and --incompatible_strict_action_env to freeze the variables that actions can see.
5. Remote Execution: Offload volatile actions to controlled environments.
Non-determinism isn’t just a technical quirk—it’s a threat to verifiable builds. In an era demanding software transparency, letting timestamps or entropy compromise artifacts is untenable. By auditing actions and embracing hermetic practices, teams can transform Bazel from a brittle tool into a bedrock of reliability. As Merino’s series continues, expect deeper dives into remote caching’s role in sealing these cracks.
Source: Blog System/5