Bazel's Silent Saboteur: How Non-Determinism Erodes Build Trust

LavX Team

Bazel's powerful caching relies on action determinism, but subtle non-determinism—like embedded timestamps or unstable environments—can cause insidious build failures and break reproducibility. This deep dive explores common causes, diagnostic strategies, and practical fixes to safeguard your pipelines. Understanding these pitfalls is critical for maintaining developer trust and securing the software supply chain.

At the heart of Bazel’s promise of fast, reliable builds lies a fragile assumption: that every action—whether compiling code or linking binaries—produces identical outputs given identical inputs. When this determinism fails, it doesn’t just slow down builds; it fractures trust in the entire system. Developers encounter baffling cache misses, inconsistent artifacts, and the dreaded "works on my machine" paradox. This isn’t merely an inconvenience; it undermines reproducibility, security audits, and collaborative efficiency in complex software ecosystems.

The Anatomy of a Bazel Action

Bazel breaks builds into atomic units called actions, defined by three pillars:

  1. Command-line invocation (e.g., cc -o lib1.o lib1.c)
  2. Input file hashes (source files, tools, dependencies)
  3. Environmental configuration (platform details, env variables)

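Together these elements form a fingerprint that Bazel uses as the cache key: if the fingerprint matches a previous run, the stored outputs are reused. But as Julio Merino explains in his Blog System/5 article, this model is only "quite precisely" reliable. Non-determinism slips through when an action's output depends on something the fingerprint does not capture, such as the wall clock or the state of the host machine.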
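
To make the fingerprint idea concrete, here is a minimal sketch that folds the three pillars into a single hash. It is not Bazel's actual key computation: the `action_key` function and its argument layout are invented for illustration, and the sketch assumes GNU coreutils' `sha256sum` is on the PATH.

```shell
# Illustrative only: Bazel's real action key is computed differently in detail.
# Usage: action_key COMMAND ENV_CONFIG INPUT_FILE...
# Prints one hash that covers the command line, the environment, and the inputs.
action_key() {
    cmd=$1
    env_cfg=$2
    shift 2
    {
        printf 'cmd:%s\n' "$cmd"
        printf 'env:%s\n' "$env_cfg"
        # Hash each input's contents, so mtimes and other metadata do not matter.
        for f in "$@"; do
            printf 'input:%s:%s\n' "$f" "$(sha256sum < "$f" | cut -d' ' -f1)"
        done
    } | sha256sum | cut -d' ' -f1
}
```

In this sketch, touching an input without changing its contents leaves the key unchanged, while changing the environment string produces a different key. Nothing in the key reflects the time of day, which is exactly the kind of gap the next section exploits.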

Why Non-Determinism Breaks Everything

Consider a simple genrule:

genrule(
    name = "date",
    outs = ["date.txt"],
    cmd = "date >$@",
)

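Each run writes the current timestamp. The example is trivial, but it shows the problem: the action's fingerprint never changes, yet a rebuild after bazel clean yields a different artifact, so a cached date.txt no longer matches what a fresh build would produce. Worse, non-determinism can propagate. In the commands below, //:copy2 is a target downstream of :date: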

$ bazel build //:copy2  # First run
$ rm -f bazel-bin/date.txt
$ bazel build //:copy2  # Forces partial rebuild

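Deleting date.txt forces Bazel to rerun the date action, which now produces a different timestamp. Every action downstream of it sees a changed input and reruns as well, even though no source file changed. Some targets can "absorb" the non-determinism: a line counter emits the same count whatever the timestamp says, so its output digest stays the same and anything that depends on it stays cached. Others pass the changed bytes along and amplify the problem.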
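
The difference between absorbing and amplifying can be sketched without Bazel at all. The snippet below uses two fixed strings as stand-ins for date.txt from two separate runs; `copy_digest` and `line_count` are hypothetical helpers that mimic a copy action and a line-counting action, and `sha256sum` is assumed to be available.

```shell
# Stand-ins for date.txt as produced by two separate runs of the genrule.
run1='Mon Jan  1 10:00:00 UTC 2024'
run2='Mon Jan  1 10:00:05 UTC 2024'

# A copy-style action passes the bytes through, so its output digest changes.
copy_digest() { printf '%s\n' "$1" | sha256sum | cut -d' ' -f1; }

# A line-counting action depends only on the number of lines, so its output is stable.
line_count() { printf '%s\n' "$1" | wc -l | tr -d ' '; }

echo "copy:  $(copy_digest "$run1") vs $(copy_digest "$run2")"
echo "count: $(line_count "$run1") vs $(line_count "$run2")"
```

The two copy digests differ while the two counts are identical, which is why anything built on top of the counter would see an unchanged input.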

Common Culprits: Beyond Timestamps

Non-determinism isn’t always obvious. Key sources include:

  • Timestamps & IDs: Compilers embedding build times or PIDs.
  • Unordered Data: Tools that iterate over hash maps or directory listings and emit results in whatever order they happen to come out.
  • Network Access: Fetching external resources mid-build.
  • Host Tool Contamination: Reliance on system tools (e.g., /usr/bin/ar) with hidden dependencies.
  • Randomness: Tools sampling /dev/random.

Sandboxing helps by restricting file and network access, but it isn't foolproof and its strength varies by platform. The Linux sandbox can hide the host's PIDs from an action; the macOS sandbox cannot. As Merino notes: "Sandboxing requires kernel support... Bazel’s capabilities depend on the host machine."
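
For the unordered-data case, the usual remedy inside a build script is to impose an explicit order before anything is written out. A minimal sketch, assuming a hypothetical helper named `stable_list` that a genrule or wrapper script might call:

```shell
# List the files under a directory in a reproducible order.
# find(1) returns entries in filesystem order, which can differ between hosts;
# sorting under LC_ALL=C pins a byte-wise order that does not depend on the locale.
stable_list() {
    (cd "$1" && find . -type f | LC_ALL=C sort)
}
```

Feeding this list to an archiver or code generator, instead of a raw glob or directory walk, removes one common source of run-to-run differences. It does nothing about timestamps or randomness, which need their own fixes.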

Diagnosing the Invisible

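Detect non-determinism by building the same target twice from a clean state, with remote cache hits disabled so that every action really runs, and diffing the two execution logs: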

bazel clean
bazel build --noremote_accept_cached --execution_log_json_file=before //:target
bazel clean
bazel build --noremote_accept_cached --execution_log_json_file=after //:target
diff -u before after

The diff shows which output digests changed between the two runs, such as a different hash for date.txt, and that points directly at the action that produced them.
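
The same sequence can be wrapped in a function so that a CI job fails when the logs differ. This is a sketch rather than a hardened tool: `check_determinism` is a hypothetical name, and it compares the raw logs. Depending on the Bazel version, the log may also contain timing fields or list actions in a different order between runs, so a real check may need to filter or sort the entries first.

```shell
# check_determinism TARGET: build twice from clean and compare execution logs.
# Returns 0 when the logs match, 1 when they differ, 2 when a build step fails.
check_determinism() {
    target=$1
    logdir=$(mktemp -d) || return 2
    for run in before after; do
        bazel clean || return 2
        bazel build --noremote_accept_cached \
            "--execution_log_json_file=$logdir/$run" "$target" || return 2
    done
    if diff -u "$logdir/before" "$logdir/after"; then
        echo "deterministic: $target"
    else
        echo "non-deterministic actions found in: $target" >&2
        return 1
    fi
}
```

A CI step could then call `check_determinism //:target` and rely on the exit status.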

Fighting Back: Strategies for Determinism

Prevention demands vigilance:

  1. CI Guardrails: Automate log-diff checks to catch regressions.
  2. Hermetic Toolchains: Isolate compilers; never use host-provided tools.
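  3. Strict Sandboxing: Pass --nosandbox_default_allow_network so actions have no network access unless they explicitly request it.
  4. Env Sanitization: Use --incompatible_strict_action_env to give actions a fixed environment instead of the client's, and --action_env to pin any individual variables they need.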
  5. Remote Execution: Offload volatile actions to controlled environments.
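
Put together, the sandboxing and environment flags from the list might be applied through a small wrapper like the one below. `hermetic_build` is a hypothetical name and the `LANG` value is only an example of a pinned variable; `--incompatible_strict_action_env` is, to the best of my knowledge, the spelling current Bazel releases use for the strict action environment option.

```shell
# hermetic_build TARGET...: invoke Bazel with the hardening flags from the list above.
hermetic_build() {
    # LANG=C.UTF-8 is an illustrative pinned variable, not a requirement.
    bazel build \
        --incompatible_strict_action_env \
        --action_env=LANG=C.UTF-8 \
        --nosandbox_default_allow_network \
        "$@"
}
```

In a real project these flags would more likely live in a checked-in .bazelrc so that every developer and CI job picks them up without a wrapper.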

Non-determinism isn’t just a technical quirk—it’s a threat to verifiable builds. In an era demanding software transparency, letting timestamps or entropy compromise artifacts is untenable. By auditing actions and embracing hermetic practices, teams can transform Bazel from a brittle tool into a bedrock of reliability. As Merino’s series continues, expect deeper dives into remote caching’s role in sealing these cracks.

Source: Blog System/5
