Content-addressed Rust builds: A new paradigm for distributed compilation caching

kache represents a fundamental shift in Rust build caching, moving from path-based to content-addressed storage to enable artifact sharing across different environments, worktrees, and machines.

The evolution of build systems has consistently focused on one primary goal: reducing the time developers spend waiting for code to compile. In the Rust ecosystem, Cargo's incremental cache has long been the standard solution, optimizing the common case of rebuilding a project on the same machine. However, as projects grow in complexity and development environments become more distributed, the limits of that approach become visible. kache responds with a content-addressed approach to build caching that changes how compilation artifacts are shared across contexts.

The Fundamental Problem: Beyond Local Optimization

Cargo's incremental cache excels at what it was designed to do: making the second build of a project on the same machine significantly faster than the first. This optimization targets the tight inner loop that defines much of the development experience. It breaks down, however, when developers switch git worktrees, spin up fresh CI runners, or hand a project to a teammate. In each of these situations the cache cannot follow, because it is keyed by path and tied to a specific target/ directory.

kache answers a different question: has anyone, anywhere, compiled this exact code before? That distinction shifts the caching paradigm from local optimization to distributed sharing. Running as a RUSTC_WRAPPER, kache intercepts each compilation request from Cargo and checks whether the requested artifact already exists in a content-addressed store, regardless of where the build is running.
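As a rough illustration of the wrapper mechanism: Cargo's documented RUSTC_WRAPPER contract is that the wrapper receives the path to the real rustc as its first argument, followed by the arguments meant for rustc. The Rust sketch below shows only that plumbing. The function names are hypothetical and not taken from kache; a caching wrapper would compute a key and consult its store before falling through to the real compiler.

```rust
use std::io;
use std::process::{Command, ExitStatus};

/// Split the wrapper's arguments (argv without argv[0]) into the path of the
/// real rustc and the arguments Cargo intended for it.
fn split_wrapper_args(args: &[String]) -> Option<(&str, &[String])> {
    let (rustc, rest) = args.split_first()?;
    Some((rustc.as_str(), rest))
}

/// Passthrough: run the real compiler unchanged. A caching wrapper would
/// only reach this point after a cache miss.
fn run_rustc(rustc: &str, rustc_args: &[String]) -> io::Result<ExitStatus> {
    Command::new(rustc).args(rustc_args).status()
}

/// Read the `--crate-name <name>` pair from the rustc arguments, if present.
fn crate_name(rustc_args: &[String]) -> Option<&str> {
    rustc_args
        .windows(2)
        .find(|w| w[0] == "--crate-name")
        .map(|w| w[1].as_str())
}
```

In a real wrapper binary, `main` would collect `std::env::args().skip(1)`, pass the result to `split_wrapper_args`, and exit with the status returned by `run_rustc`.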

Technical Architecture: Content-Addressed Storage at Scale

kache handles each compilation request in a fixed sequence: it parses the rustc arguments, computes a cache key, checks the local and remote stores, and, if the artifact is found, links it into the location under target/ where Cargo expects it. If the artifact is not found, kache acquires a per-key build lock, runs the actual compilation, and stores the results.
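The request sequence above can be modeled with two in-memory maps standing in for the local store and a shared remote store. This is a simplified sketch under stated assumptions: keys are plain u64 values, artifacts are byte vectors, and returning the bytes stands in for linking them into target/. The names `Stores`, `Outcome`, and `lookup_or_compile` are hypothetical and do not reflect kache's actual storage format.

```rust
use std::collections::HashMap;

/// Which path the (hypothetical) lookup took.
#[derive(Debug, PartialEq)]
enum Outcome {
    LocalHit,
    RemoteHit,
    Miss,
}

/// In-memory stand-ins for the local store and a shared remote store.
struct Stores {
    local: HashMap<u64, Vec<u8>>,
    remote: HashMap<u64, Vec<u8>>,
}

impl Stores {
    /// A machine with an empty local store that shares `remote` with others.
    fn with_remote(remote: HashMap<u64, Vec<u8>>) -> Self {
        Stores { local: HashMap::new(), remote }
    }

    fn lookup_or_compile<F>(&mut self, key: u64, compile: F) -> (Vec<u8>, Outcome)
    where
        F: FnOnce() -> Vec<u8>,
    {
        // 1. Local store: the cheapest hit.
        if let Some(artifact) = self.local.get(&key) {
            return (artifact.clone(), Outcome::LocalHit);
        }
        // 2. Remote store: copy the artifact down so the next lookup is local.
        if let Some(artifact) = self.remote.get(&key).cloned() {
            self.local.insert(key, artifact.clone());
            return (artifact, Outcome::RemoteHit);
        }
        // 3. Miss: a real wrapper would take the per-key build lock here,
        //    run rustc, then store the outputs.
        let artifact = compile();
        self.local.insert(key, artifact.clone());
        self.remote.insert(key, artifact.clone());
        (artifact, Outcome::Miss)
    }
}
```

Giving a second `Stores` value a copy of the first one's remote map mimics a fresh CI runner: its first lookup is a remote hit rather than a recompilation.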

The linking mechanism deserves particular attention. On filesystems that support copy-on-write (APFS, btrfs, XFS with reflink enabled), kache creates reflinks: lightweight clones that share the underlying disk blocks while behaving as independent files. On other filesystems it falls back to hardlinks. Either way, multiple paths point to the same physical data, so a 200MB compiled library occupies its disk space once no matter how many worktrees use it, instead of being copied into every target/ directory.
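A minimal sketch of the fallback chain, assuming the cached artifact and the target/ directory are ordinary paths. Rust's standard library has no reflink call (that would need a platform API such as `clonefile` on macOS or the `FICLONE` ioctl on Linux), so this hypothetical `link_artifact` function begins at the hardlink step. As an extra assumption of the sketch, it copies the file when hardlinking fails, for example when the store and target/ sit on different filesystems.

```rust
use std::fs;
use std::io;
use std::path::Path;

#[derive(Debug, PartialEq)]
enum LinkKind {
    Hardlink,
    Copy,
}

/// Place a cached artifact at `dest` (hypothetical helper).
fn link_artifact(cached: &Path, dest: &Path) -> io::Result<LinkKind> {
    if let Some(parent) = dest.parent() {
        fs::create_dir_all(parent)?;
    }
    // A stale file may already sit at the destination; remove it first.
    match fs::remove_file(dest) {
        Ok(()) => {}
        Err(e) if e.kind() == io::ErrorKind::NotFound => {}
        Err(e) => return Err(e),
    }
    match fs::hard_link(cached, dest) {
        // Both paths now name the same inode: no extra disk space used.
        Ok(()) => Ok(LinkKind::Hardlink),
        // Hardlinking failed (e.g. cross-device); fall back to a full copy.
        Err(_) => {
            fs::copy(cached, dest)?;
            Ok(LinkKind::Copy)
        }
    }
}
```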

The per-key build lock addresses a concurrency problem that appears when several builds needing the same dependencies run at once. Without synchronization, parallel compilation processes can compile the same dependency redundantly and race to write it into the cache. With the lock, the first process to acquire it performs the compilation and caches the result; the others wait, then use the cached artifact instead of repeating the work.
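The locking pattern can be sketched in-process, with threads standing in for concurrent rustc processes. This is an illustration under assumptions: a real wrapper coordinates separate OS processes and would need an OS-level lock such as a lock file rather than `std::sync::Mutex`, and the `KeyedCache` type and its methods are hypothetical. The important detail is the second cache check after the lock is acquired, which lets waiting workers reuse the first worker's result.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

/// In-memory artifact cache with one lock per cache key (hypothetical).
#[derive(Default)]
struct KeyedCache {
    artifacts: Mutex<HashMap<String, Vec<u8>>>,
    key_locks: Mutex<HashMap<String, Arc<Mutex<()>>>>,
}

impl KeyedCache {
    fn cached(&self, key: &str) -> Option<Vec<u8>> {
        self.artifacts.lock().unwrap().get(key).cloned()
    }

    fn get_or_build<F: FnOnce() -> Vec<u8>>(&self, key: &str, compile: F) -> Vec<u8> {
        if let Some(a) = self.cached(key) {
            return a;
        }
        // Fetch or create this key's lock; the map lock drops at the end of the statement.
        let key_lock = Arc::clone(self.key_locks.lock().unwrap().entry(key.to_string()).or_default());
        let _guard = key_lock.lock().unwrap();
        // Re-check: another worker may have finished while we waited.
        if let Some(a) = self.cached(key) {
            return a;
        }
        let artifact = compile();
        self.artifacts.lock().unwrap().insert(key.to_string(), artifact.clone());
        artifact
    }
}

/// Run `workers` threads that all need the same dependency; return how many compiled it.
fn simulate_parallel_builds(workers: usize) -> usize {
    let cache = Arc::new(KeyedCache::default());
    let compiles = Arc::new(AtomicUsize::new(0));
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let cache = Arc::clone(&cache);
            let compiles = Arc::clone(&compiles);
            thread::spawn(move || {
                cache.get_or_build("serde-1.0", || {
                    compiles.fetch_add(1, Ordering::SeqCst);
                    thread::sleep(Duration::from_millis(20)); // pretend to compile
                    b"rlib".to_vec()
                })
            })
        })
        .collect();
    for h in handles {
        assert_eq!(h.join().unwrap(), b"rlib".to_vec());
    }
    compiles.load(Ordering::SeqCst)
}
```

Here `simulate_parallel_builds(8)` returns 1: eight workers ask for the same key, and only one of them runs the compile closure.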

The Cache Key: Capturing Compilation Identity

The heart of kache's effectiveness lies in its cache key: a hash that captures every input affecting compilation output. This includes:

The full rustc version, including commit hash and host triple
The target triple
Crate metadata (name, types, edition)
Code generation flags and feature flags
Source files, including module trees and build script outputs
Environment variables referenced by env!() and option_env!()
Hashes of external dependencies
RUSTFLAGS with machine-local path prefixes replaced with stable placeholders
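To make the list concrete, here is a sketch that gathers these inputs into one struct and hashes it. The field names are hypothetical, not kache's schema. It uses std's `DefaultHasher` only to stay dependency-free; the standard library documents that `DefaultHasher`'s algorithm may change between Rust releases, so a cache shared across machines would need a fixed, specified hash function instead. Order-insensitive inputs use `BTreeSet` and `BTreeMap`, so, for example, the order in which features are passed does not change the key.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::{BTreeMap, BTreeSet};
use std::hash::{Hash, Hasher};

/// Hypothetical bundle of everything that can change rustc's output.
#[derive(Hash, Clone, Default)]
struct KeyInputs {
    rustc_version: String,      // full version: release, commit hash, host triple
    target_triple: String,
    crate_name: String,
    crate_types: Vec<String>,   // e.g. ["lib"]
    edition: String,
    codegen_flags: Vec<String>,
    features: BTreeSet<String>, // sorted, so flag order does not matter
    source_hashes: BTreeMap<String, u64>, // relative path -> content hash (incl. build-script output)
    tracked_env: BTreeMap<String, Option<String>>, // vars read via env!/option_env!
    dependency_keys: Vec<u64>,  // keys of upstream crates
    rustflags: Vec<String>,     // machine-local prefixes replaced by placeholders
}

/// Hash a file's bytes so sources are keyed by content, not by path.
fn content_hash(bytes: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    bytes.hash(&mut h);
    h.finish()
}

/// Combine all inputs into one cache key.
fn cache_key(inputs: &KeyInputs) -> u64 {
    let mut h = DefaultHasher::new();
    inputs.hash(&mut h);
    h.finish()
}
```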

What kache deliberately excludes is equally important: absolute paths, machine identity, and the incremental codegen flag. Leaving these out means the same code produces the same key on different machines, which is what makes remote caching viable. It also reflects a shift in perspective: a build is treated as a function of its inputs rather than of its environment.
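The normalization step can be sketched as a pass over the rustc arguments that drops the incremental flag and rewrites machine-local path prefixes wherever they appear. The placeholder string `{WORKSPACE}` and the `normalize_args` function are assumptions for illustration; the exact placeholders and rules kache uses are not described here.

```rust
/// Normalize rustc arguments so two machines produce identical key inputs.
/// `prefixes` maps a machine-local path prefix to a stable placeholder.
fn normalize_args(args: &[String], prefixes: &[(&str, &str)]) -> Vec<String> {
    // Replace longer prefixes first so a nested path wins over its parent.
    let mut sorted = prefixes.to_vec();
    sorted.sort_by(|a, b| b.0.len().cmp(&a.0.len()));

    let mut out = Vec::new();
    let mut i = 0;
    while i < args.len() {
        let arg = &args[i];
        // Drop the incremental flag in both `-C incremental=..` and `-Cincremental=..` forms.
        if arg == "-C" && args.get(i + 1).map_or(false, |next| next.starts_with("incremental=")) {
            i += 2;
            continue;
        }
        if arg.starts_with("-Cincremental=") {
            i += 1;
            continue;
        }
        let mut normalized = arg.clone();
        for &(prefix, placeholder) in &sorted {
            normalized = normalized.replace(prefix, placeholder);
        }
        out.push(normalized);
        i += 1;
    }
    out
}
```

Two checkouts at /home/alice/proj and /Users/bob/proj then produce identical argument lists, and therefore identical key inputs.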

Visibility and Diagnostics: Understanding Cache Effectiveness

kache includes monitoring tools that show how well the cache is performing. The kache monitor command opens a TUI dashboard with four tabs: Build, Projects, Store, and Transfer. The Build tab streams compilation events with their outcomes, timing, and artifact sizes. The Store tab shows the cached contents, and the Projects tab reveals which target/ directories share physical storage through hardlinks.

The header displays two key metrics: storage utilization and hit rate. The distinction between the raw and weighted hit rates is particularly useful. The raw rate counts every crate equally, while the weighted rate weights each crate by its compile time. The gap between the two shows whether kache is catching the expensive dependencies or only the cheap ones, which helps in judging whether remote caching will deliver meaningful benefits.
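The relationship between the two rates can be shown with a small calculation. This sketch assumes each recorded event carries the crate's compile time (measured on a miss, or remembered from an earlier miss on a hit); the `BuildEvent` type and `hit_rates` function are hypothetical, not the monitor's actual data model.

```rust
/// One compilation as a monitor might record it (hypothetical shape).
struct BuildEvent {
    hit: bool,
    /// Seconds a fresh compile of this crate takes.
    compile_secs: f64,
}

/// Returns (raw, weighted) hit rates, each in [0, 1].
fn hit_rates(events: &[BuildEvent]) -> (f64, f64) {
    if events.is_empty() {
        return (0.0, 0.0);
    }
    // Raw: every crate counts the same.
    let hits = events.iter().filter(|e| e.hit).count();
    let raw = hits as f64 / events.len() as f64;
    // Weighted: share of total compile time that the cache avoided.
    let total: f64 = events.iter().map(|e| e.compile_secs).sum();
    let saved: f64 = events.iter().filter(|e| e.hit).map(|e| e.compile_secs).sum();
    let weighted = if total > 0.0 { saved / total } else { 0.0 };
    (raw, weighted)
}
```

With nine 1-second crates that hit and one 91-second crate that misses, the raw rate is 90% but the weighted rate is only 9%: the cache is catching the cheap crates and missing the one that dominates build time.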

For quick diagnostics, kache stats provides a snapshot of performance, while kache doctor offers comprehensive system health checks, examining the binary, wrapper config, cargo config, store, and daemon to identify potential issues.

Remote Caching: Extending the Cache Across Boundaries

The same content-addressing approach that enables sharing across worktrees on a single machine also enables sharing across machines. kache supports any S3-compatible backend (AWS, Cloudflare R2, Ceph, MinIO), allowing teams to establish shared caches between CI and development environments.

For GitHub Actions, the official kunobi-ninja/kache-action integrates with GitHub's built-in cache, so no separate S3 infrastructure is needed. In CI pipelines this lets build artifacts persist across runs instead of being recompiled each time.

Philosophical Implications: Content-Addressing in Build Systems

kache's approach reflects a broader trend toward content addressing in software systems. By treating build artifacts as functions of their inputs rather than their location, kache enables a more efficient and portable model of compilation. The same pattern appears in container registries, package managers, and version control systems, all of which rely on content addressing for sharing and deduplication.

The comparison with Cargo's incremental cache highlights a general point about optimization: different problems require different solutions. Cargo's cache is optimized for the common case of iterating on a single project on one machine; kache is optimized for sharing across environments. Both are valuable, and they serve different purposes.

Limitations and Considerations

kache makes some deliberate tradeoffs. By default, it skips binary crates, dynamic libraries, and proc-macros, because their outputs depend on the linker, may require code signing on macOS, and are more expensive to restore correctly. The exclusion costs little, since compile time rarely hides in these crates: the expensive work typically sits in library dependencies such as serde and tokio, including the libraries their procedural macros are built on.
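A filter like this can be sketched by reading the `--crate-type` arguments that Cargo passes to rustc. The exact skip list here (bin, dylib, cdylib, proc-macro) and the `is_cacheable` helper are assumptions based on the categories named above, not kache's code. Invocations with no explicit crate type are treated conservatively as uncacheable.

```rust
/// Crate types this sketch refuses to cache (assumed list).
const SKIPPED: &[&str] = &["bin", "dylib", "cdylib", "proc-macro"];

/// Collect every crate type from `--crate-type x` and `--crate-type=x,y` forms.
fn crate_types(rustc_args: &[String]) -> Vec<&str> {
    let mut types = Vec::new();
    let mut iter = rustc_args.iter();
    while let Some(arg) = iter.next() {
        if arg == "--crate-type" {
            if let Some(value) = iter.next() {
                types.extend(value.split(','));
            }
        } else if let Some(value) = arg.strip_prefix("--crate-type=") {
            types.extend(value.split(','));
        }
    }
    types
}

/// Cache only when every requested crate type is outside the skip list.
fn is_cacheable(rustc_args: &[String]) -> bool {
    let types = crate_types(rustc_args);
    !types.is_empty() && types.iter().all(|t| !SKIPPED.contains(t))
}
```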

Additionally, kache disables Cargo's built-in incremental compilation (CARGO_INCREMENTAL=0) while it is active. Incremental compilation reuses work within a crate, while kache reuses whole compiled crates; running both at once can corrupt build artifacts when the two mechanisms write to overlapping files.

The Future of Build Caching

kache represents not just a tool but a new way of thinking about build optimization in distributed development environments. As projects grow larger and development becomes more remote, the ability to share compilation artifacts across contexts grows in value.

The content-addressed approach kache takes could influence build systems in other ecosystems. Its principles (normalizing inputs, capturing compilation identity, linking efficiently, and sharing remotely) apply far beyond Rust.

For developers working with large Rust projects, kache offers a practical answer to a persistent pain point. After installing it, setting RUSTC_WRAPPER, and watching kache monitor, developers can measure the benefit on their own builds. As the tool evolves and more users report their experiences, it is likely to become a more valuable part of the Rust development ecosystem.

kache is open source and available at github.com/kunobi-ninja/kache. Community contributions help it improve as it meets more diverse dependency trees and use cases, in keeping with the Rust community's commitment to building tools that solve real problems for developers.
