Recursive Make Considered Harmful: A Deep Dive into Build System Architecture

Tech Essays Reporter

Peter Miller's seminal paper argues that recursive make, the traditional approach to building large UNIX projects, creates fundamental problems through artificial partitioning of dependency graphs. By examining the root causes of build inefficiencies and incorrect builds, Miller demonstrates how a whole-project make approach using include statements can maintain modularity while providing complete dependency information.

In the landscape of software development, build systems form the invisible infrastructure that determines how efficiently code transforms into executables. Peter Miller's "Recursive Make Considered Harmful" presents a compelling critique of a deeply entrenched practice in UNIX development, revealing how conventional wisdom around build processes may be leading us astray from optimal solutions.

The Architecture of Dependencies

At its core, Miller's argument rests on a fundamental understanding of how the make program operates. Make functions as an expert system that constructs a directed acyclic graph (DAG) where vertices represent files and edges represent dependencies between them. The program then performs a postorder traversal of this graph to determine which files need rebuilding.
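To make the graph concrete, here is a minimal sketch of a Makefile for a small C program. The file names (prog, main.c, parse.c, parse.h) are hypothetical and chosen only for illustration. Each rule contributes edges from a target to its prerequisites.

```makefile
# Hypothetical two-file C program; every name here is illustrative.
# Each rule adds edges to the graph: target -> prerequisites.
# Recipe lines must begin with a tab character.
prog: main.o parse.o
	$(CC) -o prog main.o parse.o

main.o: main.c parse.h
	$(CC) $(CFLAGS) -c main.c

parse.o: parse.c parse.h
	$(CC) $(CFLAGS) -c parse.c
```

With this graph, editing parse.h makes both object files out of date, so make recompiles both and then relinks prog. Editing only main.c rebuilds main.o and prog, and leaves parse.o alone.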

The critical insight, which Miller develops through careful analysis, is that recursive make artificially fragments this complete DAG into incomplete subsets. When each directory contains its own Makefile, the dependencies between files in different directories remain invisible to individual make invocations. This fragmentation creates a system where the whole cannot be properly understood from its parts.
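A typical recursive top-level Makefile looks roughly like the sketch below. The module names ant and bee are placeholders, and the exact loop varies from project to project.

```makefile
# Top-level Makefile of a hypothetical project with modules ant/ and bee/.
MODULES = ant bee

.PHONY: all
all:
	for dir in $(MODULES); do \
		(cd $$dir && $(MAKE) all) || exit 1; \
	done
```

Each child make reads only the Makefile in its own directory, so it sees only that directory's slice of the graph. The traversal order (ant, then bee) is written by hand in MODULES, not derived from any dependency.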

{{IMAGE:1}}

Figure 1: A typical recursive make directory structure

Symptoms of a Fragmented System

The problems stemming from this artificial partitioning manifest in several familiar ways that many developers have come to accept as inevitable:

  1. Build Order Instability: The order of directory traversal becomes increasingly unstable as projects grow, requiring constant manual adjustment of the top-level Makefile.

  2. Multiple Pass Requirements: Many projects need more than one pass over the subdirectories before everything builds correctly, which directly lengthens build times.

  3. Incomplete Dependency Tracking: Inter-directory dependencies are either omitted or too difficult to express, leading to either overbuilding (to ensure nothing is missed) or underbuilding (where changes aren't properly propagated).

  4. Parallel Build Incompatibility: Recursive make implementations often cannot effectively utilize parallel build capabilities because they lack the complete dependency information needed to safely parallelize work.

Miller illustrates these problems through concrete examples showing how incomplete DAGs lead to incorrect builds. When a header file generated in one module affects source files in another, the recursive approach fails to recognize these dependencies, potentially resulting in silently broken executables.
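The failure mode can be sketched with a single per-directory Makefile. The layout is an assumption made for illustration: a module bee/ compiles against a header parse.h that a sibling module ant/ generates from a grammar file.

```makefile
# bee/Makefile (hypothetical). The header ../ant/parse.h is generated by
# ant/Makefile, which this invocation of make never reads.
main.o: main.c ../ant/parse.h
	$(CC) $(CFLAGS) -I../ant -c main.c
```

From inside bee/, make treats ../ant/parse.h as an ordinary source file with no rule behind it. If the grammar in ant/ has changed but ant/ has not been rebuilt yet, bee/ compiles against the stale header and reports success. If the header does not exist at all, the build stops because bee/ has no rule to create it.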

The Whole-Project Solution

The alternative Miller proposes maintains the benefits of modular organization while providing complete dependency information. Rather than abandoning Makefiles in subdirectories, the approach transforms them into include files referenced by a single top-level Makefile.

{{IMAGE:2}}

Figure 2: Non-recursive project structure with module include files

This approach preserves the organizational benefits of directory-based structure while allowing make to construct a complete DAG. The top-level Makefile includes module-specific fragments, each contributing their portion of the dependency graph without artificial boundaries.
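A minimal sketch of such a top-level Makefile follows, assuming GNU make (for patsubst and filter). The module names ant and bee and the fragment name module.mk are illustrative choices, not requirements.

```makefile
# Top-level Makefile (sketch). Module names ant and bee are hypothetical.
MODULES := ant bee

# Let sources include headers from any module directory.
CFLAGS += $(patsubst %,-I%,$(MODULES))

# Each module.mk appends its own sources to SRC.
SRC :=
include $(patsubst %,%/module.mk,$(MODULES))

# Derive the object list from every C source the modules declared.
OBJ := $(patsubst %.c,%.o,$(filter %.c,$(SRC)))

prog: $(OBJ)
	$(CC) -o $@ $(OBJ) $(LDLIBS)
```

Under these assumptions a fragment such as ant/module.mk can be as short as one line, for example `SRC += ant/main.c ant/parse.c`, with paths written relative to the top-level directory. Because a single make process reads every fragment, a dependency that crosses from bee/ into ant/ is an ordinary edge in one graph.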

Miller systematically addresses several practical concerns about this approach:

  • Size and Complexity: By using include statements, the total Makefile size remains comparable to recursive implementations while being more maintainable.
  • Development Workflow: Developers can still build "just their little bit" by specifying targets, while gaining the benefit of complete dependency information when needed.
  • Performance: Contrary to intuition, whole-project make often builds faster than recursive make because it avoids the overhead of multiple make processes and can make more intelligent parallelization decisions.
  • Memory Usage: Modern systems have ample memory to handle dependency graphs for projects orders of magnitude larger than those originally considered problematic.
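One way to keep the "just their little bit" workflow is to give each module a named target of its own. The sketch below is an assumption about how that might look, not a construction taken from the paper; the variable names ANT_SRC and ANT_OBJ and the file names are hypothetical.

```makefile
# ant/module.mk (sketch). Assumes a top-level Makefile includes this file
# and links every source listed in SRC.
ANT_SRC := ant/main.c ant/lexer.c
ANT_OBJ := $(ANT_SRC:.c=.o)
SRC += $(ANT_SRC)

# "make ant" brings only this module's objects up to date.
# .PHONY is needed because a directory named ant already exists.
.PHONY: ant
ant: $(ANT_OBJ)
```

Running `make ant` from the top level then rebuilds only those objects, while make still consults the full graph, so a changed header in another module is not missed.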

Efficient Makefile Techniques

Beyond the architectural shift, Miller provides valuable insights into writing efficient Makefiles:

  1. Immediate Evaluation: Using := instead of = for variable assignments prevents expensive recomputation of shell commands and string operations.

  2. Strategic Include Files: Precomputing and including dependency information rather than recalculating it on each build significantly improves performance.

  3. Proper Dependency Tracking: Implementing accurate include file dependencies ensures that changes to headers trigger appropriate recompilation.
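The difference between the two assignment operators can be sketched as follows, assuming GNU make (for the shell function). The variable names are illustrative.

```makefile
# Assumes GNU make. Variable names are illustrative.

# Recursively expanded: find runs again each time $(SRC_LAZY) is expanded.
SRC_LAZY = $(shell find . -name '*.c')

# Simply expanded: find runs once, when make reads this line.
SRC := $(shell find . -name '*.c')

# Values derived from SRC reuse the stored list instead of rescanning.
OBJ := $(SRC:.c=.o)
```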

These techniques compound in their effectiveness, particularly when applied to a whole-project approach where the expensive processing occurs only once rather than in each subdirectory.
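Header dependencies can be kept in small generated files that the Makefile includes. The sketch below swaps in a later idiom, the -MMD and -MP options of GCC and Clang, in place of a separate dependency-generation script; it assumes GNU make and one of those compilers, and the file names are hypothetical.

```makefile
# Assumes GNU make with GCC or Clang. File names are hypothetical.
SRC := main.c parse.c
OBJ := $(SRC:.c=.o)
DEP := $(OBJ:.o=.d)

# -MMD writes foo.d next to foo.o, listing the headers foo.c included.
# -MP adds empty rules for those headers so a deleted header is not fatal.
CPPFLAGS += -MMD -MP

prog: $(OBJ)
	$(CC) $(LDFLAGS) -o $@ $(OBJ) $(LDLIBS)

%.o: %.c
	$(CC) $(CPPFLAGS) $(CFLAGS) -c -o $@ $<

# The leading dash keeps the first build quiet, before any .d file exists.
-include $(DEP)
```

The dependency files are refreshed as a side effect of compilation, so nothing is recalculated for objects that are already up to date.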

{{IMAGE:3}}

Figure 3: Complete dependency graph in a whole-project make

Cultural and Practical Barriers

Miller's analysis extends beyond technical issues to address the cultural factors that have sustained recursive make despite its problems. The paper examines influential literature that either promotes recursive make or fails to adequately address its limitations, revealing how established practices can persist despite better alternatives.

The psychological comfort of "one Makefile per directory" is understandable but ultimately misplaced. Directory structures organize files; Makefiles describe relationships between files. Confusing these two concepts leads directly to the problems identified.

Implementation and Migration

For organizations considering this transition, Miller provides practical guidance:

  • Start by identifying cross-directory dependencies in existing recursive builds
  • Gradually migrate modules to the include-based approach
  • Utilize make's VPATH feature to support development sandboxes
  • Consider the implications for shared modules between projects
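As an illustration of the VPATH point, the sketch below lets a developer keep only modified files in a private sandbox while everything else is read from a shared tree. BASELINE and its path are hypothetical, and the pattern rule assumes GNU make.

```makefile
# BASELINE is a hypothetical path to the shared, read-only source tree.
BASELINE := /projects/example/baseline

# make looks in the current directory (the sandbox) first, then in VPATH.
VPATH := $(BASELINE)
CPPFLAGS += -I. -I$(BASELINE)

# Automatic variables pick up whichever copy of the source make found.
%.o: %.c
	$(CC) $(CPPFLAGS) $(CFLAGS) -c -o $@ $<
```

Recipes written this way must refer to prerequisites through automatic variables such as `$<`, because a hard-coded file name would bypass the directory search.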

The return on investment extends beyond faster builds to improved development productivity, earlier error detection, and reduced maintenance overhead of the build system itself.

Broader Implications for Build System Design

Miller's analysis resonates beyond make to inform the design of any build or dependency management system. The fundamental principle—that complete dependency information is essential for optimal builds—applies equally to modern tools like CMake, Bazel, or Pants.

The paper serves as a reminder that tools are only as effective as the information provided to them. In an era of increasingly complex software projects, the architectural decisions around how we specify dependencies may be more critical than the tools themselves.

For developers working with large codebases, Miller's work offers both a diagnosis of common build problems and a path toward more reliable, efficient development workflows. The transition from recursive to whole-project make represents not merely a technical change but a shift in how we conceptualize the relationship between code organization and build dependencies.
