An analysis of 400 million lines of code finds that Clojure and Haskell have the highest share of unique lines, while newer languages such as Go and Rust carry more repeated boilerplate than their reputations suggest. The result challenges common assumptions about programming language evolution.
The Measurement That Changes How We Judge Language Efficiency
For decades, software engineers have debated programming language efficiency through subjective lenses of syntax preference and ecosystem maturity. Ben Boyter's ULOC (Unique Lines of Code) metric offers a more objective framework, analyzing 400 million lines from roughly the top 100 GitHub repositories in each of 34 languages. The results upend conventional wisdom about language expressiveness.
Methodology: Beyond Line Counts
Traditional SLOC (Source Lines of Code) measurements fail to account for:
- Structural repetition (closing braces, mandatory imports)
- License header inflation
- Comment maintenance costs
ULOC addresses these by:
- Counting each distinct line only once
- Including comments as maintainable artifacts
- Collapsing boilerplate that repeats verbatim (closing braces, license headers) into a single counted line
The analysis used Boyter's scc tool with a custom automation script to process 2,703,656 files from 3,418 repositories, calculating "dryness" percentages (ULOC / total lines).
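The dryness ratio described above can be sketched in a few lines of Python. This is a minimal illustration, not the scc implementation: the function names `uloc` and `dryness` are made up for this example, and the normalization (trimming whitespace, ignoring blank lines when counting unique lines, keeping them in the total) is an assumption.

```python
def uloc(lines):
    """Count distinct non-blank lines after trimming whitespace.

    The trimming and blank-line handling are assumptions made for this
    sketch; the real tool may normalize lines differently.
    """
    return len({line.strip() for line in lines if line.strip()})


def dryness(lines):
    """Return unique lines divided by total lines, as a percentage."""
    total = len(lines)
    if total == 0:
        return 0.0
    return 100.0 * uloc(lines) / total


# A tiny C-like sample: the closing brace repeats and one line is blank.
sample = """\
int add(int a, int b) {
    return a + b;
}

int sub(int a, int b) {
    return a - b;
}
""".splitlines()

print(f"total lines: {len(sample)}")
print(f"unique lines: {uloc(sample)}")
print(f"dryness: {dryness(sample):.2f}%")
```

In this sample the two closing braces collapse into one unique line and the blank line adds to the total without adding to the unique count, which is why the ratio falls below 100%.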
The Density Hierarchy
Languages ranked by expressiveness (ULOC/total):
| Tier | Dryness | Languages | Characteristics |
|---|---|---|---|
| High | 75%+ | Clojure, Haskell, MATLAB | Functional paradigms, minimal syntax |
| Standard | 60-70% | Java, Python, TypeScript | Balanced logic/structure |
| Boilerplate | <60% | C#, Go, CSS | Mandatory ceremonies, config bloat |
Surprising Findings:
- Java (65.72%) trails Kotlin (67.72%) and Scala (66.1%) by two percentage points or less in JVM ecosystem dryness
- CoffeeScript (70.05%) beats modern alternatives like TypeScript (63.34%)
- Go (58.78%) and Rust (60.5%) show nearly identical boilerplate levels
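The comparisons in this list come down to subtraction on the quoted percentages. The sketch below ranks the figures and computes the gaps in percentage points; the dictionary holds only numbers quoted in this list, and the names `quoted` and `gap` are chosen for illustration.

```python
# Dryness figures (percent) as quoted in the list above.
quoted = {
    "Java": 65.72,
    "Kotlin": 67.72,
    "Scala": 66.1,
    "CoffeeScript": 70.05,
    "TypeScript": 63.34,
    "Go": 58.78,
    "Rust": 60.5,
}


def gap(a, b):
    """Difference in dryness between languages a and b, in percentage points."""
    return round(quoted[a] - quoted[b], 2)


# Highest dryness first.
ranking = sorted(quoted, key=quoted.get, reverse=True)

for name in ranking:
    print(f"{name:<13}{quoted[name]:6.2f}%")

print("Kotlin vs Java:", gap("Kotlin", "Java"))
print("Scala vs Java:", gap("Scala", "Java"))
print("CoffeeScript vs TypeScript:", gap("CoffeeScript", "TypeScript"))
print("Rust vs Go:", gap("Rust", "Go"))
```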
The Lisp Renaissance
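Clojure's 77.91% dryness demonstrates Lisp's enduring advantage: most lines express business logic rather than structural ceremony. Compared to C#'s 58.4%, Clojure's dryness is about 19.5 percentage points higher, so roughly one line in five that would be repetition in C# is unique in Clojure. A loose analogy is one workday per week not spent on redundant code, although the metric measures lines, not developer time.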
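The gap between the two figures can be stated in percentage points or as a relative change, and the two readings differ noticeably. The sketch below works through both. It treats every non-unique line as "repeated" (100 minus dryness), which is an assumption of this example rather than a definition from the analysis, and the helper names are illustrative.

```python
# Dryness percentages as quoted in the paragraph above.
clojure_dry = 77.91
csharp_dry = 58.4


def repeated_share(dry):
    """Share of lines that are not unique (assumed to be 100 - dryness)."""
    return 100.0 - dry


def relative_change(new, old):
    """Change from old to new, as a percentage of old."""
    return 100.0 * (new - old) / old


# Absolute gap in dryness, in percentage points (about 19.5).
point_gap = clojure_dry - csharp_dry

# Relative reduction in repeated lines going from C# to Clojure (about 47%).
repeated_drop = -relative_change(repeated_share(clojure_dry),
                                 repeated_share(csharp_dry))

print(f"dryness gap: {point_gap:.2f} percentage points")
print(f"repeated-line share: C# {repeated_share(csharp_dry):.2f}%, "
      f"Clojure {repeated_share(clojure_dry):.2f}%")
print(f"relative drop in repeated lines: {repeated_drop:.1f}%")
```

Measured this way, the difference is roughly 20 percentage points of dryness, while the relative reduction in repeated lines is considerably larger.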
Modern Language Tradeoffs
Despite improvements in:
- Memory safety (Rust)
- Concurrency (Go)
- Type systems (TypeScript)
these languages introduce new forms of boilerplate:
- Go's explicit error handling
- Rust's trait implementations
- TypeScript's type guards
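To make the first item concrete, the sketch below measures a hand-written Go fragment, held as a Python string, with a simplified unique-line count. The fragment is illustrative and not taken from the analyzed repositories; `parse` is a hypothetical helper, and the whitespace trimming is an assumption of this sketch rather than the tool's documented behavior.

```python
def uloc(lines):
    """Count distinct non-blank lines after trimming whitespace (assumed rule)."""
    return len({line.strip() for line in lines if line.strip()})


def dryness(lines):
    """Unique lines divided by total lines, as a percentage."""
    return 100.0 * uloc(lines) / len(lines) if lines else 0.0


# Three fallible calls, each followed by the same three-line error check.
# `parse` is a made-up function used only for illustration.
go_fragment = """\
f, err := os.Open(path)
if err != nil {
    return err
}
data, err := io.ReadAll(f)
if err != nil {
    return err
}
cfg, err := parse(data)
if err != nil {
    return err
}
""".splitlines()

print(f"total lines: {len(go_fragment)}")
print(f"unique lines: {uloc(go_fragment)}")
print(f"dryness: {dryness(go_fragment):.1f}%")
```

Each repeated error check contributes three lines to the total but nothing new to the unique count after its first appearance, which is how explicit error handling pulls a dryness score down.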
As Boyter notes: "We spent decades building modern languages to solve old mistakes, but increased our noise-to-signal ratio."
The LLM Wildcard
Large Language Models could neutralize boilerplate disadvantages by:
- Auto-generating repetitive patterns
- Abstracting ceremony behind natural language prompts
- Compressing verbose syntax
However, this creates new challenges in:
- Code review effectiveness
- Architectural coherence
- Cognitive load from generated code
Implications for Engineering Leaders
- Language Selection: Projects requiring rapid iteration may benefit more from high-dryness languages
- Tooling Investments: Linters and IDEs should target language-specific boilerplate hotspots
- Training: Engineers benefit from learning to recognize boilerplate deliberately
As Boyter concludes: "If you want the highest ratio of human thought to keystrokes, the winner is the 60-year-old concept running as a modern JVM language." This paradox forces reevaluation of what constitutes true progress in language design.
Data and methodology: Full technical write-up, Analysis script, scc tool