LLVM's DTLTO Now Adds Files To The Link More Efficiently For Much Better Performance

A small LLVM patch dramatically improves DTLTO file linking performance, reducing Windows times from ~2.8 seconds to ~157ms and Linux times from ~255ms to ~41ms.

LLVM's DTLTO (Distributed ThinLTO) support has received a significant performance boost from a small code change to how files are added to the link. The improvement, contributed by Ben Dunbobbin, addresses a bottleneck that was particularly noticeable on Windows systems with high-core-count processors.

The Performance Problem

When LLVM began implementing DTLTO as an enhancement to its ThinLTO approach for link-time optimization, developers discovered that adding files to the link was taking an excessive amount of time. The issue stemmed from how DTLTO handled file additions differently from standard ThinLTO.

In the standard ThinLTO backend, object files are typically generated in memory and added directly to the link. However, DTLTO was adding files to the link from disk in all cases, regardless of whether the ThinLTO cache was in use. This approach created unnecessary overhead, especially when the cache wasn't being utilized.

The Technical Solution

The fix introduces an optional AddBufferFn callback that clients can provide to the DTLTO ThinLTO backend. When available, this callback allows the backend to add files to the link more efficiently by moving MemoryBuffer ownership rather than handling files from disk.

Ben Dunbobbin explained the technical details: "The in-process ThinLTO backend typically generates object files in memory and adds them directly to the link, except when the ThinLTO cache is in use. DTLTO is unusual in that it adds files to the link from disk in all cases."
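The ownership-transfer idea can be sketched in isolation. The following is a minimal illustration, not LLVM's actual code: Buffer, Link, AddBufferCallback and addFile are hypothetical names invented for this example, and the copying fallback is only a stand-in for a slower path, since the details of the original disk-based path are not described here.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an in-memory object file (not llvm::MemoryBuffer).
struct Buffer {
  std::string Name;
  std::vector<char> Data;
};

// Hypothetical stand-in for the set of inputs held by the linker.
struct Link {
  std::vector<std::unique_ptr<Buffer>> Inputs;
};

// Hypothetical optional callback: receives ownership of a finished buffer.
using AddBufferCallback =
    std::function<void(unsigned Task, std::unique_ptr<Buffer> Buf)>;

// Adds one file to the link. Returns true if the fast path was taken.
bool addFile(Link &L, unsigned Task, std::unique_ptr<Buffer> Buf,
             const AddBufferCallback &AddBuffer) {
  assert(Buf && "expected a valid buffer");
  if (AddBuffer) {
    // Fast path: hand the buffer over; no bytes are copied.
    AddBuffer(Task, std::move(Buf));
    return true;
  }
  // Illustrative slow path: duplicate the contents before adding them.
  L.Inputs.push_back(std::make_unique<Buffer>(*Buf));
  return false;
}
```

The callback is optional by design in this sketch: a client that does not supply one still gets a working link, just without the cheaper ownership transfer.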

Benchmark Results

The performance improvements are substantial across different platforms:

Windows Performance (AMD Ryzen 16-core @ ~4.5 GHz):

Before: ~2799 ms
After: ~157 ms
Improvement: ~94% reduction in time

Linux Performance (Ryzen 9 5950X @ up to 5.09 GHz):

Before: ~255 ms
After: ~41 ms
Improvement: ~84% reduction in time

These tests were conducted on a Clang link (debug build with sanitizers and instrumentation), run with an optimized toolchain (PGO non-LTO, llvmorg-22.1.0).
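For readers who want to check the percentages, the arithmetic behind them can be written out as two small helpers. This is only a sketch; the function names percentReduction and speedupFactor are made up for this example.

```cpp
#include <cassert>

// Share of the original time that was eliminated, in percent.
double percentReduction(double BeforeMs, double AfterMs) {
  assert(BeforeMs > 0.0 && "baseline time must be positive");
  return (BeforeMs - AfterMs) / BeforeMs * 100.0;
}

// How many times faster the new timing is than the old one.
double speedupFactor(double BeforeMs, double AfterMs) {
  assert(AfterMs > 0.0 && "new time must be positive");
  return BeforeMs / AfterMs;
}
```

With the figures quoted above, the Windows numbers (2799 ms to 157 ms) work out to a reduction of roughly 94.4%, or about a 17.8x speedup, and the Linux numbers (255 ms to 41 ms) to roughly 83.9%, or about 6.2x.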

Why This Matters

Link-time optimization is a critical phase in the compilation process, especially for large projects and complex software builds. The DTLTO enhancement was designed to improve distributed compilation scenarios, but the file addition bottleneck was undermining its performance benefits.

By reducing the file addition time from about 2.8 seconds to roughly 157 milliseconds on Windows, this change makes DTLTO much more practical for real-world use. The improvement is particularly valuable for developers working on large codebases or in distributed build environments where link-time optimization is essential.

The Code Change

Remarkably, this significant performance improvement comes from reworking just a few dozen lines of code. This demonstrates how targeted optimizations in critical paths can yield substantial benefits without requiring major architectural changes.

The change will be included in LLVM/Clang 23, making it available to developers using the latest compiler versions. For teams already using DTLTO or considering it for their build processes, this improvement removes a major performance concern that might have been deterring adoption.

The patch represents the kind of incremental but meaningful improvement that LLVM developers consistently deliver, showing how careful attention to performance bottlenecks in core functionality can benefit the entire ecosystem of software that depends on these compilers.