Inside the Wild Linker: Zero-Cost Rust Hacks for High-Performance Parallelism
When David Lattimore took the stage at RustForge 2025 in Wellington, he didn't just share linker benchmarks—he unveiled a masterclass in squeezing every drop of performance from Rust's type system. As the creator of the Wild linker, Lattimore demonstrated how clever zero-cost abstractions can revolutionize parallelism and memory management in systems programming. Here’s how his team achieves nanosecond-level optimizations for projects like Chromium.
🔄 Mutable Slicing for Lock-Free Parallelism
Wild's symbol resolution relies on a dense Vec<SymbolId> (where SymbolId wraps a u32), with each object's symbols occupying a contiguous region of that buffer. To enable parallel writes without locks, Lattimore carves the buffer into per-object slices and feeds them to Rayon's par_bridge:
fn parallel_process_resolutions(mut resolutions: &mut [SymbolId], objects: &[Object]) {
    objects
        .iter()
        // Carve off each object's region of the shared buffer. This runs
        // sequentially, before par_bridge hands the pairs to worker threads.
        .map(|obj| (obj, resolutions.split_off_mut(..obj.num_symbols).unwrap()))
        .par_bridge()
        .for_each(|(obj, object_resolutions)| {
            obj.process_resolutions(object_resolutions);
        });
}
Key Insight:
split_off_mut creates non-overlapping mutable slices, enabling threads to write to adjacent memory regions simultaneously. Cache locality is preserved by grouping each object's symbols together.
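On toolchains where split_off_mut isn't available, the same carving works with stable split_at_mut and scoped threads. The sketch below is illustrative, not Wild's code: the u32 payload, the per-object sizes, and the increment workload are all stand-ins.

```rust
use std::thread;

// Carve one contiguous buffer into non-overlapping per-object windows,
// then let each scoped thread mutate its own window without locks.
// Caller must ensure the sizes sum to at most resolutions.len().
fn parallel_bump(resolutions: &mut [u32], sizes_per_object: &[usize]) {
    let mut rest = resolutions;
    let mut windows = Vec::new();
    for &n in sizes_per_object {
        // `mem::take` sidesteps the borrow checker's objection to
        // repeatedly splitting through the same `&mut` binding.
        let remaining = std::mem::take(&mut rest);
        let (head, tail) = remaining.split_at_mut(n);
        windows.push(head);
        rest = tail;
    }
    thread::scope(|s| {
        for window in windows {
            s.spawn(move || {
                for slot in window.iter_mut() {
                    *slot += 1; // writes never overlap across threads
                }
            });
        }
    });
}
```

Because the windows come from disjoint split_at_mut calls, the borrow checker proves at compile time that no two threads can touch the same element.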
⚡️ Parallel Initialization with Sharded Vec Writer
Initializing a giant Vec serially is inefficient. Wild uses the sharded-vec-writer crate to populate memory in parallel:
let mut writer = VecWriter::new(&mut resolutions);
let mut shards = writer.take_shards(objects.iter().map(|o| o.num_symbols));

objects
    .par_iter()
    .zip_eq(&mut shards)
    .for_each(|(obj, shard)| {
        for symbol in obj.symbols() {
            shard.push(...);
        }
    });

writer.return_shards(shards);
Why it matters: This bypasses single-threaded initialization bottlenecks—critical for binaries with millions of symbols.
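The core idea behind sharded-vec-writer can be sketched with only the standard library: carve the Vec's uninitialized spare capacity into per-object shards, fill them in parallel, and commit the length once every writer has finished. This is a hedged reimplementation for illustration; parallel_fill and its u32 payload are inventions of this sketch, not the crate's API.

```rust
use std::thread;

fn parallel_fill(per_object: &[Vec<u32>]) -> Vec<u32> {
    let total: usize = per_object.iter().map(|v| v.len()).sum();
    let mut out: Vec<u32> = Vec::with_capacity(total);

    // Carve the uninitialized spare capacity into per-object shards.
    let mut spare = out.spare_capacity_mut();
    let mut shards = Vec::new();
    for src in per_object {
        let remaining = std::mem::take(&mut spare);
        let (head, tail) = remaining.split_at_mut(src.len());
        shards.push(head);
        spare = tail;
    }

    // Fill every shard in parallel; each thread owns its slice exclusively.
    thread::scope(|s| {
        for (src, shard) in per_object.iter().zip(shards) {
            s.spawn(move || {
                for (slot, value) in shard.iter_mut().zip(src) {
                    slot.write(*value); // MaybeUninit::write
                }
            });
        }
    });

    // SAFETY: all `total` slots were initialized by the loop above.
    unsafe { out.set_len(total) };
    out
}
```

Unlike a naive `vec![0; total]` pre-fill, nothing is written twice: each element is initialized exactly once, in parallel.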
⚛️ Atomic Conversions: Zero-Cost Type Punning
When Chromium’s C++ headers caused symbol collisions, Wild needed atomic writes to SymbolId. But converting a Vec with millions of entries into atomic types sounds expensive. The solution? Type-safe, in-place conversion:
fn into_atomic(symbols: Vec<SymbolId>) -> Vec<AtomicSymbolId> {
    symbols
        .into_iter()
        .map(|s| AtomicSymbolId(AtomicU32::new(s.0)))
        .collect()
}
Compiler Magic: Rust reuses the heap allocation, and the loop evaporates in assembly. Lattimore’s proof:
movups xmm0, xmmword ptr [rsi]
...
ret // No loops, no branches!
"The representation of AtomicSymbolId is identical to SymbolId. We exploit this to make the optimizer do the heavy lifting," Lattimore noted.
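A self-contained version of the conversion is below. The minimal SymbolId/AtomicSymbolId definitions are this sketch's own, and the pointer check relies on the standard library's in-place collect specialization, which is a library optimization rather than a hard language guarantee.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

struct SymbolId(u32);
struct AtomicSymbolId(AtomicU32);

// AtomicU32 has the same size and alignment as u32, so the standard
// library's in-place `collect` can reuse the original heap allocation
// instead of copying millions of entries.
fn into_atomic(symbols: Vec<SymbolId>) -> Vec<AtomicSymbolId> {
    symbols
        .into_iter()
        .map(|s| AtomicSymbolId(AtomicU32::new(s.0)))
        .collect()
}
```

After the conversion, threads can update entries with `store`/`load` using `Ordering::Relaxed`, with no copy ever having taken place.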
♻️ Buffer Reuse Across Lifetimes
Recycling heap allocations often clashes with lifetimes. Wild’s solution? reuse_vec:
fn reuse_vec<T, U>(mut v: Vec<T>) -> Vec<U> {
    // Compile-time check that T and U share size and alignment.
    const {
        assert!(size_of::<T>() == size_of::<U>());
        assert!(align_of::<T>() == align_of::<U>());
    }
    v.clear();
    // The vec is now empty, so the closure can never run; the compiler
    // reduces this to handing the old allocation over with length zero.
    v.into_iter().map(|_| unreachable!()).collect()
}
Use Case: Convert Vec<&'a str> to Vec<&'b str> without reallocating. The compiler elides the loop, leaving only a length reset.
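A usage sketch: count_words below is a hypothetical helper that borrows each input line only for the duration of the call, while the scratch buffer's allocation is recycled across calls (reuse_vec is repeated here so the example stands alone).

```rust
use std::mem::{align_of, size_of};

fn reuse_vec<T, U>(mut v: Vec<T>) -> Vec<U> {
    const {
        assert!(size_of::<T>() == size_of::<U>());
        assert!(align_of::<T>() == align_of::<U>());
    }
    v.clear();
    v.into_iter().map(|_| unreachable!()).collect()
}

// Borrow `line` only for the duration of this call; hand back the (empty)
// scratch buffer with its lifetime reset to 'static.
fn count_words(scratch: Vec<&'static str>, line: &str) -> (usize, Vec<&'static str>) {
    // Recycle into a Vec whose elements may borrow `line`...
    let mut words: Vec<&str> = reuse_vec(scratch);
    words.extend(line.split_whitespace());
    let n = words.len();
    // ...then recycle back: the vec is empty, so widening to 'static is fine.
    (n, reuse_vec(words))
}
```

The second call can borrow a String that didn't exist when the buffer was created; the allocation outlives every individual borrow.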
🧵 Offloading Deallocation
Freeing huge buffers blocks threads. Wild spawns Rayon tasks to drop them asynchronously:
rayon::spawn(move || drop(buffer));
Caveat: Only beneficial for massive allocations (verified via profiling). Combine with reuse_vec to sidestep lifetime issues.
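The same offload can be sketched without Rayon using a plain std::thread (illustration only; drop_in_background is this sketch's name, and a real linker would prefer a pool so each drop doesn't pay thread-spawn cost):

```rust
use std::thread;

// Hand ownership of a buffer to another thread so its (potentially slow)
// deallocation doesn't block the caller.
fn drop_in_background<T: Send + 'static>(buffer: T) -> thread::JoinHandle<()> {
    thread::spawn(move || drop(buffer))
}
```

Wild itself uses rayon::spawn, which reuses existing pool workers instead of creating a fresh thread per drop.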
💣 Bonus: Stripping Lifetimes with Non-Trivial Drop
For structs like Foo<'a> { owned: String, borrowed: &'a str }, Wild uses MaybeUninit to erase lifetimes pre-deallocation:
struct StaticFoo {
    owned: String,
    borrowed: MaybeUninit<&'static str>,
}

fn without_lifetime(foos: Vec<Foo>) -> Vec<StaticFoo> {
    foos.into_iter()
        .map(|f| StaticFoo {
            owned: f.owned,
            borrowed: MaybeUninit::uninit(),
        })
        .collect()
}
Again—zero runtime cost. The assembly is identical to a memcpy.
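The two closing tricks compose: erase the lifetime first, then hand the now-lifetime-free value to a background thread. A self-contained sketch (Foo mirrors the talk's example; std::thread and drop_foos_in_background stand in for the Rayon task):

```rust
use std::mem::MaybeUninit;
use std::thread;

#[allow(dead_code)]
struct Foo<'a> {
    owned: String,
    borrowed: &'a str,
}

#[allow(dead_code)]
struct StaticFoo {
    owned: String,
    borrowed: MaybeUninit<&'static str>,
}

// Replace the borrowed field with an uninitialized placeholder; the
// non-trivial Drop (the String) is kept so it still runs later.
fn without_lifetime(foos: Vec<Foo<'_>>) -> Vec<StaticFoo> {
    foos.into_iter()
        .map(|f| StaticFoo {
            owned: f.owned,
            borrowed: MaybeUninit::uninit(),
        })
        .collect()
}

// Erase the borrow, then pay for dropping all the Strings off-thread.
fn drop_foos_in_background(foos: Vec<Foo<'_>>) -> thread::JoinHandle<()> {
    let erased = without_lifetime(foos);
    thread::spawn(move || drop(erased))
}
```

Once erased, the values are 'static, so they satisfy thread::spawn's bounds even though the originals borrowed short-lived data.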
The Philosophy of Zero-Cost Fearlessness
Lattimore’s tricks reveal a deeper truth: Rust’s type system isn’t a barrier—it’s a toolkit for safe radical optimization. By leaning into representation guarantees and optimizer behavior, Wild achieves C-like speed without unsafe spaghetti. These patterns extend far beyond linkers; imagine applying atomic conversion to GPU buffers or reuse_vec to database caches. As Lattimore concluded: "When the compiler understands your intent, it becomes your most powerful ally in the quest for performance."
Source: David Lattimore's talk at RustForge 2025. Wild linker source code.