Parallel Graph Traversal in Rust: Rayon's Evolution Beyond Fixed Workloads
Implementing parallel graph traversal presents unique challenges in Rust, especially when the workload's scope is unknown until runtime. David Lattimore, developer of the Wild linker, chronicles his team's iterative journey through Rayon's parallelism models, revealing practical lessons about Rust's concurrency landscape.
The Core Challenge: Dynamic Workloads
Graph algorithms like dependency resolution often start from known roots but discover additional nodes during execution. This unpredictability clashes with Rayon's standard par_iter model, which assumes the full workload is known before iteration begins. Lattimore's team tested three distinct strategies, sketched below.
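For contrast, here is roughly what a fixed-workload traversal looks like; Node and process are hypothetical stand-ins, and the neighbours discovered while processing have no way back into the running iteration:

use rayon::prelude::*;

type Node = u32; // hypothetical node type

fn process(_node: &Node) -> Vec<Node> {
    Vec::new() // hypothetical: returns newly discovered neighbours
}

fn traverse(roots: Vec<Node>) {
    // Fine when the workload is fixed up front, but nodes returned by
    // `process` cannot be appended to an iteration that is already running.
    roots.par_iter().for_each(|node| {
        let _discovered = process(node);
    });
}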
1. Spawn Broadcast & Custom Work-Stealing
// Broadcast a worker loop onto every Rayon thread, then drain a shared
// channel of work items (sketch; the channel handles are created elsewhere).
rayon::spawn_broadcast(move |_ctx| {
    while let Ok(job) = receiver.try_recv() {
        process(job, &sender); // processing may enqueue newly discovered nodes
    }
    // Custom parking/wake logic goes here; Rayon provides no built-in way to
    // put a broadcast worker to sleep and wake it when new work arrives.
});
Downsides: Complexity grew quickly, and the manual thread coordination prevented using Rayon's parallel iterators mid-operation, because every worker thread was dedicated exclusively to the custom scheduler.
2. Scoped Task Spawning
rayon::scope(|s| {
    for root in roots {
        s.spawn(move |s| explore_node(root, s)); // tasks can spawn further tasks via `s`
    }
});
Trade-off: The code is simpler, but each spawned task incurs a heap allocation, a documented Rayon caveat. Despite that overhead, the approach stays compatible with nested parallelism.
3. Channel + par_bridge Hybrid
let (sender, receiver) = crossbeam_channel::unbounded();
roots.iter().for_each(|r| sender.send(r.clone()).unwrap());
// Iteration over `receiver` only terminates once every sender clone has been
// dropped (or some other scheme signals that the graph is exhausted).
receiver.into_iter().par_bridge().for_each(|node| {
    explore(node, sender.clone()); // exploring may send newly discovered nodes
});
New Problems:
- Deadlock risk: Nested par_iter calls could block indefinitely when worker threads, still holding sender clones, stole tasks that were themselves blocked receiving from the channel.
- Composition failure: Borrowed data (e.g., mutable slices) couldn't be split across work-item enums because of borrow-checker constraints (contrast with the scoped sketch after this example):
// Doesn't work in the channel-based design: `chunk` borrows a larger buffer,
// and the borrow checker can't tie that borrow's lifetime to work items
// flowing through the channel.
enum WorkItem<'data> { ProcessChunk(&'data mut [u8]) }
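By contrast, scoped spawning can hand borrowed chunks directly to tasks, because the scope guarantees every task finishes before the borrow ends. A minimal sketch (the byte-incrementing work is only a placeholder):

fn process_in_parallel(buffer: &mut [u8]) {
    let mid = buffer.len() / 2;
    let (left, right) = buffer.split_at_mut(mid);
    rayon::scope(|s| {
        // Both tasks borrow from `buffer`; the scope ensures they complete
        // before `buffer` can be used again.
        s.spawn(move |_| left.iter_mut().for_each(|b| *b = b.wrapping_add(1)));
        s.spawn(move |_| right.iter_mut().for_each(|b| *b = b.wrapping_add(1)));
    });
}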
Async/Await: A Theoretical Alternative
Lattimore speculates async/await could resolve thread-blocking issues by detaching tasks from specific threads. Libraries like async_scoped might enable patterns where:
- Tasks resume on any available thread after I/O or subtask completion
- Deadlocks from work-stealing contention are eliminated, since a waiting task yields its thread instead of blocking it
However, this remains unexplored territory in Wild's codebase.
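As an illustration only (not Wild's code, and using tokio's JoinSet with owned nodes rather than async_scoped's borrowed scopes), such a traversal could look roughly like this:

use tokio::task::JoinSet;

type Node = u32; // hypothetical node type

async fn process(_node: Node) -> Vec<Node> {
    Vec::new() // hypothetical: returns newly discovered neighbours
}

async fn traverse(roots: Vec<Node>) {
    let mut tasks = JoinSet::new();
    for root in roots {
        tasks.spawn(process(root));
    }
    // A task that awaits yields its thread rather than blocking it, and newly
    // discovered nodes can be spawned as further tasks while others finish.
    while let Some(result) = tasks.join_next().await {
        for node in result.expect("traversal task panicked") {
            tasks.spawn(process(node));
        }
    }
}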
Conclusion: Practical Compromises
Despite the channel-based approach's efficiency gains, its composition limitations forced a return to scoped spawning. Lattimore suggests that future Rayon optimizations, such as storing small closures (under roughly 32 bytes) inline instead of boxing them, could mitigate the allocation overhead. The journey underscores a broader truth: in parallel Rust, ergonomic abstractions often wrestle with allocation costs and borrow-checker boundaries, pushing engineers toward context-specific trade-offs.
Source: David Lattimore's Blog