Implementing parallel graph traversal presents unique challenges in Rust, especially when the workload's scope remains unknown until runtime. David Lattimore, developer of the Wild linker, chronicles his team's iterative journey through Rayon's parallelism models and the trade-offs each one exposed.

The Core Challenge: Dynamic Workloads

Graph algorithms like dependency resolution often start with known roots but discover further nodes dynamically during execution. That unpredictability clashes with Rayon's standard par_iter model, which expects the full workload up front. Lattimore's team tested three distinct strategies, described below.
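
For contrast, the easy case, where every node is known before iteration begins, maps directly onto par_iter. The sketch below uses assumed placeholder names (Node, process, process_all) and is not Wild's code:

use rayon::prelude::*;

struct Node; // placeholder node type
fn process(_node: &Node) { /* per-node work */ }

// Straightforward when the whole workload is known up front:
fn process_all(nodes: &[Node]) {
    nodes.par_iter().for_each(|node| process(node));
}
// If process() could discover new nodes, there would be no way to feed
// them back into this par_iter once it is already running.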

1. Spawn Broadcast & Custom Work-Stealing

rayon::spawn_broadcast(|_ctx| {
    // Runs once on every thread in the pool: drain the shared queue, where
    // processing a job may push more jobs back into the channel.
    while let Ok(job) = channel.try_recv() {
        process(job, channel.clone());
    }
    park_until_woken(); // hand-rolled parking/wake logic, not part of Rayon
});

Downsides: Complexity grew quickly, and because every thread was dedicated to this custom scheduler, the code could not drop into Rayon's parallel iterators mid-operation.

2. Scoped Task Spawning

rayon::scope(|s| {
    for root in roots {
        // `move` gives each spawned task ownership of its `root`
        s.spawn(move |s| explore_node(root, s));
    }
});

Trade-off: Simpler code, but each spawned task incurred a heap allocation, a caveat documented by Rayon. Despite that overhead, this approach maintained compatibility with nested parallelism.
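
A rough sketch of the dynamic part under this model; the names (explore_node, discover_neighbours, Node) are assumptions rather than Wild's actual code. Newly discovered nodes are spawned onto the same scope, so the workload grows as the traversal runs:

fn explore_node<'s>(node: &'s Node, s: &rayon::Scope<'s>) {
    for next in discover_neighbours(node) {
        // Each spawn boxes its closure: the per-task allocation noted above.
        s.spawn(move |s| explore_node(next, s));
    }
}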

3. Channel + par_bridge Hybrid

let (sender, receiver) = crossbeam_channel::unbounded();
roots.iter().for_each(|r| sender.send(r.clone()).unwrap());
// Note: every clone of `sender` must eventually be dropped, or the bridged
// iterator below will wait for more items forever.
receiver.into_iter().par_bridge().for_each(|node| {
    explore(node, sender.clone()); // explore() may send newly found nodes
});

New Problems:
- Deadlock risk: Nested par_iter calls could block indefinitely when a worker thread, still holding a sender clone, stole and then blocked on a channel-receiving task (see the sketch after this list).
- Composition failure: Couldn't split borrowed data (e.g., mutable slices) across work-item enums due to borrow-checker constraints:

// The chunk borrows a larger buffer; the borrow checker cannot tie that
// borrow's lifetime to a work item created and sent through a channel at runtime.
enum WorkItem<'a> { ProcessChunk(&'a mut [u8]) }
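
To make the deadlock risk above concrete, consider nesting a parallel iterator inside the bridged loop (assumed names, not Wild's code). While the inner iterator runs, the worker may go work-stealing, pick up another bridged receive, and block on the channel while still holding a live sender clone, so the channel never drains or closes:

receiver.into_iter().par_bridge().for_each(|node| {
    // Nested parallelism: the current worker may steal other Rayon jobs here,
    // including the bridged task that blocks waiting on `receiver`.
    node.edges.par_iter().for_each(|edge| {
        process_edge(edge, sender.clone()); // may enqueue more nodes
    });
});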

Async/Await: A Theoretical Alternative

Lattimore speculates async/await could resolve thread-blocking issues by detaching tasks from specific threads. Libraries like async_scoped might enable patterns where:
- Tasks resume on any available thread after I/O or subtask completion
- Deadlocks from work-stealing contention are eliminated
However, this remains unexplored territory in Wild's codebase.
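
As a rough illustration of that idea, here is a dynamic traversal sketched on Tokio's JoinSet rather than async_scoped; the node type, explore function, and root set are assumptions, and none of this is from Wild:

// Requires tokio with the "rt-multi-thread" and "macros" features.
use std::collections::HashSet;
use tokio::task::JoinSet;

async fn explore(node: u32) -> Vec<u32> {
    // Resolve the node, possibly awaiting I/O, and return newly found nodes.
    Vec::new()
}

#[tokio::main]
async fn main() {
    let roots = vec![0_u32, 1, 2];
    let mut seen: HashSet<u32> = roots.iter().copied().collect();
    let mut tasks = JoinSet::new();
    for root in &roots {
        tasks.spawn(explore(*root));
    }
    // A finished task hands back newly discovered nodes; spawning them never
    // pins an OS thread, so a suspended task cannot wedge the worker pool.
    while let Some(result) = tasks.join_next().await {
        for node in result.expect("task panicked") {
            if seen.insert(node) {
                tasks.spawn(explore(node));
            }
        }
    }
}

The 'static requirement on spawned futures is the main gap a scoped-async crate such as async_scoped would aim to close, which mirrors the composition problem the channel-based approach ran into.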

Conclusion: Practical Compromises

Despite the channel-based approach's efficiency gains, its composition limitations forced a return to scoped spawning. Lattimore suggests that future Rayon optimizations, such as unboxed tasks for sub-32-byte closures, could mitigate the allocation overhead. The journey underscores a broader truth: in parallel Rust, ergonomic abstractions often wrestle with allocation costs and borrow-checker boundaries, pushing engineers toward context-specific trade-offs.

Source: David Lattimore's Blog