Embedded Dev Proposes Compiler Optimizations to Reduce Async Rust Binary Bloat
#Rust


Startups Reporter

An embedded software engineer has identified significant inefficiencies in async Rust compilation and proposed compiler-level optimizations that could reduce binary size by up to 5% for embedded applications.

Async Rust has become a powerful tool for concurrent programming across different environments, from servers to microcontrollers. However, as Dion, an embedded software engineer at Tweede golf, points out in a recent blog post, the language's async implementation still suffers from what he calls an "MVP state" that introduces unnecessary binary bloat, particularly problematic for resource-constrained embedded systems.

The core issue lies in how async Rust code gets transformed into state machines by the compiler. Dion explains that even simple async functions generate significantly more MIR (Mid-level Intermediate Representation) than their synchronous counterparts. For example, a simple async function that returns a constant value generates 360 lines of MIR compared to just 23 lines for the non-async version.
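The contrast Dion describes can be reproduced with a pair of trivial functions like the ones below (the names are illustrative, not from the blog post). The async version is rewritten by the compiler into a state-machine type implementing `Future`; on a nightly toolchain the generated MIR can be inspected with `cargo rustc -- -Zunpretty=mir`. The `poll_once` helper is a minimal hand-rolled way to drive a future without an executor, using a no-op waker:

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Synchronous version: a plain function body, only a handful of MIR lines.
fn give_sync() -> u32 {
    42
}

// Async version: the compiler rewrites this into a generator-style
// state machine implementing Future, producing far more MIR.
async fn give_async() -> u32 {
    42
}

// Poll a future once by hand, without an executor, using a no-op waker.
fn poll_once<F: Future>(fut: F) -> Poll<F::Output> {
    fn clone(_: *const ()) -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    let waker = unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) };
    let mut cx = Context::from_waker(&waker);
    Box::pin(fut).as_mut().poll(&mut cx)
}

fn main() {
    assert_eq!(give_sync(), 42);
    // With no await points, a single poll completes the future immediately.
    assert_eq!(poll_once(give_async()), Poll::Ready(42));
    println!("both versions return 42");
}
```

Both functions compute the same value; the size difference comes entirely from the machinery the async lowering drags in.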

"Every byte of binary size counts and async introduces a lot of bloat," Dion writes. "This bloat exists on desktops and servers as well, but it's much less noticeable when you have substantially more memory and compute available."

The blog post provides a detailed analysis of the generated code structure, revealing that the compiler creates several states for each async function:

  1. Unresumed - The initial state
  2. Returned - The state after completion
  3. Panicked - The state after a panic occurs
  4. SuspendX - States for each await point
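A hand-written sketch of these states, for a hypothetical async function with one await point, might look like the following. This is an illustration of the shape of the generated code, not actual compiler output; `InnerFuture` and `poll_step` are stand-ins:

```rust
use std::task::Poll;

// Stand-in for the sub-future awaited at the first await point.
struct InnerFuture;

// Sketch of the states the compiler generates for an async fn
// with a single await point (names mirror the list above).
enum ExampleState {
    Unresumed,             // created but never polled
    Suspend0(InnerFuture), // parked at the first .await
    Returned,              // completed; polling again currently panics
    Panicked,              // poisoned after a panic during a poll
}

impl ExampleState {
    // Simplified transition function mirroring the generated poll body.
    fn poll_step(&mut self) -> Poll<u32> {
        match std::mem::replace(self, ExampleState::Panicked) {
            ExampleState::Unresumed => {
                // First poll: start the inner future, then suspend.
                *self = ExampleState::Suspend0(InnerFuture);
                Poll::Pending
            }
            ExampleState::Suspend0(_inner) => {
                // Inner future is ready: produce the final value.
                *self = ExampleState::Returned;
                Poll::Ready(42)
            }
            ExampleState::Returned => panic!("future polled after completion"),
            ExampleState::Panicked => panic!("future poisoned by earlier panic"),
        }
    }
}

fn main() {
    let mut state = ExampleState::Unresumed;
    assert_eq!(state.poll_step(), Poll::Pending);
    assert_eq!(state.poll_step(), Poll::Ready(42));
    println!("Unresumed -> Suspend0 -> Returned");
}
```

Each extra state means extra discriminant checks and extra code paths in the generated `poll`, which is where the bloat accumulates.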

Dion identifies several optimization opportunities that could significantly reduce this bloat:

1. Eliminate panics in the Returned state. Currently, when an async future completes and is polled again, it panics. Dion proposes changing this behavior to return Poll::Pending in release builds, which would still be safe but eliminate the panic path. He tested this optimization and observed a 2-5% reduction in binary size for embedded firmware.
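The proposed semantics can be mimicked with a hand-written future (a hypothetical illustration, not the compiler's actual lowering): once the future has completed, further polls return `Pending` instead of hitting a panic path.

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Hypothetical future illustrating the proposal: after completion,
// polling again returns Pending rather than panicking (the current
// compiler emits a panic on this path).
struct Done {
    finished: bool,
}

impl Future for Done {
    type Output = u32;
    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<u32> {
        if self.finished {
            return Poll::Pending; // proposed release-build behavior
        }
        self.finished = true;
        Poll::Ready(7)
    }
}

// Poll the same future twice by hand with a no-op waker, returning both results.
fn poll_twice<F: Future + Unpin>(mut fut: F) -> (Poll<F::Output>, Poll<F::Output>) {
    fn clone(_: *const ()) -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    let waker = unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) };
    let mut cx = Context::from_waker(&waker);
    let first = Pin::new(&mut fut).poll(&mut cx);
    let second = Pin::new(&mut fut).poll(&mut cx);
    (first, second)
}

fn main() {
    let (first, second) = poll_twice(Done { finished: false });
    assert_eq!(first, Poll::Ready(7));
    // Under the proposal, a second poll is a harmless no-op, not a panic:
    assert_eq!(second, Poll::Pending);
    println!("second poll returned Pending, not a panic");
}
```

Dropping the panic path removes both the panic machinery and the formatting code it pulls in, which is where the 2-5% saving comes from.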

2. Eliminate state machines for async blocks without awaits. For simple async functions that don't contain any await points, the compiler could generate a simpler implementation that always returns Poll::Ready rather than creating a full state machine. This optimization alone saved 0.2% of binary size in Dion's testing.
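In effect, an await-free async function is equivalent to a future that is always immediately ready, something the standard library already expresses as `std::future::ready`. A sketch of the equivalence (function names are illustrative):

```rust
use std::future::{self, Future};
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// An async fn with no await points still gets a full state machine today.
async fn status() -> u8 {
    200
}

// Semantically it could compile down to an always-ready future, like
// std::future::ready, which needs no state machine at all.
fn status_flat() -> impl Future<Output = u8> {
    future::ready(200)
}

// Poll a future once by hand, without an executor, using a no-op waker.
fn poll_once<F: Future>(fut: F) -> Poll<F::Output> {
    fn clone(_: *const ()) -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    let waker = unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) };
    let mut cx = Context::from_waker(&waker);
    Box::pin(fut).as_mut().poll(&mut cx)
}

fn main() {
    // Both complete on the first poll with the same value.
    assert_eq!(poll_once(status()), Poll::Ready(200));
    assert_eq!(poll_once(status_flat()), Poll::Ready(200));
    println!("both futures are immediately ready with 200");
}
```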

3. Future inlining. Currently, when one async function calls another, each gets its own state machine. Dion suggests the compiler could inline simpler futures, essentially merging their state machines. This is particularly beneficial for common patterns where async functions are used to transform signatures for trait implementations.

4. Collapse identical states. When code paths in async functions are identical (such as in match statements that lead to the same await point), the compiler could collapse these into a single state. Dion demonstrated this by refactoring code that went from 456 lines of MIR with duplicate states to 302 lines without duplication.
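The kind of refactoring involved can be sketched like this (hypothetical function names; not the code from the blog post). Today the compiler emits one suspend state per `.await` expression, even when two arms await the same thing:

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Hypothetical sub-future awaited from both match arms.
async fn read_value() -> u32 {
    5
}

// Before: each arm contains its own .await, so the compiler emits a
// separate suspend state per arm, even though they are identical.
async fn fetch_dup(flag: bool) -> u32 {
    match flag {
        true => read_value().await,
        false => read_value().await,
    }
}

// After: hoisting the shared await out of the match leaves one suspend
// state; this is the kind of merge the compiler could do automatically.
async fn fetch_once(_flag: bool) -> u32 {
    read_value().await
}

// Poll a future once by hand, without an executor, using a no-op waker.
fn poll_once<F: Future>(fut: F) -> Poll<F::Output> {
    fn clone(_: *const ()) -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    let waker = unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) };
    let mut cx = Context::from_waker(&waker);
    Box::pin(fut).as_mut().poll(&mut cx)
}

fn main() {
    // Both variants behave identically; only the state-machine size differs.
    assert_eq!(poll_once(fetch_dup(true)), Poll::Ready(5));
    assert_eq!(poll_once(fetch_once(true)), Poll::Ready(5));
    println!("both variants yield 5");
}
```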

"The prime opportunity for inlining is this pattern: `async fn foo() { ... }` with `async fn bar() { foo().await }`. With the current compiler, bar gets its own state machine that calls the foo state machine, which is very wasteful," Dion explains. "Instead, bar could also become foo by just returning the foo future."
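The quoted pattern and its manual equivalent can be written out as follows; `bar_flat` (a name introduced here for illustration) is morally what the proposed inlining would produce, returning `foo`'s future directly instead of wrapping it in a second state machine:

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

async fn foo() -> u32 {
    1
}

// Today: bar gets its own state machine that wraps and drives foo's.
async fn bar() -> u32 {
    foo().await
}

// Manual equivalent of the proposed inlining: return foo's future
// directly, so no second state machine exists.
fn bar_flat() -> impl Future<Output = u32> {
    foo()
}

// Poll a future once by hand, without an executor, using a no-op waker.
fn poll_once<F: Future>(fut: F) -> Poll<F::Output> {
    fn clone(_: *const ()) -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    let waker = unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) };
    let mut cx = Context::from_waker(&waker);
    Box::pin(fut).as_mut().poll(&mut cx)
}

fn main() {
    // Callers cannot tell the two apart; only the generated code differs.
    assert_eq!(poll_once(bar()), Poll::Ready(1));
    assert_eq!(poll_once(bar_flat()), Poll::Ready(1));
    println!("bar and bar_flat both resolve to 1");
}
```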

These optimizations aren't just academic - they have real-world implications for embedded development. The combination of the first two optimizations resulted in approximately a 3% performance increase in a synthetic benchmark using the smol executor.

Dion has already created proof-of-concept implementations for some of these optimizations and has submitted them as Project Goals to the Rust project. He's seeking funding to complete this work, estimating that €30,000 could implement all or most of these optimizations.

"I want to work on this in the compiler and as such have submitted it as a Project Goal," Dion writes. "But I need your help because I can't do much without funding. If you're with a company or organization that would benefit from this work and would be willing to (partially) fund it, please contact me at [email protected]."

The proposed changes would represent a significant improvement for async Rust, particularly in the embedded space where binary size and memory usage are critical constraints. By addressing these inefficiencies at the compiler level rather than requiring developers to work around them, the Rust ecosystem could become even more attractive for resource-constrained applications.

For developers interested in the technical details, Dion has shared his proof-of-concept implementations on GitHub.
