A deep‑dive into Cheng Huang’s three‑month effort to rebuild Azure’s Replicated State Library in Rust using AI coding agents, covering the concrete productivity gains, the novel contract‑driven testing workflow, lightweight spec‑driven development, and the limits that still require human oversight.

What a 100 K‑line Rust project really tells us about AI‑assisted development

Claim: AI can replace a team of engineers for a production‑grade consensus system

Cheng Huang reports that, with a handful of AI coding agents (Claude Code, Codex CLI, GitHub Copilot, etc.), he wrote 130 K lines of Rust implementing a full multi‑Paxos engine in roughly six weeks. The codebase includes leader election, log replication, snapshotting, configuration changes, and a test suite of more than 1 300 cases. Performance was tuned from 23 K ops / s to **300 K ops / s** on a laptop, and the resulting system supports pipelining and non‑volatile memory (NVM) – two major gaps in Azure’s original Replicated State Library (RSL) [1].

What’s actually new

1. A modern Rust re‑implementation of RSL

Pipelined voting – requests no longer block while a previous vote is in flight, so several log slots can be in the voting phase at once and per‑request latency drops.
NVM‑aware log – the persistence layer uses the verified PoWER‑Never‑Corrupts design [5] to write directly to persistent memory, reducing commit latency on Azure‑grade hardware.
Hardware‑aware abstractions – the codebase is structured to allow RDMA integration later, even though that part is still pending.

The core protocol logic follows the multi‑Paxos design described by Jay Lorch [4] and matches the feature set of the original RSL, but the implementation is idiomatic Rust, leveraging ownership, zero‑cost abstractions, and async‑free paths where possible.
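To make the pipelining idea concrete, here is a minimal sketch of leader-side bookkeeping in which several log slots collect accept votes at the same time but commit strictly in slot order. It is an illustration under stated assumptions, not code from RSL or from Huang's project: the `PipelinedLeader` type, its methods, and the convention that the leader is replica 0 are all hypothetical.

```rust
use std::collections::{BTreeMap, HashSet};

/// Hypothetical leader-side state for pipelined phase 2 (not the RSL API).
pub struct PipelinedLeader {
    quorum: usize,
    /// Next slot number to hand out.
    next_slot: u64,
    /// Lowest slot that has not been committed yet.
    commit_index: u64,
    /// Replicas that have accepted each in-flight slot.
    in_flight: BTreeMap<u64, HashSet<u32>>,
}

impl PipelinedLeader {
    pub fn new(cluster_size: usize) -> Self {
        Self {
            quorum: cluster_size / 2 + 1,
            next_slot: 0,
            commit_index: 0,
            in_flight: BTreeMap::new(),
        }
    }

    /// Assign the next slot without waiting for earlier slots to commit.
    pub fn propose(&mut self) -> u64 {
        let slot = self.next_slot;
        self.next_slot += 1;
        // Assumption: the leader is replica 0 and votes for its own proposal.
        let mut voters = HashSet::new();
        voters.insert(0u32);
        self.in_flight.insert(slot, voters);
        slot
    }

    /// Record an accept vote and return the slots that became committed,
    /// in slot order. Duplicate votes from one replica count once.
    pub fn on_accept(&mut self, slot: u64, replica: u32) -> Vec<u64> {
        if let Some(voters) = self.in_flight.get_mut(&slot) {
            voters.insert(replica);
        }
        let quorum = self.quorum;
        let mut committed = Vec::new();
        // Stop at the first slot that still lacks a quorum.
        while self
            .in_flight
            .get(&self.commit_index)
            .map_or(false, |v| v.len() >= quorum)
        {
            self.in_flight.remove(&self.commit_index);
            committed.push(self.commit_index);
            self.commit_index += 1;
        }
        committed
    }
}
```

In a three-node cluster, a vote that completes the quorum for slot 1 commits nothing while slot 0 is still pending; once slot 0 reaches quorum, both slots commit together. A real engine would also handle ballots, retransmission, and leader changes, which this sketch omits.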

2. Contract‑driven development powered by LLMs

Contracts as first‑class artifacts – pre‑conditions, post‑conditions, and invariants are written in a DSL that the AI expands into runtime assert! checks for testing and debug_assert! checks, which Rust compiles out of release builds unless debug assertions are explicitly enabled.
Test generation from contracts – once a contract exists, a prompt to Claude or Codex produces targeted unit tests covering edge cases.
Property‑based testing – the AI translates contracts into proptest‑style generators, automatically exploring large input spaces. One such generated test caught a subtle safety violation in the phase‑2a handler before any manual test could expose it.
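The contract workflow above can be sketched roughly as follows. This is a hedged illustration, not the project's code: the `Acceptor` type, its fields, and the specific contracts are assumptions chosen for the example, and the small hand-rolled random generator stands in for proptest so that the snippet has no external dependencies.

```rust
/// Hypothetical acceptor state; field names are illustrative only.
#[derive(Debug, Clone, Default)]
pub struct Acceptor {
    /// Highest ballot this acceptor has promised not to undercut.
    pub promised: u64,
    /// Ballot and value of the last accepted proposal, if any.
    pub accepted: Option<(u64, u64)>,
}

impl Acceptor {
    /// Handle a phase-2a (accept) request; returns true if it was accepted.
    pub fn on_phase2a(&mut self, ballot: u64, value: u64) -> bool {
        let old_promised = self.promised;
        // Invariant on entry: an accepted ballot never exceeds the promise.
        debug_assert!(self.accepted.map_or(true, |(b, _)| b <= self.promised));

        let ok = ballot >= self.promised;
        if ok {
            self.promised = ballot;
            self.accepted = Some((ballot, value));
        }

        // Post-condition: the promise never decreases.
        debug_assert!(self.promised >= old_promised);
        // Post-condition: an accepted request is recorded exactly.
        debug_assert!(!ok || self.accepted == Some((ballot, value)));
        ok
    }
}

/// Tiny deterministic generator (an LCG), standing in for proptest strategies.
fn next_rand(state: &mut u64) -> u64 {
    *state = state
        .wrapping_mul(6364136223846793005)
        .wrapping_add(1442695040888963407);
    *state >> 33
}

/// Property: over random request sequences the promise is monotonic, and a
/// request is accepted exactly when its ballot is at least the current promise.
pub fn check_phase2a_property(cases: u32, seed: u64) -> bool {
    let mut rng = seed;
    for _ in 0..cases {
        let mut acc = Acceptor::default();
        for _ in 0..32 {
            let ballot = next_rand(&mut rng) % 16;
            let value = next_rand(&mut rng);
            let before = acc.promised;
            let ok = acc.on_phase2a(ballot, value);
            if acc.promised < before || ok != (ballot >= before) {
                return false;
            }
        }
    }
    true
}
```

With proptest, the inner loop would become a strategy that generates sequences of `(ballot, value)` pairs, and shrinking would reduce a failing sequence to a minimal counterexample. The contract checks themselves would stay the same.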

3. Lightweight spec‑driven workflow

Instead of a heavyweight document chain (requirement → design → task list), Huang uses the spec‑kit toolset:

/specify creates a short markdown spec with user stories and acceptance criteria.
/clarify asks the model to critique and extend the spec, surfacing missing scenarios.
/plan generates a concrete step‑by‑step plan for a single story.

The process treats a single user story as the atomic unit of AI work, which keeps the context size manageable and allows on‑the‑fly adjustments without breaking document consistency.

4. AI‑in‑the‑loop performance tuning

The optimization loop is essentially:

Prompt AI to instrument latency points.
Run a benchmark, collect traces.
Let the model write a Python script to compute quantiles and highlight hot paths.
Ask for concrete code changes (e.g., replace Arc<Mutex<>> with lock‑free AtomicUsize, eliminate redundant clone() calls, batch allocations).
Re‑measure.
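The analysis step in this loop is small enough to sketch. The post describes a model-written Python script; the version below is a Rust approximation for consistency with the rest of the codebase, and the function names, the microsecond unit, and the stage labels are assumptions for illustration only. It uses the nearest-rank definition of a percentile.

```rust
/// Nearest-rank percentile over latency samples (assumed to be microseconds).
/// Returns None for empty input or a percentile outside 1..=100.
pub fn percentile(samples: &[u64], pct: usize) -> Option<u64> {
    if samples.is_empty() || pct == 0 || pct > 100 {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    // Nearest rank: ceil(pct * n / 100), computed in integers.
    let rank = (pct * sorted.len() + 99) / 100;
    Some(sorted[rank - 1])
}

/// Summarize each instrumented stage as (name, p50, p99), slowest p99 first.
/// Stages with no samples are skipped.
pub fn hot_paths(traces: &[(&str, Vec<u64>)]) -> Vec<(String, u64, u64)> {
    let mut rows: Vec<(String, u64, u64)> = traces
        .iter()
        .filter_map(|(stage, samples)| {
            let p50 = percentile(samples, 50)?;
            let p99 = percentile(samples, 99)?;
            Some((stage.to_string(), p50, p99))
        })
        .collect();
    rows.sort_by(|a, b| b.2.cmp(&a.2));
    rows
}
```

Sorting by p99 rather than by mean keeps tail latency visible, which is usually what matters for a consensus commit path.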

Repeating this cycle over roughly three weeks raised throughput by ~13× (from 23 K to 300 K ops / s). The biggest wins came from:

Reducing async task spawns on the critical path.
Consolidating allocation‑heavy buffers into a reusable arena.
Applying zero‑copy deserialization for network messages.
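As one illustration of the second item, the sketch below reuses message buffers instead of allocating a fresh one per message. It is a simplified stand-in for an arena, not the project's implementation: `BufferPool`, `encode_batch`, and the length-prefixed framing are hypothetical, and the `allocations` counter exists only to make the effect observable.

```rust
/// Hypothetical pool of reusable byte buffers.
pub struct BufferPool {
    free: Vec<Vec<u8>>,
    capacity: usize,
    /// Number of fresh buffers created, tracked for illustration.
    pub allocations: usize,
}

impl BufferPool {
    pub fn new(capacity: usize) -> Self {
        Self { free: Vec::new(), capacity, allocations: 0 }
    }

    /// Take an empty buffer, reusing a returned one when available.
    pub fn acquire(&mut self) -> Vec<u8> {
        match self.free.pop() {
            Some(buf) => buf,
            None => {
                self.allocations += 1;
                Vec::with_capacity(self.capacity)
            }
        }
    }

    /// Return a buffer to the pool; clear() keeps its heap allocation.
    pub fn release(&mut self, mut buf: Vec<u8>) {
        buf.clear();
        self.free.push(buf);
    }
}

/// Frame each message as a 4-byte little-endian length followed by the
/// payload, and return the total number of bytes encoded.
pub fn encode_batch(pool: &mut BufferPool, messages: &[&[u8]]) -> usize {
    let mut total = 0;
    for msg in messages {
        let mut buf = pool.acquire();
        buf.extend_from_slice(&(msg.len() as u32).to_le_bytes());
        buf.extend_from_slice(msg);
        total += buf.len();
        // A real system would hand the buffer to the network layer here.
        pool.release(buf);
    }
    total
}
```

Encoding three messages through the pool creates one buffer rather than three. On a hot path that handles hundreds of thousands of operations per second, that difference is the kind of saving the list above refers to.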

Limitations and open questions

| Area | Current state | Remaining challenges |
| --- | --- | --- |
| Correctness | 1 300+ tests, contract‑driven property checks. | Formal verification is still missing; contracts are only runtime checks and can be disabled in production. |
| Hardware support | Pipelining and NVM integrated. | RDMA integration is still a TODO; the codebase will need careful unsafe‑FFI handling that AI alone may not audit safely. |
| AI autonomy | Human reviews contracts, test output, and performance suggestions. | Fully hands‑off pipelines (e.g., AI writes, runs, and validates contracts without human sign‑off) remain speculative. |
| Scalability of prompts | Prompt sizes stay under 8 K tokens by focusing on single stories. | Larger architectural changes (e.g., redesigning the replication log) may exceed context limits, requiring manual chunking. |
| Cost & rate limits | Two paid Anthropic plans and a ChatGPT Plus subscription were needed to avoid throttling. | Economic feasibility for larger teams is unclear; many organizations would need to budget for multiple concurrent API keys. |

Takeaways for practitioners

AI excels at repetitive, well‑scoped tasks – writing boilerplate, generating tests from explicit contracts, and suggesting micro‑optimizations.
Human oversight is still essential – contracts must be reviewed, performance regressions need domain knowledge, and unsafe code paths require manual audit.
Tooling matters – a CLI‑first workflow (Codex CLI) that can be scripted fits naturally into a build‑test‑refine loop. IDE‑centric approaches tend to introduce more context‑switching.
Cost discipline drives usage – treating the AI subscription as a sunk cost (the “$100/month forcing function”) can keep momentum high, but teams should monitor token usage to avoid runaway bills.

Wish list for the next generation of AI‑assisted coding

End‑to‑end story execution – a model that can ingest a user story, generate spec, write code, produce contracts, and run the full test suite without manual prompting.
Automated contract lifecycle – AI that maintains contract‑test sync, updates failing contracts after a refactor, and flags when a contract becomes obsolete.
Self‑optimizing performance loops – a system that can explore a search space of compiler flags, data‑structure variants, and micro‑benchmarks, reporting the Pareto‑optimal trade‑offs.
Integrated cost awareness – prompts that include token‑budget constraints and suggest cheaper alternative models when appropriate.

References & resources

Azure Replicated State Library (RSL) – internal Microsoft documentation (referenced in the post).
Jay Lorch’s design markdown for multi‑Paxos: https://github.com/microsoft/replicated-state-library-design
PoWER‑Never‑Corrupts paper (OSDI 2025): https://www.usenix.org/conference/osdi25/presentation/power-never-corrupts
Spec‑kit CLI: https://github.com/speckit/spec-kit
Codex CLI repository: https://github.com/openai/codex-cli
Claude Code documentation: https://docs.anthropic.com/claude-code

Learnings from 100K Lines of Rust with AI | Cheng Huang’s corner

This article reflects a practitioner’s perspective on what AI‑driven development can achieve today, where it still falls short, and what concrete improvements would make the workflow truly autonomous.

#AI #Rust #Paxos #Performance #Contract-driven testing
