Mistral AI Launches Leanstral: Open-Source Code Agent for Formal Verification

Mistral AI releases Leanstral, the first open-source code agent designed for Lean 4 proof assistant, offering efficient formal verification capabilities at a fraction of the cost of competitors.

Mistral AI has unveiled Leanstral, a groundbreaking open-source code agent specifically engineered for Lean 4, marking a significant advancement in formal verification and proof engineering. The 6B parameter model represents the first dedicated AI agent for this mathematical proof assistant, addressing a critical bottleneck in high-stakes software development where human review of machine-generated code becomes the primary constraint on engineering velocity.

The Problem: Human Review as the Scaling Bottleneck

The announcement highlights a fundamental challenge in modern software development: as AI agents become increasingly capable at code generation, the time and specialized expertise required to manually verify their outputs becomes the limiting factor in engineering productivity. This verification bottleneck is particularly acute in domains ranging from frontier research mathematics to mission-critical software systems, where correctness isn't optional but essential.

Mistral's vision extends beyond simple code generation to a new generation of coding agents that can both execute tasks and formally prove their implementations against strict specifications. The goal is to eliminate the need for humans to debug machine-generated logic, instead allowing developers to simply dictate what they want and have the system deliver provably correct implementations.

Leanstral's Technical Architecture

Unlike existing proving systems that act as wrappers around large generalist models or focus on single math problems, Leanstral is purpose-built for operating in realistic formal repositories. The model employs a highly sparse architecture optimized specifically for proof engineering tasks, with 6B active parameters that deliver surprising efficiency compared to much larger competitors.

The architecture leverages parallel inference with Lean as a perfect verifier, creating a system that is both performant and cost-efficient. This design choice enables Leanstral to operate effectively while maintaining a fraction of the computational overhead required by closed-source alternatives.

Open and Accessible Design

Mistral has committed to making Leanstral widely available through multiple channels:

Apache 2.0 License: The model weights are released under an open-source license, allowing developers to download and run the system on their own infrastructure
Mistral Vibe Integration: Leanstral is integrated directly into Mistral's vibe coding environment for immediate, zero-setup access
Free API Endpoint: A dedicated API endpoint (labs-leanstral-2603) provides free or near-free access for a limited period
Technical Documentation: A comprehensive tech report detailing the training approach and a new evaluation suite called FLTEval will be released

Evaluation and Performance

To assess Leanstral's practical utility, Mistral benchmarked the model against leading coding agents and open-source models using FLTEval, a new evaluation suite designed to move beyond competition math and focus on realistic proof engineering scenarios. The benchmarks measured Leanstral's ability to complete formal proofs and correctly define new mathematical concepts in each PR to the FLT project.

Performance Against Open-Source Models

Leanstral-120B-A6B demonstrates remarkable efficiency advantages over larger open-source peers:

GLM5-744B-A40B: Caps at approximately 16.6 on FLTEval
Kimi-K2.5-1T-32B: Caps at approximately 20.1 on FLTEval
Qwen3.5-397B-A17B: Requires 4 passes to reach 25.4

In contrast, Leanstral achieves a score of 26.3 with just two passes and scales linearly to 29.3 at the same cost level as Qwen3.5's 4-pass performance.

Performance Against Claude Family

Leanstral serves as a high-value alternative to the Claude suite, offering competitive performance at dramatically reduced costs:

Model	Cost ($)	Score
Haiku	184	23.0
Sonnet	549	23.7
Opus 4.6	1,650	39.6
Leanstral	18	21.9
Leanstral pass@2	36	26.3
Leanstral pass@4	72	29.3
Leanstral pass@8	145	31.0
Leanstral pass@16	290	31.9

At pass@2, Leanstral reaches 26.3, beating Sonnet by 2.6 points while costing only $36 compared to Sonnet's $549. Even at pass@16, Leanstral reaches 31.9, comfortably beating Sonnet by 8 points. While Claude Opus 4.6 remains the quality leader, it costs $1,650—92 times more than running Leanstral.

Real-World Case Studies

Stack Exchange Migration Assistance

When breaking changes hit new Lean releases, migrating code can become a massive headache. Mistral tested Leanstral with a real-world question from the Proof Assistants Stack Exchange about a script that stopped compiling in Lean 4.29.0-rc6. The issue involved a rewrite tactic failing to match patterns with a simple type alias initially written as def T2 := List Bool.

Rather than guessing, Leanstral built test code to recreate the failing environment and diagnosed the underlying issue with definitional equality. The model correctly identified that def creates a rigid definition requiring explicit unfolding, which blocked the rewrite tactic from seeing the underlying structure needed for pattern matching. The proposed solution—swapping def for abbrev—was simple and effective, as abbrev creates a transparent alias that is immediately definitionally equal to the original type.

Rocq to Lean Translation

Leanstral successfully translated definitions from Rocq (the Coq implementation) into Lean, even implementing custom notation. The model handled complex inductive definitions for command evaluation, including constructors for skip, assignment, sequence, conditional, and while loop operations. Beyond mere translation, Leanstral could prove properties about the translated programs when given only the Rocq statement without proof.

Availability and Future Directions

Leanstral is available immediately through multiple channels:

Mistral Vibe: Integrated directly for zero-setup vibe coding and proving using the /leanstall command
Labs API: Free/near-free access via the labs-leanstral-2603 endpoint
Self-Hosting: Apache 2.0 licensed weights available for download and local deployment

Mistral is keeping the API endpoint highly accessible for a limited period to gather realistic feedback and observability data to fuel the next generation of verified code models. This approach suggests an ongoing commitment to improving formal verification capabilities based on real-world usage patterns.

Implications for the AI Development Ecosystem

Leanstral represents a significant step toward trustworthy AI-assisted development in high-stakes domains. By combining formal verification capabilities with efficient inference and open accessibility, Mistral is addressing a critical gap in the AI development toolchain. The model's performance against both open-source and closed-source competitors suggests that specialized, purpose-built AI agents can outperform general-purpose models even at dramatically smaller parameter counts.

The emphasis on formal verification and provable correctness positions Leanstral as a tool for domains where traditional software testing is insufficient—such as cryptographic implementations, safety-critical systems, and mathematical software. As AI-generated code becomes increasingly prevalent, tools like Leanstral may become essential for maintaining software reliability and security in an era of rapid, automated development.

#AI #Formal Verification #Lean 4 #Open Source #Mistral AI