A thoughtful exploration of how to use LLMs in software development while maintaining the determinism that critical systems require, drawing parallels with mathematical proof verification and offering practical strategies for combining AI assistance with traditional code-checking tools.
Deterministic Programming with LLMs: Building Reliable Systems in an Uncertain World
The Changing Landscape of Software Development
If you're reading this, you're likely aware that our industry is undergoing a dramatic transformation. Large Language Models (LLMs) capable of writing code have emerged as powerful tools, and the debate around their optimal use has intensified. While much has been written about the ethics of LLM coding, the best approaches for using them, and how to effectively employ AI agents, I want to explore a specific aspect: how LLMs can be used in a deterministic way.
This isn't to suggest this is the only way to use LLMs, but rather to examine it as one valuable tool in our arsenal. To understand this better, let's look at how another field has grappled with similar challenges.
Mathematical Proof and the LLM Challenge
Before diving into software development, I want to examine what's happening in mathematics, where LLMs have also made significant inroads. Mathematical proofs are an area where LLMs have shown surprising capability. In September 2024, Terence Tao, a Fields Medal-winning mathematician, described supervising an LLM as "trying to advise a mediocre, but not completely incompetent, (static simulation of a) graduate student."
This is remarkable because very few humans can operate at this level in mathematics, and LLMs have only improved since then. However, LLMs fundamentally produce outputs that resemble their training data, making them susceptible to hallucinations. In mathematics, this is particularly dangerous because proofs often depend on subtle differences, and plausible-sounding arguments can be misleading.
Mathematicians have turned to a solution that offers valuable lessons: Lean and other proof systems. These tools create rigorous, step-by-step proofs based on axioms and logical inferences. While professional mathematicians rarely use them due to the difficulty of writing proofs in these systems, they provide the determinism that human-written proofs sometimes lack.
In January 2026, a team successfully used this hybrid approach to solve a previously unsolved problem. They used ChatGPT to create an outline, Aristotle to patch logical flaws and express the proof in Lean for verification, and then ChatGPT again to translate the Lean proof into a conventional write-up for publication. This demonstrates a powerful pattern: using LLMs for their strengths while relying on deterministic tools for verification.
The Determinism Challenge in Software Development
Now let's return to software development, where tools like Claude Code and Gemini Code Assist can "vibecode" moderately complex applications with minimal supervision. These agents operate at roughly the level of a mediocre junior developer, and their existence is rapidly changing our industry.
The key issue is that LLMs, unlike traditional computer programs, are not deterministic. They operate by calculating the likelihood of possible next words based on their training data and then randomly selecting one in proportion to its likelihood. This means they produce slightly different results each time they're used, even with the same input.
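That sampling step can be sketched in a few lines. The vocabulary and probabilities below are purely illustrative, not drawn from any real model:

```python
import random

# Hypothetical probabilities a model might assign to the next token
# after the prompt "The capital of France is". Values are illustrative.
next_token_probs = {
    " Paris": 0.90,
    " located": 0.06,
    " the": 0.03,
    " a": 0.01,
}

def sample_next_token(probs: dict[str, float]) -> str:
    """Pick one token at random, weighted by its probability."""
    tokens = list(probs)
    weights = list(probs.values())
    return random.choices(tokens, weights=weights, k=1)[0]

# Even with identical input, repeated calls can return different tokens.
samples = [sample_next_token(next_token_probs) for _ in range(20)]
print(samples)
```

Run the last two lines twice and you will usually get two different lists, which is exactly the non-determinism the rest of this article is concerned with.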
This stochastic nature becomes problematic when we need consistent, reliable behavior. Consider automated deployment scripts versus manual deployments. While writing a deployment script takes longer than a single manual deployment, scripts are more reliable because they produce the same results every time. Humans and LLMs, by contrast, are prone to occasional errors.
When Determinism Matters
Not all programming tasks require determinism. One-off tasks like data migration, importing spreadsheet data, or generating presentation charts only need to be done once, so occasional errors aren't catastrophic. Similarly, the process of writing code that will be used many times doesn't need to be deterministic—only the final product does.
The real challenge arises with tasks that need to be performed consistently across an entire codebase. Consider injection attack prevention: before using user-supplied strings in SQL queries, HTML pages, or command-line arguments, they must be properly escaped. This isn't a one-time task but something that needs to be done every single time.
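As a concrete illustration of the SQL case, the sketch below uses Python's built-in sqlite3 module and a made-up users table. Naive string interpolation lets attacker-supplied text rewrite the query, while a parameterized placeholder keeps it inert:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

# A malicious "name" that alters the query if naively interpolated.
user_input = "alice' OR '1'='1"

# Unsafe: formatting splices the attacker's text into the SQL itself,
# so the WHERE clause becomes always-true and every row leaks.
unsafe_query = f"SELECT name FROM users WHERE name = '{user_input}'"
unsafe_rows = conn.execute(unsafe_query).fetchall()

# Safe: the placeholder passes the input as data, never as SQL,
# so no row matches (no user is literally named that string).
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (user_input,)
).fetchall()

print(unsafe_rows, safe_rows)
```

The point of the article is not that this fix is hard to write once, but that it must be applied at every query site without exception.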
We know from decades of experience that humans, even experienced developers, aren't reliable enough to get this right 100% of the time. Unfortunately, LLMs share this limitation. While we can provide sample code, documentation, or specialized skills, the stochastic nature of LLMs means we can never achieve the deterministic confidence that all strings will be properly sanitized.
This applies to many global practices: following naming conventions, ensuring every log message includes a stack trace, closing every file in a finally block, and countless other standards we want to enforce consistently.
The Solution: Code-Checking Code
The software industry has developed techniques over decades for enforcing policies when universal compliance is needed. Since LLMs share the same limitation as humans regarding determinism, we can use the same solutions.
Programs are extremely deterministic, making them ideal for enforcing consistent behavior. Several approaches can achieve this:
Type System Enforcement: Create distinct types like "UserString" and "SanitizedString" to let the compiler enforce that user strings must be sanitized before use.
Linting Rules: Write custom lint rules to enforce naming conventions or prefer specific logging frameworks over deprecated ones.
Automated Testing: Create tests that scan code to ensure only approved libraries are used or that specific patterns are followed.
Because linters, tests, and compiler-enforced policies run every time code is built, there's no risk of an LLM or human programmer accidentally missing a case.
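A minimal sketch of the type-system approach, using hypothetical UserString and SanitizedString wrapper classes, with HTML escaping standing in for whatever sanitization a real system needs:

```python
import html

class UserString:
    """Raw, untrusted text exactly as the user supplied it."""
    def __init__(self, value: str):
        self.value = value

class SanitizedString:
    """Text that has passed through sanitize(); safe to embed in HTML."""
    def __init__(self, value: str):
        self.value = value

def sanitize(raw: UserString) -> SanitizedString:
    # Escaping HTML metacharacters is the only way to obtain
    # a SanitizedString in this design.
    return SanitizedString(html.escape(raw.value))

def render_greeting(name: SanitizedString) -> str:
    # Because this accepts only SanitizedString, a static checker
    # such as mypy rejects any call site that skipped sanitize().
    return f"<p>Hello, {name.value}!</p>"

page = render_greeting(sanitize(UserString("<script>alert(1)</script>")))
print(page)
```

With annotations like these, the check runs on every build and applies to every call site, regardless of whether a human or an LLM wrote it.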
The LLM's Role in Building Deterministic Tools
Creating code-checking tools like linters and tests requires extra work, which might seem to defeat the purpose of using LLMs for productivity. However, this is where LLMs shine: they're excellent at creating exactly this kind of tool.
When consistency is important, instead of asking your LLM to follow rules each time, ask it to build a program that enforces those rules and incorporate it into your build chain. This approach applies whenever there's a policy that needs consistent application throughout the codebase.
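For instance, a build-time check for the "only approved libraries" policy mentioned earlier might look like the following sketch. The approved set and the sample source are hypothetical; a real version would walk the whole codebase and fail the build on any violation:

```python
import ast

# Hypothetical allow-list for this codebase's policy.
APPROVED_LIBRARIES = {"json", "logging", "pathlib"}

def find_unapproved_imports(source: str) -> list[str]:
    """Return top-level module names imported outside the approved set."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            names = [node.module]
        else:
            continue
        for name in names:
            root = name.split(".")[0]  # "os.path" counts as "os"
            if root not in APPROVED_LIBRARIES:
                violations.append(root)
    return violations

sample = "import logging\nimport pickle\nfrom os import path\n"
print(find_unapproved_imports(sample))  # flags pickle and os
```

A check like this takes minutes for an LLM to draft, runs deterministically on every build, and never forgets a file.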
For one-time tasks—whether creating a login screen or writing a lint rule to enforce a policy—asking the LLM to write it is reasonable. The level of human scrutiny needed afterward remains a topic of debate, though reviewing every line before committing is a prudent approach.
A Balanced Approach to AI-Assisted Development
The key insight is that LLMs are powerful tools for creating code, but they shouldn't be relied upon for tasks requiring absolute consistency. By combining LLM capabilities with traditional code-checking mechanisms, we can harness the productivity benefits of AI while maintaining the reliability our systems demand.
This balanced approach recognizes that different parts of the software development process have different requirements. Creative tasks, one-off scripts, and exploratory coding can benefit greatly from LLM assistance. But when it comes to enforcing standards, preventing security vulnerabilities, and ensuring consistent behavior across large codebases, deterministic tools remain essential.
As our industry continues to evolve with these powerful new tools, finding the right balance between AI assistance and traditional software engineering practices will be crucial. The mathematicians' approach—using LLMs for their strengths while relying on deterministic verification tools—offers a compelling model for how we might navigate this new landscape.
By understanding when determinism matters and building appropriate safeguards, we can use LLMs to dramatically improve our productivity without sacrificing the reliability and security that modern software systems require.