Verifiability: The Key to Unlocking AI's Full Potential in Software Engineering
The Leaky Faucet of Software Abstractions
Abstractions form the bedrock of modern software development, letting us build complex systems without constantly wrestling with low-level details. From programming languages hiding CPU instructions to SQL databases abstracting storage mechanics, they simplify our world. Yet every abstraction leaks, as Joel Spolsky famously observed. When these leaks force developers to understand hidden complexities, the abstraction's value diminishes.
Software engineer Alperen Keles offers a pivotal reframing in a recent analysis: Rather than focusing on what abstractions leak, we should scrutinize what they hide—and crucially, what we can verify about their behavior.
"Abstractions really only ever preserve the properties we measure," Keles notes. "SQL engines guarantee functional correctness across versions but make no promises about execution speed. If you relied on undocumented performance characteristics, you'll suffer when underlying implementations change."
This insight becomes existential in the age of AI-assisted development. Large language models (LLMs) excel at generating code from prompts, effectively creating new abstractions on demand. But prompts lack executable semantics—the same prompt can yield different outputs based on model versions, randomness, or even "the mood of your favorite CEO."
The Verifiability Bottleneck
Keles identifies a critical limitation: "Verifiability is the limit of what you can create." Whether code comes from humans or LLMs, we can only trust what we can test. Current industry efforts overwhelmingly prioritize prompt engineering—abstracting away decisions like container choices or algorithm implementations. While convenient, this approach suffers from inherent unverifiability, creating systems we can't fully trust.
His proposed alternative? Shift from prompt-centric development to program-based synthesis with machine-driven verification:
# Traditional AI workflow: natural-language prompt in, code out
prompt = "Write Python code to sort this list efficiently"
code = llm.generate(prompt)  # unverifiable abstraction: nothing executable to check it against

# Verifiable alternative: start from a program, not a prompt
reference_code = "def sort_list(l): ..."                   # human-written reference implementation
candidate_code = llm.translate(reference_code, "Rust")     # machine-generated translation
assert differential_test(reference_code, candidate_code)   # validated behavioral equivalence
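The differential_test step is where the trust actually comes from. A minimal sketch of what such a helper could look like appears below, assuming for illustration that both the reference and the candidate are already exposed as Python callables (say, the Rust translation wrapped in a compiled extension); the function name, its signature, and the flat list-of-integers generator are all hypothetical.

import random

def differential_test(reference_fn, candidate_fn, trials=1000):
    """Check behavioral equivalence of two implementations on randomly generated inputs."""
    for _ in range(trials):
        # Real input generators would be far richer than flat integer lists.
        data = [random.randint(-1000, 1000) for _ in range(random.randint(0, 50))]
        if reference_fn(list(data)) != candidate_fn(list(data)):
            return False  # counterexample found: the translation is not equivalent
    return True  # no divergence observed on the sampled inputs

Passing such a test is evidence rather than proof: equivalence is only established over the inputs actually sampled, which is precisely why Keles treats verifiability, not generation, as the limiting factor.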
The Next Frontier: Breaking Abstractions with AI
This verifiable approach unlocks transformative use cases:
- Cross-ecosystem translation: Automatically convert libraries between languages using counterexample-guided synthesis, with differential testing ensuring behavioral equivalence.
- Observability-driven optimization: Continuously optimize programs for memory, parallelism, or security by instrumenting runtime behavior and validating improvements (a sketch of such a validation loop follows this list).
- Low-level abstraction breaking: Swap compiler-generated assembly with verified LLM-optimized alternatives, much like a JIT compiler but with formal guarantees.
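For the observability-driven case, the validation loop might look roughly like the following sketch. The helper names (measure, accept_optimization) are illustrative, the candidate function stands in for an LLM-proposed rewrite, and a production loop would average over many runs and track far more than wall-clock time and peak memory.

import time
import tracemalloc

def measure(fn, workload):
    """Instrument a single run: return the result, wall-clock time, and peak memory."""
    tracemalloc.start()
    start = time.perf_counter()
    result = fn(workload)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, elapsed, peak

def accept_optimization(current_fn, candidate_fn, workloads):
    """Accept the proposed rewrite only if it stays equivalent and measurably improves."""
    for workload in workloads:
        base_out, base_time, base_mem = measure(current_fn, workload)
        cand_out, cand_time, cand_mem = measure(candidate_fn, workload)
        if cand_out != base_out:
            return False  # behavioral regression: reject outright
        if cand_time > base_time or cand_mem > base_mem:
            return False  # worse on time or memory: not accepted as an improvement
    return True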
"'Machine go brrr' isn't a specification," Keles warns. Without verifiable guarantees via testing, even 1000x more capable LLMs remain untrustworthy for critical work.
The Roadblocks Ahead
While promising, this vision faces hurdles. Random testing—particularly property-based testing—must evolve to handle complex inputs and cross-language equivalence. Concurrency bugs remain notoriously hard to catch, and impurity in mainstream languages complicates deterministic testing. Keles suggests the functional programming community's work on effect systems could provide pathways forward.
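To make the first hurdle concrete, here is a toy property-based equivalence check written with the Hypothesis library. The two functions are hypothetical placeholders (the candidate stands in for LLM-generated code), and the flat list-of-integers strategy is exactly the kind of input generation that would have to grow much richer to cover realistic data, cross-language interfaces, or concurrent and effectful behavior.

from hypothesis import given, strategies as st

def reference_sort(xs):
    # Trusted reference implementation.
    return sorted(xs)

def candidate_sort(xs):
    # Placeholder for a machine-generated implementation under test.
    return sorted(xs)

@given(st.lists(st.integers()))
def test_candidate_matches_reference(xs):
    # The property: on every generated input, candidate and reference agree.
    assert candidate_sort(list(xs)) == reference_sort(list(xs))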
As abstractions continue to both empower and constrain us, their testability becomes the ultimate measure of their value. The organizations that master verifiable AI-assisted synthesis—using LLMs not as oracles but as collaborators within rigorous validation frameworks—will redefine what’s possible in software engineering. The leakiness of abstractions isn't a flaw to lament, but a challenge to systematically conquer through observable, measurable guarantees.