A developer documents their experience creating a Forth-like virtual machine in SPARK/Ada, exploring the challenges of formal verification for a stack-based language and the incremental process of proving program correctness.
The journey toward a verified Forth begins not with moon landings, but with postfix calculators. In this second installment of a project to create a formally verified Forth, the author builds a Forth-like virtual machine in SPARK, navigating the intersection of concatenative programming paradigms and formal verification constraints. The goal is a seedForth-like implementation that could serve as a verified compiler backend, as discussed in the Ada community.
Forth itself is a fascinating study in minimalism. Its power derives from a syntax consisting almost entirely of "words"—sequences of non-whitespace characters that operate on a small set of shared structures: a parameter stack for data, a return stack for addresses, a terminal input buffer, and a dictionary. Everything from comments to arithmetic to defining new words is accomplished through these words. The language's directness has made it popular for bootstrapping systems, with many implementations emitting assembly directly. To avoid the complications of mixing generated assembly with the SPARK prover, this project implements a Forth-like virtual machine instead.
The implementation makes strategic compromises. Some functions typically implemented in Forth itself, like the text interpreter, are written in SPARK to enable verification. The tokenizer's termination is provable, and arithmetic operations are verified for overflow and underflow. Stack operations like ROT are proven correct: given 1 2 3 on the stack, the operation is guaranteed to produce 2 3 1, unless insufficient elements trigger an underflow error.
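To give a flavor of the guarantees involved, here is a hypothetical sketch of an overflow-aware addition word; apart from Is_Running, every name here (Machine, Depth, Peek, Has_Error, Sum_Fits) is an assumed helper rather than something taken from the project:

    --  Hypothetical sketch: either the sum of the top two cells is pushed,
    --  or an error is signalled; the word itself never raises an overflow.
    --  Sum_Fits is an assumed expression function stating that the
    --  mathematical sum fits in a stack cell.
    procedure Op_Add (VM : in out Machine) with
      Pre  => Is_Running (VM),
      Post =>
        (if Depth (VM'Old) < 2 then
           Has_Error (VM)                                  -- stack underflow
         elsif not Sum_Fits (Peek (VM'Old, 1), Peek (VM'Old, 2)) then
           Has_Error (VM)                                  -- would overflow
         else
           Depth (VM) = Depth (VM'Old) - 1
           and then Peek (VM, 1) = Peek (VM'Old, 1) + Peek (VM'Old, 2));
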
The verification approach uses SPARK's contract system extensively. Each operation has preconditions (like Is_Running) and postconditions describing the guaranteed state changes. For Op_Rotate, contract cases specify different guarantees based on stack size. If fewer than three elements exist, an underflow error is signaled. Otherwise, the rotation is proven correct, with only the top three stack positions changed. The verification relies on helper functions like Param_Stack_Equal_From_Bottom_Until, which uses Ada's quantified expressions to check stack equality up to a certain depth.
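A minimal sketch of how such a contract could be written is shown below. Op_Rotate, Is_Running and Param_Stack_Equal_From_Bottom_Until are names from the article, while Machine, Depth, Peek and Has_Underflow_Error are assumed helpers, and the exact argument order of the equality helper is a guess:

    --  Hedged sketch of a ROT contract with two contract cases.
    --  ( n1 n2 n3 -- n2 n3 n1 ): the third cell moves to the top and
    --  everything below the top three cells is left untouched.
    procedure Op_Rotate (VM : in out Machine) with
      Pre            => Is_Running (VM),
      Contract_Cases =>
        (Depth (VM) < 3  =>
           Has_Underflow_Error (VM),
         Depth (VM) >= 3 =>
           Depth (VM) = Depth (VM'Old)
           and then Peek (VM, 1) = Peek (VM'Old, 3)
           and then Peek (VM, 2) = Peek (VM'Old, 1)
           and then Peek (VM, 3) = Peek (VM'Old, 2)
           and then Param_Stack_Equal_From_Bottom_Until
                      (VM, VM'Old, Depth (VM'Old) - 3));
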
A crucial insight is that SPARK's prover only understands what it's explicitly told through preconditions and postconditions. Anything not listed in these contracts or guaranteed by type invariants remains opaque. This means the solution to many proof failures is simply adding more postconditions. The author learned this through experience, sometimes discovering that a seemingly correct implementation could not be used as intended because its constraints were never written into the contracts.
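A tiny, hypothetical illustration of that opacity (none of these names come from the project):

    --  The body plainly sets Count to zero, but without the Post aspect
    --  the prover assumes nothing about Count at call sites, and an
    --  assertion such as "pragma Assert (Count = 0);" after a call to
    --  Reset would fail to prove.
    procedure Reset (Count : out Natural) with
      Post => Count = 0;

    procedure Reset (Count : out Natural) is
    begin
       Count := 0;
    end Reset;
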
The incremental nature of verification presents both opportunities and challenges. Adding new state to the virtual machine can cause previously proven subprograms to fail verification because they don't account for the new state. This requires careful management of proofs as the system evolves. The author adopted a pragmatic approach: establish basic guardrails first, then gradually add constraints, avoiding the trap of getting bogged down in proofs before achieving functional progress.
Custom word definitions—"colon definitions" in Forth parlance—were critical for making this a true language implementation rather than a hardcoded calculator. Without pointers available in SPARK, the solution uses arrays for word headers and names. Each word header contains name indices, execution code, and an immediate flag. The type system plays a vital role here: distinct integer types serve as array indexes, with subtypes introducing the bounds that SPARK uses in proofs. Converting between these types requires an explicit, bounds-checked conversion, which ensures array accesses are always valid.
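An illustrative sketch of what this array-backed layout could look like, with entirely hypothetical names and bounds:

    --  Name_Index and Word_Index are distinct integer types, so mixing
    --  them up is a compile-time error, and converting a plain Integer
    --  into either of them forces a range check that SPARK must discharge.
    type Name_Index is range 0 .. 4095;     -- positions in a flat name buffer
    type Word_Index is range 0 .. 255;      -- positions in the word table

    type Word_Header is record
       Name_First : Name_Index := 0;        -- where the word's name starts
       Name_Last  : Name_Index := 0;        -- where it ends
       Immediate  : Boolean    := False;    -- executes even while compiling
       --  plus a field holding the word's executable code (see the
       --  variant-record sketch further below)
    end record;

    type Name_Buffer is array (Name_Index) of Character;
    type Word_Table  is array (Word_Index) of Word_Header;
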
The Allocate_Word procedure demonstrates this verification approach. Its preconditions check whether allocation is possible, and its postconditions verify that the word table was updated correctly: the name starts at the correct position, ends appropriately, and the usage counts increment properly. This level of detail is necessary because SPARK's prover won't make assumptions about array updates.
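A hedged sketch of the general shape of such a contract; only the name Allocate_Word comes from the article, while Machine, Words_Used, Names_Used, Name_First, Name_Last, Max_Words and Max_Name_Chars are assumed helper functions and constants:

    --  The precondition says allocation must be possible; the
    --  postcondition pins down exactly how the tables changed.
    procedure Allocate_Word
      (VM   : in out Machine;
       Name : in     String) with
      Pre  =>
        Name'Length > 0
        and then Words_Used (VM) < Max_Words
        and then Names_Used (VM) <= Max_Name_Chars - Name'Length,
      Post =>
        Words_Used (VM) = Words_Used (VM'Old) + 1
        and then Names_Used (VM) = Names_Used (VM'Old) + Name'Length
        and then Name_First (VM, Words_Used (VM)) = Names_Used (VM'Old) + 1
        and then Name_Last  (VM, Words_Used (VM)) = Names_Used (VM);
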
Execution presents an interesting challenge. The author initially planned two execution boundaries: built-ins (native Ada procedures) and compiled functions (address interpreter). However, SPARK's restrictions on side effects in subprograms reached through access-to-subprogram types (function pointers) forced a third boundary. The solution uses a discriminated record (a tagged union) for word code, with three forms: intrinsic operations, procedure access, and instruction sequences. This allows executing built-ins, calling native procedures, or running compiled user-defined words through an address interpreter.
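One way such a discriminated record could be declared is sketched below; Machine is the same assumed VM type as in the earlier sketches, and Intrinsic_Op, Instruction_Index and the other names are illustrative rather than the project's:

    --  Three execution forms behind one type: a built-in opcode, a native
    --  Ada procedure reached through an access-to-subprogram value, or a
    --  slice of the shared instruction array for compiled words.
    type Code_Kind is (Intrinsic, Native, Compiled);

    type Native_Proc is access procedure (VM : in out Machine);

    type Word_Code (Kind : Code_Kind := Intrinsic) is record
       case Kind is
          when Intrinsic =>
             Op : Intrinsic_Op;            -- enumeration of built-in words
          when Native =>
             Proc : Native_Proc;           -- directly callable Ada procedure
          when Compiled =>
             First : Instruction_Index;    -- start in the bulk instruction array
             Count : Natural;              -- how many instructions to run
       end case;
    end record;
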
Instructions are stored in a bulk array similar to names, with an instruction pointer and count. SPARK proves that invalid instructions aren't executed and that the instruction pointer stays within bounds. For control flow, the IF and THEN words demonstrate conditional jumps. IF writes a placeholder in the instruction array and pushes its location onto the stack. THEN later fills in the jump distance, with verification ensuring the jump stays within valid instruction addresses.
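A hypothetical sketch of the THEN-side backpatch, with invented names and bounds, shows the kind of contract that keeps the patched jump inside the instruction array:

    type Opcode is (Push_Literal, Jump_If_False, Halt);   -- illustrative
    type Instruction is record
       Op      : Opcode  := Halt;
       Operand : Natural := 0;      -- for jumps: relative forward distance
    end record;

    type Instruction_Index is range 0 .. 1023;
    type Instruction_Array is array (Instruction_Index) of Instruction;

    --  IF left a placeholder at position Orig; THEN patches it to jump
    --  forward to Here.  The precondition bounds the distance within the
    --  array's index type, and the postcondition states that no other
    --  instruction was touched.
    procedure Patch_Forward_Jump
      (Instr : in out Instruction_Array;
       Orig  : in     Instruction_Index;
       Here  : in     Instruction_Index) with
      Pre  => Orig < Here,
      Post => Instr (Orig).Operand = Natural (Here - Orig)
              and then (for all I in Instruction_Index =>
                          (if I /= Orig then Instr (I) = Instr'Old (I)));

    procedure Patch_Forward_Jump
      (Instr : in out Instruction_Array;
       Orig  : in     Instruction_Index;
       Here  : in     Instruction_Index)
    is
    begin
       Instr (Orig).Operand := Natural (Here - Orig);
    end Patch_Forward_Jump;
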
The verification here is partial but meaningful. It proves the jump stays within the instruction array but doesn't verify that the jump doesn't leave the current word's instructions. The author notes this could be addressed by tracking the first instruction of each word but ran out of time during holiday development.
Significant gaps remain. The implementation lacks variables, constants, and the DOES> word, which is crucial for creating defining words. Floating-point support is also missing. The current VM is interesting but not practically useful. The next phase would involve restructuring to support multiple backends—x86 or ARM assembly, ELF or PE file generation—transforming the VM from an end product into a compiler backend.
The journey reveals how SPARK verification has evolved. Early experiences felt like "etching every line of code with a chisel on stone," but improvements in tooling, particularly integration with Alire (Ada's package manager), have made it more accessible. The author notes that SPARK itself has improved, though their own understanding of writing verifiable code has also matured.
For those unfamiliar with Forth, the author recommends trying it. The language's minimalism offers surprising power, and understanding it provides valuable insights into programming language design. The Anteforth project code is available for those interested in exploring the implementation details or building upon this work.
This project represents a thoughtful intersection of two seemingly disparate worlds: the minimalist, stack-based elegance of Forth and the rigorous, mathematical certainty of formal verification. Each step reveals new challenges and insights, from managing proof complexity to navigating SPARK's restrictions on side effects. The result is not just a verified Forth-like virtual machine, but a deeper understanding of what it means to build provably correct systems.
The incremental verification approach—starting with guardrails and gradually adding constraints—mirrors the iterative nature of software development itself. It acknowledges that perfect specification upfront is often impractical, and that verification can be a tool for discovery rather than just validation. This pragmatic philosophy makes formal verification more accessible, especially for developers new to the paradigm.
As the project evolves toward a seedForth-like implementation, the lessons learned here will inform future work. The balance between verification completeness and development velocity, the management of proof dependencies, and the architectural decisions around backend abstraction all provide valuable data points for anyone considering formal verification in their own projects.
The Anteforth repository serves as both a practical implementation and a learning resource. It demonstrates that verified programming needn't be an all-or-nothing proposition—SPARK and Ada can be mixed, and verification can be applied incrementally to stable sections of code. For critical systems, this suggests a hybrid approach: specify more deeply upfront, then verify incrementally as the system matures.
Ultimately, this work is a testament to the idea that verification isn't just about catching bugs, but about thinking more clearly about program behavior. The process of writing preconditions and postconditions forces a deeper understanding of what each operation should and shouldn't do. For a language as minimal as Forth, where words do everything, this clarity is particularly valuable.
The path forward involves expanding the language features, supporting more operations, and eventually targeting actual hardware. Each step will bring new verification challenges, but the foundation built here—understanding SPARK's contract system, managing proof complexity, and architecting for verifiability—provides a solid starting point. The journey from postfix calculator to verified Forth-like VM continues, with each iteration bringing the goal of a truly useful, verified Forth closer to reality.
