Lunacy is a side‑project Lua 5.1 implementation written in Rust that pairs an interpreter with a JIT built on Lazy Basic Block Versioning (LBBV). By specializing bytecode based on observed runtime types, its JIT ranges from parity with PUC Lua to roughly a 2× speedup on the benchmarks shown, while keeping the implementation simple enough for a single engineer.
Lunacy – A Lua 5.1 Interpreter Built on Lazy Basic Block Versioning
“The virtue of laziness” – that’s the motto behind Lunacy, a Lua 5.1 interpreter I have been developing in Rust. Unlike most hobby‑level language runtimes, Lunacy does not merely interpret bytecode; it applies Lazy Basic Block Versioning (LBBV) to specialize operations on the fly and then JIT‑compiles hot blocks. The result is a system that stays faithful to the spirit of a small side‑project while still delivering measurable performance gains.
The Core Idea: Specializing Bytecode with LBBV
LBBV originated in Maxime Chevalier‑Boisvert’s PhD work on the JavaScript JIT Higgs. The technique is deliberately simple: each basic block carries a specialization context that records concrete types for the values it has observed. When the interpreter reaches an instruction such as ADD, it first checks whether the operand types are already known. If they are, the operation is compiled into a type‑specific version (e.g., integer‑addition). If a type is unknown, the interpreter suspends execution, creates a thunk (a closure that will resume later), and inserts a runtime guard. The guard validates the type on the next execution; a failure spawns another thunk that specializes for the newly observed type.
The magic is that after a few iterations the loop header reaches a fixpoint: the context stabilizes, the block is fully specialized, and the guard disappears. Consequently, hot loops run with virtually no type checks, and the generated code mirrors what a hand‑written, statically‑typed implementation would look like.
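The guard-then-specialize cycle described above can be sketched in a few lines of Rust. This is a minimal illustration, not Lunacy's actual code: the `Ty`, `Op`, and `specialize_add` names are hypothetical, the type set is reduced to two entries, and the "observed" types are passed in directly instead of being read from a live stack.

```rust
/// Runtime types tracked by this sketch (a hypothetical, reduced set).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Ty {
    Number,
    Str,
}

/// Specialization context: the known type of each stack slot,
/// or `None` if the slot has not been observed yet.
type Context = Vec<Option<Ty>>;

/// Operations emitted into a specialized block version.
#[derive(Debug, PartialEq)]
enum Op {
    /// Runtime type check; a real system would bail out on mismatch.
    Guard { slot: usize, ty: Ty },
    /// Addition specialized for two numbers, no type checks inside.
    AddNum { a: usize, b: usize },
    /// Fallback addition that inspects its operands at runtime.
    AddGeneric { a: usize, b: usize },
}

/// Specialize `ADD a b` under `ctx`. Operands whose type is unknown get a
/// guard for the currently observed type, and the context learns that type.
fn specialize_add(ctx: &mut Context, a: usize, b: usize, observed: &[Ty]) -> Vec<Op> {
    let mut ops = Vec::new();
    for &slot in [a, b].iter() {
        if ctx[slot].is_none() {
            ops.push(Op::Guard { slot, ty: observed[slot] });
            ctx[slot] = Some(observed[slot]);
        }
    }
    if ctx[a] == Some(Ty::Number) && ctx[b] == Some(Ty::Number) {
        ops.push(Op::AddNum { a, b });
    } else {
        ops.push(Op::AddGeneric { a, b });
    }
    ops
}
```

Calling `specialize_add` twice with the same context shows the fixpoint in miniature: the first call emits two guards plus `AddNum`, and the second call, now running under a context that already knows both types, emits `AddNum` alone.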
{{IMAGE:4}}
Interpreter First, JIT Second
Higgs compiles every bytecode instruction directly to assembly. Lunacy takes a different path: it interprets first and JITs later. Each Lua opcode is implemented as a Rust coroutine that yields effects such as:
- guard(stack_slot X is type Y)
- add operands A and B
When a guard cannot be decided statically, the coroutine is paused inside a thunk. When the guard succeeds, the coroutine resumes, possibly yielding more effects. The yielded effects constitute a tiny residual language consisting of guards, closure calls, and thunk executions. Because the residual language is minimal, the JIT can be written as a small match that emits a static call to the closure’s function pointer and a few native branches for the guards.
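To make the shape of such a residual language concrete, here is a small sketch under stated assumptions. The `Residual`, `Exit`, and `run` names are invented for illustration, and the sketch executes the residual operations directly instead of emitting machine code; a JIT would have one match arm per variant that emits a native branch or a call instead.

```rust
/// Simplified Lua values for the sketch (hypothetical; real Lua has more types).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Value {
    Nil,
    Number(f64),
}

#[derive(Clone, Copy, Debug, PartialEq)]
enum Ty {
    Nil,
    Number,
}

fn type_of(v: &Value) -> Ty {
    match v {
        Value::Nil => Ty::Nil,
        Value::Number(_) => Ty::Number,
    }
}

/// The residual language: guards, calls to specialized code, and thunks.
enum Residual {
    Guard { slot: usize, ty: Ty },
    Call(fn(&mut [Value])),
    /// Specialization has not proceeded past this point yet.
    Thunk,
}

/// How a run over a residual block can end.
#[derive(Debug, PartialEq)]
enum Exit {
    Done,
    Bail { at: usize },
    NeedsSpecialization { at: usize },
}

/// Execute residual operations in order. A compiling backend would use the
/// same match, emitting a branch per guard and a direct call per `Call`.
fn run(ops: &[Residual], stack: &mut [Value]) -> Exit {
    for (pc, op) in ops.iter().enumerate() {
        match op {
            Residual::Guard { slot, ty } => {
                if type_of(&stack[*slot]) != *ty {
                    return Exit::Bail { at: pc };
                }
            }
            Residual::Call(f) => (*f)(stack),
            Residual::Thunk => return Exit::NeedsSpecialization { at: pc },
        }
    }
    Exit::Done
}

/// A specialized operation: stack[2] = stack[0] + stack[1], numbers only.
/// The preceding guards are what make skipping the type check safe.
fn add_0_1_into_2(stack: &mut [Value]) {
    if let (Value::Number(a), Value::Number(b)) = (stack[0], stack[1]) {
        stack[2] = Value::Number(a + b);
    }
}
```

With two guards followed by `Residual::Call(add_0_1_into_2)`, a stack holding two numbers runs to `Exit::Done`, while a stack with `Nil` in slot 0 stops at the first guard with `Exit::Bail { at: 0 }`.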

Why closures?
Instead of writing a separate assembly routine for every opcode‑type combination, Lunacy generates a Rust closure that captures the concrete stack slots and constant indices required for the operation. At JIT time the closure address becomes a static call target, eliminating dynamic dispatch and keeping the generated code compact. The interpreter and the JIT share the same residual operations, which makes bail‑out trivial: the program counter of the JIT code is identical to the interpreter’s PC, so returning to the interpreter is just a matter of jumping back to the same bytecode offset.
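A rough illustration of the closure idea follows. It is a sketch only: `make_add`, `make_add_const`, and `run_block` are hypothetical names, the stack holds plain `f64` values, and the closures are boxed trait objects for simplicity, whereas the text above describes calling the closure's function pointer statically from JIT code.

```rust
/// A specialized operation over a numbers-only stack (a simplification).
type SpecializedOp = Box<dyn Fn(&mut [f64])>;

/// Build `stack[dst] = stack[a] + stack[b]`. The slot indices are captured
/// once, at specialization time, so nothing is decoded when the op runs.
fn make_add(dst: usize, a: usize, b: usize) -> SpecializedOp {
    Box::new(move |stack: &mut [f64]| stack[dst] = stack[a] + stack[b])
}

/// Build `stack[dst] = stack[a] + k`, capturing the constant itself
/// instead of an index into a constant table.
fn make_add_const(dst: usize, a: usize, k: f64) -> SpecializedOp {
    Box::new(move |stack: &mut [f64]| stack[dst] = stack[a] + k)
}

/// Run a straight-line block of specialized operations.
fn run_block(block: &[SpecializedOp], stack: &mut [f64]) {
    for op in block {
        op(stack);
    }
}
```

One generic closure body per operation kind covers every slot combination, which is the point of the design: the combinatorial space of opcode, operand location, and type is handled by captured data, not by hand-written routines.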
Table Specialization without Shapes
JavaScript engines such as V8 use hidden classes (shapes) to specialize object property accesses. Implementing full shape tracking for Lua tables would be heavyweight: every environment is a table, and shape changes would invalidate many compiled blocks. Lunacy therefore adopts a hash‑slot specialization inspired by LuaJIT:
- The specialization context records the hash witness for each accessed key – the concrete index inside the table’s `IndexMap` and the type of the stored value.
- An epoch counter on each table detects when a new key is inserted, forcing a guard (`if table.epoch != witness.epoch bail`).
- Because `IndexMap` preserves insertion order, the index never changes due to resizing, simplifying the guard logic.
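The witness-plus-epoch scheme can be sketched with the standard library alone. Everything here is an assumption made for illustration: `Table` is a tiny insertion-ordered stand-in for `IndexMap` (a `Vec` of entries plus a `HashMap` from key to position), values are plain `f64`, and the witness omits the stored value's type that the text mentions.

```rust
use std::collections::HashMap;

/// Minimal insertion-ordered table, standing in for an IndexMap.
struct Table {
    entries: Vec<(String, f64)>,
    index: HashMap<String, usize>,
    /// Bumped whenever a new key is inserted.
    epoch: u64,
}

/// What the specialization context remembers about one key access.
struct Witness {
    slot: usize,
    epoch: u64,
}

impl Table {
    fn new() -> Self {
        Table { entries: Vec::new(), index: HashMap::new(), epoch: 0 }
    }

    /// Overwriting an existing key keeps the epoch; inserting a new key bumps it.
    fn set(&mut self, key: &str, value: f64) {
        match self.index.get(key).copied() {
            Some(i) => self.entries[i].1 = value,
            None => {
                self.index.insert(key.to_string(), self.entries.len());
                self.entries.push((key.to_string(), value));
                self.epoch += 1;
            }
        }
    }

    /// Slow path: a full hash lookup that initializes the witness.
    fn witness(&self, key: &str) -> Option<Witness> {
        self.index.get(key).map(|&slot| Witness { slot, epoch: self.epoch })
    }

    /// Fast path: one integer comparison, then a direct index.
    /// `None` means the guard failed and the caller must bail out.
    fn get_fast(&self, w: &Witness) -> Option<f64> {
        if self.epoch != w.epoch {
            return None;
        }
        Some(self.entries[w.slot].1)
    }
}
```

In this sketch, updating `t.a` in place leaves the witness valid, while inserting a fresh key such as `t.b` invalidates every outstanding witness for that table. That is conservative (the slot for `a` has not actually moved), which is the trade-off of a single per-table counter.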
The following IR snippet shows a block that reads t.a after the witness has been initialized:

Performance Snapshot
| Benchmark | PUC Lua 5.1 | Lunacy (interpreter) | Lunacy (JIT) |
|---|---|---|---|
| nbody (10) | 1.00× | 2.75× slower | ≈ 1.0× (on par) |
| life (1000) | 1.00× | 2.62× slower | 2.03× faster |
The numbers tell a nuanced story. In pure‑interpreter mode Lunacy is slower, mainly because:
- Value representation – currently a Rust `enum`, doubling the size of each value compared to NaN‑boxing or NuN‑boxing.
- LBBV overhead – about 20 % of runtime is spent managing thunks, cloning contexts, and hashing.
- Hash‑witness lookup – each new key still triggers a full hashmap search to initialise the witness.
- Residual helper calls – some residual operations are emitted as calls to Rust helper functions, which incurs ABI shuffling.
Nevertheless, the JIT version consistently outperforms the interpreter, confirming that the specialization‑then‑compile pipeline works.
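The value-representation cost from the first bullet is easy to see in isolation. The sketch below is illustrative only: the `Value` enum is a guess at the general shape of a tagged representation (with a `usize` handle standing in for a heap pointer), and `Boxed` uses a made-up NaN-boxing encoding with a single reserved bit pattern for nil.

```rust
use std::mem::size_of;

/// Tagged-enum representation: a discriminant plus an 8-byte payload.
enum Value {
    Nil,
    Bool(bool),
    Number(f64),
    Table(usize), // stand-in for a pointer to a heap object
}

/// NaN-boxed representation: every value fits in one 64-bit word.
#[derive(Clone, Copy)]
struct Boxed(u64);

/// A quiet-NaN bit pattern reserved for nil (hypothetical encoding). A real
/// implementation would also canonicalize NaNs produced by arithmetic so
/// they can never collide with a tag.
const TAG_NIL: u64 = 0x7FF8_0000_0000_0001;

impl Boxed {
    fn number(n: f64) -> Self {
        Boxed(n.to_bits())
    }

    fn nil() -> Self {
        Boxed(TAG_NIL)
    }

    /// Returns the number, or `None` if the word holds the nil tag.
    fn as_number(self) -> Option<f64> {
        if self.0 == TAG_NIL {
            None
        } else {
            Some(f64::from_bits(self.0))
        }
    }
}

/// Sizes in bytes of the two representations: (enum, NaN-boxed).
fn sizes() -> (usize, usize) {
    (size_of::<Value>(), size_of::<Boxed>())
}
```

On typical 64-bit targets `sizes()` is expected to report 16 bytes for the enum against 8 for the boxed word, since the discriminant is padded out to the payload's alignment. Rust does not guarantee enum layout, so treat the first number as an observation, not a contract.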
What Remains to Be Done
- NuN‑boxing – a compact value representation would halve the memory traffic and enable register‑resident values.
- Entrypoint specialization – propagating type information across function boundaries would eliminate the repeated hash‑witness initialisation.
- Metatables – currently missing, they are essential for full Lua compatibility and for many real‑world programs.
- Register allocation – the JIT presently calls closures without allocating registers for temporaries. A lightweight register‑window scheme could keep hot values in registers across calls.
- Alias analysis – refining the epoch guard to avoid unnecessary checks when two tables cannot alias would reduce overhead in tight loops.
Reflections on Simplicity and Power
LBBV’s appeal lies in its low barrier to entry: a single PhD student (or a motivated engineer) can implement a working JIT in a few thousand lines of code. Lunacy demonstrates that this promise holds for a language as dynamic as Lua. By keeping the residual language tiny and reusing the same interpreter‑level coroutine logic for both interpretation and JIT, the implementation stays approachable while still delivering tangible speedups.
The project also highlights a broader lesson for language designers: speculation need not be complex. By observing concrete types at runtime, emitting guards, and falling back to thunks when necessary, a JIT can achieve many of the benefits of more sophisticated tracing or method‑based compilers without the associated engineering cost.
Lunacy is an open‑source project. The source code, benchmarks, and the LBBV research paper are available on GitHub.
