The Evolution of one-more-re-nightmare: From Academic Curiosity to Blazing-Fast Regex

Regular expression engines are foundational to modern text processing, yet many struggle with performance bottlenecks. Enter one-more-re-nightmare—a Common Lisp-based engine that achieves staggering speeds of up to 18 gigacharacters per second (GC/s). Its journey from a university student's automata study project to a SIMD-accelerated powerhouse reveals innovative approaches to regex compilation that defy conventional wisdom.

The Spark: Moving Beyond cl-ppcre's Limitations

The project began as a response to sluggish automata theory classes and inspiration from cl-ppcre, a Common Lisp regex library. While books like Let over Lambda touted cl-ppcre's speed—attributing it to Lisp's ability to compile regexes into native code—the reality was less glamorous:

"With CL-PPCRE, the technical reason for the performance boost is simple: Common Lisp is a more powerful language than C... But in Common Lisp, it is essentially no more difficult to... pass that lisp program to the optimising, native-code lisp compiler."

In truth, cl-ppcre used closure chains, not true compilation. This gap motivated one-more-re-nightmare's core mission: actually generate optimized native code from regex patterns, starting with derivatives—a concept borrowed from Gilbert Baumann. Derivatives simplify regex processing by defining the "remainder" of an expression after consuming a character. For instance, the derivative of ((ab)) with respect to (a) is (b(ab)), enabling efficient deterministic finite automaton (DFA) construction.

Old Compiler: Foundations in Derivatives and DFAs

The initial compiler (mid-2020) translated regexes to DFAs via derivatives, then emitted Lisp prog forms for state management. Performance was respectable—~550 megacharacters/second—but hampered by branch-heavy code and limited submatching. A hand-coded recognizer for ab|ac was only marginally faster, exposing scalability issues:

; Derivative function snippet from the old compiler
(defun derivative (regex char)
  (match regex
    ((or (empty) (fail)) fail)
    ((char c) (if (char= c char) empty fail))
    ...))

Key hurdles included ambiguous Kleene closure handling and the (\mathcal{O}(2^n)) DFA state explosion, restricting practical use.

New Compiler: Mealy Machines and Submatching Breakthroughs

A 2021 overhaul introduced Mealy machines, which output tag assignments during state transitions—crucial for submatching. Unlike DFAs, Mealy machines track registers (e.g., for capture groups), resolving ambiguity via register replication. For example, matching (\ll ab\gg) involves:

  • Assigning start/end positions to registers.
  • Handling derivatives like (\delta_a(a\ll A \leftarrow position\gg bc + ab\ll A \leftarrow position\gg d)) through replication.

This eliminated prior bugs and boosted speeds to 1.25 GC/s, partly by optimizing register allocation to minimize spills in Lisp compilers like SBCL.

SIMD and Scanning: The Speed Multiplier

Performance soared with SIMD-accelerated substring search. By identifying constant prefixes (e.g., treating ab|ac as a[bc]), the engine uses Boyer-Moore-Horspool with SIMD instructions. Character sets like [a-ce] are represented as unions of ranges (e.g., ([97, 100) \cup [101, 102))), optimized for SSE2/AVX2:

  • Signed comparison challenges for bytes are resolved via offset mapping ((x - 128)).
  • Benchmarks show 5.5 GC/s for ab|ac on Unicode and 18 GC/s on ASCII strings—outpacing scalar code by 4.6×.

Despite this, expressions like [0-9]+x[0-9]+ lagged, highlighting opportunities for Hyperscan-inspired "trigram" optimizations.

Implications and Future Frontiers

At just 1,848 lines of Lisp, one-more-re-nightmare leverages the language's compiler for native codegen, proving small codebases can achieve high performance. Limitations remain—no backreferences or lookahead—but its innovations challenge the notion that regex optimization is "solved":

  • Hybrid DFA constructions could reduce (\mathcal{O}(n^2)) backtracking.
  • Better constant-string detection (e.g., middle trigrams) might close performance gaps.

As the creator notes: "The game is still on; and let the best scanner win!" This engine exemplifies how revisiting foundational concepts with modern optimizations can yield breakthroughs, even in mature fields.

Source: applied-langua.ge