CherryScript Post Reopens the AST vs Bytecode Interpreter Debate

A Stack Overflow post about CherryScript turns a product-flavored language pitch into a useful reminder: interpreter architecture choices matter, but benchmarks matter more.

What happened

A Stack Overflow post introduced CherryScript, a proposed custom programming language aimed at high-volume, data-driven workflows. The author describes building the interpreter in Python 3 and weighing two familiar implementation strategies: walking an abstract syntax tree, or compiling the parsed program into bytecode for execution in a small virtual machine.

The post's answer argues for a streaming lexer, bytecode compilation, immutable pipeline state, and scoped symbol tables. In broad strokes, that advice tracks with how many production language runtimes and workflow engines are designed. AST walking is often the simplest interpreter architecture to build first, but it can become expensive when the same tree is traversed again and again inside loops or data pipelines. Bytecode lowers the program into a flatter instruction format, which can reduce dispatch overhead and make execution easier to optimize.

For readers building their own language, the useful reference points are Python’s ast module, Python’s dis module, and Bob Nystrom’s free book Crafting Interpreters. Those resources give much firmer footing than a single architectural sketch.
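
Both standard-library modules can be tried in a few lines. This sketch parses one assignment with ast, then compiles the same source and lists CPython's own bytecode with dis. Exact opcode names vary between Python versions (for example, arithmetic opcodes were merged into BINARY_OP in 3.11), so treat the printed listing as illustrative.

```python
import ast
import dis

source = "total = price * qty"

# ast.parse returns a Module node; its body holds one Assign statement.
tree = ast.parse(source)
assign = tree.body[0]
print(ast.dump(assign))

# compile() lowers the same source into a code object that CPython's VM executes.
code = compile(source, "<example>", "exec")
dis.dis(code)

# dis.get_instructions yields Instruction objects for programmatic inspection.
opnames = [ins.opname for ins in dis.get_instructions(code)]
print(opnames)
```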

Why developers care

Custom languages tend to trigger strong developer reactions because they sit at the intersection of engineering taste, performance claims, and maintenance cost. A language for data workflows can make sense when the domain has stable concepts, repeated patterns, and a user base that benefits from constrained syntax. SQL, jq, awk, Jsonnet, HCL, and workflow DSLs all exist because general-purpose languages are not always the clearest tool for the job.

The hard part is proving that a new language earns its keep. CherryScript's stated goals (readable pipeline logic, deterministic execution, and hardware or device integration) are ambitious. The design claims are plausible, but the post does not provide the evidence developers usually want: benchmark methodology, workload examples, parser details, bytecode format, failure modes, memory profiles, or a public implementation.

The AST versus bytecode question is a good example. An AST-walking interpreter is easy to inspect and debug. Each node maps closely to the source language, which helps when implementing error messages, tracing, and early language experiments. The downside is that execution repeatedly bounces through many Python objects and method calls. For a loop that transforms millions of records, that overhead can dominate the actual work.
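
A minimal tree-walking evaluator makes that cost concrete. The node classes and the tiny arithmetic grammar below are hypothetical, not CherryScript's; the point is that every record re-enters evaluate() once per node, paying a Python function call and isinstance checks each time.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical node types for a tiny expression language.
@dataclass(frozen=True)
class Num:
    value: float

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class BinOp:
    op: str
    left: "Node"
    right: "Node"

Node = Union[Num, Var, BinOp]

def evaluate(node: Node, env: dict) -> float:
    """Walk the tree recursively; each call re-dispatches on the node's type."""
    if isinstance(node, Num):
        return node.value
    if isinstance(node, Var):
        return env[node.name]
    if isinstance(node, BinOp):
        left = evaluate(node.left, env)
        right = evaluate(node.right, env)
        if node.op == "+":
            return left + right
        if node.op == "*":
            return left * right
        raise ValueError(f"unknown operator {node.op!r}")
    raise TypeError(f"unknown node {node!r}")

# price * qty + fee, re-walked for every record
expr = BinOp("+", BinOp("*", Var("price"), Var("qty")), Var("fee"))
records = [{"price": 2.0, "qty": 3, "fee": 1.0}, {"price": 5.0, "qty": 2, "fee": 0.5}]
totals = [evaluate(expr, r) for r in records]
print(totals)  # [7.0, 10.5]
```

The tree maps one-to-one onto the source, which is why this style is pleasant for error messages and early experiments.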

A bytecode VM changes the trade-off. The parser still produces structure, but a compiler step turns that structure into compact instructions such as LOAD_CONST, CALL_TRANSFORM, or JUMP_IF_FALSE. The VM then runs a tight instruction loop over a list or array. That can be faster, easier to cache, and simpler to analyze. It also introduces more machinery: instruction encoding, stack discipline, frame management, debugging metadata, and versioning concerns.
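
The same idea lowered to bytecode might look like the following sketch. The instruction set (LOAD_CONST, LOAD_VAR, ADD, MUL, GT, JUMP_IF_FALSE, JUMP, RETURN) and the nested-tuple AST are assumptions for illustration that loosely echo the opcode names above. The compiler runs once and back-patches jump targets; the VM is a single loop over a flat list with an operand stack.

```python
from typing import Any

# Hypothetical instruction set: each instruction is an (opcode, argument) tuple.
LOAD_CONST, LOAD_VAR, ADD, MUL, GT, JUMP_IF_FALSE, JUMP, RETURN = range(8)
BINARY = {"+": ADD, "*": MUL, ">": GT}

def compile_expr(node, code: list) -> None:
    """Lower a tuple-based AST into a flat instruction list (post-order)."""
    kind = node[0]
    if kind == "num":
        code.append((LOAD_CONST, node[1]))
    elif kind == "var":
        code.append((LOAD_VAR, node[1]))
    elif kind in BINARY:
        compile_expr(node[1], code)
        compile_expr(node[2], code)
        code.append((BINARY[kind], None))
    elif kind == "if":
        # ("if", cond, then, else): emit placeholder jumps, patch once offsets are known.
        compile_expr(node[1], code)
        jump_false = len(code)
        code.append((JUMP_IF_FALSE, None))
        compile_expr(node[2], code)
        jump_end = len(code)
        code.append((JUMP, None))
        code[jump_false] = (JUMP_IF_FALSE, len(code))  # else-branch starts here
        compile_expr(node[3], code)
        code[jump_end] = (JUMP, len(code))  # skip past the else-branch
    else:
        raise ValueError(f"unknown node {kind!r}")

def compile_program(node) -> list:
    code: list = []
    compile_expr(node, code)
    code.append((RETURN, None))
    return code

def run(code: list, env: dict) -> Any:
    """A tight dispatch loop over the flat instruction list."""
    stack: list = []
    pc = 0
    while True:
        op, arg = code[pc]
        pc += 1
        if op == LOAD_CONST:
            stack.append(arg)
        elif op == LOAD_VAR:
            stack.append(env[arg])
        elif op == ADD:
            right = stack.pop()
            stack.append(stack.pop() + right)
        elif op == MUL:
            right = stack.pop()
            stack.append(stack.pop() * right)
        elif op == GT:
            right = stack.pop()
            stack.append(stack.pop() > right)
        elif op == JUMP_IF_FALSE:
            if not stack.pop():
                pc = arg
        elif op == JUMP:
            pc = arg
        elif op == RETURN:
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode {op!r}")

# if qty > 10 then price * qty * 0.9 else price * qty
program = ("if", (">", ("var", "qty"), ("num", 10)),
           ("*", ("*", ("var", "price"), ("var", "qty")), ("num", 0.9)),
           ("*", ("var", "price"), ("var", "qty")))
code = compile_program(program)  # compiled once, reused for every record
results = [run(code, r) for r in ({"price": 2.0, "qty": 20}, {"price": 2.0, "qty": 5})]
print(results)
```

The compiled list is reused for every record, while a tree walker re-traverses the tree each time. A real VM would also carry debugging metadata such as source spans per instruction, which this sketch omits.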

For a Python-hosted interpreter, there is another wrinkle. Python itself is already interpreting bytecode. A custom VM written in Python can outperform a naive AST walker, especially if it reduces object churn, but it will still pay Python-level dispatch costs. That means the biggest wins may come from pushing hot operations into existing fast runtimes: NumPy, PyArrow, Polars, DuckDB, SQLite, Rust extensions, C extensions, or vectorized native libraries. A workflow language that compiles transformations into Apache Arrow, DuckDB, or Polars execution plans may outperform one that only optimizes its own Python loop.
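
One way to push the hot loop into a native engine is to translate pipeline steps into a single query. This sketch uses the standard library's sqlite3 module as a stand-in for DuckDB, Polars, or Arrow; the step format (filter, group_sum) and to_sql() are hypothetical. Column names are trusted in this sketch, while literal values travel as SQL parameters.

```python
import sqlite3

# Hypothetical, already-validated pipeline steps produced by a DSL's parser.
PIPELINE = [
    ("filter", "amount", ">", 100),
    ("group_sum", "region", "amount"),
]

def to_sql(table: str, steps) -> tuple[str, list]:
    """Translate the pipeline into one SQL query so SQLite's engine runs the loop."""
    where, params = [], []
    select, group_by = "*", ""
    for step in steps:
        if step[0] == "filter":
            _, column, op, value = step
            if op not in ("=", "<", ">", "<=", ">="):
                raise ValueError(f"unsupported operator {op!r}")
            where.append(f"{column} {op} ?")
            params.append(value)
        elif step[0] == "group_sum":
            _, key, column = step
            select = f"{key}, SUM({column})"
            group_by = f" GROUP BY {key} ORDER BY {key}"
        else:
            raise ValueError(f"unknown step {step[0]!r}")
    sql = f"SELECT {select} FROM {table}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql + group_by, params

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("east", 150.0), ("west", 80.0), ("east", 200.0), ("west", 120.0)])
sql, params = to_sql("orders", PIPELINE)
rows = conn.execute(sql, params).fetchall()
print(sql)
print(rows)  # [('east', 350.0), ('west', 120.0)]
```

The interpreter's job shrinks to validation and translation; per-row work happens inside the engine rather than in a Python dispatch loop.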

The streaming lexer claim is also interesting, but it needs careful framing. Lexing source code lazily can reduce memory usage for very large source files, but most workflow programs are tiny compared with the data they process. Streaming the data is usually more important than streaming the source text. Python generators, documented in the Python language reference, are a natural fit for lazy data pipelines, but generator overhead can also become visible in tight record-by-record processing. Batching often matters more than laziness alone.
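
Batching a lazy stream can be sketched with generators and itertools.islice. read_records(), batched(), and scale_batch() are hypothetical names. The source stays lazy, but each transform call handles a list of records, which amortizes per-call overhead. (Python 3.12 also ships itertools.batched, which yields tuples.)

```python
from itertools import islice
from typing import Iterable, Iterator

def read_records(n: int) -> Iterator[dict]:
    """Stand-in for a lazy data source (file, socket, queue); yields one record at a time."""
    for i in range(n):
        yield {"id": i, "amount": float(i)}

def batched(records: Iterable, size: int) -> Iterator[list]:
    """Group a lazy stream into lists of up to `size` items."""
    it = iter(records)
    while batch := list(islice(it, size)):
        yield batch

def scale_batch(batch: list, factor: float) -> list:
    # One Python-level call per batch instead of one per record; a native
    # library could replace this comprehension with a vectorized operation.
    return [{**r, "amount": r["amount"] * factor} for r in batch]

def pipeline(n: int, size: int = 1000) -> Iterator[dict]:
    for batch in batched(read_records(n), size):
        yield from scale_batch(batch, 2.0)

total = sum(r["amount"] for r in pipeline(10_000, size=256))
print(total)  # 99990000.0
```

Memory stays bounded by the batch size, while the batch size controls how often per-call overhead is paid.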

State management is where the post lands on firmer ground. Immutable intermediate values and scoped environments are common patterns because they make execution easier to reason about. In data pipelines, hidden mutation creates painful bugs: one transform changes a shared object, another transform reads it later, and the output depends on ordering that was never meant to matter. Isolating pipeline state also helps if work is parallelized or replayed.
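
Scoped symbol tables and immutable values both map onto standard-library tools. In this sketch, which is one possible design rather than CherryScript's, collections.ChainMap models nested scopes and a frozen dataclass models a record that transforms copy with dataclasses.replace() instead of mutating. Record and apply_tax() are hypothetical.

```python
from collections import ChainMap
from dataclasses import FrozenInstanceError, dataclass, replace

@dataclass(frozen=True)
class Record:
    """An immutable pipeline value; transforms return new instances."""
    id: int
    amount: float

# Scopes nest innermost-first: lookups fall through to outer maps,
# while writes always land in the innermost map (maps[0]).
global_scope = ChainMap({"tax_rate": 0.2, "currency": "EUR"})
pipeline_scope = global_scope.new_child({"region": "east"})
transform_scope = pipeline_scope.new_child({"tax_rate": 0.0})  # shadows the global rate

def apply_tax(rec: Record, scope: ChainMap) -> Record:
    # dataclasses.replace builds a new Record; the input is left untouched.
    return replace(rec, amount=rec.amount * (1 + scope["tax_rate"]))

original = Record(id=1, amount=100.0)
taxed = apply_tax(original, pipeline_scope)      # resolves the global 0.2
untaxed = apply_tax(original, transform_scope)   # resolves the shadowing 0.0

# A write in the inner scope does not leak outward.
transform_scope["currency"] = "USD"
print(transform_scope["currency"], pipeline_scope["currency"])  # USD EUR

# Attempting to mutate a frozen record raises an error.
try:
    original.amount = 0.0
except FrozenInstanceError:
    print("records are immutable")
```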

The practical version of that advice is not simply “make everything immutable.” Copies can be expensive. Many systems use immutable logical views over shared buffers, copy-on-write behavior, or append-only data structures. That is why projects such as Arrow focus on columnar memory formats that allow large data structures to be passed around without constantly copying bytes.
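
Python's buffer protocol offers a small-scale analogy for immutable views over shared buffers; it is not Arrow's API. Here an array.array stands in for one column of data. Slicing a read-only memoryview creates new views over the same memory without copying bytes, and writes through the view are rejected.

```python
import array

# A contiguous buffer of 8-byte floats, standing in for one column of data.
column = array.array("d", (float(i) for i in range(100_000)))

# A read-only view: consumers can read the buffer but not write through it.
view = memoryview(column).toreadonly()

# Slicing a memoryview yields another view over the same memory; nothing is copied.
window = view[10:20]
print(window.tolist())

# Proof of sharing: a change made by the owner is visible through the slice.
column[15] = -1.0
print(window[5])  # -1.0

# Writes through the read-only view raise TypeError.
try:
    window[0] = 0.0
except TypeError as exc:
    print("rejected:", exc)
```

Only the owner can change the data, so consumers get a logically immutable view at the cost of a few bytes of view metadata rather than a full copy.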

Community response

The likely developer reaction is mixed, and honestly pretty familiar to anyone who reads Hacker News or r/programming. The architecture sounds reasonable, but the presentation raises the usual alarms: a named company, a named language, large performance claims, and no public code or reproducible numbers.

Developers tend to be generous when someone shows a small interpreter and says, “I built this, here is what I learned.” They get more skeptical when a post frames a still-unproven implementation as a production-ready answer to broad workflow problems. That skepticism is healthy. Language design is full of ideas that are elegant in a prototype and painful once users need tooling, package management, diagnostics, editor support, deployment, and compatibility guarantees.

A constructive community response would probably ask for a minimal CherryScript example. What does a pipeline look like? How are types represented? Are transforms pure functions? Can pipelines be paused and resumed? What happens when a hardware signal fails halfway through execution? Does the VM support backpressure? Are errors reported against source spans? Can bytecode be inspected? Can users write extensions safely?

Benchmark questions would come next. A useful comparison would include at least three implementations: a plain Python generator pipeline, an AST-walking CherryScript interpreter, and the bytecode CherryScript VM. It should run across small, medium, and large datasets, with both CPU-heavy and I/O-heavy transforms. It should measure throughput, latency, peak memory, startup time, and error-reporting overhead. Without that, “bytecode is faster than AST walking” remains generally true but not specific enough to validate CherryScript.
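
A starting point for such a harness, using only time.perf_counter, statistics.median, and tracemalloc, might look like this. It is a sketch under assumptions: the transform, the tuple-tree walker, and the dataset sizes are hypothetical, and it covers only two of the three implementations (a bytecode VM would be a third function with the same signature). It measures throughput and peak traced Python memory, but not startup time, latency, or I/O-heavy transforms.

```python
import time
import tracemalloc
from statistics import median

def measure(fn, data, repeat: int = 3) -> dict:
    """Median wall time and throughput over several runs, plus peak traced memory."""
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        fn(data)
        timings.append(time.perf_counter() - start)
    # Separate run for memory: tracing slows execution, so it stays out of the timings.
    tracemalloc.start()
    fn(data)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    seconds = median(timings)
    return {
        "seconds": seconds,
        "records_per_s": len(data) / seconds if seconds else float("inf"),
        "peak_bytes": peak,
    }

def plain_pipeline(rows):
    """Baseline: a plain Python generator expression."""
    return sum(x * 2 + 1 for x in rows)

# Hypothetical interpreted version of the same transform, x * 2 + 1, as a tuple tree.
EXPR = ("+", ("*", ("var",), ("num", 2)), ("num", 1))

def walk(node, x):
    kind = node[0]
    if kind == "num":
        return node[1]
    if kind == "var":
        return x
    left, right = walk(node[1], x), walk(node[2], x)
    return left + right if kind == "+" else left * right

def ast_pipeline(rows):
    return sum(walk(EXPR, x) for x in rows)

for size in (1_000, 10_000, 50_000):  # small, medium, large; scale up for real runs
    data = list(range(size))
    # Check that both implementations agree before comparing their speed.
    assert plain_pipeline(data) == ast_pipeline(data)
    for name, fn in (("plain", plain_pipeline), ("ast_walk", ast_pipeline)):
        stats = measure(fn, data)
        print(f"{name:>8} n={size:>7,} {stats['seconds']:.4f}s "
              f"{stats['records_per_s']:,.0f} rec/s peak={stats['peak_bytes']:,} B")
```

Checking output equality first matters: a faster implementation that computes something different is not a benchmark result.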

The broader lesson is still valuable. If you are designing a domain-specific language, start with clarity first, then optimize the execution model around real workloads. An AST walker is a good first milestone because it proves the language semantics. Bytecode is a good next step when profiling shows repeated tree traversal is the bottleneck. A streaming data model is useful when the data is large or continuous. Immutable state helps keep pipelines understandable. None of these choices replaces measurement.

CherryScript may or may not become a serious tool. The post is more useful as a snapshot of a recurring developer conversation: when a workflow domain feels repetitive enough, someone will try to build a language for it. The successful ones usually win less by having a clever VM and more by offering clear semantics, great diagnostics, boring deployment, and performance claims that survive contact with real programs.

