Discover how to implement a functional JSON parser in the array language BQN using flat arrays and minimal branching. This deep dive reveals how tokenization, depth ordering, and value indexing transform raw strings into nested structures with surprising elegance, showcasing the unique joys of array-oriented parsing.
Parsing JSON typically conjures images of recursive descent or state machines, but in array languages like BQN, a radically different approach emerges—one leveraging flat arrays, depth scans, and strategic indexing. This article walks through building a JSON parser for a practical subset (integers, strings, objects, arrays) using BQN's array-oriented paradigm, where control flow yields to mathematical transformations.
Why Flat Arrays?
Traditional parsers rely heavily on branching and nested function calls. BQN flips this: operations apply uniformly across entire arrays. The result? Code that's dense, branchless, and often faster. We achieve this through:
- Tokenization: Converting input into a "blueprint" of tokens (
0for values, brackets as-is). - Depth Ordering: Using cumulative sums to track nesting depth via
+``opening - closing. - Value Indexing: Assigning positions to literals and sublists via permutation vectors like
⍋⍋.
Tokenization: From Chaos to Structure
Start by identifying numbers, strings, and structural tokens. For numbers:
Tokenise ← {
e ← »<`'\\'=𝕩 # Handle escapes
sb ← »⊸< s← ≠`(¬e)∧'"'=𝕩 # String starts
sr ← (1-˜(s>sb)×+`sb)⊔𝕩 # Extract strings
ex ← s∨ 𝕩∊@+9‿10‿13‿32 # Exclude whitespace/strings
nb ← »⊸< n←(¬ex)∧𝕩∊'0'+↕10 # Number starts
nr ← •ParseFloat¨(1-˜n×+`nb)⊔𝕩 # Parse numbers
ts ← sb∨(¬ex)∧nb∨𝕩∊"[]{},:" # All tokens
⟨ts/ '0'¨⌾(nb⊸/) 𝕩, nr, sr⟩ # Return tokens & values
}
Key insight: Group vectors (⊔) and under (⌾) elegantly isolate values while preserving structure. For strings, backslash escaping is resolved via a rolling mask (»<'\'=𝕩`).
Parsing: Depth Scans and Value Reconstruction
The parser navigates nested structures using depth permutations and value indices:
- Depth Ordering:
d ← ⍋+``(ts∊"{[")-(ts∊"]}")creates a breadth-first traversal order. - Sublist Identification: Nesting levels are tracked via
n ← (⍋d)⊏+``(d⊏ts)∊"[{". - Object Handling: Keys are separated from values using colon positions and object-depth masks.
- Value Assembly: Literals and sublists are stitched together bottom-up using fixed indices:
Parse ← {
# ... (tokenize, compute depth, nesting)
vs ← nums ∾ ⊑ks # Initial values (numbers + non-key strings)
vi ← ⍋⍋ (lit_mask) + values # Assign indices
vi ⊏↩ (↕≠vs)∾(≠vs)+ ≠⊸- c/»n # Adjust for nesting
# Build nested structures recursively
{vs ∾↩ <𝕨Sel𝕩⊏vs}¨○⌽ grouped_indices
¯1⊑vs # Final parsed structure
}
{{IMAGE:1}}
Visualizing depth ordering transforms ][1][ into a traversal path.
Why This Approach Shines
- No Explicit Branching: Logic encoded in array operations (e.g.,
+``for depth). - Data Parallelism: Operations apply holistically, leveraging SIMD-like optimizations.
- Debuggability: Flat arrays make intermediate states inspectable.
Tradeoffs and Joys
As the author notes:
"It’s hard to write, hard to read, hard to modify, but an absolute joy to debug."
While not suited for all parsing tasks, this method offers a refreshing perspective: complexity emerges from composition, not control flow. The complete parser fits in <30 lines—a testament to BQN’s expressive power.
Source: Writing a JSON Parser in BQN by Tony Zorman. {{IMAGE:2}}
Comments
Please log in or register to join the discussion