Building a JSON Parser in BQN: The Magic of Flat Array Techniques
Share this article
Parsing JSON typically conjures images of recursive descent or state machines, but in array languages like BQN, a radically different approach emerges—one leveraging flat arrays, depth scans, and strategic indexing. This article walks through building a JSON parser for a practical subset (integers, strings, objects, arrays) using BQN's array-oriented paradigm, where control flow yields to mathematical transformations.
Why Flat Arrays?
Traditional parsers rely heavily on branching and nested function calls. BQN flips this: operations apply uniformly across entire arrays. The result? Code that's dense, branchless, and often faster. We achieve this through:
1. Tokenization: Converting input into a "blueprint" of tokens (0 for values, brackets as-is).
2. Depth Ordering: Using cumulative sums to track nesting depth via +``opening - closing.
3. Value Indexing: Assigning positions to literals and sublists via permutation vectors like ⍋⍋.
Tokenization: From Chaos to Structure
Start by identifying numbers, strings, and structural tokens. For numbers:
Tokenise ← {
e ← »<`'\\'=𝕩 # Handle escapes
sb ← »⊸< s← ≠`(¬e)∧'"'=𝕩 # String starts
sr ← (1-˜(s>sb)×+`sb)⊔𝕩 # Extract strings
ex ← s∨ 𝕩∊@+9‿10‿13‿32 # Exclude whitespace/strings
nb ← »⊸< n←(¬ex)∧𝕩∊'0'+↕10 # Number starts
nr ← •ParseFloat¨(1-˜n×+`nb)⊔𝕩 # Parse numbers
ts ← sb∨(¬ex)∧nb∨𝕩∊"[]{},:" # All tokens
⟨ts/ '0'¨⌾(nb⊸/) 𝕩, nr, sr⟩ # Return tokens & values
}
Key insight: Group vectors (⊔) and under (⌾) elegantly isolate values while preserving structure. For strings, backslash escaping is resolved via a rolling mask (»<'\'=𝕩`).
Parsing: Depth Scans and Value Reconstruction
The parser navigates nested structures using depth permutations and value indices:
1. Depth Ordering: d ← ⍋+``(ts∊"{[")-(ts∊"]}") creates a breadth-first traversal order.
2. Sublist Identification: Nesting levels are tracked via n ← (⍋d)⊏+``(d⊏ts)∊"[{".
3. Object Handling: Keys are separated from values using colon positions and object-depth masks.
4. Value Assembly: Literals and sublists are stitched together bottom-up using fixed indices:
Parse ← {
# ... (tokenize, compute depth, nesting)
vs ← nums ∾ ⊑ks # Initial values (numbers + non-key strings)
vi ← ⍋⍋ (lit_mask) + values # Assign indices
vi ⊏↩ (↕≠vs)∾(≠vs)+ ≠⊸- c/»n # Adjust for nesting
# Build nested structures recursively
{vs ∾↩ <𝕨Sel𝕩⊏vs}¨○⌽ grouped_indices
¯1⊑vs # Final parsed structure
}
Visualizing depth ordering transforms ][1][ into a traversal path.
Why This Approach Shines
- No Explicit Branching: Logic encoded in array operations (e.g.,
+``for depth). - Data Parallelism: Operations apply holistically, leveraging SIMD-like optimizations.
- Debuggability: Flat arrays make intermediate states inspectable.
Tradeoffs and Joys
As the author notes:
"It’s hard to write, hard to read, hard to modify, but an absolute joy to debug."
While not suited for all parsing tasks, this method offers a refreshing perspective: complexity emerges from composition, not control flow. The complete parser fits in <30 lines—a testament to BQN’s expressive power.
Source: Writing a JSON Parser in BQN by Tony Zorman.