bijou64 – A Canonical, Faster Variable‑Length Integer Encoding
#Security

bijou64 – A Canonical, Faster Variable‑Length Integer Encoding

Tech Essays Reporter
6 min read

bijou64 is a new varint format created for the Subduction CRDT sync protocol. By guaranteeing a single encoding per integer, it eliminates the need for explicit canonical‑form checks and, surprisingly, decodes 2–10× faster than the widely used LEB128. The article explains the motivation, design, benchmarks, and when to consider adopting bijou64.

bijou64 – A Canonical, Faster Variable‑Length Integer Encoding

When working on a security‑sensitive protocol, it is a pleasant surprise to discover that a design decision intended to close a subtle bug also yields a performance win. That is exactly what happened with bijou64, a small variable‑length integer (varint) encoding invented for the Subduction CRDT sync protocol. The original goal was to make each integer representable in exactly one way, thereby preventing canonicalisation attacks on signed data. The side effect? A decoder that runs several times faster than the classic LEB128 varint.


The Problem with Conventional Varints

Binary protocols often need to compress integers that are typically small but occasionally large. Variable‑length encodings solve this by emitting only as many bytes as required. The most common format, LEB128, splits a number into 7‑bit groups and uses the high bit of each byte as a continuation flag. This works well for space efficiency, but it has a hidden flaw: canonicality is not enforced by the encoding itself.

bijou64 LEB128 layout – each byte carries a continuation bit.

Because the continuation bit can be set arbitrarily, the value zero can be encoded as 0x00, 0x80 0x00, 0x80 0x80 0x00, and so on. The same multiplicity applies to almost every other integer. Most decoders accept any of these representations, which is fine for ordinary data but disastrous for signed payloads where the exact byte sequence is part of a cryptographic signature. An attacker can present two different byte strings that decode to the same number, causing signature verification to succeed on one representation and fail on the other.

Canonicalisation attacks have been observed in ASN.1‑based systems (X.509, LDAP), PKCS#1 v1.5, JWTs, Bitcoin transactions, and many other places. The usual mitigation is to add a runtime check that rejects non‑canonical encodings. Unfortunately that check lives outside the core parsing logic, making it easy to forget, strip out, or optimise away.


bijou64’s Core Idea: Canonical by Construction

Rather than layering a validation step on top of an ambiguous format, bijou64 is built so that only a single byte sequence can represent any given integer. Two simple tricks achieve this:

  1. First‑byte double duty – The first byte serves both as a value and as a tag indicating how many additional bytes follow. Values 0–247 are encoded directly as a single byte. Values 248–255 act as tags that specify the length of the subsequent payload.
  2. Offsets – The payload bytes are not raw little‑endian chunks; they are offset by the tag. For example, the tag 0xF8 (248) is followed by a byte that is interpreted as value = 248 + payload. This guarantees that the single‑byte representation of a number cannot be reproduced with a longer form.

bijou64 Byte/tag structure – the first byte tells the decoder exactly how many bytes to read.

Because the decoder knows the total length after reading the first byte, allocation and bounds checking are O(1). In contrast, LEB128 must scan continuation bits until it finds a byte with the high bit cleared, an O(n) operation that hurts branch prediction.

Worked Example

To decode the integer 1738 using bijou64, the first byte is 0xFA (a tag indicating a three‑byte total). The next two bytes are 0x06 and 0xD2. The decoder adds the tag‑specific offset (0x1F8 for three‑byte values) to the big‑endian payload 0x06D2, yielding 1738.

bijou64 Worked example of decoding 1738 (tag + two data bytes).

The offset table follows a predictable pattern:

Total length (bytes) Offset
1 0x00
2 0xF8
3 0x01F8
4 0x0101F8
9 (max) 0x010101010101F8

The final tier (nine bytes) can theoretically represent values beyond 2⁶⁴, but the decoder clamps them to the u64 range, preserving the one‑to‑one mapping.


Benchmarks: Speed Surprises

The authors measured decode and encode performance on three CPUs: Apple M2 Pro (ARM), AMD Zen 5, and a Zen 3 reference. The test harness processed batches of 4096 integers across several value distributions (uniform, small‑value heavy, etc.).

Decoding

Platform bijou64 median (µs) LEB128 median (µs)
M2 Pro ~3 µs (≈0.75 ns/value) ~30 µs (≈7.3 ns/value)
Zen 5 similar advantage

For small numbers that fit in a single byte, bijou64 is about faster; for large numbers that require many continuation bytes in LEB128, the speedup climbs to 8–10×. The cumulative distribution functions (CDFs) show bijou64’s timings clustered tightly around the median, while LEB128’s curve spreads out, reflecting the variability introduced by continuation‑bit scanning.

Encoding

Encoding is also competitive. In the “small” distribution (values 248 – 65 535) LEB128 is roughly 1.24× faster, but for the majority of workloads bijou64 matches or exceeds LEB128 because the encoder can decide the length from the first byte without looping.

Size

In terms of wire size, bijou64 is within a few percent of LEB128 for realistic data sets. The two formats differ only at the tier boundaries; most numbers occupy the same number of bytes in both encodings.


Why the Speed Difference?

  1. Predictable branches – The decoder’s control flow is fixed after the first byte, allowing modern branch predictors to lock in. LEB128’s loop over continuation bits creates a pattern that mis‑predicts for large values.
  2. Big‑endian payload – The payload is a contiguous big‑endian integer, which CPUs can load and byte‑swap in a single instruction. LEB128 forces a mask‑and‑shift on each 7‑bit chunk.
  3. Constant‑time arithmetic – Offsets are pre‑computed constants; adding them is a single ALU operation, whereas LEB128 must assemble the value incrementally.

When Should You Adopt bijou64?

bijou64 is not a drop‑in replacement for LEB128 in every project. LEB128 has been battle‑tested for decades and enjoys native support in many language runtimes. bijou64, by contrast, is a new library (available on crates.io) with a modest amount of real‑world exposure.

Consider bijou64 if:

  • Your protocol signs the raw byte representation of integers and you cannot afford a separate canonicalisation check.
  • You need deterministic, single‑representation encodings for content‑addressed storage or deduplication.
  • Performance matters and you are willing to adopt a newer dependency that has been benchmarked on a limited set of CPUs.

If you are building a format that already relies on LEB128 and canonical checks are already in place, the migration cost may outweigh the speed gain. The authors stress that LEB128 is still perfectly suitable for most applications; bijou64 is an alternative that shines when canonicality and raw speed are both priorities.


Availability and Future Work

The reference implementation is written in Rust and released under a dual MIT/Apache‑2.0 license. The specification is CC BY‑SA 4.0 and includes a WebAssembly/JavaScript wrapper for cross‑platform use. The authors hint at future extensions such as bijou32 and bijou128, which would follow the same design principles for different integer widths.

Community feedback is welcomed: any discovered bugs, edge‑case workloads where bijou64 underperforms, or ports to other languages will help mature the format.


Closing Thoughts

bijou64 demonstrates that security‑driven constraints can lead to performance improvements, not just extra checks. By embedding canonicality directly into the encoding, the format eliminates an entire class of bugs while delivering a decoder that is both faster and more predictable. Whether it becomes a mainstream alternative to LEB128 will depend on adoption, testing, and the willingness of developers to embrace a newer, formally single‑representation varint.

Featured image Featured illustration for the bijou64 article.

Comments

Loading comments...