Article illustration 1

For years, cuckoo hashing remained an academic curiosity—admired for its theoretical elegance but dismissed in production systems for perceived inefficiencies. Industry-standard designs from Google (Swiss Tables) and Meta (F14) favored SIMD-accelerated quadratic probing, citing cuckoo's inferior memory performance. Yet new research reveals this dismissal may have been premature: with meticulous optimization, cuckoo hashing not only competes but often outperforms these giants, especially under high load.

The Cuckoo Advantage Reborn

Traditional quadratic probing searches sequentially expanding positions—first 16 consecutive keys via SIMD, then stepping by 1, 2, 3, etc.—until finding a match or empty slot. Cuckoo hashing takes a radical approach:

// Cuckoo probing sequence
probe_position1 = hash1(key) 
probe_position2 = hash2(key)
Check both positions via SIMD; done.

This fixed two-position search minimizes branch mispredictions and instruction counts. But how does it handle collisions? Through "kicking": if both positions are full during insertion, evict an existing key to one of its alternate positions, recursively relocating keys until space is found. Surprisingly, this chain reaction typically resolves in 0–1 moves.

Performance Breakthroughs

Benchmarks comparing optimized cuckoo against Swiss Tables reveal striking trends:
- In-cache workloads: Cuckoo wins by eliminating unpredictable branches. Branchless implementations using conditional moves (e.g., Rust's std::hint::select_unpredictable) cut 15–35% latency at 75–87.5% load factors.
- Out-of-cache workloads: Large buckets mitigate cuckoo's cache locality disadvantages. At high loads, its shorter probe lengths dominate—quadratic probing checks 5–10 positions, while cuckoo checks exactly two.
- Successful lookups: With Direct SIMD layout (keys/values packed in cache lines), cuckoo reduces probe lengths dramatically, winning across nearly all loads.

Layout Wars: Direct vs. Indirect SIMD

The choice of memory layout proves critical:
- Indirect SIMD (Swiss Tables-style): Separates 1-byte tags from payloads. Efficient for failed lookups but requires two cache misses (tags + payloads) per successful query.
- Direct SIMD: Packs keys/values contiguously. Enables single-cache-miss lookups but fits fewer entries per line. Here, cuckoo shines by minimizing probes.

"Cuckoo’s fixed probe length neutralizes Direct SIMD’s weakness: fewer entries per cache line. You get the one-fetch benefit without quadratic probing’s long searches," explains the study.

Practical Perks Beyond Speed

Cuckoo brings unexpected advantages:
1. Arbitrary table sizes: Avoids power-of-two constraints, saving up to 2× memory.
2. Tombstone-free deletions: Quadratic probing leaves "ghost" entries that slow future queries; cuckoo simply removes keys.
3. Wire-serialization safety: Bounded two-probe max prevents denial-of-service attacks from maliciously crafted tables—ideal for formats like Cap'n Proto.
4. Efficient resizing: Doubling table size becomes a branchless, cache-friendly operation: redistribute keys between buckets B and B+N using a single realloc.

Engineering the Edge

Key optimizations made cuckoo competitive:
- Unaligned probing: Letting SIMD searches start at arbitrary offsets (not bucket boundaries) reduced probe lengths by 15–40%.
- Fast hash functions: Deriving hash2 from hash1 via single-cycle operations (e.g., hash1 = hash0.rotate_left(32)).
- Breadth-first insertion: Contrary to textbooks, BFS outperformed "random walk" eviction by enabling parallel bucket fetches, cutting insertion latency by 30%.

The Verdict

While quadratic probing excels for low-load, cache-resident tables, cuckoo hashing dominates at realistic (50–87.5%) loads and out-of-cache scenarios. Its combination of bounded probes, memory flexibility, and serialization safety makes it compelling for databases, network services, and high-performance computing. As memory grows cheaper but latencies stagnate, minimizing probes—not just cache misses—becomes paramount. Cuckoo hashing, once a theoretical novelty, now demands a serious reevaluation by engineers.


Source: Cuckoo Hashing: Revival Through SIMD by Reiner Pope