WILD: The Database That Lives Entirely in Your CPU's L3 Cache
#Hardware

LavX Team
3 min read

WILD flips database architecture on its head by treating CPU cache as primary storage instead of RAM, achieving unprecedented 3-nanosecond read times. This experimental project demonstrates the extreme performance possible when optimizing for cache hierarchies, though it sacrifices durability and general-purpose utility. We examine the technical wizardry behind this cache-resident database and its implications for high-performance computing.

Remember when in-memory databases like Redis felt revolutionary? Meet WILD (Within-cache Incredibly Lightweight Database), which laughs at RAM-based solutions by storing everything in CPU L3 cache. This audacious experiment achieves 3-nanosecond read latencies – roughly the time light takes to travel one meter – by treating modern CPUs' multi-megabyte caches as primary storage rather than transient buffers.

Why CPU Cache as Storage?

Modern CPUs contain hierarchical memory structures where speed inversely scales with capacity:

| Memory Tier | Capacity Range | Access Latency |
| ----------- | -------------- | -------------- |
| L1 Cache    | 32–64 KB       | ~1 ns          |
| L2 Cache    | 256 KB–1 MB    | 3–10 ns        |
| L3 Cache    | 8–144 MB       | 15–50 ns       |
| RAM         | 8–128 GB       | 100–300 ns     |
| SSD/HDD     | 1 TB+          | µs–ms range    |

WILD exploits modern CPUs' sprawling L3 caches (up to 144MB in flagship processors) as primary storage. As creator Canoozie notes: "Your CPU's cache is basically a tiny, incredibly fast SSD that no one bothered to format." This approach enables:

  • Sub-microsecond operations: Database reads complete before kernel scheduler interrupts fire
  • Zero-copy data access: Eliminating serialization/deserialization overhead
  • NUMA-aware placement: Minimizing cross-socket memory latency penalties

Engineering for the Cache Hierarchy

WILD's architecture reflects deep understanding of CPU internals:

Cache-Line-Optimized Records
Each record fits precisely in a 64-byte cache line, aligned to prevent false sharing:

// Zig implementation of a cache-aligned record. Under extern (C)
// layout the u64 must come first: placing a u32 before it would
// insert 4 bytes of padding and grow the struct to 68 bytes.
const std = @import("std");

pub const CacheLineRecord = extern struct {
    key: u64,       // Hash key
    metadata: u32,  // Valid flag + length
    data: [52]u8,   // Payload
}; // Exactly 64 bytes

comptime {
    std.debug.assert(@sizeOf(CacheLineRecord) == 64);
}

NUMA and SMT Consciousness
WILD probes /sys/devices/system/cpu at startup to map cache domains and NUMA nodes. Data structures bind to specific cores, avoiding costly cross-NUMA access. It prioritizes physical cores over hyperthreads to prevent execution unit contention.

Flat Hash Table Design
A linear-probing hash table with power-of-two capacity enables bitmask indexing (3 cycles vs. 30+ for modulo). Wyhash and MurmurHash3 minimize collisions while maintaining cache locality.

The Brutal Tradeoffs

WILD's performance comes at significant cost:

  • Zero durability: Data evaporates on eviction or shutdown
  • No ACID/transactions: Crash consistency? Forget it
  • Capacity ceiling: Limited by L3 cache size (e.g., ~1.5M 64B records in 96MB)
  • x86_64/Linux only: No ARM or Windows support

Why This Matters Beyond the Novelty

While impractical for most workloads, WILD demonstrates techniques applicable to serious systems:

  1. Database internals: Buffer pools and write-ahead logs benefit from cache-aware structures
  2. Real-time systems: HFT and industrial control could adopt these patterns selectively
  3. Query optimization: Engineered for L3-resident intermediate results

Benchmarks on a Ryzen 7 7800X3D (96MB L3) show 615.5 million operations/second – orders of magnitude beyond traditional databases. As one commenter observed: "This feels like seeing a Formula 1 car: impractical for groceries, but revolutionary engineering that trickles down to everyday vehicles."

The real value lies in WILD's stark reminder: we leave enormous performance untapped by treating CPU caches as abstracted implementation details. While you shouldn't deploy this to production, its lessons could reshape how we design systems pushing latency boundaries.
