WILD flips database architecture on its head by treating CPU cache as primary storage instead of RAM, achieving unprecedented 3-nanosecond read times. This experimental project demonstrates the extreme performance possible when optimizing for cache hierarchies, though it sacrifices durability and general-purpose utility. We examine the technical wizardry behind this cache-resident database and its implications for high-performance computing.

Remember when in-memory databases like Redis felt revolutionary? Meet WILD (Within-cache Incredibly Lightweight Database), which laughs at RAM-based solutions by storing everything in CPU L3 cache. This audacious experiment achieves 3-nanosecond read latencies – roughly the time light takes to travel from your screen to your eye – by treating modern CPUs' multi-megabyte caches as primary storage rather than transient buffers.
## Why CPU Cache as Storage?
Modern CPUs contain hierarchical memory structures where speed inversely scales with capacity:
| Memory Tier | Capacity Range | Access Latency |
|---|---|---|
| L1 Cache | 32-64KB | ~1ns |
| L2 Cache | 256KB-1MB | 3-10ns |
| L3 Cache | 8-144MB | 15-50ns |
| RAM | 8-128GB | 100-300ns |
| SSD/HDD | 1TB+ | µs-ms range |
WILD exploits modern CPUs' sprawling L3 caches (up to 144MB in flagship processors) as primary storage. As creator Canoozie notes: "Your CPU's cache is basically a tiny, incredibly fast SSD that no one bothered to format." This approach enables:
- Sub-microsecond operations: Database reads complete before kernel scheduler interrupts fire (see the timing sketch after this list)
- Zero-copy data access: Eliminating serialization/deserialization overhead
- NUMA-aware placement: Minimizing cross-socket memory latency penalties
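Numbers like these invite skepticism, and rightly so: a single ~3ns read sits far below any clock's resolution. A rough way to sanity-check such claims is to amortize one timer read over a large batch of lookups. Here's a minimal sketch in Zig, where `db` and its `get` method are hypothetical stand-ins for WILD's API:

```zig
const std = @import("std");

// Amortized latency: one timer read per batch rather than per op.
// `db.get` is a hypothetical accessor returning an optional value.
pub fn avgReadNanos(db: anytype, keys: []const u64) !u64 {
    var timer = try std.time.Timer.start();
    var sink: u64 = 0;
    for (keys) |k| {
        if (db.get(k)) |v| sink +%= v; // accumulate so reads aren't elided
    }
    const elapsed = timer.read();
    std.mem.doNotOptimizeAway(sink); // keep `sink` (and the loop) live
    return elapsed / @as(u64, @intCast(keys.len));
}
```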
## Engineering for the Cache Hierarchy
WILD's architecture reflects deep understanding of CPU internals:
### Cache-Line-Optimized Records
Each record fits precisely in a 64-byte cache line, aligned to prevent false sharing:
```zig
// Cache-aligned record: the u64 key comes first so C layout adds no
// padding, keeping the struct at exactly 64 bytes. The backing buffer
// is allocated with 64-byte alignment to prevent false sharing.
pub const CacheLineRecord = extern struct {
    key: u64, // Hash key
    metadata: u32, // Valid flag + length
    data: [52]u8, // Payload
};

comptime {
    if (@sizeOf(CacheLineRecord) != 64) @compileError("must be one cache line");
}
```
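This layout is what makes the zero-copy access mentioned earlier possible: a read can return a slice that aliases the record's bytes in place, with no deserialization step. A minimal sketch, assuming (purely for illustration) that the low byte of `metadata` stores the payload length:

```zig
// Zero-copy read: the returned slice aliases the record's in-cache
// bytes; nothing is copied or decoded. The metadata encoding below is
// an assumption for illustration, not WILD's documented format.
pub fn payload(rec: *const CacheLineRecord) []const u8 {
    const len = @min(@as(usize, rec.metadata & 0xFF), rec.data.len);
    return rec.data[0..len];
}
```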
### NUMA and SMT Consciousness
WILD probes /sys/devices/system/cpu at startup to map cache domains and NUMA nodes. Data structures bind to specific cores, avoiding costly cross-NUMA access. It prioritizes physical cores over hyperthreads to prevent execution unit contention.
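The article doesn't reproduce WILD's probing code, but the mechanism is plain Linux sysfs. A minimal sketch that reads which logical CPUs share cpu0's last-level cache (assuming `index3` is the L3 entry, which holds on typical x86_64 parts):

```zig
const std = @import("std");

// Returns a range string like "0-15": the logical CPUs sharing cpu0's
// L3. The `index3` = L3 mapping is an assumption about the topology.
pub fn sharedL3CpuList(buf: []u8) ![]const u8 {
    const path = "/sys/devices/system/cpu/cpu0/cache/index3/shared_cpu_list";
    const file = try std.fs.openFileAbsolute(path, .{});
    defer file.close();
    const n = try file.readAll(buf);
    return std.mem.trimRight(u8, buf[0..n], "\n");
}
```

Parsing that range string is then what drives core pinning and per-domain data placement.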
### Flat Hash Table Design
A linear-probing hash table with power-of-two capacity enables bitmask indexing (3 cycles vs. 30+ for a modulo's integer divide). Wyhash and MurmurHash3 keep collisions low, while linear probing keeps probe sequences cache-local.
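A minimal sketch of that lookup path (not WILD's actual code): with a power-of-two table, `hash & (len - 1)` replaces the divide a `%` would compile to, and linear probing scans forward through adjacent slots, which land on the same or neighboring cache lines:

```zig
const std = @import("std");

const Slot = struct { key: u64, occupied: bool };

// Linear-probing lookup over a power-of-two table. Assumes the table
// is never completely full, so the scan always reaches an empty slot.
pub fn find(table: []const Slot, key: u64) ?usize {
    std.debug.assert(std.math.isPowerOfTwo(table.len));
    const mask = table.len - 1;
    const h = std.hash.Wyhash.hash(0, std.mem.asBytes(&key));
    var i = @as(usize, @truncate(h)) & mask;
    while (table[i].occupied) : (i = (i + 1) & mask) {
        if (table[i].key == key) return i; // hit: bitmask kept us in range
    }
    return null; // reached an empty slot: key is absent
}
```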
## The Brutal Tradeoffs
WILD's performance comes at significant cost:
- Zero durability: Data evaporates on eviction or shutdown
- No ACID/transactions: Crash consistency? Forget it
- Capacity ceiling: Limited by L3 cache size (e.g., ~1.5M 64B records in 96MB)
- x86_64/Linux only: No ARM or Windows support
## Why This Matters Beyond the Novelty
While impractical for most workloads, WILD demonstrates techniques applicable to serious systems:
- Database internals: Buffer pools and write-ahead logs benefit from cache-aware structures
- Real-time systems: HFT and industrial control could adopt these patterns selectively
- Query optimization: operators can be sized so intermediate results stay L3-resident
Benchmarks on a Ryzen 7 7800X3D (96MB L3) show 615.5 million operations/second – orders of magnitude beyond traditional databases. As one commenter observed: "This feels like seeing a Formula 1 car: impractical for groceries, but revolutionary engineering that trickles down to everyday vehicles."
The real value lies in WILD's stark reminder: we leave enormous performance untapped by treating CPU caches as abstracted implementation details. While you shouldn't deploy this to production, its lessons could reshape how we design systems pushing latency boundaries.
