Valkey maintainer Madelyn Olson details how the Redis fork achieved a 40% memory reduction in some workloads, without sacrificing performance, through a radical redesign of its hash table.

When Redis moved away from its BSD license to restrictive source-available licensing in 2024, a coalition of engineers from Amazon, Alibaba, Ericsson, Tencent, Huawei, and Google launched the Valkey fork in just eight days. Eighteen months later, Valkey maintainer Madelyn Olson explains how the team redesigned Valkey's core hash table structure - achieving significant memory savings while maintaining backward compatibility and avoiding performance regressions.
The Memory Efficiency Challenge
Valkey's original architecture dated back to 2009, optimized for simplicity rather than modern hardware capabilities. "We were doing lots of independent memory allocations," Olson explains. "When storing an object, we built container objects using linked lists to handle hash collisions, with relatively high load factors."
The team identified three key inefficiencies:
- Separate allocations for keys and RedisObjects
- Pointer-heavy linked list collision resolution
- Suboptimal cache utilization
In production environments like Amazon ElastiCache, analysis showed median key-value pairs around 100 bytes - meaning pointer overhead could consume nearly 25% of memory. For users storing billions of objects, this translated to significant infrastructure costs.
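To make that overhead concrete, here is a minimal sketch of the classic chained layout Olson describes (field names are illustrative, not Valkey's exact definitions):

```c
/* Classic chained layout: every stored key costs three separate heap
 * allocations (entry, key, value object) plus three 8-byte pointers. */
typedef struct dictEntry {
    void *key;              /* separately allocated key string */
    void *val;              /* separately allocated RedisObject */
    struct dictEntry *next; /* collision chain pointer */
} dictEntry;
```

On a 64-bit system, the three pointers alone account for 24 bytes per entry before allocator padding - which is how a 100-byte key-value pair loses roughly a quarter of its footprint to bookkeeping.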
Radical Restructuring
The Valkey team approached the overhaul in phases:
Phase 1: Slot-Centric Dictionaries (Valkey 8.0)
- Replaced global linked list with per-slot dictionaries
- Implemented binary index trees for cluster-wide sampling (sketched after this list)
- Enabled efficient slot migration during horizontal scaling
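The binary index tree (also known as a Fenwick tree) is what keeps operations like random-key sampling uniform once the keyspace is split across 16,384 per-slot dictionaries: it maintains running key counts per slot and maps a random index to its slot in logarithmic time. Here is a minimal sketch of the data structure - an illustration of the technique, not Valkey's actual code:

```c
#include <stdint.h>

#define NUM_SLOTS 16384 /* cluster hash slots */

/* Fenwick (binary index) tree over per-slot key counts, 1-indexed. */
static int64_t tree[NUM_SLOTS + 1];

/* Adjust a slot's key count: +1 on insert, -1 on delete. O(log n). */
void slot_count_update(int slot, int64_t delta) {
    for (int i = slot + 1; i <= NUM_SLOTS; i += i & (-i))
        tree[i] += delta;
}

/* Find the slot holding the k-th key (0-based, k < total key count)
 * by descending the implicit tree in O(log n); this is what keeps
 * random-key sampling uniform across the whole keyspace. */
int slot_for_index(int64_t k) {
    int pos = 0;
    for (int bit = NUM_SLOTS; bit > 0; bit >>= 1) { /* power of two */
        if (pos + bit <= NUM_SLOTS && tree[pos + bit] <= k) {
            pos += bit;
            k -= tree[pos];
        }
    }
    return pos; /* 0-based slot containing the k-th key */
}
```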
Phase 2: Memory Consolidation (Valkey 8.1)
- Embedded keys directly into entry structures (see the sketch below)
- Collocated RedisObject metadata with entries
- Reduced allocation count by 60%
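Embedding means a key and its metadata travel in a single allocation. A rough sketch of such a layout - illustrative field names and sizes, not Valkey's actual struct:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* One allocation holds the entry header, object metadata, and the key
 * bytes inline, replacing three allocations and two pointer hops. */
typedef struct entry {
    uint32_t type : 4;     /* object type (string, hash, ...) */
    uint32_t encoding : 4; /* internal value encoding */
    uint32_t keylen : 24;  /* length of the embedded key */
    void *value;           /* value payload */
    char key[];            /* key bytes stored inline */
} entry;

entry *entry_create(const char *key, uint32_t keylen, void *value) {
    entry *e = malloc(sizeof(*e) + keylen + 1); /* single allocation */
    if (!e) return NULL;
    e->type = 0;
    e->encoding = 0;
    e->keylen = keylen;
    e->value = value;
    memcpy(e->key, key, keylen);
    e->key[keylen] = '\0';
    return e;
}
```

Collapsing separate key and metadata allocations into one is what drives the reduced allocation count.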
Phase 3: Cache-Optimized Probing (Valkey 9.0)
- Replaced linked lists with SwissTable-inspired buckets (sketched below)
- Packed 7 entry pointers into 64-byte cache lines
- Used SIMD instructions for parallel comparison checks
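The arithmetic behind the bucket layout: one metadata byte per slot (holding a hash fragment) plus seven 8-byte entry pointers fills a 64-byte cache line exactly, and a single SIMD comparison screens all seven fragments at once. A simplified sketch using SSE2 intrinsics - the layout and names are illustrative, not Valkey's exact code:

```c
#include <emmintrin.h> /* SSE2 */
#include <stdint.h>

/* SwissTable-style bucket sized to one 64-byte cache line:
 * 8 metadata bytes + 7 entry pointers = 64 bytes. */
typedef struct bucket {
    uint8_t meta[8];   /* meta[0..6]: hash fragments; meta[7]: flags */
    void *entries[7];  /* pointers to the stored entries */
} bucket;

/* Compare all 7 hash fragments against the probe's fragment in one
 * instruction; returns a bitmask of candidate slots to verify with
 * a full key comparison. */
static inline int bucket_match(const bucket *b, uint8_t fragment) {
    __m128i meta = _mm_loadl_epi64((const __m128i *)b->meta);
    __m128i probe = _mm_set1_epi8((char)fragment);
    int mask = _mm_movemask_epi8(_mm_cmpeq_epi8(meta, probe));
    return mask & 0x7F; /* keep only the 7 entry slots */
}
```

A set bit in the returned mask marks a candidate slot; only those candidates need a full key comparison, so most probes resolve within a single cache line.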
"We saved approximately 23 bytes per entry through these changes," says Olson. "For a customer with 8-byte keys and values, that translated to nearly 40% memory reduction."
Performance Validation
Maintaining Valkey's legendary throughput (250K requests/sec/core) was non-negotiable. The team employed multi-layered benchmarking:
- Microbenchmarks: Isolated hash table operations
- Throughput tests: valkey-benchmark at scale
- CPU profiling: Perf counters for cache misses
- Real-world sampling: Flame graphs for execution hotspots
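As an illustration of the first layer, an isolated-operation harness might look like the following - the lookup function is a hypothetical stand-in, not Valkey's API, and the real suite is far more elaborate:

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical stand-in for the hash table lookup under test;
 * volatile sink prevents the compiler from eliding the loop. */
static volatile long sink;
static void *hashtable_find(const char *key) { sink += key[0]; return 0; }

int main(void) {
    enum { N = 10 * 1000 * 1000 };
    char key[32];
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < N; i++) {
        /* Cycle through a bounded keyspace to exercise cache behavior. */
        snprintf(key, sizeof(key), "key:%d", i % 100000);
        hashtable_find(key);
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    double secs = (end.tv_sec - start.tv_sec) +
                  (end.tv_nsec - start.tv_nsec) / 1e9;
    printf("%.2f million lookups/sec\n", N / secs / 1e6);
    return 0;
}
```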
"Surprisingly, our key-value workload showed no regression," Olson notes. "The aggressive prefetching we'd already implemented kept everything in L1/L2 cache. Some secondary workloads like set operations saw 20-30% improvements."
Migration Simplicity
Despite the under-the-hood changes, Valkey maintains drop-in compatibility with Redis 7.2. Managed services like Amazon ElastiCache, Google Memorystore, and Aiven offer one-click migrations. "Users report migrating with zero code changes," Olson remarks. "We're victims of our own compatibility success."
The Rust Question
When asked about rewriting Valkey in Rust, Olson offers a nuanced perspective:
"While I advocate writing new infrastructure in Rust, porting Valkey's optimized C code would be risky. We'd lose our dependency-free stance (current build: 10MB) and potentially regress on performance. Our module system already uses Rust for extensions like LDAP auth."
Future Directions
The Valkey Technical Steering Committee (representing the six founding companies) governs the project, with plans to expand membership. Ongoing work focuses on:
- Vertical scaling (1.4M requests/sec)
- Enhanced observability
- Plugin ecosystem growth
For developers exploring Valkey, Olson recommends:
- Valkey Blog for technical deep dives
- Slack community for real-time discussion
- Cloud provider documentation for migration paths
"The hash table overhaul proves we can evolve core infrastructure without compromising performance," Olson concludes. "When you're processing millions of requests per second, every byte and cache miss matters."
