Valkey's migration from a textbook pointer-chasing hash map to cache-aware "Swedish tables" demonstrates how modern hardware considerations can drive significant memory efficiency gains without sacrificing performance.
Madelyn Olson, a maintainer of the Valkey project and Principal Software Development Engineer at AWS, delivered a compelling presentation at QCon San Francisco on how Valkey rebuilt its core hashtable data structure to maximize memory density and performance for modern hardware. The talk, titled "When Every Bit Counts: How Valkey Rebuilt Its Hashtable for Modern Hardware," provides valuable insights into systems intuition, memory prefetching, and the rigorous testing needed for mission-critical caches.
The Context: Redis to Valkey
Olson began by sharing her journey from being a Redis maintainer to helping create Valkey after Redis changed its license in 2024. This transition provided a unique opportunity to modernize the technology while maintaining strict backward compatibility. As Olson explained, "We didn't want to break any users. We still have a very strong commitment to be backwards compatible with Redis."
The fundamental challenge was clear: Redis (and by extension Valkey) is essentially "a map attached to a TCP server with a custom wire protocol." While this simplicity enables flexibility, the core data structure had remained largely unchanged since the early 2000s, relying on a textbook pointer-chasing hash map that wasn't optimized for modern hardware characteristics.
Understanding the Problem Space
Before diving into solutions, Olson established the landscape. Valkey is primarily used as a cache, where time-to-live (TTL) is crucial, and keys and values are typically small. According to ElastiCache data, the average key size across all clusters is about 16 bytes, with the median cluster's P50 being only 80 bytes. This small data size makes memory efficiency paramount.
The original implementation used a traditional approach: an array of buckets with linked lists for collision resolution. Each dictionary entry contained three pointers (24 bytes on 64-bit systems): one for the key, one for the value, and one for the next entry in the collision chain. Additionally, Valkey maintained a separate hash table just for TTL information, adding another 32 bytes of overhead per entry.
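The chained layout Olson described can be sketched in a few lines of C. This is a minimal illustration of the textbook design, not Valkey's actual source; the `dictEntry` name follows Redis convention, but the fields shown are simplified:

```c
#include <string.h>

/* Textbook chained-hashtable entry: three pointers per entry,
 * with collisions resolved by a linked list. */
typedef struct dictEntry {
    void *key;              /* 8 bytes: pointer to the key object   */
    void *val;              /* 8 bytes: pointer to the value object */
    struct dictEntry *next; /* 8 bytes: collision-chain link        */
} dictEntry;

/* Every step of a lookup chases a fresh pointer -- bucket to entry,
 * entry to key, entry to next -- and each hop is a potential cache miss. */
static dictEntry *chain_find(dictEntry *head, const char *key) {
    for (dictEntry *e = head; e != NULL; e = e->next)
        if (strcmp((const char *)e->key, key) == 0)
            return e;
    return NULL;
}
```

On a 64-bit system `sizeof(dictEntry)` is 24 bytes before counting the separately allocated key and value objects, which is exactly the overhead the redesign targeted.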
The Evolution: From Textbook to Modern
Olson outlined the project's goals clearly: minimize memory lookups, maintain or improve performance, support all Valkey's esoteric commands and data types, enable incremental rehashing, and be aware of memory and CPU constraints.
Idea 1: Dynamically Sized Structs
The first major change was moving away from fixed-size structs. The team embedded the key directly into the dictionary entry, and then embedded the entire metadata object (the Valkey server object) into the same entry. This eliminated two of the three main pointers, reducing overhead significantly.
This approach also solved the TTL problem elegantly. Since the TTL was embedded in the same object as the key and value, there was no need for a separate hash table lookup to check expiration. The memory layout became more complex but far more efficient:
- Removed key pointer (8 bytes)
- Removed value pointer (8 bytes)
- Reduced next pointer usage through open addressing
- Embedded TTL directly in the object
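A dynamically sized entry along these lines can be sketched with a C flexible array member. The layout and names here are illustrative, not Valkey's actual struct, and the sketch assumes keys shorter than 256 bytes:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Entry that embeds its key and TTL inline: one allocation sized
 * to the key, and no separate TTL hashtable lookup. */
typedef struct entry {
    void    *val;        /* value pointer                                 */
    int64_t  expire_at;  /* TTL embedded next to the key                  */
    uint8_t  keylen;     /* sketch assumes keys < 256 bytes               */
    char     key[];      /* flexible array member: key bytes follow inline */
} entry;

/* Allocate an entry sized exactly for its key, so the hot fields and a
 * short key typically share one cache line. */
static entry *entry_new(const char *key, void *val, int64_t expire_at) {
    size_t len = strlen(key);
    entry *e = malloc(sizeof(entry) + len + 1);
    if (e == NULL) return NULL;
    e->val = val;
    e->expire_at = expire_at;
    e->keylen = (uint8_t)len;
    memcpy(e->key, key, len + 1);
    return e;
}
```

Because the expiration time lives in the same allocation as the key, checking TTL on access costs no extra pointer dereference.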
Idea 2: Open Addressing with Swedish Tables
The second major innovation was implementing open addressing instead of linked lists for collision resolution. The team drew inspiration from Google's Swiss Table, which uses cache-line-aware design with 64-byte buckets containing seven entries each.
However, they encountered a critical issue: open addressing can create long probe sequences, especially with frequent deletions. To address this, they developed a hybrid approach they affectionately called "Swedish tables" - similar to Swiss tables but with chaining between buckets when they become full.
This hybrid design maintained the cache efficiency benefits while avoiding the pathological cases of pure open addressing.
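One way to picture such a bucket: seven slots plus metadata packed into exactly one 64-byte cache line, with the last slot repurposed as a chain link when the bucket overflows. This is a sketch of the idea, not Valkey's exact layout:

```c
#include <stdint.h>

/* A cache-line-sized "Swedish table" bucket. Bits 0-6 of `meta` mark
 * occupied slots; bit 7 means the bucket has overflowed and slots[6]
 * holds a pointer to a chained bucket instead of an entry. */
typedef struct bucket {
    uint8_t meta;      /* occupancy bits + chained flag                */
    uint8_t hashes[7]; /* one-byte hash fragments, checked before the  */
                       /* full key compare, like Swiss-table control   */
                       /* bytes                                        */
    void   *slots[7];  /* entry pointers; slots[6] doubles as the      */
                       /* chain link when bit 7 of meta is set         */
} bucket;

/* Probe one bucket: compare cheap one-byte hash fragments first; the
 * full key comparison that would follow a match is omitted here. */
static int bucket_probe(const bucket *b, uint8_t h) {
    int last = (b->meta & 0x80) ? 6 : 7; /* chained? last slot is a link */
    for (int i = 0; i < last; i++)
        if ((b->meta & (1u << i)) && b->hashes[i] == h)
            return i;
    return -1;
}
```

The whole probe touches a single cache line, and a full bucket degrades into one extra bucket hop rather than an unbounded probe sequence.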
Testing and Validation
Olson emphasized the importance of comprehensive testing for such low-level changes. The team employed multiple validation strategies:
Correctness Testing:
- Extensive code reviews
- Unit tests and integration tests (defense in depth)
- Address sanitizer runs
- Built-in assertions in both test and production environments
Performance Testing:
- Microbenchmarks for hit and miss scenarios
- Macrobenchmarks for throughput testing
- Flame graph analysis (though limited by -O3 compiler optimization)
- Memory efficiency verification with static asserts
Production Challenges and Lessons
Despite thorough testing, the team encountered real-world issues that highlighted the complexity of cache systems:
The SPOP Bug: The random selection algorithm in sets created "giant holes" in the hash table when repeatedly popping items. The implementation would search forward through empty buckets, making the first item after each hole more likely to be selected.
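The bias is easy to see in a toy model of the selection loop. This is a simplified illustration, not Valkey's code:

```c
#include <stddef.h>

/* Toy model of the biased pick: choose a random start index, then scan
 * forward (wrapping) to the first occupied slot. An item sitting right
 * after a run of empty slots is reached from every index in that run,
 * so it gets selected disproportionately often. */
static int pick_forward(const int *occupied, size_t n, size_t start) {
    for (size_t i = 0; i < n; i++) {
        size_t idx = (start + i) % n;
        if (occupied[idx])
            return (int)idx;
    }
    return -1; /* table empty */
}
```

With occupancy `{1, 0, 0, 0, 1}`, start indices 1 through 4 all land on slot 4, so that item is picked four times as often as the one in slot 0 — exactly the "giant hole" effect Olson described.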
Prefetching Bug: A subtle issue in the scanning implementation caused prefetching to target the wrong bucket, though it didn't cause functional failures.
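For context on what the prefetching is meant to achieve, here is a hedged sketch of the pattern using the GCC/Clang `__builtin_prefetch` builtin (not Valkey's scan code): request the next element's cache line while processing the current one.

```c
#include <stddef.h>

/* Illustrative scan with software prefetch: the next element's cache
 * line is requested before the current one is processed, so memory
 * latency overlaps compute. Prefetching the wrong address, as in the
 * bug Olson described, still produces the right answer -- it just
 * quietly forfeits the latency win. */
static long scan_with_prefetch(const long *table, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + 1 < n)
            __builtin_prefetch(&table[i + 1], 0, 1); /* read, low locality */
        sum += table[i];
    }
    return sum;
}
```

This is why the bug caused no functional failure: a prefetch is only a hint to the cache, never a correctness requirement.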
These issues underscore Olson's key message: "deeply understand the access patterns of your application when you're doing modernization."
Memory Efficiency Results
The theoretical memory savings were calculated at approximately 40%, primarily from removing 23 bytes of overhead per entry. In practice, across different key and value sizes, the team observed 20-25% reduction in memory usage:
- Small values (16 bytes): ~25% reduction
- Large values (512 bytes): ~20% reduction
- Typical workloads: 20-25% reduction
Key Takeaways
Olson's presentation offered several valuable lessons for systems software development:
- Understand access patterns: Modern hardware characteristics like cache behavior should drive design decisions
- Be curious and evaluate thoroughly: The team studied multiple approaches including Segcache, Dragonfly, and KeyDB
- Test comprehensively: Use multiple testing strategies including unit tests, integration tests, fuzzing, and both micro and macro benchmarks
- Benchmark early and often: Don't just test performance, but also memory efficiency
- Maintain compatibility: Even with major internal changes, backward compatibility is crucial for cache systems
The Rust Question
Addressing a common question, Olson explained why Valkey remains in C despite Rust's safety benefits. "Rust doesn't solve either of the two problems that I just talked about." She advocates for Rust in new projects but cautions against rewriting existing C/C++ codebases unless they're fundamentally broken.
Conclusion
The Valkey hashtable rebuild represents a successful modernization of core infrastructure while maintaining the stability and compatibility that makes it valuable as a cache. By combining insights from modern hardware design with careful testing and validation, the team achieved significant memory efficiency gains without sacrificing performance.
As Olson noted, the key is building "systems intuition" - understanding not just how data structures work in theory, but how they behave in practice on real hardware with real workloads. This approach of thoughtful modernization, rigorous testing, and continuous validation provides a model for updating critical infrastructure systems.
The presentation slides and additional resources are available through the QCon San Francisco conference materials, including a deep dive into Swiss Tables and Viktor's blog post explaining the thought process behind the Swedish table implementation.

Featured image: Madelyn Olson presenting at QCon San Francisco
