Keybench: A Scriptable Benchmarking Tool for Key-Value Stores

Keybench provides a framework for consistently benchmarking different key-value storage engines using customizable Lua workloads, enabling fair performance comparisons across various implementations.

Keybench, a new open-source project from developer Alex Gaetano Padula, addresses a critical need in the database and storage communities: the ability to fairly compare different key-value storage engines using consistent workloads. The tool offers a scriptable, extensible framework for benchmarking sorted key-value stores with detailed performance metrics.

What Keybench Offers

At its core, Keybench lets developers write workloads in Lua and run them against multiple storage engines, measuring throughput and latency for each. The benchmark harness is kept separate from the engines under test, so comparisons reflect the engines' actual performance rather than implementation details of the testing framework.

The project currently supports three storage engines:

An in-memory skiplist (reference implementation)
RocksDB (an LSM-tree-based key-value store originally developed at Facebook)
TidesDB (a newer LSM-tree with transaction support)

Technical Architecture

Keybench's architecture consists of five replaceable components:

Engine: Controls concurrency and manages worker threads
Workload: Lua scripts defining the operations to be benchmarked
Store: A consistent interface (kv table) that works across all engines
Backend: Self-registering plugins for each storage engine
Reporter: Multiple output formats including console, TSV, and timeline

The implementation cleanly separates concerns. Each worker thread runs the Lua script in its own isolated state with its own random seed, so measurements reflect true parallelism without artificial serialization. The benchmark harness never holds locks around engine calls: an engine that serializes internally shows up as serialized in the results, while an engine that handles concurrent calls gets to demonstrate that benefit.
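The per-worker isolation described above can be sketched in Python. This is only an analogy: Keybench itself uses C threads and per-thread Lua states, and the function name, the `base_seed + index` seeding scheme, and the key range below are assumptions made for illustration.

```python
import random
import threading


def run_workers(num_threads, ops_per_thread, base_seed=1234):
    """Run a toy workload on several threads, each with a private RNG."""
    keys_drawn = [None] * num_threads  # one slot per worker: no shared writes

    def worker(index):
        # Private generator per worker. Seeding with base_seed + index is an
        # assumption for this sketch, not Keybench's documented scheme.
        rng = random.Random(base_seed + index)
        drawn = []
        for _ in range(ops_per_thread):
            # An engine call (put/get/...) would go here; the harness itself
            # takes no lock around it.
            drawn.append(rng.randrange(1_000_000))
        keys_drawn[index] = drawn

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return keys_drawn
```

Because no worker shares a generator with another, the key sequence each one draws is reproducible from its seed regardless of how the threads are scheduled.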

Performance Metrics

Keybench reports two primary metrics:

Workload units per second (wu/s): The rate of complete operations as defined by the script, such as a "view cart" or "checkout" transaction
Primitive operations per second (ops/s): The rate of raw key touch operations (individual put, get, del, range, or scan calls)

When a workload unit involves multiple primitive operations, the two metrics diverge and both are reported. For batched operations, ops/s is exactly B times wu/s, where B is the batch size, which shows how fixed per-call costs are amortized over more keys.
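The relationship between the two metrics is simple arithmetic. The sketch below uses a hypothetical helper and made-up counts (50,000 batched calls of 16 keys each over 2 seconds) purely to show the calculation.

```python
def throughput(workload_units, primitive_ops, elapsed_seconds):
    """Return (wu/s, ops/s) for one run."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return workload_units / elapsed_seconds, primitive_ops / elapsed_seconds


# Illustrative numbers only: each workload unit is one batched call
# that touches batch_size keys.
batch_size = 16
units = 50_000
wu_s, ops_s = throughput(units, units * batch_size, elapsed_seconds=2.0)
print(f"{wu_s:.0f} wu/s, {ops_s:.0f} ops/s")
```

With these inputs the run completes 25,000 workload units per second and touches 400,000 keys per second, a ratio equal to the batch size.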

Latency is recorded as a distribution rather than a simple average: percentiles (p50, p99, p99.9) and the maximum are reported for each operation type (put, get, del, range, mget, mput, mdel). Tail percentiles expose stalls that an average would hide.
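As a rough illustration of percentile reporting, the sketch below computes nearest-rank percentiles from a list of latency samples. How Keybench actually stores and summarizes its samples is not described here, so the nearest-rank method, the function names, and the per-mille encoding (used to avoid floating-point rounding) are all assumptions.

```python
def percentile(ordered, per_mille):
    """Nearest-rank percentile of an ascending, non-empty list.

    per_mille is the percentile in tenths of a percent (p99.9 -> 999),
    which keeps the rank computation in exact integer arithmetic.
    """
    if not ordered:
        raise ValueError("no samples")
    rank = -(-per_mille * len(ordered) // 1000)  # ceiling division
    return ordered[max(rank, 1) - 1]


def summarize(latencies_us):
    """Summarize latency samples (microseconds) for one operation type."""
    ordered = sorted(latencies_us)
    return {
        "p50": percentile(ordered, 500),
        "p99": percentile(ordered, 990),
        "p99.9": percentile(ordered, 999),
        "max": ordered[-1],
    }
```

Calling `summarize` once per operation type would yield one such row for put, one for get, and so on.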

Practical Applications

Keybench's design makes it particularly valuable for several use cases:

Engine Selection: Organizations evaluating different key-value stores can run identical workloads to compare real-world performance
Performance Tuning: The tool's ability to sweep across thread counts and batch sizes helps identify optimal configurations
Regression Testing: Ensures performance doesn't degrade as storage engines are updated
Research: Provides a consistent framework for academic studies on key-value store performance

The tool's configuration system allows for complex benchmark scenarios. Users can define grids of tests combining different engines, thread counts, and batch sizes, with each test repeated multiple times to establish median performance values and reduce noise.
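The grid-and-repeat idea can be sketched as follows. This is not Keybench's configuration format; `sweep` and the `run_once` callable (which stands in for a single benchmark run returning a throughput figure) are hypothetical names.

```python
import itertools
import statistics


def sweep(engines, thread_counts, batch_sizes, repeats, run_once):
    """Run every engine/threads/batch combination `repeats` times.

    Returns a dict mapping (engine, threads, batch) to the median of the
    values returned by run_once(engine, threads, batch).
    """
    results = {}
    grid = itertools.product(engines, thread_counts, batch_sizes)
    for engine, threads, batch in grid:
        samples = [run_once(engine, threads, batch) for _ in range(repeats)]
        # The median is less sensitive to a single noisy run than the mean.
        results[(engine, threads, batch)] = statistics.median(samples)
    return results
```

Two engines, two thread counts, and two batch sizes with three repeats would produce eight cells from 24 individual runs.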

Implementation Highlights

Keybench is written in C with Lua scripting support vendored into the project. The build system is straightforward, requiring only a C compiler and pthreads for the default configuration. Additional engines can be compiled by specifying appropriate make variables.

One notable design decision is the consistent kv interface that abstracts the underlying storage engine. Workloads interact with a global kv table, calling methods like put, get, del, range, scan, mget, mput, and mdel. The same interface works across all supported engines, ensuring that workload performance isn't affected by differences in API design.
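To illustrate why a uniform interface matters, here is a Python stand-in: a toy sorted store exposing a few of the operation names above, and a workload function that only talks to that interface. Real Keybench workloads are Lua scripts calling the global kv table; the class, the half-open range semantics, the key layout, and the `view_cart` function are assumptions made for this sketch.

```python
import bisect


class MemoryKV:
    """Toy sorted in-memory store with put/get/delete/range."""

    def __init__(self):
        self._keys = []  # keys kept in sorted order
        self._data = {}  # key -> value

    def put(self, key, value):
        if key not in self._data:
            bisect.insort(self._keys, key)
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def delete(self, key):  # named delete because "del" is reserved in Python
        if key in self._data:
            del self._data[key]
            self._keys.pop(bisect.bisect_left(self._keys, key))

    def range(self, start, end):
        """Return (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, self._data[k]) for k in self._keys[lo:hi]]


def view_cart(kv, user_id):
    """One workload unit: a range read over a user's cart keys."""
    prefix = f"cart:{user_id}:"
    return kv.range(prefix, prefix + "\xff")
```

Since `view_cart` depends only on the method names, any other object offering the same methods could be passed in without changing the workload.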

The tool also includes system probes that collect hardware and OS information during benchmarks, providing context for interpreting performance results. These probes report CPU usage, memory consumption, disk I/O, and other system metrics that can significantly impact storage performance.

Extensibility

Keybench is designed to be easily extensible. New storage engines can be added by:

Creating a new directory under backends/
Implementing the kv_backend vtable from src/bench.h
Writing an open function and registering it with KV_REGISTER_BACKEND
Adding a backend.mk file that specifies the build requirements

This plugin architecture allows the tool to support a wide variety of storage engines without modifying the core benchmarking logic.
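The self-registration pattern can be illustrated in Python with a decorator-based registry. This is an analogy only: Keybench's backends are C code registered through the KV_REGISTER_BACKEND macro, whose mechanics are not covered here, and every name in the sketch is hypothetical.

```python
BACKENDS = {}  # backend name -> open function


def register_backend(name):
    """Decorator that records an open function under a backend name."""
    def decorator(open_fn):
        if name in BACKENDS:
            raise ValueError(f"backend already registered: {name}")
        BACKENDS[name] = open_fn
        return open_fn
    return decorator


@register_backend("dictstore")
def open_dictstore(path):
    """Toy backend: ignores the path and returns an empty dict."""
    return {}


def open_backend(name, path):
    """Look up a backend by name and open it; the core never names one."""
    try:
        open_fn = BACKENDS[name]
    except KeyError:
        raise ValueError(f"unknown backend: {name}") from None
    return open_fn(path)
```

The core lookup code never mentions a specific engine, so adding a backend means adding a new registered function rather than editing `open_backend`.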

Reporting and Visualization

Keybench offers multiple output formats:

Console: Human-readable reports with performance summaries
TSV: Machine-readable output suitable for spreadsheet analysis
Timeline: Detailed sampling of performance metrics over time

The project includes a Python script (scripts/plot.py) that can render TSV output into various visualizations including throughput bars, scalability curves, batch amortization plots, latency percentiles, and live timeline graphs.
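For readers who want to post-process results themselves, tab-separated output is easy to load with Python's standard library. The column names (`engine`, `threads`, `ops_per_sec`) and the sample rows below are invented for illustration; the actual TSV schema is defined by the repository and may differ.

```python
import csv
import io
from collections import defaultdict

# Made-up placeholder rows, not real benchmark results.
SAMPLE_TSV = (
    "engine\tthreads\tops_per_sec\n"
    "engine_a\t4\t300.0\n"
    "engine_a\t1\t100.0\n"
    "engine_b\t1\t80.0\n"
)


def scalability_series(tsv_text):
    """Group ops/s by engine, with points sorted by thread count."""
    series = defaultdict(list)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        point = (int(row["threads"]), float(row["ops_per_sec"]))
        series[row["engine"]].append(point)
    return {engine: sorted(points) for engine, points in series.items()}


series = scalability_series(SAMPLE_TSV)
```

Each engine's sorted list of (threads, ops/s) points is the shape a scalability curve would be drawn from.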

Limitations

Despite its strengths, Keybench has some limitations:

Current Engine Support: While the plugin architecture is flexible, the project currently only includes three engines (skiplist, RocksDB, TidesDB). Adding others requires implementation effort.
Lua Workload Language: While Lua offers good performance and simplicity, some teams might prefer other scripting languages for their workloads.
Complex Configuration: The tool's powerful configuration system can be complex to set up for comprehensive benchmark suites.
Limited Built-in Workloads: The project includes several example workloads (mixed, cart, scan, batch), but real-world scenarios often require more specialized patterns.

Conclusion

Keybench takes a well-structured approach to key-value store benchmarking. Its strengths are a consistent methodology, detailed metrics, and an extensible architecture. By separating the benchmark harness from the storage engines and providing a uniform interface for workloads, it enables fair comparisons that reflect actual engine performance rather than implementation artifacts.

For developers and organizations evaluating or optimizing key-value storage systems, Keybench provides a valuable tool that goes beyond simple throughput measurements to offer nuanced insights into performance characteristics under various configurations.

The project is available on GitHub under the GPL-2.0 license, with documentation and example configurations included in the repository. Given its clean architecture and extensible design, it could become a standard part of the key-value store benchmarking toolkit.
