tsink: A High-Performance Embedded Time-Series Database for Rust

tsink v0.8.0 is a ground-up rewrite of the Rust time-series database. It pairs an LSM-tree storage engine with Gorilla compression and a full PromQL engine, and its reported benchmarks (about 6.4 million points per second on batch inserts) make it a compelling option for embedded time-series workloads.

tsink is a time-series engine built for Rust applications. It embeds directly in the host application and requires no external dependencies. Version 0.8.0 is a significant architectural overhaul that combines an LSM-tree storage engine with adaptive compression and a complete query layer.

Architectural Foundation

At its core, tsink uses a Log-Structured Merge-Tree (LSM-tree), a natural fit for the write-heavy workloads common in time-series scenarios. Tiered compaction across the L0, L1, and L2 levels runs transparently in the background and balances write throughput against read performance by keeping read amplification low. Writes stay fast through a dual-lane encoding system that processes numeric and blob data on separate paths.
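The merge at the heart of compaction can be sketched in a few lines. This is a minimal illustration, not tsink's implementation: the function name `merge_runs`, the `(timestamp, value)` tuple layout, and the rule that the newer run wins on duplicate timestamps are all assumptions made for the example.

```rust
use std::cmp::Ordering;

/// Merge two runs that are each sorted by timestamp into one sorted run.
/// Assumption for this sketch: when both runs hold the same timestamp,
/// the point from the newer run replaces the older one.
fn merge_runs(older: &[(i64, f64)], newer: &[(i64, f64)]) -> Vec<(i64, f64)> {
    let mut out = Vec::with_capacity(older.len() + newer.len());
    let (mut i, mut j) = (0, 0);
    while i < older.len() && j < newer.len() {
        match older[i].0.cmp(&newer[j].0) {
            Ordering::Less => {
                out.push(older[i]);
                i += 1;
            }
            Ordering::Greater => {
                out.push(newer[j]);
                j += 1;
            }
            Ordering::Equal => {
                // Same timestamp in both runs: keep the newer point only.
                out.push(newer[j]);
                i += 1;
                j += 1;
            }
        }
    }
    // At most one of these tails is non-empty.
    out.extend_from_slice(&older[i..]);
    out.extend_from_slice(&newer[j..]);
    out
}
```

Merging an L0 run into an L1 run this way produces a single sorted run, so a later read touches one segment instead of two. That is the read-amplification benefit compaction is meant to deliver.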

The database is partitioned into 64 internal shards, which spreads writers across independent partitions and removes the single point of write contention. Reads are lock-free and proceed concurrently with writes, and configurable worker pools adapt to the runtime environment.
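One common way to route a series to a shard is to hash its identity and take the result modulo the shard count. The sketch below assumes that approach. The article only states that there are 64 shards, so the helper name `shard_for`, the use of the standard library's `DefaultHasher`, and the choice to hash the metric name plus sorted labels are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Number of internal shards, as stated in the article.
const SHARD_COUNT: usize = 64;

/// Hypothetical helper: map a series identity to a shard index.
/// Labels are sorted first so that label order does not change the shard.
fn shard_for(metric: &str, labels: &[(&str, &str)]) -> usize {
    let mut sorted: Vec<(&str, &str)> = labels.to_vec();
    sorted.sort();

    let mut hasher = DefaultHasher::new();
    metric.hash(&mut hasher);
    sorted.hash(&mut hasher);
    (hasher.finish() % SHARD_COUNT as u64) as usize
}
```

Because every point of a given series lands in the same shard, two writers only contend when their series happen to hash to the same partition.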

Compression Innovation

One of tsink's most compelling aspects is its sophisticated compression system, which achieves remarkable space efficiency. The adaptive codec selection mechanism evaluates multiple encoding strategies and selects the most compact representation for each dataset. For timestamp encoding, the system employs Fixed-step Run-Length Encoding (RLE) for regular intervals, delta-of-delta bitpacking as the primary strategy, and varint-encoded deltas for irregular patterns.
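The timestamp strategies are easier to see with a small example. The sketch below shows two of the building blocks under stated assumptions: detecting a fixed step (the case where run-length encoding applies) and computing delta-of-delta values (which are mostly zero for near-regular series and therefore pack into very few bits). The function names are illustrative and the bit-level packing itself is omitted.

```rust
/// Returns Some(step) when every consecutive pair of timestamps is
/// separated by the same interval, which is the fixed-step case.
fn fixed_step(ts: &[i64]) -> Option<i64> {
    if ts.len() < 2 {
        return None;
    }
    let step = ts[1] - ts[0];
    ts.windows(2).all(|w| w[1] - w[0] == step).then_some(step)
}

/// Computes the second-order differences of a timestamp sequence.
/// For n timestamps this yields n - 2 values (none for n < 3).
fn delta_of_delta(ts: &[i64]) -> Vec<i64> {
    let mut out = Vec::with_capacity(ts.len().saturating_sub(2));
    let mut prev_delta: Option<i64> = None;
    for w in ts.windows(2) {
        let delta = w[1] - w[0];
        if let Some(prev) = prev_delta {
            out.push(delta - prev);
        }
        prev_delta = Some(delta);
    }
    out
}
```

For timestamps 100, 110, 120, 131 the deltas are 10, 10, 11 and the delta-of-delta values are 0 and 1. A perfectly regular series produces only zeros, which is why a scrape interval with little jitter compresses so well.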

Value compression leverages specialized algorithms tailored to different data types:

IEEE 754 floats benefit from Gorilla XOR compression
Integers utilize zigzag delta bitpacking (i64) or simple delta encoding (u64)
Constant values apply Run-Length Encoding
Boolean values compress to a single bit each
Binary and string data employ delta block compression
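Two of the value codecs listed above rest on simple bit tricks, sketched here. Zigzag encoding maps signed deltas to small unsigned numbers, and Gorilla-style compression XORs each float with its predecessor so that similar values leave only a few meaningful bits. The function names are illustrative, and the final bit-packing stage of each codec is left out.

```rust
/// Zigzag encoding: interleaves negative and positive values so that
/// small magnitudes of either sign become small unsigned integers.
fn zigzag_encode(v: i64) -> u64 {
    ((v << 1) ^ (v >> 63)) as u64
}

/// Inverse of `zigzag_encode`.
fn zigzag_decode(u: u64) -> i64 {
    ((u >> 1) as i64) ^ -((u & 1) as i64)
}

/// XOR each float's bit pattern with the previous one (the first value
/// is XORed with zero, so it is emitted unchanged).
fn xor_with_previous(values: &[f64]) -> Vec<u64> {
    let mut prev = 0u64;
    values
        .iter()
        .map(|v| {
            let bits = v.to_bits();
            let x = bits ^ prev;
            prev = bits;
            x
        })
        .collect()
}

/// Number of bits between the leading and trailing zero runs of an XOR
/// result; this is what a Gorilla-style encoder actually has to store.
fn meaningful_bits(x: u64) -> u32 {
    if x == 0 {
        0
    } else {
        64 - x.leading_zeros() - x.trailing_zeros()
    }
}
```

A repeated float XORs to zero and can be stored as a single control bit, which is where most of the savings on slowly changing gauges come from.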

This combination of codecs yields a compression ratio of roughly 23x (16 / 0.68 ≈ 23.5), reducing storage from approximately 16 bytes per data point to about 0.68 bytes. That is a critical advantage for applications handling massive volumes of time-series data.

Query Capabilities

tsink distinguishes itself with a complete PromQL query engine, built from scratch without external dependencies. This implementation includes a full lexer, parser, and evaluator that supports 23 built-in functions ranging from rate and irate calculations to time-based aggregations like avg_over_time and sum_over_time. The query engine handles 15 binary operators and 7 aggregation operators, providing comprehensive data manipulation capabilities.
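To illustrate what the time-based aggregations compute, here is a minimal sketch of `sum_over_time` and `avg_over_time` over a lookback window. It is not tsink's evaluator. The `Sample` type, the millisecond timestamps, and the choice of a left-open window (samples with `eval - range < ts <= eval`) are assumptions for the example, and samples are assumed to be sorted by timestamp.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct Sample {
    ts_ms: i64,
    value: f64,
}

/// Select the samples inside the lookback window ending at `eval_ms`.
/// Assumes `samples` is sorted by timestamp; uses binary search.
fn window(samples: &[Sample], eval_ms: i64, range_ms: i64) -> &[Sample] {
    let start = samples.partition_point(|s| s.ts_ms <= eval_ms - range_ms);
    let end = samples.partition_point(|s| s.ts_ms <= eval_ms);
    // `min` keeps the slice valid even if a negative range is passed.
    &samples[start.min(end)..end]
}

/// Sum of the sample values in the window, or None if it is empty.
fn sum_over_time(samples: &[Sample], eval_ms: i64, range_ms: i64) -> Option<f64> {
    let w = window(samples, eval_ms, range_ms);
    if w.is_empty() {
        None
    } else {
        Some(w.iter().map(|s| s.value).sum())
    }
}

/// Arithmetic mean of the sample values in the window.
fn avg_over_time(samples: &[Sample], eval_ms: i64, range_ms: i64) -> Option<f64> {
    let w = window(samples, eval_ms, range_ms);
    if w.is_empty() {
        None
    } else {
        Some(w.iter().map(|s| s.value).sum::<f64>() / w.len() as f64)
    }
}
```

An instant query calls such a function once at a single evaluation time. A range query repeats the same evaluation at each step across the requested interval.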

The system supports both instant queries (evaluating at a single timestamp) and range queries (evaluating across a time window), with sophisticated vector matching capabilities. Series discovery employs matcher-based filtering with support for equality, inequality, and regex matching operators (=, !=, =~, !~), enabling dynamic exploration of metric series based on label combinations.
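Matcher-based filtering can be sketched as a predicate over a series' label set. The example below covers only the `=` and `!=` operators, because the regex operators would need a regex engine that the Rust standard library does not provide. The `Matcher` type and helper names are hypothetical. The sketch follows the Prometheus convention that a missing label behaves like an empty string.

```rust
/// Hypothetical matcher type covering the two non-regex operators.
enum Matcher<'m> {
    Equal { name: &'m str, value: &'m str },
    NotEqual { name: &'m str, value: &'m str },
}

/// Look up a label value; a label that is absent reads as "".
fn label_value<'a>(labels: &[(&'a str, &'a str)], name: &str) -> &'a str {
    labels
        .iter()
        .find(|(k, _)| *k == name)
        .map(|(_, v)| *v)
        .unwrap_or("")
}

impl Matcher<'_> {
    /// True when this matcher accepts the given label set.
    fn matches(&self, labels: &[(&str, &str)]) -> bool {
        match *self {
            Matcher::Equal { name, value } => label_value(labels, name) == value,
            Matcher::NotEqual { name, value } => label_value(labels, name) != value,
        }
    }
}

/// A series is selected only if every matcher accepts it.
fn matches_all(labels: &[(&str, &str)], matchers: &[Matcher<'_>]) -> bool {
    matchers.iter().all(|m| m.matches(labels))
}
```

With this shape, a selector such as `{job="api", env!="dev"}` becomes a two-element matcher list, and series discovery is a filter over the known label sets.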

Performance Benchmarks

The performance metrics demonstrate tsink's capabilities as a high-performance time-series database:

Single insert latency averages approximately 1.7 microseconds
Batch insert throughput reaches 6.4 million points per second
Point queries complete in about 114 nanoseconds, enabling 8.8 million queries per second
Range queries maintain high throughput even with larger datasets (64M points/sec for 1K points, 48M points/sec for 1M points)

These figures make tsink a strong candidate for workloads that require both high ingestion rates and low-latency querying.

Resource Management and Durability

tsink implements robust resource management mechanisms to prevent unbounded growth and ensure system stability. Memory budget enforcement with admission backpressure prevents out-of-memory conditions by rejecting writes when approaching memory limits. The series cardinality cap controls the number of unique metric and label combinations, preventing uncontrolled growth in metadata storage.
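Admission backpressure of this kind is often built on an atomic counter that writers must reserve against before buffering data. The following is a minimal sketch under that assumption. The `MemoryBudget` type and its methods are hypothetical names, not tsink's API.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical memory budget shared by all writers.
struct MemoryBudget {
    limit: usize,
    used: AtomicUsize,
}

impl MemoryBudget {
    fn new(limit: usize) -> Self {
        Self { limit, used: AtomicUsize::new(0) }
    }

    /// Try to reserve `bytes`. Returns false (the write is rejected)
    /// when the reservation would push usage past the limit.
    fn try_reserve(&self, bytes: usize) -> bool {
        let mut current = self.used.load(Ordering::Relaxed);
        loop {
            let next = match current.checked_add(bytes) {
                Some(n) if n <= self.limit => n,
                _ => return false,
            };
            match self.used.compare_exchange_weak(
                current,
                next,
                Ordering::AcqRel,
                Ordering::Relaxed,
            ) {
                Ok(_) => return true,
                // Another thread changed the counter; retry with its value.
                Err(actual) => current = actual,
            }
        }
    }

    /// Return previously reserved bytes, for example after a flush.
    fn release(&self, bytes: usize) {
        self.used.fetch_sub(bytes, Ordering::AcqRel);
    }
}
```

The compare-and-swap loop means a rejected write never changes the counter, so the budget cannot be overshot even when many writers race.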

Durability is addressed through a segmented Write-Ahead Logging (WAL) system with CRC32 checksums and configurable sync strategies. The database supports two WAL modes: Periodic sync (defaulting to 1-second intervals) for optimal throughput, and Per-append for maximum durability. This flexibility allows applications to balance performance requirements with data safety guarantees.
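A checksummed WAL record can be sketched as a length, a CRC32, and a payload. The record layout below (`[len: u32 LE][crc32: u32 LE][payload]`) and the function names are assumptions for illustration, as is the choice of the common IEEE CRC-32 polynomial, since the article says only "CRC32". The checksum is computed bit by bit to keep the example free of external crates.

```rust
/// CRC-32 (IEEE, reflected polynomial 0xEDB88320), bitwise variant.
fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // All ones when the low bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Frame a payload as a hypothetical WAL record.
fn encode_record(payload: &[u8]) -> Vec<u8> {
    let mut buf = Vec::with_capacity(8 + payload.len());
    buf.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    buf.extend_from_slice(&crc32(payload).to_le_bytes());
    buf.extend_from_slice(payload);
    buf
}

/// Return the payload if the record is complete and its checksum matches.
fn decode_record(buf: &[u8]) -> Option<&[u8]> {
    if buf.len() < 8 {
        return None;
    }
    let len = u32::from_le_bytes([buf[0], buf[1], buf[2], buf[3]]) as usize;
    let stored = u32::from_le_bytes([buf[4], buf[5], buf[6], buf[7]]);
    let payload = buf.get(8..8usize.checked_add(len)?)?;
    (crc32(payload) == stored).then_some(payload)
}
```

On recovery, a record that fails this check marks the point where replay should stop. Under periodic sync that can be up to one interval of recent writes, which is the trade-off the per-append mode removes.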

Integration Ecosystem

The library offers multiple integration pathways to suit different application architectures:

As an embedded library directly within Rust applications
Through a runtime-agnostic async API with bounded queues
Via a tokio-based HTTP server providing Prometheus compatibility
With Prometheus remote read/write support for seamless integration with existing monitoring stacks
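The bounded-queue idea behind the async API can be illustrated with the standard library's synchronous bounded channel. This is a stand-in, not tsink's API: an async implementation would await free capacity instead of returning immediately, and the `Point` and `Enqueue` types are hypothetical.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

#[derive(Debug, Clone, PartialEq)]
struct Point {
    ts_ms: i64,
    value: f64,
}

/// Outcome of a non-blocking enqueue attempt.
#[derive(Debug, PartialEq)]
enum Enqueue {
    Accepted,
    QueueFull,
    Closed,
}

/// Create a write queue that holds at most `capacity` pending points.
fn write_queue(capacity: usize) -> (SyncSender<Point>, Receiver<Point>) {
    sync_channel(capacity)
}

/// Try to enqueue without blocking, reporting a full queue to the caller.
fn try_enqueue(tx: &SyncSender<Point>, point: Point) -> Enqueue {
    match tx.try_send(point) {
        Ok(()) => Enqueue::Accepted,
        Err(TrySendError::Full(_)) => Enqueue::QueueFull,
        Err(TrySendError::Disconnected(_)) => Enqueue::Closed,
    }
}
```

A bounded queue gives producers an explicit signal when the storage side falls behind, instead of letting pending writes accumulate without limit.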

The server mode implementation includes TLS support, bearer token authentication, and graceful shutdown capabilities, making it suitable for production deployments. The Prometheus compatibility extends to query endpoints, allowing existing tooling and dashboards to work with tsink without modification.

Developer Experience

tsink emphasizes simplicity through a fluent builder API that allows precise configuration of all database aspects. Sensible defaults minimize setup complexity while extensive customization options enable optimization for specific workloads. The comprehensive API documentation and examples facilitate rapid adoption, with clear patterns for common time-series operations.

The type system provides six value variants (f64, i64, u64, bool, bytes, string) with automatic conversions between compatible types. Custom codec and aggregator traits allow extension of the system for specialized data types and aggregation strategies beyond the built-in set.
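A six-variant value type with conversions might look like the sketch below. The enum name, the variant names, and the rule that numeric and boolean values convert to `f64` while bytes and strings do not are assumptions for illustration. The article does not spell out which conversions tsink treats as compatible.

```rust
/// Hypothetical value type with the six variants named in the article.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    F64(f64),
    I64(i64),
    U64(u64),
    Bool(bool),
    Bytes(Vec<u8>),
    String(String),
}

impl From<f64> for Value {
    fn from(v: f64) -> Self { Value::F64(v) }
}
impl From<i64> for Value {
    fn from(v: i64) -> Self { Value::I64(v) }
}
impl From<u64> for Value {
    fn from(v: u64) -> Self { Value::U64(v) }
}
impl From<bool> for Value {
    fn from(v: bool) -> Self { Value::Bool(v) }
}
impl From<Vec<u8>> for Value {
    fn from(v: Vec<u8>) -> Self { Value::Bytes(v) }
}
impl From<&str> for Value {
    fn from(v: &str) -> Self { Value::String(v.to_owned()) }
}

impl Value {
    /// Numeric view of the value. Large 64-bit integers may lose
    /// precision when widened to f64; blobs have no numeric view.
    fn as_f64(&self) -> Option<f64> {
        match self {
            Value::F64(v) => Some(*v),
            Value::I64(v) => Some(*v as f64),
            Value::U64(v) => Some(*v as f64),
            Value::Bool(b) => Some(if *b { 1.0 } else { 0.0 }),
            Value::Bytes(_) | Value::String(_) => None,
        }
    }
}
```

With `From` implemented for each primitive, call sites can pass `42i64.into()` or `true.into()` and let the compiler pick the variant.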

Container Optimization

tsink automatically detects cgroup v1 and v2 limits on CPU and memory. With those quotas known, it can size its thread pools appropriately in Docker, Kubernetes, and other containerized environments, which prevents resource contention in orchestrated deployments.
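Under cgroup v2, the CPU quota is exposed in a `cpu.max` file (typically `/sys/fs/cgroup/cpu.max`) that holds either `max <period>` or `<quota> <period>` in microseconds. The sketch below parses that format and derives a worker count. The function names and the round-up policy are assumptions, and cgroup v1, which uses separate quota and period files, is not covered.

```rust
/// Parse the contents of a cgroup v2 `cpu.max` file into a CPU limit
/// expressed in cores. Returns None when no limit is set or the
/// contents are malformed.
fn parse_cpu_max(contents: &str) -> Option<f64> {
    let mut parts = contents.split_whitespace();
    let quota = parts.next()?;
    let period: f64 = parts.next()?.parse().ok()?;
    if quota == "max" || period <= 0.0 {
        return None;
    }
    let quota: f64 = quota.parse().ok()?;
    Some(quota / period)
}

/// Hypothetical sizing policy: round the quota up to whole threads,
/// never exceed the host CPU count, and always keep at least one worker.
fn worker_threads(cpu_max: Option<&str>, host_cpus: usize) -> usize {
    let limit = cpu_max
        .and_then(parse_cpu_max)
        .map(|cores| cores.ceil().max(1.0) as usize);
    match limit {
        Some(l) => l.min(host_cpus).max(1),
        None => host_cpus.max(1),
    }
}
```

In practice the first argument would come from something like `std::fs::read_to_string("/sys/fs/cgroup/cpu.max").ok()`, falling back to the host CPU count when the file is absent.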

Conclusion

tsink represents a significant advancement in embedded time-series databases for Rust, combining sophisticated storage techniques with impressive performance characteristics. Its ground-up rewrite in version 0.8.0 addresses many common pain points in time-series data management, particularly for applications requiring high ingestion rates, efficient storage, and low-latency querying within the Rust ecosystem.

The combination of an LSM-tree architecture, adaptive compression, full PromQL support, and comprehensive resource management makes tsink suitable for a wide range of applications, from IoT sensor data collection to microservice monitoring and financial time-series analysis. As the Rust ecosystem continues to mature, solutions like tsink provide compelling alternatives to traditional time-series databases, offering performance, efficiency, and tight integration with the language's type system and concurrency model.

For developers interested in exploring tsink, the project is available on GitHub with comprehensive documentation and examples. The crates.io page provides easy integration into Rust projects, while the documentation site offers detailed API reference and usage guides.

#Rust #Time-Series Database #LSM-tree #PromQL #Compression