As TiKV clusters pushed RocksDB to multi-terabyte datasets spanning hundreds of thousands of SST files, a single global mutex quietly became a 100 ms tail-latency tax. PingCAP’s engineers dissected RocksDB’s LogAndApply pipeline, moved the CPU-heavy work out from under the lock, and turned a pathological bottleneck into a 100x latency win. Here’s how they did it, and why the pattern should reshape how you think about concurrency in storage engines.