I Added a Cache and the System Got Slower: The Hidden Cost of Caching
#Backend

Backend Reporter
4 min read

Caching promises free performance, but often introduces network overhead, complex failure modes, and stampede effects that can increase p95 latency. The real win comes from measuring true cost, caching selectively, and implementing patterns like single-flight and stale-while-revalidate.

Caching is often presented as a straightforward performance win: drop in Redis, flip a flag, and watch latency drop. The reality is messier. A cache is another network hop, another failure surface, and another place where requests can queue or time out under load. When we added a cache to a hot endpoint running near capacity, we expected fewer database hits and lower latency. Instead, p95 latency increased. "Fast locally, slow in prod" became a daily refrain. Incidents became harder to diagnose because every request now had two potential bottlenecks: the cache and the origin. The code looked clean and the cache dashboards were green, but users saw worse performance.

Where the slowdown actually comes from

Cache hits still have overhead. Even on a hit, you pay for:

  • A network round trip to the cache (if remote)
  • Serialization and deserialization
  • Connection pool contention, TLS handshakes, and retries

On a well-indexed, warm database, this extra hop can cost more than the original query. The cache becomes an expensive middleman rather than a shortcut.
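As a rough sanity check, a sketch like the one below can time the full hit path, network GET plus deserialization, against the direct query it replaces. It assumes a go-redis client, database/sql with a Postgres-style placeholder, and a hypothetical users table and User DTO; run it against production-like infrastructure, because "fast locally" hides the network hop.

```go
package cacheprobe

import (
	"context"
	"database/sql"
	"encoding/json"
	"log"
	"time"

	"github.com/redis/go-redis/v9"
)

// User is a hypothetical DTO for illustration.
type User struct {
	ID   string `json:"id"`
	Name string `json:"name"`
}

// timeHitVsOrigin times the full cache hit path (network GET + unmarshal)
// against the direct, well-indexed query it replaces, for a single key.
func timeHitVsOrigin(ctx context.Context, rdb *redis.Client, db *sql.DB, id string) {
	start := time.Now()
	if raw, err := rdb.Get(ctx, "user:"+id).Bytes(); err == nil {
		var u User
		_ = json.Unmarshal(raw, &u)
	}
	log.Printf("cache hit path: %v", time.Since(start))

	start = time.Now()
	var name string
	_ = db.QueryRowContext(ctx, "SELECT name FROM users WHERE id = $1", id).Scan(&name)
	log.Printf("origin query:   %v", time.Since(start))
}
```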

Misses create double work. The "simple" read path becomes:

  1. GET from cache → miss
  2. Fetch from origin
  3. SET into cache

If data changes frequently or isn't reused, you've added overhead to almost every request. The hit rate might look decent, but most cost hides on the miss path.
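Sketched in Go (reusing the go-redis client and User DTO from the previous snippet, with a hypothetical loadFromDB origin loader and an arbitrary 5-minute TTL), the miss path looks like this:

```go
package cacheprobe

import (
	"context"
	"encoding/json"
	"time"

	"github.com/redis/go-redis/v9"
)

// getUser is the read path from the list above. Note the miss cost: a
// cache GET, the origin fetch, and a SET, instead of a single query.
func getUser(ctx context.Context, rdb *redis.Client, id string,
	loadFromDB func(context.Context, string) (User, error), // hypothetical origin loader
) (User, error) {
	key := "user:" + id

	// 1. GET from cache.
	if raw, err := rdb.Get(ctx, key).Bytes(); err == nil {
		var u User
		if json.Unmarshal(raw, &u) == nil {
			return u, nil // hit
		}
	}

	// 2. Miss: fetch from origin.
	u, err := loadFromDB(ctx, id)
	if err != nil {
		return User{}, err
	}

	// 3. SET into cache for the next reader (TTL choice is arbitrary here).
	if raw, err := json.Marshal(u); err == nil {
		rdb.Set(ctx, key, raw, 5*time.Minute)
	}
	return u, nil
}
```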

Stampede and churn. When a popular key expires, requests pile onto the origin simultaneously. That thundering herd effect spikes p95 exactly when the system is under heaviest load. Short TTLs plus high-cardinality keys compound this: entries constantly evict, and the cache behaves like an expensive passthrough layer.

The cache becomes the bottleneck. When the remote cache service slows down or exhausts resources, the application slows down with it. App logs may show no clear errors, but APM reveals "external call" time dominating the entire request. You've traded "DB is slow" days for "cache is slow" days.

Measuring whether caching actually helps

Treat the cache as its own service with proper metrics:

  • cache_hit_rate (per endpoint or key group)
  • cache_get_ms and cache_set_ms (p50 / p95 / p99)
  • origin_ms
  • request_total_ms
  • cache_timeouts and cache_errors

Run a controlled experiment: bypass the cache for 5–10% of traffic and compare p95/p99. Analyze end-to-end latency for both hit and miss paths. If the hit path isn't clearly cheaper than calling the origin directly, the cache is just adding complexity.
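A minimal instrumentation sketch, assuming the Prometheus Go client and taking the metric and label names above at face value (cache_set_ms and request_total_ms follow the same pattern), might look like:

```go
package cachemetrics

import (
	"math/rand"
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

// Histograms and counters for the signals listed above.
var (
	cacheGetMS = prometheus.NewHistogramVec(prometheus.HistogramOpts{
		Name:    "cache_get_ms",
		Help:    "Latency of cache GETs in milliseconds.",
		Buckets: prometheus.ExponentialBuckets(0.1, 2, 12), // 0.1ms .. ~200ms
	}, []string{"endpoint"})
	originMS = prometheus.NewHistogramVec(prometheus.HistogramOpts{
		Name:    "origin_ms",
		Help:    "Latency of origin calls in milliseconds.",
		Buckets: prometheus.ExponentialBuckets(0.1, 2, 12),
	}, []string{"endpoint"})
	cacheHits   = prometheus.NewCounterVec(prometheus.CounterOpts{Name: "cache_hits_total", Help: "Cache hits."}, []string{"endpoint"})
	cacheMisses = prometheus.NewCounterVec(prometheus.CounterOpts{Name: "cache_misses_total", Help: "Cache misses."}, []string{"endpoint"})
)

func init() {
	prometheus.MustRegister(cacheGetMS, originMS, cacheHits, cacheMisses)
}

// observeMS records a duration in milliseconds against a histogram.
func observeMS(h *prometheus.HistogramVec, endpoint string, d time.Duration) {
	h.WithLabelValues(endpoint).Observe(float64(d.Microseconds()) / 1000.0)
}

// bypassCache routes a small random slice of traffic straight to the
// origin, so hit-path and no-cache p95/p99 can be compared directly.
func bypassCache(fraction float64) bool {
	return rand.Float64() < fraction // e.g. bypassCache(0.05) for 5% of requests
}
```

Hit rate then falls out of the counters (e.g. rate of cache_hits_total divided by hits plus misses), and the bypass flag gives a no-cache baseline to compare p95/p99 against.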

Changes that actually worked

Cache less, but smarter. Only cache reads that are both expensive and reusable. Avoid caching cheap queries or data that changes constantly.

Store smaller objects. Cache minimal DTOs instead of fully hydrated object graphs. Smaller payloads reduce serialization overhead and network transfer time.

Use request coalescing or single-flight. Collapse concurrent requests for the same key so only one goes to the origin. This eliminates duplicate work during stampedes. Go's singleflight package, or the equivalent pattern in other languages, implements this, as sketched below.
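A minimal sketch using golang.org/x/sync/singleflight (the wrapper name and []byte payload are illustrative):

```go
package coalesce

import (
	"golang.org/x/sync/singleflight"
)

var group singleflight.Group

// Do collapses concurrent calls for the same key: one caller runs fetch,
// the others block and share its result (and its error).
func Do(key string, fetch func() ([]byte, error)) ([]byte, error) {
	v, err, _ := group.Do(key, func() (interface{}, error) {
		return fetch()
	})
	if err != nil {
		return nil, err
	}
	return v.([]byte), nil
}
```

Wrapping the miss path from the earlier read sketch in Do means a stampede on one hot key reaches the origin once instead of once per concurrent request.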

Add TTL jitter. Prevent keys from expiring simultaneously by adding random variation to TTLs. This smooths load and reduces stampede risk.
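For example (the 20% slack is an arbitrary choice):

```go
package cachettl

import (
	"math/rand"
	"time"
)

// jitteredTTL adds up to 20% random slack to a base TTL so keys written
// at the same moment do not all expire at the same moment.
func jitteredTTL(base time.Duration) time.Duration {
	return base + time.Duration(rand.Int63n(int64(base/5)+1))
}
```

Then write with something like jitteredTTL(5*time.Minute) instead of a fixed expiration.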

Implement stale-while-revalidate. Serve slightly stale data immediately while refreshing the cache in the background. This maintains low latency even during cache misses or refresh cycles. Many HTTP caches support this via stale-while-revalidate directives, or you can implement it in application logic.
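Here's a minimal in-application sketch of the idea, with a soft TTL only; hard expiry, eviction, and error backoff are left out and the map grows without bound, so it illustrates the control flow rather than a production cache.

```go
package swr

import (
	"sync"
	"time"
)

type entry struct {
	value      []byte
	softExpiry time.Time // after this, serve stale and refresh in background
}

// Cache serves whatever it has immediately and refreshes expired entries
// in the background, so a refresh never sits on the request path.
type Cache struct {
	mu         sync.Mutex
	entries    map[string]entry
	refreshing map[string]bool
}

func New() *Cache {
	return &Cache{entries: map[string]entry{}, refreshing: map[string]bool{}}
}

func (c *Cache) Get(key string, ttl time.Duration, fetch func() ([]byte, error)) ([]byte, error) {
	c.mu.Lock()
	e, ok := c.entries[key]
	if ok {
		if time.Now().After(e.softExpiry) && !c.refreshing[key] {
			// Stale: serve it anyway, but start exactly one background refresh.
			c.refreshing[key] = true
			go c.refresh(key, ttl, fetch)
		}
		c.mu.Unlock()
		return e.value, nil
	}
	c.mu.Unlock()

	// Cold miss: the first request for a key still pays the full origin cost.
	v, err := fetch()
	if err != nil {
		return nil, err
	}
	c.mu.Lock()
	c.entries[key] = entry{value: v, softExpiry: time.Now().Add(ttl)}
	c.mu.Unlock()
	return v, nil
}

func (c *Cache) refresh(key string, ttl time.Duration, fetch func() ([]byte, error)) {
	v, err := fetch()
	c.mu.Lock()
	defer c.mu.Unlock()
	c.refreshing[key] = false
	if err == nil {
		c.entries[key] = entry{value: v, softExpiry: time.Now().Add(ttl)}
	}
}
```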

Set tight timeouts and intentional fallbacks. Don't let cache timeouts dictate API latency. Fail fast and fall back to the origin if the cache is slow or unavailable.
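A sketch of that budget-and-fallback shape, with cacheGet and originGet as stand-ins for the real calls and 20 ms as an arbitrary budget:

```go
package fallback

import (
	"context"
	"time"
)

// getWithFallback gives the cache a strict latency budget; if the cache is
// slow, down, or simply misses, the request goes to the origin instead of
// inheriting the cache's delay.
func getWithFallback(ctx context.Context, key string,
	cacheGet func(context.Context, string) ([]byte, error),
	originGet func(context.Context, string) ([]byte, error),
) ([]byte, error) {
	cctx, cancel := context.WithTimeout(ctx, 20*time.Millisecond)
	defer cancel()

	if v, err := cacheGet(cctx, key); err == nil {
		return v, nil
	}
	// Timeout, miss, or cache error: fail fast and use the origin, which
	// still gets the caller's full deadline rather than what's left of 20ms.
	return originGet(ctx, key)
}
```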

With these changes, the cache finally acted like a performance layer instead of "just another production problem."

Broader patterns and trade-offs

Caching introduces consistency trade-offs. Strong consistency requires careful TTL management and cache invalidation strategies. Event-driven cache invalidation via message queues (e.g., Kafka, RabbitMQ) can keep caches coherent but adds operational complexity. Write-through caches simplify consistency but increase write latency. Write-behind caches improve write performance but risk data loss on failure.

Network topology matters. A remote cache adds latency but shares state across instances. An in-process cache reduces latency but increases memory pressure and complicates cache invalidation across instances. Hybrid approaches—local caches with short TTLs backed by a remote cache—can balance these concerns.
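A minimal sketch of that hybrid shape, with the remote lookup passed in as a function (a Redis GET, say) and no eviction of the local map, so staleness across instances is bounded only by the short local TTL:

```go
package twotier

import (
	"context"
	"sync"
	"time"
)

type localEntry struct {
	value   []byte
	expires time.Time
}

// Cache checks a small in-process map first (very short TTL, no network
// hop), then falls back to a shared remote cache lookup.
type Cache struct {
	mu       sync.RWMutex
	local    map[string]localEntry
	localTTL time.Duration
	remote   func(context.Context, string) ([]byte, error) // e.g. a Redis GET
}

func New(localTTL time.Duration, remote func(context.Context, string) ([]byte, error)) *Cache {
	return &Cache{local: map[string]localEntry{}, localTTL: localTTL, remote: remote}
}

func (c *Cache) Get(ctx context.Context, key string) ([]byte, error) {
	// 1. In-process hit: no serialization, no network.
	c.mu.RLock()
	if e, ok := c.local[key]; ok && time.Now().Before(e.expires) {
		c.mu.RUnlock()
		return e.value, nil
	}
	c.mu.RUnlock()

	// 2. Remote lookup: one network hop, state shared across instances.
	v, err := c.remote(ctx, key)
	if err != nil {
		return nil, err
	}
	c.mu.Lock()
	c.local[key] = localEntry{value: v, expires: time.Now().Add(c.localTTL)}
	c.mu.Unlock()
	return v, nil
}
```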

Caching also affects database load in non-obvious ways. A cache that reduces read load but increases stampede risk can cause more database contention during peak traffic. Proper connection pooling and query optimization at the origin remain critical.

When not to cache

  • Low cardinality, high churn: Data that changes constantly and has few distinct keys benefits little from caching.
  • Cheap queries: Well-indexed queries with predictable latency often cost less than cache overhead.
  • Unreusable data: Data accessed once per request rarely justifies caching.
  • Cache-unfriendly access patterns: Sequential scans or ad-hoc filters don't benefit from key-value caching.

Final thoughts

Caching is a performance optimization, not a default architecture. It requires careful measurement, intentional design, and ongoing tuning. The hidden costs—network overhead, consistency complexity, and failure modes—can outweigh benefits if applied blindly. Focus on understanding your access patterns, measuring true cost, and implementing patterns like single-flight and stale-while-revalidate to make caching work for you, not against you.
