Anthropic's Claude Code cache changes spark user outrage over quota drain

Privacy Reporter
4 min read

Anthropic's recent cache TTL reduction from one hour to five minutes for Claude Code is causing users to burn through quotas faster, with some Pro users getting as few as two prompts in five hours.

Anthropic's recent changes to Claude Code's prompt caching system have sparked significant user frustration, with many reporting that their monthly quotas are depleting at an alarming rate despite the company's assurances that costs should remain stable.

The controversy centers on a cache time-to-live (TTL) adjustment that Anthropic implemented in March 2026. Originally, Claude Code used a one-hour cache for context data, letting the system serve previously processed prompts and background information from cache at a steep discount instead of reprocessing them. Anthropic has since reduced the TTL to just five minutes for many requests, a change that users say is dramatically increasing their usage costs.
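At the API level, prompt caching works by marking a large, reusable block (a system prompt, project context) as cacheable and giving it a TTL. A minimal sketch of the request shape, with field names following Anthropic's published prompt-caching API; the model name and TTL values are illustrative, and Claude Code sets these internally rather than exposing them:

```python
# Minimal sketch of a Messages API payload that caches a large context
# block. Field names follow Anthropic's published prompt-caching API;
# the model name and TTL values are illustrative.
def build_request(context: str, user_prompt: str, ttl: str = "5m") -> dict:
    """Build a request body whose system context is marked cacheable.

    "5m" is the standard ephemeral cache; "1h" is the extended TTL
    that Claude Code previously relied on.
    """
    return {
        "model": "claude-sonnet-4-5",  # illustrative model name
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": context,  # the large, reusable block worth caching
                "cache_control": {"type": "ephemeral", "ttl": ttl},
            }
        ],
        "messages": [{"role": "user", "content": user_prompt}],
    }
```

Only the marked block is cached; if a follow-up request arrives after the TTL has expired, the whole block is reprocessed, and billed, as a fresh cache write.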

The cache optimization debate

User Sean Swanson first identified the cache change through detailed analysis, noting that the five-minute TTL is "disproportionately punishing for the long-session, high-context use case that defines Claude Code usage." When developers use AI coding assistants, they send additional context along with their prompts, such as existing code or background instructions. While this context improves the model's accuracy, every token of it must be processed, and paid for, on each request unless it is served from cache.

Jarred Sumner, creator of the Bun JavaScript runtime who now works at Anthropic, defended the change. He argued that the five-minute cache actually makes Claude Code cheaper because "a meaningful share of Claude Code's requests are one-shot calls where the cached context is used once and not revisited." Sumner added that the Claude Code client determines the cache TTL automatically and that there are no plans for a global setting.
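Sumner's one-shot argument and Swanson's long-session complaint can both be reproduced with a toy cost model. The multipliers below reflect Anthropic's published cache pricing at the time of writing (5-minute cache write: 1.25x the base input price; 1-hour write: 2x; cache read: 0.1x) and should be treated as assumptions:

```python
# Toy model: relative cost of re-sending one cached context block over a
# session, in units of (base input price x context tokens).
# Assumed multipliers: 5-min write 1.25x, 1-h write 2.0x, read 0.1x.
def session_cost(gaps_min, ttl_min, write_mult, read_mult=0.1):
    """gaps_min: idle minutes before each prompt after the first."""
    cost = write_mult  # the first prompt always writes the cache
    for gap in gaps_min:
        # A gap longer than the TTL expires the cache: pay a full rewrite.
        cost += write_mult if gap > ttl_min else read_mult
    return cost

# One-shot call: the cheaper 5-minute write wins, as Sumner argues.
print(round(session_cost([], ttl_min=5, write_mult=1.25), 2))    # 1.25
print(round(session_cost([], ttl_min=60, write_mult=2.0), 2))    # 2.0

# Ten-prompt session with 10-minute pauses: every prompt misses the
# 5-minute cache, which is Swanson's long-session complaint.
gaps = [10] * 9
print(round(session_cost(gaps, ttl_min=5, write_mult=1.25), 2))  # 12.5
print(round(session_cost(gaps, ttl_min=60, write_mult=2.0), 2))  # 2.9
```

The same model also matches Swanson's revised point about subagents: with gaps of a minute or two, the 5-minute cache never expires, so its lower write cost genuinely wins.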

However, Swanson revised his analysis and acknowledged that sessions using subagents do benefit from the lower write cost of the five-minute cache since they interact quickly and "their caches almost never expire." The real issue, he argued, is that many long-term users are suddenly hitting quota limits for the first time.

Context window costs compound the problem

Another significant factor is the large one-million-token context window available on paid plans with Claude Opus 4.6 or Sonnet 4.6 models. Claude Code creator Boris Cherny explained that "prompt cache misses when using 1M token context window are expensive... if you leave your computer for over an hour then continue a stale session, it's often a full cache miss."
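Back-of-envelope arithmetic shows why a cold resume hurts at this scale. The prices here are illustrative assumptions (long-context input on the order of $6 per million tokens, cache reads at roughly one tenth of the input price), not Anthropic's exact rates:

```python
# Rough dollar cost of re-submitting a session's context after returning
# to a stale session. Assumed rates: $6/MTok long-context input, cache
# reads at 0.1x that. Illustrative figures, not Anthropic's price sheet.
MTOK = 1_000_000

def resume_cost(context_tokens, input_per_mtok=6.00, read_mult=0.1,
                cache_hit=True):
    """Cost in dollars of one request carrying `context_tokens` of context."""
    rate = input_per_mtok * (read_mult if cache_hit else 1.0)
    return context_tokens / MTOK * rate

print(round(resume_cost(1_000_000, cache_hit=False), 2))  # 6.0  full miss
print(round(resume_cost(1_000_000, cache_hit=True), 2))   # 0.6  warm cache
print(round(resume_cost(400_000, cache_hit=False), 2))    # 2.4  smaller window
```

Over a day of resumed sessions the gap between the miss and hit cases compounds quickly, which is consistent with Cherny's point about stale sessions and the appeal of a smaller default window.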

Cherny revealed that Anthropic is considering making a 400,000-token context window the default, with an opt-in for the full one million tokens. A setting to choose the window size already exists, but large contexts have become common because users are "pulling in a large number of skills, or running many agents or background automations."

Performance degradation concerns

The cache optimization discussion may be masking a more fundamental issue: users report that Claude's performance has noticeably declined. A user on the enterprise team plan described how "in March I could use Opus all day and it was getting great results. Since the last week of March and into April, I've had sessions where I maxed out session usage under 2 hours and it got stuck in overthinking loops, multiple turns of realising the same thing, dozens of paragraphs of 'but wait, actually I need to do x' with slight variations."

This sentiment echoes similar complaints from an AI director at AMD, who publicly criticized Claude Code for becoming "dumber and lazier" since the last update. The combination of faster quota depletion and perceived performance degradation has left many users questioning the value of their subscriptions.

Technical issues compound user frustration

Adding to the frustration, users have reported multiple bugs in the caching code. One user bluntly stated that "before those are fixed likely any 5 minutes vs 1 h discussion is entirely moot since numbers are totally flawed." These technical issues make it difficult to determine whether the quota problems stem from intentional policy changes or unintended software bugs.

The situation has reached a critical point where Pro users paying $20 per month report getting as few as two prompts in five hours. This represents a dramatic reduction in value and has led to widespread complaints across developer communities.

What this means for AI coding assistants

The Claude Code controversy highlights the delicate balance AI companies must strike between cost optimization and user experience. While prompt caching is an important technique for reducing computational costs, the implementation details can have significant real-world impacts on how developers use these tools.

For developers relying on Claude Code for their daily work, the current situation creates uncertainty and frustration. The combination of faster quota depletion, perceived performance degradation, and technical bugs has undermined confidence in the service. Whether Anthropic can address these concerns through technical fixes, policy adjustments, or clearer communication remains to be seen, but the current user sentiment suggests significant work is needed to restore trust in the platform.
