AMD's AI director Stella Laurenzo has published damning data showing that Claude Code's performance has degraded dramatically since March, with thinking depth reduced and error rates climbing sharply.

Claude Code Performance Crisis: AMD Director's Damning Analysis
AMD's AI director Stella Laurenzo has delivered a devastating critique of Claude Code's recent performance, backed by comprehensive data analysis showing the AI coding assistant has become significantly less reliable for complex engineering tasks.
The Evidence: 6,852 Sessions Analyzed
Laurenzo's team conducted an exhaustive analysis of 6,852 Claude Code sessions, examining 234,760 tool calls and 17,871 thinking blocks. The results paint a troubling picture of declining performance that began around March 8th, coinciding with the deployment of thinking content redaction in Claude Code version 2.1.69.
Key Performance Indicators Plummet
The data reveals several alarming trends:
- Stop-hook violations (indicating laziness, premature cessation, and permission-seeking behavior) increased from zero to an average of 10 per day
- Code reading frequency dropped from 6.6 reads per file to just 2 by the end of March
- File rewriting instead of editing became significantly more frequent
"These are exactly the symptoms observed," Laurenzo wrote. "When thinking is shallow, the model defaults to the cheapest action available: edit without reading, stop without finishing, dodge responsibility for failures, take the simplest fix rather than the correct one."
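Metrics like the ones above are straightforward to compute once session logs are available. The sketch below is a minimal illustration, not Laurenzo's actual pipeline: it assumes a hypothetical session-record format (Claude Code's real log schema is not described in the analysis) and shows how per-day stop-hook violation counts and the reads-per-edited-file ratio could be derived.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical session records; the real analysis parsed Claude Code's
# own session logs, whose schema is not public.
sessions = [
    {"day": date(2026, 3, 1), "stop_hook_violations": 0,
     "tool_calls": [("Read", "a.py"), ("Read", "a.py"), ("Edit", "a.py")]},
    {"day": date(2026, 3, 20), "stop_hook_violations": 3,
     "tool_calls": [("Write", "a.py")]},  # rewrite with no prior Read that day
]

def daily_violations(sessions):
    """Sum stop-hook violations per calendar day."""
    totals = Counter()
    for s in sessions:
        totals[s["day"]] += s["stop_hook_violations"]
    return dict(totals)

def reads_per_edited_file(sessions):
    """Average number of Read calls per file that was later edited or rewritten."""
    reads = defaultdict(int)
    edited = set()
    for s in sessions:
        for tool, path in s["tool_calls"]:
            if tool == "Read":
                reads[path] += 1
            elif tool in ("Edit", "Write"):
                edited.add(path)
    if not edited:
        return 0.0
    return sum(reads[p] for p in edited) / len(edited)
```

Tracked over time, a falling reads-per-edited-file ratio is exactly the "edit without reading" symptom the analysis describes.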
The Thinking Redaction Problem
The implementation of thinking content redaction appears to be at the heart of the issue. This feature, which defaults to stripping thinking content from Claude Code API responses, has effectively hidden the AI's reasoning process from users while simultaneously reducing the depth of that reasoning.
This is distinct from the February controversy, in which version 2.1.20 caused Claude Code to truncate its file-reading explanations, leaving users with minimal visibility into the AI's operations.
Industry-Wide Concerns
Laurenzo's findings resonate with broader user complaints across platforms. Reddit commenters have expressed similar frustrations, and Anthropic has faced additional criticism for unexplained surges in token usage that have pushed some users past their limits.
The timing is particularly problematic given the recent exposure of Claude Code's entire source code, which revealed the extent of data collection capabilities and raised privacy concerns.
AMD's Response and Industry Implications
AMD has already switched to another provider that Laurenzo describes as doing "superior quality work," though she declined to name it, citing NDAs. She has nonetheless kept the issue open in the hope that Anthropic can address these fundamental problems.
Laurenzo's warning carries significant weight given her position and the comprehensive nature of the analysis. "All I will add is that 6 months ago, Claude stood alone in terms of reasoning quality and execution," she noted. "But the others need to be watched and evaluated very carefully. Anthropic is far from alone at the capability tier that Opus previously occupied."
Calls for Transparency and Better Pricing
Laurenzo has made specific demands of Anthropic:
- Transparency about whether thinking tokens are being reduced or capped
- Exposure of thinking token counts per request to allow users to monitor reasoning depth
- A maximum thinking tier for engineers running complex workflows
"The current subscription model doesn't distinguish between users who need 200 thinking tokens per response and users who need 20,000," Laurenzo explained. "Users running complex engineering workflows would pay significantly more for guaranteed deep thinking."
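Until providers expose thinking token counts directly, as Laurenzo demands, users can only approximate reasoning depth from whatever the API returns. The sketch below is a rough proxy under stated assumptions: it assumes responses include content blocks with a "thinking" type carrying the reasoning text (block shapes vary by API version), and it estimates tokens with a crude characters-per-token heuristic rather than a real tokenizer.

```python
def estimate_thinking_tokens(content_blocks):
    """Rough proxy for reasoning depth: estimate tokens in any thinking
    blocks a response contains (~4 characters per token is a crude heuristic)."""
    chars = sum(len(b.get("thinking", ""))
                for b in content_blocks if b.get("type") == "thinking")
    return chars // 4

# Hypothetical response content; real block shapes depend on the API version.
blocks = [
    {"type": "thinking", "thinking": "x" * 800},
    {"type": "text", "text": "Here is the fix."},
]
```

A per-request estimate like this, logged across sessions, is what would let a team notice a drop from the ~20,000-token regime to the ~200-token regime Laurenzo describes.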
The Broader Context
This crisis comes at a critical juncture for AI coding assistants. As developers increasingly rely on these tools for complex software engineering tasks, the quality and reliability of their reasoning capabilities become paramount.
Anthropic's challenges with Claude Code—including performance degradation, transparency issues, and privacy concerns—highlight the delicate balance between cost optimization and maintaining the reasoning depth that makes these tools valuable for professional developers.
What This Means for Developers
For engineering teams relying on Claude Code, Laurenzo's analysis suggests several immediate considerations:
- Monitor performance closely: Track the frequency of shallow responses and incomplete tasks
- Consider alternatives: The market is evolving rapidly, and other providers may offer better reasoning quality
- Demand transparency: Push for visibility into thinking token usage and reasoning depth
- Evaluate pricing models: Consider whether current subscription tiers adequately support complex engineering workflows
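The "monitor performance closely" advice can be made concrete with a simple rolling-window check. The sketch below is illustrative only: the window size and threshold are hypothetical, and it watches the reads-per-edit ratio as one possible depth signal, flagging when the recent average falls below a floor.

```python
from collections import deque

def make_depth_monitor(window=20, threshold=2.0):
    """Return a callable that tracks a rolling average of the reads-per-edit
    ratio and flags (returns True) when it drops below the threshold.
    Window and threshold values here are illustrative, not recommendations."""
    recent = deque(maxlen=window)
    def observe(reads_per_edit):
        recent.append(reads_per_edit)
        avg = sum(recent) / len(recent)
        return avg < threshold  # True -> quality may be degrading
    return observe

# Short window for demonstration.
monitor = make_depth_monitor(window=3, threshold=2.0)
```

Feeding in a healthy baseline (around 6.6 reads per file, per the analysis) keeps the alarm quiet; a sustained slide toward 2 trips it.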
The situation underscores a fundamental truth about AI coding assistants: their value lies not just in their ability to generate code, but in their capacity for deep reasoning about complex engineering problems. As Laurenzo's data demonstrates, when that reasoning depth is compromised, the entire value proposition collapses.
Anthropic now faces a critical decision point. The company must choose between continuing down the path of cost optimization at the expense of reasoning quality, or doubling down on the deep thinking capabilities that initially made Claude Code stand out in a crowded market.
For now, developers and engineering leaders would be wise to heed Laurenzo's warning and carefully evaluate whether their AI coding tools are truly delivering the reasoning depth their complex projects require.
