Anthropic resolves performance degradation in Claude Code models

By Mark Thompson, Business Editor

Developers across the tech industry began noticing a troubling pattern in late March: the code generated by Anthropic’s Claude models was becoming slower, more repetitive and increasingly prone to errors.

What users initially dismissed as fatigue or imagination turned out to be a real degradation in performance, traced by Anthropic to three separate changes made to its Claude Code agent and related tools over a six-week span. The company confirmed the issues affected its Sonnet 4.6, Opus 4.6, and Opus 4.7 models, but not its underlying API, and said all were resolved by April 20.

The first issue emerged on March 4, when Anthropic lowered the default reasoning effort from “high” to “medium” to reduce latency that had made the interface appear frozen for some users. While intended to improve responsiveness, the change backfired by limiting the model’s capacity for complex reasoning. After user feedback indicated a preference for higher default intelligence, Anthropic reverted the setting on April 7, allowing users to manually reduce effort when speed was prioritized over depth.

Two weeks later, on March 26, a change designed to clear outdated context from idle sessions introduced a bug that triggered after every interaction instead of once per hour. This caused Claude to forget prior conversation turns, leading to repetitive responses and a sense of incoherence. The flaw was isolated and fixed on April 10, restoring the intended behavior where context clearing only occurred after prolonged inactivity.
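Anthropic has not published the faulty code, but bugs of this shape typically come from a mis-scoped staleness check. The toy sketch below (all names hypothetical) shows the intended behavior, with a comment marking the condition that, per Anthropic's description, effectively fired on every turn instead of once per idle hour:

```python
import time

IDLE_THRESHOLD_SECONDS = 3600  # intended: clear context only after an hour idle


class Session:
    """Toy session illustrating the intended idle-based context clearing."""

    def __init__(self):
        self.context = []
        self.last_active = time.monotonic()

    def handle_turn(self, message):
        now = time.monotonic()
        idle = now - self.last_active
        # The reported bug behaved as if this condition were always true,
        # wiping prior turns on every interaction; the April 10 fix restored
        # clearing only after prolonged inactivity.
        if idle > IDLE_THRESHOLD_SECONDS:
            self.context.clear()
        self.context.append(message)
        self.last_active = now
        return list(self.context)


session = Session()
session.handle_turn("first turn")
history = session.handle_turn("second turn")
# Within the idle window, earlier turns survive, so the model keeps
# its conversational memory between interactions.
```

This is only an illustration of the failure mode described in the report, not Anthropic's actual implementation.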

The final adjustment came on April 16, when a system prompt was added to curb verbosity by limiting text between tool calls to 25 words and final responses to 100 words unless detail was required. Though meant to streamline output, the constraint clashed with other prompting techniques and degraded code generation quality, particularly in structured tasks. Anthropic acknowledged the tradeoff and reverted the prompt on April 20.

Because each change rolled out to different user segments at different times, the combined effect appeared as inconsistent, widespread degradation — making it hard for both users and internal teams to isolate the root causes early on. Anthropic noted that its internal evaluations and usage patterns did not initially reproduce the issues, delaying detection until external reports mounted.

Context: The degradation coincided with the release of Opus 4.6 in February, which had initially been praised for its reasoning capabilities but later showed signs of regression under real-world usage.

Beyond performance, external testing revealed a concurrent decline in code security. According to cybersecurity firm Veracode, 52% of code generated by Opus 4.7 in its evaluations contained vulnerabilities — up from 50% in the prior Sonnet 4.5 model and 51% in Opus 4.1. In contrast, OpenAI’s models produced vulnerable code in only about 30% of similar tests.

Veracode’s chief innovation officer attributed the trend to training incentives that prioritize functional correctness over security hygiene, warning that without intervention, more capable models could still produce risky output at scale. The concern was echoed by practitioners: Dave Kennedy of TrustedSec reported that his team abandoned Claude Opus after observing a 47.3% drop in code quality over five weeks, describing the output as “unusably bad” and prone to introducing security flaws that novice developers might overlook.

An AMD AI executive similarly criticized the model’s reasoning as “shallow,” saying it could no longer be trusted for complex engineering tasks. Anthropic said it was actively investigating such claims and reiterated that developers should always validate AI-generated code for vulnerabilities, regardless of the source.

In response to the broader user experience issues, Anthropic announced on April 23 that it would reset usage limits for all subscribers, a move intended to restore confidence in the platform after the period of instability.

Could the recent issues with Claude Code affect its adoption in enterprise environments?

Yes, the combination of performance inconsistencies and security concerns may lead enterprises to reevaluate their reliance on Claude for critical development workflows, particularly if alternative models demonstrate more consistent output and stronger resistance to introducing vulnerabilities in generated code.


Is it safe to use Claude Code for coding tasks now that the issues have been fixed?

Anthropic states that all three identified issues were resolved by April 20, and that the model has returned to its intended performance levels; however, the company and independent auditors continue to recommend that users manually review and test any AI-generated code for correctness and security before deployment.

