Anthropic Releases Post-Mortem Analysis on Claude Code Quality Decline: Three Product Layer Changes, Not Model Issues

According to monitoring by Beating, Anthropic’s engineering team has confirmed that the decline in Claude Code quality reported by users over the past month stemmed from three independent changes at the product layer. These affected Claude Code, the Claude Agent SDK, and Claude Cowork; the API and the underlying models were not affected. The three issues were fixed on April 7, April 10, and April 20, with v2.1.116 as the final fixed version.

The first change came on March 4, when the team lowered Claude Code's default reasoning effort from high to medium to reduce the occasional long delays (during which the UI appeared frozen) seen at the high setting. Users widely reported worse results, and the change was rolled back on April 7. The default is now xhigh for Opus 4.7 and high for other models.

The second issue was a bug in a change shipped on March 26. The change was designed to clear old reasoning history once, after a session had been idle for more than an hour, to reduce the cost of resuming the session. An implementation flaw caused the clearing to run on every subsequent turn rather than just once, so the model progressively lost its earlier reasoning context, which showed up as forgetfulness, repeated actions, and abnormal tool calls. The bug also made users consume their quotas faster, because every request missed the cache. The team said two unrelated internal experiments masked the conditions needed to reproduce the issue, so the investigation took more than a week; the fix shipped on April 10. When the team later ran a code review of the offending PR with Opus 4.7, it detected the bug, whereas Opus 4.6 did not.

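
The "once versus every turn" failure described for the March 26 change can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Anthropic's code: the `Session` class, the `resumed_after_idle` flag, and the `take_turn` and `simulate` functions are hypothetical names invented for illustration, and the real implementation is not described in the post-mortem summary. The only detail taken from the article is the one-hour idle threshold.

```python
from dataclasses import dataclass, field
from typing import List

IDLE_THRESHOLD_S = 3600  # one hour of idleness, as stated in the article


@dataclass
class Session:
    """Hypothetical session state; field names are illustrative only."""
    reasoning: List[str] = field(default_factory=list)
    last_active: float = 0.0
    resumed_after_idle: bool = False
    prune_count: int = 0


def take_turn(session: Session, now: float, new_reasoning: str, buggy: bool) -> None:
    """Process one turn, pruning old reasoning after a long idle gap."""
    if now - session.last_active > IDLE_THRESHOLD_S:
        session.resumed_after_idle = True

    if session.resumed_after_idle:
        session.reasoning.clear()
        session.prune_count += 1
        if not buggy:
            # Intended behavior: prune once, then stop.
            session.resumed_after_idle = False
        # Buggy behavior: the flag is never reset, so every later
        # turn prunes again and earlier reasoning keeps disappearing.

    session.reasoning.append(new_reasoning)
    session.last_active = now


def simulate(buggy: bool) -> Session:
    """Two quick turns, a gap longer than an hour, then three more turns."""
    session = Session()
    for i, t in enumerate([0, 10, 4000, 4010, 4020]):
        take_turn(session, float(t), f"r{i}", buggy)
    return session


if __name__ == "__main__":
    for mode in (False, True):
        s = simulate(buggy=mode)
        print(f"buggy={mode}: kept={s.reasoning}, prunes={s.prune_count}")
```

In this toy model the intended version prunes once after the idle gap and then accumulates reasoning normally, while the buggy version prunes on every turn after the gap and only ever retains the latest entry. That matches the reported symptom of the model gradually losing earlier context; the cache-miss effect on quota mentioned in the article is not modeled here.
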
The third change shipped on April 16 alongside Opus 4.7, when the team added an instruction to the system prompt limiting output length: “Text between tool calls should not exceed 25 words, and the final response should not exceed 100 words unless the task requires more detail.” Several weeks of internal testing showed no regression, but after launch the instruction interacted with other prompts and degraded coding quality for Sonnet 4.6, Opus 4.6, and Opus 4.7. Broader evaluations found a 3% decline for both Opus 4.6 and Opus 4.7, and the change was rolled back on April 20.

Because the three changes affected different groups of users and took effect at different times, the result looked like widespread but inconsistent quality degradation, which made troubleshooting harder. Anthropic said that going forward it will require more internal employees to use the same public builds as users, run the full model evaluation suite for every change to the system prompt, and roll such changes out gradually over a staged rollout period. As compensation, Anthropic has reset usage quotas for all subscribers.
