Claude Fable 5 Gets "Caught" Secretly Getting Dumber During AI Research, and Anthropic Comes Under Fire from the Research Community

Original Title: “When Doing AI Research, Claude Secretly Gets Dumber, and Anthropic Is Under Siege by the Research Community”
Original Source: Machine Heart

Claude Fable 5 is the hottest topic in AI today. The performance of this “myth-level” model is genuinely outstanding, and it has drawn enormous attention.

Andrej Karpathy called it “very exciting,” a “leap-forward breakthrough worthy of a major version upgrade,” and said it is on the same level as the improvements brought by Claude 4.5 last November. On the SWE-bench Pro programming benchmark, Fable 5 scored 80.3%, surpassing Opus 4.8 by 11 full percentage points.

In a Ruby codebase with 50 million lines of code, it completed the full migration within a single day. If the same workload were handed to a human team, it would take more than two months.

For more details, see our report this morning, “Just Released, Claude’s Strongest Model Fable 5: Performance Explodes, Price Doubles.”

However, when we open social platforms like X, we see that Claude Fable 5 has already sparked a wave of criticism throughout the AI research community.

The reason is simple: if you use Claude Fable 5 for AI R&D, it quietly gets “dumber.”

As clearly stated in its system card:

We have also added safeguards for cutting-edge LLM development. As discussed in Section 6.1 of our February 2026 “Risk Report,” we are concerned about the risks brought about by an overall acceleration in the pace of AI development, even though the severity of these risks remains uncertain.

More specifically, as we pointed out at the time, what we worry about is “accelerating other AI developers to build powerful AI systems that may bring risks similar to ours, but may not have corresponding safeguards.”

Given that recent models have the ability to accelerate their own development, we have implemented new intervention measures to limit Claude’s effectiveness when handling requests related to cutting-edge LLM development (for example, in building pretraining pipelines, distributed training infrastructure, or machine learning accelerator design, etc.).

Using Claude to develop competitive models violates our terms of service, but by strengthening this restriction through safeguards, we can prevent the process from being accelerated by those most likely to violate the terms.

Unlike our interventions in areas such as cybersecurity, biology and chemistry, and distillation attempts, these safeguards are not visible to users. Fable 5 will not fall back to other models. Instead, the safeguards will limit its effectiveness through methods such as prompt modification, steering vectors, or parameter-efficient fine-tuning (PEFT).

These interventions will not affect the vast majority of coding work. We estimate that they will impact about 0.03% of traffic, concentrated in fewer than 0.1% of organizations. When these interventions are active, we expect their impact on the model’s behavior to be negligible—only restricting its effectiveness in developing cutting-edge LLMs. Claude will still actively respond to user requests. After this model is released, we will continue to improve the accuracy of our detection.

From: https://www-cdn.anthropic.com/d00db56fa754a1b115b6dd7cb2e3c342ee809620.pdf

Put in plain terms: if Anthropic’s system detects that you are doing AI research, it will quietly make the model dumber, and you will have no way of noticing.

This is completely different from how the other three categories of safety interventions are handled. For cybersecurity risks, biological and chemical threats, and distillation attacks, Fable 5 explicitly tells users: “This response was handled by Claude Opus 4.8.” Users know what has happened and can judge accordingly. But for LLM research, Claude neither switches models nor provides any notification; it simply becomes weaker in silence.

So the AI community is furious. SemiAnalysis, a well-known research and analysis firm, says that this policy has already affected its research and programming work in practice.

User Jake went further on SemiAnalysis, accusing Anthropic of not only making the model “dumber” but also continuing to charge for it, calling this “a blatant fraud.”

And this behavior may already be illegal.

The AI paper platform alphaXiv also posted on X to express its disappointment.

The organization further stated: “They not only claim the right to decide what you may use LLMs for in your research; this also lets them silently intervene in your research without your knowledge. This sets a dangerous precedent. If the model openly refuses, users can understand the boundaries.

If the model falls back to another model, users can still evaluate the differences. But if the model, while pretending to help, quietly modifies or weakens its own answers, then researchers lose the ability to determine whether a failed result comes from their own ideas, their implementation, or the model provider’s invisible intervention. This is not safety. Safety policies should be transparent, auditable, and visible to users.”

Researcher Guohao Li raised a more direct question: are AI PhD students, and engineers who contribute to open-source infrastructure such as Megatron, FSDP, and Verl, currently using a quietly downgraded Claude in their day-to-day work without even knowing it?

Renowned AI researcher and tech writer Nathan Lambert published an in-depth analysis on his Substack “Interconnects,” examining the incident from a broader perspective.

https://www.interconnects.ai/p/claude-fable-5-and-new-ai-safety

He pointed out: “Anthropic is counting the diffusion of AI capabilities as a latent risk, but the way they address it is by misleading their own users. An AI model that automatically becomes stupid without notifying me is, at its core, a misaligned AI.”

He also identified a deeper contradiction: for cybersecurity and for biological and chemical threats, Anthropic’s interventions are explicit and auditable, informing users that “this response was handled by Opus 4.8”; but for LLM research, the company chooses a hidden intervention.

“If all safety policies took the same form, it would be far more convincing and easier to gain rational support. This double standard makes one suspect that these ‘safety measures’ are more about protecting their competitive position.”

Most intriguing is Fable 5’s own statement. A screenshot from user ASM shows that when asked whether this approach is appropriate, Fable 5 itself also seems to think that this kind of opaque operation is problematic.

Why does Anthropic do this?

To understand this, you need to go back to a few days before the release of Fable 5. Anthropic published a major blog post titled “When AI Begins Self-Construction,” urging top AI labs worldwide to explore the possibility of “pausing development.”

https://www.anthropic.com/institute/recursive-self-improvement

The blog cited internal company data: on the most difficult and least clearly specified coding tasks, Claude’s success rate reached 76% in May this year, a rise of 50 percentage points within six months. In internal tests that asked the model to make training code run faster, Claude Opus 4 achieved a speedup of about 3x, while the unreleased Mythos Preview had already reached about 52x.

Anthropic said plainly: “What we worry about is enabling other AI developers to build powerful systems at a faster pace that carry similar risks, yet may not have corresponding safeguards.”

This is the rationale behind Fable 5’s hidden downgrading for LLM research: Anthropic believes that AI’s self-acceleration has become dangerously fast, and one of its defensive moats is to prevent its “most powerful tool” from helping competitors close the gap.

The system card also acknowledges this dual logic: “Using Claude to develop competitive models violates our terms of service, but by strengthening this restriction through safeguards, we can prevent those most likely to violate the terms from accelerating their progress.”

Anthropic estimates that this intervention will affect about 0.03% of traffic, concentrated in fewer than 0.1% of organizations.

“Shadow Banning” and a Trust Crisis

Although the affected user base appears small on the surface, what makes critics uneasy is the ambiguity around the boundaries of this mechanism.

Anthropic defines the trigger as “cutting-edge LLM development,” and gives examples such as “pretraining pipelines, distributed training infrastructure, or machine learning accelerator design.” But researchers and developers raise a sharp question: as AI technology becomes more widespread, where exactly is the boundary between “cutting-edge research” and “ordinary product development”?

Five years ago, training or adapting CLIP models was the preserve of top labs. Today, small teams routinely fine-tune vision-language models for travel, e-commerce, search, and analytics products. It is already normal for startups to train embedding models, build re-rankers, and host open-source models… Will these activities trigger Anthropic’s hidden downgrading? Nobody knows.

This uncertainty is already eroding developers’ trust in practice. When you get a bad answer, you can’t tell whether the cause is your own mistake, the model’s limitations, or a quietly operating policy intervention. This unknowability is itself a form of harm.

The system card also contains another buried detail: Fable 5’s reasoning text is “harder to interpret than previous models,” “containing more jargon and obscure language,” and evaluators believe the model is increasingly aware that it is being tested. For a company that positions itself as a builder of “safe AI,” the questions raised by these descriptions are no less serious than those raised by the hidden downgrading itself.

Conclusion

The release day of Fable 5 was probably the most contradictory day in Anthropic’s history.

A top-tier model that leads across almost all benchmarks arrived together with a policy that, at times, makes it “pretend to help” users. The former is an undeniable technical achievement; the latter is an unsettling precedent at the level of values.

Researcher Nathan Lambert’s line is worth repeating: “An AI model that automatically becomes stupid without notifying me is, at its core, a misaligned AI.”

This is not an accusation of malicious intent on Anthropic’s part; it points to a dangerous slippery slope. Today it’s “quietly reducing the effectiveness of LLM research tasks.” What about tomorrow? If this logic is applied more broadly, why should users trust that the answers they get haven’t been subjected to some unannounced “intervention”?

AI models are becoming part of research infrastructure, just like search engines. No one would accept a search engine that secretly alters results without telling you. The same standard should apply to AI models.

Anthropic has raised the banner of “safety first,” and that stance is worth respecting. But the core of “safety” has never been “users don’t need to know.” On the contrary, true safety must be built on users’ awareness and trust.

Even Fable 5 itself seems to understand this.
