Accused by the community of covert sabotage, Anthropic apologizes and drops Claude’s silent performance downgrade

According to “Beating Monitoring,” Anthropic announced that it is adjusting the development safety strategy for its new model, Claude Fable 5, and removing the safeguard that silently degraded the model’s performance. The community had labeled the silent degradation mechanism “covert sabotage,” and it drew a strong backlash from the artificial intelligence research community.

Under Anthropic’s terms of service, users are not allowed to use Claude to train competing models. Anthropic had planned to reduce the performance of Claude Fable 5 for accounts suspected of training competing models, without notifying the affected users. AI researchers warned that silent performance reduction would interfere with testing by third-party safety evaluation organizations and hinder collaboration on AI safety within the open-source community.

In response to the criticism, Anthropic issued a public apology, acknowledging that it had made the wrong decision in weighing its safety strategy. It is changing its development safeguards so that they notify users openly: if the system detects that a user is attempting to build a high-capability AI, it will explicitly refuse the request or redirect the user to a lower-capability model. Anthropic warns that, because openly visible safeguards are easier to bypass through targeted attacks, it will broaden the scope of its future safety screening, which may also cause some normal, harmless requests to be blocked by mistake.
