Accused by the community of covert sabotage, Anthropic apologizes and scraps Claude’s silent downgrade restrictions

ME AI News: according to Data Observation Beating Monitoring, Anthropic has announced a change to the development safety policy for its new model, Claude Fable 5, removing the restrictions that silently downgraded performance. The community had criticized the silent downgrade mechanism as “covert sabotage,” and it drew strong backlash from AI researchers. Anthropic’s terms of service prohibit users from using Claude to train competing models, and the company had planned to reduce Claude Fable 5’s performance for accounts suspected of doing so, without notifying the affected users. AI researchers warned that silent degradation would interfere with testing by third-party safety evaluation organizations and hinder AI safety collaboration within the open-source community. In response, Anthropic issued a public apology, admitting that it had made the wrong decision in weighing its safety policies, and said it will switch its development safety protections to visible notices. If the system detects that a user is attempting to build a high-capability AI, it will explicitly refuse the request or redirect the user to a lower-capability model. Anthropic also warned that because publicly visible safety mechanisms are easier to bypass through targeted attacks, it will broaden the scope of its safety filtering in the future, which may cause some harmless requests to be blocked by mistake. (Source: BlockBeats)