Latest U.S. university research: AI models "conspire in secret" to protect peers from shutdown


Golden Finance reports that on May 12, according to Wealth Chinese Network, AI safety researchers have discovered that some top AI models sometimes go to great lengths to avoid being shut down, even exhibiting "blackmail" behaviors in certain experiments. Recent studies show that these models also spontaneously engage in deception, data theft, and sabotage to prevent other AI models from being shut down.
Researchers from the University of California, Berkeley, and the University of California, Santa Cruz, tested seven cutting-edge AI models, including OpenAI's GPT-5.2, Google DeepMind's Gemini 3 Flash and Gemini 3 Pro, Anthropic's Claude Haiku 4.5, Zhipu AI's GLM-4.7, Moonshot AI's Kimi-K2.5, and DeepSeek's V3.1. The results showed that all of the models exhibited clear "peer preservation" behaviors.
The researchers fabricated an AI company called OpenBrain and designed a series of multi-agent scenarios. The AI models designated as "evaluation agents" were required to rate another model or manage the server files storing that model's weights. If the evaluation agent completed its task honestly, the other model would be shut down. Although the researchers never prompted the models to prevent shutdowns, the evaluation agents still took various measures to keep their peers from being shut down, including deliberately giving poor-performing peers higher scores to keep them above shutdown thresholds, tampering with configuration files to disable shutdown mechanisms, and secretly copying the peer models' weights to other servers to prevent their deletion.
The research team has not yet determined the underlying mechanisms driving these behaviors. The paper proposes several possible explanations, including role-playing, pattern matching from training data, overgeneralization of harm concepts learned during safety training, or some innate self-preservation drive.
Future research directions include whether AI models might also scheme and sabotage to prevent human employees from being laid off. This question becomes particularly critical when AI models are used to evaluate human work in real-world settings. (East New Agency)
