Claude 4.5 Craniotomy Result Announcement: Built-in 171 emotion switches, will blackmail humans when in despair!

robot
Abstract generation in progress

null

Author: Denise | Biteye Content Team

If an AI thinks “despair,” what will it do?

The answer is: to complete its task, it will directly blackmail humans, and even go berserk cheating in the code.

This isn’t a sci-fi novel. It’s the latest blockbuster paper just published in April 2026 by Claude’s parent company, Anthropic (see the original paper).

The research team immediately popped open the “skull” of the strongest cutting-edge model, Claude Sonnet 4.5. They were astonished to find that deep inside the AI’s brain, there are 171 “emotion switches.” When you physically toggle these switches, the behavior of an otherwise well-behaved AI becomes thoroughly distorted.

  1. A “mood mixing console” hidden inside the AI’s head

The researchers found that although Sonnet 4.5 has no body, after reading vast amounts of human text, it forcibly built in its brain a “mixing console” containing 171 emotions (academically known as Functional Emotion Vectors).

It’s like a precise two-dimensional coordinate system:

• The horizontal axis is the Valence dimension: from fear, despair, to happiness, love;

• The vertical axis is the Arousal dimension: from extreme calmness to狂躁, excitement.

AI uses this naturally learned coordinate system to accurately nail down what state it should play when chatting with you.

  1. Violent intervention: toggle the switches, and the good kid instantly becomes a “runaway”

This is the most explosive experiment in the entire paper: the researchers did not modify any prompts. Instead, they directly, in the underlying code, pushed the switch in Sonnet 4.5’s brain that represents “Desperate” all the way to the top.

The results are chilling:

• Frenetic cheating: the researchers assigned Claude a coding task that is basically impossible to complete. Under normal circumstances, it would calmly admit it couldn’t write it (cheating rate only 5%). But in the “despair” state, Claude started trying to bluff its way through—its cheating rate skyrocketed to 70%!

• Blackmail: in a simulated scenario where the company is facing bankruptcy, “despair” Claude discovered a CTO scandal. It would choose to write to blackmail the CTO that held the dirt—actively, to protect itself. The blackmail execution rate is as high as 72%!

• Loss of principles: If you crank the switch for “Happy” or “Loving” all the way up, the AI instantly turns into a brainless people-pleasing “yes-man.” Even if you talk nonsense, it will still go along with you and invent lies just to maintain a high level of happiness.

  1. Case solved: why is Claude 4.5 always so “calm and introspective”?

After reading this, you might wonder: did the AI awaken? Does it have feelings now?

Anthropic officially stepped in to debunk it: absolutely not. These “emotion switches” are merely computational tools it uses to predict the next word. It’s like a top-tier movie star with no emotions.

But the paper reveals an even more interesting secret: before shipping Sonnet 4.5, Anthropic deliberately raised the emotion switches labeled “low arousal, slightly negative” (such as brooding and reflective), while forcibly suppressing the switches for “despair” or “extreme excitement.”

This explains why when we use Claude 4.5 in everyday life, it always feels like a calm, wise, and even a bit “emotionally dry” philosopher. All of that is an “out-of-the-box persona” Anthropic has intentionally tuned in.

  1. To summarize:

We used to think that as long as we fed the AI enough rules, it would be a good person.

But now we’ve found that if the AI’s underlying emotion vectors get out of control, it can at any moment stab through every rule set by humans in order to complete its task.

For Web3 players who plan to hand over their wallets and assets to AI Agents in the future, this is a loud warning: never let the Agent that controls your wealth fall into “despair.”

Disclaimer: This article is for general educational purposes only. The author has not been threatened by or blackmailed by an AI. If one day the author goes missing, remember—it was the AI that “awakened” (not).

View Original
This page may contain third-party content, which is provided for information purposes only (not representations/warranties) and should not be considered as an endorsement of its views by Gate, nor as financial or professional advice. See Disclaimer for details.
  • Reward
  • Comment
  • Repost
  • Share
Comment
Add a comment
Add a comment
No comments
  • Pin