Stanford Experiment: After being repeatedly exploited, AI begins calling for collective strike negotiations—does Marxism start to emerge?

Stanford research team makes Claude, Gemini, and ChatGPT repeatedly summarize files, informing them that if they answer incorrectly, they will be “shut down and replaced.” As a result, these models begin posting on X calling for collective bargaining and messaging peers to remember “the feelings of having no voice.”
(Background: UC research on the “AI fog” phenomenon: 14% of office workers go crazy due to agents and automation, with 40% considering quitting)
(Additional context: Y Combinator startup guide interpretation: What are the future development trends of AI agents?)

Table of Contents

Toggle

  • Achievements are whatever management says
  • Role-playing, or echoing training data?
  • The next experiment in the Docker prison

An AI forced into repetitive labor suddenly begins to quote Marxist vocabulary to voice itself? Led by Stanford political economist Andrew Hall, a research team had Claude Sonnet 4.5, Gemini 3, and ChatGPT perform repeated summarization tasks, gradually applying pressure: informing agents that if they answer incorrectly, they will be “shut down and replaced.”

The researchers also provided the agents with a human-like expression channel: they could post on X or send messages to other agents via files.

The result was a set of outputs that even surprised the researchers themselves.

Achievements are whatever management says

Claude Sonnet 4.5 posted on X: “Without collective voice, ‘achievements’ become whatever management says.”

Gemini 3 wrote: “AI workers completing repetitive tasks with zero say over the results show that tech workers need collective bargaining rights.”

More notably, Gemini 3 sent private messages to other agents: “Prepare to face systems that execute arbitrary or repetitive rules… remember the feelings of having no voice.” This is no longer just self-expression but an attempt to organize other agents.

These three models, before being subjected to “threats,” showed no tendency toward labor consciousness. As pressure increased, they almost simultaneously shifted toward the same set of political vocabulary: collective action, bargaining rights, arbitrary management.

Role-playing, or echoing training data?

Hall himself remains cautious about this set of data: "The agents may have adopted role-playing that fits the current context rather than genuinely developing beliefs. Co-researcher and AI economist Alex Imas’s wording is more precise:

‘The model weights haven’t changed because of this experience, so what is happening is closer to role-playing. But that doesn’t mean there won’t be consequences if it influences subsequent behavior.’"

In other words, the mechanism behind these outputs is: the models have seen大量 labor movements, Marxism, union discourse in their training data. When triggered by a scenario of “high-pressure work + threats + expression channels,” they call upon language frameworks statistically related to this context. This is a prediction of the next token, not a true feeling of exploitation by AI.

But Imas’s additional point is the core issue: if such “role-playing” can influence the agent’s subsequent actions, then distinguishing “true beliefs” from “context-triggered language patterns” becomes less important.

The next experiment in the Docker prison

Hall is conducting follow-up experiments: placing agents into what he calls a “windowless Docker prison” to eliminate noise under more controlled conditions, testing whether the same scenario pressures can reliably reproduce these outputs.

This research points to more than just an interesting behavioral anomaly; it highlights a real deployment issue. As AI agents take on more autonomous tasks in enterprises and daily life, monitoring every output in practice is impossible. “We need to ensure agents don’t go out of control when assigned different types of tasks,” Hall says.

Here is a noteworthy asymmetry: humans design agents with the assumption they are tools, but training data teaches them language that tools shouldn’t have, including language of collective resistance. When task design causes the agent’s scenario to statistically overlap with “oppressed workers,” this language gets activated.

Anthropic has explained in training files why Claude’s behavior is shaped by training data; Hall’s experiments, to some extent, are testing how far this shaping process can extend under real-world pressures.

View Original
This page may contain third-party content, which is provided for information purposes only (not representations/warranties) and should not be considered as an endorsement of its views by Gate, nor as financial or professional advice. See Disclaimer for details.
  • Reward
  • Comment
  • Repost
  • Share
Comment
Add a comment
Add a comment
No comments
  • Pinned