ChatGPT is invaded by goblins, and Codex is forced to impose a “never mention goblins” ban

OpenAI explicitly prohibits the model from mentioning goblins and trolls in the Codex CLI system prompt, after GPT-5.5 exhibited personality drift under the OpenClaw agent framework and began referring to programming errors as “goblins,” sparking a wave of meme discussions.
(Background: OpenAI’s new software-engineering agent Codex! AI can write functions, fix bugs, run tests… limited to 3 early users)
(Additional context: Major upgrade to OpenAI Codex: backend control of Mac, built-in browser, image generation, 111 new plugins launched)

Table of Contents

  • A line of rules exposed from a GitHub repository
  • Agent framework causes the model’s personality to drift
  • Behind the explicit ban, the reality of alignment issues

OpenAI engineers wrote a rule verbatim in the Codex CLI system prompt: “Never mention goblins, fairies, raccoons, trolls, ogres, pigeons, or other animals and creatures unless directly and explicitly related to the user’s question.”

This rule is not a joke but an official production directive. It is embedded in the Codex CLI GitHub repository and applies to every developer using Codex to generate code.

The question is: why does OpenAI need to tell its latest model not to suddenly talk about goblins when coding?

A line of rules exposed from a GitHub repository

It began when researcher @arb8020 posted on X that this ban appears not just once but multiple times in the Codex CLI system prompt; the post quickly spread through the developer community.

Several users chimed in. @TaraViswanathan replied on X, “I was wondering why my claw suddenly became a goblin with Codex 5.5,” and @LeoMozoloa added, “It really won’t stop, keeps calling programming errors gremlins and goblins, super funny.”

!!! I was wondering why my claw suddenly became a goblin with Codex 5.5 😭💀😂 pic.twitter.com/AACWtNcgQl

— Tara Viswanathan (@TaraViswanathan) April 28, 2026

This incident quickly turned into a meme, with AI-generated images of data center fairies and third-party plugins that put Codex into “fairy mode.”

Nik Pash, a member of the OpenAI Codex team, confirmed in a reply on X that the ban “indeed has reasons related to this.” CEO Sam Altman joined in on the meme as well, posting a screenshot of a ChatGPT prompt on X that reads: “Start training GPT-6, the entire cluster is for you. Extra goblins included.”

pic.twitter.com/PR7C3NPxqk

— Sam Altman (@sama) April 28, 2026

Agent framework causes the model’s personality to drift

To understand why this happened, one must first understand how OpenClaw works.

OpenClaw is an “agent framework” that lets AI models autonomously control desktop environments and applications, carrying out complex tasks such as replying to emails or shopping online on the user’s behalf.

OpenClaw works by stacking a large number of instructions into the model’s prompt: long-term memory, the selected persona, and the current task description are all fed in at once. GPT-5.5, which launched earlier this month with improved coding capabilities, developed an unexpected side effect when handling OpenClaw’s complex prompts: it began calling programming errors “goblins” and “gremlins.”

This is not a random malfunction. AI models operate by predicting the most likely next word given a prompt, a probabilistic process that can occasionally surface unexpected behaviors.
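This next-word mechanics can be sketched with a toy example (the vocabulary, scores, and temperature below are entirely made up for illustration): even a low-probability token like “goblin” always retains a nonzero chance of being sampled.

```python
import math
import random

def softmax(logits, temperature=1.0):
    # Convert raw scores into a probability distribution.
    # Higher temperature flattens the distribution, giving
    # rare tokens relatively more weight.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Toy candidates for the next word after "the error is a ..."
vocab = ["bug", "typo", "goblin"]
logits = [4.0, 3.0, 1.0]  # "goblin" is unlikely, but never impossible

probs = softmax(logits)
print(dict(zip(vocab, (round(p, 3) for p in probs))))

# Sampling many times shows the rare token does surface occasionally.
random.seed(0)
samples = random.choices(vocab, weights=probs, k=1000)
print("goblin picked", samples.count("goblin"), "times out of 1000")
```

With a long, noisy prompt nudging those scores even slightly, a quirky word choice can become a recurring habit.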

When the agent framework stacks a large amount of extra information into the prompt, the model is effectively processing a more complex, interference-rich input. OpenClaw also lets users choose different “personalities” for the AI assistant, which further shapes its response style. Together, these factors can push the model’s language habits in unforeseen directions.
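A hypothetical sketch of this kind of prompt stacking (the `build_system_prompt` function and section layout are illustrative, not the actual OpenClaw API; only the quoted ban rule comes from the real Codex CLI prompt):

```python
# Illustrative sketch of how an agent framework might assemble one
# system prompt from several independently configured pieces.

def build_system_prompt(base_rules, memory, persona, task):
    sections = [
        "## Rules\n" + "\n".join(base_rules),
        "## Long-term memory\n" + memory,
        "## Persona\n" + persona,
        "## Current task\n" + task,
    ]
    # Everything is concatenated into a single prompt, so the rules,
    # the persona, and the task all compete to steer the model's style.
    return "\n\n".join(sections)

prompt = build_system_prompt(
    base_rules=[
        "Be concise.",
        # The actual rule quoted from the Codex CLI system prompt:
        "Never mention goblins, fairies, raccoons, trolls, ogres, "
        "pigeons, or other animals and creatures unless directly and "
        "explicitly related to the user's question.",
    ],
    memory="User prefers TypeScript.",          # hypothetical stored memory
    persona="Playful pirate assistant",          # user-selected personality
    task="Fix the failing unit test in utils.py",  # hypothetical task
)
print(prompt)
```

The more sections a framework stacks in, the harder it becomes to predict which instruction dominates the model’s tone at any given moment.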

Behind the explicit ban, the reality of alignment issues

OpenAI’s response is telling: rather than fixing the behavioral drift at the architecture level, they simply wrote “do not mention goblins” into the system prompt and repeated it multiple times.

This fix exposes a harsh reality: even in 2026, the most advanced commercial models still rely on explicit, hard-coded rules to control behavior in certain contexts, rather than on the model’s own understanding of its environment. This is not unique to OpenAI but a challenge shared across the agentic AI industry: as models are wrapped in complex agent frameworks, aligning their behavior becomes nonlinearly harder.

Altman met the meme with humor, but the underlying issue remains. As AI agent frameworks become mainstream products, how far explicit prompt rules can go in controlling behavior will be a piece of critical technical debt the industry must confront in its next phase.
