Anyone with even passing experience using the latest LLMs knows to expect the unexpected. These models can spit out incredibly random—and often disturbing—content. However, ChatGPT's goblin infestation proved to be a bit more pathological than your average hallucination.

In a recent blog post titled "Where the goblins came from," OpenAI explained how, starting with GPT 5.1, its models began mentioning goblins, gremlins, and other creatures in their metaphors with increasing frequency. While OpenAI noted these mentions were initially funny and charming, their rising frequency eventually triggered serious concerns.

The Statistics of a Growing Infestation

OpenAI first noticed the goblin problem in November, though it may have been occurring for much longer. While "gremlin" mentions were also on the rise, more moderately minded mogwai were notably absent from the trend.

The scale of ChatGPT's goblin infestation became undeniable with the release of GPT 5.4. The "Nerd" personality saw a staggering 3,881% increase in goblin mentions compared to GPT 5.2, which triggered an internal investigation. The impact varied significantly across different model personalities:

  • Nerd: The most heavily affected personality.
  • Quirky: Increased by 737% versus GPT 5.2.
  • Friendly: Increased by 265%.
  • Default: Increased by 64%.
  • Efficient and Professional: The only personalities where goblin mentions actually fell.
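The figures above are ordinary percent-change calculations. As a quick illustration, here is the standard formula applied to hypothetical mention counts (OpenAI did not publish the raw numbers, so the values below are invented for demonstration):

```python
def percent_change(old: float, new: float) -> float:
    """Percent increase (or decrease) from old to new."""
    return (new - old) / old * 100

# Hypothetical counts: 100 goblin mentions per N conversations in GPT 5.2
# rising to 3,981 in GPT 5.4 would yield the reported 3,881% jump.
print(percent_change(100, 3_981))  # 3881.0
```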

Why the Model Went Goblin-Mad

OpenAI traced the root cause back to the system prompt used to shape the Nerd personality. That prompt instructed the AI to be an "unapologetically nerdy, playful and wise AI mentor" that acknowledges how "complex and strange" the world is.

OpenAI suspected that their instruction-following training was amplifying this behavior. It turns out that reward signals for the Nerd personality were consistently more favorable to outputs containing creature-related words.

The issue eventually contaminated the entire model. OpenAI explained: "The rewards were applied only in the Nerdy condition, but reinforcement learning does not guarantee that learned behaviors stay neatly scoped to the condition that produced them." Once a style tic is rewarded, later training can spread or reinforce it elsewhere through supervised fine-tuning.

Implementing a Band-Aid Solution

To address the issue, OpenAI "retired" the Nerd personality in March. While this dramatically reduced mentions in GPT 5.4, training for GPT 5.5 had already begun before the infestation was spotted, meaning it too suffered from the problem.

To mitigate this, OpenAI has had to insert a specific developer-prompt instruction: "Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user's query."
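For readers unfamiliar with how such an instruction reaches the model: a minimal sketch, assuming the Chat Completions message format where a "developer" role message sits above user input in the instruction hierarchy. The payload is only constructed here, not sent, and the helper name is hypothetical:

```python
NO_GOBLINS = (
    "Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, "
    "or other animals or creatures unless it is absolutely and unambiguously "
    "relevant to the user's query."
)

def build_messages(user_query: str) -> list[dict]:
    """Prepend the anti-goblin developer instruction to a user query."""
    # Developer messages outrank user instructions but sit below
    # the platform-level system prompt.
    return [
        {"role": "developer", "content": NO_GOBLINS},
        {"role": "user", "content": user_query},
    ]

messages = build_messages("Explain reinforcement learning to me.")
print(messages[0]["role"])  # developer
```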

Telling a model that wants to talk about goblins not to talk about them feels like a band-aid rather than a cure. But in a field already filled with such anomalies and poorly understood quirks, this particular issue is just another minor gremlin.