The Unavoidable Hallucinations of Large Language Models
By Davis Yoshida (Former Machine Learning Engineer, Continua AI)
Geoffrey Hinton once noted that humans often "remember" plausible but incorrect memories with misplaced confidence. As it turns out, Large Language Models (LLMs) do the exact same thing. At Continua AI, where we build social AI systems, understanding—and combating—LLM hallucination isn’t academic; it’s core to our mission.
Why do even the most advanced models confidently invent facts, misinterpret user history, or ignore explicit instructions? The answer lies in their lifecycle and how they process context. Let’s walk through three critical phases:
1. Early Education: Hallucination as Default
During pre-training, an LLM consumes vast amounts of internet text and is constantly forced to predict the next token from incomplete information. Consider a snippet like this one, encountered mid-training:
smith
Birthday: 01/01/1990
Education: PhD @ UC Davis
Place of residence: Seattle, WA
Occupation:
The model might reasonably conclude "I can't know that," but the completion the training objective rewards is "Machine Learning Engineer." The lesson? Plausible guessing is rewarded; calibrated skepticism is punished. This foundational experience hardwires hallucination as a default behavior.
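To make that incentive concrete, here is a toy sketch (made-up probabilities, no real model) of how the next-token objective scores possible completions of "Occupation:". Only the token that actually appeared in the training document counts; honesty about missing information never does.

```python
import math

# Hypothetical probabilities a model might assign to completions of "Occupation:".
predicted_probs = {
    "Machine Learning Engineer": 0.20,  # plausible guess
    "Software Engineer": 0.30,          # also plausible
    "I can't know that": 0.05,          # honest, but never what a bio says next
}

observed_next = "Machine Learning Engineer"  # what the document actually said

for completion, p in predicted_probs.items():
    loss = -math.log(p)  # cross-entropy contribution if this were the target
    marker = "  <- the only completion the objective ever scores" if completion == observed_next else ""
    print(f"{completion!r}: loss = {loss:.2f}{marker}")

# The objective never asks "did you have enough information?"; it only rewards
# putting probability mass on whatever the document happened to say next.
```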
"Hallucination is nearly inevitable in a 'raw' LLM. Its most common lived experience is predicting the unknowable," notes Yoshida.
2. The Context Rug Pull: When Systems Betray the Model
Post-deployment, LLMs rely on systems like Retrieval Augmented Generation (RAG) for real-time data. But flawed implementation breeds confusion:
User: Based on movies we've talked about, what should I watch?
<retrieved_content>
User loves historical fiction...
</retrieved_content>
Assistant: Have you seen Braveheart?
...
[Later, without retrieved content]
User: What should I get for dinner?
Assistant: Since you like Italian, try Maggiano's!
User: What? I hate Italian food!
Here's the failure: the retrieved content that grounded the earlier recommendation was never written back into the chat history. On the next turn, the model sees a transcript in which its past self confidently asserted user preferences with no visible evidence, so asserting a brand-new, invented preference looks like perfectly consistent behavior. The model isn't lying; it's imitating a pattern the system created by letting its grounding vanish.
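Here is a minimal sketch of how this bug tends to look in application code. The `answer` helper, the `fake_llm` stand-in, and the message format are all hypothetical; the point is only that the retrieved evidence is injected for a single call and never persisted.

```python
history = []  # what actually gets persisted between turns

def answer(user_msg, retrieved, llm):
    """One chat turn. `llm` is a stand-in for any chat-completion call."""
    prompt = list(history)
    if retrieved is not None:
        # Evidence is injected for THIS call only...
        prompt.append({"role": "system",
                       "content": f"<retrieved_content>{retrieved}</retrieved_content>"})
    prompt.append({"role": "user", "content": user_msg})
    reply = llm(prompt)

    # ...BUG: it is never written back, so later turns show an assistant that
    # asserted preferences with no visible basis, and the model imitates that.
    history.append({"role": "user", "content": user_msg})
    history.append({"role": "assistant", "content": reply})
    return reply

def fake_llm(messages):
    # Stand-in model so the sketch runs end to end.
    return "Have you seen Braveheart?"

answer("Based on movies we've talked about, what should I watch?",
       "User loves historical fiction...", fake_llm)
print([m["role"] for m in history])  # ['user', 'assistant'] -- the evidence is already gone
```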
3. The Sliding Window Trap: Hallucination by Amnesia
To manage costs and context limits, many systems use a sliding window of recent messages. This induces catastrophic hallucinations when prior grounding details scroll out of view:
[Tool result: Restaurant info]
...
User: When do they close?
Assistant: Until 9 PM, but call for holiday hours.
User: What’s their phone number?
Assistant: (206) 555-1234
User: What’s their most popular dish?
When models were given this transcript with the tool result already outside the window:
- Claude 3.5 Sonnet admitted ignorance (correct).
- Claude 3.7 Sonnet hedged (non-committal).
- GPT-4o & GPT-4o-mini hallucinated dishes like "spicy garlic butter shrimp pasta" (confidently wrong).
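The trap can be as simple as a list slice. A minimal sketch, assuming a hypothetical message list and an arbitrary window size:

```python
conversation = [
    {"role": "tool", "content": "Restaurant info: hours, phone number, menu..."},
    {"role": "user", "content": "When do they close?"},
    {"role": "assistant", "content": "Until 9 PM, but call for holiday hours."},
    {"role": "user", "content": "What's their phone number?"},
    {"role": "assistant", "content": "(206) 555-1234"},
    {"role": "user", "content": "What's their most popular dish?"},
]

MAX_MESSAGES = 5  # keep only the most recent messages to control cost
window = conversation[-MAX_MESSAGES:]  # the tool result is the first thing to go

print([m["role"] for m in window])
# ['user', 'assistant', 'user', 'assistant', 'user']
# The model now sees an assistant that confidently cited hours and a phone number
# with no visible source, so inventing a "most popular dish" looks consistent.
```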
Why Can’t We Fix This in Training?
Supervised Fine-Tuning (SFT) tries to curb hallucinations by showing the model examples of ideal responses. But teaching a model when to say "I don't know" risks suppressing knowledge it genuinely acquired during pre-training. It's a fragile balancing act, as OpenAI's own ongoing struggles with hallucination in production models demonstrate.
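To see why the balance is fragile, consider two hypothetical fine-tuning pairs (illustrative only, not drawn from any real dataset):

```python
sft_examples = [
    {
        "prompt": "What's the most popular dish at the restaurant we discussed?",
        "target": "I don't have that information.",  # teaches calibrated refusal
    },
    {
        "prompt": "What year did Apollo 11 land on the Moon?",
        "target": "1969",  # real knowledge we want the model to keep asserting
    },
]
# Overweight the first kind and the model starts refusing questions it could
# answer; underweight it and confident hallucinations persist.
```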
The Path Forward: Append-Only Context
At Continua, we've adopted a critical principle: never discard information used to generate a response. Chat history must be append-only: no deletions, no edits. This prevents the model from "forgetting" the basis of its own earlier answers. While sliding windows remain a cost-effective necessity for now, solutions such as selective context retention and prompt caching, which offsets the cost of keeping long histories intact, are under active development.
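As a rough sketch of the principle (a hypothetical class, not Continua's actual implementation), an append-only history simply exposes no way to delete or rewrite anything the model has already seen:

```python
class AppendOnlyHistory:
    """Chat history that records every message shown to the model, permanently."""

    def __init__(self):
        self._messages = []

    def append(self, role: str, content: str) -> None:
        """Record a message. There is deliberately no delete or edit method."""
        self._messages.append({"role": role, "content": content})

    def messages(self) -> list:
        """Return a copy so callers can't mutate the record in place."""
        return list(self._messages)

history = AppendOnlyHistory()
history.append("tool", "<retrieved_content>User loves historical fiction...</retrieved_content>")
history.append("user", "Based on movies we've talked about, what should I watch?")
history.append("assistant", "Have you seen Braveheart?")
# Every later turn is generated from history.messages(), so the evidence behind
# "Braveheart" never silently disappears.
```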
LLMs aren’t "stupid." They’re probabilistic engines navigating imperfect systems. Understanding their constraints—especially how context management dictates reliability—is the first step toward building AI that users can truly trust.
Source: Seeing Like an LLM, Continua AI Blog