GPT-5's 'Research Goblin': How Interleaved Reasoning Is Redefining AI-Powered Search
For years, the advice "don't use chatbots as search engines" held true, until GPT-5 arrived. Developer Simon Willison's extensive testing shows how OpenAI's GPT-5 Thinking model (affectionately nicknamed "Research Goblin") interleaves reasoning with web searches to tackle complex, multi-layered queries with startling competence. The result is a shift in AI-assisted research from simple retrieval toward analytical investigation.
Beyond Keyword Matching: The Anatomy of a Research Goblin
Willison's experiments showcase GPT-5 Thinking's ability to:
1. Execute multi-step investigative workflows: When asked about Exeter's cliff-carved restaurant vaults, it:
- Identified 1820s construction timelines
- Cross-referenced Historic England registries
- Attempted geospatial mapping (with mixed results)
- Drafted archival request emails after failing to find specific diagrams
2. Interpret and synthesize documents: For Starbucks' UK cake pop availability, it:
- Consulted official PDF nutrition/allergen guides
- Differentiated between corporate and licensed locations
- Cited regional launch timelines
3. Handle ambiguous, opinion-based queries: Tasked with comparing Lidl and Aldi's "fanciness," it:
- Conducted market analysis
- Generated a subjective ranking of UK supermarkets
- Adjusted depth based on follow-up prompts
Technical Underpinnings: Why This Changes the Game for Developers
Unlike traditional RAG (Retrieval-Augmented Generation), which typically retrieves once and then generates, GPT-5 Thinking uses tool calling to dynamically chain searches, analysis, and follow-ups within a single reasoning loop. In outline:
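# Simplified pseudocode of GPT-5's interleaved process (illustrative only)
query = "History of Exeter Quay vaults"
findings = []
for step in reasoning_chain:
    if needs_search(step):
        # Call the search tool from inside the reasoning loop
        results = execute_search_tool(query=query)
        analysis = analyze(results)
        findings.append(analysis)
        if requires_confirmation(analysis):
            # Refine the query so the next step searches for what is still missing
            query = generate_follow_up(analysis)
        else:
            break
answer = compile_final_answer(findings)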
Why this matters: "The integration of search tools directly into the reasoning loop allows GPT-5 to act like a relentless research assistant," observes Willison. "It’s not just fetching answers—it’s designing and executing an investigation."
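To make "tool calling" concrete, here is a minimal sketch of how a model-issued tool call might be routed to a local function. Everything in it is an assumption made for illustration: the `web_search` tool name, the `SEARCH_TOOL` description, the stubbed search backend, and the shape of the `tool_call` dictionary. It does not reproduce OpenAI's internal implementation or any specific API.

```python
import json
from typing import Callable, Dict, List

# Hypothetical tool description in the JSON-schema style commonly used for
# function calling. The name and fields are illustrative assumptions.
SEARCH_TOOL = {
    "name": "web_search",
    "description": "Search the web and return result snippets.",
    "parameters": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}


def web_search(query: str) -> List[str]:
    """Stub standing in for a real search backend."""
    return [f"snippet about {query}"]


# Registry mapping tool names to local Python callables.
TOOLS: Dict[str, Callable[..., List[str]]] = {"web_search": web_search}


def dispatch(tool_call: dict) -> List[str]:
    """Run one tool call of the assumed form {'name': ..., 'arguments': '<json>'}."""
    name = tool_call["name"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    args = json.loads(tool_call["arguments"])
    return TOOLS[name](**args)


call = {"name": "web_search", "arguments": json.dumps({"query": "Exeter Quay vaults"})}
print(dispatch(call))
```

In a real agent, the result of `dispatch` would be fed back to the model, which then decides whether to answer or issue another call. That feedback step is what puts the search inside the reasoning loop.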
Implications and Caveats
- Research on mobile: Because the model runs the multi-step searching itself, complex queries become practical on a smartphone, reducing reliance on multi-tab desktop sessions.
- Verification remains crucial: While outputs include citations, users must still evaluate sources and results (e.g., the flawed map overlay in the Exeter example).
- Developer opportunity: This showcases the power of structured tool calling over naive RAG implementations, and offers a blueprint for building more sophisticated AI agents.
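The contrast between naive RAG and an iterative tool-calling loop can be sketched with a toy example. The corpus contents, the `next_query` heuristic, and all function names below are hypothetical stand-ins. In a real agent, the model itself would decide on the follow-up query rather than a string match.

```python
from typing import List, Optional

# Toy corpus standing in for the web. All contents are invented for illustration.
CORPUS = {
    "exeter quay vaults": "The vaults are listed; see the heritage register.",
    "heritage register exeter quay": "Register entry: vaults cut into the cliff in the 1820s.",
}


def search(query: str) -> str:
    """Exact-match lookup standing in for a search tool."""
    return CORPUS.get(query.lower(), "")


def naive_rag(question: str) -> List[str]:
    """One retrieval, then answer from whatever came back."""
    return [search(question)]


def next_query(evidence: str) -> Optional[str]:
    """Stub for the model choosing a follow-up search (hypothetical heuristic)."""
    if "see the heritage register" in evidence:
        return "heritage register exeter quay"
    return None


def agentic_search(question: str, max_steps: int = 5) -> List[str]:
    """Search, inspect the result, and follow leads until none remain."""
    evidence: List[str] = []
    query: Optional[str] = question
    for _ in range(max_steps):
        if query is None:
            break
        result = search(query)
        evidence.append(result)
        query = next_query(result)
    return evidence


print(naive_rag("Exeter Quay vaults"))
print(agentic_search("Exeter Quay vaults"))
```

The single-shot version stops at the first document, which only points elsewhere. The loop follows that pointer and reaches the entry that actually carries the date. The `max_steps` cap is a design choice to keep such a loop from running indefinitely.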
As AI shifts from a search alternative to a research collaborator, the "Research Goblin" moniker captures its industrious yet imperfect nature. For developers, it signals a future where offloading investigative grunt work to AI could become as routine as Googling, but with considerably deeper results.
Source: Simon Willison's "GPT-5 Thinking in ChatGPT (aka Research Goblin) is shockingly good at search"