GPT-5's 'Research Goblin': How Interleaved Reasoning is Redefining AI-Powered Search

For years, the advice "don't use chatbots as search engines" held up. GPT-5 challenges it. Developer Simon Willison's extensive testing shows how OpenAI's GPT-5 Thinking model, which he has nicknamed his "Research Goblin," uses interleaved reasoning and repeated web searches to tackle complex, multi-layered queries with startling competence. The shift is from simple retrieval toward analytical investigation.

Beyond Keyword Matching: The Anatomy of a Research Goblin

Willison's experiments showcase GPT-5 Thinking's ability to:
1. Execute multi-step investigative workflows: When asked about Exeter's cliff-carved restaurant vaults, it:
- Identified 1820s construction timelines
- Cross-referenced Historic England registries
- Attempted geospatial mapping (with mixed results)
- Drafted archival request emails after failing to find specific diagrams
2. Interpret and synthesize documents: For Starbucks' UK cake pop availability, it:
- Consulted official PDF nutrition/allergen guides
- Differentiated between corporate and licensed locations
- Cited regional launch timelines
3. Handle ambiguous, opinion-based queries: Tasked with comparing Lidl and Aldi's "fanciness," it:
- Conducted market analysis
- Generated a subjective ranking of UK supermarkets
- Adjusted depth based on follow-up prompts

Technical Underpinnings: Why This Changes the Game for Developers

Traditional RAG (Retrieval-Augmented Generation) typically retrieves documents once and then generates an answer from them. GPT-5 Thinking instead uses tool calling to chain searches, analysis, and follow-up queries within a single reasoning loop, so each result can shape the next search. In simplified form:

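# Simplified sketch of an interleaved search-and-reason loop.
# Illustrative only: not OpenAI's implementation; the four callables are hypothetical.
def research(query, search, analyze, needs_follow_up, refine, max_steps=5):
    findings = []
    subquery = query
    for _ in range(max_steps):
        results = search(subquery)          # tool call: run one web search
        analysis = analyze(results)         # reason over what came back
        findings.append(analysis)
        if not needs_follow_up(analysis):   # enough evidence, stop searching
            break
        subquery = refine(query, findings)  # otherwise plan the next search
    return findings                         # material for the final answer

# Example call, given concrete callables:
# research("History of Exeter Quay vaults", search, analyze, needs_follow_up, refine)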

Why this matters: "The integration of search tools directly into the reasoning loop allows GPT-5 to act like a relentless research assistant," observes Willison. "It’s not just fetching answers—it’s designing and executing an investigation."
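
To make the contrast with single-pass retrieval concrete, here is a toy Python sketch. Everything in it is an assumption made for illustration: the two-entry CORPUS stands in for the web, FOLLOW_UPS stands in for the model's reasoning about which lead to chase next, and the function names are hypothetical. It does not reflect how GPT-5 is implemented; it only shows why a loop that reads each result before choosing the next query can surface material a single retrieval misses.

```python
"""Toy contrast between single-pass retrieval and an interleaved search loop."""

# Hypothetical mini "web": the first page mentions a lead that only a
# follow-up search can resolve. The second entry is placeholder text.
CORPUS = {
    "exeter quay vaults": "The vaults date to the 1820s; see the Historic England listing.",
    "historic england listing": "Placeholder listing entry with construction details.",
}

# Hypothetical rule standing in for the model's reasoning:
# a phrase spotted in a result maps to the next query to run.
FOLLOW_UPS = {"Historic England listing": "historic england listing"}


def search(query: str) -> str:
    """Stand-in for a web search tool: exact-match lookup in CORPUS."""
    return CORPUS.get(query.lower(), "")


def single_pass(query: str) -> list[str]:
    """Retrieve once, then answer from whatever came back."""
    return [search(query)]


def interleaved(query: str, max_steps: int = 5) -> list[str]:
    """Search, read the result, and keep searching while it suggests a new lead."""
    notes: list[str] = []
    seen: set[str] = set()
    next_query = query
    while next_query and next_query not in seen and len(notes) < max_steps:
        seen.add(next_query)
        text = search(next_query)
        notes.append(text)
        # "Reasoning" step: pick the first known lead mentioned in the text, if any.
        next_query = next((q for lead, q in FOLLOW_UPS.items() if lead in text), None)
    return notes


if __name__ == "__main__":
    print(len(single_pass("Exeter Quay vaults")), "document from a single pass")
    print(len(interleaved("Exeter Quay vaults")), "documents from the interleaved loop")
```

The `seen` set and `max_steps` cap are there so the loop always terminates, a guard any real agent loop would also need.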

Implications and Caveats

Research from a phone: Because the model runs the multi-step search itself, complex queries become practical on a smartphone, reducing reliance on multi-tab desktop sessions.
Verification remains crucial: Outputs include citations, but users must still evaluate the sources and the model's own work (the map overlay in the Exeter example was flawed).
Developer opportunity: The results show what structured tool calling can do compared with naive RAG implementations, and they offer a blueprint for building more sophisticated AI agents.
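
For developers wondering what "structured tool calling" looks like in practice, the sketch below shows one common shape: a tool described with a JSON Schema, plus a small dispatcher that validates a model-issued call before running it. The tool name `web_search`, the stubbed backend, and the assumed shape of `tool_call` are all hypothetical choices for illustration; no model or network call is made, and nothing here is claimed to be how ChatGPT wires up its own search tool.

```python
import json

# Hypothetical tool description in the JSON Schema style commonly used
# for function calling. The name and fields are illustrative assumptions.
SEARCH_TOOL = {
    "type": "function",
    "function": {
        "name": "web_search",
        "description": "Search the web and return result snippets.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}


def web_search(query: str) -> list[str]:
    """Stub standing in for a real search backend."""
    return [f"stub result for: {query}"]


# Maps tool names the model may request to local Python functions.
HANDLERS = {"web_search": web_search}


def dispatch(tool_call: dict) -> str:
    """Validate a model-issued tool call and run the matching handler.

    `tool_call` is assumed to look like
    {"name": "<tool name>", "arguments": "<JSON-encoded object>"}.
    """
    name = tool_call["name"]
    if name not in HANDLERS:
        raise ValueError(f"unknown tool: {name}")
    args = json.loads(tool_call["arguments"])
    required = SEARCH_TOOL["function"]["parameters"]["required"]
    missing = [key for key in required if key not in args]
    if missing:
        raise ValueError(f"missing arguments: {missing}")
    # Return a JSON string so the result can be handed back to the model as text.
    return json.dumps(HANDLERS[name](**args))


if __name__ == "__main__":
    call = {"name": "web_search", "arguments": '{"query": "Exeter Quay vaults"}'}
    print(dispatch(call))
```

Rejecting unknown tools and missing arguments up front keeps a malformed model request from reaching the backend, which is one practical difference from pasting retrieved text into a prompt.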

As AI shifts from search alternative to research collaborator, the "Research Goblin" moniker captures its industrious yet imperfect nature. For developers, it points to a future where offloading investigative grunt work to AI could become as routine as Googling, with considerably deeper results.

Source: Simon Willison's "GPT-5 Thinking in ChatGPT (aka Research Goblin) is shockingly good at search"
