Beyond Code Generation: The Undervalued Power of LLM-Powered Web Search

In the rapidly evolving landscape of artificial intelligence, Large Language Models (LLMs) have primarily been celebrated for their code generation capabilities. However, as developers and tech enthusiasts continue to explore these powerful tools, a lesser-known application is emerging as a game-changer: LLM-powered web search. While the conversation around AI-assisted programming dominates headlines, the potential of these models to revolutionize how we discover and consume information online remains largely unexplored.


The Research Revolution: LLMs as Information Curators

OpenAI's ChatGPT Deep Research, Google's Gemini Deep Research, and Anthropic's Research feature all operate on a similar principle: the user enters a prompt and perhaps answers a clarifying question; the model then searches the web through traditional search engines and generates a comprehensive report with source citations. This workflow represents a fundamental shift in how we approach information gathering.
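That loop can be sketched in a few lines. Everything below is illustrative: the `fake_search` stub and the report format stand in for the vendors' internal pipelines, not any real API.

```python
from dataclasses import dataclass

@dataclass
class SearchResult:
    title: str
    url: str
    snippet: str

def fake_search(keyword: str) -> list[SearchResult]:
    # Stand-in for a real search-engine API call.
    return [SearchResult(f"Result for {keyword}", f"https://example.com/{keyword}", "...")]

def deep_research(prompt: str, keywords: list[str]) -> str:
    """Toy version of the research loop: search once per keyword,
    collect sources, emit a report with numbered citations."""
    sources: list[SearchResult] = []
    for kw in keywords:
        sources.extend(fake_search(kw))
    # In a real tool an LLM would synthesize prose here; we just
    # stitch the citations together.
    lines = [f"Report: {prompt}", ""]
    for i, src in enumerate(sources, start=1):
        lines.append(f"[{i}] {src.title} - {src.url}")
    return "\n".join(lines)
```

The interesting engineering all hides inside the two stubbed steps: choosing keywords and deciding which results merit a follow-up fetch.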

For Ankur Sethi, a developer who has extensively experimented with these tools, LLM-powered search has become an indispensable part of their workflow. "I lean on it heavily when I'm looking for high-quality long-form writing by actual human beings on a topic that's new to me," Sethi explains. While they typically use Kagi for most search queries, they turn to LLMs when they are completely unfamiliar with a topic and unsure which keywords to search for.

The value proposition here is clear: LLMs can bridge the knowledge gap when traditional search fails. When exploring a new domain like Rust programming, for instance, knowing the right search terms to find authoritative information can be challenging. LLMs can navigate this uncertainty, providing curated results that might otherwise remain hidden.

Trust Through Transparency: The Power of Citation

Sethi's approach to LLM-generated information is rooted in skepticism. "I rarely ask LLMs factual questions because I don't trust the answers they return," they admit. This caution reflects a broader sentiment in the developer community regarding AI hallucinations—fabricated information presented as fact.

The critical differentiator for Sethi is the citation system employed by LLM-powered search tools. "I find it much easier to trust LLM-generated output when it cites web pages I can read myself," they note. This transparency allows verification of information against sources from "real people or institutions with real expertise."

Moreover, web search grounding provides access to more current information than what might be available in an LLM's training data. Though Sethi cautions that this isn't always guaranteed, the ability to cross-reference claims against live web pages adds a layer of reliability absent in pure LLM interactions.

The Surprising Bounty of the Hidden Web

Perhaps the most intriguing aspect of LLM-powered search is its ability to uncover what Sethi describes as "unexpected wonders from a web that search engines try their best to hide away." These discoveries include:

  • Personal websites last updated decades ago
  • Columns from long-defunct publications
  • Ancient blogs from platforms like Blogger and LiveJournal
  • Pages buried deep within corporate support sites
  • Lecture notes on university websites
  • PDFs from exposed wp-content directories

"I can't tell what search index it uses, what search keywords it uses under the hood, or how it decides what links are worth clicking," Sethi observes. Whatever the algorithmic secret sauce, these results often diverge dramatically from those returned by traditional search engines.

The ability to surface such diverse sources represents a significant advancement in information discovery. While mainstream search engines increasingly prioritize commercial content and popular sites, LLM-powered search appears to have a broader, more inclusive view of the web.

The Paradox of the Research Report

Despite their utility in finding sources, Sethi reveals an interesting paradox: "I don't actually care for the report Claude produces at the end of its research process. I almost never read it." Instead, their workflow involves skimming the report structure, opening all cited links in new tabs, and discarding the report itself.

The generated reports, described as "a whole lot of LLM slop," suffer from common AI writing pitfalls: "unreadable prose, needlessly verbose, often misrepresenting the very sources it quotes." This disconnect between the utility of the sources and the quality of the synthesis highlights an area for significant improvement in LLM capabilities.

Sethi articulates a clear need: "I wish there was a mode where the 'report' could just be a list of useful links." While this could potentially be achieved through clever prompting, a dedicated link-only mode would streamline the workflow and eliminate the need to parse unnecessary prose.
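Until such a mode exists, the skim-and-open-links workflow can be partly automated after the fact by scraping the URLs out of a generated report. A minimal sketch, assuming the report cites sources as markdown links or bare URLs:

```python
import re

# Matches markdown-style links [text](url) as well as bare http(s) URLs.
MD_LINK = re.compile(r"\[[^\]]*\]\((https?://[^\s)]+)\)")
BARE_URL = re.compile(r"(?<!\()(https?://[^\s)\]]+)")

def extract_links(report: str) -> list[str]:
    """Return the cited URLs from an LLM-generated report without
    duplicates: a 'link-only mode' applied after the fact."""
    seen: dict[str, None] = {}
    for url in MD_LINK.findall(report) + BARE_URL.findall(report):
        seen.setdefault(url.rstrip(".,"), None)  # trim trailing punctuation
    return list(seen)
```

Feed the result to a tab-opening script and the prose never needs to be read at all.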

The Mystery of the Vanishing Web

Among the most puzzling aspects of LLM-powered search is its ability to cite pages that are no longer online. "Sometimes Claude links to pages that aren't even online anymore! How is it able to cite these pages if it can't actually read them?" Sethi wonders.

In such cases, users often must resort to the Internet Archive to access cached versions. For example, one report linked to "On Outliners and Outline Processing" and "Getting Started Blogging with Drummer," both of which had disappeared from the live web.

This capability suggests that the underlying search index retains entries for pages that have since gone offline, or that the model is drawing on content it encountered during training. Either way, the implications for research and fact-checking are significant: it offers a pathway to information that would otherwise be lost to the digital void.

The State of LLM-Powered Search: A Neglected Frontier?

Sethi questions whether major LLM providers are truly committed to advancing their web search capabilities. "I certainly haven't seen any new changes made to them since they were introduced, and nobody seems to talk about them very much," they note.

This apparent neglect stands in contrast to the utility these tools provide. For Sethi, web search is "one of the main reasons I use LLMs at all," prompting them to pay for Anthropic's premium service despite limited evidence of investment in improving the research feature.

The lack of attention to this potentially transformative application raises questions about industry priorities. As AI companies compete on metrics like response accuracy and speed, the quality of their information retrieval capabilities may be an undervalued differentiator.

A Wishlist for the Future of LLM Search

Sethi has compiled an extensive wishlist of features that could transform LLM-powered search from a novelty into an indispensable tool:

  1. Direct Access: Stop hiding web search behind menus. Allow direct access through a "New search" button alongside "New chat" and "Code."

  2. Research Plan Editing: Enable users to edit the LLM's research plan before it begins searching—a feature Gemini partially implements.

  3. Keyword Control: Allow users to edit the keywords the LLM uses for searching or to automatically refine those keywords before beginning.

  4. Dynamic Clarification: Enable LLMs to interrupt their research process if they find information that re-contextualizes the original query, asking for clarifications.

  5. Raw Search Visibility: Let users examine the raw search results for each keyword the LLM searched.

  6. Link-Only Mode: Provide an option where the LLM selects the best search results and returns only a list of links, mimicking traditional search engines.

  7. Search Lenses: Implement "lenses" similar to those in Kagi, allowing users to limit sources by type (social media, personal blogs, news sites, academic journals).

  8. Source Ranking: Enable users to uprank, downrank, or ban certain sources, providing greater control over result quality.

These features represent a vision for LLM-powered search that combines the best of traditional search with the contextual understanding of AI, potentially creating a new paradigm for information discovery.
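Items 7 and 8 in particular reduce to a re-ranking pass over search results before synthesis. A toy sketch with hypothetical per-domain rules, nothing a provider actually ships:

```python
from urllib.parse import urlparse

def rerank(urls: list[str], boost: set[str], bury: set[str], ban: set[str]) -> list[str]:
    """Re-order results by per-domain preference, dropping banned
    domains entirely (a toy take on wishlist items 7 and 8)."""
    def score(url: str) -> int:
        host = urlparse(url).netloc
        if host in boost:
            return 0   # upranked: sort first
        if host in bury:
            return 2   # downranked: sort last
        return 1       # neutral: keep original order (sort is stable)
    kept = [u for u in urls if urlparse(u).netloc not in ban]
    return sorted(kept, key=score)
```

A "lens" is then just a prepackaged set of boost/ban rules, e.g. one that buries commercial domains and boosts personal blogs.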

The Competitive Landscape: Kagi, Perplexity, and Beyond

Sethi acknowledges alternative approaches to AI-enhanced search, including Kagi Assistant and Perplexity. However, they remain skeptical about whether these services can match the quality of results from major LLM providers.

"Maybe Kagi Assistant will grow into this in the future? Maybe I should try using Perplexity?" Sethi muses. "I've had meh experiences with both these products, and I'm not sure whether they can compete with the quality of results ChatGPT/Claude/Gemini surface."

This uncertainty highlights an emerging competitive tension in the AI search space. While specialized services like Kagi offer unique features and customization, the broad contextual understanding of major LLMs may provide an advantage in information synthesis and discovery.

Future Horizons: The Next Evolution of LLM Search

As LLM capabilities continue to evolve, their application to web search represents one of the most promising frontiers for AI-enhanced productivity. The ability to combine the contextual understanding of language models with the vast expanse of the web could fundamentally change how we approach research, learning, and information consumption.

For developers and knowledge workers, the implications are particularly significant. The ability to quickly explore unfamiliar domains, discover niche resources, and synthesize information across diverse sources could dramatically accelerate the pace of innovation and problem-solving.

However, realizing this potential requires addressing current limitations. Improving the quality of generated reports, providing more user control over the search process, and enhancing transparency about how results are selected and ranked are all critical next steps.

As Sethi's experience demonstrates, the value of LLM-powered search extends beyond simply answering questions: it opens doors to information that might otherwise remain hidden, creating new pathways for discovery in an increasingly complex digital landscape. In the shadow of the code-generation hype, it emerges as a quietly transformative technology.

While current implementations have limitations, the core promise remains compelling: a more intelligent, more comprehensive, and more serendipitous approach to finding knowledge online. As developers continue to experiment with and refine these capabilities, we may be witnessing the early stages of a fundamental shift in how we navigate the ever-expanding universe of information.

For now, the reports may be flawed, but the links are golden—and in the digital age, sometimes that's all we need to unlock the next great discovery.