LLMs Need Companion Bots to Check Work, Keep Them Honest
#AI

Trends Reporter
4 min read

AI pioneer Vishal Sikka warns that large language models have fundamental computational limits that cause hallucinations, and that they need companion verification systems to produce reliable output.

AI pioneer Vishal Sikka has issued a stark warning about the limitations of large language models (LLMs): they need companion bots to verify their work and prevent hallucinations. According to Sikka, expecting LLMs to perform arbitrarily large calculations reliably is a fundamental misunderstanding of their capabilities.

Sikka, CEO of Vianai Systems and a towering figure in AI, earned his PhD at Stanford, where his advisor was John McCarthy, who coined the term "artificial intelligence" in 1955, and he has spent decades studying these systems. His recent research paper, "Hallucination Stations: On Some Basic Limitations of Transformer-Based Language Models," explores why LLMs fail when pushed beyond their computational boundaries.

The core problem, Sikka explains, is that an LLM performs a fixed number of calculations determined by the length of the prompt, not by its semantic meaning. "We have an example my son came up with of two prompts that have identical tokens and when you run them, the exact same number of operations get performed independent of what the tokens are," he said. "Therein is the entire point, that whether the prompt is expressing the user's desire to perform a particular calculation or the prompt is expressing a user's desire to write a piece of text on something, it does exactly the same number of calculations."
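
A toy calculation makes the point concrete. This is an illustration, not the formalism from Sikka's paper: word-level splitting stands in for real tokenization, and the FLOP formulas are rough approximations. The compute spent on a prompt depends only on its length and the model's size, never on what the prompt asks for.

```python
# Toy illustration (not from Sikka's paper): the compute an LLM spends on a
# prompt depends only on token count and model size, never on what the
# tokens mean. The formulas below are rough, illustrative estimates.

def forward_flops(n_tokens: int, n_layers: int = 32, d_model: int = 4096) -> int:
    """Approximate FLOPs for one forward pass of a decoder-only transformer.

    Per layer: attention (token mixing) plus MLP, both polynomial in
    n_tokens and d_model. The content of the tokens never appears here.
    """
    attention = 4 * n_tokens * d_model**2 + 2 * n_tokens**2 * d_model
    mlp = 16 * n_tokens * d_model**2          # assumes a 4x hidden expansion
    return n_layers * (attention + mlp)

# Two prompts of equal length but very different "semantic cost".
# Word-level splitting stands in for real tokenization here.
prompt_math = "Multiply 739184620385 by 284617390284 then print the exact product digits please".split()
prompt_text = "Write a cheerful limerick about a cat who loves rainy mornings".split()

assert len(prompt_math) == len(prompt_text)   # same length...
print(forward_flops(len(prompt_math)))        # ...so identical compute,
print(forward_flops(len(prompt_text)))        # regardless of what is asked
```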

This limitation becomes critical when LLMs are used as autonomous agents for complex tasks. "When we say, 'Go book a ticket for me and then charge my credit card or deduct the amount from my bank and then send a post to my financial app,' which is what all these agent vendors are kind of saying, you are asking the agents to perform an action which holds a meaning to you, which holds a particular semantic to you, and if it is a pure LLM underneath there, no matter how that LLM works, it has a bounded ability to carry out these kinds of tasks."
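
A sketch helps separate the two things Sikka is distinguishing: if the agent below were a pure LLM end to end, the amount charged would just be predicted tokens, but here the semantically critical steps are ordinary deterministic code that can be checked. Every function, field, and variable name in this sketch is hypothetical, not any vendor's actual pipeline.

```python
# Hypothetical agent sketch: the LLM only proposes a structured plan, while
# the price, the guardrail, and the charge are handled by deterministic code.
import json

def book_trip(llm, user_request: str, fare_table: dict, balance: float) -> dict:
    # 1. LLM turns free-form intent into a structured plan (its real strength).
    plan = json.loads(llm(f"Extract origin, destination and date as JSON from: {user_request}"))

    # 2. Deterministic lookup: the price comes from data, never from the LLM.
    price = fare_table[(plan["origin"], plan["destination"])]

    # 3. Hard guardrail enforced in code, not in a prompt.
    if price > balance:
        raise ValueError("Insufficient funds; refusing to charge the card.")

    # 4. Only now would a (hypothetical) payment API be called with the verified amount.
    return {"charge": price, "route": plan, "status": "ready_to_book"}
```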

Sikka's solution is to pair LLMs with verification systems that can check their work. His company Vianai has implemented this approach in their product Hila, which combines LLMs with domain-specific knowledge models. "For certain domains, when you surround the LLM with guardrails, with reliable approaches that are proven, then you are able to provide reliability in the overall system," he explained. "It's not only us. A lot of systems out there work like that where they pair the LLM with another system which is able to ensure that the LLM has correctness. So we do that in our product Hila. We combine the LLM with a knowledge model for a particular domain and then, after that, Hila does not make mistakes."
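
Hila's internals are not public, but the general pattern Sikka describes, pairing the LLM with a system that can confirm its output, can be sketched generically. The `llm` callable, the `knowledge_base` object with a `supports` method, and the crude sentence-level claim splitting are all assumptions for illustration, not Vianai's implementation.

```python
# Minimal sketch of the "LLM plus checker" pattern Sikka describes.
# `llm` and `knowledge_base` are hypothetical stand-ins supplied by the caller.

def verified_answer(llm, knowledge_base, question: str, max_retries: int = 3) -> str:
    """Return an answer only if every claim in it survives the checker."""
    prompt = question
    for _ in range(max_retries):
        draft = llm(prompt)

        # Crude claim extraction: a sentence split stands in for a real parser.
        claims = [s.strip() for s in draft.split(".") if s.strip()]
        failures = [c for c in claims if not knowledge_base.supports(c)]

        if not failures:
            return draft                      # every claim was verified

        # Feed the failures back so the model can revise its answer.
        prompt = f"{question}\nThese statements failed verification, correct them: {failures}"

    raise RuntimeError("No fully verified answer within the retry budget.")
```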

This approach echoes how Google DeepMind's AlphaFold predicts protein structures for medicine. As Sikka describes it, AlphaFold uses a transformer-based generative module called Evoformer to produce protein candidates, then feeds them into a "non-imaginative" verification system that checks for flaws. "And so anything that comes out of that has a much higher likelihood of being an actual protein, and then it repeats this cycle three times, and the outcome of that is pretty much guaranteed to be a protein for a particular situation," Sikka said. "They have produced, I think 250,000 proteins that way, which, producing one protein used to take teams of scientists years to do that."
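
The cycle Sikka describes can be written down generically as a propose-then-filter loop. The sketch below is an abstraction of that idea, not DeepMind's actual pipeline; both `generator` and `verifier` are stand-in callables.

```python
# Generic generate-and-verify loop of the kind Sikka describes for AlphaFold.
from typing import Callable, Iterable, List

def generate_and_verify(
    generator: Callable[[List[str]], Iterable[str]],
    verifier: Callable[[str], bool],
    seeds: List[str],
    cycles: int = 3,
) -> List[str]:
    """Run `cycles` rounds of propose-then-check, keeping only survivors."""
    candidates = list(seeds)
    for _ in range(cycles):
        proposed = generator(candidates)                    # imaginative step
        candidates = [c for c in proposed if verifier(c)]   # non-imaginative check
    return candidates
```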

Sikka draws on decades of AI experience, having observed four waves of AI mania since the 1980s. He studied under pioneers like Marvin Minsky, whose book "The Society of Mind" describes intelligence as emerging from a collection of simpler systems. "Marvin Minsky used to say the Society of Mind, right? That there is a collection of things that come together to create intelligence. I think that's kind of where we will end up, but we'll stumble along our way through to that."

Despite his long involvement with AI, Sikka believes the technology is still in its early stages. While there have been successes in coding, he points to an MIT study finding that 95 percent of enterprise AI pilots fail to deliver measurable returns. He compares current AI use to early television news, when anchors simply read the news on camera the way they had on radio. "I think so far, we are just regurgitating our prior known things using AI, but soon we will see breakthrough, new things that are possible," he said. "I think with carefully chosen products, there is dramatic return on investment to be had, but a blanket use of LLMs, you have to be very, very careful."

The lesson is clear: LLMs are powerful tools but have fundamental limitations that require careful architectural solutions. Rather than relying on LLMs alone for critical tasks, successful implementations will pair them with verification systems that can catch errors and ensure reliability. As Sikka puts it, don't trust—verify.
