The discourse surrounding Large Language Models (LLMs) like ChatGPT is often mired in repetitive, philosophically shallow arguments denying their capacity for thought or understanding. A recent online course by University of Washington professors Carl T. Bergstrom and Jevin D. West, despite valuable warnings about LLM limitations, exemplifies several recurring fallacies that plague the broader conversation. This pattern reveals a critical need for more rigorous definitions and testable criteria when evaluating artificial intelligence.

The Persistent Tactics of Denial

Critics frequently employ predictable strategies:

  1. The Oversimplification Trap: Reducing LLMs to mere "next-word predictors" ignores their demonstrable capabilities in step-by-step reasoning, source citation, and explanatory narration (as seen in models like o3). This echoes the centuries-old "Leibniz Mill" fallacy – dismissing complex computational processes as inherently non-cognitive simply because they are mechanistic.

    Bergstrom & West: "Given a string of words, you guessed the next one in the sequence. This is basically all that ChatGPT and other LLMs are doing."
    This characterization ignores emergent capabilities like chain-of-thought reasoning and overlooks a fundamental principle of computation: complex behavior can arise from simple components (a minimal sketch of the next-token loop appears after this list).

  2. Vagueness as Argument: Assertions like "LLMs don't really think," "lack understanding," or "have no fundamental sense of truth" rely on undefined, subjective terms. Without concrete tests for "thinking," "understanding," or "sense of truth," these claims are often "not even wrong" (in Wolfgang Pauli's phrase). Critics rarely propose how to measure these qualities in any system, human or artificial.

    Bergstrom & West: "It's not intelligent. It doesn't understand anything. It doesn't think."
    Bergstrom & West: "They don’t even have a fundamental sense of truth and falsehood."
    What definitive test could prove or disprove these claims? The lack of proposed metrics renders them philosophically hollow.

  3. Moving Goalposts & Exceptionalism: When LLMs achieve tasks once thought to require "true intelligence" (e.g., playing Go, high-level translation, passing professional exams), critics often redefine intelligence rather than acknowledge machine capability. LLMs are also frequently compared only to the most competent humans, so that performance matching or exceeding the average person is dismissed as evidence of nothing. Criticisms about LLM reliability or fabrication likewise ignore identical flaws in human reasoning, encyclopedias, and other trusted sources.

    Bergstrom & West: "But don’t let the impressive capabilities of LLMs lure you into thinking that they understand human experience or are capable of logical reasoning."
    This dismisses documented logical reasoning capabilities in advanced LLMs and treats "human experience" as if it were a necessary condition for intelligence.

  4. The Ground Truth Mirage & Embodiment Fallacy: Claims that LLMs lack "ground truth" or an "underlying model of the world" misunderstand how these models are trained on vast corpora of human knowledge (much as an alien studying our texts could come to understand Earth). Demanding "ground truth" also ignores that humans have functioned effectively throughout history while holding fundamentally incorrect worldviews (e.g., a flat Earth). Similarly, insisting that LLMs need "embodiment" or human-like reasoning processes commits the "airplanes don't flap their wings" fallacy: intelligence need not replicate biological mechanisms.

    Bergstrom & West: "These systems have no ground truth, no underlying model of the world, and no rules of logic."
    LLMs demonstrably build internal world models from training data. The absence of explicit symbolic logic rules doesn't preclude logical outputs emerging from statistical patterns.

  5. The Trust Double Standard: Advising blanket distrust of LLM outputs because they are imperfect ignores the universal need for verification applied to all information sources, human or machine. The implied syllogism – "imperfect, therefore untrustworthy/useless" – would condemn academia, journalism, and human expertise itself.

    Bergstrom & West (Carl): "Should I trust them? ... No."
    This absolutist stance ignores the utility of LLMs for ideation and research support, mirroring how we critically utilize fallible human sources.
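
To make the "next-word predictor" characterization concrete, here is a minimal sketch of the autoregressive loop that underlies text generation. It assumes the Hugging Face transformers library and the public gpt2 checkpoint, uses plain greedy decoding, and is purely illustrative rather than anything drawn from the course; the point is that the loop literally chooses one token at a time, yet chain-of-thought prompting, citation, and step-by-step explanation in larger models are built on exactly this mechanism.

```python
# Minimal sketch of autoregressive next-token generation (illustrative assumptions:
# pip install torch transformers, public "gpt2" checkpoint, greedy decoding, no KV cache).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "Q: Alice has 3 apples and buys 2 more. How many apples does she have?\nA: Let's think step by step."
ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(40):                         # generate 40 tokens, one at a time
        logits = model(ids).logits[0, -1]       # scores over the vocabulary for the next token only
        next_id = logits.argmax().view(1, 1)    # greedy choice; sampling would draw from a distribution instead
        ids = torch.cat([ids, next_id], dim=1)  # append the chosen token and repeat: this loop is the entire mechanism

print(tokenizer.decode(ids[0]))
```

Whether behavior built on this loop counts as "reasoning" is precisely the question that needs a test rather than a verdict by definition.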

Beyond Black and White: Towards Measurable Intelligence

The core issue isn't just flawed criticism; it's a persistent failure to define intelligence in measurable, multi-faceted terms. Intelligence isn't a binary state but exists on a continuum with diverse manifestations. Instead of declaring LLMs "not intelligent," the field needs:

  • Definitive Tests: Concrete, quantifiable benchmarks for intelligence, understanding, and reasoning applicable to any cognitive system.
  • Multi-Dimensional Assessment: Recognizing strengths and weaknesses across different domains (e.g., factual recall vs. creative synthesis vs. emotional inference).
  • Acknowledgment of Emergence: Accepting that complex behaviors like reasoning and apparent understanding can emerge from systems built on statistical prediction, as the theory of computation allows.
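
As a purely hypothetical illustration of what such a multi-dimensional assessment could look like, the sketch below scores a system per domain on a continuum rather than issuing a binary "intelligent" verdict. The domain names and numbers are invented for illustration; they are not measurements of any real model or population.

```python
from dataclasses import dataclass

@dataclass
class CapabilityProfile:
    """Per-domain scores on a 0-1 continuum instead of a binary 'intelligent' label."""
    scores: dict[str, float]

    def compare(self, other: "CapabilityProfile") -> dict[str, float]:
        """Signed per-domain gap; positive means this profile outperforms the other."""
        return {d: round(s - other.scores.get(d, 0.0), 2) for d, s in self.scores.items()}

# Hypothetical, invented numbers purely to show the shape of the assessment.
llm = CapabilityProfile({"factual_recall": 0.78, "creative_synthesis": 0.85, "emotional_inference": 0.55})
median_human = CapabilityProfile({"factual_recall": 0.62, "creative_synthesis": 0.60, "emotional_inference": 0.90})

print(llm.compare(median_human))  # e.g. {'factual_recall': 0.16, 'creative_synthesis': 0.25, 'emotional_inference': -0.35}
```

A profile of this kind replaces "Is it intelligent?" with the answerable question "Where does it outperform, match, or trail a chosen human baseline?"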

A Sixty-Year Pattern of Denial

History shows a consistent pattern: each time AI achieves a milestone (chess, Go, image generation, high-quality text), critics dismiss it by redefining intelligence or claiming the task didn't require "true" cognition after all. From reactions to Deep Blue to current LLM debates, the goalposts perpetually shift. This pattern, evident in the Bergstrom/West analysis, hinders a clear-eyed assessment of what these systems can and cannot do, and what their rapidly evolving capabilities mean for the future. The burden is now on critics to move beyond vague denials and engage with the measurable reality of machine cognition.

Source: Analysis inspired by critiques of Carl T. Bergstrom & Jevin D. West's online course (July 12, 2025 version), exploring recurring argumentative flaws in discussions of LLM capabilities, as originally discussed on recursed.blogspot.com.