Despite claims from AI executives about imminent human-level artificial intelligence, technical analysis reveals fundamental limitations in transformer architectures and training methods. Current models lack evolutionary cognitive primitives such as object permanence and spatial reasoning, run up against hard architectural constraints, and show minimal progress on benchmarks testing abstract reasoning, indicating AGI remains distant despite the hype.
CEOs at OpenAI and Anthropic have repeatedly claimed human-level artificial general intelligence (AGI) is imminent—or already here. Sam Altman asserts OpenAI knows how to build superintelligent systems, while Dario Amodei predicts AI surpassing Nobel laureates by 2026. Yet a 2025 Association for the Advancement of Artificial Intelligence (AAAI) survey of 475 researchers tells a different story: 76% deem scaling current approaches to achieve AGI "unlikely" or "very unlikely," citing persistent gaps in reasoning, generalization, and embodiment.
The Cognitive Primitives Gap
Evolutionary neuroscience reveals that human cognition rests on hardwired primitives shared across vertebrates: object permanence, causality, number sense, spatial navigation, and animate-inanimate distinction. These capabilities evolved over hundreds of millions of years through perception-action coupling—organisms interacting with a physical world—and form the foundation for language. When humans read "Mary held a ball," they infer unstated physical constraints: gravity, object boundaries, and persistence. Language models, however, attempt to reverse-engineer these primitives solely from statistical patterns in text, video, or audio data.
This explains well-documented failures. Transformers cannot reliably perform multi-digit arithmetic (lacking innate number sense) or infer logical symmetry (e.g., deducing "B is A" from "A is B"). Training on video, as in Google DeepMind's SIMA 2, improves physical prediction but falls short of genuine understanding. SIMA 2 achieves near-human gameplay performance by cloning human actions from videos, but its core intelligence derives from text pretraining (Gemini Flash-Lite). When tested in embodied environments, SIMA 2 showed no evidence that physical training improved language reasoning—the two capabilities coexisted without integration.
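These failure modes are easy to probe directly. The sketch below is a minimal illustration, assuming only a hypothetical `query_model(prompt) -> str` wrapper around whichever chat API is under test; it generates multi-digit addition problems and scores exact-match accuracy, which typically drops as operand length grows.

```python
# Minimal arithmetic probe. `query_model` is a hypothetical wrapper around an LLM API.
import random

def make_addition_probes(n_probes: int = 50, digits: int = 8) -> list[tuple[str, int]]:
    """Generate addition problems whose operand length stresses digit-level manipulation."""
    probes = []
    for _ in range(n_probes):
        a = random.randint(10 ** (digits - 1), 10 ** digits - 1)
        b = random.randint(10 ** (digits - 1), 10 ** digits - 1)
        probes.append((f"What is {a} + {b}? Answer with the number only.", a + b))
    return probes

def score(query_model, probes) -> float:
    """Fraction of probes answered exactly; any digit-level slip counts as a failure."""
    correct = 0
    for prompt, expected in probes:
        reply = query_model(prompt).strip().replace(",", "")
        correct += (reply == str(expected))
    return correct / len(probes)
```

Running the same probe at 4, 8, and 12 digits makes the degradation curve visible without any benchmark infrastructure.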
Similarly, DeepMind's Dreamer trains agents in simulated environments (e.g., Minecraft) using reinforcement learning. While it learns task-specific skills like "find diamond," its representations don't transfer to abstract reasoning. Stanford's ENACT benchmark—testing affordance recognition and long-horizon memory—confirms this: current models lag far behind humans, with performance degrading as tasks require extended interaction.
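The degradation pattern reported for long-horizon tasks can be made concrete with a toy evaluation loop. Nothing below is the actual ENACT harness; `make_env` and `policy` are hypothetical stand-ins for whatever embodied environment and agent are being tested, with an assumed `reset()`/`step()` interface.

```python
# Toy illustration of horizon-dependent degradation, not an actual benchmark harness.
# Assumed interface: env.reset() -> obs, env.step(action) -> (obs, done).
def success_rate_by_horizon(make_env, policy, horizons=(10, 50, 200, 1000), episodes=20):
    """Run episodes under increasing step budgets and record completion rates."""
    results = {}
    for horizon in horizons:
        successes = 0
        for _ in range(episodes):
            env = make_env()
            obs = env.reset()
            for _ in range(horizon):
                obs, done = env.step(policy(obs))
                if done:  # task completed within the step budget
                    successes += 1
                    break
        results[horizon] = successes / episodes
    return results
```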
Architectural Limitations
Transformer architectures are fundamentally feed-forward: each token is produced by a fixed-depth pass through the network, with no mechanism for revisiting earlier computations. This design makes training parallelizable and inference efficient via KV caching, but it imposes mathematical constraints. Theoretical work by Merrill and Sabharwal (2022) shows that transformers with log-precision arithmetic fall within the complexity class TC⁰, so problems believed to lie outside that class, such as recognizing certain regular languages or deciding graph connectivity, cannot be solved in a single forward pass. This isn't fixable with more data; it's an architectural ceiling.
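A concrete way to see the constraint: deciding whether two nodes are connected requires propagating reachability step by step, and the number of steps grows with the graph, while a transformer's forward pass has a fixed number of layers regardless of input size. The breadth-first search below is a plain illustration of that iterative structure, not a statement about any particular model.

```python
from collections import deque

def connected(adj: dict[int, list[int]], source: int, target: int) -> bool:
    """Breadth-first search: each loop iteration extends reachability by one hop."""
    seen, frontier = {source}, deque([source])
    while frontier:
        node = frontier.popleft()
        if node == target:
            return True
        for nbr in adj.get(node, []):
            if nbr not in seen:
                seen.add(nbr)
                frontier.append(nbr)
    return False

chain = {i: [i + 1] for i in range(9)}   # path graph 0-1-2-...-9
print(connected(chain, 0, 9))            # True, but only after walking the whole chain
```

On a path of length n, roughly n rounds of propagation are needed; no fixed-depth TC⁰ circuit family is known to decide connectivity, which is why the single-pass constraint matters.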
The human brain, by contrast, uses bidirectional, recurrent connections where activations reverberate until convergence. Feedback loops in the visual cortex, for instance, refine object recognition iteratively. While alternatives like neurosymbolic hybrids or recurrent networks could theoretically overcome this, they break transformer scalability. Training such architectures at modern LLM scale remains an unsolved engineering challenge.
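A minimal sketch of what "reverberate until convergence" means computationally: the same update is applied repeatedly until the state settles, so effective depth adapts to the input rather than being fixed by the architecture. The weights here are random and scaled only to keep the iteration contractive; this is an illustration, not a proposed architecture.

```python
import numpy as np

def settle(x0, W, b, tol=1e-6, max_steps=10_000):
    """Apply the same recurrent update until the state stops changing."""
    x = x0
    for step in range(max_steps):
        x_next = np.tanh(W @ x + b)   # identical parameters reused at every step
        if np.linalg.norm(x_next - x) < tol:
            return x_next, step        # depth is determined by the problem, not the network
        x = x_next
    return x, max_steps

rng = np.random.default_rng(0)
W = 0.1 * rng.standard_normal((16, 16))   # small scale keeps the map contractive
b = rng.standard_normal(16)
state, steps = settle(rng.standard_normal(16), W, b)
print(f"settled after {steps} update steps")
```

Training loops like this at modern LLM scale is exactly the unsolved engineering challenge mentioned above, since the convergence loop resists the parallelism that makes transformers cheap to train.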
Benchmarking the Reality
The ARC-AGI benchmark highlights the reasoning gap. Its visual puzzles test spatial composition and few-shot generalization using simple grids—tasks humans solve in seconds. When ARC-AGI-2 launched in 2025, standalone LLMs scored near zero. By year-end, refinement scaffolding (iterative generate-verify cycles) pushed GPT-5.2 to 75% success—but at $30 per task and massive compute. Claude Opus 4.5 scored only 37.6% without scaffolding. The 85% Grand Prize threshold remains unclaimed.
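The scaffolding described above amounts to a generate-verify loop. The sketch below assumes a hypothetical `propose_program` call that asks a model for a candidate grid transformation, and an ARC-style task dict with `train` pairs and `test_inputs`; the verifier simply replays each candidate on the demonstration pairs and feeds failures back as context.

```python
# Schematic generate-verify ("refinement scaffolding") loop.
# `propose_program` is a hypothetical LLM call returning a grid -> grid callable.
def solve_arc_task(task, propose_program, max_attempts=100):
    feedback = ""
    for attempt in range(max_attempts):
        transform = propose_program(task["train"], feedback)
        failures = 0
        for pair in task["train"]:
            if transform(pair["input"]) != pair["output"]:
                failures += 1
        if failures == 0:  # candidate reproduces every demonstration pair
            return [transform(grid) for grid in task["test_inputs"]]
        feedback = f"attempt {attempt}: {failures} of {len(task['train'])} pairs failed"
    return None  # budget exhausted; this is where the per-task cost balloons
```

Each extra attempt is another full model call, which is why scaffolded scores come with per-task costs that standalone accuracy numbers hide.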
ARC-AGI-3 now incorporates interactive reasoning, demanding exploration and planning. This progression isn't "moving goalposts" but refining diagnostics: genuine general intelligence would transfer between benchmarks. Instead, each iteration reveals new failures. Multi-digit arithmetic, trivial for humans, remains a weakness, exposing how far models are from cognitive primitives.
Secrecy and the Path Forward
Claims of secret labs nearing AGI breakthroughs ignore the interdisciplinary nature of the problem. Closing the gaps in embodied cognition, architecture, and reasoning requires global collaboration across neuroscience, physics, and ML, not isolated sprints. Current AI investment focuses on scaling transformers, not fundamental research. While AGI is possible, potentially via simulated symbolic training, decades of iterative work lie ahead. As transformer limitations crystallize through benchmarks like ARC and ENACT, the real progress is in mapping the vast territory AI has yet to traverse.