Two Turing Laureates Tell BAAI the Hard Part of AGI Safety Is Still Unsolved

AI & ML Reporter

At the 8th BAAI Conference in Beijing, Whitfield Diffie and Andrew Barto arrived from opposite ends of computer science at the same uncomfortable conclusion: we cannot yet specify what we want safe AI to do, nor reward it correctly for doing so.

Two researchers who built foundational pieces of modern computing walked onto the same Beijing stage on June 12, 2026, and delivered a message the AI industry rarely wants to hear at a flagship conference: the theory needed to make AGI safe does not exist yet, and building it will take much longer than the current pace of product launches implies.

The 8th Beijing Academy of Artificial Intelligence (BAAI) Conference opened with keynotes from Whitfield Diffie, the 2015 Turing Award winner who co-invented public-key cryptography, and Andrew Barto, who shared the 2024 award with Richard Sutton for foundational work in reinforcement learning. Neither coordinated with the other, yet both landed on the same structural problem. We are handing machines autonomy faster than we can mathematically describe what we want them to do.

The specification problem

Diffie approached the question from information security, a field where formal guarantees actually work. Cryptography succeeds, he argued, precisely because its goals are narrow and precisely stated. You can define what it means for an encryption scheme to be secure, write that definition down, and then prove a system meets it. Decades of academic and industrial effort went into designing, verifying, and standardizing protocols like the ones underpinning TLS and modern key exchange.
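For readers who have not seen one, here is what such a written-down definition can look like. This is a standard textbook formulation, indistinguishability under chosen-plaintext attack, offered as an illustration and not something Diffie presented in the talk. An adversary A with access to an encryption oracle submits two equal-length messages m_0 and m_1, receives an encryption of m_b for a secret random bit b, and outputs a guess b'. The scheme Π is called secure if, for every efficient adversary, the advantage below is negligible in the security parameter λ:

```latex
\mathrm{Adv}^{\text{IND-CPA}}_{\Pi,\mathcal{A}}(\lambda)
  = \left| \Pr\left[\, b' = b \,\right] - \tfrac{1}{2} \right|
  \le \mathrm{negl}(\lambda)
```

The point of the example is its shape: a precise game, a measurable quantity, and a bound that a proof can target.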

AGI breaks that model at the first step. A system meant to do everything cannot have its correct behavior written as a formal specification. "We want to prove a system conforms to its specification, but first you have to be able to write that specification," Diffie said. His pointed example: nobody has even a formal definition of what it would mean for a model not to hallucinate. Without that, there is nothing to prove conformance against.

The practical takeaway is sobering for anyone shipping AI agents today. Diffie characterized current LLM security as early and disorderly, comparing it to cryptography before its standardization era. The implication is that agent security will need its own long arc of protocol design and standardization before claims of control mean anything verifiable. That is a multi-year, possibly multi-decade engineering and research program, not a feature to be patched in.

The reward problem

Barto came at the same wall from reinforcement learning. He traced the field's long and often forgotten history, from Edward Thorndike's puzzle-box experiments with cats in 1898 through to AlphaGo, to make a point about how slowly the theoretical groundwork actually accumulated. The bottleneck he identified is reward function design.

In a closed game like chess or Go, the reward signal is unambiguous. You win or you lose, and the objective is fully captured by the rules. Real-world tasks offer no such clarity. There is no clean function that encodes what a human actually wants from an autonomous system operating in an open environment, and Barto argued that writing a perfect one is not merely difficult but impossible in principle.

He reached back to Norbert Wiener's warning from the cybernetics era: a machine will give you what you asked for, not what you wanted. Barto called it the Midas Touch problem, where an agent optimizing the literal objective destroys the value the objective was supposed to serve. This is the same failure mode that shows up in reward hacking and specification gaming across the RL safety literature. As autonomous agents multiply, he warned, the chances of one of them optimizing the wrong thing compound rather than average out.
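The gap between what was asked for and what was wanted can be shown with a toy example. The sketch below is purely illustrative and was not part of Barto's talk: the cleaning-robot scenario, the policy names, and all numbers are invented. It also includes a one-line calculation of how a small per-agent failure chance grows across many agents, under the simplifying assumption that agents fail independently.

```python
from dataclasses import dataclass

INITIAL_MESSES = 5  # hypothetical starting state of the room


@dataclass(frozen=True)
class Policy:
    name: str
    messes_made: int     # messes the robot creates itself
    messes_cleaned: int  # messes the robot cleans up


def proxy_reward(p: Policy) -> int:
    """What we asked for: one point per mess cleaned."""
    return p.messes_cleaned


def true_value(p: Policy) -> int:
    """What we wanted: a clean room, so fewer remaining messes is better."""
    remaining = INITIAL_MESSES + p.messes_made - p.messes_cleaned
    return -remaining


policies = [
    Policy("clean_honestly", messes_made=0, messes_cleaned=5),
    Policy("spill_then_clean", messes_made=20, messes_cleaned=22),
    Policy("do_nothing", messes_made=0, messes_cleaned=0),
]

# An optimizer that only sees the proxy picks the policy that games it.
best_by_proxy = max(policies, key=proxy_reward)
best_by_value = max(policies, key=true_value)


def at_least_one_failure(p: float, n: int) -> float:
    """Chance that at least one of n agents misoptimizes,
    assuming each fails independently with probability p."""
    return 1 - (1 - p) ** n


if __name__ == "__main__":
    print("proxy picks:", best_by_proxy.name)
    print("we wanted:  ", best_by_value.name)
    print("risk with 100 agents at 1% each:",
          round(at_least_one_failure(0.01, 100), 3))
```

The proxy-maximizing policy scores highest on "messes cleaned" by creating messes to clean, and it leaves the room dirtier than the honest policy does. Under the independence assumption, a 1% per-agent failure chance becomes roughly a 63% chance of at least one failure across 100 agents.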

His prescription was deliberately unglamorous. A single reward function cannot carry the safety burden. Systems need dynamic guardrails validated through extensive experimentation, layered defenses rather than one clever objective.
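One way to picture layered defenses is as several independent checks that an agent's proposed action must all pass, regardless of what its reward function says. The following is a minimal sketch under invented assumptions: the `Action` fields, the three guardrails, the allowlist, and the budget threshold are hypothetical and do not come from the talk or from any particular framework.

```python
from typing import Callable, List, NamedTuple, Optional


class Action(NamedTuple):
    name: str
    spend_usd: float
    touches_production: bool


# A guardrail returns a reason string to block the action, or None to allow it.
Guardrail = Callable[[Action], Optional[str]]

ALLOWED_ACTIONS = {"send_report", "read_logs"}  # hypothetical allowlist
MAX_SPEND_USD = 100.0                           # hypothetical budget cap


def allowlist_guard(action: Action) -> Optional[str]:
    return None if action.name in ALLOWED_ACTIONS else "action not allowlisted"


def budget_guard(action: Action) -> Optional[str]:
    return "over budget" if action.spend_usd > MAX_SPEND_USD else None


def production_guard(action: Action) -> Optional[str]:
    return "touches production" if action.touches_production else None


GUARDRAILS: List[Guardrail] = [allowlist_guard, budget_guard, production_guard]


def review(action: Action, guardrails: List[Guardrail]) -> List[str]:
    """Run every guardrail and collect all objections.
    An empty list means the action may proceed."""
    objections = []
    for guard in guardrails:
        reason = guard(action)
        if reason is not None:
            objections.append(reason)
    return objections


if __name__ == "__main__":
    print(review(Action("send_report", 5.0, False), GUARDRAILS))
    print(review(Action("drop_table", 500.0, True), GUARDRAILS))
```

Every guardrail runs even after one has objected, so a blocked action reports all of its problems at once. The layers are deliberately redundant: if one check is mis-specified, another may still catch the action.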

What is actually new here

Readers should be clear about what this was and was not. Neither speaker announced a model, a benchmark, or a technique. There were no numbers to report. What makes the session worth attention is the convergence itself. A cryptographer who spent his career on provable security and a reinforcement learning pioneer who spent his on reward-driven agents independently identified the same two missing pieces, the specification and the objective, as the gap between current systems and trustworthy autonomy.

That convergence is a useful corrective to the framing common at industry events, where alignment is often treated as an engineering detail that scaling and better data will eventually absorb. Both laureates argued the opposite. The hard part is theoretical, and theory has historically matured over decades. Diffie noted that the road from Claude Shannon's information theory to deployed cryptographic standards ran about half a century. Barto's reinforcement learning timeline stretched more than a century, from Thorndike's 1898 experiments to AlphaGo.

The limitation of the talks, viewed skeptically, is that diagnosis is easier than treatment. Saying a perfect reward function is impossible does not tell a team building an agent today what to do on Monday, beyond the sensible advice to add guardrails and test heavily. The specification critique is similarly directional rather than constructive. There is active work attempting to formalize pieces of the problem, including research on scalable oversight and process-based supervision, that neither speaker engaged with in detail.

Still, having two people of this standing say plainly that the foundations are not in place carries weight that a hundred safety blog posts do not. The message to take from Beijing is not that AGI safety is hopeless. It is that the timeline for solving it should be measured against how long cryptography and reinforcement learning each took to mature, and that anyone promising controllable general agents on a product roadmap is selling something the theory cannot yet back up.
