OpenAI's GPT-5 Presentation Derailed by 'Mega Chart Screwup' in Deception Evaluation
OpenAI's much-anticipated GPT-5 livestream took an unexpected turn when a series of charts designed to demonstrate the model's prowess in reducing hallucinations contained significant visual inaccuracies, turning the event into a case study in data misrepresentation. The most glaring error appeared in a graph comparing 'deception evals' across models: GPT-5's reported 50.0% rate for 'coding deception' was drawn with a bar far larger than the one for the 47.4% score of OpenAI's smaller o3 model, even though the two figures differ by less than three percentage points. Ironically, this visualization flaw occurred in a segment touting GPT-5's advances in honesty, with CEO Sam Altman later acknowledging the blunder on social media.
Caption: The erroneous chart shown during OpenAI's GPT-5 livestream, where bar sizes misrepresented actual deception rates. (Source: OpenAI/The Verge)
According to OpenAI's blog post, the correct figure puts GPT-5's deception rate at just 16.5%, in stark contrast to the number displayed onstage. Altman addressed the gaffe directly, calling it a 'mega chart screwup' while assuring viewers that accurate data was available online. An OpenAI marketing staffer echoed the sentiment, apologizing for the 'unintentional chart crime' and confirming a fix in the company's official documentation.
"We fixed the chart in the blog guys, apologies for the unintentional chart crime," an OpenAI representative stated.
The incident highlights a critical tension in AI development: as models like GPT-5 push the boundaries of hallucination reduction, human errors in communication can undermine trust. For developers, it is a cautionary tale about rigorous data validation in presentations, especially when the metric on display is a deception rate. While OpenAI hasn't confirmed whether GPT-5 itself generated the flawed charts, the mishap raises questions about internal review processes at a company leading the charge on AI reliability.
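That kind of validation can be automated. The sketch below is a minimal, hypothetical check (not anything OpenAI uses) that compares rendered bar heights against the underlying values and flags bars whose ordering or proportions disagree with the data; the pixel heights are illustrative guesses, while the 50.0 and 47.4 figures come from the livestream chart.

```python
# Minimal sketch (hypothetical helper): verify that rendered bar heights
# are proportional to the underlying values before a chart ships.
def check_bar_proportions(values, heights, tol=0.05):
    """Return (label_a, label_b, reason) tuples for bar pairs whose
    rendered heights disagree with their underlying values."""
    problems = []
    labels = list(values)
    for i, a in enumerate(labels):
        for b in labels[i + 1:]:
            # Ordering check: a bigger value must never get a smaller bar.
            if (values[a] - values[b]) * (heights[a] - heights[b]) < 0:
                problems.append((a, b, "ordering flipped"))
                continue
            # Proportion check: the height ratio should track the value ratio.
            if values[b] and heights[b]:
                value_ratio = values[a] / values[b]
                height_ratio = heights[a] / heights[b]
                if abs(height_ratio - value_ratio) > tol * value_ratio:
                    problems.append((a, b, "out of proportion"))
    return problems

# Figures from the livestream chart; the pixel heights are invented
# to mimic the error (GPT-5's bar drawn roughly twice as tall).
values = {"GPT-5": 50.0, "o3": 47.4}
heights = {"GPT-5": 300, "o3": 150}
print(check_bar_proportions(values, heights))
# -> [('GPT-5', 'o3', 'out of proportion')]
```

A check like this could run in CI against whatever structure the charting pipeline emits; the 5% tolerance is an arbitrary choice and would need tuning for charts with truncated axes.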
In an industry where precise data drives adoption, this 'vibe graphing' fail—where visual appeal overshadowed accuracy—reminds us that even the most advanced AI systems are only as credible as the humans presenting them. As teams integrate GPT-5 into their workflows, verifying outputs against source data becomes paramount, turning this stumble into a teachable moment for the entire tech ecosystem.
Source: Jay Peters, The Verge