ChatGPT Agent Put to the Test: One Brilliant Spark Amidst a Sea of Hallucinations
ZDNET's exhaustive 12-hour evaluation of OpenAI's ChatGPT Agent reveals a tool that struggles with reliability, plagued by hallucinations and execution flaws across most tasks. It stumbled on shopping comparisons, data scraping, and presentation design, but a lone success in municipal code analysis hints at its transformative potential, provided it can overcome fundamental accuracy hurdles.