University of Chicago researchers find that OpenAI's GPT-5 follows legal rules perfectly in tests, outperforming human judges, who complied only 52% of the time, though experts warn this 'formalism' may not be desirable for dispensing justice.
Legal scholars have found that OpenAI's GPT-5 follows the law more accurately than human judges in controlled tests, raising both excitement and concern about artificial intelligence's role in the justice system.
Professor Eric Posner and researcher Shivam Saran of the University of Chicago Law School conducted a study comparing AI models against human judges in legal decision-making scenarios. Their findings, published in a paper titled "Silicon Formalism: Rules, Standards, and Judge AI," show that GPT-5 achieved a perfect 100% compliance rate with the applicable legal doctrines, while human judges followed the law correctly only 52% of the time.
This research builds on their earlier 2025 study, which tested OpenAI's GPT-4o on war crimes cases from the International Criminal Tribunal for the former Yugoslavia. In that initial experiment, the AI models behaved more like law students, rigidly following legal precedent, while the human judges weighed broader contextual factors.
The latest tests involved more mundane legal scenarios: choice-of-law questions asking which state's law applies to a car accident with connections to multiple jurisdictions. Posner and Saran put these questions to several AI models, including GPT-5, Google Gemini 3 Pro, and various Llama models, and compared the responses to those of actual federal judges.
Interestingly, Google's Gemini 3 Pro matched GPT-5's perfect score, while other models showed varying degrees of accuracy: Gemini 2.5 Pro achieved 92%, o4-mini reached 79%, and Llama 4 Maverick scored 75%. Some models performed poorly, with Llama 4 Scout and GPT-4.1 each scoring only 50%.
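The paper does not publish its grading harness, but the measurement behind these percentages is straightforward: each scenario has one doctrinally correct answer, and a model's compliance rate is the share of scenarios it answers correctly. A minimal sketch in Python, using hypothetical case data rather than the study's actual scenarios:

```python
# Minimal sketch of how a compliance benchmark like this could be scored.
# The case IDs and answers below are hypothetical, not from the paper.

scenarios = [
    # (case_id, doctrinally_correct_answer, model_answer)
    ("crash-001", "New York", "New York"),
    ("crash-002", "Pennsylvania", "Pennsylvania"),
    ("crash-003", "Ohio", "Indiana"),
    ("crash-004", "Texas", "Texas"),
]

# Count how many model answers match the doctrinally correct answer.
correct = sum(1 for _, expected, answered in scenarios if expected == answered)
compliance_rate = correct / len(scenarios)

print(f"Compliance with applicable doctrine: {compliance_rate:.0%}")  # 75%
```

Under this kind of scoring, GPT-5 and Gemini 3 Pro matched the correct answer in every scenario, while the human judges did so in roughly half.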
Despite these impressive results, the researchers caution against rushing to replace human judges with AI systems. "The apparent weakness of human judges is actually a strength," Posner and Saran argue in their paper. "Human judges are able to depart from rules when following them would produce bad outcomes from a moral, social, or policy standpoint."
The study highlights a fundamental tension in legal philosophy. While AI models demonstrate superior rule-following capabilities, they lack the discretionary judgment that allows human judges to consider extenuating circumstances, societal impact, and moral considerations. This "formalism"—strict adherence to written law—may produce technically correct but potentially unjust outcomes in certain cases.
For example, an AI following the letter of the law might impose harsh penalties on sympathetic defendants or reward unsympathetic ones if the legal framework technically supports such outcomes. Human judges, by contrast, can interpret laws with consideration for broader societal values and individual circumstances.
The researchers also note that a model's behavior can be shaped by configuration choices, such as its system instructions and sampling settings, as well as by its training data, raising questions about who controls these "settings" and how they might be manipulated to achieve desired outcomes in legal proceedings.
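To make that concern concrete: in a typical model API, whoever writes the system prompt and picks the sampling parameters shapes how the "judge" behaves. A hedged sketch using the OpenAI Python SDK, where the prompts, model name, and settings are illustrative assumptions rather than the researchers' actual setup:

```python
# Illustrative only: shows where the "settings" live in a typical model API.
# Uses the OpenAI Python SDK; the model name and prompts are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

question = (
    "A driver from State A injures a passenger in State B. "
    "Which state's law applies?"
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        # The system prompt is one lever: changing it from "apply the rule
        # strictly" to "weigh fairness to the parties" can shift answers.
        {
            "role": "system",
            "content": "You are a judge. Apply the governing legal rule strictly.",
        },
        {"role": "user", "content": question},
    ],
    # Sampling temperature is another lever: 0 favors the single most likely
    # (often the most doctrinaire) answer; higher values admit more variation.
    temperature=0.0,
)

print(response.choices[0].message.content)
```

Neither lever is visible to the parties in a case, which is precisely the accountability problem the researchers flag.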
This isn't the first time AI has shown promise in legal applications. A mock trial held last year at the University of North Carolina at Chapel Hill School of Law explored similar questions about AI's role in judicial decision-making. The legal community appears to be actively grappling with these issues as AI technology becomes more sophisticated.
Current applications of AI in legal work have been mixed. While AI shows potential for legal research, document review, and case prediction, there have been cautionary missteps. Several high-profile cases have involved lawyers submitting AI-generated briefs containing fabricated case citations, highlighting the technology's current limitations and the need for human oversight.
The question of whether society should accept doctrinaire AI judgments remains open. Would the public trust a system that prioritizes legal formalism over human judgment? How do we balance the benefits of consistent, rule-based decision-making against the need for compassionate, context-aware justice?
As AI continues to advance, legal experts, lawmakers, and the public will need to determine whether these technologies should remain in supporting roles or be granted more consequential decision-making authority. The perfect scores achieved by GPT-5 and Gemini 3 Pro demonstrate that AI can follow legal rules more consistently than humans, but whether this represents progress or a step backward in the pursuit of justice remains a matter of philosophical debate.
For now, the consensus appears to be that while AI can be a powerful tool in legal work, the human element—with all its imperfections—remains essential to a justice system that serves society's broader moral and social needs.

