Ontario Auditor General Finds AI Medical Scribe Systems Routinely Inaccurate
Startups Reporter

Provincial audit reveals that 60% of approved AI medical scribe systems in Ontario made critical errors in patient documentation, including fabricating information and missing key health details, raising concerns about the evaluation process and patient safety.

A comprehensive audit by Ontario's Auditor General has uncovered alarming inaccuracies in AI-powered medical scribe systems approved for healthcare providers across the province. The evaluation of 20 vendor systems revealed that these AI tools, intended to assist doctors and other healthcare professionals in documenting patient encounters, frequently produce notes with significant errors, including fabricated information and critical omissions.

The Office of the Auditor General of Ontario conducted the evaluation as part of a larger report examining AI use across the province's public services. The AI Scribe program, initiated by the Ontario Ministry of Health, is designed to help physicians, nurse practitioners, and other healthcare professionals create accurate patient documentation.

During the evaluation process, medical professionals compared AI-generated notes against simulated doctor-patient recordings to assess accuracy. The results were concerning: 60% of evaluated systems mixed up prescribed medications in patient notes. Nine out of 20 systems reportedly fabricated information and made suggestions regarding patients' treatment plans that were never discussed in the recordings.

Evaluators identified potentially devastating inaccuracies in the sample reports. For instance, some AI systems documented that "no masses were found" or that "patients were anxious," despite these details never being mentioned in the original recordings. Twelve of the 20 systems inserted incorrect drug information into patient notes, while 17 missed key details about patients' mental health issues that had been discussed during the encounters.

"These findings are particularly troubling because medical documentation directly impacts patient care," said Dr. Sarah Johnson, a healthcare technology analyst who was not involved in the audit. "When AI systems fabricate information or miss critical health details, it can lead to misdiagnosis, inappropriate treatments, and potentially dangerous medical decisions."

The audit also revealed significant flaws in the evaluation process used to select these AI systems. According to the report, the weightings applied to the performance criteria were problematic: 30% of a platform's evaluation score depended solely on whether the vendor had a domestic presence in Ontario, while the accuracy of medical notes contributed only 4% to the total score.

Bias controls accounted for just 2% of the evaluation score, while threat, risk, and privacy assessments also counted for 2%. SOC 2 Type 2 compliance, a standard for security, contributed an additional 4 percentage points. In other words, criteria directly related to medical accuracy, bias mitigation, and key security safeguards made up only a small portion of the total evaluation score.
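To make the imbalance concrete, the reported weights can be sketched as a simple weighted-sum score. The criterion names below are hypothetical labels for illustration; only the percentages come from the audit, and the remaining 58% of the weight is not itemized in this article.

```python
# Sketch of the reported scoring weights (hypothetical criterion names;
# percentages as reported in the audit, remainder unspecified).
WEIGHTS = {
    "ontario_presence": 0.30,     # domestic presence in Ontario
    "note_accuracy": 0.04,        # accuracy of medical notes
    "bias_controls": 0.02,
    "threat_risk_privacy": 0.02,
    "soc2_type2": 0.04,           # SOC 2 Type 2 compliance
    "other_criteria": 0.58,       # balance of the rubric, not detailed here
}

def weighted_score(scores: dict) -> float:
    """Combine per-criterion scores (0.0-1.0) using the reported weights."""
    return sum(WEIGHTS[c] * scores.get(c, 0.0) for c in WEIGHTS)

# A vendor scoring perfectly on accuracy, bias, privacy, and SOC 2...
accurate_vendor = weighted_score({"note_accuracy": 1.0, "bias_controls": 1.0,
                                  "threat_risk_privacy": 1.0, "soc2_type2": 1.0})
# ...earns less from those four criteria combined than a vendor
# credited only with an Ontario presence.
local_vendor = weighted_score({"ontario_presence": 1.0})
print(accurate_vendor, local_vendor)  # roughly 0.12 vs 0.30
```

The sketch shows why the auditors flagged the weightings: the four safety- and accuracy-related criteria together carry 12% of the score, less than half the weight of local presence alone.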

"Inaccurate weightings could result in the selection of vendors whose AI tools may produce inaccurate or biased medical records or lack adequate protection to safeguard sensitive personal health information," the report stated regarding the scoring regime.

OntarioMD, a group that supports physicians in adopting new technologies and was involved in the AI Scribe procurement process, has recommended that doctors manually review their AI-generated notes for accuracy. However, the audit noted that none of the approved systems includes a mandatory attestation feature to ensure this review actually happens.

The Ontario Ministry of Health reported that more than 5,000 physicians are participating in the AI Scribe program, with no known reports of patient harm associated with the technology. The Ministry has not yet responded to questions about whether it will implement the audit's recommendations.

These findings come amid growing concerns about AI reliability in healthcare settings. Previous studies have found that large language models failed to produce appropriate differential diagnoses in roughly 80 percent of tested cases, and consumer-focused AI tools have demonstrated a tendency to provide incorrect medical information.

The audit raises important questions about the rush to implement AI technologies in critical healthcare settings without adequate safeguards. As healthcare systems increasingly turn to AI to improve efficiency and reduce administrative burdens, ensuring these tools are accurate, reliable, and safe becomes paramount.

"This isn't just about fixing the evaluation process," added Johnson. "We need to fundamentally reassess how we implement AI in healthcare. These systems should be assisting clinicians, not potentially jeopardizing patient safety through inaccurate documentation."

The full report from the Office of the Auditor General of Ontario provides additional context about the state of AI adoption across public services in the province and can be accessed through the official Ontario Auditor General website.
