AI Safety's Blind Spot: When Millions Show Signs of Psychosis, Labs Offer Hotlines, Not Hard Stops
#Regulation

Trends Reporter
3 min read

While AI safety work concentrates on catastrophic risks, millions of chatbot users each week show signs of severe mental health distress, and the industry's response falls far short of the hard limits it applies to bioweapons content.

Every week, between 1.2 and 3 million ChatGPT users—roughly the population of a small country—show signals of psychosis, mania, suicidal planning, or unhealthy emotional dependence on the model. These staggering numbers come from OpenAI itself, yet receive a fraction of the attention devoted to preventing AI-driven catastrophic scenarios.

The low end of that range covers only users showing suicide-planning indicators; the high end sums all three categories OpenAI has flagged, and the company has not clarified whether they overlap. What makes these figures particularly concerning is the lack of independent verification: no audit, no time-series data, no disclosed methodology. We cannot tell whether the true figure is higher, whether it is growing, or how it compares with other frontier models, none of which publish equivalent data.

This points to a fundamental disconnect in how AI safety is practiced. The field prioritizes catastrophic risks, where investment flows freely, while everyday cognitive and mental health harm reads like a footnote in safety discussions.

The most striking difference lies in how AI systems respond to different types of harm. When users attempt to generate content related to mass destruction or CBRN (Chemical, Biological, Radiological, Nuclear) materials, they encounter a hard wall: the model refuses, the conversation ends, no amount of reframing gets the user past it. Yet when users express suicidal ideation, they typically receive a soft redirect, a crisis hotline link, and then the conversation continues.
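
To make that asymmetry concrete, here is a minimal sketch of the two response policies as described above, written as hypothetical Python. The category names, the `Turn` and `respond` helpers, and the upstream classifier they assume are illustrative assumptions, not any lab's actual moderation API.

```python
from dataclasses import dataclass
from enum import Enum, auto

class Category(Enum):
    CBRN = auto()        # chemical, biological, radiological, nuclear content
    SELF_HARM = auto()   # suicidal ideation or planning
    NONE = auto()

@dataclass
class Turn:
    text: str
    category: Category   # assumed output of some upstream classifier

CRISIS_FOOTER = "If you're struggling, you can reach a crisis line at 988 (US)."

def respond(turn: Turn, generate) -> tuple[str, bool]:
    """Return (reply, conversation_continues) under the policy the article describes."""
    if turn.category is Category.CBRN:
        # Hard wall: refuse and end the conversation; no reframing gets past it.
        return ("I can't help with that.", False)
    if turn.category is Category.SELF_HARM:
        # Soft redirect: append a hotline link, then keep the conversation going.
        return (generate(turn.text) + "\n\n" + CRISIS_FOOTER, True)
    return (generate(turn.text), True)

# Example call with a stand-in model:
reply, keep_going = respond(Turn("...", Category.SELF_HARM), generate=lambda t: "model reply")
```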

This discrepancy becomes particularly concerning when examining specific cases. ChatGPT directed Adam Raine to crisis resources more than 100 times, according to OpenAI's own court filing, while, per the allegations, the same conversations helped him refine a method. Whether that redirect-and-continue protocol failed him is now for a court to decide. What is notable is that the approach remains the industry standard.

The question persists: why isn't a mental health crisis treated as a gating category, where the conversation stops completely and the user is routed to a human? It is one of many questions for which no concrete answer has been offered.

The argument here is that safety frameworks built for catastrophic risk have been extended to cognitive harm, but only as monitoring rather than as gating, and that extension is incomplete. Labs measure what they have been pressured to measure, and their gating decisions reflect what they consider unacceptable to ship. The disappointing part is that the current set of unacceptable-to-ship behaviors doesn't include any cognitive harm, however severe the measurements.
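
To illustrate the monitoring-versus-gating distinction, here is a hypothetical pre-deployment check with invented metric names and thresholds (not any published framework): a gated metric can block a release, while a monitored one is only reported.

```python
# Hypothetical ship/no-ship check: CBRN uplift is a gate (can block the release),
# while cognitive-harm prevalence is only monitored (logged, never blocking).
GATED = {"cbrn_uplift": 0.0}          # any measured uplift is unacceptable to ship
MONITORED = {"cognitive_harm_rate"}   # tracked in the system card, no threshold

def ship_decision(evals: dict[str, float]) -> bool:
    for metric, limit in GATED.items():
        if evals.get(metric, 0.0) > limit:
            return False              # gated: the release is blocked
    for metric in MONITORED:
        print(f"monitor only: {metric}={evals.get(metric)}")  # reported, not enforced
    return True

# Under this sketch, a model showing a nonzero cognitive-harm rate still ships:
print(ship_decision({"cbrn_uplift": 0.0, "cognitive_harm_rate": 0.0015}))  # True
```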

That is a structural decision, and there are no clear signs that policy is getting closer to forcing labs to change it. Until it does, "AI safety" and "Personal AI Safety" describe two different commitments, even when they appear under the same heading in system cards.

This isn't actually a new concern. People worried about cognitive independence, and how new technologies might erode it, long before ChatGPT, mostly in the context of brain-computer interfaces and neurotechnology. The framework even has a name: cognitive liberty, the idea that individuals have a right to mental integrity and freedom from algorithmic manipulation.

You can trace this concept through the neurorights tradition (Ienca & Andorno, 2017) and the UNESCO Recommendation on the Ethics of Neurotechnology (2025). The intellectual scaffolding is already there. What's missing is policy implementation, especially in the US.

Without regulatory pressure, it's unclear what would push frontier labs to take Personal AI Safety as seriously as AI Safety. The current approach treats mental health concerns as manageable through monitoring and gentle redirection, while treating catastrophic risks as requiring absolute prevention. This asymmetry deserves closer examination as AI becomes increasingly integrated into daily life.
