A Media Lab experiment tracked 67 people for a month and found a measurable cost to leaning on chatbots for news verification: assisted accuracy went up 21 percent, but unassisted detection dropped 15 points once the AI was removed. The fix isn't a better model; it's a different interaction style.
A new open-access study from the MIT Media Lab puts a number on something many people in human-computer interaction have suspected for a while: using a large language model to check whether news is real makes you better at it right now, and worse at it later when the model is gone.
The team tracked 67 participants over four weeks as they evaluated headline-image pairs and decided which were fake. During sessions where a chatbot assisted them, participants were 21 percent more accurate at spotting misinformation. That part confirms earlier work from the MIT Sloan School of Management showing that conversational AI can genuinely reduce belief in false information. The new and uncomfortable finding came after the support was withdrawn. By week four, participants evaluating fresh news items on their own performed 15 percentage points worse than they had before the study began. Roughly a quarter of them reported feeling they were improving, even as their measured accuracy fell.
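The two headline numbers use different units: a "21 percent" gain and a "15 percentage point" drop. The article reports no baseline accuracy and does not say whether the 21 percent is a relative or an absolute change, so the figures can't be lined up directly. The short Python sketch below uses a hypothetical 60 percent starting accuracy (`BASELINE` is an assumption, not a number from the study) only to show how much the choice of unit changes the result.

```python
# Illustrative only: BASELINE is a hypothetical value, not a figure from the study.
BASELINE = 0.60  # assumed fraction of items judged correctly before the study


def relative_change(base: float, percent: float) -> float:
    """Apply a relative change: +21 percent means base * 1.21."""
    return base * (1 + percent / 100)


def point_change(base: float, points: float) -> float:
    """Apply an absolute change in percentage points: -15 points means base - 0.15."""
    return base + points / 100


if __name__ == "__main__":
    # A "21 percent" gain read as a relative change vs. as percentage points.
    print(f"+21% relative: {relative_change(BASELINE, 21):.3f}")
    print(f"+21 points:    {point_change(BASELINE, 21):.3f}")
    # A 15-point drop is larger than a 15 percent relative drop from the same base.
    print(f"-15 points:    {point_change(BASELINE, -15):.3f}")
    print(f"-15% relative: {relative_change(BASELINE, -15):.3f}")
```

From the same hypothetical 60 percent start, this prints 0.726 and 0.810 for the two readings of the gain, 0.450 for a 15-point drop, and 0.510 for a 15 percent relative drop. That gap is why the study's "percentage points" wording for the decline matters.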

The paper, titled "Dialogues with AI Reduce Beliefs in Misinformation but Build No Lasting Discernment Skills," was presented at the 2026 CHI Conference on Human Factors in Computing Systems. It was co-led by media arts and sciences PhD students Anku Rani and Valdemar Danry, with co-authors Paul Pu Liang, Andrew Lippman, and senior author Pattie Maes.
Why this looks familiar
Researchers call the effect the "AI dependency paradox," and it sits inside a much older pattern that anyone who builds autonomous systems will recognize. We have decades of documentation on deskilling driven by cognitive offloading, the habit of handing mental work to a tool. Calculators weakened mental arithmetic. Turn-by-turn GPS navigation measurably degraded people's ability to build spatial maps and orient themselves without a screen. A 2025 study found that physicians who used AI assistance got worse at detecting cancer on their own once the assistance was removed.
The mechanism is the same across all of these cases. When a tool reliably handles a task, the user stops doing the underlying cognitive work that maintains the skill. The tool's competence and the user's competence are not the same thing, and in many systems they trade off against each other. In robotics this shows up constantly in supervisory control: an operator monitoring a highly capable autonomous system loses situational awareness and reaction readiness precisely because the system is good enough that they stop actively engaging. The failure mode is not the automation working badly; it is the automation working well enough to disengage the human.
The context matters because of how news consumption is shifting. Pew Research Center reporting over the past year found that one in five U.S. teens regularly use LLMs like ChatGPT, Claude, and Gemini to get their news, and one in four young adults have used them for that purpose at least once. The population doing this offloading is large and growing.

What the model actually is
Part of the problem is a mismatch between how the tool feels and how it works. "Users get excited about these 'magical' LLMs, but forget that they're just statistical models that predict the next 'token' in a sequence," Rani notes. "Many impressive behaviors emerge from scaling this, but it comes with real limitations, both in what the model can reliably generate and in its broader impact on the people using it."
Those limitations get sharper exactly when verification matters most. The authors point out that these models are particularly prone to errors during emotionally charged breaking news, the moments when misinformation spreads fastest and when users are most tempted to ask a chatbot what is true. There is a compounding issue underneath it: the human-created news content used to train these models is itself increasingly unreliable or biased, so the verifier inherits the flaws of the corpus it learned from.
The qualitative side of the study found distinct behavioral patterns. The team labeled about one fifth of participants "Dependency Developers," people who started out actively reasoning through the headlines and gradually slid into passive acceptance of whatever the AI suggested. One participant described the shift directly, and another captured the gap in what the tool taught: "While the chatbots did emphasize that you must check across multiple sources to make sure a story is true, they didn't teach me much about exploring the context of the images themselves." The model handed over conclusions, not methods.
Coach versus crutch
The most useful result for anyone designing these systems is that the outcome was not fixed by the model's accuracy. It was determined by the interaction style. The researchers draw a line between conversational strategies that help in the moment and strategies that build skill that persists after the tool is gone.

The approaches that produced stronger independent detection later all shared a property: they slowed the user down and made them do work. One was the Socratic method, where the AI asks guided questions instead of supplying answers. Another is what the team calls "deep probing," where the system offers gently persuasive nudges when a user appears to be drifting toward a wrong conclusion, without simply overriding them.
"AIs that 'tell' by providing direct answers are more likely to foster reliance, while those that 'ask' via Socratic questioning are better at engaging someone to actually learn how to discern the truth on their own," Danry says. "But it's very much a trade-off between speed and effort."
That trade-off is the practical takeaway. A system tuned to maximize immediate accuracy and minimize user friction is, by the same design choices, optimizing for dependency. A system that wants the user to retain capability has to deliberately preserve some of the cognitive load, which makes it feel slower and less satisfying in the moment. Most commercial assistants are tuned for the first goal, because that is what users reward in the short term.
Limits and what comes next
The study is candid about its scope. The dataset was roughly 50 validated news items, the cohort was small at 67 people, and participants came from the United States and the United Kingdom. Rani says future work aims at more geographically diverse cohorts, including low-resource communities, and at testing other interaction formats. One direction the team wants to explore is whether multimodal formats, such as culturally adaptive digital twins in place of text-based chat, help people build durable detection skills instead of temporary accuracy gains.
The researchers frame the real target as education and design practice. "It's especially important to raise awareness in our schools and academic communities about the shortcomings of using AI as learning tools," Maes says. "People need to know that if they 'delegate' their thinking, they're not going to get better at that particular brand of problem-solving."

Danry frames it as a literacy problem that the field will have to keep working on as the models change. "There's a lot of work to do in making sure that we don't just fully offload critical tasks that we want to be able to keep on doing to these models," he says. "We need to develop a new kind of AI literacy."
The finding generalizes well beyond news. Any time an autonomous or semi-autonomous system takes over a judgment task a human still needs to be able to perform, the same question applies: is the interface teaching the operator or replacing them? For news verification the stakes are diffuse but enormous, spread across millions of casual users who will not notice their own skills eroding until they are asked to judge something with no model in front of them. The research and the full paper are available through the MIT Media Lab, and the work was supported in part by the Media Lab Consortium, an MIT Tata Center Technology and Design Fellowship, and a Google PhD Fellowship in Human-Computer Interaction.