AI Takes a Swing at Online Anonymity: LLMs Can Now Deanonymize Users at Scale
#Privacy

Hardware Reporter
5 min read

Researchers demonstrate that large language models can identify pseudonymous internet users with 67% accuracy by analyzing their writing patterns, for a cost of just $1-4 per profile, posing a serious new threat to online privacy.

Researchers have demonstrated that large language models can now deanonymize pseudonymous internet users with alarming efficiency, marking a significant shift in the landscape of online privacy. The study, conducted by a team from ETH Zurich and Anthropic, shows that LLMs can identify individuals from their anonymous online posts with 67% accuracy at 90% precision, all for a cost of just $1-4 per profile.

The research builds upon decades of work in data privacy, tracing back to Latanya Sweeney's 2002 research on k-Anonymity, which showed that 87% of the US population could be identified using just three data points: ZIP code, gender, and date of birth. What once required manual investigation by skilled human analysts can now be automated and scaled using AI.

How the Deanonymization Works

The researchers collected data from 338 Hacker News users who had publicly linked their profiles to LinkedIn accounts, creating a ground-truth dataset for testing. They then created structured data profiles based on users' comments and posts, anonymized this information, and fed it to an AI agent.

The LLM agent successfully identified 226 of the 338 targets, demonstrating that writing style, topic preferences, and behavioral patterns serve as unique fingerprints that AI can recognize across different platforms. The system works by extracting identity-relevant signals from unstructured text, efficiently searching through millions of candidate profiles, and reasoning about whether different accounts belong to the same person.
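The paper's prompts and agent code are not reproduced here, but the core loop it describes (extract signals, search candidates, reason about matches) can be sketched roughly as follows. Everything in this snippet is illustrative: `call_llm`, the prompt wording, and the profile fields are assumptions for the sake of the example, not the authors' implementation.

```python
import json

def call_llm(prompt: str) -> str:
    """Hypothetical helper wrapping whatever chat-completion API is available.
    Placeholder only; the approach is not tied to one model interface."""
    raise NotImplementedError

def extract_signals(posts: list[str]) -> dict:
    # Pull identity-relevant cues (location hints, profession, niche interests,
    # writing quirks) out of unstructured comments.
    prompt = (
        "Summarize identity-relevant signals from these posts as JSON with keys "
        "'location_hints', 'profession', 'interests', 'style_notes':\n\n"
        + "\n---\n".join(posts)
    )
    return json.loads(call_llm(prompt))

def score_candidate(anon_profile: dict, candidate_profile: dict) -> float:
    # Ask the model to reason about whether two profiles describe the same person.
    prompt = (
        "Do these two profiles describe the same person? "
        "Reply with a probability between 0 and 1 only.\n\n"
        f"Anonymous profile: {json.dumps(anon_profile)}\n"
        f"Candidate profile: {json.dumps(candidate_profile)}"
    )
    return float(call_llm(prompt))

def best_match(posts: list[str], candidates: list[dict], threshold: float = 0.9):
    # Commit only to the top-scoring candidate above a confidence threshold;
    # tuning that threshold is how an operating point like "67% accuracy at
    # 90% precision" arises in practice.
    anon_profile = extract_signals(posts)
    scored = [(score_candidate(anon_profile, c), c) for c in candidates]
    score, candidate = max(scored, key=lambda pair: pair[0])
    return candidate if score >= threshold else None
```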

The Scale and Cost Implications

Perhaps most concerning is the scalability and affordability of this approach. The entire experiment cost approximately $2,000, with each profile analysis estimated between $1 and $4. This represents a dramatic reduction in the cost and effort required for deanonymization attacks.
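Taking the reported figures at face value, the economics extrapolate straightforwardly. The scaling below is simple arithmetic on the paper's $1-4 per-profile estimate, not a result from the study itself:

```python
# Back-of-envelope scaling of the reported $1-4 per-profile cost.
cost_low, cost_high = 1.0, 4.0          # USD per profile, as reported

for n_profiles in (338, 10_000, 1_000_000):
    low, high = n_profiles * cost_low, n_profiles * cost_high
    print(f"{n_profiles:>9,} profiles: ${low:,.0f} to ${high:,.0f}")
```

At those rates, profiling every active user of a mid-sized forum is within reach of a modest budget, which is exactly the shift from targeted, analyst-driven investigation to bulk automation that the researchers warn about.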

"Where previous approaches required predefined feature schemas, careful data alignment, and manual verification, LLMs can extract identity-relevant signals from arbitrary prose," the researchers explain in their paper titled "Large-scale online deanonymization with LLMs."

Real-World Applications and Threats

The researchers outline several potential misuse scenarios:

  • Government surveillance: Targeting journalists, activists, or political dissidents
  • Corporate profiling: Building detailed advertising profiles from forum activity
  • Social engineering: Creating highly personalized phishing attacks
  • Harassment and doxxing: Identifying targets for online abuse

The Privacy Paradox

This research highlights a fundamental tension in online privacy. While many users seek anonymity to protect themselves from harassment, discrimination, or simply to express opinions freely, the very act of writing and participating online leaves traces that AI can now connect.

Simon Lermen, an AI engineer at MATS Research and one of the corresponding authors, emphasizes that netizens need to reconsider their online behavior: "Ask yourself: could a team of smart investigators figure out who you are from your posts? If yes, LLM agents can likely do the same, and the cost of doing so is only going down."

The combination of writing style, topic expertise, posting times, and interaction patterns creates what Lermen calls "a unique fingerprint" that becomes increasingly identifiable as more data points are collected.
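As a rough illustration of what such a fingerprint can look like (a toy example, not the study's feature set), even a few crude signals computed from public posts combine into a surprisingly distinctive profile:

```python
from collections import Counter
from datetime import datetime

def fingerprint(posts: list[tuple[str, datetime]]) -> dict:
    """Toy stylometric profile built from (text, timestamp) pairs."""
    texts = [text for text, _ in posts]
    words = [w.lower() for text in texts for w in text.split()]
    hours = Counter(ts.hour for _, ts in posts)
    return {
        "avg_words_per_post": len(words) / max(len(texts), 1),
        "vocab_richness": len(set(words)) / max(len(words), 1),
        "favorite_words": [w for w, _ in Counter(words).most_common(10)],
        "most_active_hours": [h for h, _ in hours.most_common(3)],
    }
```

Each field is only weakly identifying on its own; the point of the research is that an LLM can extract and weigh many such signals automatically from arbitrary prose, with no hand-built feature schema.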

Technical Methodology

The researchers tested their approach across multiple platforms including Hacker News, Reddit, LinkedIn, and anonymized interview transcripts. Their methodology involved:

  1. Data collection: Gathering posts and comments from target users
  2. Profile creation: Building structured representations of writing patterns
  3. Search prompt generation: Creating anonymized queries based on the profiles
  4. AI analysis: Using LLMs to search for matching patterns across platforms
  5. Verification: Checking predictions against ground-truth identities
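Step 5 is where the headline numbers come from, so it is worth being precise about the two metrics. Below is a small worked example using the figures reported above (226 correct identifications out of 338 targets); the exact number of guesses the agent committed to is not stated, so precision is left as a function:

```python
targets = 338                 # Hacker News users with ground-truth LinkedIn links
correct = 226                 # identifications that matched ground truth

accuracy = correct / targets  # fraction of all targets identified
print(f"accuracy = {accuracy:.0%}")   # ~67%, matching the reported figure

def precision(correct: int, predictions_made: int) -> float:
    # Precision counts only the cases where the agent committed to an answer;
    # the paper reports 90% at this operating point.
    return correct / predictions_made
```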

Limitations and Ethical Considerations

The technique isn't foolproof: it succeeds in about two-thirds of cases and requires substantial data to work effectively. The researchers deliberately avoided actually deanonymizing real users without consent, instead using publicly linked profiles to establish ground truth for their experiments.

However, the success rate is high enough to pose a genuine threat to online anonymity. As AI models become more sophisticated and access to computing power increases, the accuracy and speed of these deanonymization attacks will likely improve.

Implications for Online Communities

This research has profound implications for online communities that rely on pseudonymity. Platforms like Reddit, Hacker News, and various forums have long provided spaces where users can discuss sensitive topics, seek help, or express controversial opinions without fear of real-world consequences.

The ability to deanonymize users at scale threatens these spaces and may lead to:

  • Reduced participation in sensitive discussions
  • Self-censorship due to fear of identification
  • Migration to more secure platforms with stronger anonymity protections
  • Increased demand for privacy-enhancing technologies

The Future of Online Privacy

As LLMs become more powerful and accessible, the researchers argue that traditional approaches to online anonymity may no longer be sufficient. The cost of deanonymization is dropping rapidly, while the accuracy of AI systems continues to improve.

This suggests that future online privacy may require more sophisticated approaches, such as:

  • Advanced anonymization techniques that go beyond simple pseudonymity
  • AI-resistant writing styles or content generation tools
  • Stronger platform-level protections and data minimization
  • Legal frameworks to regulate the use of deanonymization technologies

The study, whose authors include experts from ETH Zurich and Anthropic, reflects a growing concern among AI researchers about the unintended consequences of powerful language models. It serves as both a warning and a call to action for developers, platform operators, and users to reconsider how online anonymity is protected in the age of AI.

For now, the message is clear: online anonymity is under threat, and the tools to compromise it are becoming increasingly accessible. Users who value their privacy may need to dramatically rethink their online behavior and the platforms they trust with their data.
