When Anthropic co‑founder Chris Olah defended the “mystery” of large language models at a Vatican event, the remarks underscored why regulators are sharpening enforcement. With more than 100 lawsuits alleging unlawful data scraping and potential GDPR and CCPA violations, Anthropic faces a growing compliance gauntlet that could bring hefty fines and demand concrete transparency measures.
What happened
During a special audience at the Vatican, Chris Olah – co‑founder of Anthropic and head of its interpretability team – pushed back against Pope Leo XIV’s warning that “machine ‘intelligence’ is not human intelligence.” Olah described large language models (LLMs) as “grown” from vast corpora of human text, suggesting a kind of organic mystery that even their creators cannot fully explain. The remarks sparked a flurry of commentary, but they also reminded regulators and privacy advocates that Anthropic, like many AI firms, is under fire for how it collects and uses the data that fuels its models.

Legal basis for scrutiny
GDPR (EU General Data Protection Regulation)
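- Lawful basis – Article 6 requires a legal ground for any processing of personal data, and Article 9 adds stricter conditions for special categories such as health data or religious belief. Scraping publicly available web pages does not automatically satisfy either article when the data includes personal identifiers.
- Transparency – Articles 13 and 14 oblige controllers to inform data subjects about the purposes of processing; Article 14 covers personal data that was not obtained from the data subject directly, which is the situation web scraping creates. Anthropic has not published a granular data‑source registry, leaving a gap in the required transparency.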
- Data‑subject rights – Rights to access, rectification, erasure, and objection (Articles 15‑21) are difficult to honor when the training set is a monolithic, opaque dump of billions of documents.
CCPA / CPRA (California Consumer Privacy Act / Privacy Rights Act)
- Consumer notice – Section 1798.100 requires clear notice of data collection practices. The lack of a public list of scraped sources may be deemed a violation.
- Right to opt‑out – California residents can direct a business not to sell or share their personal information (Section 1798.120). AI firms that treat training data as a commercial asset risk being classified as “selling” personal information.
- Civil penalties – Non‑compliant entities can face up to $2,500 per violation and up to $7,500 per intentional violation under the CPRA.
Impact on users and companies
Users
- Privacy erosion – Personal statements posted online can be ingested into LLMs, potentially resurfacing in generated text without the original author’s consent.
- Risk of mis‑attribution – When a model reproduces a user’s phrasing, the individual may be linked to content they never authored, creating reputational harm.
Companies (Anthropic and peers)
- Litigation exposure – Over 100 lawsuits have been filed across the U.S. and EU alleging unlawful data scraping. Where a breach of GDPR’s core processing principles is found, fines can reach €20 million or 4 % of global annual turnover, whichever is higher; for a firm with several billion dollars in annual revenue, that puts the ceiling in the hundreds of millions of dollars.
- Operational disruption – Courts may issue injunctions requiring firms to halt the use of certain data sets, forcing costly retraining cycles.
- Reputational damage – Public perception of AI as a “black box” deepens when companies appear to sidestep privacy obligations, potentially slowing adoption by enterprise customers wary of compliance risk.
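The 4 % figure can be made concrete. Under Article 83(5) GDPR, the upper limit for the most serious infringements is the higher of EUR 20 million or 4 % of total worldwide annual turnover for the preceding financial year. The Python sketch below computes only that ceiling; the function name and the turnover figures are hypothetical inputs chosen for illustration, not any company's actual financials.

```python
# Article 83(5) GDPR: the ceiling is the higher of a fixed amount
# or a percentage of worldwide annual turnover.
FIXED_CAP_EUR = 20_000_000
TURNOVER_PERCENT = 4


def gdpr_fine_ceiling(worldwide_turnover_eur: int) -> int:
    """Return the maximum Art. 83(5) fine, in euros, for a given annual turnover."""
    turnover_based_cap = worldwide_turnover_eur * TURNOVER_PERCENT // 100
    return max(FIXED_CAP_EUR, turnover_based_cap)


# Hypothetical turnover figures, for illustration only.
for turnover in (100_000_000, 5_000_000_000, 50_000_000_000):
    ceiling = gdpr_fine_ceiling(turnover)
    print(f"turnover EUR {turnover:,} -> ceiling EUR {ceiling:,}")
```

On these assumed inputs, a firm with EUR 5 billion in turnover faces a ceiling of EUR 200 million, and the ceiling passes EUR 1 billion only once turnover exceeds EUR 25 billion. The result is a cap, not a prediction: actual fines are set case by case.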
What changes are coming (or should happen)
- Transparent data‑source disclosures – Companies will need to publish searchable registries showing, at a minimum, the categories of sources and any personal data included. The EU AI Act, in force since August 2024, already requires providers of general‑purpose AI models to publish a summary of the content used for training.
- Robust consent mechanisms – Where feasible, firms should obtain explicit consent from data subjects before ingesting their content, or rely on legitimate‑interest assessments that are documented and auditable.
- Data‑minimisation pipelines – Implement automated filters that strip identifiers before data enters training pipelines, aligning with GDPR’s principle of minimisation (Article 5(1)(c)).
- Rights‑management tooling – Deploy systems that can locate and remove a specific individual's data from a model on request, a technical challenge but increasingly demanded by regulators.
- Regulatory engagement – Proactive dialogue with bodies such as the European Data Protection Board (EDPB) and the California Attorney General can help shape clearer guidance and potentially mitigate enforcement actions.
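To illustrate the data‑minimisation item above, here is a minimal Python sketch of an identifier filter applied before text enters a training corpus. It is an assumption‑laden example, not a description of any company's pipeline: the function name, the placeholder tokens, and both regular expressions are ours, and the patterns catch only common email and phone formats.

```python
import re

# Illustrative patterns only: they match common formats, not every identifier.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact_identifiers(text: str) -> str:
    """Replace email addresses and phone-like numbers with placeholder tokens."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text


sample = "Contact Jane at jane.doe@example.com or +1 (415) 555-0100."
print(redact_identifiers(sample))
```

The sample prints "Contact Jane at [EMAIL] or [PHONE]." and shows the limits of the approach: the name "Jane" passes through untouched, and the phone pattern can also catch other long digit runs such as dates. Pattern matching of this kind is a first pass at best; names, addresses, and indirect identifiers need additional detection and human review.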
Why the Vatican moment matters for digital rights
The Pope’s encyclical warned against equating machine output with human consciousness. Olah’s counter‑argument, while philosophically interesting, inadvertently highlighted the opacity that regulators are trying to curb. When AI developers describe their models as “mysterious” or “grown,” they risk normalising a lack of accountability that conflicts with fundamental privacy rights.
For users, the takeaway is simple: words posted publicly may be harvested, repurposed, and monetised without clear consent, and public availability alone does not make them a free‑for‑all. For companies, the message is louder: non‑compliance with GDPR, CCPA, and forthcoming AI‑specific statutes can result in multi‑million‑dollar fines, mandatory data‑removal orders, and a loss of market trust.
Bottom line: The Vatican debate is a reminder that ethical and legal frameworks must keep pace with AI hype. Transparency, respect for data‑subject rights, and proactive regulatory engagement are no longer optional; they are essential to avoid the next wave of enforcement actions.
