
Blocking Suspicious Old Browsers: A Response to Modern Crawler Threats

DevOps Reporter

A blog owner explains why they're blocking outdated browsers to combat malicious crawlers, particularly those used for LLM training data collection.

The Problem: Malicious Crawlers Masquerading as Old Browsers

In early 2025, a significant challenge emerged for website operators: a surge in high-volume crawlers using outdated browser user agents, particularly old Chrome versions. These crawlers, apparently gathering data for LLM training, have become so prevalent that they're straining server resources and forcing site owners to take defensive measures.

Why Old Browsers Are Being Blocked

The core issue is that malicious actors are exploiting the trust historically placed in older browser user agents. By presenting themselves as outdated Chrome browsers, these crawlers bypass many standard security measures. The volume is substantial enough that it's impacting legitimate site operations, leading the blog owner to block all traffic from browsers deemed "suspiciously old."
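
The post doesn't spell out the exact rule, but a minimal version of this kind of filter could be a User-Agent check like the Python sketch below, which extracts the claimed Chrome major version and flags anything below a cutoff. The cutoff value and function name here are illustrative assumptions, not the blog's actual implementation.

```python
import re

# Illustrative cutoff: Chrome versions more than a few releases behind
# current get flagged. The exact number is an assumption for this sketch.
MIN_CHROME_MAJOR = 120

CHROME_UA = re.compile(r"Chrome/(\d+)\.")

def is_suspiciously_old(user_agent: str) -> bool:
    """Return True if the User-Agent claims a Chrome version below the cutoff."""
    match = CHROME_UA.search(user_agent)
    if match is None:
        return False  # not claiming to be Chrome; outside this rule's scope
    return int(match.group(1)) < MIN_CHROME_MAJOR

# Example: the kind of old Chrome UA string these crawlers present
ua = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
      "(KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36")
print(is_suspiciously_old(ua))  # True
```

Note that any Chromium-based browser embedding a Chrome/ token in its user agent, including Vivaldi with brand masking enabled, falls under the same rule, which is exactly the false-positive problem described below.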

Impact on Legitimate Users

Unfortunately, this defensive strategy creates collateral damage. Users with genuinely old browsers—whether due to legacy systems, corporate environments, or personal choice—find themselves unable to access affected sites. The blocking mechanism doesn't distinguish between a crawler pretending to be an old browser and a real person using an old browser version.

Specific Browser Considerations

Vivaldi Users

Vivaldi browser users face a particular challenge. Due to ongoing attacks, Vivaldi's "User Agent Brand Masking" feature, which allows the browser to identify itself as Google Chrome for compatibility, now triggers security blocks. Users need to disable this setting so Vivaldi identifies itself correctly; with masking enabled, even a fully up-to-date Vivaldi presents a Chrome user agent that trips the block.

Archive.is Users

Archive.* services (archive.today, archive.ph, archive.is) present another complication. These archival services crawl pages using old Chrome User-Agent values, distribute their crawling across widely dispersed IP addresses, and some even publish fake reverse DNS records claiming their IPs belong to Googlebot—a tactic typically associated with malicious actors. The site owner recommends using archive.org instead, as it's better behaved and can access the blog.
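
There is a standard defense against faked PTR records: Google documents that genuine Googlebot addresses pass a forward-confirmed reverse DNS (FCrDNS) check, where the reverse lookup must land in googlebot.com or google.com and a forward lookup of that hostname must return the original IP. A minimal sketch follows; the function name is an assumption, and a production check would cache results and also consult Google's published crawler IP ranges.

```python
import socket

def is_verified_googlebot(ip: str) -> bool:
    """Forward-confirmed reverse DNS (FCrDNS) check for a claimed Googlebot IP.

    1. Reverse-resolve the IP to a hostname.
    2. Require the hostname to end in a Google crawler domain.
    3. Forward-resolve that hostname and confirm it maps back to the same IP.

    A crawler that merely fakes its PTR record fails step 3, because it does
    not control the forward DNS zones for googlebot.com / google.com.
    """
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)
    except socket.herror:
        return False
    if not hostname.endswith((".googlebot.com", ".google.com")):
        return False
    try:
        _, _, forward_ips = socket.gethostbyname_ex(hostname)
    except socket.gaierror:
        return False
    return ip in forward_ips

# 66.249.66.1 is the address used in Google's own verification examples
print(is_verified_googlebot("66.249.66.1"))
```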

What This Means for the Web

This situation highlights a growing tension in web development: the balance between accessibility and security. As AI training and data collection become more aggressive, site owners are forced to implement increasingly strict measures that may inadvertently exclude legitimate users.

The trend suggests we may see more sites adopting similar policies, potentially creating a web where older software versions become increasingly isolated. For users, this means staying current with browser updates becomes not just about security patches, but about maintaining access to content.

Moving Forward

For users affected by these blocks, the path forward is clear: update your browser to a current version. For site owners, the challenge is developing more sophisticated detection methods that can distinguish between malicious crawlers and legitimate users with older software, without creating excessive false positives.
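
One middle ground, which the post itself doesn't propose, is to rate-limit rather than hard-block clients with suspicious user agents: a lone reader on an old browser still gets through, while a high-volume crawler doesn't. A minimal per-IP token-bucket sketch, with purely illustrative limits:

```python
import time
from collections import defaultdict
from dataclasses import dataclass, field

# Illustrative limits: a human on an old browser rarely sustains more than
# a request per second; a scraping crawler does. Both numbers are assumptions.
RATE = 1.0    # tokens refilled per second
BURST = 10.0  # maximum bucket size

@dataclass
class Bucket:
    tokens: float = BURST
    last: float = field(default_factory=time.monotonic)

buckets: defaultdict[str, Bucket] = defaultdict(Bucket)

def allow(ip: str) -> bool:
    """Token-bucket check: charge one token per request from a suspicious UA."""
    b = buckets[ip]
    now = time.monotonic()
    b.tokens = min(BURST, b.tokens + (now - b.last) * RATE)
    b.last = now
    if b.tokens >= 1.0:
        b.tokens -= 1.0
        return True
    return False
```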

This situation represents an evolving arms race between content providers protecting their resources and actors seeking to harvest data at scale. As AI development continues to drive demand for training data, expect these defensive measures to become more common and potentially more aggressive.
