Feeding the AI Bots: Why This Developer Welcomes Web Scraping for Model Training
When Cloudflare announced a feature allowing websites to block AI bots from scraping content, it was hailed as a win for privacy and intellectual property. But for longtime blogger and developer Michael A. Smith (MAS), the decision was simple: he won't enable it. In a recent post, Smith declares, "I want to feed the AI bots. I don’t need to be compensated. This blog has always been about sharing ideas." This stance isn't just altruism—it's a calculated bet on the enduring value of human-generated content in an increasingly synthetic digital landscape.
Smith's blog, active for 25 years, has found its readers through a succession of channels: first friends, then search engines, and later fellow bloggers. Today, he sees a stark shift. "As a user, I no longer seek out blogs. I don’t even use search engines. I go straight to Perplexity, Claude, ChatGPT, CoPilot, Grok, and Gemini." This admission underscores a broader trend in which AI assistants are supplanting traditional discovery mechanisms, leaving human content effectively invisible unless models ingest it. Smith's choice to allow scraping is driven by a desire for legacy: "I’d love to know that my writing on the Potato Diet or High Intensity Training made it into the models and was able to communicate that information to the next person seeking out those topics."
"This might be the one point in history when the models were trained on human-generated data. We are approaching a point where AI generates most of the internet’s content."
Smith's reflection invokes the 'dead internet theory', the idea that much of the web is already algorithmically generated; he suggests that threshold was crossed in 2016–2017. With AI now churning out comments, articles, and videos en masse, he predicts a collapse in traffic for indie creators: "Although there will be a handful of human winners in the future, most indie content producers will be outworked by AI." For developers, this raises critical questions about content authenticity and the ethics of data sourcing. Cloudflare's tool, while empowering site owners, might inadvertently accelerate the dominance of AI-generated material by walling off scarce human insights.
Smith's approach is a poignant reminder that in the rush to monetize or protect content, we risk forgetting the web's original ethos of open sharing. As he notes, these AI models "will outlive me and this blog"—making his contribution a timestamp of human thought in a future ocean of synthetic data. For tech leaders, this isn't just about bot management; it's about defining what fragments of humanity endure in our digital evolution.
Source: Feeding the AI Bots by Michael A. Smith (MAS), July 4, 2025.