A deep dive into the surprisingly vibrant ecosystem of non-commercial websites, revealing that the 'small web' is far larger and more active than most people realize.
The "small web" is bigger than you might think. There are currently several initiatives that attempt to reclaim some part of the Internet for non-commercial, personal use; some of these I described in an earlier article. Here I'm using the term "small web" to mean the use of ordinary web browsers and servers, but for private sites, free of advertising and corporate tracking.
I've also recently adopted Gemini, which uses a completely different protocol and software from the regular web. The Gemini protocol is so limited that it's almost incapable of commercial exploitation, which is, of course, the main part of its appeal. It's hard to assess the number of people using Gemini; there seem to be about 6,000 Gemini "capsules" (sites) world-wide, but many of them seem to be defunct. Gemini online forums typically seem to have perhaps a hundred or so active users. It's a small community, mostly populated by IT professionals at present.
What got me thinking about the size of the small web was my use of feed aggregators in Gemini. Like the owners of regular websites, the owners of Gemini capsules often announce updates to their contents using feeds. A feed is a simple file in an agreed format that lists site updates, usually with a timestamp for each update. As in the regular web, Gemini feeds are usually in Atom or RSS format – XML documents with a particular layout, described by a specification.
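To make the idea of a feed concrete, here is a minimal sketch that reads the entries of an Atom document using only the Python standard library. The sample feed, its URLs and the function name `parse_atom_entries` are invented for illustration; real feeds vary a good deal more than this.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Namespace used by Atom documents.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# A made-up feed with two entries, each carrying a timestamp.
SAMPLE_ATOM = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example capsule</title>
  <entry>
    <title>First post</title>
    <link href="gemini://example.org/first.gmi"/>
    <updated>2024-03-14T09:30:00+00:00</updated>
  </entry>
  <entry>
    <title>Second post</title>
    <link href="gemini://example.org/second.gmi"/>
    <updated>2024-03-15T18:00:00Z</updated>
  </entry>
</feed>"""


def parse_atom_entries(xml_text):
    """Return a list of (timestamp, title) pairs from an Atom document."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.findall("atom:entry", ATOM_NS):
        title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        updated = entry.findtext("atom:updated", default=None, namespaces=ATOM_NS)
        if updated is None:
            continue  # no timestamp, so nothing to order this entry by
        text = updated.strip()
        # Older Python versions do not accept a trailing "Z" in fromisoformat().
        if text.endswith("Z"):
            text = text[:-1] + "+00:00"
        entries.append((datetime.fromisoformat(text), title))
    return entries


for stamp, title in parse_atom_entries(SAMPLE_ATOM):
    print(stamp.isoformat(), title)
```

The `updated` element is what an aggregator would sort on; everything else in the entry is just payload.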
A "feed aggregator" is a service that examines the feeds from a range of sites or capsules, and publishes updates in chronological order, typically in a way that is accessible to the public. By looking at a Gemini feed aggregator, I can see immediately what new content is available across the whole Gemini community. Since there aren't all that many active Gemini capsules, it's practicable to list updates from many capsules – perhaps all of them – on a single page, which I find very convenient.
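The core of an aggregator is little more than a merge and a sort. The sketch below assumes each feed has already been reduced to a list of (timestamp, title) pairs; the capsule addresses, titles and the `aggregate` function are hypothetical, and showing the newest update first is my assumption about what "chronological" means on such a page.

```python
from datetime import datetime, timezone


def aggregate(feeds):
    """Merge per-site entries into one list, newest update first.

    `feeds` maps a site address to a list of (timestamp, title) pairs.
    """
    merged = [
        (stamp, site, title)
        for site, entries in feeds.items()
        for stamp, title in entries
    ]
    merged.sort(key=lambda item: item[0], reverse=True)
    return merged


# Invented sample data standing in for already-parsed feeds.
feeds = {
    "gemini://alpha.example/": [
        (datetime(2024, 3, 14, 9, 30, tzinfo=timezone.utc), "Alpha: notes on feeds"),
        (datetime(2024, 3, 15, 18, 0, tzinfo=timezone.utc), "Alpha: a follow-up"),
    ],
    "gemini://beta.example/": [
        (datetime(2024, 3, 15, 7, 45, tzinfo=timezone.utc), "Beta: hello"),
    ],
}

for stamp, site, title in aggregate(feeds):
    print(stamp.strftime("%Y-%m-%d %H:%M"), site, title)
```

All the timestamps here carry a time zone, which keeps the comparison well defined when entries come from different sites.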
There are several active Gemini feed aggregators, differing in how they discover, or are informed about, new content. I look at aggregated Gemini feeds on most days, and I always find something of interest. So I wondered: could I implement something similar for the small web as a whole?
Of course, aggregating updates to the whole web on a single page would be absurd, perhaps even impossible. Still, I reasoned, the small web is, well, small. Maybe there aren't that many updates every day?
To attempt such a task, I first needed a list of sites on the small web. I wasn't going to try to construct one myself but, fortunately, I didn't have to. The Kagi search engine has such a list as part of its small web initiative. This list contains sites nominated by Kagi users; apart from "smallness", one of the criteria for inclusion is that the site publishes an update feed. Kagi's list mostly includes private sites and blogs, although some blogs are hosted on corporate platforms like Blogger.
When I last looked at it, some time last year, I think it listed about 6,000 sites. That's about the same number as known Gemini capsules, and I rather expected many of those sites to be moribund. Yesterday I looked at the list again, and I saw it had risen to about 32,000 entries. That's more than I expected but, for the purposes of aggregating feeds, it's not the number of sites, but how often they're updated, that matters.
It's not easy to tell from the feed list alone how active each site is, so I wrote a program that downloaded each feed and checked the timestamps to see the frequency of updates. Not all feeds have timestamps – they're not compulsory – but, of course, to list updates chronologically I do need a timestamp. So I excluded all the sites whose feeds didn't have timestamps, or were otherwise incapable of being indexed properly.
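The exclusion step might look something like the following. This is a sketch rather than the program described above: it handles only RSS-style `item` elements with a `pubDate`, it works on text that has already been downloaded, and the sample documents and the name `rss_timestamps` are made up.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Invented RSS document: one item has a timestamp, the other does not.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <item>
      <title>Dated post</title>
      <pubDate>Fri, 15 Mar 2024 10:00:00 +0000</pubDate>
    </item>
    <item>
      <title>Undated post</title>
    </item>
  </channel>
</rss>"""


def rss_timestamps(xml_text):
    """Return the usable item timestamps; an empty list means 'exclude this feed'."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return []  # not well-formed XML, so it cannot be indexed
    stamps = []
    for item in root.iter("item"):
        raw = item.findtext("pubDate")
        if not raw:
            continue  # timestamps are optional, so some items have none
        try:
            stamps.append(parsedate_to_datetime(raw.strip()))
        except (TypeError, ValueError):
            continue  # present but unparseable
    return stamps


print(len(rss_timestamps(SAMPLE_RSS)))
print(rss_timestamps("<rss><channel>"))
```

A site whose feed yields an empty list has nothing to sort by, so it drops out of the index.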
I also had to exclude the sites that were down, or that didn't produce a valid feed from the listed URL. These exclusions reduced the list of sites to about 25,000 – still a lot. So I excluded all sites that produced fewer than one update per month. That left about 9,000 sites.
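There is more than one way to turn "fewer than one update per month" into a test. The sketch below shows one possible reading, and it is an assumption rather than the rule actually applied: count the entries in the last 90 days and scale that to a 30-day month. The window length, the threshold and the function names are all illustrative.

```python
from datetime import datetime, timedelta, timezone


def updates_per_month(stamps, now, window_days=90):
    """Average number of updates per 30 days over the recent window."""
    cutoff = now - timedelta(days=window_days)
    recent = sum(1 for stamp in stamps if cutoff <= stamp <= now)
    return recent * 30 / window_days


def is_active(stamps, now, minimum=1.0):
    """True if the site averages at least `minimum` updates per month."""
    return updates_per_month(stamps, now) >= minimum


now = datetime(2024, 3, 16, tzinfo=timezone.utc)  # arbitrary reference date
busy = [now - timedelta(days=d) for d in (1, 20, 50, 80)]
quiet = [now - timedelta(days=60)]

print(is_active(busy, now), is_active(quiet, now))
```

A longer window is kinder to sites that post in bursts; a shorter one reacts faster when a site goes quiet.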
Of this number, some sites had only one update a month, others had many updates each day. It turned out that on March 15 there were 1,251 updates. The numbers weren't very different for earlier dates. I should point out that these "updates" aren't just fixing spelling mistakes: they're additions of new content to a site.
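Counting the updates for a given day is a matter of bucketing timestamps by calendar date. The following sketch assumes every timestamp carries a time zone and converts to UTC before taking the date, which is my choice for illustration; the sample timestamps are invented.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def updates_per_day(stamps):
    """Count updates per calendar day, using the UTC date of each timestamp."""
    return Counter(stamp.astimezone(timezone.utc).date() for stamp in stamps)


# Invented timestamps; the last one falls on the following day once converted to UTC.
stamps = [
    datetime(2024, 3, 15, 8, 0, tzinfo=timezone.utc),
    datetime(2024, 3, 15, 21, 0, tzinfo=timezone.utc),
    datetime(2024, 3, 15, 23, 30, tzinfo=timezone(timedelta(hours=-2))),
]

for day, count in sorted(updates_per_day(stamps).items()):
    print(day.isoformat(), count)
```

Which day a late-evening post belongs to depends on the time zone chosen, so totals for a single day can shift slightly with that choice.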
So there's good news and bad news, although actually it's the same news: the small web is too large, and too active, to publish all the updates on a single page, even for just one day. Well, I could publish them, but nobody has time to read them all.
It's good news, because the small web is very much alive, and growing. It's bad news because I was hoping to be able to implement a feed aggregator, of the same kind that exists for Gemini, and the current scale of the "small" web makes that impractical.
To be fair, I should point out that the "small" web was never defined by the number of sites, but by the lack of commercial influence. That there's still a place for private, non-commercial websites on an Internet dominated by advertising is something we should celebrate.
