Xikipedia: When Wikipedia Becomes Your Personalized Social Feed


Tech Essays Reporter

A clever experiment that transforms Wikipedia into a personalized content feed using simple algorithms, demonstrating how recommendation systems work without collecting user data.

Xikipedia represents an intriguing experiment in content recommendation systems that transforms Wikipedia into a personalized social media-style feed. Created by rebane2001, this project demonstrates how even basic algorithms without machine learning or user data collection can effectively learn user preferences and suggest relevant content.

At its core, Xikipedia is a demonstration of how recommendation algorithms function in practice. The project uses content from Simple Wikipedia (a version of Wikipedia designed for easier reading) and presents it in a feed format similar to social media platforms. What makes this particularly interesting is that the entire algorithm runs locally in your browser, with no data collection or sharing involved. The moment you refresh or close the tab, all your interaction data disappears.

How the Algorithm Works

The recommendation system behind Xikipedia is surprisingly straightforward yet effective. Each Wikipedia article in the feed is categorized based on two main factors: its Wikipedia category tree and the internal links (pagelinks) within the article. These categories are then assigned point scores based on user interactions.
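As a rough illustration, here is a minimal Python sketch of that data model. The `Post` class, its fields, and `base_score` are hypothetical names rather than code from the Xikipedia source, and the sketch assumes a post's score is simply the sum of the scores of its categories:

```python
from dataclasses import dataclass, field


@dataclass
class Post:
    title: str
    # Hypothetical fields: the article's Wikipedia categories and its outgoing pagelinks.
    wiki_categories: set[str] = field(default_factory=set)
    pagelinks: set[str] = field(default_factory=set)

    def categories(self) -> set[str]:
        # Both category membership and linked article titles act as scoring "categories".
        return self.wiki_categories | self.pagelinks


def base_score(post: Post, category_scores: dict[str, int]) -> int:
    # Assumption: a post's score is the sum of its categories' scores;
    # categories the reader has not interacted with yet count as 0.
    return sum(category_scores.get(c, 0) for c in post.categories())


post = Post("Honey bee", {"Bees", "Insects"}, {"Pollination", "Honey"})
scores = {"Bees": 150, "Pollination": 75, "Cars": -20}
print(base_score(post, scores))  # 225
```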

The interaction scoring system is designed to learn user preferences through engagement:

  • Scrolling past a post: -5 points
  • Liking a post: 50 points plus a bonus of 4 times the number of posts since the last like
  • Clicking on an article: 75 points
  • Clicking on an image: 100 points
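The point values above could be applied along these lines. This sketch assumes that every category attached to a post receives the full points for an interaction; `Action`, `interaction_points`, and `record_interaction` are illustrative names, not the project's own:

```python
from enum import Enum


class Action(Enum):
    SCROLL_PAST = "scroll_past"
    LIKE = "like"
    OPEN_ARTICLE = "open_article"
    OPEN_IMAGE = "open_image"


# Fixed point values from the list above (likes are computed separately).
FIXED_POINTS = {
    Action.SCROLL_PAST: -5,
    Action.OPEN_ARTICLE: 75,
    Action.OPEN_IMAGE: 100,
}


def interaction_points(action: Action, posts_since_last_like: int = 0) -> int:
    if action is Action.LIKE:
        # 50 points plus a bonus of 4 per post seen since the previous like.
        return 50 + 4 * posts_since_last_like
    return FIXED_POINTS[action]


def record_interaction(category_scores: dict[str, int], post_categories: set[str],
                       action: Action, posts_since_last_like: int = 0) -> None:
    # Assumption: each of the post's categories receives the same points.
    points = interaction_points(action, posts_since_last_like)
    for category in post_categories:
        category_scores[category] = category_scores.get(category, 0) + points


scores: dict[str, int] = {}
record_interaction(scores, {"Bees", "Insects"}, Action.LIKE, posts_since_last_like=10)
record_interaction(scores, {"Insects", "Cars"}, Action.SCROLL_PAST)
print(scores["Bees"], scores["Insects"], scores["Cars"])  # 90 85 -5
```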

This scoring mechanism creates a feedback loop in which the system learns what content you find interesting from your actions. Posts with images receive a small base score boost of +5, while posts you have already seen are heavily penalized by (3^(post_seen_times) - 1) * -5000. A single prior viewing already costs 10,000 points, and the penalty grows exponentially with each further viewing, which makes it very unlikely you will see the same content repeatedly.
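Plugging a few values into the formula shows how quickly repeats are buried. The function name `post_adjustment` is hypothetical, and the sketch assumes these adjustments are simply added to a post's category-based score:

```python
def post_adjustment(has_image: bool, seen_times: int) -> int:
    # Posts with images get a small +5 boost.
    image_bonus = 5 if has_image else 0
    # Repeat penalty: (3**seen_times - 1) * -5000, which is 0 for a post never seen.
    repeat_penalty = (3 ** seen_times - 1) * -5000
    return image_bonus + repeat_penalty


for n in range(4):
    print(n, post_adjustment(has_image=True, seen_times=n))
# 0 5
# 1 -9995
# 2 -39995
# 3 -129995
```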

The Selection Process

When determining which post to show next, Xikipedia employs a three-pronged approach with different probabilities:

  1. Weighted random selection (40% chance): Posts are chosen based on their accumulated scores, with higher-scoring posts having a greater likelihood of being selected
  2. Highest score selection (42% chance): The post with the highest score is shown directly
  3. Complete randomness (18% chance): A random post is displayed regardless of its score, to maintain variety

This hybrid approach balances personalization with discovery, preventing the feed from becoming too narrow while still prioritizing content aligned with user interests.
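A minimal sketch of the 40/42/18 split might look like the following. The real implementation may differ; in particular, because scores can be negative, this version assumes the weights are shifted so the lowest-scoring post still has weight 1, which is only one of several reasonable ways to turn scores into sampling weights. `pick_next` is a hypothetical name:

```python
import random


def pick_next(scores: dict[str, float], rng: random.Random) -> str:
    """Choose the next post id from a mapping of post id -> current score."""
    ids = list(scores)
    roll = rng.random()
    if roll < 0.40:
        # Weighted random (40%): higher scores are more likely. Assumption: shift
        # scores so the lowest one gets weight 1, since weights must be non-negative.
        lowest = min(scores.values())
        weights = [scores[i] - lowest + 1 for i in ids]
        return rng.choices(ids, weights=weights, k=1)[0]
    if roll < 0.82:
        # Highest score (42%, i.e. the band from 0.40 to 0.82).
        return max(ids, key=scores.__getitem__)
    # Remaining 18%: a uniformly random post, ignoring scores.
    return rng.choice(ids)


rng = random.Random(42)  # seeded for repeatability
scores = {"Bees": 225, "Cars": -5, "Jazz": 60}
picks = [pick_next(scores, rng) for _ in range(10_000)]
print({post: picks.count(post) for post in scores})
```

With these scores "Bees" should come up roughly four times in five, while the random branch still gives "Cars" an occasional appearance.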

Technical Implementation

The project is built to be self-contained and privacy-focused. Data processing is handled by a Python script (process_data.py) that works with Wikipedia data dumps. While the repository includes a pre-processed .json file for Simple Wikipedia, users can build their own datasets by replacing the input files the processing script reads with their own Wikimedia data dumps.

One clever optimization involves category names and surnames, which start with a base score of -1000. This prevents these common categories from dominating the feed, as they would otherwise appear too frequently due to their prevalence across Wikipedia articles.
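The effect of that starting penalty is easy to see in a small sketch. How Xikipedia actually identifies name and surname categories isn't described here, so `name_categories` is a hypothetical, hand-supplied set, and "Smith (surname)" is a made-up category label:

```python
def initial_category_scores(categories: set[str], name_categories: set[str]) -> dict[str, int]:
    # Name and surname categories start at -1000; every other category starts at 0.
    # `name_categories` is a hypothetical, hand-supplied set for this sketch.
    return {c: (-1000 if c in name_categories else 0) for c in categories}


scores = initial_category_scores(
    {"Bees", "Jazz", "Smith (surname)"},  # made-up category labels
    name_categories={"Smith (surname)"},
)

# Liking a post tagged with both categories adds 50 points to each (ignoring the
# posts-since-last-like bonus), yet the surname category stays deeply negative.
for category in ("Bees", "Smith (surname)"):
    scores[category] += 50

print(scores["Bees"], scores["Smith (surname)"])  # 50 -950
```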

Licensing and Accessibility

Xikipedia is released under the AGPLv3 license, which ensures the project remains open source while protecting user freedoms. The creator has noted that the license applies to the project code itself but not to the included Wikipedia data file. For users who need different licensing terms, the creator has expressed willingness to consider relicensing upon request.

Why This Matters

Xikipedia serves as an educational tool that demystifies how recommendation algorithms work. In an era where social media feeds are often criticized for creating echo chambers or manipulating user behavior, this project shows how simple, transparent algorithms can create personalized experiences without the privacy concerns associated with data collection and tracking.

The project also highlights the potential for repurposing existing knowledge bases like Wikipedia into more engaging, personalized formats. By treating Wikipedia articles as social media posts and applying recommendation logic, Xikipedia creates a new way to discover and consume information that feels more dynamic than traditional encyclopedia browsing.

For developers and researchers, Xikipedia provides a clean, understandable implementation of recommendation logic that can serve as a foundation for more complex systems. Its simplicity makes it an excellent teaching tool for explaining content-based recommendation (as opposed to collaborative filtering, which relies on data pooled from many users) and the balance between personalization and serendipity in content discovery systems.

You can try Xikipedia yourself at xikipedia.org or explore the source code on GitHub. The project demonstrates that sophisticated-feeling recommendation systems don't require complex machine learning or invasive data collection; sometimes, simple algorithms executed thoughtfully can achieve surprisingly effective results.
