corpus: A Self-Hosted Music Scrobbling Platform That Puts You in Control
#Privacy

Tech Essays Reporter
4 min read

corpus is a self-hosted ListenBrainz and Last.fm proxy that stores your listening metadata and cover images, giving you complete control over your listening data, with a PureScript frontend for exploring it.

The way we interact with music has changed fundamentally in the streaming era, but the data trail we leave behind remains fragmented across platforms. Services like Last.fm and ListenBrainz have long tracked what we listen to, but they often operate as black boxes in which our listening data becomes someone else's asset. Enter corpus, a self-hosted scrobbling platform that puts that data back in your hands and that you control entirely.

At its core, corpus is both a proxy and a storage system for your listening data. It connects to the ListenBrainz and Last.fm APIs to pull in your scrobbles (the records of every track you've played) while storing this metadata locally along with cover art. Rather than merely mirroring data from external services, you build your own authoritative source of truth about your listening habits.
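
To make the "source of truth" idea concrete, here is a minimal Python sketch of how listens from the two services might be normalized into one local record and deduplicated. The input shapes follow the public ListenBrainz listen JSON (`listened_at`, `track_metadata`) and Last.fm's `user.getRecentTracks` JSON (`#text` fields, `date.uts`); the `Scrobble` type and the `from_listenbrainz`, `from_lastfm`, and `merge` helpers are hypothetical and are not taken from corpus.

```python
from dataclasses import dataclass
from typing import Any, Iterable, Optional


@dataclass(frozen=True)
class Scrobble:
    """Hypothetical unified record for one listen, whatever its source."""
    listened_at: int          # Unix timestamp in seconds
    artist: str
    track: str
    album: Optional[str]
    source: str               # "listenbrainz" or "lastfm"


def from_listenbrainz(listen: dict[str, Any]) -> Scrobble:
    # ListenBrainz listens carry "listened_at" plus a "track_metadata" object.
    meta = listen["track_metadata"]
    return Scrobble(
        listened_at=int(listen["listened_at"]),
        artist=meta["artist_name"],
        track=meta["track_name"],
        album=meta.get("release_name"),
        source="listenbrainz",
    )


def from_lastfm(track: dict[str, Any]) -> Optional[Scrobble]:
    # Last.fm's recent-tracks JSON nests text values under "#text".
    # A "now playing" entry has no "date" yet, so it is skipped.
    if "date" not in track:
        return None
    album = track.get("album", {}).get("#text") or None  # "" becomes None
    return Scrobble(
        listened_at=int(track["date"]["uts"]),
        artist=track["artist"]["#text"],
        track=track["name"],
        album=album,
        source="lastfm",
    )


def merge(scrobbles: Iterable[Scrobble]) -> list[Scrobble]:
    # Treat the same (time, artist, track) reported by both services as one
    # listen; the first occurrence wins. The result is sorted by time.
    seen: dict[tuple[int, str, str], Scrobble] = {}
    for s in scrobbles:
        key = (s.listened_at, s.artist.casefold(), s.track.casefold())
        seen.setdefault(key, s)
    return sorted(seen.values(), key=lambda s: s.listened_at)
```

Matching listens on an exact timestamp is a simplifying assumption; a real importer might need a tolerance window, since two services can record slightly different times for the same play.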

The frontend is written in PureScript and provides an interactive interface for exploring your listening history. PureScript is an interesting choice here: it is a strongly typed, purely functional language that compiles to JavaScript, so the type checker can catch many mistakes in handling messy music metadata at compile time rather than in the browser, without sacrificing a responsive user experience.

For storage, corpus uses DuckDB, an embedded analytical database. Its columnar storage format and efficient query engine suit the aggregate queries that music enthusiasts tend to run over time-stamped scrobbles, such as "what was my most-listened-to genre last year?" or "how has my taste evolved over time?" The project documents its schema and analytical queries, so users can dig deep into their listening patterns.
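
Questions like these reduce to grouped aggregates over a scrobble table. The sketch below uses Python's built-in `sqlite3` as a stand-in for DuckDB so that it runs without extra packages, against a hypothetical `scrobbles` table that is not corpus's documented schema. It ranks artists rather than genres, because no genre column is assumed.

```python
import sqlite3

# Hypothetical table layout, for illustration only.
SCHEMA = """
CREATE TABLE scrobbles (
    listened_at INTEGER NOT NULL,  -- Unix seconds
    artist      TEXT    NOT NULL,
    track       TEXT    NOT NULL,
    album       TEXT
)
"""

# Count plays per artist within one calendar year (UTC), most-played first.
TOP_ARTISTS_FOR_YEAR = """
SELECT artist, COUNT(*) AS plays
FROM scrobbles
WHERE strftime('%Y', listened_at, 'unixepoch') = ?
GROUP BY artist
ORDER BY plays DESC, artist
LIMIT ?
"""


def top_artists(conn: sqlite3.Connection, year: int, limit: int = 10) -> list[tuple[str, int]]:
    return conn.execute(TOP_ARTISTS_FOR_YEAR, (str(year), limit)).fetchall()
```

DuckDB accepts very similar SQL, but its date functions differ (for example, `to_timestamp` converts epoch seconds), so the year filter would be spelled differently there.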

Cover art handling is one of corpus's standout features. The system doesn't just store track metadata; it actively caches cover images, pulling them from sources including Last.fm and Discogs, which builds a rich visual archive of your musical journey. The cover cache can be configured to use S3-compatible storage, making your collection of album art easy to scale and back up.
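
A cache with source fallbacks can be sketched as follows. The fetcher callables stand in for Last.fm and Discogs lookups, and `DirStorage` writes to a local directory; an S3-compatible backend could implement the same `get`/`put` pair. All names here are hypothetical, and the key scheme (a SHA-256 hash of the normalized artist and album) is one reasonable choice, not necessarily what corpus does.

```python
import hashlib
from pathlib import Path
from typing import Callable, Optional, Protocol, Sequence

# A fetcher returns image bytes, or None if it has no art. Real fetchers would
# call the Last.fm or Discogs APIs; this signature is hypothetical.
Fetcher = Callable[[str, str], Optional[bytes]]


class Storage(Protocol):
    def get(self, key: str) -> Optional[bytes]: ...
    def put(self, key: str, data: bytes) -> None: ...


class DirStorage:
    """Local-directory backend; an S3-compatible one could expose the same methods."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def get(self, key: str) -> Optional[bytes]:
        path = self.root / key
        return path.read_bytes() if path.exists() else None

    def put(self, key: str, data: bytes) -> None:
        (self.root / key).write_bytes(data)


def cover_key(artist: str, album: str) -> str:
    # Normalize case and surrounding whitespace so "Low " and "low" share an entry.
    norm = f"{artist.strip().casefold()}\x00{album.strip().casefold()}"
    return hashlib.sha256(norm.encode("utf-8")).hexdigest() + ".img"


def get_cover(artist: str, album: str, storage: Storage,
              fetchers: Sequence[Fetcher]) -> Optional[bytes]:
    key = cover_key(artist, album)
    cached = storage.get(key)
    if cached is not None:
        return cached
    for fetch in fetchers:  # try sources in priority order
        data = fetch(artist, album)
        if data:  # None or empty bytes counts as a miss
            storage.put(key, data)
            return data
    return None  # no source had art; the caller can show a placeholder
```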

Multi-user support is also noteworthy. Rather than serving a single user, the platform can manage multiple listening profiles, each with its own configuration: ListenBrainz and Last.fm credentials, database file, and storage settings. This makes it well suited to households or small communities where people want to track their listening separately on a single shared instance.
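
One way to model these per-user settings is a small profile record plus a check that no two users share a name or a database file. The field names below are illustrative assumptions, not corpus's actual configuration keys.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class UserProfile:
    """Hypothetical per-user settings; field names are illustrative only."""
    name: str
    database: Path
    listenbrainz_user: Optional[str] = None
    listenbrainz_token: Optional[str] = None
    lastfm_username: Optional[str] = None
    lastfm_api_key: Optional[str] = None
    cover_bucket: Optional[str] = None  # S3-compatible bucket for this user's art

    def sources(self) -> list[str]:
        # A user may sync from one service, both, or neither.
        out = []
        if self.listenbrainz_user:
            out.append("listenbrainz")
        if self.lastfm_username and self.lastfm_api_key:
            out.append("lastfm")
        return out


def validate(profiles: list[UserProfile]) -> None:
    # Each profile needs a unique name and its own database file;
    # otherwise two people's listens would end up mixed together.
    names: set[str] = set()
    dbs: set[Path] = set()
    for p in profiles:
        if p.name in names:
            raise ValueError(f"duplicate user name: {p.name}")
        db = p.database.resolve()
        if db in dbs:
            raise ValueError(f"database file shared by more than one user: {db}")
        names.add(p.name)
        dbs.add(db)
```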

Configuration is handled through environment variables and per-user settings, which keeps it flexible without adding unnecessary complexity. On first run, corpus automatically syncs your historical listens, so you don't lose your listening history when migrating to the platform. Optional backups add peace of mind by copying your database to S3 at a configurable interval.
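
The first-run sync and scheduled backups come down to two small decisions, sketched below. The environment variable `CORPUS_BACKUP_INTERVAL_HOURS` is a hypothetical name chosen for illustration, as are the helper functions; the project's own documentation lists the real settings.

```python
import os
from datetime import datetime, timedelta
from pathlib import Path
from typing import Mapping, Optional


def backup_interval(env: Mapping[str, str] = os.environ) -> Optional[timedelta]:
    # Hypothetical variable name; unset or empty means backups are disabled.
    raw = env.get("CORPUS_BACKUP_INTERVAL_HOURS", "").strip()
    if not raw:
        return None
    hours = float(raw)
    if hours <= 0:
        raise ValueError("backup interval must be positive")
    return timedelta(hours=hours)


def needs_initial_sync(database: Path) -> bool:
    # First run: no database file (or an empty one), so import the full history.
    return not database.exists() or database.stat().st_size == 0


def backup_due(last: Optional[datetime], now: datetime,
               interval: Optional[timedelta]) -> bool:
    if interval is None:  # backups not configured
        return False
    return last is None or now - last >= interval
```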

From a development perspective, corpus embraces modern tooling. Just serves as a command runner for common development and deployment tasks, while Nix provides reproducible builds and consistent environments across machines. The PureScript ecosystem gives the frontend a robust foundation, and the overall architecture remains modular and extensible.

What makes corpus particularly compelling is how it addresses the growing concern around data ownership in the digital age. By self-hosting your music scrobbling platform, you're not just avoiding vendor lock-in; you're creating a personal archive that you can query, analyze, and preserve indefinitely. The platform's design encourages exploration and analysis of your listening data, turning what could be a simple log of tracks into a rich dataset about your musical tastes and habits.

The technical implementation shows careful consideration of real-world use cases. The system handles API rate limiting gracefully, provides fallbacks for missing data, and includes metrics endpoints for monitoring. The documentation is thorough, covering everything from deep architectural dives to specific usage instructions, making it accessible to both casual users and those who want to understand every detail of how their data is processed.
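
Handling rate limits gracefully usually means retrying with a growing delay and honoring any wait time the server asks for. Below is a minimal sketch of exponential backoff with full jitter, a generic technique rather than a description of corpus's internals. `RateLimited` and `with_backoff` are hypothetical names; a real client would raise `RateLimited` when an API answers with HTTP 429.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Raised by a (hypothetical) API client when the server answers HTTP 429."""

    def __init__(self, retry_after: Optional[float] = None) -> None:
        super().__init__("rate limited")
        self.retry_after = retry_after  # seconds the server asked us to wait


def with_backoff(call: Callable[[], T], max_attempts: int = 5, base: float = 1.0,
                 cap: float = 60.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited as err:
            if attempt == max_attempts - 1:
                raise  # out of attempts; let the caller decide
            if err.retry_after is not None:
                delay = err.retry_after  # the server said how long to wait
            else:
                # Full jitter: a random delay up to base * 2**attempt, capped.
                delay = random.uniform(0, min(cap, base * 2 ** attempt))
            sleep(delay)
    raise AssertionError("unreachable")
```

Injecting `sleep` keeps the function testable without real waiting, and the jitter spreads retries out so that several syncing users do not hit the API in lockstep.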

For music enthusiasts who have grown tired of relying on third-party services for tracking their listening habits, corpus represents a mature, well-engineered alternative. It combines the convenience of modern music tracking services with the control and privacy of self-hosting, all while providing powerful tools for exploring and understanding your musical journey. Whether you're a casual listener curious about your top tracks of the year or a data enthusiast wanting to analyze listening patterns over decades, corpus provides the foundation for a truly personal music tracking experience.

As we continue to generate more data about our daily lives, tools like corpus remind us that we don't have to surrender control of our digital footprints. With thoughtful design and modern technology, it's possible to create systems that serve us rather than the other way around. Corpus isn't just a music scrobbling platform; it's a statement about data ownership and the value of personal archives in an increasingly centralized digital world.
