The Joke Filesystem That Keeps Teaching Real Lessons About Data, Compression, and Hype

Trends Reporter

πfs stores your files in the digits of pi instead of on disk, achieving what its author calls 100% compression. It is a gag, but more than a decade after its release it still circulates through developer feeds because the joke lands on something genuinely uncomfortable about how the industry talks about storage and AI.

Every so often a GitHub repository resurfaces that is technically a joke but keeps getting passed around because it argues a point better than most serious blog posts. philipl/pifs, the "data-free filesystem," is one of those projects. Its FUSE code still builds, and developers still star it and share it alongside real tools. That persistence is the interesting part.

What it claims to do

The pitch is delivered completely straight. πfs does not waste space storing your data on a hard drive. Instead it stores your data in pi. The reasoning chain is internally consistent, which is what makes it funny. Pi is conjectured to be a normal number, meaning every finite sequence of digits appears in its expansion with the limiting frequency you would expect by chance, and so, in particular, appears somewhere. Read pi in hexadecimal and, if the conjecture holds, every file that could possibly exist is already sitting in there. Your files. My files. Files nobody has written yet. The first published version of this observation dates to around 2001, and the project leans on it without blinking.

From there the logic marches forward. If pi contains every file, why store anything? Just record the index into pi where your file begins and its length, then extract the bytes later using the Bailey-Borwein-Plouffe formula, which can compute hexadecimal digits of pi at arbitrary positions without calculating the ones before them. The README even reaches the obvious punchline about copyright: if your file was always present in pi, was it ever really yours to infringe?
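
The digit-extraction step is the one piece of πfs that is not a joke. The Bailey-Borwein-Plouffe formula writes pi as the sum over k of 1/16^k × (4/(8k+1) − 2/(8k+4) − 1/(8k+5) − 1/(8k+6)), and multiplying through by 16^n lets you read off the hex digit at position n using modular exponentiation instead of the digits before it. Below is a minimal Python sketch of that idea; the function names are hypothetical and this is not taken from the πfs source. Double-precision floats limit how far out it stays accurate, so treat it as an illustration rather than a production digit extractor.

```python
def _frac_series(j: int, n: int) -> float:
    """Fractional part of sum_{k>=0} 16**(n-k) / (8k + j)."""
    s = 0.0
    # Left part (k <= n): modular exponentiation keeps every term below 1.
    for k in range(n + 1):
        denom = 8 * k + j
        s = (s + pow(16, n - k, denom) / denom) % 1.0
    # Right part (k > n): terms shrink by a factor of 16, so stop once negligible.
    k = n + 1
    while True:
        term = 16.0 ** (n - k) / (8 * k + j)
        if term < 1e-17:
            break
        s += term
        k += 1
    return s % 1.0


def pi_hex_digit(n: int) -> int:
    """Hex digit of pi at 0-based position n after the point, via the BBP formula."""
    x = (4 * _frac_series(1, n) - 2 * _frac_series(4, n)
         - _frac_series(5, n) - _frac_series(6, n)) % 1.0
    return min(int(16 * x), 15)  # min() guards against float rounding up to 16


def pi_hex(start: int, count: int) -> str:
    """`count` hex digits of pi starting at position `start`, each computed independently."""
    return "".join(f"{pi_hex_digit(i):X}" for i in range(start, start + count))


print(pi_hex(0, 16))     # 243F6A8885A308D3
print(pi_hex(1_000, 4))  # jumps straight to position 1,000 without producing digits 0..999
```

Each call still does work proportional to n, but it never materializes the earlier digits, which is exactly the property a "read the bytes at index i" filesystem would lean on.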

Where the trick actually lives

The gag survives a second reading because the math is real, even if the economics are absurd. Finding a long byte sequence inside pi is computationally brutal, so the implementation cheats in a way that quietly destroys the entire premise: it looks up each byte separately. A single byte has only 256 possible values, so locating one in a normal sequence is trivial. But the index that tells you where that byte lives is itself a number, and in an evenly distributed sequence a given two-hex-digit pattern first appears, on average, a couple of hundred digits in. Writing that position down takes about eight bits, roughly as many as the byte it was supposed to save.
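
To see the trade-off in numbers, here is a minimal Python sketch of per-byte lookup. It rests on one loud assumption: a seeded stream of uniformly random hex digits stands in for pi, on the theory that a normal number's digits look like that on average. The helpers fake_normal_digits, encode, and decode are hypothetical names for illustration, not functions from the πfs source, and the fixed-width index is just one plausible way to store the offsets.

```python
import random

HEX = "0123456789abcdef"


def fake_normal_digits(n_digits: int, seed: int = 0) -> str:
    """Stand-in for pi's hex expansion: uniformly random hex digits.

    Assumption: a normal number's digits behave like this on average.
    """
    rng = random.Random(seed)
    return "".join(rng.choice(HEX) for _ in range(n_digits))


def encode(data: bytes, digits: str) -> list[int]:
    """Per-byte lookup: store one offset into `digits` for each input byte."""
    offsets = []
    for b in data:
        pos = digits.find(f"{b:02x}")  # first place this byte's two hex digits appear
        if pos < 0:
            raise ValueError(f"byte {b:#04x} not found; use more digits")
        offsets.append(pos)
    return offsets


def decode(offsets: list[int], digits: str) -> bytes:
    """Recover each byte by reading two hex digits at its stored offset."""
    return bytes(int(digits[p:p + 2], 16) for p in offsets)


digits = fake_normal_digits(100_000)
data = bytes(range(256))  # a "file" that uses every byte value exactly once
offsets = encode(data, digits)
assert decode(offsets, digits) == data  # the round trip is lossless

# Narrowest fixed-width field that can hold every offset this file needs.
width = max(offsets).bit_length()
print(f"data: {8 * len(data)} bits, metadata: {width * len(offsets)} bits "
      f"({width} bits per byte)")
```

With every byte value present, the latest first occurrence lands past position 255, so the fixed-width metadata comes out larger than the data it replaces. Cleverer offset encodings can narrow the gap, but by the pigeonhole argument they cannot beat the eight bits the byte itself takes.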

That is the whole joke compressed into one sentence. The metadata directory, where πfs stores the offsets and lengths, ends up holding at least as much information as the original file. The README plays this completely deadpan. Worried about losing your file locations? No problem, the locations are just metadata, and your files are still sitting safely in pi forever. It is a perfect parody of a real failure mode, the system that moves the cost somewhere you stop looking at it and then declares victory.

Why it keeps coming back

πfs endures because it is a working refutation of a sales pattern, not just a one-liner. The information-theoretic point it makes is the pigeonhole bound on lossless compression: no lossless scheme can map every possible N-bit file to a shorter one, because there are 2^N files of N bits but only 2^N − 1 strings shorter than that (counting the empty string), so at least two files would have to share an output and one of them could never be recovered. Any scheme that claims otherwise is shuffling the bits into a place it has chosen not to count. Students who have never sat through a coding theory lecture absorb that idea from πfs in about ninety seconds.
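
The counting argument is short enough to check directly. This Python sketch, where count_shorter_strings is a hypothetical helper, compares the 2^N possible N-bit files with the strings shorter than N bits, then hands 4 KiB of seeded pseudo-random bytes to the standard library's zlib to show a real general-purpose compressor failing to shrink data with no structure to exploit. random.Random.randbytes needs Python 3.9 or later.

```python
import random
import zlib


def count_shorter_strings(n_bits: int) -> int:
    """Number of bit strings strictly shorter than n_bits, including the empty string."""
    return sum(2 ** k for k in range(n_bits))  # equals 2**n_bits - 1


for n in (1, 8, 16):
    files = 2 ** n
    shorter = count_shorter_strings(n)
    # Always exactly one short: some N-bit file has nowhere smaller to go.
    print(f"{n:>2}-bit files: {files}, shorter outputs available: {shorter}")

# Empirically: a general-purpose compressor cannot shrink data that has no structure.
noise = random.Random(0).randbytes(4096)
packed = zlib.compress(noise, 9)
print(len(noise), "->", len(packed))
```

zlib does not cheat here: faced with incompressible input, DEFLATE typically falls back to storing the bytes as-is, and the block and container headers make the output slightly larger than the input.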

The counter-perspective from people who dislike the project is that it is a strawman. Nobody seriously sells 100% compression, so why keep applauding a takedown of a claim no honest engineer makes? That is fair as far as it goes. The trouble is that adjacent versions of the claim show up constantly, dressed in respectable language, and they are harder to dismiss precisely because they are subtler.

The new wrinkle

The repository now points at a follow-up, inferencefs, billed as the latest in data-free filesystems. The framing is pointed. We have spent two years watching large language models described as lossy compressors of their training data, and watching people propose, sometimes seriously, that a model could regenerate documents, code, or images from a short prompt instead of storing them. πfs stored bytes in pi. A model-backed filesystem would store them in weights. The structural objection is identical: a model that could reproduce any input exactly would need to encode that input somewhere, and the prompt plus the model is not free.

This is where the old joke earns its second life. The pi version is obviously ridiculous because pi is fixed and public, so the only thing you are actually storing is your enormous pile of indices. A neural network blurs that intuition. The weights look like they contain knowledge rather than your specific files, so it feels plausible that they could conjure your data back from almost nothing. They cannot, not losslessly and not for arbitrary inputs, for the same pigeonhole reason. The model that memorizes your file verbatim has paid for that file in parameters, and the model that only approximates it is doing lossy compression with an unusually expensive and unpredictable codec.

What the community actually takes from it

The healthy reading is not cynicism about compression or about machine learning. Lossy compression is genuinely useful, generative models are genuinely useful, and approximate reconstruction is a legitimate goal for plenty of workloads. The lesson πfs keeps teaching is narrower and more durable: when someone claims to have eliminated a cost, find where the cost went. With πfs it went into the metadata directory. With a normal number it went into the index. With a model it goes into the parameters and the prompt and the tolerance for being slightly wrong.

The README closes with a roadmap of future improvements, including arithmetic coding, parallelizable lookup, cloud-based pi lookup, and πfs for Hadoop. Every item is a real technology bolted onto an impossible foundation, which is the joke's final form. You can make an unworkable idea look like a serious engineering program just by surrounding it with buzzwords and a benchmark you promise to fix later. That trick has not aged a day, and the fact that a gag filesystem from the FUSE era still skewers it cleanly says less about πfs than it does about the rest of us.