Wirth's Revenge: The Growing Cost of Software Complexity


An exploration of how software complexity continues to outpace hardware improvements, from Niklaus Wirth's 1995 essay to modern AI challenges, examining the tradeoffs between accessibility and efficiency.

Wirth's Revenge

In 1995, Turing laureate Niklaus Wirth wrote an essay called A Plea for Lean Software in which he mostly gripes about the state of software at the time. Among these gripes is a claim that Wirth attributes to his colleague Martin Reiser [1], though it has come to be known as Wirth's Law: software is getting slower more rapidly than hardware becomes faster. Doing his best Grandpa Simpson impersonation, Wirth complains: "About 25 years ago, an interactive text editor could be designed with as little as 8,000 bytes of storage. (Modern program editors request 100 times that much!) An operating system had to manage with 8,000 bytes, and a compiler had to fit into 32 Kbytes, whereas their modern descendants require megabytes. Has all this inflated software become any faster? On the contrary. Were it not for a thousand times faster hardware, modern software would be utterly unusable."

Aside from the numbers involved here, which must sound utterly preposterous to the average modern reader, there's a lot to relate to. My 25-year career in software, all of which happened after 1995, was in many ways a two-part story about Wirth's Law: an action and a reaction. Personally, I disagree with Wirth's conclusion that nothing of value had been gained for the loss in efficiency. When he laments "the advent of windows, cut-and-paste strategies, and pop-up menus, [...] the replacement of meaningful command words by pretty icons", he is not properly appreciating the value these features had in making computing accessible to more people, and is focusing too much on their runtime cost. Programmers are often too quick to judge software on its technical merits rather than its social ones.

Wirth passed away 2 years ago, but he was a giant in the field of Computer Science and a huge inspiration to me and to many of my other inspirations. In many ways, my focus on simplicity and my own system design sensibilities find their genesis with him. Wirth's Law is so self-evidently true that it's been a topic of continuous investigation and rediscovery. A notable example of this was Dan Luu's great post on input lag back in 2017. He felt that input latency was getting worse over time, so he got a high-speed camera and measured the delay between pressing a key and the letter appearing on screen across a lot of different hardware. The lowest-latency computer was the Apple 2e from 1983. Input latency has gone up since 1983 because there is a lot more software involved in the pipeline for handling input. The hardware-interrupt-based input handling the Apple 2e had is not flexible enough to meet modern requirements, so this additional complexity buys us a lot of value... but it's certainly not free, and if you're not careful, one of the costs is latency.

Luu goes on to write a lot about complexity and simplicity, and makes an interesting observation: the modern systems that fix the latency issue mostly do so not by removing complexity but by adding it. There was a lot of talk about complexity and simplicity at the time, because a huge number of software developers were working through another great tradeoff, this time in the datacenter: the widespread adoption of cloud computing. In 1995, when Wirth wrote his essay, if you wanted to run a new internet company, you could just get a computer and run with it. Amazon didn't launch until July of that year, and it was famously started out of Jeff Bezos' garage. The requirements for an internet company were simpler back then. The web wasn't some ubiquitous technology with total population penetration. There were only about 16 million people using it at the time; more people had an SNES than used the internet. Slashdot didn't exist. There was no real expectation of five-nines, 24/7/365 availability, and no opportunity to "go viral."

By 2010, this had changed. Sure, I guess you could still get Slashdotted then, but more relevantly, Twitter and Facebook could drive tons of traffic to you overnight. The number of internet users had ballooned to 2 billion. To run your company's software, you could build your own datacenter, but that's a complicated undertaking requiring a lot of expertise; you need land, permits, contractors, and so on. I wouldn't even know where to start. You could buy an existing datacenter, but you'd still need to manage power, backup generators and batteries, air conditioning, fire suppression, racks, maintenance, and networking. Again, decades in the software industry, and I can barely put together a competent list of requirements. It's a big investment, and there's a lot of opex. You could rent a rack in a colo and focus on your compute needs, but those needs could change in an instant. If you plan out costs for 100,000 users and never gain traction, you've overspent and are burning cash on pointless hardware, money you could be using to develop your product. If you get hit with a tidal wave of interest, you could be showing people the fail whale for years or miss your opportunity for success entirely.

Cloud computing was the era's answer. By 2010, Amazon was long out of Jeff's garage and running its own massive datacenters for its online operations. As Steve Yegge recounted in 2011, Bezos had distributed an influential API mandate in 2002 requiring all internal teams to make their services available via an API. By 2006, Amazon had built a platform it felt it could release to paying customers in the form of EC2 and S3. AWS let you rent capacity in Amazon's datacenters through a web interface or via direct API calls, billed on granular timescales. Each step in this pipeline imparts additional cost, but they're all pretty valuable, especially the last step. There are even more steps in that pipeline today, with fully managed services like RDS and IAM, which abstract away the management of software, and Lambda, which abstracts away your hardware requirements even further and lets you scale (and pay) purely on utilization.

Even though the cost to run software had gone up, the improved accessibility of cloud platforms and the reduction of risky capex led to an explosion in web software. All the while, hardware was racing ahead, making the rented capacity more powerful per unit cost and reducing the per-user cost. Unfortunately, not every "Wirth tradeoff" is sound engineering. Early in my career, I was working at a newspaper publisher owned by Condé Nast. I was the lead engineer on one of the company's more forward-looking development projects, a Django application that managed local sports results data across dozens of regional newspapers. The application provided a single source of truth for very different users and use cases:

- Reporters could add box scores and statistics on the go
- Their websites could display results, league tables, etc.
- The backend could publish feeds to syndicate into the print editions of each newspaper

The print syndication system involved writing some pretty gnarly templates in order to generate feeds that could be understood by the newspapers' printing systems. After a while, we noticed a problem. These templates were taking minutes to render and setting the database on fire every night. If you've been around the block in software, you might already know the issue: the classic N+1 query problem. The templates were using an ORM which dynamically loaded foreign key fields on attribute access, so innocent-looking loops were doing one database query per iteration. These were complex templates with many nested loops: they were sending hundreds of thousands of database queries per render, many of them the same query over and over again from a different loop in a different part of the template.
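
To make the trap concrete, here's a minimal sketch of the pattern with hypothetical models; the real schema was considerably messier:

```python
# Hypothetical Django models, roughly the shape of the data involved.
from django.db import models

class Team(models.Model):
    name = models.CharField(max_length=100)

class Game(models.Model):
    home_team = models.ForeignKey(Team, on_delete=models.CASCADE, related_name="home_games")
    away_team = models.ForeignKey(Team, on_delete=models.CASCADE, related_name="away_games")
    home_score = models.IntegerField()
    away_score = models.IntegerField()

# The innocent-looking loop, as it appears in a view or template:
for game in Game.objects.all():      # one query for the games...
    print(game.home_team.name,       # ...plus one query per game for home_team
          game.away_team.name)       # ...plus one query per game for away_team
```

Nest a few of those loops inside each other and the query count multiplies rather than adds.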

We could fix some of the terrible performance by pre-loading those fields with a join, but we had a lot of templates, and they were complex enough and the database large enough that this wasn't a realistic path to success. As Dan Luu concludes in his study on latency, the solution that worked required adding more complexity, in the form of a smart caching layer. Although we didn't know we were making it when we started the template project, this was a "bad" Wirth tradeoff. It still had utility: instead of having to carefully manage what data might be needed in which template, we could grab a list of top-level objects and let the ORM fetch the rest of the data we needed on the fly. The project got off the ground quickly, but even at a pretty low level of complexity, it became impossible to execute successfully. Before we realized what the problem was, we were using the convenience of these auto-loaded fields without understanding their true cost, and the software we built was a wasteful monstrosity as a result.
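
For a single level of nesting, the join-based fix is simple; the caching layer we ended up with was more involved, but its core move was just memoizing expensive lookups so the same query wasn't issued from five different places in one render. A rough sketch of both, using Django's select_related and low-level cache API, continuing the hypothetical models above (compute_standings is a made-up stand-in for the expensive queries we were caching):

```python
from django.core.cache import cache

# Pre-load the foreign keys with a SQL join: back to one query for the loop.
games = Game.objects.select_related("home_team", "away_team")

# The shape of the caching layer: an expensive lookup gets computed once and
# reused, instead of being re-queried from every loop that needs it.
def cached_standings(league_id, ttl=300):
    key = f"standings:{league_id}"
    standings = cache.get(key)
    if standings is None:
        standings = compute_standings(league_id)  # hypothetical expensive query
        cache.set(key, standings, ttl)
    return standings
```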

I see the same thing happening now, but at a broader scale, with LLMs, and I feel myself sympathizing more and more with Wirth's cane-shaking wrath. Programming is the act of getting a computer to do something for you. Many people are discovering for the first time that, thanks to LLMs, they can ask a computer to do something for them and it will actually go and do it. However, limiting yourself to programming only through this approach poses some problems. While they might not be the unbounded ecological disaster that many of their detractors claim they are, LLMs are still intensely computationally expensive. You can ask an AI what 2 * 3 is, and for the low price of several seconds of waiting, a few milliliters of water, and enough power to watch 5% of a TikTok video on a television, it will tell you. But the computer you have in front of you can perform this calculation a billion times per second. If the problem with my accidental database denial-of-service syndication feed came down to ignorance of the costs of ORM usage, it's pretty obvious that a similar kind of ignorance can lead to enormous unintended costs once we start integrating LLMs into our automation.

I've seen a few instances of this out in the wild that lead me to believe this trap might be particularly tricky to avoid. Despite the capacity for LLMs to educate, or simulate education, or at least point you towards related materials, some of which may be real, that's not how laypeople use them. They present the LLM with a problem and ask it to solve that problem. One example comes from me, because this is how I used LLMs on my first go, too. I had a dump of recipes from a great but sadly unmaintained recipe site that I wanted to import into a self-hosted recipe management app. I thought, "Well, this sounds tedious, let me ask an LLM to do this." So I pointed a local LLM at the specification for the destination format and asked it to convert the files. It converted one file every 10 minutes, inaccurately and without proper formatting. It was slow and it produced trash.

When you see engineers heap praise on programming agents, this isn't how they are using them. You don't ask the LLM to perform a repetitive and precise task, you ask it to build a script that performs that task. Except in rare cases, this script does not itself use LLMs. Ironically, if you have the foresight to describe this problem to a major AI model and ask it how you should use an AI to solve it, this is exactly what it will tell you to do. This approach is subtly different from the way you might use LLMs for many other tasks, but it's crucial to getting results that reliably get a computer to do something for you. LLMs don't do reliable, they don't do repeatable. Building a program allows you to iterate on a deterministic solution with a stable source of truth, and you come away with an artifact that may or may not be useless afterwards, but which actually works, and which in my case converts 70 files per second.
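
My recipe import is a good illustration of the shape of such a script. The sketch below is not the code I ended up with: the source and destination field names are hypothetical stand-ins, and the real mapping was fiddlier. The shape is what matters, a deterministic loop you can inspect, fix, and re-run:

```python
#!/usr/bin/env python3
"""Convert a directory of dumped recipe JSON files into the destination
app's import format. Field names are hypothetical stand-ins."""
import json
import sys
from pathlib import Path

def convert(src: dict) -> dict:
    # Map the dump's fields onto the destination schema.
    return {
        "name": src.get("title", "Untitled"),
        "description": src.get("summary", ""),
        "ingredients": [i.strip() for i in src.get("ingredients", [])],
        "instructions": [s.strip() for s in src.get("steps", [])],
    }

def main(src_dir: str, dst_dir: str) -> None:
    out = Path(dst_dir)
    out.mkdir(parents=True, exist_ok=True)
    for path in Path(src_dir).glob("*.json"):
        recipe = json.loads(path.read_text())
        (out / path.name).write_text(json.dumps(convert(recipe), indent=2))

if __name__ == "__main__":
    main(sys.argv[1], sys.argv[2])
```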

Another example I came across was a Twitter thread by BenjaminDEKR, which I saw being ridiculed on bsky. He asked his personal agent to remind him to get milk, and this led the agent to repeatedly ask Opus if it was daytime yet. Along with the context from his heartbeat file, this resulted in a $0.75 charge for each heartbeat, costing him almost $20 during a single night's sleep. What was the solution? Maybe you decide that for your purposes 00:00 is night and 08:00 is day and use a basic local gettimeofday call to determine which span you're in. Maybe you're unsatisfied with anything other than astronomical day/night and generate a sunrise/sunset table for the year using NOAA's unmaintained solar calculator to dynamically produce your day/night spans. You could do these things, but not if asking the LLM to solve problems is your entire problem-solving approach. If asking an LLM is the only way you know how to solve problems, then you optimize the question-asking by reducing heartbeat frequency and running on a cheaper model. Problem solved!
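
The fixed-hours version of that fix is almost embarrassingly small. A sketch, assuming you're content to call 00:00-08:00 local time "night":

```python
from datetime import datetime, time

def is_daytime(now=None):
    """Treat 00:00-08:00 local time as night, everything else as day."""
    now = now or datetime.now()
    return now.time() >= time(8, 0)
```

No heartbeat, no tokens, no $0.75 per check.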

The overall concern is that having a magic box that gives you the answers ends up being a thought-terminating solution to any problem. When I wrote about the ecological impacts of AI, one of the non-ecological impacts I cited was the possibility that "AI erodes human skill." A recent release by research fellows at Anthropic, "How AI Impacts Skill Formation", suggests this fear isn't unfounded: "We find that AI use impairs conceptual understanding, code reading, and debugging abilities, without delivering significant efficiency gains on average."

A few weeks ago, I was at a family gathering watching some of the kids go through a huge backlog of red envelopes that their grandparents had saved for them over various missed holidays. They were pulling out all of the cash, at which point they were going to tally it all up and split it evenly. When the adults challenged them to come up with better counting strategies, one of them suggested they could throw all of the money on the floor, take a picture, and ask ChatGPT to tally it. The instinct is to get the AI to do the work for you, not to learn from the AI how to do the work yourself.

Wirth's Law posits that software can erase gains faster than hardware can make them, but I'm afraid the reality is much worse than that. If you've studied computer science, you might have heard of a function called Busy Beaver. The name is unfortunately silly, but it's a fairly important thought experiment in computability. BB(N) is defined as the maximum number of steps an N-state, two-symbol Turing machine can run before halting, starting from a blank tape. This function is known to be noncomputable, because any algorithm that could compute it would let you solve the halting problem: simulate an N-state machine for BB(N) steps, and if it hasn't halted by then, it never will. The halting problem is known to be undecidable. BB(1) through BB(3) were known in the 1960s to be 1, 6, and 21. In a pleasant bit of symmetry with Dan Luu's experiments, BB(4) was discovered to be 107 in 1983, the same year his Apple 2e was built. In 2024, BB(5) was proven to be 47,176,870. As N grows, BB is known to eventually outgrow any computable sequence, including famous fast-growing sequences like TREE(n). BB(6) has a lower bound that is so large that it is impossible to explain how large it is to someone without a significant background in mathematics.
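
To make that reduction concrete, here's a sketch of the argument as code, for the small N where the values are actually known. The transition-table encoding is my own simplified one, not any standard library's:

```python
# Proven values of the busy beaver step function for n-state, 2-symbol machines.
KNOWN_BB = {1: 1, 2: 6, 3: 21, 4: 107, 5: 47_176_870}

def halts(transitions, n_states):
    """Decide halting (from a blank tape) for an n-state, 2-symbol Turing machine,
    given that BB(n) is known. `transitions` maps (state, symbol_read) to
    (symbol_to_write, 'L' or 'R', next_state), where next_state 'H' means halt."""
    limit = KNOWN_BB[n_states]
    tape, head, state = {}, 0, "A"
    for _ in range(limit):
        write, move, state = transitions[(state, tape.get(head, 0))]
        tape[head] = write
        head += 1 if move == "R" else -1
        if state == "H":
            return True   # halted within BB(n) steps
    return False          # still running after BB(n) steps, so it never halts
```

The catch, of course, is that BB itself is the thing you can't compute, which is exactly why the halting problem stays undecidable.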

Software has an unprecedented capability to produce a correct answer in the most resource-consuming way possible. Of course, producing incorrect answers, or no answer at all, is also an option. Despite the apparent truth of Wirth's Law, engineers have been actively battling against it for decades, but I worry that with LLMs we might have lost the war. Am I just the latest in a long line of engineers who can't appreciate the newfound democratization of programming, or have we crossed into a "bad" Wirth tradeoff, where the growth curve of runtime complexity is something that hardware advancements cannot possibly dig us back out of?

[1] If you are wincing at the last name Reiser in the context of vaguely old computing, we are both of a very specific place and time, and I want you to know that Martin Reiser is not and has never been Hans Reiser.
