As AI usage explodes at unprecedented rates, compute shortages are forcing AI companies to ration access and raise prices, creating a volatile market where demand for human-level intelligence far outstrips supply.
The AI industry is facing a perfect storm of exponential demand growth and constrained supply that's reshaping how we access artificial intelligence. Recent public admissions from OpenAI and Anthropic about being "compute starved" are just the tip of the iceberg in what's becoming a fundamental bottleneck for the entire AI ecosystem.
Usage Is Exploding Beyond All Expectations
The scale of AI adoption is staggering. GitHub's COO recently revealed that commit volume on the platform is growing at an annualised rate of roughly 14x, based on just three months of data. While commits are an imperfect proxy for inference demand, this explosion points to an unprecedented surge in compute requirements. The reality is likely even worse: many new "vibe coders" haven't yet mastered Git, and countless AI-assisted coding sessions happen outside GitHub entirely, through tools like Cowork.
OpenAI's Thibault Sottiaux, head of the Codex team, confirmed what many suspected: AI companies are experiencing demand that vastly outstrips available supply. Rumors suggest even high-profile projects like Sora were temporarily shut down to reallocate compute resources to more pressing needs.
This creates a domino effect across the industry. When one provider experiences compute-related outages or tightens usage limits, users migrate to alternatives, creating cascading pressure across the entire ecosystem. The result is a volatile market where no provider has breathing room.
The Supply Chain Reality Check
Many observers misunderstood the massive GPU commitments announced by tech giants. A $100 billion commitment to purchase GPU capacity doesn't magically create that capacity overnight. Every stage of the physical build-out faces significant constraints: pouring concrete, connecting power (including natural gas turbines), fabricating GPUs, and racking and networking the hardware.
The rollout of NVIDIA's GB200 chips has been particularly problematic. Unlike previous generations, GB200 requires full liquid cooling rather than air cooling, and at gigawatt scale, liquid cooling in data centers is largely uncharted territory. The increased power density per square meter complicates the electrical engineering, while shortages of both skilled labor to install liquid cooling systems and of high-end plumbing components have caused massive delays.
Even more concerning are the hard constraints on DRAM fabrication. While SK Hynix recently signed an $8 billion deal for more EUV production equipment from ASML, new capacity won't come online for years. Google's Sundar Pichai specifically called out memory as a significant constraint in a recent podcast appearance. Although innovations like TurboQuant show promise in reducing memory requirements through KV cache compression, the pace of AI usage growth means these optimizations only buy temporary breathing room.
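To see why KV cache compression matters for the memory constraint, here is a back-of-envelope sizing sketch. The model shape (layers, KV heads, context length) and the 4-bit figure below are illustrative assumptions, not TurboQuant's actual scheme:

```python
# Back-of-envelope KV cache sizing for a hypothetical transformer.
# All shape and precision figures are illustrative assumptions.

def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_value):
    # Two tensors (K and V) per layer, each of shape [kv_heads, seq_len, head_dim]
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value

# Assumed shape: 80 layers, 8 KV heads (GQA), head_dim 128, 128k-token context.
fp16 = kv_cache_bytes(80, 8, 128, 128_000, 2)    # 16-bit values
int4 = kv_cache_bytes(80, 8, 128, 128_000, 0.5)  # 4-bit quantised values

print(f"fp16 KV cache: {fp16 / 1e9:.1f} GB")  # ~42 GB per 128k-token session
print(f"int4 KV cache: {int4 / 1e9:.1f} GB")  # ~10 GB: a 4x reduction
```

A 4x reduction is substantial per session, but with usage compounding by multiples every quarter, it is indeed only temporary breathing room.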
The Coming Price Shock
The next 18-24 months will be defined by compute shortages. When exponential demand growth meets linear supply increases, market volatility is inevitable. The cracks are already visible: Anthropic's uptime has fallen to "one nine" reliability (roughly 90%), and the company has implemented increasingly aggressive measures to manage demand.
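The exponential-versus-linear mismatch is easy to make concrete with a toy model. The growth rates below are illustrative assumptions, not measured figures:

```python
# Toy model: demand compounds each quarter, supply adds fixed capacity.
demand, supply = 1.0, 1.0
DEMAND_GROWTH = 2.0  # demand doubles every quarter (assumed)
SUPPLY_ADDED = 0.5   # half a unit of new capacity per quarter (assumed)

for quarter in range(1, 9):
    demand *= DEMAND_GROWTH
    supply += SUPPLY_ADDED
    print(f"Q{quarter}: demand={demand:6.1f}  supply={supply:.1f}  "
          f"oversubscription={demand / supply:5.1f}x")
```

Even with steady capacity additions every quarter, compounding demand pulls away within a few quarters, leaving rationing and price as the only ways to close the gap.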
Anthropic has cut peak-time usage limits significantly and banned third-party agent harnesses from using its API. However, these measures can only go so far. If the company is indeed seeing 10x quarter-over-quarter inference demand growth, first-party usage will quickly consume any capacity freed up by restricting third-party access. Time-based rationing helps smooth demand but eventually leads to 24/7 maximum utilization.
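The limit of time-based rationing shows up in a toy day-long simulation (the capacity and demand numbers are assumptions for illustration): deferring peak load to quieter hours clears the backlog only while total daily demand still fits under total daily capacity.

```python
CAPACITY = 100  # requests servable per hour (assumed)

def ration(hourly_demand):
    """Serve up to CAPACITY per hour, deferring the rest to later hours."""
    backlog, served = 0, []
    for demand in hourly_demand:
        want = demand + backlog
        done = min(want, CAPACITY)
        backlog = want - done
        served.append(done)
    return served, backlog

# Peak hours first, quiet hours after; daily total exactly equals capacity.
day = [140] * 12 + [60] * 12
served, unmet = ration(day)
print(served)  # every hour pinned at 100: 24/7 maximum utilisation
print(unmet)   # 0 -- rationing just barely clears the day

# Grow demand 10% and the backlog never clears within the day.
served, unmet = ration([int(d * 1.1) for d in day])
print(unmet)   # 240 requests carried over with nowhere to go
```

Once every hour runs at capacity, further growth becomes permanent backlog, and price takes over as the only remaining lever.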
This leaves price as the primary lever for managing demand. While gaining market share has been crucial in the AI race, the game theory changes when all providers face compute constraints simultaneously. At the same time, dramatic model improvements, with rumors of OpenAI's "Spud" and Anthropic's "Mythos" pointing to significant advances, are making users less price sensitive.
Many users who initially balked at $200 monthly subscriptions now consider them excellent value, and would likely pay significantly more for access to cutting-edge models. We're entering uncharted territory regarding what people will pay for intelligence on tap.
The Long-Term Outlook
The electrification of Europe and North America in the late 1800s and early 1900s offers some historical parallels, but the differences are stark. AI demand growth is far steeper, and supply constraints are more concentrated. There's near-infinite demand for machines approaching or surpassing human cognition, even if that capability is unevenly distributed across domains.
Supply will eventually catch up, but "eventually" is the painful part. In the meantime, users face higher prices, rationing, and reliability issues. The industry is learning that intelligence on tap comes at a cost that many didn't anticipate.
One potential hedge against these constraints is the rapid improvement in small models. Local models like Gemma 4 26b-a4b running on consumer hardware are becoming increasingly impressive for software engineering tasks. They aren't quite there yet, but they may be only months away from being "good enough" for many use cases.
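Why quantisation puts such models within reach of consumer hardware is simple arithmetic. Assuming a 26B-parameter model and roughly 10% runtime overhead for activations and buffers (both illustrative figures, not vendor specs):

```python
# Rough memory footprint for a locally hosted model at various precisions.
# Parameter count and overhead factor are illustrative assumptions.

def model_footprint_gb(total_params_billions, bits_per_weight, overhead=1.1):
    # overhead covers activations, KV cache, and runtime buffers (assumed ~10%)
    return total_params_billions * (bits_per_weight / 8) * overhead

for bits in (16, 8, 4):
    print(f"{bits}-bit: ~{model_footprint_gb(26, bits):.0f} GB")
```

At 16-bit precision the weights alone demand datacenter-class memory, while the 4-bit footprint lands in the range of a high-end consumer GPU or a Mac with generous unified memory. And if the "a4b" suffix denotes ~4B active parameters per token (a mixture-of-experts reading, not confirmed here), per-token compute stays modest too.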
The AI compute crunch isn't just a temporary bottleneck - it's a fundamental reshaping of how we access and pay for artificial intelligence. The era of unlimited, cheap AI may be over, replaced by a more constrained but ultimately more valuable resource.
