# Hardware

Musk’s Colossus 1 mixed‑GPU supercluster repurposed for Anthropic inference as xAI pivots to homogeneous Blackwell‑only Colossus 2

Chips Reporter

SpaceX’s 220 k‑GPU Colossus 1, built with a blend of H100, H200 and GB200 chips, proved too inefficient for training Grok. Elon Musk has leased the entire cluster to Anthropic, which will use it for inference workloads, while xAI shifts its frontier training to the new all‑Blackwell Colossus 2. The move supplies Anthropic with immediate capacity, offsets xAI’s losses and sets the stage for a potential IPO.


Announcement

Last week Anthropic announced a lease of SpaceX’s Colossus 1 data centre – more than 220 000 Nvidia GPUs drawing roughly 30 MW of power. The agreement gives Anthropic exclusive access to the entire cluster, which Musk’s xAI originally built to train its Grok models. The lease raises two immediate questions: why hand a flagship AI asset to a direct competitor, and what does the mixed‑GPU design mean for the cluster’s usefulness?

Technical specs and architectural constraints

Heterogeneous GPU mix

Colossus 1 comprises about 150 000 H100, 50 000 H200 and 20 000 GB200 GPUs. The three generations arrived at the data centre at different times, reflecting supply‑chain pressures rather than a planned architecture. For distributed training this heterogeneity is a liability: synchronous data‑parallel training requires every accelerator to finish a step before the next iteration can start, so the faster GB200 chips finish early and sit waiting for the slower H100s – the classic “straggler” effect. Across 220 000 devices the idle time compounds, driving reported utilization down to roughly 11 %.
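The barrier mechanics can be sketched in a few lines. The step times below are illustrative assumptions, not measured Colossus 1 figures, and per‑step stragglers are only one contributor to the reported 11 % – restarts, failures and pipeline stalls at this scale compound the loss further.

```python
# Sketch of the straggler effect in synchronous training.
# Step times are hypothetical; fleet sizes are from the article.
STEP_TIME = {"H100": 1.00, "H200": 0.80, "GB200": 0.45}  # seconds per step (assumed)
FLEET = {"H100": 150_000, "H200": 50_000, "GB200": 20_000}

def per_step_utilization(step_time, fleet):
    """Every synchronous step ends only when the slowest generation
    finishes, so faster chips idle for the difference."""
    barrier = max(step_time.values())                 # slowest generation sets the pace
    busy = sum(n * step_time[g] for g, n in fleet.items())
    paid = sum(fleet.values()) * barrier              # wall-clock time paid by every GPU
    idle_frac = {g: 1 - t / barrier for g, t in step_time.items()}
    return busy / paid, idle_frac

util, idle = per_step_utilization(STEP_TIME, FLEET)
print(f"fleet utilization per step: {util:.0%}")      # ~90% under these assumptions
for gen, frac in idle.items():
    print(f"{gen}: idle {frac:.0%} of every step")    # GB200 idles 55% of each step
```

Even this optimistic per‑step view leaves the newest chips idle for more than half of every iteration; everything else that goes wrong at cluster scale only widens the gap.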

Inference versus training

Inference workloads are far less sensitive to synchronization. A request can be routed to any available GPU, and the result returned as soon as that chip finishes. Consequently, the same mixed‑GPU pool can achieve high throughput for serving a trained model, even though it is inefficient for training. Anthropic’s Claude services, which now face token caps, peak‑hour throttling and API rate limits, will benefit from the added capacity without suffering the straggler penalty.
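A toy scheduler makes the contrast concrete: with no step‑level barrier, each request simply goes to whichever GPU frees up first, so mixed generations never block one another. The pool sizes and per‑request service times below are hypothetical, chosen only to illustrate the routing policy.

```python
import heapq

# Hypothetical per-request service times by GPU generation (seconds).
SERVICE_TIME = {"H100": 0.30, "H200": 0.24, "GB200": 0.15}

def serve(num_requests, pool):
    """Dispatch each request to whichever GPU frees up first."""
    free = [(0.0, i, gen) for i, gen in enumerate(pool)]  # (free-at, id, generation)
    heapq.heapify(free)
    last_done = 0.0
    for _ in range(num_requests):
        t, i, gen = heapq.heappop(free)       # earliest-available GPU, any generation
        done = t + SERVICE_TIME[gen]
        last_done = max(last_done, done)
        heapq.heappush(free, (done, i, gen))
    return last_done                          # makespan for the whole batch

pool = ["H100"] * 6 + ["H200"] * 2 + ["GB200"] * 2   # mixed 10-GPU pool
print(f"1000 requests finish in {serve(1000, pool):.1f}s")
```

Each GPU stays saturated at its own pace: faster chips simply absorb more requests per second, and no chip ever waits on a slower one.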

Power and cooling demands

The cluster draws about 30 MW, equivalent to the load of a small city, and essentially all of that power must be rejected as heat by the cooling plant. Run around the clock, a 30 MW draw amounts to roughly 21.6 GWh of electricity per month; at 11 % utilization, roughly 19 GWh of that pays for idle capacity each month – a significant cost factor.
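The energy figures follow from simple arithmetic, assuming the stated 30 MW draw runs continuously over a 30‑day month:

```python
# Back-of-the-envelope energy arithmetic for a continuously-running cluster.
# Assumes constant draw and a 30-day month.
POWER_MW = 30
HOURS_PER_MONTH = 24 * 30
UTILIZATION = 0.11          # reported training utilization

total_gwh = POWER_MW * HOURS_PER_MONTH / 1000   # MWh -> GWh: 21.6 GWh per month
idle_gwh = total_gwh * (1 - UTILIZATION)        # ~19.2 GWh doing no useful work
print(f"monthly draw: {total_gwh:.1f} GWh, of which {idle_gwh:.1f} GWh idles")
```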

Transition to Colossus 2

Musk’s next‑generation system, Colossus 2, is being built exclusively with Nvidia’s Blackwell‑based GPUs. A homogeneous fleet eliminates the straggler effect, allowing utilization rates above 40 % – the benchmark set by Meta and Google. Uniform hardware also simplifies the software stack, enabling tighter kernel optimizations and lower latency for both training and inference.

Market implications

Revenue offset for xAI

Mirae Asset analysts estimate that leasing Colossus 1 could generate $5‑6 billion in annual revenue at market‑rate GPU lease prices. xAI reported a Q1 2026 net loss of about $6 billion, so the deal could bring the business close to breakeven on a cash‑flow basis.

Growth potential for Anthropic

Anthropic’s leadership estimates that roughly 50 % of AI‑industry compute spend goes to inference, and that inference spend translates to revenue at a 3× multiple. Applying that ratio to the $5 billion lease value suggests an incremental $15 billion in annual recurring revenue for Claude services – a material boost to the company’s reported $30 billion ARR.

Strategic positioning ahead of an IPO

The lease aligns with Musk’s broader narrative of turning SpaceX assets into cash‑generating platforms before a potential public offering. By converting a depreciating compute asset into a revenue stream, xAI improves its balance sheet, while simultaneously showcasing an AI‑cloud capability that could be marketed to other customers.

Supply‑chain context

The mixed‑GPU build of Colossus 1 was a direct response to the 2023‑2024 GPU shortage, when Nvidia’s H100s were allocated on a first‑come, first‑served basis. By the time GB200 chips entered production, the cluster was already at scale, forcing a patchwork architecture. The lesson for the industry is clear: rapid scale‑up without a uniform hardware roadmap creates long‑term efficiency penalties.

Outlook

With Colossus 2 slated to reach gigawatt‑scale capacity later this year, xAI will have a purpose‑built training platform that can compete with the world’s largest AI labs. Meanwhile Anthropic gains an immediate inference boost that should lift Claude’s token limits, remove throttling for Pro and Max tiers, and expand API request caps for enterprise customers.

The partnership illustrates how mixed‑generation hardware, once a liability for training, can be monetized as an inference asset. It also underscores the growing importance of flexible compute leasing models as AI demand outpaces the build‑out of new data centres.


For further technical details on Nvidia’s Blackwell architecture, see the official Nvidia roadmap.
