OpenAI has been evaluating non-Nvidia hardware for AI inference workloads since late 2025, testing Cerebras' wafer-scale chips and Groq's LPU architecture amid concerns about the performance and cost efficiency of Nvidia's inference-focused accelerators.

According to people familiar with the matter, cited by Reuters, OpenAI has been actively testing AI accelerator hardware from Cerebras Systems and Groq since late 2025 as potential alternatives to Nvidia's GPUs for inference workloads. While Nvidia's H100 and newer Blackwell-generation GPUs remain dominant for training large language models, the company appears dissatisfied with certain aspects of Nvidia's inference-focused offerings such as the L40S and L4 GPUs.
What's Actually New
Architecture Differences:
- Cerebras' CS-3 is built around a single wafer-scale engine (WSE-3) with roughly 900,000 compute cores, designed to exploit the sparsity common in transformer inference
- Groq's LPU uses a deterministic, software-scheduled execution model, with the company claiming 750+ tokens per second per user on Llama 70B-class models (a rough way to sanity-check such per-request throughput claims is sketched after this list)
- Both architectures avoid Nvidia's CUDA dependency through custom software stacks
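
Per-request decode throughput is straightforward to measure against any OpenAI-compatible streaming endpoint. The sketch below is a minimal example, not an official benchmark: the base URL and model identifier are assumptions (Groq documents an OpenAI-compatible endpoint, but verify the exact values against its docs), and the chunk count only approximates tokens.

```python
"""Rough per-request decode-throughput check against an OpenAI-compatible
streaming endpoint. Base URL and model name are assumptions; replace them
with whatever endpoint you are actually evaluating."""
import os
import time

from openai import OpenAI  # pip install openai

client = OpenAI(
    base_url="https://api.groq.com/openai/v1",  # assumed OpenAI-compatible endpoint
    api_key=os.environ["GROQ_API_KEY"],
)

stream = client.chat.completions.create(
    model="llama3-70b-8192",  # assumed model identifier
    messages=[{"role": "user", "content": "Explain KV caching in two paragraphs."}],
    max_tokens=512,
    stream=True,
)

first_chunk_at = None
chunk_count = 0
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        if first_chunk_at is None:
            first_chunk_at = time.perf_counter()
        chunk_count += 1
last_chunk_at = time.perf_counter()

if first_chunk_at is not None and chunk_count > 1:
    # Streamed chunks only approximate tokens; re-tokenize the full text for exact counts.
    rate = (chunk_count - 1) / (last_chunk_at - first_chunk_at)
    print(f"~{rate:.0f} tokens/sec of per-request decode throughput")
```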
Performance Per Watt Concerns: Early benchmarks reportedly show Cerebras' 15 kW CS-3 delivering throughput comparable to Nvidia's 10 kW DGX H100 system on certain inference tasks, while Groq's architecture shows latency advantages in public benchmarks at smaller batch sizes.
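
For comparisons like this, the quantity buyers usually compute is tokens per second per kilowatt, and from it an energy cost per million tokens. The sketch below is purely illustrative: the power figures are the ones cited above, while the throughput numbers and electricity rate are placeholders to be replaced with real benchmark data.

```python
# Back-of-the-envelope perf-per-watt comparison. Power figures are the ones
# cited above; throughput numbers and the electricity rate are placeholders
# to replace with real benchmark data.
ELECTRICITY_USD_PER_KWH = 0.10  # assumed industrial rate

systems = {
    # name: (aggregate decode tokens/sec [placeholder], system power in kW [cited above])
    "Cerebras CS-3": (30_000, 15.0),
    "Nvidia DGX H100": (30_000, 10.0),
}

for name, (tps, kw) in systems.items():
    tokens_per_kwh = tps * 3600 / kw
    energy_usd_per_m_tokens = ELECTRICITY_USD_PER_KWH / tokens_per_kwh * 1e6
    print(f"{name}: {tps / kw:,.0f} tok/s per kW, "
          f"~${energy_usd_per_m_tokens:.4f}/M tokens in electricity")
```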
Total Cost of Ownership: OpenAI's exploration comes as cloud providers charge roughly $1.50-$4.00 per hour for a single H100 instance. Cerebras sells on-premises systems reportedly priced at $2-3 million each, while Groq's cloud API lists Mixtral 8x7B at $0.27 per million tokens.
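
Those three price points are not directly comparable until they are normalized to a cost per million tokens. The sketch below does that under stated assumptions: per-device throughput, utilization, and the three-year amortization period are all illustrative, and the on-premises figure covers hardware cost only.

```python
# Normalizing the quoted price points to a cost per million output tokens.
# Throughput, utilization, and the amortization period are assumptions; the
# on-premises figure covers capex only (no power, hosting, or staffing).
HOURS_PER_YEAR = 8760

def cloud_cost_per_m_tokens(usd_per_hour, tokens_per_sec, utilization=0.6):
    return usd_per_hour / (tokens_per_sec * 3600 * utilization) * 1e6

def on_prem_cost_per_m_tokens(capex_usd, years, tokens_per_sec, utilization=0.6):
    hourly = capex_usd / (years * HOURS_PER_YEAR)
    return hourly / (tokens_per_sec * 3600 * utilization) * 1e6

print(f"Cloud H100 at $2.50/h, 1,500 tok/s:    ${cloud_cost_per_m_tokens(2.50, 1_500):.2f}/M tokens")
print(f"CS-3 at $2.5M over 3 years, 30k tok/s: ${on_prem_cost_per_m_tokens(2_500_000, 3, 30_000):.2f}/M tokens")
print("Groq cloud API (Mixtral 8x7B):          $0.27/M tokens (listed price)")
```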
Technical Limitations
- Software Ecosystem: Neither alternative matches CUDA's maturity. Cerebras requires porting models to its own SDK and graph compiler, while Groq developers work with a more limited toolchain and ONNX support
- Model Compatibility: OpenAI's largest models (GPT-4-class) still require Nvidia GPUs for optimal performance due to memory capacity and bandwidth constraints (see the sizing sketch after this list)
- Scaling Challenges: Cerebras' wafer-scale systems require specialized cooling infrastructure uncommon in commercial data centers
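
The memory constraint can be made concrete with a rough sizing exercise. The per-device memory figures below are assumptions drawn from public spec sheets, and the model size is purely illustrative; Cerebras in particular streams weights from external MemoryX appliances, so raw on-wafer SRAM understates what its systems can serve.

```python
import math

# How many devices does it take just to hold a large model's weights at
# 16-bit precision? Per-device memory figures are assumptions taken from
# public spec sheets; the model size is illustrative, not GPT-4's.
BYTES_PER_PARAM = 2  # fp16/bf16 weights; ignores KV cache and activations

device_memory_gb = {
    "Nvidia H100 (HBM3)": 80,
    "Cerebras WSE-3 (on-wafer SRAM)": 44,  # weights can also stream from external MemoryX
    "Groq LPU (on-chip SRAM)": 0.23,
}

model_params = 400e9  # illustrative 400B-parameter dense model
weight_gb = model_params * BYTES_PER_PARAM / 1e9

for name, mem_gb in device_memory_gb.items():
    print(f"{name}: >= {math.ceil(weight_gb / mem_gb)} devices for weights alone")
```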
Industry Context
This move aligns with broader industry trends:
- Microsoft Azure (OpenAI's primary cloud provider) has expanded its AMD MI300X offerings
- Amazon Web Services continues pushing Trainium and Inferentia chips
- OpenAI itself has acquired Jony Ive's hardware startup io, underscoring longer-term hardware ambitions
Practical Implications
While Nvidia maintains an estimated ~98% share of the data center GPU market (Counterpoint Research, Q4 2025), OpenAI's exploration signals:
- Specialization Trend: Different hardware architectures emerging for training vs inference workloads
- Cost Pressure: AI companies are seeking alternatives as model-serving costs reportedly consume 70-80% of operational budgets
- Vertical Integration: Potential for OpenAI to follow Google/Amazon in developing custom silicon long-term
Financial Considerations
OpenAI's hardware tests coincide with reports of rising infrastructure costs; a back-of-the-envelope check on these figures follows the list:
- Running ChatGPT reportedly costs ~$700,000 daily
- GPT-4's public API price of $0.06 per 1K output tokens is often used as a proxy for per-token serving costs
- Nvidia's H200 reportedly commands a ~40% premium over the H100 despite sharing the same Hopper architecture, differing mainly in larger, faster HBM3e memory
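
Taken at face value, these third-party estimates imply a serving volume and annual run rate along the following lines; none of the inputs are figures OpenAI has confirmed.

```python
# Taking the reported figures at face value. Neither number is confirmed by
# OpenAI; the per-token figure is the public GPT-4 output-token API price
# used here as a crude proxy for serving cost.
DAILY_SERVING_COST_USD = 700_000
COST_PER_1K_TOKENS_USD = 0.06

implied_daily_tokens = DAILY_SERVING_COST_USD / COST_PER_1K_TOKENS_USD * 1_000
print(f"Implied volume: ~{implied_daily_tokens / 1e9:.1f}B tokens/day")
print(f"Annualized serving cost: ~${DAILY_SERVING_COST_USD * 365 / 1e6:.0f}M/year")
```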
Conclusion
While Nvidia remains unavoidable for large-scale training, the inference market shows signs of fragmentation. OpenAI's tests with alternative architectures reflect practical engineering concerns rather than ideological rejection of CUDA. However, given Nvidia's recent Blackwell architecture improvements and Microsoft's deep integration of Nvidia tech in Azure, any wholesale transition appears unlikely before 2027.

