OpenAI's GPT-5.3-Codex-Spark Achieves 15x Speed Boost on Cerebras Hardware

Cloud Reporter

OpenAI launches GPT-5.3-Codex-Spark, its first production model on Cerebras wafer-scale chips, delivering 1,000 tokens/second for real-time coding assistance.

OpenAI has unveiled GPT-5.3-Codex-Spark, marking a significant milestone in AI hardware deployment as the company's first production model to run on Cerebras wafer-scale chips rather than traditional Nvidia GPUs. The new model delivers approximately 1,000 tokens per second—roughly 15 times faster than previous versions—enabling a genuinely real-time, interactive coding experience that OpenAI claims transforms how developers work with AI assistance.
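To make the throughput figure concrete, here is a quick back-of-the-envelope sketch. The baseline rate is simply inferred from the claimed 15x speedup, and the 500-token response length is an arbitrary illustration, not a figure from OpenAI:

```python
# Back-of-the-envelope: wall-clock time to stream a response at the claimed
# Codex-Spark rate vs. the implied baseline. The baseline is inferred from
# the ~15x claim and is illustrative only, not a published number.

SPARK_TOKENS_PER_SEC = 1000
BASELINE_TOKENS_PER_SEC = SPARK_TOKENS_PER_SEC / 15  # ~67 tok/s, inferred

def stream_time(tokens: int, rate: float) -> float:
    """Seconds to stream `tokens` output tokens at `rate` tokens/second."""
    return tokens / rate

response_tokens = 500  # a medium-length code suggestion, chosen arbitrarily
print(f"Spark:    {stream_time(response_tokens, SPARK_TOKENS_PER_SEC):.1f}s")   # 0.5s
print(f"Baseline: {stream_time(response_tokens, BASELINE_TOKENS_PER_SEC):.1f}s") # 7.5s
```

At these rates a medium-length suggestion arrives in half a second rather than several, which is the difference between an interactive pause and a context switch.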

The model is being released as a research preview to ChatGPT Pro users, allowing developers to experiment with the technology while OpenAI collaborates with Cerebras to scale datacenter capacity and refine the user experience. This strategic partnership represents a notable diversification in OpenAI's hardware approach, though the company emphasizes this doesn't signal a departure from GPUs as the foundation of their training and inference pipeline.

Engineering for Speed and Responsiveness

Codex-Spark was specifically engineered for low-latency, interactive coding workflows rather than deep reasoning or general-purpose tasks. The model maintains its predecessor's capability to handle long-running processes—operating for "hours, days, and weeks without intervention"—while prioritizing immediate responsiveness for coding tasks.

Under the hood, OpenAI implemented substantial infrastructure improvements to reduce latency across the entire request-response pipeline. These optimizations include:

  • Streamlined client-to-server and server-to-client response streaming
  • Complete rewrite of key inference stack components
  • Reworked session initialization to accelerate time-to-first-token
  • Introduction of persistent WebSocket connections
  • Multiple enhancements to the Responses API

These changes collectively reduced client/server roundtrip overhead by 80%, per-token processing time by 30%, and time-to-first-token by 50%. OpenAI plans to make these improvements the default for all models going forward.
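The value of persistent WebSocket connections can be seen with a toy latency model. The round-trip time and handshake costs below are illustrative assumptions (real HTTP clients also reuse connections via keep-alive, so this is a simplified worst case), but the sketch shows why paying connection setup once amortizes well over many messages:

```python
# Toy latency model: per-request connections vs. one persistent WebSocket.
# RTT and handshake costs are illustrative assumptions, not measured values.

RTT_MS = 50          # assumed client <-> server round-trip time
HANDSHAKE_RTTS = 3   # TCP + TLS setup, roughly three round trips

def per_request_total_ms(n_requests: int) -> float:
    # Worst case: every request pays connection setup plus one round trip.
    return n_requests * (HANDSHAKE_RTTS + 1) * RTT_MS

def websocket_total_ms(n_requests: int) -> float:
    # Setup is paid once; each subsequent message is a single round trip.
    return HANDSHAKE_RTTS * RTT_MS + n_requests * RTT_MS

for n in (1, 10, 100):
    print(f"{n:>4} msgs: per-request {per_request_total_ms(n):>6.0f} ms, "
          f"websocket {websocket_total_ms(n):>6.0f} ms")
```

Under these assumptions, a 100-message interactive session spends 20 seconds on connection overhead in the per-request case versus about 5.2 seconds over a persistent socket; the same amortization logic motivates the session-initialization and time-to-first-token work described above.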

Performance Benchmarks and Real-World Impact

On specialized benchmarks like SWE-Bench Pro and Terminal-Bench 2.0—designed to evaluate software engineering capabilities—Codex-Spark achieved results positioned between GPT-5.1-Codex-mini and GPT-5.3-Codex, but in a fraction of the time. This performance profile makes the model particularly suited for tasks requiring rapid iteration and immediate feedback.

The speed improvements have sparked debate within the developer community. Some users on Reddit say they prefer "maximum intelligence and reliability" over speed, noting they would happily wait an hour for superior results; others point to the cumulative cost of repeated iterations with slower models. Independent measurements by Nicholas Van Landschoot suggest a practical speedup closer to 1.37x than the claimed 15x; he explains that the dramatic figure comes from comparing Codex-Spark against a specific high-reasoning configuration of Codex tuned to maximize accuracy.
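The gap between the two figures is easy to reproduce arithmetically once hidden reasoning tokens enter the picture. The token counts and decode rates below are invented solely to reconstruct the reported 15x and 1.37x ratios; none of them are published numbers:

```python
# Illustrative reconciliation of the "15x" and "1.37x" figures. All inputs
# are assumptions chosen to reproduce the reported ratios, not measurements.

def wall_clock_s(visible_tokens: int, hidden_tokens: int, tok_per_s: float) -> float:
    """Total generation time: hidden reasoning tokens cost wall-clock too."""
    return (visible_tokens + hidden_tokens) / tok_per_s

spark = wall_clock_s(500, 0, 1000)             # fast decode, no hidden chain: 0.5 s
high_reasoning = wall_clock_s(500, 2500, 400)  # long hidden chain, slower decode: 7.5 s
everyday = wall_clock_s(500, 48, 800)          # typical config: 0.685 s

print(f"vs high-reasoning config: {high_reasoning / spark:.2f}x")  # 15.00x
print(f"vs everyday config:       {everyday / spark:.2f}x")        # 1.37x
```

The mechanism, not the specific numbers, is the point: a baseline that spends most of its wall-clock on hidden reasoning makes any fast model look dramatically faster, while a like-for-like comparison yields a far more modest multiplier.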

Hardware Architecture and Future Directions

Codex-Spark runs on Cerebras' Wafer Scale Engine 3 accelerators, which excel at low-latency, high-speed inference tasks. The wafer-scale architecture allows for massive parallelism and reduced data movement, making it particularly effective for the model's interactive use case. OpenAI notes that Cerebras accelerators can be combined with GPUs to leverage the strengths of both architectures.

The model features a 128k context window and currently supports text-only input, with OpenAI planning to introduce faster models with larger contexts based on insights gathered from the developer community during this research preview phase.

Strategic Implications

This deployment represents more than just a performance upgrade—it signals OpenAI's willingness to diversify its hardware strategy and explore alternative architectures for specific use cases. While maintaining its core reliance on GPUs, the company is clearly experimenting with specialized hardware for particular workloads where speed and responsiveness are paramount.

For developers, the immediate benefit is a coding assistant that can keep pace with human thought processes, making AI pair programming feel more natural and less frustrating. The ability to see results immediately as you type or make changes could significantly accelerate development workflows, though the trade-off between speed and reasoning depth remains a subject of ongoing discussion within the AI community.

The success of Codex-Spark may influence how future AI models are designed and deployed, particularly for applications where real-time interaction is critical. As OpenAI continues to refine the technology and expand its hardware partnerships, the line between human and AI coding workflows may continue to blur, potentially reshaping software development practices across the industry.
