OpenAI Launches GPT-5.4: A New Frontier Model for Professional Work

Startups Reporter

OpenAI has released GPT-5.4, its most capable frontier model yet, with enhanced reasoning, coding, and computer-use capabilities across ChatGPT, the API, and Codex.

OpenAI has unveiled GPT-5.4, its latest frontier model designed specifically for professional work, marking a significant advancement in AI capabilities for knowledge workers, developers, and enterprises.

Professional-Grade Performance

The new model shows substantial improvements across multiple domains. On GDPval, which tests agents' ability to produce well-specified knowledge work across 44 occupations, GPT-5.4 achieves a state-of-the-art 83.0% success rate, matching or exceeding industry professionals in the majority of comparisons. That is up from 70.9% for GPT-5.2, a gain of 12.1 percentage points.

GPT-5.4 particularly excels at spreadsheet modeling, achieving a mean score of 87.3% on internal benchmarks that simulate the work of a junior investment banking analyst, up from 68.4% for GPT-5.2. It also produces more polished and effective presentations: human raters preferred GPT-5.4's output over GPT-5.2's 68.0% of the time, thanks to stronger visual variety and better use of image generation.

Enhanced Reasoning and Planning

A key feature of GPT-5.4 Thinking in ChatGPT is that it lays out an upfront plan of its reasoning, so users can redirect it mid-response while the model is still working. Previously, users had to wait for a complete response before they could make corrections or adjustments.

The model also performs better at deep web research, particularly for highly specific queries, and keeps better track of context on questions that require longer thinking. These improvements yield higher-quality answers that arrive faster and stay more relevant to the task at hand.

Native Computer Use Capabilities

GPT-5.4 represents OpenAI's first general-purpose model with native computer-use capabilities, marking a major step forward for developers building agents that complete real tasks across websites and software systems. The model achieves a state-of-the-art 75.0% success rate on OSWorld-Verified, which measures navigation of desktop environments through screenshots and keyboard/mouse actions.
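Benchmarks like OSWorld-Verified boil down to an observe-act loop: the agent receives a screenshot, chooses a keyboard or mouse action, and repeats until it declares the task done or runs out of steps. The toy sketch below illustrates only that loop. `FakeDesktop`, `Action`, and `scripted_policy` are hypothetical stand-ins (a real harness captures pixels and sends OS-level input events, and a model rather than a script picks each action), and nothing here reflects OpenAI's or OSWorld's actual interfaces.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Action:
    kind: str  # hypothetical action vocabulary: "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


@dataclass
class FakeDesktop:
    """Hypothetical stand-in for a desktop environment."""
    typed: str = ""
    clicks: list[tuple[int, int]] = field(default_factory=list)

    def screenshot(self) -> dict:
        # A real environment returns an image; this stub returns a small state dict.
        return {"typed": self.typed, "clicks": len(self.clicks)}

    def apply(self, action: Action) -> None:
        # Translate the chosen action into a change of environment state.
        if action.kind == "click":
            self.clicks.append((action.x, action.y))
        elif action.kind == "type":
            self.typed += action.text


Policy = Callable[[dict], Action]


def run_episode(env: FakeDesktop, policy: Policy, max_steps: int = 10) -> bool:
    """Observe-act loop: screenshot -> policy picks an action -> environment applies it."""
    for _ in range(max_steps):
        action = policy(env.screenshot())
        if action.kind == "done":
            return True
        env.apply(action)
    return False  # step budget exhausted before the policy declared success


def scripted_policy(obs: dict) -> Action:
    # Hypothetical policy standing in for the model: click a field, type, then stop.
    if obs["clicks"] == 0:
        return Action("click", x=120, y=40)
    if obs["typed"] == "":
        return Action("type", text="quarterly report")
    return Action("done")
```

Running `run_episode(FakeDesktop(), scripted_policy)` clicks once, types the text, and returns `True`. A policy that never emits `"done"` returns `False` once `max_steps` is used up, a simple way to bound an episode.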

That is well above GPT-5.2's 47.3% and even surpasses the reported human baseline of 72.4%. On WebArena-Verified, which tests browser use, GPT-5.4 achieves a leading 67.3% success rate when using both DOM- and screenshot-driven interaction.

The model's improved computer use builds on stronger general visual perception. On MMMU-Pro, a test of visual understanding and reasoning, GPT-5.4 achieves an 81.2% success rate without tool use, up from GPT-5.2's 79.5%. This carries over to document parsing: on OmniDocBench, GPT-5.4's average error rate is 0.109 compared with GPT-5.2's 0.140 (lower is better).
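Because these figures mix success rates (higher is better) with an error rate (lower is better), comparing them takes a little care. This short sketch uses only the numbers reported in this article and computes each change both as an absolute difference and as a relative change, flipping the sign for OmniDocBench so that a positive value always means GPT-5.4 improved. The `RESULTS` table and `improvement` helper are illustrative names, not anything published by OpenAI.

```python
# Scores reported in this article: (GPT-5.2, GPT-5.4, higher_is_better)
RESULTS = {
    "OSWorld-Verified success (%)": (47.3, 75.0, True),
    "MMMU-Pro success, no tools (%)": (79.5, 81.2, True),
    "OmniDocBench average error": (0.140, 0.109, False),
}


def improvement(old: float, new: float, higher_is_better: bool) -> tuple[float, float]:
    """Return (absolute change, relative change in %), signed so positive means better."""
    delta = new - old if higher_is_better else old - new
    return delta, 100.0 * delta / old


if __name__ == "__main__":
    for name, (old, new, hib) in RESULTS.items():
        abs_change, rel_change = improvement(old, new, hib)
        print(f"{name}: {abs_change:+.3g} absolute, {rel_change:+.1f}% relative")
```

By this measure, OSWorld-Verified shows the largest jump (27.7 points, roughly 58.6% relative), the OmniDocBench error rate falls by about 22% relative, and the MMMU-Pro gain is a modest 1.7 points.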

Advanced Tool Integration

GPT-5.4 introduces tool search in the API, which allows models to work efficiently when given many tools. Previously, when a model was given tools, all tool definitions were included in the prompt upfront, potentially adding thousands or even tens of thousands of tokens to every request. With tool search, GPT-5.4 receives a lightweight list of available tools along with a tool search capability, dramatically reducing token requirements for tool-heavy workflows.
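The token savings come from what sits in the prompt on every request. The sketch below contrasts the two setups under loudly stated assumptions: `ToolDef`, `search_tools`, `tokens_with_search`, and the four-tool catalog are hypothetical; the keyword match is a naive stand-in (the article does not say how GPT-5.4's tool search ranks tools); and `approx_tokens` uses a rough four-characters-per-token heuristic rather than a real tokenizer. It does not model the OpenAI API.

```python
from dataclasses import dataclass


@dataclass
class ToolDef:
    name: str
    summary: str  # one-line description: all the lightweight list carries
    schema: str   # full definition (parameters, docs): loaded only on demand


def approx_tokens(text: str) -> int:
    # Assumption: roughly 4 characters per token; a real tokenizer will differ.
    return max(1, len(text) // 4)


def make_schema(name: str) -> str:
    # Hypothetical stand-in for a verbose JSON tool definition.
    return f'{{"name": "{name}", "description": "' + "detailed parameter docs " * 60 + '"}'


CATALOG = [
    ToolDef(n, s, make_schema(n))
    for n, s in [
        ("get_weather", "current weather for a city"),
        ("send_email", "send an email message"),
        ("query_sales_db", "run SQL against the sales database"),
        ("create_invoice", "create a customer invoice"),
    ]
]


def upfront_prompt(tools: list[ToolDef]) -> str:
    """Without tool search: every full definition is sent with every request."""
    return "\n".join(t.schema for t in tools)


def lightweight_prompt(tools: list[ToolDef]) -> str:
    """With tool search: only names and one-line summaries are sent up front."""
    return "\n".join(f"{t.name}: {t.summary}" for t in tools)


def search_tools(tools: list[ToolDef], query: str, k: int = 2) -> list[ToolDef]:
    """Naive keyword search: rank tools by how many query words they mention."""
    words = query.lower().split()

    def score(t: ToolDef) -> int:
        text = f"{t.name} {t.summary}".lower()
        return sum(w in text for w in words)

    hits = [t for t in tools if score(t) > 0]
    hits.sort(key=score, reverse=True)
    return hits[:k]


def tokens_with_search(tools: list[ToolDef], query: str) -> int:
    """Lightweight list plus full definitions for only the tools the search returned."""
    loaded = search_tools(tools, query)
    return approx_tokens(lightweight_prompt(tools)) + sum(approx_tokens(t.schema) for t in loaded)
```

With this catalog, `tokens_with_search(CATALOG, "sales database")` loads only the `query_sales_db` definition, so the request costs well under half of `approx_tokens(upfront_prompt(CATALOG))`. The gap widens as the catalog grows, because the upfront prompt scales with every full definition while the lightweight list adds only one short line per tool.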

In testing with 250 tasks from Scale's MCP Atlas benchmark, the tool-search configuration reduced total token usage by 47% while achieving the same accuracy. The model also improves tool-calling accuracy and efficiency, particularly in the API, achieving higher accuracy in fewer turns on Toolathlon, a benchmark that tests how well AI agents can use real-world tools and APIs to complete multi-step tasks.

Enhanced Web Search Capabilities

GPT-5.4 shows significant improvements in agentic web search. On BrowseComp, a benchmark that measures how well AI agents can persistently browse the web to find hard-to-locate information, GPT-5.4 scores 82.7%, a jump of 17 percentage points over GPT-5.2. GPT-5.4 Pro sets a new state of the art at 89.3%.

This means GPT-5.4 Thinking is stronger at answering questions that require pulling together information from many sources on the web. It can more persistently search across multiple rounds to identify the most relevant sources, particularly for "needle-in-a-haystack" questions, and synthesize them into clear, well-reasoned answers.

Coding and Development Tools

GPT-5.4 combines the coding strengths of GPT-5.3-Codex with leading knowledge-work and computer-use capabilities. It matches or outperforms GPT-5.3-Codex on SWE-Bench Pro while running with lower latency at every reasoning-effort setting. It also excels at complex frontend tasks, producing noticeably more polished and functional results than previous models.

OpenAI has also released an experimental Codex skill called "Playwright (Interactive)" that allows Codex to visually debug web and Electron apps. As a demonstration, OpenAI created an interactive isometric theme park simulation game from a single prompt, using Playwright Interactive for browser playtesting and image generation for the isometric asset set.

Safety and Deployment

OpenAI is treating GPT-5.4 as having High cyber capability under its Preparedness Framework and is deploying it with corresponding protections, including expanded cyber safety stacks, monitoring systems, trusted access controls, and asynchronous blocking for higher-risk requests. The company is taking a precautionary approach to deployment while it continues to calibrate its policies and classifiers.

On chain-of-thought (CoT) monitorability, GPT-5.4 Thinking shows a low ability to control its reasoning. This is a positive safety property: it suggests the model cannot easily hide its reasoning, so CoT monitoring remains an effective safety tool.

Availability and Pricing

GPT-5.4 is rolling out gradually across ChatGPT and Codex. In the API, it's available as gpt-5.4, with GPT-5.4 Pro also available as gpt-5.4-pro for developers needing maximum performance on complex tasks.

In ChatGPT, GPT-5.4 Thinking is available starting today to ChatGPT Plus, Team, and Pro users, replacing GPT-5.2 Thinking. GPT-5.2 Thinking will remain available for three months for paid users in the model picker under the Legacy Models section before being retired on June 5, 2026.

API pricing reflects the model's improved capabilities, with GPT-5.4 priced at $2.50 per million input tokens and $15 per million output tokens, compared to GPT-5.2's $1.75 and $14 respectively. Batch and Flex pricing are available at half the standard API rate, while Priority processing is available at twice the standard API rate.
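These rates translate directly into a per-request cost estimate. This minimal sketch uses only the per-million-token rates and tier multipliers stated above. `request_cost` is a hypothetical helper rather than an SDK call, the `"gpt-5.2"` key is only a label (the article does not give GPT-5.2's API model ID), and pricing not mentioned here, such as GPT-5.4 Pro rates, is left out.

```python
# Standard API rates from this article, in USD per million tokens: (input, output).
# "gpt-5.4" is the API name given in the article; "gpt-5.2" is only a label here.
RATES = {
    "gpt-5.4": (2.50, 15.00),
    "gpt-5.2": (1.75, 14.00),
}

# Batch and Flex cost half the standard rate; Priority costs twice the standard rate.
TIER_MULTIPLIER = {"standard": 1.0, "batch": 0.5, "flex": 0.5, "priority": 2.0}


def request_cost(model: str, input_tokens: int, output_tokens: int, tier: str = "standard") -> float:
    """Estimated cost in USD for one request (hypothetical helper, not an SDK call)."""
    in_rate, out_rate = RATES[model]
    standard = (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
    return standard * TIER_MULTIPLIER[tier]


if __name__ == "__main__":
    for model in RATES:
        for tier in TIER_MULTIPLIER:
            print(f"{model} [{tier}]: ${request_cost(model, 200_000, 50_000, tier):.4f}")
```

For a request with 200,000 input tokens and 50,000 output tokens, this gives $1.25 for GPT-5.4 at the standard rate versus $1.05 for GPT-5.2, $0.625 for GPT-5.4 through Batch or Flex, and $2.50 with Priority processing.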

The launch of GPT-5.4 represents OpenAI's most comprehensive update yet, combining advances in reasoning, coding, computer use, and professional knowledge work into a single model that promises to significantly enhance productivity across a wide range of professional applications.
