OpenAI Opens the Hood on Codex CLI: A Deep Dive into Agent Loop Architecture
#AI

Frontend Reporter

OpenAI publishes detailed technical breakdown of Codex CLI's agent loop design, revealing practical strategies for managing LLM context and performance.

OpenAI has launched a new article series that pulls back the curtain on the inner workings of Codex CLI, its software development agent. The inaugural post offers a comprehensive look at the agent loop architecture, providing valuable insights for developers building similar AI-powered tools.

The agent loop forms the backbone of Codex CLI, functioning as a continuous cycle that processes user input, generates tool calls through an LLM, and returns responses. However, the implementation goes beyond a simple loop, incorporating sophisticated strategies to handle the inherent limitations of large language models, particularly around context management and prompt caching.

The article reveals that several of these strategies emerged from real-world challenges: they were born from bugs reported by users, demonstrating how practical deployment experience shapes architectural decisions. This transparency about the development process offers valuable lessons for the broader AI engineering community.

A key architectural advantage of Codex CLI is its LLM-agnostic design. By leveraging the Open Responses API, the CLI can work with any model wrapped by this interface, including locally-hosted open models. This flexibility means the design patterns and lessons OpenAI shares can benefit anyone building agents on top of the Responses API.

Inside the Agent Loop

The article walks through a single turn of the conversation between user and agent. It begins with assembling an initial prompt that includes:

  • Instructions (system messages containing general rules and coding standards)
  • Tools (a list of MCP servers the agent can invoke)
  • Input (text, images, and file inputs including AGENTS.md, local environment information, and the user's message)
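
As a rough illustration, the assembled payload might look something like the sketch below. The field names and shapes here are assumptions for exposition, not the Responses API's exact wire format:

```python
# Illustrative sketch of the initial prompt payload; field names are
# assumptions for exposition, not the Responses API's exact schema.
initial_prompt = {
    "instructions": "General rules and coding standards for the agent...",
    "tools": [
        # Tools exposed by configured MCP servers, e.g.:
        {"type": "function", "name": "read_file", "description": "Read a file"},
        {"type": "function", "name": "run_shell", "description": "Run a shell command"},
    ],
    "input": [
        {"role": "user", "content": "Contents of AGENTS.md..."},
        {"role": "user", "content": "Local environment info (OS, cwd, ...)"},
        {"role": "user", "content": "The user's actual request"},
    ],
}
```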

This JSON payload is sent to the Responses API, triggering LLM inference that produces a stream of output events. These events fall into two categories:

  1. Tool calls: The agent invokes specified tools with given inputs and collects the output
  2. Reasoning outputs: Steps in the agent's plan or thought process

Both the tool calls (with their collected outputs) and the reasoning are appended to the initial prompt, which is then passed back to the LLM for additional iterations. This continues until the LLM responds with a "done" event containing the final response for the user.
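
Putting the pieces together, a single turn might be sketched as follows. Here `call_responses_api` (which streams output events) and `invoke_tool` (MCP dispatch) are hypothetical stand-ins for the real client code, not Codex CLI's actual functions:

```python
def run_turn(prompt: dict) -> str:
    """Drive one user turn of the agent loop until the model is done.

    `call_responses_api` and `invoke_tool` are hypothetical helpers
    standing in for the real Responses API client and MCP tool dispatch.
    """
    while True:
        for event in call_responses_api(prompt):
            if event["type"] == "tool_call":
                # Run the requested tool and append its output to the prompt.
                result = invoke_tool(event["name"], event["arguments"])
                prompt["input"].append({"type": "tool_output", "output": result})
            elif event["type"] == "reasoning":
                # Reasoning steps also become part of the growing prompt.
                prompt["input"].append({"type": "reasoning", "content": event["content"]})
            elif event["type"] == "done":
                # A final user-facing response ends the turn.
                return event["response"]
```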

Performance Challenges and Solutions

A significant challenge in this architecture is LLM inference cost, which is "quadratic in terms of the amount of JSON sent to the Responses API over the course of the conversation." This quadratic growth makes prompt caching essential: by reusing outputs from previous inference calls, cost shifts from quadratic to linear growth.
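
To see where the quadratic growth comes from: without caching, each iteration reprocesses the entire accumulated prompt, so total work is proportional to the sum 1 + 2 + ... + n of prefix lengths. A back-of-the-envelope sketch with made-up token counts:

```python
# Suppose each iteration appends roughly `delta` new tokens to the prompt.
delta, iterations = 500, 40

# Without caching: iteration i reprocesses all i * delta accumulated tokens,
# so total work is delta * (1 + 2 + ... + n), i.e. O(n^2).
uncached = sum(i * delta for i in range(1, iterations + 1))  # 410_000 tokens

# With prompt caching: only the `delta` newly appended tokens cost fresh
# compute each iteration, so total work is n * delta, i.e. O(n).
cached = iterations * delta  # 20_000 tokens

print(uncached, cached)
```

With caching, the shared prefix of each successive call is reused, so only the newly appended tokens cost fresh compute.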

However, prompt caching introduces its own complexities. Any changes to the tool list invalidate the cache. In fact, Codex CLI's initial MCP support had a bug that "failed to enumerate the tools in a consistent order," causing cache misses and performance degradation.
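
The conceptual fix for that class of bug is to enumerate tools in a stable order before serializing the prompt, so the cached prefix stays byte-identical across calls. A minimal sketch of the idea (the actual fix in Codex CLI may differ):

```python
def stable_tool_list(mcp_servers: dict[str, list[dict]]) -> list[dict]:
    """Flatten tools from all MCP servers in a deterministic order.

    Sorting by (server name, tool name) keeps the serialized tool list
    identical across runs, so the prompt prefix, and therefore the cache
    key, stays stable.
    """
    tools = []
    for server in sorted(mcp_servers):
        tools.extend(sorted(mcp_servers[server], key=lambda t: t["name"]))
    return tools
```

Any nondeterminism in this serialization, such as unordered map iteration, changes the prefix and silently defeats the cache.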

The system also employs compaction to manage context size. When conversation length exceeds a token threshold, the agent calls a special Responses API endpoint that provides a compressed representation of the conversation, replacing the previous input while preserving essential information.
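
In sketch form, compaction might look like the following. The threshold value, `count_tokens`, and `compact_conversation` are illustrative assumptions; the last stands in for the special Responses API endpoint the article describes:

```python
COMPACTION_THRESHOLD = 150_000  # illustrative budget, not Codex CLI's actual value

def maybe_compact(prompt: dict) -> dict:
    """Swap the conversation history for a compressed representation
    once it grows past a token threshold.

    `count_tokens` and `compact_conversation` are hypothetical helpers;
    the latter stands in for the compaction endpoint described above.
    """
    if count_tokens(prompt["input"]) > COMPACTION_THRESHOLD:
        summary = compact_conversation(prompt["input"])
        # Replace the full history, preserving essential information.
        prompt["input"] = [{"type": "summary", "content": summary}]
    return prompt
```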

Community Response and Future Posts

The technical community has responded positively to OpenAI's transparency. Hacker News users particularly praised the decision to open-source Codex CLI, especially given that competing tools like Claude Code remain closed-source. One developer noted: "This is a big deal and very useful for anyone wanting to learn how coding agents work, especially coming from a major lab like OpenAI."

OpenAI has hinted at future articles that will explore additional aspects of Codex CLI, including:

  • The CLI's overall architecture
  • Implementation details of tool use
  • Codex's sandboxing model

The Codex CLI source code, along with bug tracking and fix history, is available on GitHub, providing a valuable resource for developers looking to understand or build upon these architectural patterns.

This series represents a significant contribution to the growing body of knowledge around agentic AI systems, offering practical insights that extend far beyond Codex CLI itself.
