Microsoft Foundry Gets OpenAI's Latest Models: GPT-5.3-Codex, GPT-Realtime-1.5, and GPT-Audio-1.5

Microsoft Foundry now hosts OpenAI's newest models, including GPT-5.3-Codex for complex coding tasks, GPT-Realtime-1.5 for low-latency voice interactions, and GPT-Audio-1.5 for enhanced speech understanding.

Microsoft has expanded its Azure OpenAI offerings in Microsoft Foundry with three new models designed to address the growing complexity of modern AI applications. GPT-5.3-Codex, GPT-Realtime-1.5, and GPT-Audio-1.5 represent a strategic shift from simple prompt-response interactions toward AI systems capable of sustained reasoning, collaboration, and real-time engagement.

GPT-5.3-Codex: Engineering at Scale

GPT-5.3-Codex is the flagship of the release. By combining the coding performance of GPT-5.2-Codex with the reasoning capabilities of GPT-5.2, OpenAI has built a single model optimized for real engineering work rather than isolated code generation tasks.

Key capabilities include:

25% faster execution than previous versions, enabling developers to accelerate application development
Long-running task support with sustained context maintenance across complex, multi-step operations
Mid-task steerability allowing developers to redirect and collaborate with the model without losing context
Enhanced computer-use capabilities spanning the full spectrum of technical work

This model is particularly suited for scenarios where requirements evolve during development, such as refactoring large legacy applications, performing multi-step migrations, running agentic developer workflows, automating code reviews, and working in security-sensitive environments.

Pricing (per 1M tokens): $1.75 for input, $0.175 for cached input, and $14.00 for output
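
To make these rates concrete, here is a rough cost estimate in Python. It is a sketch under two assumptions: that all three figures are quoted per 1M tokens, and that cached input tokens are billed at the cached rate instead of the standard input rate. The function name and the sample token counts are illustrative and do not come from the announcement.

```python
from decimal import Decimal

# Rates quoted in the article, in USD; assumed to be per 1M tokens each.
CODEX_RATES = {
    "input": Decimal("1.75"),
    "cached_input": Decimal("0.175"),
    "output": Decimal("14.00"),
}
PER_MILLION = Decimal(1_000_000)


def estimate_cost(input_tokens, cached_tokens, output_tokens, rates=CODEX_RATES):
    """Estimate USD cost; cached_tokens is the cached portion of input_tokens."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    uncached = input_tokens - cached_tokens
    total = (
        uncached * rates["input"]
        + cached_tokens * rates["cached_input"]
        + output_tokens * rates["output"]
    )
    return total / PER_MILLION


# Hypothetical workload: 2M input tokens (500k served from cache), 300k output.
cost = estimate_cost(2_000_000, 500_000, 300_000)
print(f"${cost:.4f}")  # prints $6.9125
```

In this example the 300k output tokens account for $4.20 of the $6.9125 total, which suggests that output volume, not prompt size, tends to dominate cost for long-running coding tasks at these rates.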

GPT-Realtime-1.5 and GPT-Audio-1.5: Voice-First AI

For applications requiring natural voice interaction, OpenAI has introduced GPT-Realtime-1.5 and GPT-Audio-1.5, both showing measurable improvements in speech understanding and reasoning capabilities.

Performance improvements include:

+5% lift on Big Bench Audio (reasoning tasks)
+10.23% improvement in alphanumeric transcription accuracy
+7% gain in instruction following while maintaining low-latency performance
More natural-sounding speech with improved pacing and prosody
Higher audio quality with clearer, more consistent output
Function calling support enabling structured, tool-driven interactions within real-time audio flows

These models are ideal for conversational voice agents, voice-enabled assistants embedded in applications or devices, live voice interfaces for kiosks and demos, and hands-free workflows where audio replaces keyboard interaction.
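
Function calling in a voice flow generally works like this: the model emits a structured request (a function name plus JSON arguments), the application runs the matching function, and the result is handed back so the model can speak the answer. The sketch below covers only the application-side dispatch step, in plain Python. The tool name `get_order_status`, its schema, and the shape of the `call` dictionary are hypothetical; they do not reproduce the actual Realtime API event format, which should be taken from the official documentation.

```python
import json

# Hypothetical tool description in JSON Schema style; the exact fields a
# given API expects are an assumption here.
ORDER_STATUS_TOOL = {
    "name": "get_order_status",
    "description": "Look up the shipping status of an order.",
    "parameters": {
        "type": "object",
        "properties": {"order_id": {"type": "string"}},
        "required": ["order_id"],
    },
}


def get_order_status(order_id: str) -> dict:
    """Stand-in for a real backend lookup."""
    fake_db = {"A100": "shipped", "B200": "processing"}
    return {"order_id": order_id, "status": fake_db.get(order_id, "unknown")}


# Registry mapping tool names to local Python callables.
TOOLS = {"get_order_status": get_order_status}


def handle_function_call(call: dict) -> str:
    """Run the requested tool and return a JSON string for the model.

    `call` is assumed to look like {"name": ..., "arguments": "<json string>"}.
    Errors are returned as data so the voice session can recover gracefully.
    """
    func = TOOLS.get(call.get("name"))
    if func is None:
        return json.dumps({"error": f"unknown tool: {call.get('name')}"})
    try:
        args = json.loads(call.get("arguments") or "{}")
        return json.dumps(func(**args))
    except (json.JSONDecodeError, TypeError) as exc:
        return json.dumps({"error": str(exc)})


result = handle_function_call(
    {"name": "get_order_status", "arguments": '{"order_id": "A100"}'}
)
print(result)  # prints {"order_id": "A100", "status": "shipped"}
```

Returning errors as JSON rather than raising is a deliberate choice in this sketch: in a live audio session, a malformed tool call is usually better handled by letting the model apologize or retry than by dropping the connection.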

Pricing:

GPT-Realtime-1.5: text at $4.00 input, $0.04 cached input, and $16.00 output per 1M tokens; audio at $32.00 input, $0.40 cached input, and $64.00 output per 1M tokens
GPT-Audio-1.5: text at $2.50 input and $10.00 output per 1M tokens; audio at $32.00 input and $64.00 output per 1M tokens
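
As a rough comparison of the two voice models, the following Python sketch prices the same hypothetical session under both rate cards. It assumes the audio figures are listed in the same order as the text rates (input, cached input, output), that every figure is per 1M tokens, and that GPT-Audio-1.5 has no cached rate because none is quoted. The token counts are invented for illustration.

```python
from decimal import Decimal as D

# USD per 1M tokens, as quoted in the article. A missing key means no rate
# was listed for that token kind.
RATES = {
    "GPT-Realtime-1.5": {
        "text": {"input": D("4.00"), "cached_input": D("0.04"), "output": D("16.00")},
        "audio": {"input": D("32.00"), "cached_input": D("0.40"), "output": D("64.00")},
    },
    "GPT-Audio-1.5": {
        "text": {"input": D("2.50"), "output": D("10.00")},
        "audio": {"input": D("32.00"), "output": D("64.00")},
    },
}


def session_cost(model, usage):
    """Return USD cost for usage shaped like {modality: {kind: token_count}}."""
    total = D(0)
    for modality, counts in usage.items():
        for kind, tokens in counts.items():
            rate = RATES[model][modality].get(kind)
            if rate is None:
                raise KeyError(f"no {modality} {kind} rate listed for {model}")
            total += rate * tokens
    return total / D(1_000_000)


# Hypothetical session: a short text prompt plus a longer audio exchange.
usage = {
    "text": {"input": 20_000, "output": 5_000},
    "audio": {"input": 100_000, "output": 50_000},
}
for model in RATES:
    print(f"{model}: ${session_cost(model, usage):.2f}")
```

For this workload the two models land within a few cents of each other ($6.56 versus $6.50), because the audio rates are identical and audio tokens dominate the bill; the text-rate difference matters only for text-heavy sessions.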

Strategic Implications for Developers

These releases signal OpenAI's recognition that modern AI applications require sustained engagement rather than discrete interactions. The models address common pain points including latency spikes, instruction drift, and unreliable tool calls that disrupt both user conversations and developer workflows.

For enterprise developers, the integration into Microsoft Foundry provides a unified environment for evaluation, deployment, and governance. This allows teams to progress from experiments to scalable applications while maintaining security and operational controls, a critical consideration for regulated industries.

Getting Started

Developers can begin exploring these models immediately in Microsoft Foundry, where they can evaluate performance and experiment with Azure OpenAI models in a controlled environment before moving a prototype into production.

The rollout of these models represents a maturation of the AI development ecosystem, moving beyond simple prompt engineering toward collaborative, context-aware systems that can reason and act over extended periods. For organizations building sophisticated AI applications, these capabilities address fundamental limitations that have constrained the effectiveness of earlier models in real-world scenarios.