The race for more capable AI models just intensified with Zhipu AI's release of the GLM-4.5 series—comprising the flagship GLM-4.5 and its leaner counterpart, GLM-4.5-Air. Announced on the Zhipu AI Blog, these models represent a deliberate shift from specialized AI toward unified intelligence, tackling everything from complex reasoning to full-stack development within a single framework. For developers and enterprises, this isn't just another incremental update; it's a foundational step toward AI that can seamlessly handle the tangled demands of modern agentic systems.

The Architecture: Hybrid Reasoning and MoE Efficiency

GLM-4.5 packs 355 billion total parameters (32 billion active), while GLM-4.5-Air scales down to 106 billion total parameters (12 billion active). Both employ a Mixture-of-Experts (MoE) architecture, but with a twist: Zhipu reduced model width (hidden dimensions and expert count) while increasing depth (more layers), betting that deeper networks enhance reasoning. This counters trends like DeepSeek-V3's wider designs. Key innovations include:

  • Grouped-Query Attention with partial RoPE and 96 attention heads (double typical counts), which boosted performance on reasoning benchmarks like MMLU even though training loss did not improve.
  • Hybrid Modes: A 'thinking mode' for multi-step reasoning and tool use, and a 'non-thinking mode' for instant responses, allowing dynamic adaptation to task complexity.
  • Efficiency Features: Loss-free balance routing, sigmoid gates for the MoE layers, and a Multi-Token Prediction layer for speculative decoding, accelerating inference (a minimal gating sketch follows this list).
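
To make the routing idea concrete, here is a minimal sketch of a sigmoid-gated, top-k MoE feed-forward layer in PyTorch. This is not Zhipu's implementation: the expert count, top-k value, and dimensions are placeholder assumptions, and the loss-free balance routing is omitted.

```python
# Illustrative sketch only: a tiny sigmoid-gated MoE feed-forward layer.
# Expert count, top-k, and dimensions are placeholders, not GLM-4.5's actual config.
import torch
import torch.nn as nn


class SigmoidGatedMoE(nn.Module):
    def __init__(self, d_model=1024, d_ff=4096, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                               # x: (tokens, d_model)
        scores = torch.sigmoid(self.gate(x))            # sigmoid gates instead of softmax
        weights, idx = scores.topk(self.top_k, dim=-1)  # route each token to its top-k experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                   # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out
```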

As one Zhipu engineer noted in the announcement: "Our focus on depth over width unlocked unexpected gains in logical problem-solving, proving that sometimes, going deeper is smarter than going wider."

Benchmark Dominance: Topping Agentic and Coding Tasks

Zhipu rigorously tested GLM-4.5 against models from OpenAI, Anthropic, Google DeepMind, and others across 12 benchmarks. The results are compelling:

  • Agentic Prowess: With a 128K context window and native function calling (a hedged tool-calling sketch follows this list), GLM-4.5 matched Claude 4 Sonnet on τ-bench and BFCL-v3. In web browsing (the BrowseComp benchmark), it scored 26.4% accuracy, beating Claude 4 Opus (18.8%) and approaching o4-mini-high (28.3%).
  • Coding Excellence: On SWE-bench and Terminal Bench, GLM-4.5 excelled in agentic coding, achieving a 90.6% tool-calling success rate—outpacing Claude-4-Sonnet (89.5%) and Kimi-K2 (86.2%). It also dominated in full-stack development, generating functional web apps with polished frontends and backends from minimal prompts.
  • Reasoning Strength: In math and science benchmarks like AIME and GPQA, the thinking mode delivered robust results, validated by automated LLM checks.
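
As a concrete, heavily hedged illustration of native function calling through the OpenAI-compatible API, the sketch below registers a single hypothetical get_weather tool and inspects the model's tool call. The endpoint URL, model id ("glm-4.5"), and the tool itself are assumptions for this example, not details from the announcement.

```python
# Illustrative sketch: function calling against an OpenAI-compatible GLM-4.5 endpoint.
# The base_url, model id, and the get_weather tool are placeholders, not confirmed details.
from openai import OpenAI

client = OpenAI(
    base_url="https://<your-glm-endpoint>/v1",   # placeholder: your provider's OpenAI-compatible URL
    api_key="YOUR_API_KEY",
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",                   # hypothetical tool for illustration
        "description": "Look up the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="glm-4.5",                             # assumed model id; check your provider's docs
    messages=[{"role": "user", "content": "Do I need an umbrella in Beijing today?"}],
    tools=tools,
)

# If the model chose to call the tool, its arguments arrive as a JSON string.
for call in resp.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```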

A Pareto Frontier analysis confirmed GLM-4.5 and GLM-4.5-Air sit on the optimal performance-efficiency boundary for their scale. This isn't just about raw power; it's about practical utility. For instance, Zhipu's demo of a 'PPT/Poster agent'—where GLM-4.5 autonomously creates slides from user inputs—showcases how these models blur the line between coding and creative tasks.
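
For readers unfamiliar with the term, a model sits on the Pareto frontier when no other model is both cheaper to run and higher scoring. The short sketch below checks that condition over (name, active parameters, score) tuples; the names and numbers are made-up placeholders for illustration, not the published results.

```python
# Illustrative only: which models are Pareto-optimal on (active params, score)?
# A point is on the frontier if no other point is at least as cheap AND at least as good,
# with a strict improvement in one of the two. All values below are placeholders.

def pareto_frontier(models):
    frontier = []
    for name, params_b, score in models:
        dominated = any(
            other_params <= params_b and other_score >= score
            and (other_params < params_b or other_score > score)
            for _, other_params, other_score in models
        )
        if not dominated:
            frontier.append(name)
    return frontier

candidates = [
    ("model-a", 32, 63.0),   # (name, active params in B, aggregate score) -- placeholders
    ("model-b", 12, 59.0),
    ("model-c", 40, 61.0),   # dominated: more active params than model-a, lower score
]
print(pareto_frontier(candidates))   # -> ['model-a', 'model-b']
```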

Training and RL: The Secret Sauce

Behind the scenes, Zhipu's training pipeline is a marvel of modern AI engineering. The base model consumed 15T tokens of general data and 7T tokens of code/reasoning corpora, followed by domain-specific fine-tuning. The real game-changer is slime, an open-sourced RL infrastructure designed for scalability:

  • Hybrid Architecture: Supports synchronous training for reasoning tasks and asynchronous, disaggregated training for agentic RL, decoupling rollouts from training so that slow data generation never leaves GPUs idle (a minimal sketch of this pattern follows the list).
  • Optimized Rollouts: Uses FP8 precision for data generation and BF16 for training, dramatically speeding up long-horizon tasks like coding or web interactions.
  • Specialized RL Stages: Curriculum-based training for reasoning and execution feedback-driven RL for coding, enabling skill transfer to broader tool use.
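
The decoupling described above can be pictured as a producer-consumer loop: rollout workers push finished trajectories into a queue while the trainer pulls whatever is ready. The sketch below is a toy illustration of that pattern, not slime's actual code; every name and number in it is a placeholder.

```python
# Toy illustration of disaggregated agentic RL: rollout workers generate trajectories
# asynchronously (standing in for FP8 inference engines driving slow environments),
# while the trainer (standing in for the BF16 update loop) consumes them from a queue.
# This is not slime's code; all names and numbers are placeholders.
import queue
import threading
import time

trajectory_queue = queue.Queue(maxsize=64)

def rollout_worker(worker_id: int) -> None:
    while True:
        time.sleep(0.5)  # pretend the agent browsed the web or ran a long coding task
        trajectory_queue.put({"worker": worker_id, "reward": 1.0, "tokens": []})

def trainer(steps: int = 10, batch_size: int = 8) -> None:
    for step in range(steps):
        batch = [trajectory_queue.get() for _ in range(batch_size)]  # blocks until rollouts arrive
        # update_policy(batch)  <- placeholder for the actual RL update
        print(f"step {step}: trained on {len(batch)} trajectories")

for i in range(4):
    threading.Thread(target=rollout_worker, args=(i,), daemon=True).start()
trainer()
```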

This approach allowed Zhipu to enhance capabilities iteratively, moving beyond GLM-4's foundations to prioritize agentic fluency. As the blog states, "slime's design ensures our GPUs stay saturated even when data generation is slow—critical for real-world agent deployment."

Why This Matters for Developers

For the tech community, GLM-4.5 isn't just another model—it's a toolkit for innovation. Its unification of strengths means developers can build agents that handle reasoning, coding, and tool chaining without switching between specialized systems. Availability lowers barriers:

  • Access Points: Chat with GLM-4.5 on Z.ai, call the OpenAI-compatible API, or deploy locally from the Hugging Face and ModelScope weights with vLLM/SGLang support (a local-inference sketch follows this list).
  • Integration: Pairs directly with coding agents such as Claude Code and Roo Code for enhanced coding workflows.
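
For local deployment, the sketch below uses vLLM's offline Python API. The Hugging Face repo id (zai-org/GLM-4.5-Air) and the tensor_parallel_size are assumptions; check the model card for the exact id and the hardware actually required. The OpenAI-compatible client shown earlier works equally well against a locally served endpoint.

```python
# Illustrative local-inference sketch with vLLM's offline API.
# The repo id and tensor_parallel_size are assumptions; even GLM-4.5-Air's MoE weights
# need multiple GPUs, so adjust to the hardware listed on the model card.
from vllm import LLM, SamplingParams

llm = LLM(model="zai-org/GLM-4.5-Air", tensor_parallel_size=8)
params = SamplingParams(temperature=0.6, max_tokens=512)

messages = [{"role": "user", "content": "Sketch a REST API for a todo app, then write the server code."}]
outputs = llm.chat(messages, params)
print(outputs[0].outputs[0].text)
```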

In an AI landscape fragmented by single-skill models, GLM-4.5 offers a glimpse of a more cohesive future—where one model can draft a presentation, debug code, and browse the web, all while optimizing for efficiency. As agentic applications explode in complexity, Zhipu's bet on unification might just set the new standard.

Source: Zhipu AI Blog