StepFun Releases Step 3.5 Flash, an Open-Source Foundation Model Built for AI Agents

StepFun has open-sourced Step 3.5 Flash, an agent-first foundation model delivering up to 350 TPS inference, 256K context support, and performance approaching closed-source models in reasoning and agent tasks.

On February 2, 2026, Chinese AI company StepFun officially released Step 3.5 Flash, its latest and most powerful open-source foundation model, positioning it as “born for agents.” The company says the model delivers strong reasoning capabilities, high stability, and performance optimized specifically for Agent-based workflows.

Key highlights include:

Faster: Inference speeds of up to 350 tokens per second for single-request coding tasks
Stronger: Performance in Agent scenarios and mathematical reasoning comparable to leading closed-source models
More stable: Capable of handling complex, long-horizon, multi-step tasks

According to StepFun, the next generation of foundation models must not only be “smarter,” but also trustworthy, responsive, and cost-efficient. To achieve this balance, Step 3.5 Flash adopts several architectural innovations:

Sparse Mixture-of-Experts (MoE): Each token activates only around 11 billion parameters, out of a total 196 billion, significantly reducing compute costs

MTP-3 (Multi-Token Prediction): The model predicts three tokens per step, effectively doubling inference efficiency

Hybrid Attention Architecture (SWA + Full Attention): A 3:1 sliding-window-to-global attention mix allows the model to focus on key segments in long texts, enabling efficient processing of up to 256K context length with lower computational overhead

Step 3.5 Flash is now fully available, and StepFun also revealed that training for the Step 4 model has already begun. The company invited developers and researchers to participate in the model's open development and ecosystem co-creation.

With its emphasis on Agent intelligence, long-context reasoning, and inference efficiency, Step 3.5 Flash signals StepFun's ambition to establish a competitive open-source foundation for next-generation AI agent systems.

Technical Deep Dive: What Makes Step 3.5 Flash Different

The 350 tokens per second (TPS) inference speed is particularly noteworthy in the current landscape where many open-source models struggle to match the throughput of proprietary solutions. This performance metric suggests StepFun has made significant optimizations in their inference pipeline, likely leveraging both hardware acceleration and architectural choices that minimize computational overhead.

The MoE Architecture Advantage

The Sparse Mixture-of-Experts approach represents a clever solution to the scaling problem. By activating only 11B out of 196B parameters per token, Step 3.5 Flash achieves a remarkable efficiency ratio. This means the model can maintain competitive performance while requiring significantly less computational resources during inference compared to dense models of similar parameter counts.

Multi-Token Prediction: Doubling the Speed

The MTP-3 mechanism is particularly interesting from an engineering perspective. Traditional autoregressive models predict one token at a time, creating a sequential bottleneck. By predicting three tokens simultaneously, Step 3.5 Flash effectively increases throughput by approximately 3x, though the actual gain is closer to 2x when accounting for the additional computational complexity of multi-token prediction.

Long Context Processing

The 256K context window is becoming increasingly important as AI agents need to process longer documents, conversations, and codebases. The hybrid attention architecture combining sliding-window attention (SWA) with full attention allows the model to maintain efficiency while still capturing global dependencies when necessary. The 3:1 ratio suggests a thoughtful balance between local pattern recognition and global context understanding.

Agent-First Design Philosophy

StepFun's explicit focus on agent workflows represents a shift in how foundation models are being developed. Rather than creating general-purpose models and retrofitting them for agent use cases, Step 3.5 Flash appears to have been designed from the ground up with agent requirements in mind.

This agent-first approach likely influenced several design decisions:

The emphasis on stable, long-horizon task completion
The focus on inference efficiency for real-time agent responses
The optimization for reasoning tasks common in agent workflows
The support for large context windows to maintain conversation and task state

Open Source Strategy and Ecosystem Building

By open-sourcing Step 3.5 Flash, StepFun is positioning itself in the growing ecosystem of open foundation models. This strategy offers several advantages:

Community Development: Open sourcing allows the broader developer community to contribute improvements, find bugs, and create specialized adaptations
Trust and Transparency: Organizations can audit the model's behavior and ensure it meets their requirements
Ecosystem Lock-in: As developers build on the model, they become invested in the StepFun ecosystem
Competitive Positioning: Open source serves as a counterweight to closed models from companies like OpenAI and Anthropic

Performance Claims and Real-World Implications

The claim that Step 3.5 Flash's performance "approaches closed-source models" in reasoning and agent tasks is significant. If verified, this would represent a major milestone for open-source AI, demonstrating that the performance gap between open and closed models is narrowing.

However, such claims require careful evaluation. The specific benchmarks and comparison methodologies matter significantly. Users should look for:

Independent third-party evaluations
Reproducible benchmark results
Performance across diverse task types
Real-world deployment experiences

Looking Forward: The Step 4 Announcement

The announcement that Step 4 training has already begun suggests StepFun is operating on an aggressive development timeline. This rapid iteration cycle is characteristic of the current AI landscape, where companies are racing to maintain competitive advantages.

For developers and organizations considering Step 3.5 Flash, this raises questions about the model's longevity and whether investing in the current version makes sense given the impending next generation.

Practical Considerations for Adoption

Organizations considering adopting Step 3.5 Flash should evaluate several factors:

Infrastructure Requirements: While the MoE architecture reduces compute costs, the model still requires significant GPU resources for optimal performance

Integration Effort: Agent workflows often require careful prompt engineering and system design beyond just the model itself

Support and Maintenance: Open source models require organizations to handle their own support, updates, and troubleshooting

Licensing and Compliance: Understanding the open-source license terms and any usage restrictions is crucial

Conclusion

StepFun's Step 3.5 Flash represents a significant contribution to the open-source AI landscape, particularly for agent-focused applications. The combination of high inference speed, large context support, and agent-optimized design addresses many practical requirements for deploying AI agents in production environments.

The model's success will ultimately depend on real-world performance and adoption. As the community begins to experiment with and deploy Step 3.5 Flash, we'll gain a clearer picture of how it compares to both other open-source models and proprietary alternatives.

For now, StepFun has positioned itself as a serious player in the foundation model space, with ambitions that extend beyond just creating another large language model. Their focus on agent intelligence and open development suggests a long-term strategy aimed at shaping the future of AI agent systems.

#Open Source #Foundation Models #Agent-First #MoE #Long-Context