Symbolica's Agentica SDK Achieves 36% on ARC-AGI-3, Outperforming Human Baselines
#AI


Startups Reporter

Symbolica's Agentica SDK achieves 36.08% on ARC-AGI-3, solving 113 levels and 7 games while costing $1,005 versus $8,900 for comparable models.

Symbolica's Agentica SDK has achieved a breakthrough in AI reasoning, scoring 36.08% on the ARC-AGI-3 benchmark—a significant leap from the 0% baseline and surpassing human performance on several games. The system solved 113 out of 182 playable levels and completed 7 of the 25 available games in the competition, demonstrating remarkable progress in abstract reasoning capabilities.

Cost-Effective Performance

The Agentica implementation delivers exceptional value, achieving 36.08% accuracy for just $1,005 in compute costs. This stands in stark contrast to traditional Chain of Thought (CoT) models: Opus 4.6 Max achieved only 0.25% for $8,900, while GPT 5.4 High managed 0.3% at a comparable cost. Figure 1 illustrates this cost-performance advantage, showing Agentica as both the highest-scoring and most cost-effective of the tested approaches.

Game-Winning Performance

Agentica demonstrated its reasoning prowess by winning 7 games outright; its strongest per-game results include:

  • CN04: 97.6% win rate (118 actions)
  • LP85: 84.16% win rate (273 actions)
  • AR25: 83.28% win rate (516 actions)
  • FT09: 77.59% win rate (123 actions)

The system's performance across all games shows consistent reasoning ability, with individual game scores ranging from 97.60% down to 0.22%. The complete breakdown reveals strengths in pattern recognition and logical deduction across diverse puzzle types.

Technical Implementation

Built on Symbolica's SDK, Agentica leverages advanced agentic reasoning to tackle the ARC-AGI-3 challenges. The system's architecture enables persistent task execution, allowing it to work through complex puzzles systematically. The team has made the implementation available on GitHub at symbolica-ai/ARC-AGI-3-Agents, providing transparency into the approach and enabling further research.
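To make the idea of persistent task execution concrete, here is a minimal sketch of an agentic game loop in that spirit. All names here (`ToyEnv`, `GreedyAgent`, `run_episode`) are hypothetical stand-ins and do not reflect the actual Agentica SDK or the symbolica-ai/ARC-AGI-3-Agents codebase; the toy "puzzle" simply requires reaching a target value within an action budget.

```python
from dataclasses import dataclass

@dataclass
class ToyEnv:
    """Stand-in puzzle: reach `target` via +1/-1 actions (hypothetical)."""
    target: int = 5
    state: int = 0
    actions_taken: int = 0

    def step(self, action: int) -> bool:
        """Apply one action; return True when the level is solved."""
        self.state += action
        self.actions_taken += 1
        return self.state == self.target

class GreedyAgent:
    """Picks whichever action moves the state toward the target."""
    def act(self, env: ToyEnv) -> int:
        return 1 if env.state < env.target else -1

def run_episode(env: ToyEnv, agent: GreedyAgent, budget: int = 100) -> bool:
    """Persistent loop: keep acting until the puzzle is solved or the
    action budget runs out (mirroring the per-game action counts
    reported above, e.g. 118 actions on CN04)."""
    for _ in range(budget):
        if env.step(agent.act(env)):
            return True
    return False

env = ToyEnv(target=5)
print(run_episode(env, GreedyAgent()))  # True, after 5 actions
```

The key design point the article describes is the outer loop: the agent keeps observing and acting on the same task across many steps, rather than answering in a single forward pass.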

Beyond Benchmark Performance

To demonstrate the SDK's versatility, Symbolica has sandboxed it so that it can run any persistent task, including solving ARC puzzles. The team highlights potential applications beyond academic benchmarks, such as tracking Zillow listings or other real-world reasoning tasks that require sustained logical analysis.
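A persistent monitoring task of the kind mentioned above (e.g. tracking listings) can be sketched as a simple watch loop. This is an illustrative assumption, not Symbolica's API: `fetch` stands in for whatever data source the task polls, and a real deployment would sleep between polls and persist state.

```python
def watch(fetch, on_change, max_polls: int) -> None:
    """Poll fetch() repeatedly and invoke on_change for each item
    not seen in any earlier poll (hypothetical sketch)."""
    seen = set()
    for _ in range(max_polls):
        for item in fetch():
            if item not in seen:
                seen.add(item)
                on_change(item)

def make_fetcher(snapshots):
    """Deterministic stand-in data source: yields one snapshot per poll."""
    it = iter(snapshots)
    def fetch():
        try:
            return next(it)
        except StopIteration:
            return []
    return fetch

events = []
watch(make_fetcher([{"listing-A"}, {"listing-A", "listing-B"}]),
      events.append, max_polls=2)
print(events)  # ['listing-A', 'listing-B']
```

The point is the same as in the benchmark setting: the agent's value comes from running continuously against a task, reacting to new state as it appears.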

This achievement represents a significant milestone in AI's ability to reason about abstract patterns and solve novel problems—capabilities that remain challenging for traditional machine learning approaches. The combination of high performance, low cost, and open-source availability positions Agentica as a compelling platform for advancing AI reasoning research.

For developers and researchers interested in exploring the implementation, the complete codebase and detailed performance metrics are available on the project's GitHub repository.
