Spirit AI has open-sourced its Spirit v1.5 embodied AI model after it achieved top performance on the RoboChallenge benchmark and demonstrated industrial viability in CATL's battery production line.
Spirit AI has released Spirit v1.5 as open-source software following its top-ranking performance on RoboChallenge, a globally recognized benchmark for embodied AI systems. The Vision-Language-Action (VLA) model scored 66.09 on the Table30 leaderboard with a 50.33% success rate, becoming the first to exceed 50% success on this benchmark. More significantly, the model is now operational in industrial settings, powering humanoid robots on CATL's battery production lines.
Benchmark Design and Significance
RoboChallenge, co-developed by Dexmal, Hugging Face, and the Beijing Academy of Artificial Intelligence (BAAI), represents a methodological shift in embodied AI evaluation. Unlike simulation-based benchmarks, RoboChallenge runs tests around the clock on physical robotic hardware, including Franka, Arx5, and UR5 single-arm systems and ALOHA dual-arm configurations. The platform uses multi-view RGB and depth sensors to measure how well Vision-Language-Action models:
- Generalize across unseen physical environments
- Execute time-dependent reasoning tasks
- Handle multi-stage operations requiring long-horizon planning
- Transfer skills between different robot morphologies
This hardware-in-the-loop approach creates what researchers consider a more realistic assessment of practical deployment readiness.
Performance Beyond the Benchmark
Spirit v1.5's benchmark achievement is substantiated by field deployment at CATL's Zhongzhou facility, where it controls 'Moz' humanoid robots on new energy battery PACK production lines. According to operational data:
- 99% plug-in success rate for battery components
- 3x efficiency improvement over human operators
- Demonstrated reduction in high-voltage safety risks
The production environment validates claims about the model's industrial-grade reliability, particularly in precision tasks requiring consistency over extended periods.
Technical Context and Limitations
Spirit v1.5 advances the field through system-level optimization rather than breakthrough innovations. Its benchmark score exceeds the previous leader (Pi0.5) by 15 percentage points, but a 50.33% success rate means that just under half of attempted tasks (49.67%) still fail. This suggests limitations in:
- Handling unstructured environments outside controlled settings
- Adapting to task variations not encountered during training
- Scaling to more complex manipulation scenarios
While CATL deployment proves effectiveness in structured industrial workflows, performance in dynamic settings like warehouses or households remains unverified. The open-source release enables community validation of these constraints.
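The failure share implied by the reported success rate is simple arithmetic. The short Python sketch below makes it explicit; the function name `failure_rate` is illustrative only and is not part of RoboChallenge or the Spirit v1.5 release.

```python
def failure_rate(success_rate_pct: float) -> float:
    """Return the percentage of attempts that did not succeed."""
    if not 0.0 <= success_rate_pct <= 100.0:
        raise ValueError("success rate must be between 0 and 100")
    return 100.0 - success_rate_pct


# Success rate reported for Spirit v1.5 on the Table30 leaderboard.
reported_success = 50.33

print(f"{failure_rate(reported_success):.2f}% of attempts did not succeed")
```

This only restates the headline figure; it says nothing about which tasks fail or why, which is what the open-source release would let others investigate.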
Industry Implications
The dual validation (benchmark leadership plus production deployment) signals Chinese embodied AI's transition from research prototypes to viable industrial solutions. Spirit v1.5 is one of the first models to demonstrate measurable productivity gains in real manufacturing contexts. However, the field remains nascent: on RoboChallenge, no model scores above 67 on the Table30 leaderboard, and the best success rate is only just over 50%, indicating that fundamental challenges in robotic generalization remain unsolved. The open-source release gives researchers a working model to study in addressing these gaps, while allowing manufacturers to test the technology's boundaries in practical applications.
