Spirit v1.5 Tops RoboChallenge Benchmark and Enters Industrial Deployment

Spirit AI has open-sourced its Spirit v1.5 embodied AI model after it achieved top performance on the RoboChallenge benchmark and demonstrated industrial viability in CATL's battery production line.

Spirit AI has released Spirit v1.5 as open-source software following its top-ranking performance on RoboChallenge, a globally recognized benchmark for embodied AI systems. The Vision-Language-Action (VLA) model scored 66.09 on the Table30 leaderboard with a 50.33% success rate, making it the first model to exceed 50% success on this benchmark. More significantly, the model is now operational in industrial settings, powering humanoid robots on CATL's battery production lines.

Benchmark Design and Significance

RoboChallenge, co-developed by Dexmal, Hugging Face, and the Beijing Academy of Artificial Intelligence (BAAI), represents a methodological shift in embodied AI evaluation. Unlike simulation-based benchmarks, RoboChallenge conducts 24/7 testing on physical robotic hardware, including Franka, Arx5, and UR5 single-arm systems and ALOHA dual-arm configurations. The platform uses multi-view RGB and depth sensors to measure how Vision-Language-Action models:

Generalize across unseen physical environments
Execute time-dependent reasoning tasks
Handle multi-stage operations requiring long-horizon planning
Transfer skills between different robot morphologies

This hardware-in-the-loop approach provides what researchers consider a more realistic assessment of practical deployment readiness.

Performance Beyond the Benchmark

Spirit v1.5's benchmark achievement is substantiated by field deployment at CATL's Zhongzhou facility, where it controls 'Moz' humanoid robots on new energy battery PACK production lines. According to operational data:

99% plug-in success rate for battery components
3x efficiency improvement over human operators
Demonstrated reduction in high-voltage safety risks

The production environment validates claims about the model's industrial-grade reliability, particularly in precision tasks requiring consistency over extended periods.

Technical Context and Limitations

Spirit v1.5 advances the field through system-level optimization rather than breakthrough innovations. Its benchmark score exceeds the previous leader (Pi0.5) by 15 percentage points, but the 50.33% success rate indicates nearly half of attempted tasks still fail. This suggests limitations in:

Handling unstructured environments outside controlled settings
Adapting to task variations not encountered during training
Scaling to more complex manipulation scenarios

While the CATL deployment demonstrates effectiveness in structured industrial workflows, performance in dynamic settings like warehouses or households remains unverified. The open-source release enables community validation of these constraints.
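
To make the "nearly half of attempted tasks still fail" reading concrete, the arithmetic behind it can be sketched in a few lines of Python. This is only an illustration: the function name `failure_rate` is hypothetical, and the sketch assumes the reported 50.33% figure is a plain share of attempts that succeeded, which the article does not spell out.

```python
def failure_rate(success_rate_pct: float) -> float:
    """Return the percentage of attempts that did not succeed.

    Assumes the success rate is a simple share of attempts (an
    assumption for illustration, not a documented RoboChallenge rule).
    """
    if not 0.0 <= success_rate_pct <= 100.0:
        raise ValueError("success rate must be between 0 and 100")
    # Round to two decimals to match the precision of the reported figure.
    return round(100.0 - success_rate_pct, 2)


# Success rate reported for Spirit v1.5 on Table30.
SPIRIT_V15_SUCCESS_PCT = 50.33

if __name__ == "__main__":
    failed = failure_rate(SPIRIT_V15_SUCCESS_PCT)
    print(f"Failure rate: {failed}%")
```

Under that assumption, roughly 49.67 of every 100 attempts end in failure, which is the basis for the limitations listed above.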

Industry Implications

The dual validation (benchmark leadership plus production deployment) signals Chinese embodied AI's transition from research prototypes to viable industrial solutions. Spirit v1.5 is one of the first models to demonstrate measurable productivity gains in real manufacturing contexts. However, the field remains nascent: RoboChallenge results show that no model scores above 67 on the leaderboard and that the best success rate is only 50.33%, indicating that fundamental challenges in robotic generalization remain unsolved. The open-source release provides researchers with crucial data to address these gaps while allowing manufacturers to test the technology's boundaries in practical applications.