AI labs pay XDOF for robot training data

XDOF wants to own the labor layer behind robot foundation models: teleoperators, sensor rigs, cleaned trajectories and the tedious work AI labs need done before robots can learn useful tasks.

AI researchers spent the language model boom feeding models text from the web. Robotics teams face a rougher job. They need humans to move robot arms, reset failed tasks, label sensor streams and repeat the same motions until a model sees enough examples to learn from them.

XDOF, pronounced "ecks-doff," has raised $70 million from Thrive Capital, Spark Capital, a16z, Lux and WndrCo to build that data layer. Co-founder and CEO Philipp Wu told TechCrunch the company works with 20 customers, including several frontier AI labs, though he declined to name them.

The timing fits a wider push into physical AI. OpenAI has returned to robotics work, and humanoid companies such as Figure AI keep chasing robots that can handle homes, warehouses and factories. Those teams can buy chips, rent compute and hire model researchers. They still need data that captures contact, motion, failure and recovery in the physical world.

Wu learned that constraint as a University of California, Berkeley doctoral student. He studied robot learning from large data sets, then ran into a basic shortage: researchers could not train broad robot models without broad robot data. Wu and future XDOF co-founder Fred Shentu helped build GELLO, a low-cost teleoperation system that lets a human operator control a robot arm and record demonstrations. The GELLO paper framed the problem in practical terms: better imitation learning depends on better demonstrations.

XDOF now wants to turn that academic bottleneck into an infrastructure business. Wu, Shentu and Nemo Jin, the company's co-founder and chief operating officer, launched XDOF in October 2024. The team has about 60 employees and sells data collection, cleaning, annotation and tooling to robotics customers.

The company also plans to release a data set called ABC together with UC Berkeley's AI Research (BAIR) lab. According to TechCrunch, the data set includes 130,000 robot manipulation trajectories, 300 hours of simulation and 100 hours of evaluations. Researchers have used the data for benchmark tasks such as folding T-shirts, flattening boxes and loading AirPods into their cases.

The release matters to the research community because robotics has lacked the shared pretraining corpora that helped language and image models improve. Open data sets give academic labs a way to test methods against the same tasks. They also give startups a talent and standards pipeline, since students can train on the same formats commercial teams use.

XDOF's approach spans three data sources. The highest-value source comes from teleoperators controlling the same robot a customer plans to deploy. A second source uses general teleoperated systems, including GELLO-style devices. A third source uses egocentric data from humans doing ordinary tasks, with XDOF planning wearable sensors for that work.

That hierarchy also shows why robot data costs far more to produce than web text. A collection team has to manage hardware, camera placement, calibration, resets and task design. A small camera change can hurt hand tracking. A sloppy calibration step can contaminate a whole batch of demonstrations. A warehouse full of robot arms needs technicians, trainers and managers before researchers can train a model.

AI labs could build those operations themselves, and some may do so for strategic tasks or sensitive data. Wu argues that many labs will outsource the work because they would rather spend staff time on models and deployment. That argument echoes the pitch behind earlier cloud and data-labeling businesses: teams keep core research close, then pay specialists for the repetitive infrastructure they need at scale.

Skeptics have reasons to push back. Data businesses can turn into services firms with narrow margins. Robotics customers may demand custom rigs, custom task taxonomies and custom annotation rules. Large AI labs could also pull the work inside once the field settles on standard hardware and data formats.

XDOF's defense sits in the feedback loop. The company does not want to sell raw demonstrations alone. It wants operators, collection tools, cleaning systems and annotation workflows to improve together as customers train models and report failures. If XDOF can capture those lessons across customers, its advantage comes from operational memory as much as data volume.

The company name points at the ambition. Robotics engineers use degrees of freedom to describe the independent motions a robot can make. A human arm has seven from shoulder to wrist. Figure's humanoid robots have far more. Wu says the X stands for arbitrary or unlimited degrees of freedom.

The name also points to something larger than one startup. The robotics race now depends on people doing repetitive work in labs and warehouses, far from the demo videos. AI teams can announce a better model in minutes, but they teach robots through hours of human motion, failed grasps and reset tables.

