Data Is the Bottleneck, Not the Architecture
Robotics doesn't need a new model. It needs a three-source data engine: internet-scale + simulation + real-robot.
Fan's core GEAR data thesis. The architecture is not yet the limit: 'we have not pushed transformers to their limit.' The constraint is data, because motor-control signals cannot be downloaded from the internet. The fix is a three-source strategy that combines complementary strengths: (1) internet-scale video, which supplies common-sense priors but no action labels; (2) GPU simulation, which supplies effectively unlimited action data at ~10,000× real time but suffers a sim-to-real gap; (3) real-robot teleoperation data, which has no sim-to-real gap but is capped at 24 hours per robot per day and is costly in human operator time. Combine the strengths, cancel the weaknesses.
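To make the throughput gap between (2) and (3) concrete, here is a back-of-the-envelope sketch. It assumes the ~10,000× figure means aggregate simulated time per unit of wall-clock time (for example, summed over many parallel GPU environments) and compares it with physical robots running nonstop. The helper names are illustrative, not from the source.

```python
HOURS_PER_DAY = 24.0
SIM_SPEEDUP = 10_000  # ~10,000x real time, the figure quoted above (assumed aggregate)


def real_robot_hours(num_robots: int, days: float) -> float:
    """Upper bound: each physical robot collects at most 24 h of data per wall-clock day."""
    return num_robots * HOURS_PER_DAY * days


def sim_hours(days: float, speedup: float = SIM_SPEEDUP) -> float:
    """Simulated experience gathered in `days` of wall-clock time at `speedup`x real time."""
    return HOURS_PER_DAY * days * speedup


if __name__ == "__main__":
    one_day_sim = sim_hours(1)
    one_day_real = real_robot_hours(num_robots=1, days=1)
    print(f"sim: {one_day_sim:,.0f} h, real: {one_day_real:,.0f} h")
    print(f"ratio: {one_day_sim / one_day_real:,.0f}x")
    # Convert simulated hours into years of nonstop single-robot operation.
    print(f"one sim day ~ {one_day_sim / (HOURS_PER_DAY * 365):.1f} robot-years")
```

Run as a script, it reports 240,000 simulated hours against 24 real hours for one wall-clock day, roughly 27 robot-years of experience. The real-robot figure is a ceiling: resets, failures, and operator breaks only push it lower.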
- Internet data gives priors but carries no action labels.
- Simulation is effectively infinite and ~10,000× real-time, but has a sim-to-real gap.
- Real-robot data has no sim gap but is capped at 24h/day and is expensive.
- A winning strategy mixes all three and tokenizes the result for one transformer.
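The source does not specify how GEAR mixes or tokenizes its data, so the following is only a minimal sketch of the general idea: draw each training sample from one of the three buckets according to mixture weights, then flatten it into a single token sequence for one transformer. Everything here is hypothetical: the source-tag tokens, the uniform 256-bin action discretization, the token-id ranges, and the names `Sample`, `tokenize_action`, `to_sequence`, and `sample_mixture`. Observation tokens are assumed to come from some upstream visual tokenizer.

```python
import random
from dataclasses import dataclass
from typing import Optional

# Hypothetical tag tokens marking which bucket a sample came from (ids are illustrative).
SOURCE_TOKENS = {"internet": 0, "sim": 1, "real": 2}
ACTION_BINS = 256     # assumed: each action dimension discretized into 256 uniform bins
ACTION_OFFSET = 1000  # assumed: action tokens occupy their own id range


@dataclass
class Sample:
    source: str                     # "internet", "sim", or "real"
    obs_tokens: list[int]           # output of an (assumed) upstream visual tokenizer
    actions: Optional[list[float]]  # None for internet video: no action labels


def tokenize_action(a: float, low: float = -1.0, high: float = 1.0) -> int:
    """Clip a continuous action value to [low, high] and map it to one of ACTION_BINS token ids."""
    a = min(max(a, low), high)
    bin_id = int((a - low) / (high - low) * (ACTION_BINS - 1))
    return ACTION_OFFSET + bin_id


def to_sequence(s: Sample) -> tuple[list[int], list[int]]:
    """Return (tokens, action_mask); mask=1 marks action tokens used as prediction targets.

    Internet samples yield an all-zero mask here; learning priors from them would need an
    additional observation-level objective, which this sketch does not model.
    """
    tokens = [SOURCE_TOKENS[s.source]] + s.obs_tokens
    mask = [0] * len(tokens)
    if s.actions is not None:
        act = [tokenize_action(a) for a in s.actions]
        tokens += act
        mask += [1] * len(act)
    return tokens, mask


def sample_mixture(pools: dict[str, list[Sample]], weights: dict[str, float],
                   n: int, seed: int = 0) -> list[Sample]:
    """Draw n samples: pick a bucket per draw by its weight, then a sample uniformly within it."""
    rng = random.Random(seed)
    names = list(pools)
    w = [weights[k] for k in names]
    return [rng.choice(pools[rng.choices(names, weights=w, k=1)[0]]) for _ in range(n)]


if __name__ == "__main__":
    pools = {
        "internet": [Sample("internet", [10, 11, 12], None)],
        "sim": [Sample("sim", [20, 21], [0.0, 0.5])],
        "real": [Sample("real", [30], [-1.0])],
    }
    batch = sample_mixture(pools, {"internet": 0.3, "sim": 0.6, "real": 0.1}, n=4)
    for s in batch:
        print(s.source, *to_sequence(s))
```

The shared sequence format is what lets one model consume all three buckets; the mixture weights are the knob that trades off simulation's volume against real data's fidelity, and their values here are placeholders.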
Laid out across the 'Three kinds of data for robotics' section, in response to the question of why GEAR leans on simulation while most of the industry chases real-world data.