STRATEGY · Pipeline-building; a multi-year compounding investment. · 95% confidence

Data Is the Bottleneck, Not the Architecture

Robotics doesn't need a new model — it needs a three-source data engine: internet-scale + simulation + real-robot.

Problem it solves

Misdiagnosing the robotics ceiling as an architecture problem when the real constraint is action-labeled data.

Best for

Deciding where to spend a robotics data budget; arguing against premature architecture bets.

Not ideal for

Domains where action-labeled data is already cheap and abundant.

Overview

Why this framework exists

Jim Fan's core data thesis for GEAR. Transformers are not the limit yet: 'we have not pushed transformers to their limit.' The constraint is data, because you cannot download motor-control signals from the internet. The fix is a three-bucket strategy that combines complementary strengths: (1) internet-scale video, which supplies common-sense priors but no actions; (2) GPU simulation, which yields effectively infinite action data at about 10,000× real time but suffers from a sim-to-real gap; (3) real-robot teleoperation data, which has no sim gap but is bounded by 24 hours a day and by human cost. Combine the strengths, cancel the weaknesses.
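To make the throughput gap concrete, here is a back-of-the-envelope sketch of experience collected per wall-clock day. It takes the 10,000× figure from the text at face value. The function names, the `num_robots` parameter, and the single-simulation setup are illustrative assumptions, not part of the source.

```python
HOURS_PER_DAY = 24


def real_robot_hours_per_day(num_robots: int = 1) -> float:
    """Upper bound on real data: each physical robot logs at most 24 h per day."""
    return float(HOURS_PER_DAY * num_robots)


def sim_hours_per_day(speedup: float = 10_000.0) -> float:
    """Simulated experience per wall-clock day at a given real-time speedup.

    The 10,000x default is the figure quoted in the text; treat it as an
    order-of-magnitude claim, not a measured benchmark.
    """
    return HOURS_PER_DAY * speedup


real_hours = real_robot_hours_per_day()
sim_hours = sim_hours_per_day()
sim_years = sim_hours / (HOURS_PER_DAY * 365)  # simulated years per wall-clock day

print(f"real robot: {real_hours:.0f} h/day")
print(f"simulation: {sim_hours:.0f} h/day (about {sim_years:.1f} simulated years)")
```

Under these assumptions, one wall-clock day of simulation corresponds to roughly 27 years of simulated experience, while a single real robot tops out at 24 hours. That asymmetry is why the real-robot bucket is the scarce one.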

Core principles

  1. Internet data gives priors but carries no action labels.
  2. Simulation is effectively infinite and ~10,000× real-time, but has a sim-to-real gap.
  3. Real-robot data has no sim gap but is capped at 24h/day and is expensive.
  4. A winning strategy mixes all three and tokenizes the result for one transformer.
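The mixing step in principle 4 can be sketched as weighted sampling across the three sources. This is a minimal illustration, not GEAR's actual pipeline: the mixture weights, the `Sample` type, and the placeholder observations and actions are all hypothetical, and tokenization is omitted. What it shows is that internet samples carry no action label, so a downstream model would have to skip the action loss for them.

```python
import random
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical mixture weights; the source does not give actual ratios.
MIXTURE = {"internet": 0.5, "simulation": 0.4, "real_robot": 0.1}


@dataclass
class Sample:
    source: str
    observation: str                # placeholder for video frames / sensor input
    action: Optional[List[float]]   # None when the source has no action labels


def draw_source(rng: random.Random) -> str:
    """Pick a data source with probability proportional to its mixture weight."""
    names = list(MIXTURE)
    weights = [MIXTURE[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


def make_sample(source: str, step: int) -> Sample:
    """Build a dummy sample; internet video has observations but no actions."""
    action = None if source == "internet" else [0.0, 0.0]
    return Sample(source=source, observation=f"{source}_frame_{step}", action=action)


def build_batch(batch_size: int, seed: int = 0) -> List[Sample]:
    """Draw a mixed batch from all three sources."""
    rng = random.Random(seed)
    return [make_sample(draw_source(rng), step) for step in range(batch_size)]


batch = build_batch(1000)
counts = {name: sum(s.source == name for s in batch) for name in MIXTURE}
print(counts)
```

In a real setup the weights would be a tuning decision that depends on how much data each bucket holds and how well each transfers to the target robot.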

Origin story

How this framework came to be

Laid out in the 'Three kinds of data for robotics' section, in answer to the question of why GEAR leans on simulation while most of the industry chases real-world data.

Source

Traced to primary
Source · PODCAST
Jim Fan on Nvidia's Embodied AI Lab and Jensen Huang's Prediction that All Robots will be Autonomous
Sequoia Capital (Training Data) · 2024
