The 10,001st World
Train across 10,000 randomized simulations and reality becomes just the 10,001st: sim-to-real by distribution, not fine-tuning.
Fan's reframing of domain randomization: train a policy across 10,000 parallel simulations, each with slightly different physics (gravity, friction, weight). An agent that masters all 10,000 configurations can treat the real physical world as just the 10,001st sample from the same distribution, so it transfers zero-shot, with no fine-tuning. The argument holds only as long as real-world physics falls inside the randomized range. DrEureka demonstrated it: a robot dog learned to balance and walk on a yoga ball purely in simulation, then transferred to the real world without any further tuning. The deeper claim: virtual and physical are 'different realities on a single axis,' not different problems.
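A minimal sketch of the idea, under loud assumptions: this is a toy, not DrEureka's method or code. The parameter ranges, the `World` class, the 1-D sliding-mass task, the PD policy, and the random-search trainer are all hypothetical stand-ins chosen so the example runs with the standard library alone. It samples 10,000 worlds with different gravity, friction, and mass, tunes one policy against a subset of them, and then evaluates that same policy, untouched, in a held-out "real" configuration.

```python
import random
from dataclasses import dataclass

# Hypothetical randomization ranges (illustrative values, not from DrEureka).
RANGES = {
    "gravity": (8.0, 11.5),   # m/s^2
    "friction": (0.2, 1.5),   # viscous friction coefficient
    "mass": (0.5, 2.0),       # kg
}
SLOPE = 0.05  # fixed incline along which gravity pulls the body


@dataclass(frozen=True)
class World:
    gravity: float
    friction: float
    mass: float


def sample_world(rng):
    """Draw one simulated world with its own physics."""
    return World(**{name: rng.uniform(lo, hi) for name, (lo, hi) in RANGES.items()})


def rollout_cost(gains, world, steps=100, dt=0.02):
    """Mean distance from the target (x = 0) for a PD policy in one world."""
    kp, kd = gains
    x, v, total = 1.0, 0.0, 0.0
    for _ in range(steps):
        force = -kp * x - kd * v  # the policy: same gains in every world
        accel = (force - world.friction * v) / world.mass - world.gravity * SLOPE
        v += accel * dt
        x += v * dt
        total += abs(x)
    return total / steps


def mean_cost(gains, worlds):
    """Average rollout cost of one policy across many worlds."""
    return sum(rollout_cost(gains, w) for w in worlds) / len(worlds)


def train(worlds, rng, candidates=30):
    """Random search for gains that minimize average cost over the given worlds."""
    best, best_cost = (0.0, 0.0), mean_cost((0.0, 0.0), worlds)
    for _ in range(candidates):
        gains = (rng.uniform(0.0, 50.0), rng.uniform(0.0, 20.0))
        cost = mean_cost(gains, worlds)
        if cost < best_cost:
            best, best_cost = gains, cost
    return best, best_cost


rng = random.Random(0)
worlds = [sample_world(rng) for _ in range(10_000)]
train_worlds = worlds[:50]  # small subset so the toy search runs quickly
policy, train_cost = train(train_worlds, rng)

# Stand-in for reality: a fixed configuration inside the ranges
# that the policy never trained on.
real_world = World(gravity=9.81, friction=0.7, mass=1.2)
real_cost = rollout_cost(policy, real_world)  # zero-shot: no further tuning
print(f"train cost {train_cost:.3f}, real-world cost {real_cost:.3f}")
```

The design choice worth noticing is that `train` never sees `real_world`. The policy is scored only on its average across sampled worlds, and the real configuration is evaluated afterward as one more point from the same ranges. If `real_world` were set outside `RANGES`, nothing in this sketch would guarantee the transfer.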
- Randomize physics across thousands of parallel sims, not one careful sim.
- Master all N configurations and reality is just configuration N+1.
- This yields zero-shot sim-to-real transfer, with no fine-tuning.
- Virtual and physical are points on one reality axis, not separate domains.
Presented in the 'Is the virtual world in the service of the physical world?' section; grounded in DrEureka, the follow-up to Eureka.