The Foundation Agent

One model that generalizes across three axes — the skills it can do, the bodies it can control, and the realities it can master.

Problem it solves

The fragmentation of embodied AI into per-task, per-robot, per-environment systems that never compound.

Best for

Framing long-horizon embodied-AI roadmaps and explaining why generality is the prize.

Not ideal for

Teams that need a shipping product this quarter on one robot, one task.

Overview

Why this framework exists

Jim Fan's flagship thesis (he says he 'proposed' the term the prior year) is that the field is converging toward a single embodied model that generalizes over three axes simultaneously: (1) skills, (2) embodiments or form factors, and (3) worlds or realities, virtual and physical. It is the embodied analogue of how one LLM replaced a zoo of task-specific NLP pipelines, and it is the GEAR Lab's stated end goal.

Core principles

Generality lives on three axes at once: skills × embodiments × realities.
Virtual and physical agents share one API: perception in, actions out.
A single foundation agent subsumes both gaming AI and robotics.
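
The "one API" principle can be made concrete with a minimal sketch. Everything named here (Observation, Action, Agent, World, ToyGame, ToyRobot, DampingAgent, rollout, and the axis lists) is hypothetical and invented for illustration; none of it comes from the podcast or from any GEAR codebase. The sketch assumes only that a virtual world and a physical robot can both be driven through the same perception-in, actions-out contract, and that the three axes multiply into a grid of cases a single agent would need to cover.

```python
from dataclasses import dataclass
from itertools import product
from typing import Protocol, Sequence


@dataclass(frozen=True)
class Observation:
    """Hypothetical perception payload, e.g. pixels or joint sensor readings."""
    values: Sequence[float]


@dataclass(frozen=True)
class Action:
    """Hypothetical action payload, e.g. key presses or joint targets."""
    values: Sequence[float]


class Agent(Protocol):
    """The shared contract: perception in, actions out."""
    def act(self, obs: Observation) -> Action: ...


class World(Protocol):
    """Any reality the agent can act in, virtual or physical."""
    def reset(self) -> Observation: ...
    def step(self, action: Action) -> Observation: ...


class ToyGame:
    """Stand-in for a virtual world with a 2-dimensional state."""
    def reset(self) -> Observation:
        return Observation([1.0, -1.0])

    def step(self, action: Action) -> Observation:
        return Observation(list(action.values))


class ToyRobot:
    """Stand-in for a physical robot with 7 joints."""
    def reset(self) -> Observation:
        return Observation([1.0] * 7)

    def step(self, action: Action) -> Observation:
        return Observation(list(action.values))


class DampingAgent:
    """Placeholder policy (an assumption, not a real model): halves each input."""
    def act(self, obs: Observation) -> Action:
        return Action([0.5 * v for v in obs.values])


def rollout(agent: Agent, world: World, steps: int) -> list[Action]:
    """Run the same agent loop regardless of which world is plugged in."""
    obs = world.reset()
    actions = []
    for _ in range(steps):
        action = agent.act(obs)
        obs = world.step(action)
        actions.append(action)
    return actions


# Illustrative axis values only; the real axes are far larger.
SKILLS = ("navigate", "grasp")
EMBODIMENTS = ("humanoid", "quadruped")
REALITIES = ("virtual", "physical")

# Every (skill, embodiment, reality) combination one agent would have to cover.
coverage = list(product(SKILLS, EMBODIMENTS, REALITIES))
```

The same DampingAgent instance can be passed to rollout with either ToyGame() or ToyRobot(). That is the point of the shared interface: only the world changes, not the agent loop. The coverage list shows why per-task, per-robot, per-environment systems fragment: the number of cases grows multiplicatively across the three axes.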

Origin story

How this framework came to be

Fan articulates the idea in the 'Exploring virtual worlds' section of the podcast as the unifying frame behind GEAR's dual mandate: robotics (physical) and gaming agents (virtual) are the same problem under one model.

Source

Traced to primary

Source · PODCAST

Jim Fan on Nvidia's Embodied AI Lab and Jensen Huang's Prediction that All Robots will be Autonomous

Sequoia Capital (Training Data) · 2024
