Code as Action

Let the agent act by writing code, then save what works to a skill library it authors itself.

Problem it solves

Brittle, non-reusable agent behaviour in open-ended environments with no fixed task.

Best for

Building self-improving agents in open-ended, no-fixed-score environments.

Not ideal for

Narrow tasks with a clean reward and no need for skill reuse.

Overview

Why this framework exists

The insight behind Voyager (built once GPT-4 arrived as 'the best coding model out there') is to use code as the action interface. The 3D Minecraft world is converted to a text representation, and the agent writes programs against an action API. A self-reflection loop feeds runtime errors back so the agent debugs its own code. Correct programs are saved to a skill library: 'a codebase that the LLM interactively authored all by itself.' An automated curriculum, driven by one directive ('find as many novel items as possible'), lets the agent propose its own next task, neither too hard nor too easy.
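The write, run, debug, save loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Voyager's implementation: `attempt_task`, `fake_writer`, and `mine` are hypothetical names, the language model is replaced by a stub function, and "runs without raising" stands in for whatever check decides a program is correct.

```python
import traceback
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in for a language model: given a task and the last
# error message (or None), it returns Python source text.
Writer = Callable[[str, Optional[str]], str]


def attempt_task(task: str, write_program: Writer, api: Dict[str, object],
                 skills: Dict[str, str], max_tries: int = 3) -> bool:
    """Write, run, and debug a program; save it as a skill if it runs cleanly."""
    error: Optional[str] = None
    for _ in range(max_tries):
        source = write_program(task, error)
        try:
            # Run the program against a fresh copy of the action API.
            exec(source, dict(api))
        except Exception:
            # Self-reflection: feed the last traceback line back to the writer.
            error = traceback.format_exc().strip().splitlines()[-1]
            continue
        # Assumption: a clean run counts as "correct" in this sketch.
        skills[task] = source
        return True
    return False


# Toy action API (hypothetical): one action that fills an inventory list.
inventory: List[str] = []


def mine(item: str, n: int) -> None:
    inventory.extend([item] * n)


def fake_writer(task: str, error: Optional[str]) -> str:
    # The first draft has a bug (undefined name); the retry fixes it.
    if error is None:
        return "mine('wood', count)"
    return "mine('wood', 3)"


skills: Dict[str, str] = {}
ok = attempt_task("collect wood", fake_writer, {"mine": mine}, skills)
```

In this run the first draft raises a `NameError`, the error text is passed back, and the second draft succeeds and is stored under the task name. Running model-written code with `exec` is only reasonable here because the writer is a stub; a real agent would need a sandboxed runtime.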

Core principles

4 total

Code is a more general action space than hand-coded primitives.
A self-reflection loop turns runtime errors into self-correction.
Verified programs become reusable skills in a self-authored library.
An automated curriculum lets the agent set its own next task at the right difficulty.
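To make the last principle concrete, here is a toy sketch of picking a next task at the right difficulty. It is an assumption-heavy stand-in: in the system described above the model itself proposes the task, whereas this sketch uses a fixed rule, and the `RECIPES` table and `propose_next_tasks` function are hypothetical and not real Minecraft data. "Too easy" is read as "already found" and "too hard" as "prerequisites not yet found".

```python
from typing import Dict, List, Set

# Hypothetical prerequisite table: item -> items that must be found first.
RECIPES: Dict[str, Set[str]] = {
    "wood": set(),
    "planks": {"wood"},
    "stick": {"planks"},
    "wooden_pickaxe": {"planks", "stick"},
    "stone": {"wooden_pickaxe"},
}


def propose_next_tasks(found: Set[str]) -> List[str]:
    """Return novel items whose prerequisites have all been found."""
    return sorted(
        item for item, needs in RECIPES.items()
        if item not in found and needs <= found
    )


first = propose_next_tasks(set())
later = propose_next_tasks({"wood", "planks", "stick"})
```

With an empty inventory the only proposal is `wood`; once `wood`, `planks`, and `stick` are found, the proposal moves on to `wooden_pickaxe`, while `stone` stays out of reach until the pickaxe exists.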

Origin story

How this framework came to be

Voyager is described in the source's 'Exploring virtual worlds' section as the second of Fan's Minecraft projects (after MineDojo) and the team's earliest foundation-model agent.

Source

Traced to primary

Source · PODCAST

Jim Fan on Nvidia's Embodied AI Lab and Jensen Huang's Prediction that All Robots will be Autonomous

Sequoia Capital (Training Data) · 2024

