INNOVATION · Research-grade; demonstrated (Eureka, Oct 2023; DrEureka, RSS 2024) · 92% confidence

LLM-Authored Reward Functions

Stop hand-engineering reward functions — have an LLM write them in the simulator's API and iterate.

Problem it solves

Reward-function engineering is tedious, manual expert work that bottlenecks robot skill acquisition.

Best for

Automating robot-skill specification; scaling RL beyond hand-tuned rewards.

Not ideal for

Settings without a programmable simulator or a capable coding LLM.

Overview

Why this framework exists

This is the mechanism behind Eureka. A reward function specifies the desired behaviour, rewarding the agent when it is on track and penalizing it when it is not. It is normally written by a roboticist who knows the simulator's API, which makes the work slow, specialized, and manual. Eureka instead prompts an LLM to write the reward-function code directly against NVIDIA's Isaac Sim API, automating the design. It trained a five-finger robot hand to spin a pen, a skill Jim Fan jokes is 'superhuman with respect to myself.' The approach generalizes beyond pen-spinning: it can author rewards for arbitrary tasks, or even generate new tasks.
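To make "reward function as code" concrete, here is a minimal sketch of the kind of function an LLM might be asked to write for a spinning task. It is plain Python and deliberately does not call any real simulator API. The function name, arguments, clip value, and penalty size are all illustrative assumptions, not taken from Eureka.

```python
import math

def spin_reward(angular_velocity, target_axis, object_height, min_height=0.1):
    """Hypothetical reward: encourage spin about a target axis, penalize drops.

    All names and constants here are assumptions for illustration only.
    """
    # Normalize the target axis so the projection below is a signed speed.
    norm = math.sqrt(sum(a * a for a in target_axis))
    axis = [a / norm for a in target_axis]

    # Signed spin speed about the target axis (dot product).
    spin = sum(w * a for w, a in zip(angular_velocity, axis))

    # Reward forward spin only, clipped so extreme velocities are not exploited.
    spin_term = min(max(spin, 0.0), 10.0)

    # Fixed penalty once the object falls below the minimum height.
    drop_penalty = -20.0 if object_height < min_height else 0.0

    return spin_term + drop_penalty
```

In this toy version the "on track" signal is the clipped spin term and the "wrong" signal is the drop penalty. A real reward would read these quantities from simulator state rather than from plain arguments.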

Core principles

4 total
  1. Reward engineering is the manual bottleneck in robot skill learning.
  2. An LLM can write reward-function code against the simulator API.
  3. An outer LLM loop refines rewards; an inner RL loop trains the controller.
  4. The same technique generalizes to authoring new tasks, not just rewards.
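Principles 2 and 3 can be sketched as two nested loops. The version below is a toy under stated assumptions: `propose_rewards` stands in for the LLM call and `train_policy` stands in for RL training, and a "reward candidate" is reduced to a single number instead of source code. None of these names or the scoring rule come from Eureka itself.

```python
import random

def propose_rewards(feedback, n_candidates, rng):
    """Stand-in for the LLM (assumption): propose candidate rewards.

    A candidate here is just a weight; a real system would return code.
    """
    center = feedback["best_weight"] if feedback else 0.0
    return [center + rng.uniform(-1.0, 1.0) for _ in range(n_candidates)]

def train_policy(weight):
    """Stand-in for the inner RL loop (assumption): return a fitness score.

    Toy rule: fitness is highest when the weight equals 3.0.
    """
    return -(weight - 3.0) ** 2

def reward_search(iterations=5, n_candidates=4, seed=0):
    rng = random.Random(seed)
    feedback = None
    best_weight, best_fitness = None, float("-inf")
    for _ in range(iterations):
        # Outer loop: ask the proposer for new candidates, given past results.
        candidates = propose_rewards(feedback, n_candidates, rng)
        # Inner loop: train and score one policy per candidate reward.
        scored = [(train_policy(w), w) for w in candidates]
        fitness, weight = max(scored)
        # Keep the best candidate seen so far.
        if fitness > best_fitness:
            best_fitness, best_weight = fitness, weight
        # Feed the training outcome back into the next proposal round.
        feedback = {"best_weight": best_weight, "best_fitness": best_fitness}
    return best_weight, best_fitness
```

Calling `reward_search(iterations=10)` runs ten proposal rounds. Because the best candidate is carried forward, the returned fitness never gets worse as rounds are added for a fixed seed.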

Origin story

How this framework came to be

Fan tells this story in the 'Eureka and Isaac Sim' section of the source as the prior result that gives him conviction in the LLM-as-robot-developer approach. DrEureka extends it to designing both rewards and domain randomization for zero-shot sim-to-real transfer.

Source

Traced to primary
Source · PODCAST
Jim Fan on Nvidia's Embodied AI Lab and Jensen Huang's Prediction that All Robots will be Autonomous
Sequoia Capital (Training Data) · 2024
