LLM-Authored Reward Functions
Stop hand-engineering reward functions — have an LLM write them in the simulator's API and iterate.
This is the mechanism behind Eureka. A reward function specifies the desired behaviour (the policy is rewarded when on track and penalized when it goes wrong) and is normally written by a roboticist who knows the simulator API, which makes it slow, specialized, manual work. Eureka prompts an LLM to write the reward-function code directly against NVIDIA's Isaac Sim API, automating the design. It trained a five-finger robot hand to spin a pen, a skill Fan jokes is 'superhuman with respect to myself.' The approach generalizes beyond pen-spinning: it can author rewards for arbitrary tasks, or even generate new tasks.
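To make "reward function" concrete, here is a minimal sketch of the kind of code a roboticist (or an LLM) might write for a pen-spinning-style task. Everything in it is an assumption for illustration: the function name, the arguments, the weights, and the shaping terms are hypothetical. It is plain Python, not the Isaac Sim API, and not Eureka's actual output.

```python
import math

def pen_spin_reward(ang_vel, spin_axis, pen_pos, palm_pos,
                    spin_weight=1.0, drop_weight=2.0, drop_scale=0.1):
    """Hypothetical reward: spin the pen about spin_axis while keeping it near the palm.

    All arguments are 3-element sequences of floats; weights and scale are made-up defaults.
    """
    # Rewarded if on-track: angular velocity projected onto the desired spin axis.
    spin_rate = sum(w * a for w, a in zip(ang_vel, spin_axis))

    # Penalized if wrong: the penalty grows smoothly from 0 towards 1
    # as the pen drifts away from the palm.
    dist = math.dist(pen_pos, palm_pos)
    drop_penalty = 1.0 - math.exp(-dist / drop_scale)

    # Returning the individual terms alongside the total makes it easier to see
    # which part of the reward is driving behaviour.
    components = {
        "spin": spin_weight * spin_rate,
        "drop": -drop_weight * drop_penalty,
    }
    return sum(components.values()), components
```

Choosing terms and weights like these by hand is the slow, specialized step the approach hands to the LLM.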
- Reward engineering is the manual bottleneck in robot skill learning.
- An LLM can write reward-function code against the simulator API.
- An outer LLM loop refines rewards; an inner RL loop trains the controller.
- The same technique generalizes to authoring new tasks, not just rewards.
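The two-loop structure in the bullets above can be sketched on a toy problem. This is a sketch under stated assumptions, not Eureka's implementation: `llm_propose_rewards` is a hypothetical stub that perturbs the best reward found so far instead of calling an LLM, `train_policy` uses hill climbing on a single scalar action instead of RL in a simulator, and `task_score` is a made-up success metric.

```python
import random

def task_score(action):
    # Toy ground-truth success metric (assumption): the ideal action is 3.0.
    return -(action - 3.0) ** 2

def llm_propose_rewards(best_center, n, rng):
    # Hypothetical stand-in for the LLM call. A real system would send the task
    # description and training feedback to a model and parse reward code from
    # the reply; here each "reward function" is a quadratic around a guessed target.
    centers = [best_center + rng.gauss(0.0, 1.0) for _ in range(n)]
    return [(c, lambda a, c=c: -(a - c) ** 2) for c in centers]

def train_policy(reward_fn, rng, steps=200):
    # Inner loop, standing in for RL: hill-climb one scalar "policy"
    # to maximize the candidate reward.
    action = 0.0
    for _ in range(steps):
        trial = action + rng.gauss(0.0, 0.2)
        if reward_fn(trial) > reward_fn(action):
            action = trial
    return action

def refine_rewards(iterations=5, candidates=4, seed=0):
    # Outer loop: propose rewards, train against each one, score the trained
    # policy on the real task, and feed the best reward into the next round.
    rng = random.Random(seed)
    best_center, best_score = 0.0, float("-inf")
    history = []
    for _ in range(iterations):
        for center, reward_fn in llm_propose_rewards(best_center, candidates, rng):
            action = train_policy(reward_fn, rng)
            score = task_score(action)
            if score > best_score:
                best_center, best_score = center, score
        history.append(best_score)
    return best_center, best_score, history
```

Calling `refine_rewards()` returns the best reward parameter found, its task score, and the best score after each outer iteration. Because only improvements are kept, that history never decreases. The point of the sketch is the separation of concerns: the outer loop only ever sees task scores, and the inner loop only ever sees the candidate reward.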
Told in the 'Eureka and Isaac Sim' section as the prior result that gives Fan conviction in the LLM-as-robot-developer approach; DrEureka extends it to designing both rewards and domain randomization for zero-shot sim-to-real transfer.