John Schulman

PPO; RLHF lecture notes

Co-founder of OpenAI who joined Anthropic in 2024. Proximal Policy Optimization (PPO), which he introduced with co-authors in 2017, became a dominant policy-gradient method in deep reinforcement learning, and his widely circulated lecture notes on reinforcement learning from human feedback (RLHF) helped define the training paradigm behind ChatGPT and similar systems. He is credited as a principal architect of RLHF as a technique for fine-tuning large language models on human preference data.

Rank: #91 in AI Researchers Who Communicate
Authority score: 42/100 (Notable)
1 list · peak #91 · 1 primary · 1 category
Last updated: 10 May 2026