John Schulman

PPO; RLHF lecture notes

Co-founder of OpenAI who joined Anthropic in 2024. His Proximal Policy Optimization (PPO) algorithm became the dominant policy-gradient method in deep reinforcement learning, and his widely circulated lecture notes on reinforcement learning from human feedback (RLHF) helped define the training paradigm behind ChatGPT and similar systems. He is credited as a principal architect of RLHF, the technique that fine-tunes large language models using human preference data.

Wikipedia

#91 in AI Researchers Who Communicate

Authority score: 42/100 (Notable)