John Schulman
PPO; RLHF lecture notes
Co-founder of OpenAI who joined Anthropic in 2024. Proximal Policy Optimization (PPO), the algorithm he introduced with co-authors in 2017, became one of the most widely used policy-gradient methods in deep reinforcement learning. His widely circulated lecture notes on reinforcement learning from human feedback (RLHF) helped define the training paradigm behind ChatGPT and similar systems. He is credited as a principal architect of RLHF, the technique that fine-tunes large language models using human preference data.
Authority score: 42/100 (Notable)
1 list · #91 peak · 1 primary · 1 category
Appears alongside
Top neighbors of John Schulman
8 people
01 Aidan Gomez
Cohere co-founder; transformer-builder commentary
1 shared list
02 Ajeya Cotra
biological-anchors timelines report
1 shared list
03 Alec Radford
GPT-1/2/CLIP communications
1 shared list
04 Aleksander Madry
adversarial-robustness framework
1 shared list
05 Alex Smola
"Dive into Deep Learning" book
1 shared list
06 Ali Rahimi
"Machine learning is alchemy" NeurIPS test-of-time talk
1 shared list
07 Allen Downey
"Think Bayes", "Think Stats"
1 shared list
08 Anca Dragan
assistive-AI / human-robot interaction lectures
1 shared list
Last updated: 10 May 2026