Anthropic Interpretability Team
circuits, sparse autoencoder, monosemanticity reports
Anthropic's mechanistic interpretability research group, known for the circuits thread, monosemanticity findings, and sparse autoencoder papers on neural network internals.
Authority score: 55/100 (Established)
1 list · #37 peak · 1 primary · 1 category
Appears alongside
Top neighbors of Anthropic Interpretability Team
8 people
01 Aidan Gomez
Cohere co-founder; transformer-builder commentary
1 shared list
02 Ajeya Cotra
biological-anchors timelines report
1 shared list
03 Alec Radford
GPT-1/2/CLIP communications
1 shared list
04 Aleksander Madry
adversarial-robustness framework
1 shared list
05 Alex Smola
"Dive into Deep Learning" book
1 shared list
06 Ali Rahimi
"Machine learning is alchemy" NeurIPS test-of-time talk
1 shared list
07 Allen Downey
"Think Bayes", "Think Stats"
1 shared list
08 Anca Dragan
assistive-AI / human-robot interaction lectures
1 shared list
Last updated: 10 May 2026