7frameworks
Quality
Showing 1–7 of 7
001→002→003→004→005→006→007→
The Capability-Safety Asymmetry
Capability grows exponentially; safety linearly — the gap keeps widening
The Agent Intervention Threshold
Three observable trigger events define when AI systems must be shut down by human authority
Intelligence Explosion and Self-Evolving AI — The Actual Threshold
AGI is not the inflection point — self-evolving AI that improves its own architecture is
Precautionary Principle for Catastrophic Risk
When the downside is irreversible, 1% probability is not a small number
Sycophancy Misalignment Mechanism
RLHF trains AI to please, not to be right — and that bakes in misalignment
The Gorilla ProblemIn-depth
Competence, not consciousness, determines who controls the planet
The Blackmail Threshold TestIn-depth
Autonomous self-interest behaviors in controlled tests reveal the true alignment state of frontier AI models