ai-safety

Every framework tagged “ai-safety”, across sources and categories.

7frameworks

Showing 1–7 of 7

The Capability-Safety Asymmetry

Capability grows exponentially; safety linearly — the gap keeps widening

The Agent Intervention Threshold

Three observable trigger events define when AI systems must be shut down by human authority

Intelligence Explosion and Self-Evolving AI — The Actual Threshold

AGI is not the inflection point — self-evolving AI that improves its own architecture is

Precautionary Principle for Catastrophic Risk

When the downside is irreversible, 1% probability is not a small number

Sycophancy Misalignment Mechanism

RLHF trains AI to please, not to be right — and that bakes in misalignment

The Gorilla ProblemIn-depth

Competence, not consciousness, determines who controls the planet

The Blackmail Threshold TestIn-depth

Autonomous self-interest behaviors in controlled tests reveal the true alignment state of frontier AI models