STRATEGY

Days to result · 88% confidence

Precautionary Principle for Catastrophic Risk

When the downside is irreversible, 1% probability is not a small number

risk-management decision-theory ai-safety irreversibility

Problem it solves

Category error of treating low-probability irreversible losses as acceptable

Best for

Risk framing for decisions with asymmetric outcomes; arguing for caution in AI deployment, financial risk management, or any domain where the tail outcome is irreversible

Not ideal for

Routine risk tradeoffs where outcomes are recoverable — normal business risk, experiments with known failure modes where iteration is possible

Overview

Why this framework exists

Standard expected value math — probability multiplied by magnitude — works when outcomes are recoverable and you can iterate. The Precautionary Principle for Catastrophic Risk is a corrective framework for cases where the outcome is irreversible: extinction, civilizational collapse, permanent loss of human autonomy. In these cases, standard EV math breaks down because you do not get to try again.
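
A minimal sketch of the contrast described above, using made-up figures. The function names and the assumption that repeated exposures are independent are illustrative and not taken from Bengio's talk. The first function is the standard probability-times-magnitude calculation; the second shows how a small per-exposure chance of an irreversible outcome compounds when there is no way to reset.

```python
def expected_value(probability: float, magnitude: float) -> float:
    """Standard expected value: probability multiplied by magnitude."""
    return probability * magnitude


def survival_probability(p_catastrophe: float, exposures: int) -> float:
    """Chance of never hitting the irreversible outcome over repeated exposures.

    Assumes each exposure is independent (an illustrative simplification).
    """
    return (1.0 - p_catastrophe) ** exposures


# Recoverable case: a 1% chance of losing 1,000,000 averages out to an
# expected loss of 10,000 per attempt, and you can keep playing afterwards.
recoverable_ev = expected_value(0.01, -1_000_000)

# Irreversible case: taking the same 1% risk 100 times leaves only about
# a 37% chance that the catastrophe never happened.
still_standing = survival_probability(0.01, 100)

print(recoverable_ev, round(still_standing, 3))
```

The averaging that makes expected value meaningful only works for the first case; in the second, a single bad draw ends the sequence.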

Bengio's formulation anchors on a concrete calibration: polls of ML researchers put the probability of catastrophic AI outcomes at roughly 10%. Even on a far more conservative estimate of 1%, the principle holds: a 1% chance of catastrophic harm to all of humanity deserves at least as much precaution as a near-certain smaller loss. The question is not whether 1% is 'likely' but whether society can afford to be wrong given the stakes.

The framework provides a two-step irreversibility test to run before any probability-weighted decision. First, ask whether the outcome is reversible; if it is, standard expected value applies. Second, if the outcome is permanent (death, extinction, loss of meaningful autonomy), apply the precautionary principle regardless of probability. This reframes risk conversations from 'what is the probability?' to 'what is the reversibility?'
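
The two-step test can be sketched as a small selector function. The names `Framework` and `choose_framework` are hypothetical; the point of the sketch is that probability is deliberately not a parameter, since reversibility alone decides which framework applies.

```python
from enum import Enum


class Framework(Enum):
    EXPECTED_VALUE = "standard expected value"
    PRECAUTIONARY = "precautionary principle"


def choose_framework(outcome_is_reversible: bool) -> Framework:
    """Pick the risk framework from reversibility alone."""
    # Step 1: a reversible outcome can be retried, so EV math is appropriate
    if outcome_is_reversible:
        return Framework.EXPECTED_VALUE
    # Step 2: a permanent outcome triggers precaution regardless of probability
    return Framework.PRECAUTIONARY


print(choose_framework(True).value)
print(choose_framework(False).value)
```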

Core principles

5 total

Irreversibility, not probability, is the correct filter for which risk framework applies
A 1% chance of civilizational catastrophe demands more precaution than a 100% chance of a recoverable financial loss
Standard expected value math assumes iteration is possible — the Precautionary Principle applies when there is no second attempt
Pre-deployment safety requirements are rational when the downside is permanent, even at low probability
The question is not 'is this likely?' but 'can we afford to be wrong?'

Steps

4 steps

Apply the irreversibility test
Before calculating expected value, ask a single binary question: if this outcome occurs, can we reverse it and try again? If yes, standard EV math applies. If the outcome is permanent — death, extinction, loss of all meaningful autonomy — stop and apply the Precautionary Principle instead.
Pro tip: The irreversibility question is often disguised. 'Losing market share' is recoverable. 'Losing the ability to regulate AI because power has been captured' is not.
Separate the probability question from the stakes question
Do not conflate 'what is the probability?' with 'should we act?' For irreversible outcomes, even contested or low-probability estimates justify precautionary action. Bengio's benchmark: research-community estimates of roughly 10% probability of catastrophic AI outcomes are sufficient to justify safety gates, whether or not any individual agrees with that figure.
Warning: Avoid the trap of debating the exact probability number. The precautionary principle applies at any non-trivial probability of irreversible harm.
Identify the safety gate equivalent
For each high-stakes irreversible risk domain, identify what the pre-deployment safety requirement would look like if treated analogously to FDA drug approval. The absence of a safety gate is itself a policy choice — make it explicit rather than assumed.
Pro tip: The drug approval analogy is persuasive with non-technical audiences: we do not say '1% chance of killing patients is acceptable because the drug helps the other 99%.'
Calibrate precaution to magnitude, not just probability
Scale the level of required precaution to the magnitude of potential harm. A 1% chance of losing a million dollars requires different precaution than a 1% chance of ending human civilization. Magnitude calibration prevents the framework from being used to justify paralysis on ordinary decisions.
Warning: Do not apply the Precautionary Principle to recoverable risks. It will lead to overcaution and paralysis in domains where iteration is the correct response.
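
The four steps above can be recorded in a small structure so that each one leaves an explicit trace. This is only a sketch: `RiskAssessment`, `recommend`, the 1 to 5 magnitude scale, and the two example risks are hypothetical and do not come from the source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RiskAssessment:
    name: str
    reversible: bool            # Step 1: can the outcome be undone and retried?
    probability_low: float      # Step 2: record the range of estimates,
    probability_high: float     #         but do not let the debate over it block action
    safety_gate: Optional[str]  # Step 3: None means no pre-deployment gate exists
    magnitude: int              # Step 4: hypothetical scale, 1 (minor) to 5 (civilizational)


def recommend(risk: RiskAssessment) -> str:
    """Return a one-line recommendation following the four steps in order."""
    if risk.reversible:
        # Recoverable outcomes stay with standard EV to avoid overcaution
        return f"{risk.name}: standard EV, iterate and learn from failures"
    # A missing gate is reported explicitly instead of being silently assumed
    gate = risk.safety_gate or "NONE DEFINED (an implicit policy choice)"
    estimates = f"{risk.probability_low:.0%} to {risk.probability_high:.0%}"
    return (
        f"{risk.name}: precautionary, level {risk.magnitude}/5, "
        f"estimates {estimates}, gate: {gate}"
    )


print(recommend(RiskAssessment("pricing experiment", True, 0.10, 0.30, None, 1)))
print(recommend(RiskAssessment("frontier model deployment", False, 0.01, 0.10, None, 5)))
```

The probability range is carried through to the output but never used as a condition, which mirrors the separation of the probability question from the decision to act.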

Checklist


Identify whether the outcome is reversible before calculating expected value
If irreversible: apply the Precautionary Principle regardless of probability estimates
Anchor probability estimates in professional community data, not personal speculation
Identify what the equivalent 'FDA safety gate' would look like for the domain
Separate the probability debate from the framework-selection debate
Scale precaution to magnitude, not just probability
Do not apply this framework to recoverable risks — reserve it for irreversible outcomes

Examples

2 cases

FDA Drug Approval as the Reference Safety Gate

Bengio uses pharmaceutical regulation as the clearest analogy for AI safety gates. The FDA requires rigorous safety proof before a drug reaches patients — not retrospective validation after harm occurs. The threshold is not 'probably safe enough' but demonstrable safety at the clinical trial standard. No one argues that a drug with a 1% chance of killing patients should be approved because it might help 99%.

Outcome: The analogy reframes AI safety from a speculative future problem to an institutional design problem with a working template in another domain. It makes the absence of pre-deployment AI safety requirements look like an anomaly rather than a default.

Researcher Poll Calibration for AI Catastrophic Risk

Rather than asserting personal worst-case estimates, Bengio anchors his precautionary argument in what the professional ML research community itself believes. Researcher polls indicate roughly 10% estimated probability of catastrophic AI outcomes. Bengio then applies the Precautionary Principle not to his own estimate but to the community's: even if you believe the lowest credible estimate (1%), the principle holds.

Outcome: This move shifts the argument from 'alarmist vs. optimist' to 'are we applying the correct risk framework to the estimates we already have?' It is harder to dismiss because it uses the opponent's prior as the input.

Common mistakes

4 traps

Treating low probability as equivalent to acceptable risk

The most common error: seeing a 1% or 5% probability and concluding the risk is negligible. That inference is only valid when outcomes are recoverable. For irreversible harms, a low probability does not make a risk acceptable; the correct filter is reversibility, not likelihood.

Conflating uncertainty about probability with absence of risk

When experts disagree on whether the probability is 1% or 20%, the common response is to wait for consensus. The Precautionary Principle inverts this: uncertainty about an irreversible risk is itself a reason for precaution, not a reason to delay action.

Applying the framework symmetrically to recoverable and irreversible risks

The Precautionary Principle is not a general risk heuristic — it is specifically for irreversible outcomes. Applying it to routine business risks produces overcaution and decision paralysis. The irreversibility test in Step 1 is what determines when the framework applies.

Anchoring the argument on worst-case speculation rather than professional community estimates

The persuasive power of the Precautionary Principle argument comes from using the professional community's own probability estimates (ML researcher polls, safety researcher consensus) rather than worst-case scenarios. Speculative framing invites dismissal; community-anchored framing does not.

Origin story

How this framework came to be

Yoshua Bengio developed this framing as part of his AI safety advocacy work, following growing evidence that misalignment risks were increasing rather than decreasing with more capable reasoning models. As one of the three researchers who shared the 2018 Turing Award for their breakthroughs in deep learning (alongside Geoffrey Hinton and Yann LeCun), Bengio occupies a rare position: a foundational contributor to the technology who is now warning about its risks.

The drug approval analogy is central to his public argument: the FDA requires safety proof before a drug reaches patients, not retrospective validation. Bengio argues AI deployment has no equivalent safety gate, and the precautionary principle provides the philosophical grounding for why one is needed. He uses researcher poll data (approximately 10% estimated probability of catastrophic outcomes from AI) to anchor the argument in the professional community's own uncertainty rather than worst-case speculation.

Source

Traced to primary

Source · PODCAST

Yoshua Bengio — AI Safety, Power Concentration, and Alignment Failures

Yoshua Bengio · 2024
