Precautionary Principle for Catastrophic Risk
When the downside is irreversible, 1% probability is not a small number
Standard expected value math — probability multiplied by magnitude — works when outcomes are recoverable and you can iterate. The Precautionary Principle for Catastrophic Risk is a corrective framework for cases where the outcome is irreversible: extinction, civilizational collapse, permanent loss of human autonomy. In these cases, standard EV math breaks down because you do not get to try again.
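One way to see why "you do not get to try again" matters is to look at what happens to the chance of avoiding an irreversible outcome under repeated exposure. The sketch below is an illustration only and is not part of Bengio's argument. It assumes each exposure is independent and carries the same probability of ruin, and the names `expected_loss` and `survival_probability` are hypothetical.

```python
def expected_loss(probability: float, magnitude: float) -> float:
    """Standard expected value: probability multiplied by magnitude."""
    return probability * magnitude


def survival_probability(p_ruin: float, exposures: int) -> float:
    """Chance of never hitting the irreversible outcome.

    Assumes (for illustration) that exposures are independent and that
    each one carries the same probability of ruin, p_ruin.
    """
    if not 0.0 <= p_ruin <= 1.0:
        raise ValueError("p_ruin must be between 0 and 1")
    if exposures < 0:
        raise ValueError("exposures must be non-negative")
    return (1.0 - p_ruin) ** exposures


if __name__ == "__main__":
    # A recoverable loss can be averaged over many attempts.
    print("expected loss per attempt:", expected_loss(0.01, 1_000_000))

    # An irreversible loss ends the sequence, so the odds of still being
    # around to "iterate" shrink with every exposure.
    for n in (1, 10, 100):
        print(n, "exposures ->", round(survival_probability(0.01, n), 3))
```

Under these assumptions, a 1% risk per exposure leaves only about a 37% chance of surviving 100 exposures, which is why averaging over repeated trials stops being a useful guide once one of the outcomes is permanent.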
Yoshua Bengio's formulation anchors on a concrete calibration: polls of ML researchers put the probability of catastrophic AI outcomes at roughly 10%. Even accepting the most conservative estimate of 1%, the principle holds: a 1% chance of catastrophic harm to all of humanity deserves at least as much precaution as a near-certain smaller loss. The question is not whether 1% is 'likely' but whether society can afford to be wrong given the stakes.
The framework provides a two-step irreversibility test to run before making any probability-weighted decision. First, ask whether the outcome in question is reversible; if yes, standard expected value applies. Second, if the outcome is permanent (death, extinction, loss of meaningful autonomy), apply the precautionary principle regardless of probability. This reframes risk conversations from 'what is the probability?' to 'is this reversible?'
- Irreversibility, not probability, is the correct filter for which risk framework applies
- A 1% chance of civilizational catastrophe demands more precaution than a 100% chance of a recoverable financial loss
- Standard expected value math assumes iteration is possible — the Precautionary Principle applies when there is no second attempt
- Pre-deployment safety requirements are rational when the downside is permanent, even at low probability
- The question is not 'is this likely?' but 'can we afford to be wrong?'
- Apply the irreversibility test
  Before calculating expected value, ask a single binary question: if this outcome occurs, can we reverse it and try again? If yes, standard EV math applies. If the outcome is permanent (death, extinction, loss of all meaningful autonomy), stop and apply the Precautionary Principle instead.
  Pro tip: The irreversibility question is often disguised. 'Losing market share' is recoverable. 'Losing the ability to regulate AI because power has been captured' is not.
- Separate the probability question from the stakes question
  Do not conflate 'what is the probability?' with 'should we act?' For irreversible outcomes, even contested or low probability estimates justify precautionary action. Bengio's benchmark: the research community's estimate of roughly 10% probability of catastrophic AI outcomes is sufficient to justify safety gates, whatever any individual's own estimate.
  Warning: Avoid the trap of debating the exact probability number. The precautionary principle applies at any non-trivial probability of irreversible harm.
- Identify the safety gate equivalent
  For each high-stakes irreversible risk domain, identify what the pre-deployment safety requirement would look like if treated analogously to FDA drug approval. The absence of a safety gate is itself a policy choice, so make it explicit rather than assumed.
  Pro tip: The drug approval analogy is persuasive with non-technical audiences: we do not say '1% chance of killing patients is acceptable because the drug helps the other 99%.'
- Calibrate precaution to magnitude, not just probability
  Scale the level of required precaution to the magnitude of potential harm. A 1% chance of losing a million dollars requires a different level of precaution than a 1% chance of ending human civilization. Magnitude calibration prevents the framework from being used to justify paralysis on ordinary decisions.
  Warning: Do not apply the Precautionary Principle to recoverable risks. It will lead to overcaution and paralysis in domains where iteration is the correct response.
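The first three steps above can be sketched as a small triage routine. This is a minimal illustration under stated assumptions, not a definitive implementation of Bengio's framework: the `Risk` fields, the `triage` function, and the `non_trivial_floor` parameter are all hypothetical names. The source gives no numeric cutoff for a "non-trivial" probability, so the floor defaults to zero (any positive probability counts). Magnitude calibration (step four) is left out because the source does not specify a scale.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    probability: float     # estimated probability of the bad outcome, 0..1
    reversible: bool       # step 1: could we undo the outcome and try again?
    has_safety_gate: bool  # step 3: does a pre-deployment safety requirement exist?


def triage(risk: Risk, non_trivial_floor: float = 0.0) -> list[str]:
    """Return plain-language recommendations for a single risk.

    non_trivial_floor is a hypothetical placeholder: the probability above
    which an irreversible risk is treated as non-trivial.
    """
    # Step 1: the irreversibility test comes before any probability math.
    if risk.reversible:
        return ["recoverable: use standard expected value and iterate"]

    # Step 2: for irreversible outcomes, the exact probability is not debated;
    # it only needs to clear the (assumed) non-trivial floor.
    if risk.probability <= non_trivial_floor:
        return ["irreversible but below the assumed non-trivial floor"]

    notes = ["irreversible: apply the precautionary principle"]

    # Step 3: a missing safety gate is itself a policy choice, so surface it.
    if not risk.has_safety_gate:
        notes.append("no safety gate: state that policy choice explicitly")
    return notes


if __name__ == "__main__":
    examples = [
        Risk("losing market share", 0.5, reversible=True, has_safety_gate=False),
        Risk("losing the ability to regulate AI", 0.01, reversible=False,
             has_safety_gate=False),
    ]
    for risk in examples:
        print(risk.name, "->", triage(risk))
```

The reversibility check runs first by design, so a recoverable risk never reaches the precautionary branch no matter how high its probability is, which mirrors the warning against applying the principle to ordinary decisions.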
Bengio uses pharmaceutical regulation as the clearest analogy for AI safety gates. The FDA requires rigorous safety proof before a drug reaches patients — not retrospective validation after harm occurs. The threshold is not 'probably safe enough' but demonstrable safety at the clinical trial standard. No one argues that a drug with a 1% chance of killing patients should be approved because it might help 99%.
Rather than asserting personal worst-case estimates, Bengio anchors his precautionary argument in what the professional ML research community itself believes. Researcher polls indicate roughly 10% estimated probability of catastrophic AI outcomes. Bengio then applies the Precautionary Principle not to his own estimate but to the community's: even if you believe the lowest credible estimate (1%), the principle holds.
Yoshua Bengio developed this framing as part of his AI safety advocacy work, following growing evidence that misalignment risks were increasing rather than decreasing as reasoning models became more capable. As one of the three researchers who shared the 2018 Turing Award for foundational work on deep learning (alongside Geoffrey Hinton and Yann LeCun), Bengio occupies a rare position: a foundational contributor to the technology who is now warning about its risks.
The drug approval analogy is central to his public argument: the FDA requires safety proof before a drug reaches patients, not retrospective validation. Bengio argues AI deployment has no equivalent safety gate, and the precautionary principle provides the philosophical grounding for why one is needed. He uses researcher poll data (approximately 10% estimated probability of catastrophic outcomes from AI) to anchor the argument in the professional community's own uncertainty rather than worst-case speculation.