MINDSET · Ongoing practice

Causation vs. Correlation

Two things happening together doesn't mean one causes the other

Problem it solves

Limiting beliefs

Best for

Data analysts, researchers, managers evaluating performance, policy makers, and anyone interpreting studies or statistics

Not ideal for

Situations where causal mechanisms are already well-established through rigorous controlled experiments

Overview

Why this framework exists

The confusion between causation and correlation leads to inaccurate assumptions about how the world works. We notice two things happening at the same time (correlation) and mistakenly conclude that one causes the other (causation). We then act on that erroneous conclusion, making decisions that are successful only by luck rather than by capitalizing on real dynamics.

Correlation is measured by a coefficient between -1 and 1, representing the relative weight of shared factors between two measures. Two phenomena with no shared factors (like bottled water consumption and suicide rate) should have a coefficient near zero. Temperature in Celsius and Fahrenheit has a perfect correlation of 1 because they measure the same underlying factor. Most real-world relationships fall somewhere between, indicating that while one variable has some predictive power over another, other factors are clearly at play.
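As a minimal sketch of how the coefficient is computed, the snippet below implements the standard Pearson formula in plain Python and applies it to the Celsius and Fahrenheit case. The function name `pearson_r` and the sample temperatures are illustrative choices, not taken from the source.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and of squared deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Illustrative readings; Fahrenheit is an exact linear function of Celsius.
celsius = [-10.0, 0.0, 12.5, 25.0, 37.0]
fahrenheit = [c * 9 / 5 + 32 for c in celsius]

r_temperature = pearson_r(celsius, fahrenheit)
print(round(r_temperature, 6))  # 1, up to floating-point rounding
```

Any exact linear relationship with a positive slope gives a coefficient of 1, and a negative slope gives -1. The coefficient says nothing about which variable, if either, drives the other.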

A critical complication is regression to the mean: whenever correlation is imperfect, extremes will soften over time. The best will appear to get worse, and the worst will appear to get better, regardless of any intervention. This means we frequently mistake regression to the mean for the effect of a treatment or policy. Depressed children treated with anything (even hugging a cat or standing on their head) will show improvement, because extreme groups naturally regress toward the average. The only way to distinguish real improvement from regression is through a control group.
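Regression to the mean can be reproduced in a small simulation. The sketch below assumes each score is a stable trait plus independent noise, which is a modeling assumption made for illustration and not something the source specifies. Nobody in the simulation receives any treatment.

```python
import random

def simulate_two_measurements(n, seed=0):
    """Return n (first, second) score pairs sharing a stable trait."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        trait = rng.gauss(0, 1)            # stable part, same both times
        first = trait + rng.gauss(0, 1)    # noise at the first measurement
        second = trait + rng.gauss(0, 1)   # fresh noise at the second
        pairs.append((first, second))
    return pairs

def mean(values):
    return sum(values) / len(values)

pairs = simulate_two_measurements(10_000)
pairs.sort(key=lambda pair: pair[0])       # rank by the FIRST measurement
cutoff = len(pairs) // 10
worst, best = pairs[:cutoff], pairs[-cutoff:]

worst_first = mean([first for first, _ in worst])
worst_second = mean([second for _, second in worst])
best_first = mean([first for first, _ in best])
best_second = mean([second for _, second in best])

print(f"worst 10%: {worst_first:+.2f} -> {worst_second:+.2f}")
print(f"best 10%:  {best_first:+.2f} -> {best_second:+.2f}")
```

The worst group moves up and the best group moves down on the second measurement, even though nothing was done to either. Part of what made them extreme the first time was noise, and the noise does not repeat.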

Core principles

5 total
  1. Two things happening together (correlation) does not mean one causes the other (causation).
  2. The correlation between two measures reflects the relative weight of their shared factors, not a causal relationship.
  3. Whenever correlation is imperfect, extreme values will regress toward the mean over time regardless of intervention.
  4. Trying to invert a relationship can help determine whether you are dealing with causation or just correlation.
  5. The only reliable way to distinguish treatment effects from regression to the mean is through a control group.
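The role of the control group in principle 5 can be sketched with a simulated trial. The function `run_trial`, the trait-plus-noise score model, and the 10% selection cutoff are all hypothetical choices for illustration. The extreme group is split at random, and the treatment effect is estimated as the treated group's change minus the control group's change.

```python
import random

def run_trial(n_population, true_effect, seed=0):
    """Return (treated_change, control_change) for an extreme group.

    Scores are modeled as a stable trait plus independent noise, which is
    an assumption of this sketch rather than anything from the source.
    """
    rng = random.Random(seed)
    people = []
    for _ in range(n_population):
        trait = rng.gauss(0, 1)
        before = trait + rng.gauss(0, 1)
        people.append((trait, before))

    # Select the lowest-scoring 10% at baseline: the "extreme group".
    people.sort(key=lambda person: person[1])
    extreme = people[: n_population // 10]

    # Randomly assign half of the extreme group to treatment.
    rng.shuffle(extreme)
    half = len(extreme) // 2
    treated, control = extreme[:half], extreme[half:]

    def mean_change(group, effect):
        # Follow-up score minus baseline score, averaged over the group.
        changes = [
            (trait + rng.gauss(0, 1) + effect) - before
            for trait, before in group
        ]
        return sum(changes) / len(changes)

    return mean_change(treated, true_effect), mean_change(control, 0.0)

# A treatment with NO real effect, applied to an extreme group.
treated_change, control_change = run_trial(50_000, true_effect=0.0)
print(f"treated group change: {treated_change:+.2f}")
print(f"control group change: {control_change:+.2f}")
print(f"estimated effect:     {treated_change - control_change:+.2f}")
```

Looking at the treated group alone, the useless treatment appears to work. Subtracting the control group's change removes the part that regression alone explains, leaving an estimate close to the true effect of zero.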

Steps

5 steps
  1. Identify the claimed relationship
    When presented with a relationship between two variables, clearly state what the claim is. Is it that A causes B, that B causes A, or simply that A and B are observed together?
    Pro tip: A study showing a relationship between parental alcohol consumption and children's academic success has demonstrated only a correlation, not that one causes the other.
  2. Try inverting the relationship
    Ask whether the reverse could be true. If A appears to cause B, could B actually cause A? Could having kids who do poorly in school cause parents to drink more, rather than the reverse?
    Pro tip: Inverting the relationship is a quick test for false causation claims.
  3. Check for regression to the mean
    If you are evaluating whether a treatment or intervention worked, ask whether the group being studied is extreme. Extreme groups naturally regress toward the mean over time, regardless of any treatment.
    Pro tip: Depressed children will get somewhat better over time even if they hug no cats and drink no Red Bull.
    Warning: Without a control group, it is impossible to determine whether improvement is due to the intervention or simply regression to the mean.
  4. Look for confounding variables
    Consider whether a third variable might explain the observed correlation. Both A and B might be caused by C, creating a correlation between A and B without any direct causal link.
    Pro tip: Height and weight are correlated, but both are partly caused by underlying genetic and nutritional factors.
  5. Demand a control group or equivalent
    For any claim of causation, ask whether a proper control group was used. The aim of rigorous research is to determine whether the treated group improves more than regression alone can explain.
    Pro tip: In real-life performance evaluation, where no control group exists, compare against industry averages, peer cohorts, or historical improvement rates.
    Warning: None of these alternatives is a perfect measure, but they are better than no comparison at all.

Examples

3 cases
Depressed children and energy drinks

Kahneman created a hypothetical headline: 'Depressed children treated with energy drink improve significantly over three months.' The statement would be true, but depressed children made to stand on their heads or hug a cat would also show improvement, because extreme groups regress toward the mean.

Outcome: Without a control group, any apparent treatment effect could simply be regression to the mean. The energy drink gets undeserved credit.
Alcohol consumption and academic success

A study shows a relationship between high alcohol consumption in parents and low academic success in children. It's tempting to conclude that parental drinking causes poor academic outcomes.

Outcome: The study demonstrates only correlation. Inverting the relationship reveals that having children who do poorly in school could equally cause parents to drink more. A third variable (such as poverty) could also explain both.
Temperature in Celsius and Fahrenheit

Temperature measured in Celsius and Fahrenheit has a perfect correlation coefficient of 1 because they measure the same underlying factor (molecular velocity). Every degree in Celsius has exactly one corresponding value in Fahrenheit.

Outcome: This is a rare example of perfect correlation. It arises not because one scale causes the other, but because both scales measure the same phenomenon.

Common mistakes

4 traps
Jumping from correlation to causation
The most common error. Seeing two things happen together and immediately concluding one causes the other, without testing alternative explanations.
Ignoring regression to the mean
Attributing improvement in an extreme group to a specific intervention when the improvement would have happened anyway. This is one of the most common statistical errors in media and even scientific reporting.
Failing to consider reverse causation
Assuming A causes B without considering whether B might cause A. Parental drinking and children's academic performance could have the causal arrow pointing in either direction.
Treating any correlation as meaningful
Many observed correlations are coincidental, especially when examining large datasets. Without a plausible causal mechanism, correlation alone is insufficient evidence for action.

Origin story

How this framework came to be

The statistical foundations of correlation and causation were developed across centuries of probability theory and experimental design. The correlation coefficient was formalized by Karl Pearson in the late nineteenth century. The critical importance of distinguishing correlation from causation became prominent in medical research, epidemiology, and social science throughout the twentieth century.

Daniel Kahneman, in Thinking, Fast and Slow, provided memorable illustrations of how regression to the mean fools us into false causal attributions. His hypothetical headline about depressed children improving with an energy drink demonstrates how any treatment appears to work when applied to an extreme group, because extreme groups regress toward the mean regardless of intervention. The Great Mental Models uses this as a supporting idea for its probabilistic thinking chapter, showing how confusing correlation with causation leads to decisions based on luck rather than genuine understanding.

Source

Traced to primary
Source · BOOK
The Great Mental Models, Volume 1 General Thinking Concepts
Shane Parrish & Rhiannon Beaubien · 2019