Simpson's Paradox - an introduction
Imagine you have a new treatment for a serious disease
that you want to test. You find some male patients, give
some of them the new treatment, and give the rest the old one:
|               | Lived | Died | Recovery % |
| New treatment | 70    | 30   | 70%        |
| Old treatment | 180   | 120  | 60%        |
The new treatment appears to raise the recovery rate by 10 percentage
points. So we test it on a group of women:
|               | Lived | Died | Recovery % |
| New treatment | 90    | 210  | 30%        |
| Old treatment | 20    | 80   | 20%        |
Again, the new treatment appears to raise the recovery rate by 10
percentage points. But what happens when we look at the total numbers for
men and women combined? You can do the math yourself, but here it is:
|               | Lived | Died | Recovery % | (Total) |
| New treatment | 160   | 240  | 40%        | 400     |
| Old treatment | 200   | 200  | 50%        | 400     |
Suddenly we see that the new treatment is lowering the recovery rate
by 10 percentage points, not raising it! Seems impossible, no?
There is no math error above; the reversal is due to an effect called
Simpson's paradox.
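If you would rather check the arithmetic than take it on trust, here is a
minimal Python sketch that recomputes every percentage from the raw counts
in the tables. The names (counts, recovery_rate, pooled) are just
illustrative choices, not part of any library.

```python
# (lived, died) counts copied from the tables above
counts = {
    "men": {"new": (70, 30), "old": (180, 120)},
    "women": {"new": (90, 210), "old": (20, 80)},
}

def recovery_rate(lived, died):
    """Fraction of patients who lived."""
    return lived / (lived + died)

# Recovery rate within each group, for each treatment
for group, arms in counts.items():
    for arm, (lived, died) in arms.items():
        print(f"{group:6} {arm}: {recovery_rate(lived, died):.0%}")

# Pooled rates: add up the raw counts first, then divide
pooled = {}
for arm in ("new", "old"):
    lived = sum(counts[group][arm][0] for group in counts)
    died = sum(counts[group][arm][1] for group in counts)
    pooled[arm] = recovery_rate(lived, died)
    print(f"pooled {arm}: {pooled[arm]:.0%}")
```

The new treatment comes out ahead within each group, yet behind once the
counts are pooled.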
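Simpson's paradox happens when a confounding variable (here, the
patient's sex) is unevenly distributed across the groups being compared.
Men recovered far more often than women under either treatment, and the
two treatments were given to very different mixes of patients: we tested
three times as many men with the old treatment as with the new one (300
versus 100), yet three times as many women with the new treatment as with
the old one (300 versus 100). The combined new-treatment group is
therefore mostly women, who fare worse regardless of treatment, and that
drags its overall rate below the old treatment's. This effect can actually
happen (and has happened aplenty!) in real-world examples. More info
on the
Wikipedia page.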
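One way to see the mechanism: each combined recovery rate is a weighted
average of the men's and women's rates, weighted by how many of each
received that treatment. The sketch below, again in Python with
illustrative names, rebuilds the combined rates that way. It then repeats
the calculation for a made-up balanced split of 200 men and 200 women per
treatment; that split is an assumption added purely for comparison, not
data from the example.

```python
# Patients per group and treatment, and recovery rates, from the tables above
patients = {"men": {"new": 100, "old": 300}, "women": {"new": 300, "old": 100}}
rate = {"men": {"new": 0.70, "old": 0.60}, "women": {"new": 0.30, "old": 0.20}}

def combined_rate(arm, sizes):
    """Average of the per-group rates, weighted by group size in this arm."""
    total = sum(sizes[group][arm] for group in sizes)
    return sum(sizes[group][arm] / total * rate[group][arm] for group in sizes)

# Actual split: the new-treatment arm is 75% women, the old one 75% men
print(f"new: {combined_rate('new', patients):.0%}")
print(f"old: {combined_rate('old', patients):.0%}")

# Hypothetical balanced split (an assumption, not data from the example)
balanced = {"men": {"new": 200, "old": 200}, "women": {"new": 200, "old": 200}}
print(f"new, balanced: {combined_rate('new', balanced):.0%}")
print(f"old, balanced: {combined_rate('old', balanced):.0%}")
```

With the balanced split, the combined rates agree with the per-group
picture and the new treatment comes out ahead, which suggests the reversal
comes from the uneven mix rather than from the treatment itself.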