
Head to Head

Are confidence intervals better termed “uncertainty intervals”?

BMJ 2019; 366 doi: https://doi.org/10.1136/bmj.l5381 (Published 10 September 2019) Cite this as: BMJ 2019;366:l5381
Andrew Gelman, professor of statistics and political science, Columbia University, New York, USA
Sander Greenland, professor of epidemiology and statistics, Department of Epidemiology and Department of Statistics, University of California, Los Angeles, USA
Correspondence to: A Gelman gelman{at}stat.columbia.edu, S Greenland lesdomes{at}ucla.edu

Debate abounds about how to describe weaknesses in statistics. Andrew Gelman has no confidence in the term “confidence interval,” but Sander Greenland doesn’t find “uncertainty interval” any better and argues instead for “compatibility interval.”

Yes—Andrew Gelman

Science reformers are targeting P values and statistical significance, and rightly so.[1-3] It’s wrong to take P≤0.05 as indicating that an effect is real, and it’s wrong to take P>0.05 as a reason to act as though an effect is zero.

One proposed reform is to replace statistical significance with confidence intervals: instead of simply reporting whether the 95% interval contains zero or reporting a P value, report the entire interval. But this approach has problems too,[4] in that there can be good reasons in some cases to think that the true effect is likely to be outside the interval entirely. Confidence intervals excluding the true value can result from failures in model assumptions (as we’ve found when assessing US election polls[5]) or from analysts seeking out statistically significant comparisons to report, thus inducing selection bias.[6]
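
The selection effect described above can be illustrated with a small simulation. This is a minimal sketch under assumptions that are not from the article: a small true effect of 0.1, a known standard error of 1, normally distributed estimates, and a hypothetical helper named simulate_coverage. It compares how often 95% intervals contain the true value across all replications with how often they do so when only "statistically significant" results (intervals excluding zero) are reported.

```python
import random


def simulate_coverage(true_effect=0.1, se=1.0, n_sims=100_000, seed=1):
    """Return (coverage over all intervals, coverage among significant ones).

    Illustrative assumptions: estimates are normal around true_effect
    with a known standard error, and intervals are estimate +/- 1.96 * se.
    """
    rng = random.Random(seed)
    z = 1.96  # normal critical value for a 95% interval
    covered = 0      # intervals that contain the true effect
    sig = 0          # intervals that exclude zero
    sig_covered = 0  # intervals that exclude zero AND contain the true effect
    for _ in range(n_sims):
        est = rng.gauss(true_effect, se)
        lo, hi = est - z * se, est + z * se
        contains_truth = lo <= true_effect <= hi
        significant = lo > 0 or hi < 0
        covered += contains_truth
        if significant:
            sig += 1
            sig_covered += contains_truth
    among_sig = sig_covered / sig if sig else float("nan")
    return covered / n_sims, among_sig


overall, among_significant = simulate_coverage()
print(f"coverage, all intervals:         {overall:.3f}")
print(f"coverage, significant intervals: {among_significant:.3f}")
```

Under these assumptions the first figure should land close to the nominal 0.95, while the second should be far lower, because an estimate has to fall well away from the small true effect before its interval excludes zero. The numbers depend entirely on the chosen effect size and standard error, so they illustrate the mechanism rather than estimate its size in any real study.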

Confidence intervals can be a useful summary in model based inference. But the term should be “uncertainty interval,” not “confidence interval,” for four key reasons.

Difficulties in interpretation

My first concern with the term “confidence interval” is the well known confusion in interpretation. Officially, all that can be interpreted are the long term average properties of the procedure that’s used to construct the interval, but people tend to interpret each interval implicitly in a bayesian way—that is, by acting as though there’s a 95% probability that any given interval contains the true value. For example, I …
