Article Text
Abstract
Objectives It is a truth universally acknowledged that some areas of science suffer from a surfeit of false positives.
It is still widely believed that the p-value is the probability that your results occurred by chance. This is simply wrong. It confuses the p-value with the false positive risk (FPR).
By a false positive, I mean that you claim that an effect exists when in fact the results could easily have occurred by chance. The false positive risk (FPR) is the probability that a ‘significant’ result is nothing more than chance.
The aim of this work is to investigate what can be said about the FPR, in order to provide a better way of expressing the strength of the evidence than is provided by p-values.
Method The aim is to answer the following question. If you observe a ‘significant’ p-value after doing a single unbiased experiment, what is the probability that your result is a false positive?
It is assumed that we wish to test a null hypothesis that the true effect size is zero against the alternative that it is not zero. Student’s t test for two independent samples is used as an example. Both simulations and exact calculations are used to assess false positive risks.
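A minimal simulation sketch of this approach is shown below. It is not the code used in the paper: the sample size, effect size, prior, and the band of p-values counted as ‘p close to 0.05’ are illustrative assumptions.

```python
# Estimate the false positive risk by simulating two-sample t tests.
# Illustrative assumptions (not necessarily the paper's exact settings):
# a true effect of one standard deviation, n = 16 per group (power roughly
# 0.8 at alpha = 0.05), a prior probability of 0.5 that a real effect exists,
# and 'p = 0.05' taken to mean p falling in a narrow band around 0.05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, effect, prior, n_sim = 16, 1.0, 0.5, 500_000

real = rng.random(n_sim) < prior                      # experiments in which a real effect exists
means = np.where(real, effect, 0.0)
a = rng.normal(0.0, 1.0, size=(n_sim, n))             # control-group samples
b = rng.normal(means[:, None], 1.0, size=(n_sim, n))  # treated-group samples
p = stats.ttest_ind(b, a, axis=1).pvalue              # two-sided p for each simulated experiment

# Keep only the experiments that happened to give p close to 0.05, and ask
# what fraction of those had no real effect: that fraction estimates the FPR.
near_005 = (p > 0.045) & (p < 0.055)
fpr = np.mean(~real[near_005])
print(f"simulated experiments with p near 0.05: {near_005.sum()}")
print(f"estimated false positive risk: {fpr:.2f}")
```

With these settings the estimate should come out close to the 26% minimum FPR quoted in the Results below for a prior of 0.5.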
In order to calculate the false positive risk, you need Bayes’ theorem. That involves the prior probability that your hypothesis is true, i.e. the probability that there is a real effect before the experiment is done. You hardly ever have a value for this, so what can be done? There are two possibilities.
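In symbols (a sketch only, writing \(\pi\) for the prior probability that a real effect exists and ‘obs’ for the observed outcome), Bayes’ theorem gives

\[
\mathrm{FPR} \;=\; P(H_0 \mid \mathrm{obs})
\;=\; \frac{P(\mathrm{obs}\mid H_0)\,(1-\pi)}{P(\mathrm{obs}\mid H_0)\,(1-\pi) + P(\mathrm{obs}\mid H_1)\,\pi}.
\]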
Results First, you can say that a prior probability greater than 0.5 is hardly ever justified, so you can calculate a minimum FPR based on the assumption of a prior of 0.5. If you observe p=0.05, then the FPR is at least 26%. If the hypothesis were implausible, with a prior of 0.1, then the FPR would be a disastrous 76%.
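As a worked check (the likelihood ratio L ≈ 2.8 used here is inferred from the 26% figure just quoted, not taken from the full calculations): writing \(L = P(\mathrm{obs}\mid H_1)/P(\mathrm{obs}\mid H_0)\), the expression above becomes

\[
\mathrm{FPR} = \frac{1-\pi}{(1-\pi) + L\,\pi},\qquad
\pi = 0.5:\ \frac{1}{1+2.8} \approx 0.26,\qquad
\pi = 0.1:\ \frac{0.9}{0.9 + 2.8\times 0.1} \approx 0.76.
\]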
Second, since you don’t know the prior, you can calculate the value that would be needed to reduce the FPR to 0.05 (which is what most people still think the p-value tells you). If you observe p=0.05 in a well-powered experiment, then you would need to assume a prior of 0.87 in order to achieve an FPR of 0.05. That is, you would have to be almost (87%) certain that there was a real effect before you did the experiment.
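Rearranging the same expression to solve for the prior gives (again using the inferred L ≈ 2.8)

\[
\pi_{\mathrm{needed}} \;=\; \frac{1-\mathrm{FPR}}{(1-\mathrm{FPR}) + \mathrm{FPR}\cdot L}
\;=\; \frac{0.95}{0.95 + 0.05\times 2.8} \;\approx\; 0.87.
\]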
Conclusions Clearly the evidence provided by the usual standard for ‘statistical significance’ is weak. Even observing p=0.005 would not be strong evidence for an implausible hypothesis: for a prior probability of 0.1, it would give a minimum FPR of 24%.
It is suggested that the FPR, or the prior probability needed to achieve an FPR of 0.05, is a better measure of evidence than the p-value.
In practice, decisions depend on the relative costs (in money and in reputation) that result from wrongly claiming a real effect when there is none, and from failing to detect a real effect when there is one.