P values and hypothesis tests | Confidence intervals |
---|---|
What are they used for? | |
p values are used to assess whether a sample estimate is significantly different from a hypothesised value (such as zero—ie, no treatment effect). Hypothesis tests assess the likelihood of the estimate under the null hypothesis of no difference between 2 population values (or no treatment effect). Conventionally, if p<0.05, the null hypothesis is rejected. | Confidence intervals (CIs) present a range of values around the sample estimate within which there is reasonable confidence that the true, but unknown, population value lies. The 95% CI (the range of values within which there is 95% probability that the true value lies) is most commonly used. It corresponds with the typical 5% significance level used in hypothesis tests. |
What do they tell us? | |
The p value is the probability that the observed effect (or a more extreme one) would have occurred by chance if in truth there is no effect. However, it doesn’t tell us anything about the size of the true effect and, moreover, since hypothesis tests are 2 tailed (we are interested in differences in either direction), it doesn’t even tell us the direction of this effect. Thus, in the above example, the p value of 0.006 indicates that an effect of 19.6% or more, in favour of either streptomycin or bed rest, would occur in only 6 of every 1000 trials if in truth there is no effect. | The CI provides a range of values whose limits are, with specified confidence (typically 95%), the smallest and the largest true population values consistent with the sample data. A CI can thus effectively function as a hypothesis test for an infinite number of values: if the CI includes any 1 of these values, then the sample estimate is not statistically significantly different from it. The 95% CI is of particular relevance to evidence-based practice (EBP), providing valuable information such as whether the interval includes or excludes clinically significant values. |
When can they be used? | |
There are many different types of hypothesis test, each suitable for a particular type of data. For example, parametric tests (such as t tests) are only suitable for data drawn from a population which is approximately normally distributed, although in large samples this assumption becomes less critical. | A CI can, and should, be calculated for most measures of effect, such as differences between means (such as scores or weights), differences in proportions, the experimental and control event rates (EER and CER), the absolute risk reduction (ARR), the number needed to treat (NNT), risk ratios (RR), and odds ratios (OR). |
How are they calculated? | |
The observed effect together with a measure of its variability (such as the standard error, SE) is used to calculate a “test statistic” (eg, t, z, χ2). For example, a t statistic is calculated by dividing the observed effect by its SE. The value of the test statistic is used (from statistical tables) to determine the p value. Thus, in the example above (where SE(ARR) = 7.1%), the z statistic (assuming that the ARR is approximately normally distributed) for the test of whether the risk of death differs between those allocated to streptomycin and those allocated to bed rest is calculated as z = 19.6/7.1 = 2.76. This has an associated p value of 0.006. | To calculate a CI around a sample estimate, only 3 pieces of information are needed: the sample size (n), the sample standard deviation (SD), and the “z score,” which varies depending on the degree of confidence wanted (95%, 99%, etc). For a 95% CI, z = 1.96, and for a 99% CI, z = 2.58. A 95% CI is calculated as: sample estimate ± 1.96 standard errors (SE) of the measure (note: SE = SD/√n). Thus, in the above example (where SE(ARR) = 7.1%), the 95% CI for the true ARR is calculated as: 19.6% ± 1.96 (7.1%) = 5.7% to 33.5%. |
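The worked calculations above can be reproduced in a few lines of code. The sketch below uses only the figures quoted in the text (ARR = 19.6%, SE(ARR) = 7.1%); the normal-CDF helper is written by hand from the standard library rather than taken from a statistics package, and the NNT line simply applies the usual NNT = 1/ARR relation to the same figure:

```python
from math import sqrt, erf

# Figures quoted in the text (streptomycin vs bed rest example)
arr = 19.6   # absolute risk reduction, in %
se = 7.1     # standard error of the ARR, in %

def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

# Test statistic: observed effect divided by its standard error
z = arr / se                      # ~2.76

# Two-tailed p value: chance of an effect this large, in either
# direction, if the null hypothesis of no effect is true
p = 2.0 * (1.0 - normal_cdf(z))   # ~0.006

# 95% CI: sample estimate +/- 1.96 standard errors
lower, upper = arr - 1.96 * se, arr + 1.96 * se

# NNT follows from the ARR (1 / ARR, with ARR as a proportion)
nnt = 100.0 / arr                 # ~5.1 patients

print(f"z = {z:.2f}, p = {p:.3f}")
print(f"95% CI for ARR: {lower:.1f}% to {upper:.1f}%")
print(f"NNT = {nnt:.1f}")
```

Note that with the rounded SE of 7.1%, the upper CI limit comes out at 33.5%; small discrepancies of this kind arise from rounding the SE before multiplying by 1.96.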