Background Recently, there has been increasing interest in addressing the problem of over-reliance on threshold p-values. A threshold of p<0.05 is a blunt arbiter, and conclusions based on it are prone to false positives and false negatives. Furthermore, questionable research practices are sometimes used to ‘game’ the p-value threshold in order to support the researchers’ preferred conclusions.
Tools to highlight p-value shortcomings are required to improve interpretation of p-values. The Fragility Index has been proposed as a tool to highlight the ‘fragility’ of evidence derived from a threshold p-value.
Objectives The primary objective of this study was to measure the fragility of conclusions from randomised controlled trials (RCTs) published in the New England Journal of Medicine using the Fragility Index. Secondary objectives were to estimate the added impact of losses to follow-up on fragility, and to measure the correlation between the Fragility Index and standardised effect size, sample size, total number of events, and publication year.
Method All RCTs of established practices that were published in the New England Journal of Medicine between 2000 and 2016 were included if they met the following criteria: (1) reported a dichotomous primary outcome; (2) had only two comparison groups; and (3) used a 1:1 randomisation scheme. Data were extracted from each RCT in duplicate.
The Fragility Index was calculated by iteratively converting one patient in one group (control or experimental) from a ‘non-event’ to an ‘event’ outcome and recalculating a two-sided Fisher’s exact test until the p-value met or exceeded 0.05. The Fragility Index was calculated for trials with a significant primary outcome using a Fragility Index calculator, and the reverse Fragility Index was calculated for all trials with non-significant (p>0.05) outcomes using an R package. Loss to follow-up was measured. Univariable linear regression was performed to assess the association between prespecified trial characteristics and the Fragility Index.
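The iterative procedure described above can be illustrated with a minimal Python sketch. It is not the calculator or the R package used in the study. The function names are hypothetical, Fisher's exact test is implemented directly from the hypergeometric distribution, and the choice to flip patients in the group with fewer events is an assumption, since the abstract does not state which group is modified.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are the two trial groups; columns are events and non-events.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x events in the first group.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are no more likely than the observed one
    # (small relative tolerance guards against floating-point ties).
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def fragility_index(events_a, n_a, events_b, n_b, alpha=0.05):
    """Number of non-event -> event flips needed for p to meet or exceed alpha.

    Assumption (not stated in the abstract): flips are made in the group
    with fewer events. Returns 0 if the result is already non-significant.
    """
    flip_a = events_a <= events_b
    flips = 0
    p = fisher_exact_two_sided(events_a, n_a - events_a, events_b, n_b - events_b)
    while p < alpha:
        if flip_a:
            if events_a == n_a:
                raise ValueError("no non-events left to convert in group A")
            events_a += 1
        else:
            if events_b == n_b:
                raise ValueError("no non-events left to convert in group B")
            events_b += 1
        flips += 1
        p = fisher_exact_two_sided(events_a, n_a - events_a, events_b, n_b - events_b)
    return flips
```

For a hypothetical trial with 0/10 events in one group and 5/10 in the other, the two-sided p-value is about 0.033, and `fragility_index(0, 10, 5, 10)` returns 1: converting a single patient raises the p-value to about 0.14.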
Results Of 611 RCTs published in the New England Journal of Medicine between 2000 and 2016, a total of 374 met the inclusion criteria. The median Fragility Index was 7.5 (range 0 to 141). One-quarter of the trials had a Fragility Index of 3 or less. The number of patients lost to follow-up exceeded the Fragility Index in 66% (247/375) of the RCTs, indicating that the true Fragility Index would be even lower than reported if corrected for losses to follow-up. The Fragility Index was moderately correlated with the standardised effect size, and weakly correlated with sample size and year of publication. Sensitivity analyses did not reveal material differences when accounting for missing data.
Conclusions Conclusions from RCTs that are based on p-values are very fragile, with a median of fewer than 8 additional events required to change the conclusion from significant to non-significant (or vice versa). More than one-quarter of all trials would require only 3 additional events to change the conclusion. Furthermore, the majority of trials had a loss to follow-up that exceeded the Fragility Index, indicating that the results would be even more unstable if the Fragility Index were corrected for losses to follow-up. Efforts to increase awareness of the fragility of conclusions based on p-values are urgently required.