There has been increasing interest in addressing over-reliance on threshold p-values. A threshold of p<0.05 is a blunt arbiter of conclusions and is prone to both false positives and false negatives. Furthermore, questionable research practices are sometimes used to ‘game’ the p-value threshold in order to support researchers’ preferred conclusions.

Background
Tools that highlight the shortcomings of p-values are needed to improve their interpretation. The Fragility Index has been proposed as one such tool, highlighting the ‘fragility’ of evidence derived from a threshold p-value.

Objectives
The primary objective of this study was to measure the fragility of conclusions from randomised controlled trials (RCTs) published in the New England Journal of Medicine using the Fragility Index. Secondary objectives were to estimate the added impact of losses to follow-up on fragility, and to measure the correlation between the Fragility Index and standardised effect size, sample size, total number of events, and publication year.

Method
All RCTs of established practices that were published in the New England Journal of Medicine between 2000 and 2016 were included if they met the following criteria: (1) reported a dichotomous primary outcome; (2) had only two comparison groups; and (3) used a 1:1 randomisation scheme. Data were extracted from each RCT in duplicate.
The Fragility Index was calculated by iteratively converting one patient in one group (control or experimental) from a ‘non-event’ to an ‘event’ outcome and recalculating a two-sided Fisher’s exact test until the p-value met or exceeded 0.05; the number of conversions required is the Fragility Index. The Fragility Index was calculated for trials with a significant primary outcome using a Fragility Index calculator, and the reverse Fragility Index was calculated for all trials with non-significant (p>0.05) outcomes using an R package. Loss to follow-up was also measured. Univariable linear regression was performed to assess the association between prespecified trial characteristics and the Fragility Index.

Results
Of 611 RCTs published in the New England Journal of Medicine between 2000 and 2016, a total of 374 met the inclusion criteria. The median Fragility Index was 7.5 (range 0 to 141). One-quarter of the trials had a Fragility Index of 3 or less. The number of patients lost to follow-up exceeded the Fragility Index in 66% (247/375) of the RCTs, indicating that the true Fragility Index would be even lower than reported if corrected for losses to follow-up. The Fragility Index was moderately correlated with the standardised effect size, and weakly correlated with sample size and year of publication. Sensitivity analyses did not reveal material differences when accounting for missing data.

Conclusions
Conclusions from RCTs that are based on p-values are very fragile, with a median of fewer than 8 additional events required to change the conclusion from significant to non-significant (or vice versa). One-quarter of all trials would require 3 or fewer additional events to change the conclusion. Furthermore, the majority of trials had a loss to follow-up that exceeded the Fragility Index, indicating that the results would be even more unstable if the Fragility Index were corrected for losses to follow-up. Efforts to increase awareness of the fragility of conclusions based on p-values are urgently required.
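
The Fragility Index calculation described in the Method can be sketched in code. The following is a minimal illustration only, not the calculator or the R package used in the study. The function names are hypothetical, the two-sided Fisher's exact test is implemented directly from the hypergeometric distribution, and events are added to the group with the lower event proportion, which is an assumption because the Method does not state which group is modified. The trial counts in the usage line are invented for illustration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are no more probable than the observed one.
    # The small relative tolerance guards against floating-point ties.
    p = sum(q for q in map(prob, range(lo, hi + 1)) if q <= p_obs * (1 + 1e-7))
    return min(p, 1.0)


def fragility_index(events_1, n_1, events_2, n_2, alpha=0.05):
    """Number of non-event -> event conversions until p >= alpha.

    Returns None when the result is not significant to begin with.
    """
    def p_value(e1, e2):
        return fisher_exact_two_sided(e1, n_1 - e1, e2, n_2 - e2)

    if p_value(events_1, events_2) >= alpha:
        return None
    # Assumption: convert patients in the group with the lower event proportion.
    modify_first = events_1 * n_2 <= events_2 * n_1
    conversions = 0
    while p_value(events_1, events_2) < alpha:
        if modify_first:
            events_1 += 1
        else:
            events_2 += 1
        conversions += 1
    return conversions


# Hypothetical trial: 8/10 events in one group, 1/6 in the other.
print(fragility_index(8, 10, 1, 6))  # 1
```

In this invented example the initial p-value is about 0.035, and converting a single patient in the second group raises it to about 0.118, so the Fragility Index is 1. The reverse Fragility Index for non-significant trials is not covered by this sketch.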