Assessment of Minimum False-Positive Risk of Primary Outcomes After Reducing the Nominal P Value Threshold for Statistical Significance From .05 to .005 in Anesthesiology Randomized Clinical Trials

Abstract

Philip M. Jones,¹,²,³ Zachary Chuang,¹ Janet Martin,¹,²,³ Derek Nguyen,¹ Jordan Shapiro,¹ Penelope Neocleous¹

Objective

A primary reason for reproducibility concerns in the biomedical literature may be that many published articles reporting statistically significant findings do not represent real effects.¹,² Several solutions have been proposed to mitigate the risks associated with false-positive findings.¹,² These proposals have been explored in other fields, but the resulting metrics have not been quantified for anesthesiology. This study sought to determine the ramifications of lowering the nominal P value threshold for statistical significance from .05 to .005 and to assess the minimum false-positive risk (minFPR) for primary outcomes in anesthesiology randomized clinical trials (RCTs).

Design

This cross-sectional descriptive study aimed to determine these metrics for RCTs published in the top general anesthesiology journals, defined by impact factor. The target journals were Anaesthesia, Anesthesia & Analgesia, Anesthesiology, British Journal of Anaesthesia, Canadian Journal of Anesthesia, European Journal of Anaesthesiology, and Journal of Clinical Anesthesia. The Cochrane Highly Sensitive Search Strategy was used to identify RCTs in MEDLINE. All superiority RCTs published between January 1, 2019, and March 15, 2021, comparing 2 groups with at least 1 primary outcome were included. Study screening and data extraction were performed in duplicate. P values for primary outcomes were extracted, and the percentage of RCTs that would maintain statistical significance at a threshold of P < .005 was determined. For these outcomes, minFPRs were calculated assuming 1:1 prior odds of an intervention being effective, using previously recommended methods.³ Associations of study-level characteristics with maintenance of statistical significance at P < .005 and with minFPR were assessed using logistic and median regression, respectively.
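The abstract does not spell out the minFPR formula. As a minimal sketch, assuming the calculation follows the calibration in reference 3 (a lower bound of -e × p × ln p on the Bayes factor in favor of the null hypothesis, valid for p < 1/e), the minimum false-positive risk for a given P value and prior could be computed as below. The function names and the `prior_prob_effect` parameter are hypothetical and are not taken from the study.

```python
import math


def min_bayes_factor(p: float) -> float:
    """Lower bound on the Bayes factor for H0 versus H1.

    Assumed calibration: -e * p * ln(p), which applies only for 0 < p < 1/e.
    """
    if not 0.0 < p < 1.0 / math.e:
        raise ValueError("calibration applies only for 0 < p < 1/e")
    return -math.e * p * math.log(p)


def min_fpr(p: float, prior_prob_effect: float = 0.5) -> float:
    """Minimum false-positive risk: lower bound on P(H0 | data).

    prior_prob_effect is the assumed prior probability that the
    intervention is effective (0.5 corresponds to 1:1 prior odds).
    """
    if not 0.0 < prior_prob_effect < 1.0:
        raise ValueError("prior_prob_effect must be strictly between 0 and 1")
    prior_odds_null = (1.0 - prior_prob_effect) / prior_prob_effect
    posterior_odds_null = prior_odds_null * min_bayes_factor(p)
    return posterior_odds_null / (1.0 + posterior_odds_null)


if __name__ == "__main__":
    for p_value in (0.05, 0.005):
        print(f"P = {p_value}: minFPR = {min_fpr(p_value):.3f}")
```

Under these assumptions, a P value of exactly .05 corresponds to a minFPR of about 29% and a P value of exactly .005 to about 7%, so any outcome with a P value below .005 would have a minFPR below roughly 7% at 1:1 prior odds.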

Results

After searching, deduplication, and screening, 318 RCTs were included. The median (IQR) sample size was 80 (52-130) and did not differ significantly across journals. The majority of RCTs (273 of 318 [86%]) were single-center studies. P values below .05 occurred in 205 of 318 RCTs (64%; range by journal, 44%-77%). Of these 205, 119 (58%; 95% CI, 51%-65%) maintained statistical significance at the P < .005 threshold. The mean (SD) minFPR was 22% (20%) (range by journal, 16%-33%). Violin plots for P values and minFPRs by journal are shown in Figure 27. When the analysis was restricted to RCTs with P < .005, the mean (SD) minFPR50 (ie, minFPR assuming a prior probability of 50%) was 2% (1.2%).
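The reported proportion and its confidence interval can be checked with a short calculation. The abstract does not state which interval method was used, so the sketch below assumes a simple normal-approximation (Wald) interval; the function name `wald_ci` is hypothetical.

```python
import math


def wald_ci(successes: int, n: int, z: float = 1.959964):
    """Proportion with a normal-approximation (Wald) confidence interval.

    z defaults to the standard normal quantile for a two-sided 95% interval.
    Returns (proportion, lower bound, upper bound), clipped to [0, 1].
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("require n > 0 and 0 <= successes <= n")
    proportion = successes / n
    half_width = z * math.sqrt(proportion * (1.0 - proportion) / n)
    return proportion, max(0.0, proportion - half_width), min(1.0, proportion + half_width)


if __name__ == "__main__":
    # Counts taken from the Results: 119 of 205 RCTs with P < .05 also had P < .005.
    proportion, lower, upper = wald_ci(119, 205)
    print(f"{proportion:.0%} (95% CI, {lower:.0%}-{upper:.0%})")
```

With the counts from the Results (119 of 205), this gives 58% with an interval of approximately 51% to 65%, matching the reported figures after rounding. The complement, about 42%, is the share of significant outcomes that would lose significance at the stricter threshold.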

Conclusions

Approximately 42% of statistically significant primary outcomes in anesthesiology RCTs would lose statistical significance under a more stringent P value threshold of .005. On average, these primary outcomes carry a minimum false-positive risk of 22%. Adopting the P < .005 threshold for statistical significance could reduce the mean minFPR to just 2%. These results call a large portion of anesthesiology RCTs into question and provide impetus to improve study design, analysis, and reporting methods to reduce false positives and improve reproducibility.

References

1. Niven DJ, McCormick TJ, Straus SE, et al. Reproducibility of clinical research in critical care: a scoping review. BMC Med. 2018;16:26. doi:10.1186/s12916-018-1018-6

2. Colquhoun D. The false positive risk: a proposal concerning what to do about P-values. Am Stat. 2019;73:192-201. doi:10.1080/00031305.2018.1529622

3. Sellke T, Bayarri MJ, Berger JO. Calibration of P values for testing precise null hypotheses. Am Stat. 2001;55:62-71. doi:10.1198/000313001300339950

¹Schulich School of Medicine & Dentistry, University of Western Ontario, London, ON, Canada, philip.jones@lhsc.on.ca; ²Department of Anesthesia & Perioperative Medicine, University of Western Ontario, London, ON, Canada; ³Department of Epidemiology & Biostatistics, University of Western Ontario, London, ON, Canada

Conflict of Interest Disclosures

Philip M. Jones is deputy editor in chief at the Canadian Journal of Anesthesia. No other disclosures were reported.

Funding/Support

Research time for Philip M. Jones and Janet Martin was provided by the Department of Anesthesia & Perioperative Medicine at the University of Western Ontario, London, ON, Canada.

Role of the Funder/Sponsor

The Department of Anesthesia & Perioperative Medicine was not involved in the design or conduct of the study, nor the preparation, review, approval or submission of the abstract for presentation.

Additional Information

This study was registered on March 15, 2021, with Open Science Framework (doi:10.17605/OSF.IO/H8KBZ).

Acknowledgments

We are grateful to David Colquhoun, who reviewed a presubmission draft of the manuscript.

International Congress on
Peer Review and Scientific Publication
