Assessment of Minimum False-Positive Risk of Primary Outcomes After Reducing the Nominal P Value Threshold for Statistical Significance From .05 to .005 in Anesthesiology Randomized Clinical Trials
Abstract
Philip M. Jones,1,2,3 Zachary Chuang,1 Janet Martin,1,2,3 Derek Nguyen,1 Jordan Shapiro,1 Penelope Neocleous1
Objective
A primary reason for reproducibility concerns in the biomedical literature may be that many published articles reporting statistically significant findings do not represent real effects.1,2 Several solutions have been postulated to mitigate the risks associated with false-positive findings.1,2 This study sought to determine the ramifications of lowering the nominal P value for statistical significance from .05 to .005 and assessed the minimum false-positive risk (minFPR) for primary outcomes in anesthesiology randomized clinical trials (RCTs). These proposals have been explored in other fields, but the metrics have not been quantified for anesthesiology.
Design
This cross-sectional descriptive study aimed to determine these metrics for RCTs published in the top general anesthesiology journals, defined by impact factor. The target journals were Anaesthesia, Anesthesia & Analgesia, Anesthesiology, British Journal of Anaesthesia, Canadian Journal of Anesthesia, European Journal of Anaesthesiology, and Journal of Clinical Anesthesia. The Cochrane Highly Sensitive Search Strategy was used to identify RCTs in MEDLINE. All superiority RCTs published between January 1, 2019, and March 15, 2021, comparing 2 groups with at least 1 primary outcome were included. Study screening and data extraction were performed in duplicate. P values for primary outcomes were extracted and the percentage of RCTs that would maintain statistical significance at a threshold of P < .005 was determined. For these outcomes, minFPRs were calculated assuming 1:1 prior odds of an intervention being effective, using previously recommended methods.3 Study-level characteristics predicting maintenance of statistical significance at P < .005 and minFPRs were computed using logistic and median regression, respectively.
Results
After searching, deduplication, and screening, 318 RCTs were included. The median (IQR) sample size was 80 (52-130) and did not differ significantly across journals. The majority of RCTs (273 of 318 [86%]) were single-center studies. P values below .05 occurred in 205 of 318 RCTs (64%) (by journal, this ranged from 44% to 77%). Of these 205, 119 (58%; 95% CI, 51%-65%) maintained statistical significance at the P < .005 threshold. The mean (SD) minFPR was 22% (20%) (by journal, this ranged from 16% to 33%). Violin plots for P values and minFPRs by journal are shown in Figure 27. With minFPR50 (ie, minFPR assuming a prior probability of 50%) constrained to RCTs with P < .005, the mean (SD) was 2% (1.2%).
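To illustrate how a minFPR can be derived from a P value at 1:1 prior odds, the sketch below uses one widely cited approach, the Bayes factor bound 1 / (-e p ln p). This is an assumption: the abstract says only that "previously recommended methods" were used, so the study's exact calculation may differ. The names bayes_factor_bound, min_fpr, and prior_prob are hypothetical and chosen for illustration.

```python
import math


def bayes_factor_bound(p):
    """Upper bound on the Bayes factor favouring a real effect, given a P value.

    Assumed formula (not confirmed by the abstract): 1 / (-e * p * ln p).
    """
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    # The bound is only defined for p < 1/e; at or above that it is taken as 1,
    # meaning the data cannot favour a real effect over no effect.
    if p >= 1 / math.e:
        return 1.0
    return 1.0 / (-math.e * p * math.log(p))


def min_fpr(p, prior_prob=0.5):
    """Minimum false-positive risk for a P value and a prior probability of a real effect.

    prior_prob=0.5 corresponds to the 1:1 prior odds described in the Design.
    """
    if not 0 < prior_prob < 1:
        raise ValueError("prior_prob must be in (0, 1)")
    prior_odds = prior_prob / (1 - prior_prob)
    # Posterior odds of a real effect are at most bound * prior odds,
    # so the probability of no real effect is at least 1 / (1 + that product).
    return 1.0 / (1.0 + bayes_factor_bound(p) * prior_odds)


if __name__ == "__main__":
    for p in (0.05, 0.005):
        print(f"P = {p}: minFPR = {min_fpr(p):.3f}")
```

Under this assumed bound and 1:1 prior odds, a P value exactly at .05 corresponds to a minFPR of about 29%, and a P value exactly at .005 to about 7%.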