Pitfalls in the Use of Statistical Methods in Systematic Reviews of Therapeutic Interventions: A Cross-sectional Study

Matthew J. Page,1 Douglas G. Altman,2 Larissa Shamseer,3,4 Joanne E. McKenzie,1 Nadera Ahmadzai,5 Dianna Wolfe,5 Fatemeh Yazdi,5 Ferrán Catalá-López,5,6 Andrea C. Tricco,7,8 David Moher3,4


Researchers have identified several problems in the application of statistical methods in published systematic reviews (SRs). However, these evaluations have been narrow in scope, focusing on a single method (such as sensitivity analyses) or restricting inclusion to Cochrane SRs, which make up only 15% of all SRs of biomedical research. We aimed to investigate the application and interpretation of various statistical methods in a cross-section of SRs of therapeutic interventions, without restriction by journal, clinical condition, or specialty.


We selected articles from a database of SRs we assembled previously. These articles consisted of a random sample of 300 SRs addressing various questions (therapeutic, diagnostic, or etiologic) that were indexed in MEDLINE in February 2014. In the current study, we included only those SRs that focused on a therapeutic question, reported at least 1 meta-analysis, and were written in English. We collected data on 61 prespecified items that characterized how well random-effects meta-analysis models, subgroup analyses, sensitivity analyses, and funnel plots were applied and interpreted. Data were extracted from articles and online appendices by a single reviewer, with a 20% random sample extracted in duplicate.


Of the 110 included SRs, 78 (71%) were non-Cochrane SRs and 55 (50%) investigated a pharmacological intervention. The SRs reported a median of 13 (interquartile range, 5-27) meta-analyses. Of the 110 primary meta-analyses (1 per SR), 62 (56%) used the random-effects model, but only 5 of these 62 (8%) interpreted the pooled result correctly (that is, as the average of the intervention effects across all studies). Subgroup analyses were reported in 42 of 110 SRs (38%), but findings were not interpreted with respect to a test for interaction in 29 of 42 cases (69%), and the issue of potential confounding in the subgroup analyses was not raised in any SR. Sensitivity analyses were reported in 51 of 110 SRs (46%), with no rationale provided in 37 of 51 cases (73%). Authors of 37 of 110 SRs (34%) reported that visual inspection of a funnel plot led them to not suspect publication bias. However, in 28 of 37 cases (76%), fewer than 10 studies of varying size were included in the plot.
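To make the interpretive point concrete: under a random-effects model, the pooled estimate is the average of a distribution of study-specific effects, not a single common effect. A minimal sketch of the widely used DerSimonian-Laird method (our illustration, not the method any specific included SR applied) shows how the between-study variance tau-squared enters the weights:

```python
import math

def random_effects_pool(estimates, variances):
    """DerSimonian-Laird random-effects pooling.

    Returns the pooled effect (interpreted as the *average* of the
    intervention effects across studies), its standard error, and
    tau^2, the estimated between-study variance.
    """
    # Fixed-effect (inverse-variance) weights and pooled estimate
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sw

    # Cochran's Q statistic for heterogeneity
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = len(estimates) - 1

    # Method-of-moments estimate of tau^2 (truncated at 0)
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)

    # Random-effects weights add tau^2 to each within-study variance
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return pooled, se, tau2
```

When tau-squared is 0 the result coincides with the fixed-effect estimate; when it is large, weights are more nearly equal across studies and the pooled value summarizes a distribution of effects, which is why interpreting it as "the" single intervention effect is incorrect.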


There is scope for improvement in the application and interpretation of statistical analyses in SRs of therapeutic interventions. Guidelines such as PRISMA may need to be extended to provide more specific statistical guidance.

1School of Public Health and Preventive Medicine, Monash University, Melbourne, Victoria, Australia, matthew.page@monash.edu; 2UK EQUATOR Centre, Centre for Statistics in Medicine, NDORMS, University of Oxford, Oxford, UK; 3Centre for Journalology, Clinical Epidemiology Program, Ottawa Hospital Research Institute, Ottawa, Ontario, Canada; 4School of Epidemiology, Public Health, and Preventive Medicine, Faculty of Medicine, University of Ottawa, Ottawa, Ontario, Canada; 5Knowledge Synthesis Group, Clinical Epidemiology Program, Ottawa Hospital Research Institute, Ottawa, Ontario, Canada; 6Department of Medicine, University of Valencia/INCLIVA Health Research Institute and CIBERSAM, Valencia, Spain; 7Knowledge Translation Program, Li Ka Shing Knowledge Institute, St Michael’s Hospital, Toronto, Ontario, Canada; 8Epidemiology Division, Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada

Conflict of Interest Disclosures:

Douglas G. Altman and David Moher are Peer Review Congress Advisory Board Members but were not involved in the review or decision for this abstract.