Comparison of Changes in High-Quality vs Low-Quality Evidence in Original and Updated Systematic Reviews

Abstract

Benjamin Djulbegovic,¹ Muhammad Muneeb Ahmed,² Iztok Hozo,³ Despina Koletsi,⁴ Lars Hemkens,⁵,⁶,⁷ Amy Price,⁸ Rachel Riera,⁹ Paulo Nadanovsky,¹⁰ Ana Paula Pires dos Santos,¹¹ Daniela Melo,¹² Ranjan Pathak,¹³ Rafael Leite Pacheco,¹⁴,¹⁵ Luis Eduardo Fontes,¹⁵,¹⁶ Enderson Miranda,¹⁷ David Nunan¹⁷,¹⁸

Objective

It is generally believed that a weaker body of evidence (eg, poorly designed and executed and potentially biased studies, sparse studies, and/or heterogeneous results) will generate inaccurate estimates of treatment effects more often than a stronger body of evidence. As a result, estimates of the effects of health interventions initially based on high certainty (quality) of evidence (CoE) are expected to change less frequently than estimates based on lower CoE, and the estimated magnitude of effect is expected to differ between high and low CoE. Empirical assessment of these foundational principles of evidence-based medicine has been lacking.

Design

The Cochrane Database of Systematic Reviews was reviewed from January 2016 through May 2021 to identify pairs of original and updated reviews in which CoE assessments based on the Grading of Recommendations Assessment, Development and Evaluation (GRADE) method had changed. The difference in effect sizes between the original and updated reviews was assessed as a function of change in CoE and was reported as a ratio of odds ratios (ROR). The RORs generated in the studies that changed CoE from very low/low (VL/L) to moderate/high (M/H) were compared with those in studies that changed from M/H to VL/L. Heterogeneity and inconsistency were assessed using the τ and I² statistics. The change in precision of effect size estimates was assessed by calculating the ratio of standard errors (seR), and the absolute deviation in estimates of treatment effects was assessed with adjusted RORs (aRORs).
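The quantities named in the design can be illustrated with a minimal Python sketch. It is not the authors' analysis code. The abstract does not state the direction of the ratios or the formula behind the adjusted ROR, so the sketch assumes updated-over-original ratios and uses exp(|ln ROR|) as one plausible measure of absolute deviation. All function names and the input values in the example are hypothetical.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% CI


def ratio_of_odds_ratios(or_original, or_updated):
    """ROR as updated OR divided by original OR (direction is an assumption)."""
    return or_updated / or_original


def se_log_or_from_ci(lower, upper, z=Z_95):
    """Recover the standard error of ln(OR) from the bounds of a 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def se_ratio(se_original, se_updated):
    """seR as updated SE divided by original SE (direction is an assumption).

    Under this convention, values above 1 mean the updated estimate is
    less precise than the original one.
    """
    return se_updated / se_original


def absolute_ror(ror):
    """Deviation of the ROR from 1 regardless of direction: exp(|ln ROR|).

    This is an assumed definition; the abstract does not give its formula.
    """
    return math.exp(abs(math.log(ror)))


if __name__ == "__main__":
    # Hypothetical pair of reviews: OR with 95% CI in original and update.
    original = {"or": 0.80, "ci": (0.60, 1.07)}
    updated = {"or": 0.90, "ci": (0.75, 1.08)}

    ror = ratio_of_odds_ratios(original["or"], updated["or"])
    se_orig = se_log_or_from_ci(*original["ci"])
    se_upd = se_log_or_from_ci(*updated["ci"])

    print("ROR:", round(ror, 3))
    print("absolute ROR:", round(absolute_ror(ror), 3))
    print("seR:", round(se_ratio(se_orig, se_upd), 3))
```

With the absolute form, an ROR of 0.5 and an ROR of 2.0 count as the same size of change, so shifts in opposite directions do not cancel out when pooled.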

Results

Overall, 419 pairs of reviews were included, of which 414 (207 × 2) informed the CoE appraisal and 384 (192 × 2) the assessment of effect size. Certainty of evidence originally appraised as VL/L had 2.1 (95% CI, 1.19-4.12; P = .01) times higher odds of being changed in future studies than evidence with M/H CoE. However, the pooled effect size was not different when the CoE changed from VL/L to M/H (ROR, 1.02; 95% CI, 0.74-1.39) compared with M/H CoE changing to VL/L (ROR, 1.02; 95% CI, 0.44-2.37). Similarly, the aRORs overlapped between the VL/L to M/H and the M/H to VL/L subgroups (median [IQR], 1.12; 95% CI, 1.07-1.57 vs 1.21; 95% CI, 1.12-2.43). There was large inconsistency across ROR estimates (I² = 99%). Imprecision in treatment effects was larger when the CoE changed from VL/L to M/H (seR = 1.46) than when it changed from M/H to VL/L (seR = 0.72).
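The sketch below shows how an inconsistency figure such as I² = 99% can arise, by computing Cochran's Q, τ, and I² for a set of log RORs. It assumes inverse-variance weights and the DerSimonian-Laird estimator of between-study variance. The abstract does not say which estimator was used, and the function name and input values are hypothetical.

```python
import math


def heterogeneity(log_effects, std_errors):
    """Return (Q, tau, I2 in percent) for effects on the log scale.

    Uses inverse-variance weights and the DerSimonian-Laird moment
    estimator of tau^2 (an assumed choice, not taken from the abstract).
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    sum_w = sum(weights)

    # Fixed-effect pooled estimate, needed to compute Cochran's Q.
    pooled = sum(w * y for w, y in zip(weights, log_effects)) / sum_w
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_effects))
    df = len(log_effects) - 1

    # Between-study variance, truncated at zero.
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Share of total variability attributed to heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, math.sqrt(tau2), i2


if __name__ == "__main__":
    # Two hypothetical log RORs that disagree strongly relative to their SEs.
    q, tau, i2 = heterogeneity([0.0, 1.0], [0.1, 0.1])
    print(f"Q = {q:.1f}, tau = {tau:.2f}, I2 = {i2:.0f}%")
```

An I² near 100% means that almost all of the observed spread in RORs exceeds what sampling error alone would produce, so a pooled ROR close to 1 can coexist with large changes in individual review pairs.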

Conclusions

This study found that low CoE changed more often than high CoE. However, the effect size was not systematically different between studies with low vs high CoE, indicating the need to improve contemporary critical appraisal methods.

¹Beckman Research Institute, Department of Computational & Quantitative Medicine, City of Hope, Duarte, CA, USA, bdjulbegovic@coh.org; ²Michael G. DeGroote School of Medicine, McMaster University, Hamilton, ON, Canada; ³Department of Mathematics, Indiana University Northwest, Gary, IN, USA; ⁴Clinic of Orthodontics and Pediatric Dentistry, Center of Dental Medicine, University of Zurich, Zurich, Switzerland; ⁵Department of Clinical Research, University of Basel, Basel Institute for Clinical Epidemiology & Biostatistics, University Hospital Basel, Basel, Switzerland; ⁶Meta-Research Innovation Center at Stanford (METRICS), Stanford University, Stanford, CA, USA; ⁷Meta-Research Innovation Center Berlin (METRIC-B), Berlin Institute of Health, Berlin, Germany; ⁸Anesthesia Informatics and Media Lab, Stanford University, Stanford, CA, USA; ⁹Universidade Federal de São Paulo, Escola Paulista de Medicina, Brazil (Unifesp), São Paulo, Brazil; ¹⁰Department of Epidemiology and Quantitative Methods in Health, National School of Public Health, Fundação Oswaldo Cruz (FIOCRUZ)–Department of Epidemiology, Institute of Social Medicine, Universidade do Estado do Rio de Janeiro (UERJ), Rio de Janeiro, Brazil; ¹¹Department of Community and Preventive Dentistry, Faculty of Dentistry, Universidade do Estado do Rio de Janeiro (UERJ), Rio de Janeiro, Brazil; ¹²Department of Pharmaceutical Sciences, Universidade Federal de São Paulo (Unifesp), São Paulo, Brazil; ¹³Department of Medical Oncology and Therapeutics Research, City of Hope, Duarte, CA, USA; ¹⁴Centro Universitário São Camilo, São Paulo, Brazil; ¹⁵Center of Health Technology Assessment, Hospital Sirio-Libanês–São Paulo, Brazil; ¹⁶Department of Intensive Care and Emergency Medicine, Faculdade de Medicina de Petrópolis, Petrópolis, Rio de Janeiro, Brazil; ¹⁷Kellogg College, University of Oxford, Oxford, UK; ¹⁸Centre for Evidence-Based Medicine, Nuffield Department of Primary Care Health Sciences, Oxford University, Oxford, UK

Conflict of Interest Disclosures

None reported.

International Congress on Peer Review and Scientific Publication

Enhancing the quality and credibility of science
