A Machine Learning–Powered Literature Surveillance Approach to Identify High-Quality Studies From PubMed in Disease Areas With Low Volume of Evidence

Patricia L. Kavanagh,1 Tamara Navarro-Ruan,2 Peter LaVita,1 Rick Parrish,1 Alfonso Iorio2,3


The DynaMed Systematic Literature Surveillance process surveys a large set of clinical journals most likely to contain high-quality, high-relevance content on treatment, diagnosis, and prognosis across all medical conditions. For many conditions, limited content is retrieved from those journals. Therefore, a machine learning–powered process was designed, implemented, and tested to efficiently and accurately identify relevant articles published across all journals indexed in PubMed.1-3 This study reports the overall performance of this machine learning–augmented surveillance system.


Content-based search strategies were developed by a medical librarian. References retrieved from PubMed were ranked by a LightGBM machine learning algorithm according to the predicted probability that they reported high-quality, clinically relevant evidence.1 Top-ranked references were included for screening, stratified by publication date (<18 months or ≥18 months). Clinical experts trained in critical appraisal of the literature manually screened the references and identified those to be used for updating the topic. The following metrics were used to evaluate the machine learning system: median probability assigned by machine learning to the 15 highest-ranked references, overall and by topic; total and median number of references retrieved by topic; and median position of the first selected reference in the probability-ranked list compared with its position in PubMed reference lists sorted by most recent and by best match.
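The position metric described above can be illustrated with a minimal sketch. It assumes each topic provides a model probability per reference, an alternative PubMed ordering, and the set of references selected by the screeners. The function names, the data layout, and all values are hypothetical and chosen only for illustration; this is not the study's code or data.

```python
from statistics import median


def rank_by_probability(probabilities):
    """Return reference IDs ordered from highest to lowest probability."""
    return sorted(probabilities, key=probabilities.get, reverse=True)


def first_selected_position(ordered_ids, selected_ids):
    """Return the 1-based position of the first selected reference, or None."""
    for position, ref_id in enumerate(ordered_ids, start=1):
        if ref_id in selected_ids:
            return position
    return None


# Hypothetical topics: invented probabilities, orderings, and selections.
topics = [
    {
        "probabilities": {"a": 0.91, "b": 0.40, "c": 0.75, "d": 0.05},
        "most_recent": ["d", "b", "a", "c"],
        "selected": {"c"},
    },
    {
        "probabilities": {"e": 0.10, "f": 0.62, "g": 0.33},
        "most_recent": ["e", "g", "f"],
        "selected": {"f", "g"},
    },
]

ml_positions = []
recent_positions = []
for topic in topics:
    ml_order = rank_by_probability(topic["probabilities"])
    ml_positions.append(first_selected_position(ml_order, topic["selected"]))
    recent_positions.append(
        first_selected_position(topic["most_recent"], topic["selected"])
    )

# Per-topic difference in position, then medians across topics.
differences = [recent - ml for recent, ml in zip(recent_positions, ml_positions)]
median_ml = median(ml_positions)
median_recent = median(recent_positions)
median_difference = median(differences)
```

Because the difference is computed within each topic before the median is taken, the median difference need not equal the difference between the two median positions.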


As of May 2022, results were reviewed for 332 topics. Of 91,009 articles identified, the 8406 (9.2%) with the highest probability ranking were manually screened, and 576 references (6.9%) were selected to update 241 topics. The median number of references retrieved by topic was 184 (range, 7-3638). The median probability assigned to the 576 references was 0.047 (range, 0.002-0.996), and the median probability by topic was 0.079 (range, 0.047-0.803). The median position of the first selected reference was 2 for machine learning vs 9 for the PubMed most recent strategy and 20 for the PubMed best match strategy. Overall, the median difference in position was 22 for machine learning vs the PubMed most recent strategy and 54.5 for machine learning vs the PubMed best match strategy. The 241 topics were distributed among 29 specialties, with pediatrics and infectious diseases accounting for 27%. The most common article type selected was cohort study (29%).
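The proportions reported above can be checked with simple arithmetic on the stated counts. The sketch below uses only those counts. The last two quantities (share of reviewed topics updated, and references screened per selected reference) are derived here for illustration and are not reported in the abstract; the helper name `percent` is an assumption.

```python
# Counts as stated in the Results paragraph.
retrieved = 91_009
screened = 8_406
selected = 576
topics_reviewed = 332
topics_updated = 241


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


screened_pct = percent(screened, retrieved)   # share of retrieved articles screened
selected_pct = percent(selected, screened)    # share of screened references selected

# Derived values, not reported in the abstract.
topics_updated_pct = percent(topics_updated, topics_reviewed)
screened_per_selected = round(screened / selected, 1)

print(screened_pct, selected_pct, topics_updated_pct, screened_per_selected)
```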


This study provides precise estimates of the performance of a regression-based machine learning algorithm in assisting literature surveillance for topics with a low volume of evidence.


1. Abdelkader W, Navarro T, Parrish R, et al. A deep learning approach to refine the identification of high-quality clinical research articles from the biomedical literature: protocol for algorithm development and validation. JMIR Res Protoc. 2021;10(11):e29398. doi:10.2196/29398

2. Del Fiol G, Michelson M, Iorio A, Cotoi C, Haynes RB. A deep learning method to automatically identify reports of scientifically rigorous clinical research from the biomedical literature: comparative analytic study. J Med Internet Res. 2018;20(6):e10281. doi:10.2196/10281

3. Abdelkader W, Navarro T, Parrish R, et al. Machine learning approaches to retrieve high-quality, clinically relevant evidence from the biomedical literature: systematic review. JMIR Med Inform. 2021;9(9):e30401. doi:10.2196/30401

1DynaMed, EBSCO Health, Ipswich, MA, USA; 2Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, ON, Canada, iorioa@mcmaster.ca; 3Department of Medicine, McMaster University, Hamilton, ON, Canada

Conflict of Interest Disclosures

None reported.