Abstract
Natural Language Processing to Assess the Role of Preprints in COVID-19 Policy Guidance
Nicholas G. Evans,1 Samuel Angelli-Nichols,1 Emma Chang-Rabley,2 Yara Omar,3 Rachel Nas,4 Mikaela Finnegan,4 Rocco Casagrande,4 Emily E. Ricotta5
Objective
To understand the role of preprint copies of scientific papers in public health policy guidance during the COVID-19 pandemic and the potential effects of changes in published versions of record (postprints) on that guidance.1,2
Design
We extracted preprint citations from the Department of Homeland Security Master Question List from February 12, 2020, to February 9, 2021, and the National Institute for Occupational Safety and Health weekly COVID-19 Report from November 6, 2020, to September 17, 2021. The abstract text of each preprint was parsed and compared with that of its postprint using a neural conditional random field model for sentence alignment.3 Identified sentence-level differences were reviewed and adjudicated by the research team. Substantive changes were categorized as nonsignificant numerical changes, contextual updates, or significant changes. Significant changes were compared against the text of the policy guidance to determine whether the postprint result would have led to changed guidance and whether guidance was updated after publication.
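The comparison step described in the Design can be illustrated with a minimal sketch. It does not use the neural conditional random field aligner cited above; as a stand-in, it pairs sentences by character-level similarity with Python's standard-library `difflib`. The function names, the naive sentence splitter, and the 0.6 similarity threshold are illustrative assumptions, not part of the study's pipeline.

```python
import re
from difflib import SequenceMatcher


def split_sentences(text):
    """Naive splitter (assumption): break after ., ?, or ! followed by whitespace."""
    parts = re.split(r"(?<=[.?!])\s+", text.strip())
    return [p.strip() for p in parts if p.strip()]


def flag_changes(preprint_abstract, postprint_abstract, threshold=0.6):
    """Pair each preprint sentence with its most similar postprint sentence.

    Returns (preprint_sentence, best_match_or_None, similarity) for every
    preprint sentence that does not appear verbatim in the postprint.
    The threshold is a hypothetical cut-off below which a sentence is
    treated as having no counterpart.
    """
    pre = split_sentences(preprint_abstract)
    post = split_sentences(postprint_abstract)
    flagged = []
    for sentence in pre:
        best, best_score = None, 0.0
        for candidate in post:
            score = SequenceMatcher(None, sentence, candidate).ratio()
            if score > best_score:
                best, best_score = candidate, score
        if best == sentence:
            continue  # unchanged sentence, nothing to review
        if best_score < threshold:
            best = None  # no plausible counterpart in the postprint
        flagged.append((sentence, best, round(best_score, 2)))
    return flagged


if __name__ == "__main__":
    pre = "We enrolled 100 patients. Mortality was 10%."
    post = "We enrolled 100 patients. Mortality was 12%."
    for old, new, score in flag_changes(pre, post):
        print(f"{score}: {old!r} -> {new!r}")
```

In a sketch like this, the flagged pairs would then go to human reviewers, mirroring the adjudication step in the Design; the similarity score only orders candidates and makes no judgment about significance.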
Results
Our algorithm had a sensitivity of approximately 100% in detecting significant sentence-level changes when compared against a human adjudicator. Of 600 preprints extracted, 486 (81.0%) had associated postprints published during the study period. Our sample had a higher publication rate than global estimates of COVID-19 preprints (21.1%; odds ratio, 3.53; 95% CI, 3.10-4.02). Over time, guidance was slightly more likely to incorporate new postprints than preprints (odds ratio, 1.02; 95% CI, 1.01-1.03). Our model flagged 10,483 sentence-level changes across 464 preprints for review. Of these 464 papers, 329 (70.9%) required adjudication for the significance of their changes and 105 (22.6%) contained changes our team found could potentially change policy guidance. Significant changes included, among others, conclusions about the effectiveness of clinical interventions or nonpharmaceutical interventions to slow the spread of COVID-19, basic epidemiological properties, and the relative efficacy of vaccine candidates against variants of concern. Of the 105 papers with significant changes, 44 (41.9%) impacted policy guidance. Guidance that was impacted by significant changes tended to concern therapeutics and public health measures and, less often, changes resulting from the emergence of variants. For the remaining papers, the most common reason was that the guidance lacked the granularity to reflect the kind of change in the paper’s history. The next most common reason was a tendency toward conservative public health guidance when presented with competing evidence sources, which meant a single paper’s change did not alter a recommendation. Guidance was updated for only 9 papers containing policy-relevant changes.
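The percentages reported above follow directly from the stated counts. The short check below recomputes them; the labels and the helper name `pct` are illustrative, and only counts given in the Results are used.

```python
def pct(numerator, denominator):
    """Percentage rounded to one decimal place, as reported in the abstract."""
    return round(100 * numerator / denominator, 1)


# (label, numerator, denominator, percentage reported in the Results)
reported = [
    ("preprints with a postprint", 486, 600, 81.0),
    ("flagged papers requiring adjudication", 329, 464, 70.9),
    ("flagged papers with significant changes", 105, 464, 22.6),
    ("significant papers that impacted guidance", 44, 105, 41.9),
]

for label, num, den, claimed in reported:
    computed = pct(num, den)
    status = "matches" if computed == claimed else "differs"
    print(f"{label}: {num}/{den} = {computed}% ({status} reported {claimed}%)")
```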
Conclusions
Preprints can provide critical early information to policymakers but can generate unclear or misleading advice in key areas that may persist if not subject to clear and timely updates. While preprints in our sample tended to be of publishable quality, more work is needed to establish how to ensure this quality in future outbreak response.
References
1. Nelson L, Ye H, Schwenn A, Lee S, Arabi S, Hutchins BI. Robustness of evidence reported in preprints during peer review. Lancet Global Health. 2022;10(11):e1684-e1687. doi:10.1016/S2214-109X(22)00368-0
2. Collins A, Alexander R. Reproducibility of COVID-19 pre-prints. Scientometrics. 2022;127(8):4655-4673. doi:10.1007/s11192-022-04418-2
3. Jiang C, Maddela M, Lan W, Zhong Y, Xu W. Neural CRF model for sentence alignment in text simplification. arXiv. Preprint posted online May 5, 2020. doi:10.48550/arXiv.2005.02324
1University of Massachusetts Lowell, Lowell, MA, US, nicholas_evans@uml.edu; 2Emory University, Atlanta, GA, US; 3Boston University, Boston, MA, US; 4Deloitte, Rosslyn, VA, US; 5Uniformed Services University of the Health Sciences, Bethesda, MD, US.
Conflict of Interest Disclosures
None reported.
Funding/Support
The initial funding of this work was provided through the Intramural Division of the National Institute of Allergy and Infectious Diseases. Nicholas G. Evans received funding from the Greenwall Foundation and from the US Air Force Office of Scientific Research, which provided time for the conduct of this research.
Role of Funder/Sponsor
The funders had no role in the design, conduct, or reporting of this work.
Disclaimer
Emily E. Ricotta is an employee of the US Department of Defense. This work does not represent the views of her organization or of the US government.
Additional Information
Emily E. Ricotta is a co–corresponding author (emily.ricotta@usuhs.edu).
