Comparison of Reports of Epidemiology Studies Posted as bioRxiv Preprints and Published in Peer Reviewed Journals

Mario Malički,1 Ana Jerončić,2 Gerben ter Riet,3,4 Lex Bouter,5,6 John P. A. Ioannidis,1,7,8,9,10 IJsbrand Jan Aalbersberg,11 Steven N. Goodman1,7,8

Objective

Previous studies showed high levels of similarity between preprints and their subsequent peer-reviewed journal publications.1,2 The goal of this study was to analyze the extent of similarity for preprint-publication pairs in the field of epidemiology.

Design

This cohort study documented differences between bioRxiv epidemiology preprints that had only 1 preprint version and their subsequent journal versions. Preprints were classified as “epidemiology” by their submitting authors. From inception of the preprint server through December 31, 2019, there were 622 such preprints. A sample size calculation for precision, using a 95% confidence level and an 8% margin of error, yielded a requirement of 121 preprints, which were then randomly sampled from the 622. Changes within each preprint-publication pair were highlighted using the Microsoft Word Compare function (“compare two versions of a document”). All changes were noted and classified.
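
The abstract does not state which formula was used for the sample size calculation. As an illustrative sketch only, the figure of 121 can be reproduced with Cochran's formula for a proportion plus a finite population correction, assuming the most conservative expected proportion of 0.5 and rounding to the nearest integer. The function name and the choice of p = 0.5 are assumptions, not details taken from the study.

```python
from statistics import NormalDist


def sample_size_for_proportion(population, margin, confidence=0.95, p=0.5):
    """Cochran sample size for a proportion, with finite population correction.

    Assumptions (not stated in the abstract): expected proportion p = 0.5
    and simple random sampling from a finite population.
    """
    # Two-sided critical value of the standard normal distribution
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Sample size for an effectively infinite population
    n0 = z ** 2 * p * (1 - p) / margin ** 2
    # Shrink n0 to account for the finite population being sampled
    return n0 / (1 + (n0 - 1) / population)


n = sample_size_for_proportion(population=622, margin=0.08)
print(f"required sample size: {n:.2f} (nearest integer: {round(n)})")
```

Under these assumptions the result is just above 121, which matches the reported requirement when rounded to the nearest integer; rounding up instead would give 122.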

Results

The 121 bioRxiv epidemiology preprints were later published in 73 different journals (median [IQR] impact factor, 4 [2.9-6.9]) with a median (IQR) time from preprint to publication of 204 (131-243) days. Of the 121 pairs, 31 (26%) had differences in their titles, 8 (7%) in the number or order of authors, 31 (26%) in the number of tables, 28 (23%) in the number of figures, 102 (84%) in the number of references, 54 (44%) in acknowledgment descriptions, 74 (61%) in conflict of interest declarations, and 49 (40%) in data sharing statements. Regarding main content, 109 (90%) had changes in the abstract, 7 of which (6%) reported different P values; 106 (88%) had changes in the introduction section, 37 of which (31%) altered descriptions of their objectives; 120 (99%) had changes in their methods section, 9 of which (7%) had changes in their sample size; 115 (95%) had changes in their results section, with 82 (68%) adding or removing results or parts of results; and 116 (96%) had changes in their discussion sections, with 65 (54%) adding limitations in their journal versions and 12 (10%) exhibiting substantive changes to main results in the first sentence of their discussion (Table 18).
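
Because the percentages above come from a sample of 121 pairs, each carries sampling uncertainty that the abstract does not report. The sketch below shows one way a reader might attach approximate 95% intervals to a few of the reported counts. The Wilson score interval is an assumed choice rather than the authors' method, and it ignores the finite population of 622, so the intervals are somewhat conservative.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(events, n, confidence=0.95):
    """Wilson score interval for a binomial proportion (assumed method)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = events / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half_width, center + half_width


# Counts taken from the results above; the labels are shorthand for this sketch.
counts = {
    "results added or removed": 82,
    "limitations added": 65,
    "substantive change to main results": 12,
}

for label, events in counts.items():
    low, high = wilson_interval(events, 121)
    print(f"{label}: {events}/121 = {events / 121:.0%} "
          f"(95% CI, {low:.0%} to {high:.0%})")
```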

Conclusions

In this sample, almost all aspects of epidemiology preprints changed at least slightly in their journal publication versions, and 10% of preprint-publication pairs showed substantive changes to their main results. Further research is needed to determine who requested those changes and why, whether changes were associated with the quality of the study or the expertise of those requesting them, and whether changes led to increases in validity, transparency, or readability.

References

1. Klein M, Broadwell P, Farb SE, Grappone T. Comparing published scientific journal articles to their pre-print versions. Int J Digital Librar. 2019;20:335-350. doi:10.1007/s00799-018-0234-1

2. Shi X, Ross JS, Amancharla N, Niforatos JD, Krumholz HM, Wallach JD. Assessment of concordance and discordance among clinical studies posted as preprints and subsequently published in high-impact journals. JAMA Netw Open. 2021;4(3):e212110. doi:10.1001/jamanetworkopen.2021.2110

1Meta-Research Innovation Center at Stanford (METRICS), Stanford University, Stanford, CA, USA, mario.malicki@mefst.hr; 2Department of Research in Biomedicine and Health, University of Split School of Medicine, Split, Croatia; 3Urban Vitality Centre of Expertise, Amsterdam University of Applied Sciences, Amsterdam, the Netherlands; 4Amsterdam University Medical Centers, Department of Cardiology, Amsterdam, the Netherlands; 5Department of Philosophy, Faculty of Humanities, Vrije Universiteit, Amsterdam, the Netherlands; 6Amsterdam University Medical Centers, Department of Epidemiology and Data Science, Amsterdam, the Netherlands; 7Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA; 8Department of Epidemiology and Population Health, Stanford University School of Medicine, Stanford, CA, USA; 9Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA, USA; 10Department of Statistics, Stanford University School of Humanities and Sciences, Stanford, CA, USA; 11Elsevier, Amsterdam, the Netherlands

Conflict of Interest Disclosures

IJsbrand Jan Aalbersberg is senior vice president of research integrity at Elsevier. Mario Malički is a co–editor in chief of Research Integrity and Peer Review. Lex Bouter, John P. A. Ioannidis, and Steven N. Goodman are members of the Peer Review Congress Advisory Board but were not involved in the review or decision for this abstract.

Funding/Support

Elsevier funding was awarded to Stanford University for a Meta-Research Innovation Center at Stanford postdoctoral position that supported Mario Malički’s work on the project.

Role of the Funder/Sponsor

IJsbrand Jan Aalbersberg is an employee of Elsevier and had a role in the design and conduct of the study; management and interpretation of the data; review and approval of the abstract; and decision to submit the abstract for presentation.