Abstract
Evidence of Use of Template-Based Peer Review Reports and Concern About Review Mills
Cyril Labbé,1 Gilles Hubert,2 Wendeline Swart,2 Guillaume Cabanac2,3
Objective
Paper mills are well documented, and the existence of review mills has been suggested.1 Our aim was to test tools and methods that could detect evidence of such mills. We studied 4 datasets of peer review reports and found evidence of template-based peer review report practices.
Design
Attempting to maximize diversity, we collected 4 datasets totaling 148,159 peer review reports. We used web-scraped peer review reports from MDPI journals (11 journals from 2018-2025: 47,593 articles and 122,831 reports), BMJ (97 articles from 2021-2024: 308 reports), and PeerJ journals listed in the Multidisciplinary Open Peer Review Dataset (MOPRD)2 (7 journals from 2015-2022: 6292 articles and 12,959 reports), as well as the publicly available NLPeer3 dataset (12,061 reports from 5 natural language processing conferences held in 2016, 2017, 2020, and 2022, and from the F1000Research platform in 2022). We analyzed only plain-text reports (excluding attached files) from round 1 of review. We computed statistics on report length, common sequences of terms (CSTs), and similarity measures between reports.
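For illustration only, the following minimal Python sketch (not the authors' actual pipeline; the stop-word list and tokenization are assumptions) shows one way to detect whether two reports share a common sequence of at least 10 terms, by intersecting their sets of 10-term subsequences after optional stop-word removal.

import re

# Hypothetical stop-word list; the study's actual list is not specified.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are", "that", "this"}

def tokens(text, exclude_stop_words=True):
    # Lowercase and keep alphabetic tokens only (an assumed tokenization).
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if not (exclude_stop_words and w in STOP_WORDS)]

def shared_sequences(report_a, report_b, min_len=10, exclude_stop_words=True):
    # Return the term sequences of length min_len occurring in both reports.
    def ngrams(toks):
        return {tuple(toks[i:i + min_len]) for i in range(len(toks) - min_len + 1)}
    return ngrams(tokens(report_a, exclude_stop_words)) & ngrams(tokens(report_b, exclude_stop_words))

# Two reports would be flagged as sharing a CST when shared_sequences(r1, r2) is nonempty;
# overlapping 10-grams can then be merged into maximal common sequences for inspection.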
Results
Depending on the dataset, mean report length ranged from 251 to 530 words and median length from 197 to 433 words. Reports with fewer than 20 words (1 to 2825 reports, depending on the dataset) mostly came from resubmissions inaccurately labeled as round 1 submissions. Regarding CSTs, 8% to 12% of reports shared common sequences of at least 10 terms with another report when stop words were excluded (11.16% to 24.72% when stop words were included). This represented 15% to 28% of articles having at least 1 such report (CSTs excluding stop words). Regarding similarity, approximately 0.5% to 1.3% of reports were highly similar to another report (ie, cosine similarity excluding stop words greater than 0.75), representing 0.8% to 3% of articles having at least 1 such report. After automatically highlighting common CSTs in reports sharing more than 100 words, we analyzed these reports qualitatively to identify potential templates used by reviewers. We identified very generic text chunks reused from report to report, sometimes by different identifiable reviewers.
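For illustration, the similarity flagging described above can be sketched as follows (assumptions: an English stop-word list and TF-IDF weighting, neither of which is specified here); pairs of reports with cosine similarity greater than 0.75, computed on stop-word-free text, are returned.

from itertools import combinations
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def highly_similar_pairs(reports, threshold=0.75):
    # Vectorize reports with stop words removed, then keep report pairs above the threshold.
    vectors = TfidfVectorizer(stop_words="english").fit_transform(reports)
    sims = cosine_similarity(vectors)
    return [(i, j, float(sims[i, j]))
            for i, j in combinations(range(len(reports)), 2)
            if sims[i, j] > threshold]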
Conclusions
Our results showed evidence of template-based peer review report practices. The extent of these practices varied from one dataset to another, which might be due to reviewers' practices, discrepancies in dataset sizes, differences in scientific fields, or the habits of journals, editors, or publishers. More openly available review report datasets would help further characterize and understand this phenomenon.
References
1. Oviedo-García MÁ. The review mills, not just (self-)plagiarism in review reports, but a step further. Scientometrics. 2024;129:5805-5813. doi:10.1007/s11192-024-05125-w
2. Lin J, Song J, Zhou Z, Chen Y, Shi X. MOPRD: a multidisciplinary open peer review dataset. Neural Comput Appl. 2023;35:24191-24206. doi:10.1007/s00521-023-08891-5
3. Dycke N, Kuznetsov I, Gurevych I. NLPeer: a unified resource for the computational study of peer review. arXiv. Preprint posted online November 12, 2022. doi:10.18653/v1/2023.acl-long.277
1Univ. Grenoble Alpes, CNRS, Grenoble INP, Laboratoire d’Informatique de Grenoble (LIG), SIGMA team, Domaine Universitaire de Saint-Martin-d’Hères, Grenoble, France, cyril.labbe@imag.fr; 2IRIT, Université de Toulouse, CNRS, Toulouse INP, UT, IRIS team, Toulouse, France; 3Institut Universitaire de France (IUF), Paris, France.
Conflict of Interest Disclosures
None reported.
Funding/Support
Cyril Labbé and Guillaume Cabanac acknowledge the NanoBubbles project (https://nanobubbles.hypotheses.org/), which received Synergy grant funding from the European Research Council (ERC) within the European Union’s Horizon 2020 program (grant agreement 951393). Guillaume Cabanac received funding from the Institut Universitaire de France (IUF).
Additional Information
Cyril Labbé and Gilles Hubert (gilles.hubert@irit.fr) are co–corresponding authors.