The objective was to test and compare various methods to detect text duplication in peer reviews submitted by 2 or more reviewers.
Peer review fraud is a significant concern.1,2 A data set of peer review comments submitted to SAGE Publishing was analyzed to search for duplicate text, a possible sign of fake peer review.3 Peer review comments for each article peer reviewed by 19 SAGE Publishing journals were downloaded from the ScholarOne peer review management system and loaded into a Pandas DataFrame. Journals were chosen based on the availability of data; therefore, the data set should be considered biased. Similar comments were found using a number of search methods, including MinHash Locality Sensitive Hashing (MinHash LSH), for detecting near-duplicate text strings, and Elasticsearch, a scalable search engine, combined with RapidFuzz, a fast string-comparison library, for distinguishing similar from dissimilar comments.
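The two-stage idea described above (a cheap near-duplicate filter followed by a direct string comparison) can be sketched as follows. This is a minimal illustration using only the Python standard library, not the pipeline used in the study: the MinHash and banding code is written from scratch, `difflib.SequenceMatcher` stands in for RapidFuzz, and the shingle size, signature length, band count, similarity threshold, and all function names are assumptions chosen for the example.

```python
import hashlib
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

NUM_PERM = 64  # signature length (assumed value)
BANDS = 16     # number of LSH bands; rows per band = NUM_PERM // BANDS (assumed)


def shingles(text, k=3):
    """Return the set of k-word shingles of a lowercased comment."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def minhash(shingle_set):
    """Keep the minimum of each salted hash; matching minima estimate Jaccard similarity."""
    signature = []
    for seed in range(NUM_PERM):
        salt = seed.to_bytes(8, "big")
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(s.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "big",
            )
            for s in shingle_set
        ))
    return signature


def candidate_pairs(comments):
    """Bucket signatures band by band; comments sharing any bucket become candidates."""
    rows = NUM_PERM // BANDS
    buckets = defaultdict(set)
    for comment_id, text in comments.items():
        sig = minhash(shingles(text))
        for band in range(BANDS):
            key = (band, tuple(sig[band * rows:(band + 1) * rows]))
            buckets[key].add(comment_id)
    pairs = set()
    for ids in buckets.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


def similar_pairs(comments, threshold=0.8):
    """Confirm each LSH candidate with a character-level similarity ratio."""
    confirmed = []
    for a, b in sorted(candidate_pairs(comments)):
        ratio = SequenceMatcher(None, comments[a], comments[b]).ratio()
        if ratio >= threshold:
            confirmed.append((a, b, round(ratio, 2)))
    return confirmed
```

Called on a dictionary that maps hypothetical comment identifiers to comment text, `similar_pairs` returns only the pairs that survive both stages. The banding step keeps the number of expensive pairwise comparisons small, which is the reason for pairing a hashing filter with a string-comparison library in the first place.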
Of 62,974 peer reviewer accounts used to evaluate 66,815 articles, 357 accounts (0.6%) were identified that produced reviews with partially or fully duplicated comments. One large cluster of 47 accounts shared a number of reports and had reviewed a number of articles that were rejected because of suspected paper mill activity. This overlap suggests that the cluster may represent 47 fake reviewer accounts administered by a paper mill. In total, 972 articles (1.5%) had reviews from reviewer accounts associated with duplicate commenting activity, and 77 articles had reviews from the 47 suspected paper mill accounts (Figure 33). Different search methods identified different suspect accounts and clusters. These searches included (1) a search for exact duplicates, which took 16 seconds to load data into memory and less than 1 second to execute and found 29 accounts that had produced duplicate comments; and (2) a search for similar comments using Elasticsearch, which took 18 minutes and 29 seconds to index and 9 hours, 19 minutes, and 2 seconds to execute and found 204 accounts that had produced similar comments.
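The exact-duplicate search and the grouping of accounts into clusters can be illustrated with a short standard-library sketch. It is not the code used in the study: the `(account_id, article_id, comment)` record layout, the whitespace and case normalization, and the function names are assumptions made for the example, and accounts are clustered by treating any shared comment as a link and taking connected components.

```python
import hashlib
from collections import defaultdict


def normalize(text):
    """Collapse whitespace and case so trivially reformatted copies still match."""
    return " ".join(text.lower().split())


def exact_duplicate_groups(reviews):
    """Return the sets of accounts that submitted the same normalized comment.

    `reviews` is an iterable of (account_id, article_id, comment) tuples;
    this layout is hypothetical.
    """
    accounts_by_digest = defaultdict(set)
    for account_id, _article_id, comment in reviews:
        digest = hashlib.sha256(normalize(comment).encode("utf-8")).hexdigest()
        accounts_by_digest[digest].add(account_id)
    # Only comments submitted by 2 or more distinct accounts are of interest.
    return [accts for accts in accounts_by_digest.values() if len(accts) > 1]


def cluster_accounts(groups):
    """Merge accounts linked by any shared comment into connected clusters."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for group in groups:
        first, *rest = sorted(group)
        find(first)
        for other in rest:
            parent[find(other)] = find(first)

    clusters = defaultdict(set)
    for account in list(parent):
        clusters[find(account)].add(account)
    # Largest cluster first, members in sorted order.
    return sorted((sorted(c) for c in clusters.values()), key=len, reverse=True)
```

Hashing each normalized comment once and grouping by digest takes a single pass over the data, which is consistent with an exact-match search finishing in under a second once the comments are in memory. As the conclusions note, a cluster produced this way is only a lead for further investigation, not evidence of misconduct by itself.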
Efficient methods for identifying possible peer review fraud and paper mill activity were described. The methods should be tested on broader peer review sets and settings. When duplication is found, the findings must be considered in context before a judgment can be made about whether there is misconduct.
1. Misra DP, Ravindran V, Agarwal V. Integrity of authorship and peer review practices: challenges and opportunities for improvement. J Korean Med Sci. 2018;33(46):e287. doi:10.3346/jkms.2018.33.e287
2. Cohen A, Pattanaik S, Kumar P, et al. Organised crime against the academic peer review system. Br J Clin Pharmacol. 2016;81(6):1012-1017. doi:10.1111/bcp.12992
3. Dadkhah M, Kahani M, Borchardt G. A method for improving the integrity of peer review. Sci Eng Ethics. 2018;24(5):1603-1610. doi:10.1007/s11948-017-9960-9
1SAGE Publishing, London, UK, email@example.com
Conflict of Interest Disclosures