Abstract

Degree of Text Similarity and Prevalence of Potential Plagiarism in Biomedical Research Articles According to Linguistic Background and Field of Study

Joon Seo Lim,1 Danielle A. Lee,1 Sung-Han Kim,2 Tae Won Kim3

Objective

Text similarity detection software is widely used by biomedical journals to screen submitted manuscripts for potential plagiarism, with some journals rejecting manuscripts with high overall similarity scores (eg, >40%) without further review. However, considering that overall scores may be vulnerable to false positives resulting from common phrases, certain guidelines suggest examining the single-source scores to detect potential plagiarism.1 The degree of text similarity and the prevalence of potential plagiarism in biomedical articles were examined according to linguistic background (English-speaking vs non–English-speaking) and field of study (clinical vs nonclinical).

Design

This cross-sectional study was performed in June 2020 and followed the STROBE reporting guideline. We analyzed the iThenticate similarity reports of 480 articles randomly selected from an open access multidisciplinary journal, PLOS ONE. The articles were categorized by 8 preselected countries as English-speaking (USA, UK, Canada, Australia) vs non–English-speaking (Korea, China, France, Italy) and by 6 fields of study as clinical (cardiology, gastroenterology, oncology) vs nonclinical (molecular biology, genetics, microbiology). The degree of text similarity was defined as the overall iThenticate score, and the presence of potential plagiarism was defined as either (1) a single-source score of greater than 10% according to the Springer Nature guideline1 or (2) an overall score of greater than 40%, which is a cutoff used at some journals for considering editorial actions.2,3 The similarity scores in each manuscript section were measured by calculating the proportion of highlighted text in each section using ImageJ.
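The two definitions of potential plagiarism above can be sketched in a few lines of Python. This is only an illustration of the cutoffs as stated in the text, not the authors' analysis code; the `SimilarityReport` class, its field names, and the `potential_plagiarism_flags` function are hypothetical and do not correspond to any iThenticate export format.

```python
from dataclasses import dataclass

# Cutoffs taken from the Design section; both are strict "greater than" thresholds.
SINGLE_SOURCE_CUTOFF = 10.0  # percent, per the Springer Nature guideline cited in the text
OVERALL_CUTOFF = 40.0        # percent, a cutoff used at some journals


@dataclass
class SimilarityReport:
    """Hypothetical container for the two scores read from one similarity report."""
    overall_score: float            # overall similarity score, in percent
    max_single_source_score: float  # highest single-source score, in percent


def potential_plagiarism_flags(report: SimilarityReport) -> dict:
    """Evaluate each criterion separately, since prevalence is reported per criterion."""
    return {
        "single_source": report.max_single_source_score > SINGLE_SOURCE_CUTOFF,
        "overall": report.overall_score > OVERALL_CUTOFF,
    }


if __name__ == "__main__":
    # Made-up scores for illustration only; not data from the study.
    example = SimilarityReport(overall_score=41.0, max_single_source_score=5.0)
    print(potential_plagiarism_flags(example))
```

Keeping the two criteria as separate flags mirrors the way the abstract reports a prevalence for each cutoff rather than a single combined figure.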

Results

The degree of text similarity differed significantly among countries, with articles from non–English-speaking countries having higher scores than those from English-speaking countries (30.9% vs 23.8%, respectively; P < .001) (Table 39). Among the non–English-speaking countries, there was no significant difference in the degree of text similarity between Asian and European countries (31.7% vs 30.1%, respectively; P = .27). Text similarity also differed among fields of study, with clinical articles having higher scores than nonclinical articles (29.5% vs 25.2%, respectively; P < .001). Section-level measurement showed that the Methods section had the highest degree of text similarity among manuscript sections. The overall prevalence of potential plagiarism was 13.5% (65/480) and 13.8% (66/480) according to the single-source score cutoff of greater than 10% and the overall score cutoff of greater than 40%, respectively. Except for the lower prevalence of potential plagiarism in English-speaking countries than in non–English-speaking countries according to the overall score cutoff (5.4% vs 22.1%, respectively; P < .001), no statistically significant differences were noted between English-speaking and non–English-speaking countries, between Asian and European countries, or between clinical and nonclinical articles.
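The overall prevalence figures follow directly from the reported counts. As a brief arithmetic check (the helper name `prevalence_percent` is hypothetical and used only for illustration):

```python
def prevalence_percent(flagged: int, total: int) -> float:
    """Return the share of flagged articles as a percentage of all articles."""
    if total <= 0:
        raise ValueError("total must be a positive number of articles")
    return 100.0 * flagged / total


if __name__ == "__main__":
    # Counts as reported in the Results: 65 and 66 of 480 articles.
    single_source = prevalence_percent(65, 480)
    overall = prevalence_percent(66, 480)
    print(f"single-source cutoff: {single_source:.2f}%")
    print(f"overall cutoff: {overall:.2f}%")
```

Here 65/480 is about 13.54% and 66/480 is exactly 13.75%, which correspond to the 13.5% and 13.8% reported above after rounding to 1 decimal place.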

Conclusions

While the degree of text similarity differed significantly according to linguistic background and field of study, the prevalence of potential plagiarism was similar across countries and fields of study. Clinical researchers in non–English-speaking countries in particular may benefit from receiving English-language writing education to avoid unintended text similarity.

References

1. Springer. Plagiarism prevention with CrossCheck. Accessed February 24, 2022. https://www.springer.com/gp/authors-editors/editors/plagiarism-prevention-with-crosscheck/4238

2. IEEE Robotics & Automation Society. Information for IROS editors. Accessed June 14, 2022. https://www.ieee-ras.org/conferences-workshops/financially-co-sponsored/iros/information-for-editors

3. ARRUS Journal of Mathematics and Applied Science. Plagiarism policy. Accessed June 14, 2022. https://jurnal.ahmar.id/index.php/mathscience/plagiarism

1Scientific Publications Team, Clinical Research Center, Asan Institute for Life Sciences, Asan Medical Center, University of Ulsan College of Medicine, Seoul, South Korea; 2Department of Infectious Diseases, Asan Medical Center, University of Ulsan College of Medicine, Seoul, South Korea; 3Department of Oncology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, South Korea; twkimmd@amc.seoul.kr

Conflict of Interest Disclosures

None reported.

Funding/Support

This work was supported by grant 2019-781 from the Asan Institute for Life Sciences at Asan Medical Center, Seoul, South Korea.

Role of the Funder/Sponsor

The funder had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the abstract; and decision to submit the abstract for presentation.

Poster