Abstract

Identifying Potential Duplicate Publications in the Scientific Literature Using Crossref

Cyril Labbé,1 Qinyue Liu,1 Amira Barhoumi,1 Olessya Miroshnichenko1

Objective

For various reasons, duplicate publications are often considered problematic.1,2 We tested a method to semiautomatically detect such publications using publicly available metadata registered with Crossref by publishers.3

Design

We first queried Crossref using ISSNs or ISBNs to build a first set of randomly selected digital object identifiers (DOIs) with registered abstracts, for which duplicates were to be searched. For each DOI in this set, Crossref was queried again to retrieve a second set of 200 DOIs with similar titles, which could contain duplicates of the publication in the first set. We then identified abstracts in the second set that were nearly identical to those in the first set, differing by only a few words. When the DOIs of apparent duplicates were not registered by preprint platforms (eg, bioRxiv, medRxiv), we automatically checked whether the DOIs were registered by a different publisher, had differences in authorship, or were published in journals or books. The procedure was applied to the following sets of DOIs, chosen to represent a wide variety of publishers that register abstracts with Crossref: Science publications (2023-2024), PLOS One (2023-2024), International Journal of Molecular Sciences (2024), BioMed Research International (2024), Scientific Reports (2024), and IGI Global (DOIs included in books published in 2024).
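The second query and the abstract comparison described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the use of the Crossref REST API parameters `query.bibliographic`, `filter=has-abstract:true`, and `rows` to find similar titles is an assumption, as are the function names and the 5-word threshold standing in for "a few words."

```python
import re
from difflib import SequenceMatcher
from urllib.parse import urlencode

CROSSREF_WORKS = "https://api.crossref.org/works"


def similar_title_query_url(title, rows=200):
    """Build a Crossref query URL for works with similar titles.

    Assumption: a bibliographic query restricted to records that have an
    abstract; the abstract does not state the exact parameters used.
    """
    params = {
        "query.bibliographic": title,
        "filter": "has-abstract:true",
        "rows": rows,
    }
    return f"{CROSSREF_WORKS}?{urlencode(params)}"


def normalize_abstract(raw):
    """Strip markup tags (eg, JATS) and return lowercase word tokens."""
    text = re.sub(r"<[^>]+>", " ", raw)
    return re.findall(r"\w+", text.lower())


def differing_words(abstract_a, abstract_b):
    """Count words of the longer abstract not aligned with the other one."""
    words_a = normalize_abstract(abstract_a)
    words_b = normalize_abstract(abstract_b)
    matcher = SequenceMatcher(None, words_a, words_b, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return max(len(words_a), len(words_b)) - matched


def is_near_duplicate(abstract_a, abstract_b, max_diff=5):
    """Flag two non-empty abstracts that differ by at most max_diff words.

    The threshold of 5 is a hypothetical value chosen for illustration.
    """
    if not normalize_abstract(abstract_a) or not normalize_abstract(abstract_b):
        return False
    return differing_words(abstract_a, abstract_b) <= max_diff
```

In such a sketch, each abstract from the first set would be compared with the abstracts of the 200 candidates returned for its title, and only the pairs flagged by `is_near_duplicate` would proceed to the checks on publisher, authorship, and publication type.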

Results

The vast majority of duplicates found were not problematic, as they typically originated from preprint versions (Table 25-0986). For example, of 375 duplicated abstracts for articles published in PLOS One, only 4 did not involve preprints. Moreover, close inspection revealed that 3 of these 4 duplicates were republications of the abstract only. Only the 1 remaining abstract could be seen as problematic, as it corresponded to the exact same content published in 2 different journals, both open access.

Conclusions

One limitation of the process is that 2 publications with exactly the same abstract might still differ in the body of the text. In addition, for many publications, abstracts are not registered with Crossref or are registered with errors. It is also difficult to identify when one of the DOIs represents a republication of the abstract alone. Nevertheless, abstracts publicly available at Crossref can be used to detect problematic duplicate publications as a cost-effective alternative to plagiarism detection tools. This experiment underscores the importance of publishers responding positively to the Initiative for Open Abstracts (I4OA), which calls on all scholarly publishers to open the abstracts of their published works and, where possible, to submit them to Crossref.

References

1. Tramèr MR, Reynolds DJ, Moore RA, McQuay HJ. Impact of covert duplicate publication on meta-analysis: a case study. BMJ. 1997;315(7109):635-640. doi:10.1136/bmj.315.7109.635

2. Errami M, Hicks JM, Fisher W, et al. Déjà vu—a study of duplicate citations in Medline. Bioinformatics. 2008;24(2):243-249. doi:10.1093/bioinformatics/btm574

3. van Eck NJ, Waltman L. Crossref as a source of open bibliographic metadata. MetaArXiv. Preprint posted online May 12, 2025. doi:10.31222/osf.io/smxe5

1Université Grenoble Alpes, French National Centre for Scientific Research, Grenoble INP, Laboratoire d’Informatique de Grenoble, Grenoble, France, cyril.labbe@imag.fr.

Conflict of Interest Disclosures

None reported.

Funding/Support

We acknowledge the NanoBubbles project, which has received Synergy grant funding from the European Research Council within the European Union’s Horizon 2020 program (grant agreement 951393).

Role of the Funder/Sponsor

The funders had no role in this research.