Citations of Human Gene Research Articles That Describe Wrongly Identified Nucleotide Sequences

Yasunori Park,1 Jennifer Anne Byrne1,2


Preclinical human gene research articles that describe wrongly identified nucleotide sequence reagents provide incorrect information, and these articles could be manufactured by paper mills, which “have been alleged to mass-produce fraudulent manuscripts for publication.”1(p2) Such problematic articles can be highly cited1; however, the features and consequences of these citations are largely unknown. To address this gap, the authors analyzed citations of problematic human gene research articles and of literature reviews that cite these articles.


A human gene research article indexed in PubMed (PMID 25721211) that describes wrongly identified nucleotide sequences was selected as an index case2 to build citation networks in RStudio, using Google Scholar citations from before March 31, 2022. Citing articles were screened for wrongly identified sequences1 and analyzed for citation context.3 Problematic articles with wrongly identified sequences were traced through up to 6 citation generations. Because 95 literature reviews cited problematic articles, the authors also analyzed 13 literature reviews that focused on human genes. All references in these reviews were screened for wrongly identified nucleotide sequences1 to identify problematic references. The text of each review was examined to identify statements that cited problematic references. Publications that cited each literature review were examined to identify review citations that reflected information from problematic references.
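Tracing problematic articles through a fixed number of citation generations can be pictured as a breadth-first search over a citation graph. The Python sketch below is illustrative only and is not the authors' R workflow. The `cited_by` mapping (each article to the articles that cite it) and the `is_problematic` screening function are hypothetical stand-ins for the Google Scholar citation data and the screening for wrongly identified sequences, and expanding only articles flagged as problematic is an assumption about how the network was grown.

```python
from collections import deque
from typing import Callable, Dict, List, Set


def trace_problematic_citations(
    index_id: str,
    cited_by: Dict[str, List[str]],
    is_problematic: Callable[[str], bool],
    max_generations: int = 6,
) -> Dict[str, int]:
    """Map each problematic citing article to the generation at which it is first reached.

    Generation 1 holds articles that directly cite the index case. Only articles
    flagged by the (hypothetical) is_problematic screen are expanded further.
    """
    found: Dict[str, int] = {}
    visited: Set[str] = {index_id}
    queue = deque([(index_id, 0)])
    while queue:
        article, generation = queue.popleft()
        if generation == max_generations:
            continue  # stop expanding once the generation limit is reached
        for citer in cited_by.get(article, []):
            if citer in visited:
                continue  # breadth-first order keeps the earliest generation
            visited.add(citer)
            if is_problematic(citer):
                found[citer] = generation + 1
                queue.append((citer, generation + 1))
    return found


# Toy example with made-up identifiers: only A, C, and E are flagged as problematic.
toy_graph = {"INDEX": ["A", "B"], "A": ["C"], "B": ["D"], "C": ["E"]}
flagged = {"A", "C", "E"}
print(trace_problematic_citations("INDEX", toy_graph, flagged.__contains__, max_generations=2))
# prints {'A': 1, 'C': 2}
```

Breadth-first order is used here so that an article reachable through several citation paths is assigned to the earliest generation in which it appears.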


Tracing citations of PMID 25721211 through up to 6 citation generations identified 87 problematic articles (Figure 23) published in 50 journals. As previously reported,1 most problematic articles (79 of 87 [91%]) were authored by teams from hospitals in China. Ninety-three citations of problematic articles by other problematic articles were identified. In total, 338 citing articles contained 360 statements supported by problematic articles, typically in the Discussion section (183 of 360 [51%]) or Introduction section (133 of 360 [37%]). The 13 human gene literature reviews cited a total of 1887 references, including 206 problematic articles (11%). Between 1 and 13 claims per review (82 claims in total) were supported by problematic references. The 13 reviews have been cited 1843 times, of which 3 citations reflected claims from 3 problematic references. The 206 problematic references have been cited 31,914 times, including by 5 clinical trial articles. Problematic references were also cited by 78 patent families and 9 Wikipedia entries.


After analysis of the citation network for 1 problematic gene research article, 87 problematic articles and 93 citations of problematic articles by other problematic articles were identified (Figure 23). It was further demonstrated that 13 literature reviews of human genes referenced 206 problematic articles that were in turn cited 31,914 times. Although infrequent, subsequent literature review citations can reflect information from problematic review references.


1. Park Y, West RA, Pathmendra P, et al. Identification of human gene research articles with wrongly identified nucleotide sequences. Life Sci Alliance. 2022;5:e202101203. doi:10.26508/lsa.202101203

2. Kempen JH. Appropriate use and reporting of uncontrolled case series in the medical literature. Am J Ophthalmol. 2011;151:7-10. doi:10.1016/j.ajo.2010.08.047

3. Zhang G, Ding Y, Milojević S. Citation content analysis (CCA): a framework for syntactic and semantic analysis of citation content. J Am Soc Inf Sci Technol. 2013;64:1490-1503. doi:10.1002/asi.22850

1Faculty of Medicine and Health, The University of Sydney, Sydney, New South Wales, Australia; 2NSW Health Statewide Biobank, NSW Health Pathology, Camperdown, New South Wales, Australia, jennifer.byrne@health.nsw.gov.au

Conflict of Interest Disclosures

None reported.


Funding/Support

Jennifer Anne Byrne acknowledges grant funding from the National Health and Medical Research Council of Australia, Ideas grant ID APP1184263.

Role of the Funder/Sponsor

The funder played no role in the study design, data collection, management, analysis, or interpretation and will play no role in the writing of any report or the decision to submit the report for publication.