
Detection of Open Science Practices in Major Medical Journals: A Survey and Diagnostic Accuracy of Automatic Tools Using Sensitivity and Specificity

Abstract

Constant Vinatier,1 Ayu Putu Madri Dewi,2 Gwénaël Dumont,1 Tracey Weissgerber,3 Vladislav Nachev,3 Gowri Gopalakrishna,2,3,4 Maud Scheidecker,1 François-Joseph Arnault,1 Nicholas J. DeVito,5 Guillaume Freyermuth,6 Mathieu Acher,6,10 Gauthier Le Bartz Lyan,6 Inge Stegeman,7,8 Mariska M. G. Leeflang,2 F. Naudet9,10

Objective

Despite open science policies at major biomedical journals, adherence to these policies remains uncertain. This study evaluated automated tools, ranging from regular expressions to large language models (LLMs), for assessing core open science practices in leading biomedical journals.
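To make the regular-expression end of this spectrum concrete, the following is a minimal, hypothetical sketch of regex-based detection of trial registration identifiers, in the spirit of tools such as rtransparent and ctRegistries; the simplified patterns are illustrative assumptions, not the actual rules those tools use.

```python
import re

# Hypothetical, simplified identifier patterns for three registries.
# These are illustrative assumptions, not the rules used by rtransparent
# or ctRegistries.
REGISTRY_PATTERNS = {
    "ClinicalTrials.gov": re.compile(r"\bNCT\d{8}\b"),
    "ISRCTN": re.compile(r"\bISRCTN\d{8}\b"),
    "PROSPERO": re.compile(r"\bCRD\d{11}\b"),
}

def detect_registration(text: str) -> dict:
    """Return registry identifiers found in an article's text."""
    hits = {}
    for registry, pattern in REGISTRY_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            hits[registry] = matches
    return hits

print(detect_registration("The trial was registered (NCT01234567)."))
# -> {'ClinicalTrials.gov': ['NCT01234567']}
```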

Design

We retrospectively assessed research articles from a sample of 10 major generalist medical journals (Annals of Internal Medicine, BMJ, BMC Medicine, Canadian Medical Association Journal [CMAJ], JAMA, JAMA Network Open, Lancet, Nature Medicine, New England Journal of Medicine, and PLoS Medicine) published from 2020 to 2023. Articles were retrieved via PubMed using a search strategy peer reviewed according to the Peer Review of Electronic Search Strategies (PRESS) guideline. The database comprised random samples of 103 randomized controlled trials (RCTs), 98 meta-analyses (MAs), and 111 other research articles (RAs). We evaluated 13 open science practices, including study registration, data sharing, and protocol sharing (open access or upon request). Each article was evaluated by 2 independent raters, with disagreements resolved by a third rater. Seven automated tools (rtransparent, oddpub, ctRegistries, ContriBot, DataSeer, SciScore, and an LLM [Llama 3-70B]) were used. Diagnostic accuracy was estimated using sensitivity, specificity, F1 scores, and positive and negative likelihood ratios (LR+ and LR-).
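As a reference for the metrics named above, here is a minimal sketch computing them from a 2x2 confusion matrix that compares a tool's verdict with the manual reference standard; the counts in the usage example are invented for illustration and are not taken from the study.

```python
# Compute the diagnostic accuracy metrics used in the study from a
# 2x2 confusion matrix (tool output vs manual reference standard).
def diagnostic_accuracy(tp: int, fp: int, fn: int, tn: int) -> dict:
    sensitivity = tp / (tp + fn)               # true-positive rate (recall)
    specificity = tn / (tn + fp)               # true-negative rate
    precision = tp / (tp + fp)                 # positive predictive value
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    lr_pos = sensitivity / (1 - specificity)   # LR+ (undefined if specificity = 1)
    lr_neg = (1 - sensitivity) / specificity   # LR-
    return {"sensitivity": sensitivity, "specificity": specificity,
            "F1": f1, "LR+": lr_pos, "LR-": lr_neg}

# Invented counts, chosen only to illustrate the calculation.
print(diagnostic_accuracy(tp=75, fp=5, fn=22, tn=66))
```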

Results

Manual extraction for the 312 articles identified registration in 98% (101/103) of RCTs, 69% (68/98) of MAs, and 18% (20/111) of RAs. Open data were present in 6% (6/103) of RCTs, 36% (35/98) of MAs, and 13% (15/111) of RAs and accessible upon request in 78% (80/103), 41% (40/98), and 59% (66/111), respectively. Protocols were openly available in 84% (87/103) of RCTs, 64% (63/98) of MAs, and 20% (22/111) of RAs and accessible upon request in 5% (5/103), 3% (3/98), and 3% (3/111), respectively. The accuracy of automated tools varied depending on the practice evaluated, with F1 scores ranging from 1.00 (rtransparent, conflict of interest statement) to 0.16 (SciScore, registration). For study registration, a simple tool using regular expressions, such as rtransparent, demonstrated good sensitivity (77%; 95% CI, 70%-83%) and high specificity (93%; 95% CI, 88%-97%). Data sharing detection remained challenging; for instance, rtransparent detected data sharing with a sensitivity of 74% (95% CI, 68%-80%) and a specificity of 59% (95% CI, 46%-70%). Diagnostic accuracy also varied by article type and journal, likely reflecting differences in formatting standards. All results are shown in Table 25-1025. Limitations include the declarative nature of some practices (eg, data sharing).
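For context on the reported confidence intervals, the sketch below computes a 95% Wilson score interval for a proportion such as sensitivity; the abstract does not state which interval method was used, and the counts in the example are hypothetical.

```python
from math import sqrt

# 95% Wilson score interval for a binomial proportion (z = 1.96).
# One standard choice for CIs on sensitivity/specificity; whether the
# study used this method is an assumption here.
def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple:
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return center - half, center + half

# Hypothetical counts: 104/135 detections give ~77% sensitivity.
print(wilson_ci(104, 135))  # approximately (0.69, 0.83)
```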

Conclusions

Our study provides a detailed description of core open science practices across leading biomedical journals and highlights current challenges regarding the accuracy of automated tools in detecting these practices. While these tools can provide valuable insights into practices at an aggregate level, users should remain aware of the ranking biases the tools may introduce, as well as their limitations in providing detailed feedback on individual studies.

1Univ Rennes, Inserm, EHESP, Irset (Institut de recherche en santé, environnement et travail), UMR_S 1085, Rennes, France, constant.vinatier1@gmail.com; 2Department of Epidemiology and Data Science, Amsterdam University Medical Centers, Amsterdam, the Netherlands; 3QUEST Center for Responsible Research, Berlin Institute of Health at Charité–Universitätsmedizin Berlin, Berlin, Germany; 4Department of Epidemiology, Faculty of Health, Medicine, and Life Sciences, Maastricht University, Maastricht, the Netherlands; 5Bennett Institute for Applied Data Science, Nuffield Department of Primary Care Health Sciences, University of Oxford, Oxford, UK; 6Univ Rennes, IRISA, Inria, CNRS, Rennes, France; 7Department of Otorhinolaryngology and Head and Neck Surgery, University Medical Center Utrecht, Utrecht, the Netherlands; 8Brain Center, University Medical Center Utrecht, Utrecht, the Netherlands; 9Univ Rennes, CHU Rennes, Inserm, EHESP, Irset (Institut de recherche en santé, environnement et travail), UMR_S 1085, Rennes, France; 10Institut Universitaire de France (IUF), France.

Conflict of Interest Disclosures

None reported.

Funding/Support

As part of the OSIRIS project, this work was supported by the European Union’s Horizon Europe Research and Innovation Program under grant agreement number 101094725. Constant Vinatier, Ayu Putu Madri Dewi, Gowri Gopalakrishna, Nicholas J. DeVito, Inge Stegeman, Mariska M. G. Leeflang, and F. Naudet are members of this project.

Role of the Funder/Sponsor

The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
