Abstract
Frequency of Data and Code Sharing in Medical Research: An Individual Participant Data Meta-analysis of Metaresearch Studies
Daniel G. Hamilton,1,2 Kyungwan Hong,3 Hannah Fraser,1 Anisa Rowhani-Farid,3 Steve McDonald,4 Fiona Fidler,1,5 Matthew J. Page4
Objective
Numerous metaresearch studies have investigated rates and predictors of data and code sharing in medicine. However, these studies have often been narrow in scope, focusing on some important aspects and predictors of sharing but not others. A systematic review and individual participant data (IPD) meta-analysis of this corpus of research is being conducted to provide an expansive picture of how availability rates have changed over time in medicine and what factors are associated with sharing.
Design
Ovid Embase, Ovid MEDLINE, MetaArXiv, medRxiv, and bioRxiv were searched up to July 1, 2021, for metaresearch studies that investigated data sharing, code sharing, or both among a sample of scientific articles presenting original research from the medical and health sciences (ie, primary articles). Two authors independently screened records and assessed risk of bias in the included studies. Key outcomes of interest included the prevalence of affirmative sharing declarations (declared availability) and availability as confirmed by the metaresearch authors (actual availability). Associations between data and code availability and several factors (eg, year published, journal policy) were also examined. IPD were collected or requested from authors of eligible studies. A 2-stage approach to IPD meta-analysis was performed, with outcomes pooled using the Hartung-Knapp-Sidik-Jonkman method for random-effects meta-analysis.1 The review methods were preregistered on the Open Science Framework2 and are described in a detailed review protocol.3
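To illustrate the second (aggregation) stage of the approach described above, the sketch below pools study-level proportions on the logit scale using inverse-variance random-effects weights, a DerSimonian-Laird estimate of between-study variance, and the Hartung-Knapp-Sidik-Jonkman confidence interval adjustment. This is a minimal Python sketch, not the analysis code used in the review; the study counts, the choice of the DerSimonian-Laird estimator, and the function name are hypothetical, and the protocol3 describes the actual analysis specification.

```python
# Minimal sketch of a Hartung-Knapp-Sidik-Jonkman (HKSJ) random-effects meta-analysis
# of logit-transformed proportions, mimicking the second stage of a 2-stage IPD approach.
# All numbers below are hypothetical; they are not data from the review.
import numpy as np
from scipy import stats


def hksj_meta_analysis(effects, variances, alpha=0.05):
    """Pool study-level effects with a DerSimonian-Laird tau^2 and the HKSJ adjustment."""
    effects = np.asarray(effects, dtype=float)
    variances = np.asarray(variances, dtype=float)
    k = len(effects)

    # Fixed-effect (inverse-variance) weights, used only to estimate tau^2 (DerSimonian-Laird).
    w_fixed = 1.0 / variances
    theta_fixed = np.sum(w_fixed * effects) / np.sum(w_fixed)
    q = np.sum(w_fixed * (effects - theta_fixed) ** 2)
    c = np.sum(w_fixed) - np.sum(w_fixed ** 2) / np.sum(w_fixed)
    tau2 = max(0.0, (q - (k - 1)) / c)

    # Random-effects weights and pooled estimate.
    w = 1.0 / (variances + tau2)
    theta = np.sum(w * effects) / np.sum(w)

    # HKSJ variance estimator and t-based confidence interval with k - 1 degrees of freedom.
    hksj_var = np.sum(w * (effects - theta) ** 2) / ((k - 1) * np.sum(w))
    half_width = stats.t.ppf(1 - alpha / 2, df=k - 1) * np.sqrt(hksj_var)
    return theta, (theta - half_width, theta + half_width)


# Hypothetical example: numbers of articles sharing data in 5 metaresearch studies.
shared = np.array([5, 12, 3, 20, 8])
total = np.array([120, 250, 90, 400, 150])
p = shared / total
logit = np.log(p / (1 - p))                        # logit transform for pooling
var_logit = 1.0 / shared + 1.0 / (total - shared)  # approximate variance on the logit scale

est, (lo, hi) = hksj_meta_analysis(logit, var_logit)
to_prop = lambda x: 1.0 / (1.0 + np.exp(-x))       # back-transform to a proportion
print(f"Pooled rate: {to_prop(est):.1%} (95% CI, {to_prop(lo):.1%}-{to_prop(hi):.1%})")
```

The defining feature of the HKSJ approach is the weighted-residual variance estimator combined with a t distribution on k − 1 degrees of freedom, which tends to give wider, better-calibrated intervals than the standard normal-based interval when few studies are pooled.1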
Results
A total of 4970 potentially eligible studies were identified, of which 101 were included in the review; 28 of these did not publicly share any IPD. Eligible studies examined a median (IQR) of 203 (125-398) primary articles published between 1987 and 2020 across 32 unique medical disciplines. To date, data from 36 studies (including 7750 primary articles) have been processed. Only 1 study was classified as low risk of bias. Meta-analysis revealed declared and actual data availability rates of 9% (95% CI, 6%-14%; 23 studies) and 3% (95% CI, 1%-6%; 26 studies), respectively, since 2015, with no significant differences from the rates in the preceding 5-year period. A similar pattern was observed for code sharing (all rates <1%). Early results also indicate that only 35% (95% CI, 18%-55%; 5 studies) and 16% (95% CI, 10%-22%; 2 studies) of authors complied with mandatory data and code sharing policies, respectively. By comparison, 13% (95% CI, 0%-37%; 6 studies) and 8% (95% CI, 0%-50%; 4 studies) of authors submitting to journals that encouraged sharing or had no policy, respectively, made data available.
Conclusions
Preliminary analysis suggests that data and code sharing in medicine remains uncommon and occurs at a rate much lower than expected if journal policies were followed. We recommend that future research explore why sharing rates and compliance with mandatory policies are low, as well as strategies for improving them.
References
1. IntHout J, Ioannidis JP, Borm GF. The Hartung-Knapp-Sidik-Jonkman method for random effects meta-analysis is straightforward and considerably outperforms the standard DerSimonian-Laird method. BMC Med Res Methodol. 2014;14(1):25. doi:10.1186/1471-2288-14-25
2. Hamilton DG, Fraser H, Fidler F, Rowhani-Farid A, Hong K, Page MJ. Rates and predictors of data and code sharing in the medical and health sciences: a systematic review and individual participant data meta-analysis. Open Science Framework. May 28, 2021. doi:10.17605/OSF.IO/7SX8U
3. Hamilton DG, Fraser H, Fidler F, et al. Rates and predictors of data and code sharing in the medical and health sciences: protocol for a systematic review and individual participant data meta-analysis. F1000Research. 2021;10:491. doi:10.12688/f1000research.53874.2
1MetaMelb Research Group, School of BioSciences, The University of Melbourne, Parkville, Australia, hamilton.d@unimelb.edu.au; 2Melbourne Medical School, Faculty of Medicine, Dentistry & Health Sciences, The University of Melbourne, Parkville, Australia; 3Department of Pharmaceutical Health Services Research, University of Maryland, Baltimore, MD, USA; 4School of Public Health & Preventive Medicine, Monash University, Melbourne, Australia; 5School of Historical and Philosophical Studies, The University of Melbourne, Parkville, Australia
Conflict of Interest Disclosures
Daniel G. Hamilton is a board member of the Association of Interdisciplinary Meta-research and Open Science (AIMOS) and a PhD candidate supported by an Australian Commonwealth Government Research Training Program Scholarship. The Laura and John Arnold Foundation funds the Restoring Invisible and Abandoned Trials (RIAT) Support Center, which supports the salaries of Kyungwan Hong and Anisa Rowhani-Farid. Kyungwan Hong was supported in 2020 by the US Food and Drug Administration (FDA) of the US Department of Health and Human Services (HHS) as part of financial assistance award U01FD005946, funded by FDA/HHS. The project contents are those of Kyungwan Hong and do not necessarily represent the official views of, nor an endorsement by, FDA/HHS or the US government.