Detection and Monitoring of Outcome Reporting Changes Using a Large Language Model: Application to FDA-Regulated Drug Trials

Abstract

Ian Bulovic,¹ Susmitha Wunnava,¹,² Wonjin Yoon,¹,³ Adam G. Dunn,¹,⁴ Florence T. Bourgeois,¹,²,³ Timothy Miller¹,²,³

Objective

Selective inclusion and reporting of clinical trial outcomes may bias the medical evidence on the benefits and harms of treatments.¹ ClinicalTrials.gov provides the opportunity to identify changes between prespecified outcomes and those eventually reported in publications, but linking and comparing these outcomes require extensive manual curation, precluding studies of more than a few hundred trials.²,³ ClinicalTrials.gov archives all versions of registration records as investigators make changes, providing a basis for monitoring trial outcome changes over time if the information can be processed efficiently. Our objective was to use a large language model (LLM) to estimate changes to primary outcomes at scale.

Design

OpenAI’s gpt-4o-2024-08-06 model was prompted (without training) to identify trials with meaningful changes in specified primary outcomes between the trial registration at study start and the final record (prompts and scripts used for processing data are available at https://github.com/Machine-Learning-for-Medical-Language/outcome-switching). Meaningful changes were defined as addition, removal, or substantial modification of primary outcomes. Substantial modifications included changes to outcome measures, measurement procedures, or measurement time frame. Model performance was evaluated using a set of 100 manually labeled and adjudicated records, showing F1 scores of 0.88 for outcome addition, 0.81 for outcome removal, and 0.67 for outcome modification (F1 scores range from 0 to 1 and balance sensitivity and positive predictive value). The cohort consisted of interventional trials studying US Food and Drug Administration (FDA)–regulated drugs that were prospectively registered and started between January 1, 2008, and December 31, 2022 (allowing ≥2 years’ follow-up). Variables of interest (start year, therapeutic area, industry funding, randomization, and enrollment size) were selected a priori and extracted from registration records using structured data available for download from ClinicalTrials.gov. Univariate analyses and multivariate logistic regression were performed to assess associations between outcome changes and variables of interest.
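As a minimal sketch of how an F1 score balances sensitivity and positive predictive value, the function below computes it from confusion-matrix counts. The function name and the counts in the example are hypothetical and purely illustrative; they are not taken from the study's evaluation set or from the linked repository.

```python
def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Return F1, the harmonic mean of sensitivity and positive predictive value."""
    if true_pos == 0:
        # No correct positive predictions: F1 is defined as 0 here.
        return 0.0
    ppv = true_pos / (true_pos + false_pos)          # positive predictive value
    sensitivity = true_pos / (true_pos + false_neg)  # recall
    return 2 * ppv * sensitivity / (ppv + sensitivity)


# Hypothetical counts (NOT study data): 8 true positives, 2 false positives,
# 2 false negatives give PPV = 0.8 and sensitivity = 0.8.
example_f1 = f1_score(true_pos=8, false_pos=2, false_neg=2)
```

Because F1 is a harmonic mean, it falls toward the lower of the two components, so a model cannot score well by maximizing sensitivity at the expense of positive predictive value or vice versa.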

Results

Among 27,227 trials studying FDA-regulated drugs, 11,078 (40.7%) had changes to primary outcomes after trial start. This included 3850 trials (14.1%) with addition of an outcome, 3123 (11.5%) with removal of an outcome, and 9105 (33.4%) with substantial outcome modifications. Trials with industry funding were significantly more likely to have an outcome change (47.7% vs 32.9%; P < .001) (Table 25-1101). This difference was observed for each of the outcome change types, including addition (17.4% vs 10.4%; P < .001), removal (12.9% vs 9.9%; P < .001), and modification (39.7% vs 26.4%; P < .001). Multivariate analysis demonstrated reduction in outcome changes over time (P < .001) and a positive association with industry funding (P < .001).
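The reported percentages follow directly from the counts above, and a short check makes the arithmetic explicit. This is only a sketch for the reader; the variable and function names are illustrative and not part of the study's code.

```python
TOTAL_TRIALS = 27_227  # trials studying FDA-regulated drugs, as reported above

# Counts reported in the Results section.
reported_counts = {
    "any change": 11_078,
    "addition": 3_850,
    "removal": 3_123,
    "modification": 9_105,
}


def percent(count: int, total: int) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


percentages = {name: percent(n, TOTAL_TRIALS) for name, n in reported_counts.items()}

# The three change types sum to more than the number of trials with any change,
# so a single trial can fall into more than one change type.
type_total = sum(n for name, n in reported_counts.items() if name != "any change")
```

The three type-specific counts sum to 16,078, which exceeds the 11,078 trials with any change, indicating that the change types are not mutually exclusive.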

Conclusions

LLMs can be used to monitor for changes in clinical trial outcomes efficiently and at scale using information available in ClinicalTrials.gov. Outcome changes in trials of FDA-regulated drugs are common and associated with industry funding but have decreased over time. Future work should explore use of LLMs to identify outcome changes between trial registrations and published articles.

References

1. Turner EH, Cipriani A, Furukawa TA, Salanti G, de Vries YA. Selective publication of antidepressant trials and its influence on apparent efficacy: updated comparisons and meta-analyses of newer versus older trials. PLoS Med. 2022;19(1):e1003886. doi:10.1371/journal.pmed.1003886

2. Wang A, Menon R, Li T, et al. Has the degree of outcome reporting bias in surgical randomized trials changed? a meta-regression analysis. ANZ J Surg. 2023;93:76-82. doi:10.1111/ans.18273

3. Hinkel J, Heneghan C, Bankhead C. Selective outcome reporting in cancer studies: a scoping review. medRxiv. 2024:07.02.24309826. doi:10.1101/2024.07.02.24309826

Affiliations

¹Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA, US, florence_bourgeois@hms.harvard.edu; ²Harvard-MIT Center for Regulatory Science, Harvard Medical School, Boston, MA, US; ³Department of Pediatrics, Harvard Medical School, Boston, MA, US; ⁴Biomedical Informatics and Digital Health, Faculty of Medicine and Health, University of Sydney, Sydney, Australia.

Conflict of Interest Disclosures

None reported.

Funding/Support

National Library of Medicine, National Institutes of Health (R01LM012976, R01LM012973).

Role of the Funder/Sponsor

The funder played no role in the design or reporting of the study.

Additional Information

Florence T. Bourgeois and Timothy Miller are co–senior authors.

International Congress on
Peer Review and Scientific Publication

Enhancing the quality and credibility of science
