Abstract

Automated Interpretation of Statistical Tables to Assess Reporting Errors and Associations With Open Science Policies in Economics Journals

Stephan B. Bruns,1 Helmut Herwartz,2 John P. A. Ioannidis,3 Chris-Gabriel Islam,2 Fabian H. C. Raters2

Objective

The economics literature predominantly reports statistical values in data tables. This study developed a tool that extracts statistical values from tables to identify reporting errors and to analyze the association between journal data and code availability policies and article characteristics.

Design

DORIS (Diagnosis of Reporting Errors in Scraped Tables)1,2 automatically extracts statistical values (eg, coefficients and SEs) from tables, using web scraping in R to retrieve tables from HTML articles and text mining in Python to interpret the extracted data. This analysis included the top 50 economics journals that provided articles in HTML from 1998 to 2016. Tests were divided into 2 categories: main (first 3 rows of main tables) and nonmain (all other rows of main tables and all rows of other tables). We used a staggered difference-in-differences design to assess the association between data and code availability policies and 4 article outcomes3: reporting errors, measured by a dummy for an inconsistency that alters the significance level; statistical significance, proxied by the z value, the presence of visual indicators of statistical significance, and the type of statistical value reported; rigor of reporting, measured by the logarithmized numbers of tests per article and of tables with tests and by a dummy for appendix position; and citations, measured by the logarithmized citation count over the first 7 years.
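For intuition, the core consistency check can be sketched as follows. This is a simplified illustration in Python, not the DORIS implementation: the function names and star thresholds are assumptions, and a full implementation would also need to handle rounding of reported values and journal-specific significance conventions.

    from math import erfc, sqrt

    # Conventional star thresholds in economics tables (an assumption; conventions vary by journal).
    STAR_THRESHOLDS = ((3, 0.01), (2, 0.05), (1, 0.10))

    def implied_stars(coefficient: float, se: float) -> int:
        """Number of significance stars implied by the recomputed two-sided P value."""
        z = abs(coefficient / se)
        p = erfc(z / sqrt(2))  # two-sided P value under a standard normal reference
        for stars, threshold in STAR_THRESHOLDS:
            if p < threshold:
                return stars
        return 0

    def alters_significance(coefficient: float, se: float, reported_stars: int) -> bool:
        """Flag a reporting inconsistency that changes the claimed significance level."""
        return implied_stars(coefficient, se) != reported_stars

    # Example: a coefficient of 1.8 with SE 1.0 gives z = 1.8 (P = .07, ie, 1 star
    # at the 10% level), so a table reporting 3 stars (P < .01) would be flagged.
    print(alters_significance(1.8, 1.0, reported_stars=3))  # True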

Results

The study included 578,132 statistical tests (median: 27 per table) from 15,725 tables (median: 4 per article) in 3746 articles (median: 76 per journal) published in 31 journals. Based on a sample of 3068 statistical tests, DORIS had a false discovery rate of 1.2%. Analysis revealed that 547 of 3677 articles (14.8%) had at least 1 reporting error in main tests that altered the statistical significance of findings. The errors were predominantly oriented toward lowering P values and claiming statistical significance. Assessment of data and code availability policies included 21 journals and 535,838 tests. These policies were associated with slightly less emphasis on statistical significance (smaller z values: −0.16 [90% CI, −0.33 to 0]; P = .10; fewer statistical significance symbols: −4 [90% CI, −6 to −2] percentage points [pp]; P = .004; fewer tests reported with a focus on significance (t, z, or P values instead of SEs and CIs): −8 [90% CI, −13 to −3] pp; P = .004), more rigor in reporting (more tests: 11% [90% CI, −1% to 22%]; P = .12; more tables: 15% [90% CI, 7% to 23%]; P = .002; more appendix tables: 4 [90% CI, 2 to 7] pp; P = .009), and fewer reporting errors in nonmain tests (−0.2 [90% CI, −0.4 to 0] pp; P = .08), but there was no difference in citations (−9% [90% CI, −21% to 4%]; P = .25).

Conclusions

DORIS may be used in the peer review process in the economics literature to improve article quality and to generate large-scale data for future meta-research. Data and code availability policies may help improve the reliability of published economics research.

References

1. Better Papers. Home page. Accessed June 12, 2025. https://betterpapers.org/#sec-faqs

2. Bruns SB, Herwartz H, Ioannidis JPA, Islam C-G, Raters FHC. Statistical reporting errors in economics. MetaArXiv. Preprint posted online September 1, 2023. doi:10.31222/osf.io/mbx62

3. Borusyak K, Jaravel X, Spiess J. Revisiting event-study designs: robust and efficient estimation. Rev Econ Stud. 2024;91(6):3253-3285. doi:10.1093/restud/rdae007

1Hasselt University, Hasselt, Belgium, stephan.b.bruns@gmail.com; 2Georg August University Göttingen, Göttingen, Germany; 3Meta-Research Innovation Center at Stanford (METRICS), Stanford, CA, US.

Conflict of Interest Disclosures

John P. A. Ioannidis is a member of the Peer Review Congress Advisory Board but was not involved in the review or decision for this abstract. No other disclosures were reported.

Funding/Support

Funding was received from the German Research Foundation (DFG) under the project “Replications in Empirical Economics: Necessity, Incentives and Impact” and its follow-up project “Selective Reporting and the Evolving Research Landscape in Economics” (project number 405039391), and from Hasselt University within the framework of BOF BILA (BOF21BL08).

Role of Funder/Sponsor

DFG and BOF BILA funding helped in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the abstract; and decision to submit the abstract for presentation.