Comparison of Distributed Peer Review Enhanced by Machine Learning and Natural Language Processing and With Traditional Panel-Based Peer Review of Astronomy Proposals

Abstract

Wolfgang Kerzendorf,¹ Ferdinando Patat,² Dominic Bordelon,³ Glenn van de Ven,⁴ Tyler Pritchard⁵

Objective

An ever-rising number of researchers (which increased by 15% between 2014 and 2018¹) has led to a substantial increase in publications and proposals in astronomy that overwhelms traditional peer review systems. Distributed peer review (DPR) is a model that uses the pool of proposers as the pool of reviewers and has been suggested² and used in astronomy.³ This quantitative study explored a new natural language processing–enhanced variation of the DPR scheme (named DeepThoughtDPR). It tested whether a matching algorithm based on natural language processing and machine learning can identify knowledgeable reviewers and whether the DPR system has interreviewer reliability similar to that of the traditional system.

Design

This was a cross-sectional analysis involving astronomers who submitted telescope time proposals for European Southern Observatory Period 103 on September 27, 2018; in all, 172 volunteers submitted their proposals specifically to the machine learning–enhanced DPR. Each proposal was reviewed by 8 peers, so each proposer in turn had to review 8 proposals; this resulted in 1336 unique reviews (data set available at https://zenodo.org/record/2634598). Next, 112 randomly chosen volunteers from among the proposers reviewed proposals that were matched via an algorithm. These volunteers were given 4 “best match” proposals, 2 “median match” proposals, and 2 “worst match” proposals but were not told about this selection. The 112 volunteers were also asked to report their expertise on each proposal, and the data were used to test the matching algorithm. The other 60 volunteers were assigned proposals through an essentially random selection scheme. The DPR scheme was compared with traditional panel-based review on the basis of interreviewer reliability by using 15,000 existing reviews from traditional panel-based review. The comparison examined the quartile in which each proposal was ranked by independent groups of reviewers within the same review process.
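
The quartile comparison described above can be illustrated with a minimal Python sketch. It is not the study's code: the function names, the input layout (a dictionary of mean grades per proposal for each reviewer group), and the convention that a lower grade means a better rank are all assumptions made for illustration.

```python
def quartile_of(rank, n):
    """Map a 0-based rank among n proposals to a quartile index 0..3 (0 = top)."""
    return min(4 * rank // n, 3)


def quartiles_from_scores(scores):
    """Return {proposal_id: quartile}.

    Assumption: a lower mean grade is a better rank.
    """
    ordered = sorted(scores, key=lambda pid: scores[pid])
    n = len(ordered)
    return {pid: quartile_of(i, n) for i, pid in enumerate(ordered)}


def agreement_matrix(scores_a, scores_b):
    """Build a 4x4 quartile agreement matrix for two reviewer groups.

    Entry [i][j] is the fraction of proposals placed in quartile i by
    group A that group B placed in quartile j. Both inputs are assumed
    to cover the same proposal ids.
    """
    qa = quartiles_from_scores(scores_a)
    qb = quartiles_from_scores(scores_b)
    counts = [[0] * 4 for _ in range(4)]
    for pid in qa:
        counts[qa[pid]][qb[pid]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix
```

Under these assumptions, two groups that rank every proposal identically give the identity matrix, and the diagonal entries fall as the groups disagree more. Comparing the matrix from the DPR reviews with the one from the panel reviews, entry by entry, is one way to read a statement such as "a maximum difference of 5%".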

Results

Figure 21 shows the conditional probability calculated from 820 reviews. The volunteers reported they had “no knowledge” for the “worst match” proposals in 78% of cases, had “general knowledge” for the “median match” proposals in 35% of cases, and considered themselves “expert” for the “best match” proposals in 52% of cases. The match showed a Spearman rank coefficient of 0.64. The comparison of the interreviewer reliability between the traditional panel-based approach and the DPR approach showed a maximum difference of 5% when a quartile agreement matrix scheme was used.
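
The two statistics reported above can be sketched in plain Python. This is an illustrative sketch only: the function names, the pairing of each review as (match category, self-reported expertise), and the ordinal encoding fed to the Spearman calculation are assumptions, not details taken from the study.

```python
from collections import Counter, defaultdict


def conditional_probabilities(pairs):
    """Estimate P(reported expertise | match category).

    pairs: iterable of (match_category, reported_expertise) tuples.
    Returns {category: {expertise: probability}}.
    """
    counts = defaultdict(Counter)
    for category, expertise in pairs:
        counts[category][expertise] += 1
    return {
        cat: {exp: n / sum(c.values()) for exp, n in c.items()}
        for cat, c in counts.items()
    }


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank coefficient: the Pearson correlation of the average ranks.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5
```

For example, one could encode the match categories as worst = 0, median = 1, best = 2 and the self-reported levels as no knowledge = 0, general knowledge = 1, expert = 2 (a hypothetical encoding), then pass the two lists to `spearman`. Because both variables have only three levels, ties are frequent, which is why the sketch uses average ranks rather than the shortcut formula that assumes no ties.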

Conclusions

The reviewer-matching algorithm had a high probability of identifying cases of “no knowledge.” The lower probabilities for “expert” and “general knowledge” may be ascribed to self-efficacy, but this explanation needs to be tested in future work. The interreviewer reliability of the DPR approach was similar to that of the traditional panel-based approach. Additional studies with larger numbers of participants are planned.

References

1. Lewis J, Schneegans S, Straza T. UNESCO Science Report: The Race Against Time for Smarter Development. Vol 2021. UNESCO Publishing; 2021.

2. Merrifield MR, Saari DG. Telescope time without tears: a distributed approach to peer review. Astronomy and Geophysics. 2009;50(4):4.16-4.20. doi:10.1111/j.1468-4004.2009.50416.x

3. Andersen M, Chiboucas K, Geballe T, et al. The Gemini Fast Turnaround program. In: Abstracts of the 233rd AAS Meeting; January 6-10, 2019; Seattle, Washington. Vol 233. American Astronomical Society; 2019:761.

¹Michigan State University, East Lansing, MI, USA, wkerzend@msu.edu; ²European Southern Observatory, Garching bei München, Germany; ³University of Pittsburgh, Pittsburgh, PA, USA; ⁴University of Vienna, Vienna, Austria; ⁵New York University, New York, NY, USA

Conflict of Interest Disclosures

Wolfgang Kerzendorf is part of New York University, and the SNYU group is supported by US National Science Foundation CAREER awards AST-1352405 and AST-1413260. He was also supported by a European Southern Observatory (ESO) Fellowship and the Excellence Cluster Universe, Technische Universität München, for part of this work. Glenn van de Ven acknowledges funding from the European Research Council under the European Union’s Horizon 2020 research and innovation program with grant agreement 724857 (consolidator grant ArcheoDyn).

Funding/Support

Wolfgang Kerzendorf was funded by the ESO, New York University, and Michigan State University. The other authors were funded by their respective institutions.

Acknowledgments

We thank the 167 volunteers who participated in the distributed peer review (DPR) experiment for their work and enthusiasm. We also thank M. Kissler-Patig for promoting the DPR experiment following his experience at Gemini; ESO’s director general, X. Barçons, and director for science, R. Ivison, for their support; and H. Schütze for several suggestions on the natural language processing. We thank J. Linnemann for help with some of the statistical tests. We acknowledge the help of Michael Berkwits and Leah Dickstein with specific questions about norms in the medical community. Wolfgang Kerzendorf thanks the Flatiron Institute.

Disclaimer

This abstract is the result of independent research and is not to be considered as expressing the position of the ESO on proposal review and telescope time allocation procedures and policies.

International Congress on Peer Review and Scientific Publication
