Abstract
Anonymizing Reviewers to Each Other in Peer Review Discussions: A Randomized Controlled Trial
Charvi Rastogi,1 Xiangchen Song,2 Zhijing Jin,3,4 Ivan Stelmakh,5 Hal Daumé III,6 Kun Zhang,2 Nihar B. Shah2
Objective
Many peer-review processes in computer science have reviewers submit independent reviews and then discuss each article with its other reviewers on a typed forum (an online discussion board). A key policy question is whether reviewers should remain anonymous to each other during this discussion. This study investigated 7 research questions (RQs): RQ 1. Do reviewers discuss more when they are anonymous to each other or when they are not? RQ 2. Across conditions, are final decisions closer to the opinions of senior or of junior reviewers? RQ 3. Are reviewers more polite when not anonymous? RQ 4. Do self-reported reviewer experiences differ between conditions? RQ 5. Do reviewers prefer one condition over the other? RQ 6. What factors do reviewers consider important in this policy decision? RQ 7. Have reviewers experienced dishonest behavior because their identity was revealed to other reviewers?
Design
A randomized controlled trial was conducted at the Conference on Uncertainty in Artificial Intelligence (UAI) in 2022, where full articles (not abstracts) were reviewed. Reviewers and articles were randomly assigned either to a condition in which reviewer identities were hidden from each other or to one in which they were visible. Reviewers were then matched to articles within each condition using a semiautomated procedure.1 An anonymous survey of reviewers was also administered. The following measurements were made: RQ 1. Average posts per reviewer-article pair were compared; test statistic: difference across conditions. RQ 2. Test statistic: difference across conditions in the fraction of articles for which the reviewer closest to the final decision was senior. RQ 3. Politeness was scored from 1 to 5 using a locally deployed large language model2 with few-shot prompting; scores were averaged across iterations and paraphrased prompts; test statistic: normalized Mann-Whitney U statistic. RQ 4. Reviewers rated 5 aspects of their experience on a 5-point Likert scale; differences across conditions were tested using the normalized Mann-Whitney U statistic. RQ 5. Reviewers rated their overall preference on a 5-point scale mapped to values from −2 to 2; test statistic: Cohen d. RQ 6. Reviewers rated the importance of 6 factors in deciding on an anonymity policy, each from 1 (least important) to 6. RQ 7. Reviewers reported any experience of dishonest behavior due to reviewer identities being visible to other reviewers, using the checkboxes “Yes, in UAI 2022,” “Yes, in another venue,” “Not sure,” and “No.” The test statistics also served as measures of effect size. P values were computed via permutation testing.
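The Design paragraph states that P values were computed by permutation testing but does not spell out the procedure. The Python sketch below illustrates one common form under stated assumptions: a two-sided test that shuffles condition labels across the pooled observations, the difference in means (as in RQ 1), and a Mann-Whitney U statistic normalized by the product of the two group sizes, so that 0.5 means no difference (one common normalization, assumed here for RQ 3 and RQ 4). The function names, the normalization, and the toy data are hypothetical and are not the study's code or data; a faithful reanalysis would also permute at the level of the actual randomization units (reviewers and articles) rather than individual observations.

```python
import random
from statistics import mean


def diff_in_means(a, b):
    """Difference in sample means (group a minus group b)."""
    return mean(a) - mean(b)


def normalized_mann_whitney_u(a, b):
    """Mann-Whitney U for group a, divided by len(a) * len(b).

    This equals P(X > Y) + 0.5 * P(X == Y) for X drawn from a and Y from b,
    so 0.5 indicates no difference between the groups (assumed normalization).
    """
    wins = 0.0
    for x in a:
        for y in b:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5  # ties count as half a win
    return wins / (len(a) * len(b))


def permutation_test(a, b, statistic, null_value, n_perm=10_000, seed=0):
    """Return (observed statistic, two-sided permutation p-value).

    Condition labels are shuffled n_perm times; the p-value is the share of
    shuffles whose statistic lies at least as far from null_value as the
    observed one, with an add-one correction so it is never exactly 0.
    """
    rng = random.Random(seed)
    observed = statistic(a, b)
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign observations to conditions
        permuted = statistic(pooled[:n_a], pooled[n_a:])
        if abs(permuted - null_value) >= abs(observed - null_value):
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)


# Hypothetical toy data, NOT from the study: posts per reviewer-article pair.
anonymous = [2, 3, 1, 2, 4, 2, 3, 1]
visible = [1, 2, 1, 2, 2, 1, 3, 1]

d, p_d = permutation_test(anonymous, visible, diff_in_means, null_value=0.0)
u, p_u = permutation_test(anonymous, visible, normalized_mann_whitney_u, null_value=0.5)
print(f"difference in means = {d:.3f} (p = {p_d:.3f})")
print(f"normalized U = {u:.3f} (p = {p_u:.3f})")
```

Shuffling labels makes no distributional assumptions, which suits bounded, tie-heavy outcomes such as Likert ratings and post counts; the statistic itself doubles as the effect size, as in the Design.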
Results
Overall, 322 articles were reviewed under the anonymous condition (116 accepted) and 310 articles under the nonanonymous condition (114 accepted), with 289 reviewers in each condition. Reviewers made 611 discussion posts in the anonymous condition and 514 in the nonanonymous condition. The results for the 7 research questions are provided in Table 24-0812.
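For orientation, the reported totals give the following acceptance rates and raw posts per reviewed article in each condition. These ratios are simple arithmetic on the counts above, not the study's RQ 1 statistic, which is computed per reviewer-article pair.

```latex
\begin{aligned}
\text{acceptance rate (anonymous)} &= \tfrac{116}{322} \approx 0.360, &
\text{acceptance rate (nonanonymous)} &= \tfrac{114}{310} \approx 0.368,\\
\text{posts per article (anonymous)} &= \tfrac{611}{322} \approx 1.90, &
\text{posts per article (nonanonymous)} &= \tfrac{514}{310} \approx 1.66.
\end{aligned}
```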
Conclusions
Small but statistically significant differences favoring anonymous discussions were found. Subsequent computer science conferences have drawn on these findings in setting their policies, with a greater inclination toward keeping reviewer discussions anonymous.
References
1. Shah N. An overview of challenges, experiments, and computational solutions in peer review (extended version). July 7, 2025. Accessed July 16, 2025. https://www.cs.cmu.edu/~nihars/preprints/SurveyPeerReview.pdf
2. Chiang W, Li Z, Lin Z, et al. Vicuna: an open-source chatbot impressing GPT-4 with 90%* ChatGPT quality. LMSYS Org. March 30, 2023. https://lmsys.org/blog/2023-03-30-vicuna/
1Google DeepMind, New York, NY, US; 2Carnegie Mellon University, Pittsburgh, PA, US, nihars@cs.cmu.edu; 3ETH Zurich, Zurich, Switzerland; 4Max Planck Institute, Tübingen, Germany; 5New Economic School, Moscow, Russia; 6University of Maryland, College Park, MD, US.
Conflict of Interest Disclosures
Disclosures are as per the author affiliations listed above. In addition, Zhijing Jin will join the University of Toronto, and Kun Zhang holds a partial appointment at Mohamed bin Zayed University of Artificial Intelligence.
Funding/Support
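Office of Naval Research (ONR) grant N000142212181; National Science Foundation (NSF) grants 1942124, 2200410, and 2229881; National Institutes of Health (NIH) grant R01HL159805.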
Role of Funder/Sponsor
The funders played no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the abstract; and decision to submit the abstract for presentation.
Acknowledgment
This work was conducted when Charvi Rastogi and Ivan St
