A Scale for the Assessment of Non-systematic Review Articles (SANRA)

Christopher Baethge,1,2 Sandra Goldbeck-Wood,3,4 Stephan Mertens1


To revise the Scale for the Assessment of Non-systematic Review Articles (SANRA), an instrument developed to help editors, reviewers, and researchers assess the quality of non-systematic review articles, and to test it in a larger number of manuscripts.


A team of 3 journal editors modified items in an earlier SANRA version based on face validity, item-total correlations, and reliability scores from previous tests, and deleted an item addressing a manuscript’s writing and accessibility because ratings differed considerably. The revised scale comprises 6 items scaled from 0 (low standard) to 2 (high standard) related to (1) justification of the review’s importance, (2) aims of the review, (3) literature search description, (4) adequacy of referencing, (5) presentation of levels of evidence, and (6) presentation of data central to the article’s argument. For all items we developed recommendations and examples to guide users filling out the instrument. The revised scale was tested by the same editors, blinded to each other’s ratings, in a group of 30 consecutive non-systematic review manuscripts submitted to Deutsches Ärzteblatt, a general medical journal, in 2015.
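The scoring scheme described above (6 items, each rated 0 to 2, summed to a total between 0 and 12) can be illustrated with a minimal sketch. The item identifiers and the function name `sanra_sum_score` are hypothetical labels paraphrased from the item descriptions, not part of the published instrument, and the example ratings are made up for illustration.

```python
# Hypothetical identifiers paraphrasing the 6 SANRA items named in the abstract.
SANRA_ITEMS = (
    "importance_justification",  # (1) justification of the review's importance
    "aims",                      # (2) aims of the review
    "literature_search",         # (3) literature search description
    "referencing",               # (4) adequacy of referencing
    "levels_of_evidence",        # (5) presentation of levels of evidence
    "central_data",              # (6) presentation of data central to the argument
)


def sanra_sum_score(ratings):
    """Return the sum score (0 to 12) for one manuscript rated on all 6 items."""
    if set(ratings) != set(SANRA_ITEMS):
        raise ValueError("ratings must cover exactly the 6 SANRA items")
    for item, value in ratings.items():
        if value not in (0, 1, 2):
            raise ValueError(f"{item}: rating must be 0, 1, or 2")
    return sum(ratings.values())


# Made-up ratings from one rater for one manuscript (illustration only).
example = dict(zip(SANRA_ITEMS, (2, 1, 0, 1, 0, 2)))
print(sanra_sum_score(example))
```

Validating the 0 to 2 range up front keeps an out-of-range entry from silently inflating the sum score.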


The mean (SD) sum score across the 30 manuscripts was 6.0 (2.6) [range, 1-12]. Corrected item-total correlations ranged from 0.33 (item 3) to 0.58 (item 6). Cronbach α was .68. The intraclass correlation coefficient (average measure) was 0.77 (95% CI, 0.57-0.88). Raters often disagreed on items 1 and 4. Raters confirmed that completing the scale in approximately 5 minutes is feasible in everyday editorial work and that it is easier to understand than the earlier version.
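For readers unfamiliar with the statistics reported here, the following sketch shows how Cronbach α and corrected item-total correlations are conventionally computed from a manuscripts-by-items rating matrix. It uses the standard textbook formulas and is not the authors' analysis code; the function names and the small rating matrix are hypothetical and made up for illustration, so the printed values are unrelated to the results above. The intraclass correlation coefficient is not covered, because the abstract does not specify which ICC model was used.

```python
from statistics import mean, variance


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def cronbach_alpha(rows):
    """Cronbach alpha; rows are manuscripts, columns are item ratings."""
    k = len(rows[0])
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def corrected_item_total(rows, j):
    """Correlate item j with the sum of the remaining items."""
    item = [r[j] for r in rows]
    rest = [sum(r) - r[j] for r in rows]
    return pearson(item, rest)


# Made-up ratings: 5 manuscripts x 6 items, each rated 0 to 2 (illustration only).
ratings = [
    [2, 1, 0, 1, 0, 2],
    [1, 1, 0, 1, 0, 1],
    [2, 2, 1, 2, 1, 2],
    [0, 1, 0, 0, 0, 1],
    [1, 2, 1, 1, 0, 1],
]

print(round(cronbach_alpha(ratings), 2))
for j in range(6):
    print(j + 1, round(corrected_item_total(ratings, j), 2))
```

The correlation is "corrected" because each item is removed from the total before correlating, so an item is not correlated with itself.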


A revised 6-item version of SANRA, a rating scale for the assessment of non-systematic reviews, demonstrated interrater reliability, homogeneity of items, and internal consistency sufficient for a scale of 6 items. In comparison with earlier versions of the scale, the current version is shorter, is based on appropriate field tests, and is easier to use. Further testing of the scale’s validity (eg, expert ratings of manuscripts, citations, reviewer recommendations) is desirable, as is rater training based on recommendations and examples provided with the scale. The scale is intended to complement rather than replace journal-specific evaluation of manuscripts (eg, pertaining to audience, originality or difficulty) and may contribute to improving the standard of reporting of non-systematic reviews.

1Deutsches Ärzteblatt and Deutsches Ärzteblatt International, Cologne, Germany, baethge@aerzteblatt.de; 2Department of Psychiatry and Psychotherapy, University of Cologne Medical School, Cologne, Germany; 3Department of Obstetrics and Gynecology, Addenbrooke’s Hospital, Cambridge University, Cambridge, UK; 4Journal of Family Planning and Reproductive Health Care, London, UK

Conflict of Interest Disclosures: None reported.