Original Article
The GRADE approach is reproducible in assessing the quality of evidence of quantitative evidence syntheses

https://doi.org/10.1016/j.jclinepi.2013.02.004

Abstract

Objective

We evaluated the inter-rater reliability (IRR) of assessing the quality of evidence (QoE) using the Grading of Recommendations, Assessment, Development, and Evaluation (GRADE) approach.

Study Design and Setting

On completing two training exercises, participants worked independently as individual raters to assess the QoE of 16 outcomes. After recording their initial impression using a global rating, raters graded the QoE following the GRADE approach. Subsequently, randomly paired raters submitted a consensus rating.
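For readers unfamiliar with the approach, the structure of a GRADE rating for a single outcome can be caricatured in code. The sketch below is a simplified, hypothetical illustration and not the GRADE working group's method: GRADE relies on structured judgment, not arithmetic. It assumes the common conventions that evidence from randomized trials starts at high quality and observational evidence at low quality, that each of five domains (risk of bias, inconsistency, indirectness, imprecision, publication bias) can lower the rating by one or two levels, and that factors such as a large effect can raise it. The names `Quality` and `grade_quality` and the domain keys are invented for illustration.

```python
from enum import IntEnum


class Quality(IntEnum):
    """GRADE quality-of-evidence levels, ordered so that +/- moves between levels."""
    VERY_LOW = 1
    LOW = 2
    MODERATE = 3
    HIGH = 4


# Hypothetical keys for the five GRADE reasons to rate the evidence down.
DOWNGRADE_DOMAINS = ("risk_of_bias", "inconsistency", "indirectness",
                     "imprecision", "publication_bias")


def grade_quality(randomized: bool, downgrades: dict[str, int],
                  upgrades: int = 0) -> Quality:
    """Mechanical illustration of a GRADE rating for one outcome.

    randomized: True if the body of evidence comes from randomized trials.
    downgrades: number of levels rated down per domain (0, 1 or 2).
    upgrades: levels rated up (e.g. large effect, dose-response gradient).
    """
    for domain, levels in downgrades.items():
        if domain not in DOWNGRADE_DOMAINS:
            raise ValueError(f"unknown domain: {domain}")
        if levels not in (0, 1, 2):
            raise ValueError(f"{domain}: rate down by 0, 1 or 2 levels")
    # Starting level depends on study design.
    start = Quality.HIGH if randomized else Quality.LOW
    score = start - sum(downgrades.values()) + upgrades
    # Clamp the result to the four-level scale.
    return Quality(max(Quality.VERY_LOW, min(Quality.HIGH, score)))
```

For example, `grade_quality(True, {"risk_of_bias": 1, "imprecision": 1})` returns `Quality.LOW`: randomized evidence starts at high and is rated down two levels.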

Results

For two individual raters, the IRR of intuitive global ratings made without the GRADE approach was 0.31 (95% confidence interval [95% CI] = 0.21–0.42) among Health Research Methodology students (n = 10) and 0.27 (95% CI = 0.19–0.37) among GRADE working group members (n = 15). The corresponding IRR using the GRADE approach was significantly higher: 0.66 (95% CI = 0.56–0.75) and 0.72 (95% CI = 0.61–0.79), respectively. The IRR increased further with three raters (0.80 [95% CI = 0.73–0.86] and 0.74 [95% CI = 0.65–0.81]) and with four raters (0.84 [95% CI = 0.78–0.89] and 0.79 [95% CI = 0.71–0.85]). Assessing the QoE through a consensus rating did not improve the IRR.
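The abstract reports IRR for two, three, and four raters but does not name the statistic behind these values. As an assumption, the sketch below uses the two-way random-effects, absolute-agreement intraclass correlation coefficient in Shrout and Fleiss's notation, computed from a complete matrix of outcomes by raters. The single-rater form is ICC(2,1); the average-measures form ICC(2,k) describes the reliability of the mean of k ratings and typically rises as raters are added, so the sketch offers both. Which form, if either, the authors used is not stated here. Ordinal QoE ratings would first need a numeric coding (for instance, very low = 1 up to high = 4, a hypothetical choice). The function name `icc_2` is hypothetical, and the sketch does not produce confidence intervals.

```python
from statistics import fmean


def icc_2(ratings: list[list[float]], average: bool = False) -> float:
    """Two-way random-effects, absolute-agreement ICC (Shrout and Fleiss).

    ratings[i][j] is rater j's numeric score for outcome i (n outcomes x k raters).
    average=False returns the single-rater ICC(2,1); average=True returns
    ICC(2,k), the reliability of the mean of the k raters' scores.
    """
    n = len(ratings)
    k = len(ratings[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete n x k matrix with n >= 2 and k >= 2")

    grand = fmean(x for row in ratings for x in row)
    row_means = [fmean(row) for row in ratings]
    col_means = [fmean(row[j] for row in ratings) for j in range(k)]

    # Partition the total sum of squares into outcomes, raters and residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    if average:
        return (ms_rows - ms_error) / (ms_rows + (ms_cols - ms_error) / n)
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
```

With 16 outcomes the input would be a 16-by-k matrix; passing the columns for two, three, or four raters gives one coefficient per rater subset.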

Conclusion

Our findings suggest that use of the GRADE approach by trained individuals improves reliability compared with intuitive judgments about the QoE, and that two individual raters can reliably assess the QoE using the GRADE system.


Background

The Grading of Recommendations, Assessment, Development, and Evaluation (GRADE) working group includes guideline developers, systematic reviewers, clinicians, public health officers, researchers, methodologists, and other health professionals from around the world [1]. The group has spent over a decade developing and refining a systematic, transparent, and explicit process for summarizing, grading, and presenting evidence, and for moving from evidence to health care recommendations. More than …

Design

Participants first worked independently as individual raters to assess the QoE. Once individual raters had submitted their judgments, we randomly paired each of them with another rater. We asked each pair to discuss their independent ratings and resolve any discrepancies before submitting a final consensus judgment. Both individual raters and pairs of raters worked independently of other raters who evaluated the same evidence. This design allowed us to test the effect of …
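The random pairing step described above could be implemented along the following lines. This is a hypothetical sketch: `pair_raters` and its `seed` parameter are invented, and the page does not say whether pairs were formed within each group or how an odd-sized group, such as the 15 GRADE working group raters, was handled, so the sketch simply rejects odd counts rather than guess.

```python
import random
from typing import Optional


def pair_raters(raters: list[str], seed: Optional[int] = None) -> list[tuple[str, str]]:
    """Randomly split a group of raters into disjoint pairs (hypothetical helper).

    Raises ValueError for an odd-sized group, since the handling of a leftover
    rater is not specified.
    """
    if len(raters) % 2:
        raise ValueError("an even number of raters is needed to form pairs")
    rng = random.Random(seed)  # seeded generator so an allocation can be reproduced
    shuffled = list(raters)    # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    # Consecutive raters in the shuffled order form one pair.
    return [(shuffled[i], shuffled[i + 1]) for i in range(0, len(shuffled), 2)]
```

Shuffling once and cutting the list into consecutive pairs gives every possible pairing the same probability, which is a simple way to realize "randomly paired".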

Results

Twenty-seven members of the GRADE working group and 10 students from the McMaster University Health Research Methodology (HRM) graduate program agreed to participate in the study. Fifteen of the 27 GRADE working group members and all 10 students served as raters and completed the assessment of all 16 outcomes.

Table 1 summarizes the baseline characteristics and previous experience of the raters. Eight of the 15 GRADE working group members had been involved in developing aspects of the GRADE approach. Thirteen raters (nine …

Discussion

We found substantial IRR when two individual raters assessed the QoE of 16 outcomes from four systematic reviews. Among GRADE-naive raters (students), the IRR of the GRADE approach improved significantly after the calibration exercises, from slight agreement (0.11) to substantial agreement (0.66). Among raters already familiar with the approach, the IRR of the GRADE approach was high from the outset (0.62) and improved only slightly (0.72) after the calibration exercises. IRR was similar among …
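The qualitative labels in this paragraph (slight for 0.11, substantial for 0.66) are consistent with the benchmarks proposed by Landis and Koch (1977), which are widely used to describe agreement coefficients. Assuming those are the benchmarks the authors applied, a hypothetical helper `agreement_label` maps a coefficient to its band:

```python
def agreement_label(coefficient: float) -> str:
    """Map an agreement coefficient to a Landis and Koch benchmark label (hypothetical helper)."""
    if not -1.0 <= coefficient <= 1.0:
        raise ValueError("coefficient must lie in [-1, 1]")
    if coefficient < 0.0:
        return "poor"
    # Upper edge of each band, checked in increasing order.
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if coefficient <= upper:
            return label
    return "almost perfect"
```

A value exactly on a cut point, such as 0.40, falls in the lower band, matching the 0.00–0.20, 0.21–0.40 layout of the original benchmark table.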

Acknowledgments

Competing interests: No financial competing interests.

Some authors are involved in the development and dissemination of GRADE, and GRADE's success has a positive influence on their academic careers.

Authors' contributions: H.J.S. conceived the study. R.A.M. and H.J.S. designed the study. J.B., E.A.A., S.D.W., R.C., and G.H.G. contributed to the conception and design. R.A.M., G.N., and M.K. performed the statistical analysis. R.A.M. and H.J.S. drafted the manuscript. All coauthors critically …

References (23)

  • G.H. Guyatt et al. GRADE: an emerging consensus on rating quality of evidence and strength of recommendations. BMJ (2008).