Rapid reviews methods series: Guidance on assessing the certainty of evidence
  1. Gerald Gartlehner1,2,
  2. Barbara Nussbaumer-Streit1,
  3. Declan Devane3,4,
  4. Leila Kahwati2,
  5. Meera Viswanathan2,
  6. Valerie J King5,
  7. Amir Qaseem6,
  8. Elie Akl7,
  9. Holger J Schuenemann8,9
  10. on behalf of the Cochrane Rapid Reviews Methods Group
  1. 1 Department for Evidence-based Medicine and Evaluation & Cochrane Austria, University of Krems, Krems, Austria
  2. 2 RTI-UNC Evidence-based Practice Center, RTI International, Research Triangle Park, North Carolina, USA
  3. 3 School of Nursing and Midwifery, University of Galway, Galway, Ireland
  4. 4 Evidence Synthesis Ireland & Cochrane Ireland, University of Galway, Galway, Ireland
  5. 5 Center for Evidence-based Policy, Oregon Health and Science University, Portland, Oregon, USA
  6. 6 American College of Physicians, Philadelphia, Pennsylvania, USA
  7. 7 American University of Beirut, Beirut, Lebanon
  8. 8 Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada
  9. 9 Department of Biomedical Sciences, Humanitas University, Pieve Emanuele, Italy
  1. Correspondence to Dr Gerald Gartlehner, Department for Evidence-based Medicine and Evaluation, University of Krems, 3500 Krems, Niederösterreich, Austria; Gerald.Gartlehner{at}donau-uni.ac.at

Abstract

This paper is part of a series of methodological guidance from the Cochrane Rapid Reviews Methods Group. Rapid reviews (RRs) use modified systematic review methods to accelerate the review process while maintaining systematic, transparent and reproducible methods. This paper addresses considerations for rating the certainty of evidence (COE) in RRs. We recommend the full implementation of GRADE (Grading of Recommendations, Assessment, Development and Evaluation) for Cochrane RRs if time and resources allow.

If time or other resources do not permit the full implementation of GRADE, the following recommendations can be considered: (1) limit rating COE to the main intervention and comparator and limit the number of outcomes to critical benefits and harms; (2) if a literature review or a Delphi approach to rate the importance of outcomes is not feasible, rely on informal judgements of knowledge users, topic experts or team members; (3) replace independent rating of the COE by two reviewers with single-reviewer rating and verification by a second reviewer and (4) if effect estimates of a well-conducted systematic review are incorporated into an RR, use existing COE grades from such a review. We advise against changing the definition of COE or the domains considered part of the GRADE approach for RRs.

  • Methods

Data availability statement

Data sharing not applicable as no datasets generated and/or analysed for this study. No data are available.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.

WHAT IS ALREADY KNOWN ON THIS TOPIC

  • Compared with systematic reviews, rapid reviews (RRs) often do not formally rate the certainty of evidence. As a consequence, certainty of evidence ratings in RRs are missing or difficult to interpret.

WHAT THIS STUDY ADDS

  • This paper presents considerations and recommendations for accelerating the use of Grading of Recommendations, Assessment, Development and Evaluation to rate the certainty of evidence.

HOW THIS STUDY MIGHT AFFECT RESEARCH, PRACTICE OR POLICY

  • RRs vary in scope and timelines. Any decisions regarding shortcuts when rating the certainty of evidence should consider the context of the entire review process.

Introduction

This paper is part of a series from the Cochrane Rapid Reviews Methods Group providing methodological guidance for rapid reviews (RRs).1–3 In recent years, RRs have become a widely used type of knowledge synthesis to support urgent, time-sensitive decisions. An RR is a type of evidence synthesis that brings together and summarises information from different research studies to produce evidence for people such as the public, researchers, policy makers and funders in a systematic, resource-efficient manner.4 RRs apply modified systematic review methods to accelerate the review process and complete a review rapidly while maintaining systematic, transparent and reproducible methods.5

Assessments of the trustworthiness of available evidence and confidence in effect estimates are key components of any evidence synthesis, including RRs. The goal is to provide knowledge users with transparent, well-reasoned judgements about reviewers’ confidence in the evidence underpinning the effects of interventions.6 Researchers have proposed various methods to assess the certainty of a body of evidence.7 For systematic reviews, Cochrane recommends the Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach,8 which uses the term certainty of evidence (COE; also known as quality or strength of evidence) to describe the level of confidence that investigators have in the estimates of treatment effects. Internationally, GRADE has become the most widely adopted tool for rating COE because it requires transparent and explicit judgements by reviewers. In addition, the GRADE Working Group provides extensive training resources and GRADEpro (https://www.gradepro.org), an open-access software tool, which helps author teams apply GRADE in a standardised manner.9 For these reasons, the Cochrane Rapid Reviews Methods Group also recommends using GRADE for rating the COE in RR products.

This paper presents considerations and recommendations on how to accelerate COE rating for RRs of interventions. It elaborates on the brief guidance by the Cochrane Rapid Reviews Methods Group5 and its update10 and provides more detailed recommendations on how rating the confidence in treatment effects can be accelerated when conducting RRs. Because RRs vary in scope and timelines, these recommendations should be viewed as guidance, which can be used when time or other resource constraints do not allow the full implementation of GRADE for COE. Our recommendations pertain primarily to the process of how GRADE is applied and not to the GRADE approach itself.

In the following sections, we start with general considerations on improving efficiency when rating COE for RRs and then list the GRADE processes that should remain unchanged to maintain consistency with the general GRADE approach. We then describe ways to accelerate the application of GRADE and discuss those cases that use existing systematic reviews or network meta-analyses (NMA) to inform GRADE assessments.

It is important to emphasise, however, that if time and other resources permit, we encourage investigators conducting RRs to use the full GRADE approach as recommended for Cochrane systematic reviews.8 Table 1 provides an overview of recommendations; the following sections discuss each recommendation in more detail.

Table 1

Recommendations for rating the certainty of evidence for rapid reviews (RRs)

General considerations about increasing efficiency when rating the COE

As outlined in paper 1 of this series,11 reviewers conducting RRs should work closely with knowledge users to refine research questions, develop inclusion and exclusion criteria, and identify comparisons and outcomes of interest. Involving knowledge users is particularly relevant for rating the COE because the choice of interventions, comparators and outcomes determines the framework for rating the COE. Knowledge users can help reviewers choose interventions, comparators and outcomes that are most important for decision making, thereby limiting the number of outcomes that need to be graded. Paper 1 of this series discusses the best ways to engage knowledge users.11

We also recommend using GRADEpro, an open-access software tool for rating the COE in RRs.9 GRADEpro helps investigators apply GRADE in a standardised manner, automatically saves data and provides various output styles for Summary of Findings tables, thereby making production of the review more efficient. GRADEpro might not be relevant if reviewers adopt COE assessments from well-conducted existing systematic reviews that are incorporated into an RR.

Recommendations to maintain consistency with GRADE

To save time when rating the COE, authors of RRs sometimes modify GRADE. For example, authors may rate the COE at the study level across all outcomes, merge the categories ‘low’ and ‘very low’ into a single category, or withhold information that offers insight into their decisions to uprate or downrate evidence. GRADE, however, has become an internationally established system that many investigators and stakeholders are familiar with and can interpret. Therefore, we recommend that the following four attributes of GRADE remain unchanged when GRADE is used for RRs:

  1. The definition of COE, including the recommended number of categories (grades) expressing COE.

  2. The domains that determine the COE of an outcome.

  3. The assessment of COE at the outcome level, not at the study level, for a given intervention and comparator.

  4. The use of Summary of Findings tables (and Evidence Profiles) with explanatory footnotes to ensure transparency in the domain judgements used to generate the COE.

Definition and grades of COE

For authors of systematic reviews, GRADE defines COE as ‘…the extent to which we are confident that an estimate of the effect is correct’.12 This definition assumes that the true effect lies within a particular range of the estimated effect (usually the confidence or credible interval).12 Although COE represents a continuum, GRADE uses four categories of COE (see table 2).

Table 2

GRADE approach to rating the certainty of evidence12 14

Domains that determine the COE

GRADE distinguishes between randomised and non-randomised studies contributing to a body of evidence. Randomised trials start at high COE and non-randomised studies at low COE. When using ROBINS-I (Risk Of Bias In Non-randomised Studies of Interventions)13 as a tool to assess the risk of bias of non-randomised studies, both randomised and non-randomised studies start at high COE.14 GRADE takes five domains into consideration that can lower COE ratings and three domains that can increase COE ratings (two domains if ROBINS-I is used, see table 2).14 We recommend that these domains remain unchanged when reviewers apply GRADE because the domains are interlinked. Modifications may confuse users of evidence syntheses15 who are familiar with the GRADE approach.
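The bookkeeping behind these starting levels and domains can be sketched in Python. This is an illustration only: the function name `rate_coe`, the integer encoding of the four grades and the summing of level changes are assumptions made for the example. GRADE itself relies on explicit reviewer judgement for each domain, not on arithmetic. The domain names follow general GRADE guidance, and the sketch does not model which uprating domain is dropped when ROBINS-I is used.

```python
# Illustrative sketch only: GRADE ratings are reviewer judgements, not a formula.
GRADES = ["very low", "low", "moderate", "high"]

# Domain names as commonly listed in GRADE guidance.
DOWNRATING_DOMAINS = {
    "risk of bias", "inconsistency", "indirectness",
    "imprecision", "publication bias",
}
UPRATING_DOMAINS = {
    "large effect", "dose-response gradient", "plausible confounding",
}


def rate_coe(randomised, downrate=None, uprate=None, robins_i=False):
    """Return a COE grade for ONE outcome of one comparison.

    downrate / uprate map a domain name to the number of levels (1 or 2)
    the reviewers decided to move. All names here are hypothetical.
    """
    downrate = downrate or {}
    uprate = uprate or {}
    unknown = (set(downrate) - DOWNRATING_DOMAINS) | (set(uprate) - UPRATING_DOMAINS)
    if unknown:
        raise ValueError(f"not a GRADE domain: {sorted(unknown)}")

    # Randomised trials start at high; non-randomised studies start at low,
    # unless risk of bias was assessed with ROBINS-I (then they start at high).
    start = 3 if (randomised or robins_i) else 1
    level = start - sum(downrate.values()) + sum(uprate.values())
    level = max(0, min(3, level))  # keep within the four categories
    return GRADES[level]


print(rate_coe(True, downrate={"imprecision": 1}))  # moderate
```

Keeping the four grades and the domain lists as fixed constants mirrors the recommendation that neither should be modified for RRs.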

Rating the COE at an outcome level

GRADE assesses the trustworthiness of the available evidence separately for individual outcomes because the COE may differ from one outcome to another within the same body of evidence. We recommend keeping this approach because it provides the necessary granularity for guideline panels and decision-makers who need to consider individual outcomes critical for decision making.

Summary of Findings tables with footnotes

Evidence profiles are the backbone of Summary of Findings tables. They provide a concise presentation of key information needed by users of RRs to inform decisions.16 17 A Summary of Findings table includes the same information as an evidence profile but is intended for a knowledge user audience.18 Authors produce an evidence profile that they then transform into a Summary of Findings table, often by using the GRADEpro software.9 For each outcome within a given intervention/control comparison, Summary of Findings tables provide information about the number of studies and participants, a measure of the control group risk (or a mean value for a continuous outcome), relative and absolute effect estimates with the intervention, and the COE for each outcome. Explanatory footnotes summarise reasons for downrating or uprating the COE. We recommend that investigators conducting RRs present results in Summary of Findings tables for the most important comparisons because knowledge users are highly familiar with the uniform format of these tables. Additionally, explanatory footnotes enhance the transparency of COE rating judgements.

Recommendations to accelerate GRADE application

Selecting outcomes and rating their importance for decision making

The workload for investigators rating the COE is largely determined by the number of outcomes that are rated for each comparison of interest. Because not all comparisons and outcomes are equally important for decision making and rating the COE can be labour-intensive, prioritising which comparisons and outcomes are most important for decision making is crucial for the efficiency of an RR. Any RR should be based on a research protocol in which comparisons and outcomes of interest are prespecified. Involving knowledge users can help determine the most important comparisons.11 For example, in an RR assessing the efficacy and risk of harms of novel treatments for COVID-19, comparisons with placebo might be more important than comparisons with other active treatments, even if both are included in an RR.

Likewise, not all outcomes of interest are equally important for decision making. Ideally, expert panels, knowledge users, people living with the condition or consumer representatives would identify the relative importance of outcomes using a Delphi approach or other methods to reach consensus.19 Alternatively, GRADE recommends conducting a literature review of studies that assessed the importance of outcomes for decision making. Because of the time-sensitive nature of RRs, such formal methods are often not possible. As an alternative for RRs, we advise relying on the judgements of knowledge users, topic experts (including people living with the condition) or team members to prioritise outcomes and select those that are most important for decision making (and which subsequently should be graded). The GRADE-suggested approach of using a numerical scale from 1 to 9 (7 to 9: critical; 4 to 6: important; 1 to 3: of limited importance) can still be used.19 Such ratings can be implemented quickly with online survey tools.
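As a minimal sketch of how such survey ratings might be summarised, the following Python function maps ratings on the 1 to 9 scale to the three importance categories. Summarising several raters by the median, and treating a median between 6 and 7 as 'important', are assumptions made for this example; the function name `classify_importance` is hypothetical.

```python
from statistics import median


def classify_importance(ratings):
    """Map 1-9 importance ratings for one outcome to a GRADE category.

    Assumption for this sketch: raters are summarised by their median.
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(not 1 <= r <= 9 for r in ratings):
        raise ValueError("ratings must lie between 1 and 9")

    m = median(ratings)
    if m >= 7:
        return "critical"
    if m >= 4:
        return "important"
    return "of limited importance"


# Hypothetical ratings from three knowledge users
print(classify_importance([9, 8, 7]))  # critical
print(classify_importance([5, 6, 4]))  # important
```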

For systematic reviews, GRADE guidance recommends choosing no more than seven outcomes (a pragmatic suggestion based on the collective experience of the GRADE Working Group) for which reviewers rate the COE. These can include outcomes that were regarded as critical or important for decision making. For RRs, reviewers may wish to further limit the number of outcomes, focusing on those most important for decision making. For example, if appropriate, outcomes for rating could be limited to the two most important outcomes reflecting benefits and the two most important outcomes reflecting harms. Regardless of the number of outcomes for which reviewers choose to rate the COE, the choice should reflect both beneficial and harmful effects of a given intervention or management strategy. Another way to restrict the number of outcomes is to include only outcomes critical for decision making, and not outcomes rated as merely important, provided that both benefits and harms are addressed. However, this may not be feasible if many outcomes are deemed critical. Sometimes choosing broader, more general outcomes can be efficient for RRs. For example, rating the proportion of study participants who experienced adverse events might be a more efficient choice than rating the COE of individual, specific adverse events. It is also important to keep in mind that outcomes that reviewers do not rate can still be included in the review.
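The example of limiting ratings to the two most important benefits and the two most important harms could be sketched as follows. The data layout (a list of dictionaries with `name`, `direction` and `score`), the function name `select_outcomes` and the example outcomes are all hypothetical; in practice the prioritisation is a judgement made together with knowledge users.

```python
def select_outcomes(outcomes, per_direction=2):
    """Pick the highest-scored benefits and harms for COE rating.

    outcomes: list of dicts with keys 'name', 'direction'
    ('benefit' or 'harm') and 'score' (importance on the 1-9 scale).
    """
    selected = []
    for direction in ("benefit", "harm"):
        ranked = sorted(
            (o for o in outcomes if o["direction"] == direction),
            key=lambda o: o["score"],
            reverse=True,
        )
        selected.extend(o["name"] for o in ranked[:per_direction])
    return selected


# Hypothetical outcomes and importance scores
outcomes = [
    {"name": "mortality", "direction": "benefit", "score": 9},
    {"name": "hospitalisation", "direction": "benefit", "score": 8},
    {"name": "symptom duration", "direction": "benefit", "score": 5},
    {"name": "serious adverse events", "direction": "harm", "score": 8},
    {"name": "any adverse event", "direction": "harm", "score": 6},
    {"name": "headache", "direction": "harm", "score": 3},
]
print(select_outcomes(outcomes))
```

Selecting per direction, rather than taking the top scores overall, ensures that both benefits and harms are represented.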

Applying GRADE and quality assurance

GRADE guidance recommends that two reviewers independently assess the COE of each relevant outcome. If their judgements differ regarding the uprating or downrating of individual domains that determine the COE, investigators need to achieve consensus by discussion or by involving a third reviewer.20 For RRs subject to time constraints, we recommend that a single reviewer conducts COE ratings. A second, senior reviewer with experience or formal training in rating COE should then check the rating decisions and their rationales. For example, after a single investigator has finished COE ratings in GRADEpro,9 the second investigator should review rating decisions and explanations for uprating or downrating by ensuring that these decisions align with current GRADE guidance but without assessing the underlying evidence for each outcome. An important prerequisite for this approach is that both first and second reviewers have experience with GRADE. The first reviewer needs to provide a clear rationale for each decision to uprate or downrate the COE so that the second reviewer can assess the rationale for these decisions.18 The best way to provide this type of transparency is to use explanatory footnotes (see the Summary of Findings tables section). Well-formulated explanatory footnotes provide clarity and support the understanding of COE judgements.
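Before handing ratings to the second reviewer, a team might run a simple completeness check that every rating change carries a rationale. The sketch below assumes a hypothetical record layout (`outcome`, `domain`, `levels`, `footnote`); it does not reflect the GRADEpro data format, and it only checks that a footnote exists, not whether the reasoning is sound.

```python
def missing_rationales(judgements):
    """List (outcome, domain) pairs that were up- or downrated without a footnote.

    judgements: list of dicts with keys 'outcome', 'domain',
    'levels' (non-zero when the COE was changed) and 'footnote'.
    """
    return [
        (j["outcome"], j["domain"])
        for j in judgements
        if j["levels"] != 0 and not j.get("footnote", "").strip()
    ]


# Hypothetical first-reviewer judgements
judgements = [
    {"outcome": "mortality", "domain": "imprecision", "levels": -1,
     "footnote": "Confidence interval includes both benefit and harm."},
    {"outcome": "mortality", "domain": "risk of bias", "levels": -1,
     "footnote": ""},
    {"outcome": "adverse events", "domain": "inconsistency", "levels": 0,
     "footnote": ""},
]
print(missing_rationales(judgements))  # [('mortality', 'risk of bias')]
```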

Recommendations when using existing systematic reviews or NMA

One approach to RR production is to include and, if necessary, update existing systematic reviews that address parts of a key question of the RR. For example, an existing systematic review might address the benefits and harms of one of several interventions of interest. If such a review is methodologically robust and used GRADE, results and COE can be considered for use in the RR. Choosing the best review can be challenging if several existing systematic reviews are available. Paper 4 of this series21 addresses how to deal with multiple existing reviews that could potentially be included in an RR. If literature searches of an existing review need to be updated, grades of COE might need to be revised. Paper 2 of this series addresses how literature searches of existing reviews can be updated efficiently.22

A special case of using existing systematic reviews is the use of NMAs, which are an increasingly common analytic tool when direct comparison evidence is sparse or missing. NMAs derive statistical effect estimates from direct (ie, from studies with direct head-to-head comparisons) and indirect (ie, using studies with a common comparator) evidence.23 An NMA often compares the efficacy and safety of multiple interventions, sometimes yielding dozens of comparisons. GRADE recommends a four-step approach to rating the COE of results from NMAs: (1) present the direct and indirect estimates of effect for the comparison, (2) rate the COE of both of these estimates, (3) present the network estimate for the comparison and (4) rate the COE of the network estimate, based on the ratings of the direct and indirect estimates and the assessment of coherence (ie, extent of similarity of direct and indirect estimates).24 25 Overall, this approach can become labour-intensive if the number of comparisons is large. In a best-case scenario, a published NMA already rates the COE following GRADE guidance. In such situations, we recommend using the COE grades from the published NMA if authors adhered to GRADE guidance, just as for any methodologically robust systematic review. If authors of the NMA, however, did not rate the COE or did not follow GRADE guidance, we recommend the following approach:

  1. If the NMA presents both direct and indirect estimates of effect, rate the COE of the direct estimate following standard GRADE guidance. Further downrate for indirectness if direct and indirect estimates differ substantially (ie, if there is incoherence). This deviates from the GRADE guidance by not separately rating indirect evidence.

  2. If the NMA presents only an indirect estimate of effect, use standard GRADE guidance but also downrate for indirectness. This deviates from GRADE guidance in that comparisons contributing to an indirect estimate are not graded separately, but rather globally.
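The decision logic for NMAs described above can be summarised as a small Python function. It is a sketch of the branching only: the function name `nma_rating_plan` and its boolean parameters are hypothetical, and the case of an NMA that reports only direct estimates is not addressed in the text, so the sketch raises an error for it.

```python
def nma_rating_plan(followed_grade, has_direct, has_indirect, incoherent=False):
    """Return the suggested COE rating steps for one NMA comparison.

    followed_grade: the published NMA rated the COE following GRADE guidance.
    incoherent: direct and indirect estimates differ substantially.
    """
    if followed_grade:
        return ["use the COE grade from the published NMA"]
    if has_direct and has_indirect:
        steps = ["rate the direct estimate following standard GRADE guidance"]
        if incoherent:
            steps.append("downrate further for indirectness (incoherence)")
        return steps
    if has_indirect:
        return [
            "rate the indirect estimate following standard GRADE guidance",
            "downrate for indirectness",
        ]
    # Not covered by the two-step approach sketched here
    raise ValueError("no guidance sketched for this combination")


for step in nma_rating_plan(False, True, True, incoherent=True):
    print(step)
```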

Another approach to rate the COE of effect estimates from NMAs is the Confidence In Network Meta-Analysis (CINeMA) web application.26 While CINeMA can indeed be quite useful in rating the COE of NMA, CINeMA’s usefulness for RRs remains limited. This is because the study data for an NMA, needed by CINeMA, are not always readily available in NMA publications.

Conclusions

RRs vary in their scopes and timelines. Therefore, any decisions regarding COE rating shortcuts should consider the context of the entire review process. In this paper, we present recommendations that may accelerate GRADE application to rate the COE. The largest potential to save time probably lies in using COE grades from well-conducted systematic reviews and NMAs, limiting which comparisons require COE grades, limiting the number of outcomes that are graded, and rating COE by a single reviewer whose decisions are reviewed by a second reviewer. If time and resources permit, however, we strongly encourage investigators conducting RRs to use the full GRADE approach. We advise against changing the definition of COE, or the domains considered as part of the GRADE approach.

Acknowledgments

We gratefully acknowledge Petra Wellemsen for project administration and formatting, Isolde Sommer and Chris Kamel for internal peer review, and Marialena Trivella for ensuring consistency with other papers of this series.

Footnotes

  • Twitter @drvalking

  • Contributors GG, BN-S, DD, LK, VJK, MV and HJS contributed to the conceptualisation of this paper. GG and BN-S wrote the first draft of the manuscript. All authors critically reviewed and revised the manuscript. GG is the guarantor.

  • Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

  • Competing interests GG and EA are members of the GRADE Working Group; HJS is co-chair of the GRADE Working Group; DD works part time for Cochrane Ireland and Evidence Synthesis Ireland, which are funded within the University of Ireland Galway (Ireland) by the Health Research Board (HRB) and the Health and Social Care, Research and Development (HSC R&D) Division of the Public Health Agency in Northern Ireland.

  • Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.

  • Provenance and peer review Not commissioned; externally peer reviewed.
