Rating the certainty in evidence in the absence of a single estimate of effect
  1. M Hassan Murad1,
  2. Reem A Mustafa2,3,
  3. Holger J Schünemann3,
  4. Shahnaz Sultan4,
  5. Nancy Santesso3

  1. Evidence-based Practice Center, Mayo Clinic, Rochester, Minnesota, USA
  2. Department of Biomedical and Health Informatics, University of Missouri-Kansas City, Kansas City, Missouri, USA
  3. Department of Health Research Methods, Evidence and Impact, McMaster University, Hamilton, Ontario, Canada
  4. Division of Gastroenterology, Department of Medicine, University of Minnesota, and Minneapolis Veterans Affairs Health Care System, Minneapolis, Minnesota, USA

  Correspondence to Dr M Hassan Murad, Evidence-based Practice Center, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, USA; murad.mohammad{at}


Practitioners of evidence-based medicine need to know the level of certainty in the evidence they are applying to patient care. Whether they are using recommendations from a clinical practice guideline based on a systematic review of the literature or using the results directly from a systematic review, they need to know how trustworthy the evidence is with regard to the benefits and harms of a treatment or a diagnostic test. This construct is called certainty or quality of evidence. The GRADE (Grading of Recommendation, Assessment, Development and Evaluation) approach is a modern framework for rating the certainty in evidence.1 Using GRADE, randomised controlled trials and observational studies are considered to generate high and low certainty evidence, respectively. This initial grade that is based on study design is modified using several key domains such as the methodological limitations of the studies, indirectness of the evidence to the question at hand, imprecision of estimates, inconsistency of the evidence, and the likelihood of publication bias. A body of evidence about a specific outcome is downgraded or upgraded to a final rating of high, moderate, low or very low. High certainty in evidence means that the investigators are very confident that the effect they found across studies is close to the true effect, and very low means that they have very little confidence in the effect.1
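As an illustration only, the rating logic described above — start at high certainty for randomised trials and low for observational studies, then move down or up one level per domain judgement, within the four-level scale — can be sketched as a small function. All names here are hypothetical; in practice GRADE ratings are judgements made by reviewers, not computed mechanically.

```python
# Hypothetical sketch of the GRADE rating logic described above.
# The four certainty levels, ordered from lowest to highest.
LEVELS = ["very low", "low", "moderate", "high"]

def grade_certainty(study_design, downgrades, upgrades=0):
    """Start from the design-based rating, then apply domain judgements.

    study_design: "randomised" or "observational"
    downgrades:   total levels subtracted across the five domains
                  (risk of bias, indirectness, imprecision,
                  inconsistency, publication bias)
    upgrades:     levels added (e.g. for a large effect)
    """
    start = 3 if study_design == "randomised" else 1  # high vs low
    final = start - downgrades + upgrades
    final = max(0, min(3, final))  # clamp to very low .. high
    return LEVELS[final]

# Five randomised trials rated down one level each for methodological
# limitations and inconsistency:
print(grade_certainty("randomised", downgrades=2))  # low
```

The clamp reflects that the scale has only four levels: evidence cannot be rated below very low or above high, however many domains raise concern.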

Often, a single pooled effect estimate from a meta-analysis is available and is used for assessing the certainty in evidence. However, when studies measure outcomes differently or report outcomes in ways that cannot be standardised and meta-analysed, or in situations of urgency, only a narrative synthesis might be available. Consider a systematic review of self-management programmes in patients with chronic obstructive pulmonary disease.2 There were five randomised trials that informed the effect of the intervention on respiratory symptoms. The individual studies presented their results using different tools and measures which precluded pooling. Two trials3,4 used the Borg scale to assess respiratory symptoms. One of these trials3 also presented results for respiratory symptom severity. The third trial5 presented results as the proportion of days rated in patients' diaries as having mild, moderate or severe respiratory symptoms. The fourth trial6 presented results about mean breathlessness and sputum production scores over 2-week periods and the fifth trial7 presented results as breathlessness, sputum volume and sputum colour during exacerbations. These studies could not be pooled, but the evidence could be summarised narratively. There is some guidance about how to synthesise the effects of interventions narratively.8 Guidance about how to grade certainty in this evidence is needed. A judgement on the certainty in evidence is still required because certainty is a key component of decision-making. Providing decision makers (patients, clinicians and policymakers) with evidence of unknown trustworthiness compromises their ability to transform evidence into action.9 Decision makers would want to know how confident we are in the effect of these programmes to improve respiratory symptoms before offering such programmes to patients with chronic obstructive pulmonary disease.

The approach

We provide suggestions on the use of GRADE to rate the certainty of evidence when a meta-analysis has not been performed and a narrative summary of the effect is provided instead. The approach leverages the meaning of the constructs that represent GRADE domains to produce judgements on how these constructs affect our certainty. In table 1, we explain how the GRADE domains (methodological limitations of the studies or risk of bias, indirectness, imprecision, inconsistency and the likelihood of publication bias) can be applied without a single pooled estimate. Note that this guidance does not address meta-narrative reviews10–13 (which answer questions about conceptual underpinnings and understanding of a phenomenon) or qualitative systematic reviews14 (which summarise themes from focus groups and interviews); rather, we address evidence synthesis of quantitative estimates of effect not amenable to meta-analysis (and thus summarised narratively).

Table 1

Applying the GRADE approach when evidence for an effect is summarised narratively (a meta-analysis is not available)


In table 2, we again refer to the systematic review of self-management programmes in patients with chronic obstructive pulmonary disease2 and illustrate how we applied the GRADE approach. The outcome of interest in this table is respiratory symptoms, which were not pooled in meta-analysis. Evidence derived from five randomised trials showed small to no reductions in respiratory symptoms and was judged to warrant low certainty (rated down for methodological limitations of the included studies and inconsistency). Based on this assessment, decision makers can conclude that self-management programmes may slightly reduce respiratory symptoms. This evidence could also be presented to decision makers in a summary of findings table (typically used in guideline development and generated using GRADEpro, which allows narrative summaries of the evidence). Table 3 shows one row of a summary of findings table with explanatory notes. The certainty of evidence in table 3 summarises the GRADE judgements about the different domains (all detailed in table 2) that collectively determined the certainty in evidence for one outcome (respiratory symptoms).

Table 2

Illustrative example of rating the certainty in evidence in the absence of a single estimate of effect

Table 3

Illustrative example of how the summary of findings can be presented to guideline developers


Evidence-based practice is founded on making decisions using the best available evidence, whether it is based on a pooled single effect estimate, or on a narrative review of the individual studies informing each outcome. Stakeholders require that such evidence is appraised and the certainty in the effect is determined in order to inform decision-making. One of the greatest strengths of the GRADE approach is that it provides a systematic method to assess the certainty in evidence and a transparent documentation of the judgements used to assess the body of evidence. While it is typically thought to apply only to results that have been statistically aggregated, evaluating the certainty of evidence can also be performed when results have been narratively summarised.16 In this setting, some certainty domains can be applied directly. For other domains, we have provided additional guidance in which the meaning and connotation of those domains can be used. Taken together, an overall assessment of the evidence can be determined. Stakeholders engaged in shared decision-making in a patient–physician dyad, in guideline development, or in public health and policy, can then use the summarised effect and the certainty in the evidence to make informed decisions.



  • Competing interests None declared.

  • Provenance and peer review Not commissioned; internally peer reviewed.