Facilitating GRADE judgements about the inconsistency of effects using a novel visualisation approach
Mohammad Hassan Murad (1), Zhen Wang (1,2), Yngve Falck-Ytter (3)

  1. Evidence-based Practice Center, Mayo Clinic, Rochester, MN, USA
  2. Health Care Policy and Research, Mayo Clinic Minnesota, Rochester, Minnesota, USA
  3. Case Western Reserve University, Cleveland, Ohio, USA

Correspondence to Dr Mohammad Hassan Murad; murad.mohammad{at}mayo.edu

Background

Inconsistency is a key domain that determines the certainty of evidence. The Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach specifically defines inconsistency as the variability in results across studies, and not variability in study characteristics, eligibility criteria or design.1 Statistical measures of heterogeneity are often used to assess inconsistency; however, major limitations of such measures have been described. For example, Cochran’s Q test for homogeneity is usually underpowered to detect heterogeneity. The I² index, which is the most commonly used measure, underestimates true statistical heterogeneity when there are fewer than 10 studies in a meta-analysis, which is a common scenario, and is correlated with the sample size of the included studies.2 The I² index is also often misunderstood as an indicator of the spread of the effect size. Borenstein demonstrates how a meta-analysis with an I² index of 25% can have more spread of the effect size than a meta-analysis with an I² index of 75%.3

Therefore, GRADE guidance on inconsistency recommended less reliance on statistical measures and instead instructed users to judge whether studies in a meta-analysis provide estimates that differ from each other to a clinically important extent.1 However, there are no existing tools to facilitate this process, which makes it highly subjective. Users are instructed to look at a forest plot, evaluate the similarity of the point estimates of the included studies and the overlap of their CIs, and make a judgement based on values that they consider clinically important. Merely counting studies does not work because outlying studies may carry very little weight in the pooled effect estimate. Having multiple thresholds makes this task even more difficult. Furthermore, in the case of binary outcomes, decision thresholds are based on absolute treatment effects4 5 whereas most meta-analyses and their associated forest plots are presented on relative effect scales.
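The point that the I² index reflects study precision rather than the spread of effects can be illustrated with a minimal sketch. It uses the standard definitions of Cochran's Q (the inverse-variance weighted sum of squared deviations from the pooled estimate) and I² = max(0, (Q − df)/Q). The effect sizes and variances are invented for illustration and are not taken from any study cited here; the function name is hypothetical.

```python
def cochran_q_and_i2(effects, variances):
    """Return Cochran's Q and the I2 index (as a percentage).

    effects   -- study effect estimates (for example, log risk ratios)
    variances -- the within-study variance of each estimate
    """
    weights = [1.0 / v for v in variances]  # inverse-variance weights
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, i2


# Hypothetical data: the same point estimates at two levels of precision.
effects = [-0.40, -0.10, 0.00, 0.20, 0.30]
q_imprecise, i2_imprecise = cochran_q_and_i2(effects, [0.10] * 5)
q_precise, i2_precise = cochran_q_and_i2(effects, [0.01] * 5)

print(f"imprecise studies: Q = {q_imprecise:.1f}, I2 = {i2_imprecise:.0f}%")
print(f"precise studies:   Q = {q_precise:.1f}, I2 = {i2_precise:.0f}%")
```

In this invented example the point estimates are identical in both runs, yet I² moves from 0% to about 87% solely because the studies became more precise.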

In this exposition, we operationalise the GRADE definition of decision thresholds (trivial, small, medium or large effect)4 and judge inconsistency in a meta-analysis based on these thresholds. We developed a tool for visualising inconsistency based on stakeholder-provided thresholds. The first aim of this visualisation approach is educational, that is, to teach the concept of inconsistency as it relates to multiple decisional thresholds. The second aim of this visualisation is to provide a practical tool to facilitate making judgements about inconsistency in a meta-analysis or a guideline.

The proposed visualisation approach

This visualisation approach can be used when meta-analysts prepare a summary of findings table and make a judgement about inconsistency. The approach starts with stakeholders providing three thresholds in the form of absolute risk differences. In line with recent GRADE guidance,4 these three thresholds, applied in both directions, define seven treatment effect ranges: large, medium and small reduction; large, medium and small increase; and a trivial or no effect. As an example, we used in this paper the following thresholds (per 1000 patients): small, medium and large reduction (−10, −100 and −200), small, medium and large increase (10, 100 and 200), and trivial or no effect (between −10 and +10). Random-effects meta-analysis is conducted on a relative effect scale using the restricted maximum likelihood estimator of between-study heterogeneity. The relative treatment effect of each study is converted to an absolute effect using a baseline risk that is either derived from the available studies or provided by users (in this visualisation, it was derived by dividing the number of events in the control groups of a meta-analysis by the total number of participants in the control arms). Each individual study is categorised into one of the seven ranges based on its absolute effect. The random-effects weights of studies that fall in each range are summed to provide the total weight for that range. A bar graph depicts all the ranges, with the height of each bar representing the percentage of the total weight for that range. This bar graph allows visualisation of the distribution of inferences of the individual studies in relation to stakeholder-provided thresholds. The approach is summarised in box 1. This approach can also be used for continuous outcomes, which can be expressed on their original scale using stakeholder-provided thresholds. If such thresholds are unknown, the outcome can be expressed as a standardised mean difference, and the traditional thresholds of 0.2, 0.5 and 0.8 can be used to define small, moderate and large effects.4
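The conversion and categorisation steps described above can be sketched as follows. This is an illustration only and not the authors' R implementation: it assumes the relative effect is a risk ratio, so that the risk difference per 1000 is 1000 × baseline risk × (RR − 1); it assumes an effect lying exactly on a threshold belongs to the larger range; and all function names and input numbers are hypothetical.

```python
def pooled_baseline_risk(control_events, control_totals):
    """Baseline risk: total control-arm events / total control-arm participants."""
    return sum(control_events) / sum(control_totals)


def absolute_effect_per_1000(risk_ratio, baseline_risk):
    """Risk difference per 1000 patients implied by a risk ratio."""
    return 1000.0 * baseline_risk * (risk_ratio - 1.0)


def classify(effect_per_1000, thresholds=(10, 100, 200)):
    """Assign an absolute effect to one of the seven ranges.

    Assumption: an effect exactly on a threshold falls in the larger range.
    """
    small, medium, large = thresholds
    size = abs(effect_per_1000)
    if size < small:
        return "trivial or no effect"
    direction = "reduction" if effect_per_1000 < 0 else "increase"
    if size < medium:
        magnitude = "small"
    elif size < large:
        magnitude = "medium"
    else:
        magnitude = "large"
    return f"{magnitude} {direction}"


# Hypothetical control-arm data from three studies (baseline risk 90/300).
baseline = pooled_baseline_risk([30, 45, 15], [100, 150, 50])

# Hypothetical study-level risk ratios.
for rr in (0.50, 0.80, 0.98, 1.10):
    effect = absolute_effect_per_1000(rr, baseline)
    print(f"RR {rr:.2f}: {effect:+.0f} per 1000 -> {classify(effect)}")
```

With a baseline risk of 30%, these four invented risk ratios fall into four different ranges, which is the kind of spread the bar graph is meant to expose.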

Box 1

Steps of performing the proposed visualisation approach

  1. Decisional thresholds are provided by stakeholders.

  • If unavailable or unknown, default thresholds can be used.

  2. Meta-analysis is conducted to obtain the effect size and weight for each study.

  3. The effect size of each study is converted as needed to match units of decisional thresholds.

  • Binary outcomes: convert relative effects to absolute effects using appropriate baseline risk.

  • Continuous outcomes: convert to a standardised mean difference if thresholds are unavailable.

  4. Total weight is calculated for each decisional range by summing the weights of individual studies that fall within that range.

  5. A bar graph allows users to visualise the spread of inference across decisional ranges and make a judgement about inconsistency.
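The last two steps of box 1 can be sketched in a few lines. This sketch assumes the absolute effects (per 1000) and random-effects weights have already been obtained from a meta-analysis; it does not fit the model itself. The effects and weights are invented, the names are hypothetical, and a text bar chart stands in for the graphical one.

```python
RANGE_ORDER = [
    "large reduction", "medium reduction", "small reduction",
    "trivial or no effect",
    "small increase", "medium increase", "large increase",
]


def range_label(effect_per_1000, cuts=(10, 100, 200)):
    """Name the range an absolute effect falls in (thresholds per 1000).

    Assumption: an effect exactly on a threshold falls in the larger range.
    """
    level = sum(abs(effect_per_1000) >= c for c in cuts)  # 0 = trivial
    if level == 0:
        return "trivial or no effect"
    magnitude = ("small", "medium", "large")[level - 1]
    direction = "reduction" if effect_per_1000 < 0 else "increase"
    return f"{magnitude} {direction}"


def weight_share_by_range(effects_per_1000, weights):
    """Percentage of the total meta-analysis weight falling in each range."""
    total = sum(weights)
    shares = dict.fromkeys(RANGE_ORDER, 0.0)
    for effect, weight in zip(effects_per_1000, weights):
        shares[range_label(effect)] += 100.0 * weight / total
    return shares


# Hypothetical study-level absolute effects and random-effects weights.
effects = [-60, -45, -120, 5, 15]
weights = [30, 25, 20, 15, 10]

shares = weight_share_by_range(effects, weights)
for label in RANGE_ORDER:
    bar = "#" * round(shares[label] / 2)  # one '#' per 2% of weight
    print(f"{label:<22}{bar:<50} {shares[label]:5.1f}%")
```

Here 55% of the invented weight supports a small reduction and the remaining 45% is split across three other ranges, a pattern that would prompt a closer look at inconsistency.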

The approach is implemented for binary and continuous outcomes in open-source R code provided in the online supplemental appendix (R Core Team 2024. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria). The code is also available as an R Shiny application that does not require knowledge of statistical software coding: https://hassan-murad.shinyapps.io/inconsistency_visualization/

Examples

The first example is a meta-analysis of eight studies6 that evaluated the effect of home non-invasive positive pressure ventilation on mortality in patients with chronic obstructive pulmonary disease. The I² index of 27% and the p value for heterogeneity of 0.21 suggest no important heterogeneity. The point estimates on the relative risk scale are similar except for two smaller studies (figure 1, panel A). Applying the proposed visualisation approach (figure 1, panel B), we note that the pooled effect and 50.2% of the weight of individual studies suggest a small reduction in risk. However, the remaining 49.8% of the weight suggests three different inferences: moderate reduction, trivial effect and small increase. Thus, inferences from individual studies are quite variable, in contrast to the impression derived from the forest plot and its associated statistical measures. In this case, rating down for inconsistency seems justified.

Figure 1

Meta-analysis of trials of home non-invasive positive pressure ventilation in chronic obstructive pulmonary disease. Panel A (top) demonstrates a meta-analysis on the risk ratio scale suggesting small heterogeneity. Panel B (bottom) demonstrates a bar graph showing the distribution of meta-analysis weights per effect size range, suggesting important inconsistency. The green bar represents the inference associated with the target of certainty (the pooled estimate). MA, meta-analysis.

The second example is a meta-analysis of nine studies7 that evaluated the adverse events of fluoxetine in patients who are obese or overweight. The I² index of 53% and the p value for heterogeneity of 0.03 suggest substantial and statistically significant heterogeneity (figure 2, panel A). Applying the proposed visualisation approach (figure 2, panel B) shows that the inference from almost all the studies (96% of the meta-analysis weight) is consistent with a small increase in risk. Therefore, the statistically significant heterogeneity did not lead to any important inconsistency when considering stakeholder-provided thresholds. In this case, rating down for inconsistency is unnecessary.

Figure 2

Meta-analysis of trials of fluoxetine for adults who are overweight or obese. Panel A (top) demonstrates a meta-analysis on the risk ratio scale suggesting substantial heterogeneity. Panel B (bottom) demonstrates a bar graph showing the distribution of meta-analysis weights per effect size range, suggesting minimal inconsistency. The green bar represents the inference associated with the target of certainty (the pooled estimate). MA, meta-analysis.

The third example addresses a continuous outcome (online supplemental figure 1). A meta-analysis of eight trials evaluated the effect of health and wellness coaching on the severity of depression in patients with chronic illness.8 The I² index of 95% and the p value for heterogeneity of 0.01 suggested substantial and statistically significant heterogeneity. The proposed visualisation shows that the majority of evidence (75% of meta-analysis weight) was consistent with a single inference, a trivial effect, which can justify not rating down for inconsistency. The forest plot and the bar graph (online supplemental figure 1) demonstrate that statistical heterogeneity is driven by a single small study (11.9% of the weight). Reviewing the inclusion criteria for this study may indicate a systematic difference from the remaining studies.

Discussion

It is very challenging to look at a forest plot and judge whether individual studies support the same inference when there are multiple inference regions, up to seven according to recent GRADE guidance.4 This complexity increases further in the case of binary outcomes, which require translation to absolute effects. The proposed visualisation and quantification of total weight across stakeholder-provided thresholds can help to streamline this judgement and make it more explicit. Using meta-analysis weights instead of ‘counting studies’ prevents small outlying studies with extreme results from having undue influence on the judgement.

The approach can also be used when stakeholders decide not to use multiple thresholds and opt to use only the minimally important difference (MID). A positive MID and a negative MID define three ranges of effect: important reduction, trivial to no effect and important increase.5 The same visualisation can show the total meta-analysis weight distributed across these three ranges to make a judgement about inconsistency.

Limitations to this approach include two concerns associated with transforming a relative effect of a binary outcome to an absolute one. The first issue is the assumption of portability of the relative effect across different baseline risks, which is not always true.9 The second issue is that such transformation is usually done without addressing uncertainty in baseline risks. Several methods have been proposed to address uncertainty in the baseline risk when estimating the absolute effect,10 which can be easily incorporated in this proposed visualisation approach. Lastly, there are inherent methodological limitations to the MID and its reliability for gauging clinical relevance. Other approaches for establishing clinical relevance thresholds exist.11
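The dependence of the inferred range on the chosen baseline risk can be seen with a simple sensitivity check. This sketch is not one of the methods for handling baseline-risk uncertainty cited above; it only recomputes the absolute effect of a single risk ratio under several assumed baseline risks. The risk ratio, baseline risks and function name are hypothetical, and the thresholds are the example values used in this paper (10, 100 and 200 per 1000).

```python
def range_for(risk_ratio, baseline_risk, cuts=(10, 100, 200)):
    """Return (absolute effect per 1000, range label) for one risk ratio.

    Assumes the risk ratio applies unchanged at every baseline risk.
    """
    effect = 1000.0 * baseline_risk * (risk_ratio - 1.0)
    level = sum(abs(effect) >= c for c in cuts)  # 0 = trivial
    if level == 0:
        return effect, "trivial or no effect"
    magnitude = ("small", "medium", "large")[level - 1]
    direction = "reduction" if effect < 0 else "increase"
    return effect, f"{magnitude} {direction}"


# The same hypothetical risk ratio evaluated at three assumed baseline risks.
for baseline in (0.04, 0.30, 0.60):
    effect, label = range_for(0.80, baseline)
    print(f"baseline {baseline:.0%}: {effect:+.0f} per 1000 -> {label}")
```

A risk ratio of 0.80 corresponds to a trivial effect at a 4% baseline risk, a small reduction at 30% and a medium reduction at 60%, so the choice of baseline risk can change which range a study is assigned to.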

Ethics statements

Patient consent for publication

Ethics approval

Not applicable.

Footnotes

  • X @m_hassan_murad

  • Contributors MHM and YF-Y conceived this study. MHM wrote the first draft. All authors critically revised the manuscript and approved the final version.

  • Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.
