## Abstract

Network meta-analysis (NMA) is an increasingly popular statistical method of synthesising evidence to assess the comparative benefits and harms of multiple treatments in a single analysis. Several automated software packages facilitate conducting NMA within either of two alternative frameworks, Bayesian or frequentist. Researchers must choose a framework and select appropriate model(s), and those conducting NMA need to understand the assumptions and limitations of different approaches. Bayesian models are more frequently used and can be more flexible, but they demand checks of additional assumptions and greater statistical expertise, requirements that are often ignored. The present paper describes the important theoretical aspects of Bayesian and frequentist models for NMA and the applications and considerations of contrast-synthesis and arm-synthesis NMAs. In addition, we present evidence from a limited number of simulation and empirical studies that compared different frequentist and Bayesian models and provide an overview of available automated software packages to perform NMA. We conclude that when analysts choose appropriate models, there are seldom important differences in the results of Bayesian and frequentist approaches and that network meta-analysts should therefore focus on model features rather than the statistical framework.

## Data availability statement

Data sharing not applicable as no datasets generated and/or analysed for this study.

#### WHAT IS ALREADY KNOWN ON THIS TOPIC

Network meta-analysis (NMA) has become a popular method to combine results from several studies comparing multiple treatments or interventions.

Statistical models to perform NMA have been developed in both Bayesian and frequentist frameworks.

#### WHAT THIS STUDY ADDS

We present important theoretical aspects of Bayesian and frequentist models for NMA, review evidence from simulation and empirical studies that compared different frequentist and Bayesian models, and provide an overview of available automated software packages to perform NMA.

#### HOW THIS STUDY MIGHT AFFECT RESEARCH, PRACTICE AND/OR POLICY

Network meta-analysts should focus on model features and their applications and assumptions rather than the statistical framework.

## Introduction

In recent years, evidence synthesis using network meta-analysis (NMA) methods has gained popularity,1 becoming an attractive methodology for researchers, clinicians and decision-makers across the clinical spectrum. A quick search in PubMed shows an exponential increase in the number of NMA publications in the last few years (see figure 1). While conventional pairwise meta-analysis is limited to providing relative effects for two interventions, NMA offers the ability to simultaneously estimate the relative benefits and harms of multiple interventions or diagnostic tests, thus better supporting complex decision-making processes.2 By combining direct evidence with indirect evidence, NMA improves the precision of relative effect estimates. Further, its results can provide guidance on rating treatment options and reduce the uncertainty of parameters for cost-effectiveness models.3 4 The past few years have seen important advances in the statistical methods, software development and methodologies to facilitate interpretation and decision-making.3 5–7

Despite the appealing advantages, NMA presents challenges. The two key assumptions of NMA, transitivity and coherence, rely on the agreement of different sources of evidence (direct and indirect evidence for the same treatment comparison and their similarity), which may be challenging to justify in practice.1 2 8 Despite all the advancements that have made it easier to produce NMAs, many published systematic reviews with NMA are of poor quality.9 Limitations in the practice of NMA abound and include inappropriate network configuration and node selection,10 11 using probability ranking to draw conclusions from NMA results,12 13 relying on low-power statistical tests to assess the coherence assumption,14 using network meta-regression with a limited number of studies to assess effect modification, and failing to assess the certainty of evidence for effect estimates.12 13

In recent years, while statisticians and methodologists have made great attempts to improve understanding of NMA methods, there remain methodological topics with limited guidance for those conducting NMA. In the present article, we provide a comprehensive overview of statistical models for NMA, review currently available software packages, and provide guidance for model selection to perform NMA.

### Fundamentals of statistical models for NMA

NMA models have been developed in both Bayesian and frequentist frameworks. From a broader perspective, the statistical inference framework differentiates the two approaches.15 In frequentist statistics, the parameters that represent the characteristics of the population are fixed but unknown constants that can be inferred using the likelihood of the observed data. In other words, inference rests on the probability of the observed data under a given hypothesis; thus, the frequentist framework can only help decide whether to accept or reject a hypothesis based on the statistical significance level and estimation.15 16 The results from an analysis using the frequentist approach are given as a point estimate (eg, OR, relative risk or mean difference) with a 95% CI.

Bayesian statistics take a different perspective on uncertainty that mostly involves conditional probability, that is, the probability of an event A given an event B. Unlike the frequentist approach, which only uses the likelihood from the observed data, the Bayesian framework relies on the probability distribution of the model parameters given the observed data and the prior beliefs from external information about the values of the parameters. Combining these two using Markov Chain Monte Carlo (MCMC) simulations, which repeatedly sample from the model until the chains stabilise and converge, generates a posterior probability distribution.15–17 The results of the Bayesian framework are presented as a point estimate with a 95% credible interval (CrI), which is interpreted as the interval in which there is a 95% probability that the true value of the parameter lies. For ratio measures (eg, OR, relative risk or HR), medians are used as the point estimate, whereas either the mean or the median can be reported for the pooled mean difference or standardised mean difference.

#### Frequentist models for NMA

Bucher *et al* proposed the first frequentist approach for a network of three interventions.18 In this approach, which is also known as *adjusted indirect comparison*, an estimate for the indirect treatment effect of A versus B can be obtained through the direct comparison of A versus C and B versus C (eg, as the difference in log relative effects of the two direct comparisons). The variance of the indirect estimate is the sum of the variances of the two direct effect estimates. In a closed triangular loop of evidence (or evidence cycle) the indirect effect estimate can be combined with the direct effect estimate to obtain the network (or mixed) effect estimate.
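The adjusted indirect comparison described above can be sketched in a few lines. This is a minimal illustration rather than the implementation used by any NMA package: the function names and the input values are hypothetical, effects are assumed to be on the log scale, and the direct and indirect estimates are combined by inverse-variance weighting, which is an assumption of this sketch.

```python
import math


def bucher_indirect(d_ac, var_ac, d_bc, var_bc):
    """Indirect A-vs-B effect through the common comparator C (log scale)."""
    d_ab = d_ac - d_bc          # difference of the two direct log effects
    var_ab = var_ac + var_bc    # variances of the direct estimates add
    return d_ab, var_ab


def combine_inverse_variance(d_direct, var_direct, d_indirect, var_indirect):
    """Mixed (network) estimate from a direct and an indirect estimate.

    Inverse-variance weighting is assumed here for illustration.
    """
    w_dir = 1.0 / var_direct
    w_ind = 1.0 / var_indirect
    d_mixed = (w_dir * d_direct + w_ind * d_indirect) / (w_dir + w_ind)
    var_mixed = 1.0 / (w_dir + w_ind)
    return d_mixed, var_mixed


# Hypothetical inputs: log ORs and their variances.
d_ind, v_ind = bucher_indirect(math.log(0.60), 0.04, math.log(0.80), 0.05)
d_mix, v_mix = combine_inverse_variance(math.log(0.70), 0.09, d_ind, v_ind)

print(f"indirect OR = {math.exp(d_ind):.2f}, variance = {v_ind:.2f}")
print(f"mixed OR = {math.exp(d_mix):.3f}, variance = {v_mix:.3f}")
```

In this sketch the mixed estimate has a smaller variance than either the direct or the indirect estimate, which is the gain in precision that motivates combining the two sources of evidence.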

The most popular frequentist NMA models are the graph-theoretical approach (proposed by Rücker *et al*19), the meta-regression approach (proposed by Lumley20 and then further developed by Salanti *et al*21), and the multivariate meta-analysis model (suggested by White *et al*22). The theoretical aspects of these models and their advantages and limitations are discussed in detail elsewhere.3 23–25

All the above models use estimates obtained at the contrast or study level, whether NMA is performed under a fixed-effect model (which assumes there is one true effect size underlying the trials for each comparison) or a random-effects model (which assumes the true effect size can differ from trial to trial). We will go over the contrast-based and arm-based models later in the text. In addition, while theoretically all these models can be modified to use a separate between-study heterogeneity variance (τ²) for each comparison informed by more than one study, their associated automated packages (tables 1 and 2) only use a between-study heterogeneity variance (τ²) that is assumed to be equal across all treatment comparisons in the network (ie, a homogeneous variance assumption). Heterogeneity variance in NMA models is discussed further down in the text.
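The difference between the fixed-effect and random-effects assumptions can be illustrated for a single comparison with inverse-variance pooling. The sketch below makes several assumptions: the function name and the data are hypothetical, and τ² is supplied as a known value rather than estimated. Setting `tau2=0.0` gives the fixed-effect model.

```python
def pool(estimates, variances, tau2=0.0):
    """Inverse-variance pooled estimate and its variance.

    tau2 is the between-study heterogeneity variance; tau2 = 0 corresponds
    to the fixed-effect model. Its value is an input here, not an estimate.
    """
    weights = [1.0 / (v + tau2) for v in variances]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total
    return pooled, 1.0 / total


# Hypothetical log ORs and within-study variances from three trials.
log_ors = [-0.5, -0.1, -0.3]
variances = [0.04, 0.09, 0.02]

fixed_est, fixed_var = pool(log_ors, variances)
random_est, random_var = pool(log_ors, variances, tau2=0.05)

print(f"fixed-effect:   {fixed_est:.3f} (variance {fixed_var:.4f})")
print(f"random-effects: {random_est:.3f} (variance {random_var:.4f})")
```

Adding τ² to every within-study variance makes the weights more equal across trials and widens the interval around the pooled estimate.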

#### Bayesian models for NMA

All Bayesian models suggested for NMA are hierarchical or multilevel models (ie, they are written in multiple levels or hierarchical form, and the estimates from submodels are dependent). Smith *et al*26 proposed a hierarchical model for conventional meta-analysis that laid the foundation for the popular NMA model of Lu and Ades (‘MTC’ model),27 28 which was then extended within the generalised linear mixed-modelling framework and automated to create the ‘GeMTC’ model.29 Within this framework, additional models were proposed to account for arm-level estimands (quantities of interest), for example, the models of Dias *et al* and Hong *et al*.30

For meta-analysis and NMA in a Bayesian framework, the effects of interventions to be estimated are given prior distributions. In general, we want the observed data to have the most influence on the posterior effect estimates and, thus, non-informative or minimally informative prior distributions (in which the posterior distribution is determined as completely as possible by the observed data) are often used.23 31 32 The prior distribution should cover the range of plausible values for the pooled intervention effect and, if the prior is to be non-informative (or vague or uniform), its variance should be very large, which means that the distribution is essentially flat over the plausible range of values for the treatment effect.31–33 Despite the importance of prior distributions, a recent review of 44 Bayesian NMAs published in leading general medical journals found that approximately half did not specify their choice of heterogeneity prior distributions and 84% failed to provide a rationale for their selected priors.33
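The influence of the prior on the posterior can be seen in a simplified setting. The sketch below assumes a normal prior and a normal likelihood for a single log OR, for which the posterior has a closed form, so no MCMC is needed; NMA software relies on MCMC because its models generally have no such closed form. The prior N(0, 100²), the informative prior and the data values are all hypothetical.

```python
import math


def normal_posterior(prior_mean, prior_var, data_mean, data_var):
    """Posterior mean and variance for a normal prior and normal likelihood."""
    prior_prec = 1.0 / prior_var
    data_prec = 1.0 / data_var
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + data_prec * data_mean)
    return post_mean, post_var


def credible_interval(mean, var, z=1.96):
    """Central 95% credible interval of a normal posterior."""
    sd = math.sqrt(var)
    return mean - z * sd, mean + z * sd


# Hypothetical observed log OR and its variance.
data_mean, data_var = -0.30, 0.02

vague = normal_posterior(0.0, 100.0 ** 2, data_mean, data_var)
informative = normal_posterior(0.0, 0.02, data_mean, data_var)

for label, (m, v) in (("vague prior", vague), ("informative prior", informative)):
    lo, hi = credible_interval(m, v)
    print(f"{label}: OR {math.exp(m):.2f} "
          f"(95% CrI {math.exp(lo):.2f} to {math.exp(hi):.2f})")
```

With the vague prior the posterior is driven almost entirely by the data, whereas the informative prior centred on no effect pulls the posterior mean halfway towards zero in this example.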

In a random-effects model, a second set of priors is defined for the between-study heterogeneity variance (τ²). The range of plausible values for informative and non-informative priors of τ² and their influence on NMA effect estimates are discussed elsewhere.32–34 In brief, a typical non-informative prior for τ² for a binary outcome using logit link models ranges from 0 to 2; this represents a huge range of trial-specific treatment effects. For example, for a median treatment effect of OR=1.25 (ln OR=0.22), we would expect 95% of trials to have true ORs between 0.17 and 9.24 (calculated as exp[0.22±2]). It is important to check that the posterior distribution of τ² is sufficiently different from the prior distribution; if the two are nearly identical, the prior is dominating the data and little or no posterior updating has occurred. An alternative, but outdated, approach would be to use a vague gamma prior distribution for the precision (1/τ²) to ensure lower prior weights for large values of τ². The limitation of this approach is that it becomes unreasonably informative when low values of τ² are possible or if the evidence is described by a sparse network. A network of treatments that is informed by a small number of trials (most comparisons informed by one or two trials) is typically called a sparse network. A star-shaped network does not have any closed loop of evidence and typically consists of treatments that are mostly compared with the reference treatment (eg, placebo or usual care). Similar to the frequentist model, the between-study heterogeneity variance is assumed to be zero in a fixed-effect model.
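The range of trial-specific ORs quoted above can be reproduced with a short calculation. The function name is hypothetical, and the half-width of 2 on the log scale is taken directly from the text; the limits of 0.17 and 9.24 are obtained when the unrounded ln(1.25) is used.

```python
import math


def trial_or_range(median_or, half_width):
    """Range of trial-specific ORs: exp(ln(median OR) +/- half_width)."""
    centre = math.log(median_or)
    return math.exp(centre - half_width), math.exp(centre + half_width)


low, high = trial_or_range(1.25, 2.0)
print(f"{low:.2f} to {high:.2f}")  # prints "0.17 to 9.24"
```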

### Contrast-synthesis and arm-synthesis NMAs

NMA can be performed using contrast-based or arm-based models. Currently, arm-based models are only available using a Bayesian framework.25 30 34 While contrast-based NMAs focus on modelling relative treatment effects (eg, log of the OR), arm-based models use absolute treatment effects from trial arms (eg, the log odds of success).25 30 Arm-based models have been criticised for departing from standard meta-analysis practice by compromising randomisation in the evidence, and because absolute effects tend to be highly variable compared with relative effects.25 30 35 The advantages of arm-based models are intriguing: they allow entry of arm-level covariates/effect modifiers into the model; data from single-arm studies can be included, which is helpful in cost-effectiveness analysis and decision-making models; and they perform better in handling missingness, which is helpful in estimating comparative absolute effects.25 35–37 Arm-based models can also accommodate different heterogeneity variances as opposed to using a common heterogeneity variance.30
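The two kinds of input can be contrasted with a single hypothetical two-arm trial. In the sketch below, the function names and the event counts are illustrative, and the variance of the log OR uses the standard large-sample formula (sum of reciprocal cell counts), which is not taken from the article.

```python
import math


def arm_log_odds(events, total):
    """Absolute effect for one arm: log odds of the event."""
    return math.log(events / (total - events))


def contrast_log_or(events_t, total_t, events_c, total_c):
    """Relative effect for one trial: log OR and its large-sample variance."""
    log_or = arm_log_odds(events_t, total_t) - arm_log_odds(events_c, total_c)
    var = (1.0 / events_t + 1.0 / (total_t - events_t)
           + 1.0 / events_c + 1.0 / (total_c - events_c))
    return log_or, var


# Hypothetical trial: 30/100 events on treatment, 20/100 on control.
lo_treat = arm_log_odds(30, 100)   # arm-level input for arm-based models
lo_ctrl = arm_log_odds(20, 100)
log_or, var_log_or = contrast_log_or(30, 100, 20, 100)  # contrast-level input

print(f"log odds: treatment {lo_treat:.3f}, control {lo_ctrl:.3f}")
print(f"log OR {log_or:.3f} (OR {math.exp(log_or):.2f}), variance {var_log_or:.3f}")
```

A contrast-based model would take only the log OR and its variance from this trial, whereas an arm-based model would take the two arm-level log odds, which is why the latter can also use data from single-arm studies.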

#### Heterogeneity variance in NMA models

All automated packages, with the exception of *pcnetmeta*, which is used to perform arm-based NMA, assume a common heterogeneity variance for all treatment effects under the random-effects assumption. This means that a single between-study heterogeneity variance (τ²) is assumed for all available direct comparisons in the network.23 27 32 While theoretically both frequentist and Bayesian models are capable of estimating τ² for each comparison informed by at least two studies, in practice it is challenging and mathematically complex.27 Using a constant and common heterogeneity variance reduces the number of model parameters, simplifies the estimation and increases the precision of the heterogeneity variance and, as a result, of the effect estimates from the NMA. Indeed, in order to simplify the NMA model, it is sometimes the preferred approach for performing arm-based synthesis.25 38

In sparse networks and star-shaped networks informed by a small number of studies, heterogeneity estimation across the network can have implausible results, and using a common heterogeneity variance can lead to unexpected imprecision of network estimates (ie, the CI of the network estimate becomes wider than that of the direct or the indirect estimates).39 40 When the choice of priors for the heterogeneity variance is too wide, this can be more pronounced in Bayesian models than in frequentist models.40 In such instances, using a fixed-effect model (τ=0) can resolve the problem.39 Arm-based models can also be used, but their estimates may not converge well if some comparisons are only informed by a few (say, fewer than three) studies.30

### Currently available software packages for NMA

*GeMTC*,41 42 *BUGSnet*43 and *bnma*44 are available R packages for performing Bayesian contrast-based NMA using a common heterogeneity variance. In addition, the *pcnetmeta* package allows performing Bayesian arm-based NMA in R. Neupane *et al* provided a comprehensive review of three R packages (*GeMTC*, *pcnetmeta* and *netmeta*) for NMA.45 *multinma* is an additional automated package available in R for performing Bayesian NMA of individual patient and aggregate data using multilevel network meta-regression.46 47 The R package *netmeta*19 and the Stata *network* suite48 49 are the available automated packages to perform frequentist contrast-based NMA. Further details about these packages are summarised in tables 1 and 2.

### Empirical and simulation studies comparing Bayesian and frequentist models

Although a considerable amount of methodological research in NMA has been published recently, only a limited number of simulation and empirical studies have compared the existing methodological advances. Unlike in other areas of biomedical research, NMA methods are commonly evaluated on an empirical example, which reveals differences between candidate models but does not evaluate them against known theoretical values.

We conducted a scoping search of PubMed and Scopus to identify empirical and simulation studies comparing frequentist and Bayesian NMA models and found one study protocol,38 two conference presentations50 51 and four peer-reviewed publications.24 25 52 53 The protocol describes an empirical study of three contrast-synthesis methods and two arm-synthesis methods.38 The same group has published a simulation study comparing the contrast-based model of Lu and Ades,28 the arm-based model of Hong *et al*54 and two intermediate contrast-based models with different study intercepts, and applied these models to a real dataset.25 They concluded that all these models can provide valid results and that the results of these models are comparable. The important differences between their selected models were not whether they were arm-based or contrast-based but in other model features (ie, fixed vs random study intercepts, estimands, whether treatment effects relate to study intercepts and missing data assumptions).

Seide *et al* performed a simulation study for sparse networks of trials using two frequentist models (graph-theoretical approach and multivariate meta-analysis model) and the GeMTC hierarchical Bayesian model.24 All models were contrast-based and used a common heterogeneity variance. They concluded that all models performed well with respect to coverage, precision, bias and error, with minimal difference in estimated ranking probabilities. The important factor in the observed differences was the heterogeneity; for example, the credible intervals were wider than the CIs when flatter priors were used for between-trial heterogeneity, and bias, even though minimal, increased with heterogeneity. Kiefer *et al*53 also performed a simulation study of networks with up to five interventions comparing the frequentist graph-theoretical approach (*netmeta*) with two Bayesian MTC hierarchical models, assuming a common heterogeneity variance (all random-effects contrast-based models). They concluded that Bayesian and frequentist consistency models perform similarly and that, with moderate or no incoherence and very low heterogeneity, the latter had slightly better properties.

By re-analysing data from 14 published networks, Sadeghirad *et al* compared fixed and random-effects models from the GeMTC hierarchical Bayesian model with non-informative priors to a multivariate meta-analysis (*network* suite) frequentist model.51 They found no disagreement between network estimates from the two frameworks. In a conference poster presentation, Harvey *et al*50 compared Bayesian and frequentist approaches using a single real dataset but did not provide any details of the NMA models or any of their findings in the abstract. Hong *et al*52 used Proc GLIMMIX in SAS to perform frequentist NMA and WinBUGS for Bayesian NMA (both random-effects contrast-based models) using a single real dataset and concluded that Bayesian methods are more flexible and their results more clinically interpretable.

## Conclusion

Some may argue that the Bayesian framework is more flexible than the frequentist framework in accommodating alternative NMA models and constructing complicated models (eg, arm-based models, analysis of disconnected networks or longitudinal trial data, or using non-common heterogeneity variance) with fewer assumptions. Most of these advantages are unavailable in automated software packages and, because of their complexity, require greater statistical expertise and proficiency in Bayesian software (eg, WinBUGS, OpenBUGS, JAGS). Using automated software packages within the Bayesian framework also has challenges that frequentist approaches avoid, including selection of appropriate prior distributions for treatment effects and between-study heterogeneity variance, and convergence of models derived from MCMC simulations.

Despite these potential challenges, Bayesian models have been used more frequently, perhaps because of the availability of free software packages and code and the previous shortcomings and complications of fitting frequentist NMA models. A limited number of simulation and empirical studies suggest no important difference in the performance, bias or errors between the frequentist and the Bayesian models, and highlight the importance of appropriate model selection. Theoretical aspects of different NMA models also support the idea that, when an appropriate NMA model is selected, the results of Bayesian and frequentist NMA models are unlikely to differ. Thus, network meta-analysts should focus on model features and their applications and assumptions rather than the statistical framework.

## Ethics statements

### Patient consent for publication

### Ethics approval

Not applicable.

## References

## Footnotes

Contributors: BS, FF, MJZ and LT did the planning and initiated the framework. All authors further developed the framework and participated in writing the article. BS is the guarantor of the article. The corresponding author attests that all listed authors meet authorship criteria and that no others meeting the criteria have been omitted.

Funding: Dr JWB is supported, in part, by a CIHR Canada Research Chair in the prevention and management of chronic pain.

Competing interests: None declared.

Patient and public involvement: Patients and/or the public were not involved in the design, or conduct, or reporting or dissemination plans of this research.

Provenance and peer review: Not commissioned; externally peer reviewed.