What is the vibration of effects?
  1. Constant Vinatier1,
  2. Sabine Hoffmann2,3,
  3. Chirag Patel4,
  4. Nicholas J DeVito5,
  5. Ioana Alina Cristea6,
  6. Braden Tierney7,
  7. John P A Ioannidis8,
  8. Florian Naudet1,9
  1. Univ Rennes, CHU Rennes, Inserm, EHESP, Irset (Institut de recherche en santé, environnement et travail) - UMR_S 1085, Centre d’investigation clinique de Rennes (CIC1414), Rennes, France
  2. Department of Statistics, Ludwig-Maximilians-Universität München, München, Germany
  3. LMU Open Science Center, Ludwig-Maximilians-Universität München, München, Germany
  4. Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
  5. Nuffield Department of Primary Care Health Sciences, University of Oxford, Oxford, UK
  6. Department of General Psychology, University of Padova, Padova, Italy
  7. Department of Physiology and Biophysics, Weill Cornell Medical College, New York, New York, USA
  8. Departments of Medicine, of Epidemiology and Population Health, of Biomedical Data Science, and of Statistics, and Meta-Research Innovation Center at Stanford (METRICS), Stanford University, Stanford, California, USA
  9. Institut Universitaire de France (IUF), Paris, France

Correspondence to Constant Vinatier, University of Rennes, Rennes, France; constant.vinatier1@gmail.com

Introduction

Navigating between contradictory results is not rare in the practice of evidence-based medicine. Recently, two papers published in the same year and in the same journal investigated the same research question with the same dataset and reached divergent results regarding the benefits of retrieval bag use during laparoscopic appendectomy.1 One found that these bags reduce the risk of infection,2 while the other found no support for a difference.3 Likewise, a multitude of network meta-analyses on the treatment of psoriasis reached divergent conclusions about the best drug to use,4 with industry-funded meta-analyses consistently favouring the sponsor’s own drug. Implementing the findings of medical research for decision-making in clinical practice is challenging when scientific results stand on such unstable ground. One reason, among others, is analytical flexibility: the variability in results arising from ‘researcher degrees of freedom’ (ie, uncertain decisions researchers have to make in study design, data collection and data analysis5). Analytical flexibility arises, for instance, when researchers have to choose among multiple justifiable methods, models or measurements. Given this analytical variability, and under pressure to publish, researchers may try different analysis strategies and selectively report the most impressive, desirable or publishable result.6 Not surprisingly, reported results may be, on average, inflated.7

The range of results arising from analytical flexibility can be explored using a generalisation of sensitivity analyses in which all uncertain analytical and methodological choices are systematically varied to estimate how different the results can be: the vibration of effects (VoE). This framework reports the range of effect sizes that can be obtained within the same study under the various analytical and methodological choices that can be made,7 giving researchers the possibility to report all possible results rather than selectively reporting the most impressive, favourable or publishable ones. In this article, we illustrate how the VoE framework can be used to explore the (in)stability of results in biomedical research and discuss its use in evidence-based medicine for evaluating the methodological choices that arise from analytical flexibility.

Assessing and reporting analytical variability

A realistic approach to exploring analytical variability is to observe how different investigators reasonably approach, and vary in, their choices for a given research question and dataset. In a ‘multi-analyst study’,8 several independent teams analyse the same dataset, making it possible to assess both the analytical choices made and the resulting variability in results.9 Each team may choose how it best wants to analyse the data. For instance, 29 research teams independently investigated the same dataset to examine whether skin tone was associated with red cards in soccer. The teams employed a variety of statistical models, leading to considerable differences in effect size and statistical significance.9 This approach is, however, challenging to implement, as it relies on recruiting and managing a large network of independent teams, and it may still leave many plausible analytic strategies unexplored. Moreover, it is often difficult to justify which specific analytical choices are more meaningful than others.

A less complex approach in terms of feasibility may help. The VoE is a more general framework that can be applied to any research project to explore analytical variability more comprehensively. It involves computing the results of a very large number of possible analysis strategies, varying one or more analytical choices across all plausible analytical scenarios, and comparing their impact on the observed results.

Figure 1 illustrates the VoE approach using observational prescription data from the National Health and Nutrition Examination Survey (NHANES), fitting 9595 different models (of which 6242 converged) to explore the association between systolic blood pressure (in mm Hg) and use of lisinopril.10 Lisinopril is a drug used to treat high blood pressure and is the most commonly prescribed drug in the NHANES 2011–2018 prescription data. The estimated beta coefficients of interest ranged from −0.553 to 0.575 mm Hg, with a median of 0.003 mm Hg.

Figure 1

Vibration of effects of the beta coefficient in the exploration of the association between lisinopril use and systolic blood pressure. An estimate <0 suggests lower systolic blood pressure with lisinopril. This figure was produced using data from Tierney et al10 by fitting 9595 randomly selected models, among all possible models, exploring the association using 253 covariates, with the maximum number of variables per model set to 20. Data and code to reproduce the figure are available on the Open Science Framework at https://osf.io/xfy75/. (A) Dots represent the 6242 convergent regression models among the 9595 randomly selected models. Colours represent densities (red=high, blue=low), with marginal density plots of the distributions. (B) Point estimates and 95% CIs for all models. Colours represent densities (red=high, blue=low).
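
To make the mechanics concrete, here is a minimal Python sketch of this random-specification approach on simulated stand-in data. It is our illustration, not the authors’ code (which is available on the OSF): the dataset is synthetic, and the numbers of covariates and models are deliberately smaller than in the real example so the sketch runs in seconds.

```python
# Minimal vibration-of-effects sketch on simulated stand-in data (NOT the
# authors' OSF code). The real example used NHANES data, 253 covariates and
# 9595 models; smaller numbers are used here for speed.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n, n_cov, n_models = 2000, 50, 1000
covs = pd.DataFrame(rng.normal(size=(n, n_cov)),
                    columns=[f"c{i}" for i in range(n_cov)])
lisinopril = pd.Series(rng.binomial(1, 0.2, n), name="lisinopril")
sbp = 120 + rng.normal(0, 15, n)            # simulated systolic blood pressure

results = []
for _ in range(n_models):
    k = int(rng.integers(1, 21))            # at most 20 covariates per model
    subset = list(rng.choice(covs.columns, size=k, replace=False))
    X = sm.add_constant(pd.concat([lisinopril, covs[subset]], axis=1))
    fit = sm.OLS(sbp, X).fit()
    results.append((fit.params["lisinopril"], fit.pvalues["lisinopril"]))

betas, pvals = map(np.array, zip(*results))
print(f"beta: {betas.min():.3f} to {betas.max():.3f}, "
      f"median {np.median(betas):.3f}")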

Various indicators have been proposed to assess and quantify VoE. Some focus on p values, including the range of p values (RP): the difference between the 99th and the 1st percentile of the negative log-transformed p values.11 12 Others are based on effect sizes, including, for instance, the relative OR and relative HR, which are the ratios of the 99th to the 1st percentile of the OR and the HR, respectively.11 13 A Janus effect, named after the two-faced Roman god Janus, is defined in a VoE study by the presence of opposite results (eg, ORs on both sides of the no-effect line) among all possible analysis strategies11 and indicates substantial analytical variability: for example, a treatment seems better than control in some analyses while the control seems better in other analyses of the very same data, or a biomarker appears to be a risk factor for a disease in some analyses and a protective factor in others using the same dataset.
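
These indicators can be computed directly from the vectors of estimates and p values produced by a VoE run. The following sketch, assuming the `betas` and `pvals` arrays from the previous example, is one possible implementation; the function names are ours.

```python
# Possible implementations of the VoE summary indicators described above.
import numpy as np

def voe_indicators(estimates, pvals, null=0.0):
    p1, p99 = np.percentile(estimates, [1, 99])
    neglogp = -np.log10(pvals)
    # RP: difference between the 99th and 1st percentile of -log10(p)
    rp = np.percentile(neglogp, 99) - np.percentile(neglogp, 1)
    janus = p1 < null < p99    # estimates on both sides of the no-effect line
    return {"pct1": p1, "pct99": p99, "RP": rp, "janus": janus}

def relative_ratio(ratios):
    """Relative OR/HR: 99th over 1st percentile of the ratio estimates
    (for ratio measures the no-effect line is 1, so pass null=1.0 above)."""
    p1, p99 = np.percentile(ratios, [1, 99])
    return p99 / p1

# eg, with the arrays from the previous sketch:
# print(voe_indicators(betas, pvals))
```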

Regarding the example presented in figure 1, there was a Janus effect, the 1st percentile being negative (−0.247) and the 99th percentile positive (0.126). In total, 1.6% (154/9595) of the associations were statistically significant at p<0.05 (141 negative and 13 positive). The p values of the different models ranged from 0.1510×10−6 to 0.9997, with a median of 0.6908 (figure 1). Given this wide variability in results, it would have been easy to selectively report a favourable or unfavourable association between lisinopril intake and systolic blood pressure based on analytical flexibility.

VoE in primary research and evidence syntheses

From data processing choices14 (eg, eligibility criteria, handling of outliers,15 dichotomisation of the outcome16 and of covariates) to model selection, many sources of analytical flexibility exist in primary research and can be explored using the VoE framework. There is a continuum across study designs: randomised controlled trials (RCTs) control flexibility through generally more stringent design choices than observational studies (for example, randomisation limits confounding17). The extent of analytical variability also depends on (1) the richness of the dataset (eg, big, wide, deep data), especially when data are not collected for research purposes10 and need to be heavily preprocessed,18 and (2) the complexity19 of the models being considered (eg, linear models vs more complex ones20), as complex models offer many more junctures at which choices can be made. The VoE framework allows the stability of results within a primary study to be explored, for instance by crossing all such choices into a grid of specifications, as sketched below.
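
The following toy grid illustrates how quickly such choices multiply; the choice labels are hypothetical examples rather than an exhaustive catalogue, and each combination defines one analysis to run.

```python
# Toy illustration of how data-processing choices multiply: each combination
# in the grid defines one specification to analyse.
from itertools import product

choices = {
    "eligibility": ["all adults", "exclude pregnancy"],
    "outliers":    ["keep", "winsorise", "drop beyond 3 SD"],
    "outcome":     ["continuous", "dichotomised"],
    "adjustment":  ["unadjusted", "age/sex", "fully adjusted"],
}
specifications = list(product(*choices.values()))
print(len(specifications))   # 2 x 3 x 2 x 3 = 36 analyses from four choices
```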

However, analytical variability is not restricted to primary research; it is also observed in evidence synthesis. While meta-analyses are supposed to be exhaustive, reproducible quantitative syntheses of the available evidence on a given research question, they too are prone to analytical flexibility. Differences in inclusion/exclusion criteria regarding Population-Intervention-Comparison-Outcomes-Study design and other analytical choices can lead to substantial analytical variability, especially for controversial topics with high clinical and statistical heterogeneity21 or when the evidence synthesis methods are complex and rely on assumptions that are difficult to verify (eg, exchangeability in comparative effectiveness research). Analytical variability has been observed in head-to-head meta-analyses assessing the efficacy of acupuncture for smoking cessation21 and of operative compared with non-operative treatments for proximal humerus fractures,12 in an indirect comparison of nalmefene with naltrexone22 and in a network meta-analysis of 21 antidepressants,23 with 172/231 (74%) comparisons exhibiting a Janus effect.24 To a smaller extent, analytical variability has been observed across 16 332 individual-level data meta-analyses exploring the efficacy of canagliflozin versus placebo in type 2 diabetes, depending on the combination of RCTs included.25
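
As a toy illustration of such trial-selection vibration in a pairwise meta-analysis, the sketch below pools every subset of at least two trials with a fixed-effect inverse-variance model; the trial effect sizes and standard errors are made-up numbers, not data from the canagliflozin example.

```python
# Toy trial-selection vibration: pool every subset of >=2 trials with a
# fixed-effect inverse-variance model. Effects and SEs are invented.
from itertools import combinations
import numpy as np

effects = np.array([-0.20, -0.05, 0.10, -0.30, 0.02])  # eg, mean differences
ses     = np.array([ 0.10,  0.08, 0.12,  0.15, 0.09])  # standard errors

pooled = []
for r in range(2, len(effects) + 1):
    for idx in combinations(range(len(effects)), r):
        idx = list(idx)
        w = 1 / ses[idx] ** 2                      # inverse-variance weights
        pooled.append(np.sum(w * effects[idx]) / np.sum(w))

pooled = np.array(pooled)
print(f"{len(pooled)} meta-analyses; pooled estimates "  # 26 subsets of 5 trials
      f"{pooled.min():.3f} to {pooled.max():.3f}")
```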

Implications for evidence-based medicine

The VoE framework may help assess the robustness of results to alternative plausible analytical choices in a systematic way. It also offers a valuable meta-research tool for exploring specific replicability issues, such as controversies across discrepant meta-analyses. For instance, there is some controversy regarding the additional benefit of escitalopram, a single-enantiomer version of citalopram, whose launch coincided with the expiration of exclusivity for citalopram.26 Despite its large commercial success, the superiority of escitalopram over citalopram remains uncertain, with contradictory claims based on conflicting meta-analyses.23 26–29 Using the most comprehensive network meta-analysis to date,23 it is possible to explore whether differences in treatment selection lead to different effect estimates in terms of magnitude and statistical significance.24 Among the 4 116 254 possible network meta-analyses based on the 21 included treatments, 1 174 541 included both drugs. The estimated ORs ranged from 0.735 to 0.982, with a median of 0.881 (1st percentile 0.747, 99th percentile 0.965). There was no Janus effect, since all OR estimates favoured escitalopram, possibly owing to the identification of an effect in the direct comparisons (OR=0.753 (0.630 to 0.900), 13 studies). However, p values ranged from 0.0003 to 0.8258, with a median of 0.1196, yielding an RP of 2.348; only 33% (389 726) of the associations reached statistical significance at p<0.05 (figure 2). The VoE framework thus allowed the robustness of the identified difference to be explored as a function of treatment selection: if a genuine difference exists, its identification and magnitude depend on the pathways used for indirect comparisons, as defined by the different network geometries. As a direct consequence, the VoE framework has the potential to help explore controversies in evidence-based medicine, such as conflicting meta-analyses on the same topic. Indeed, it has been argued that there are inconsistencies between direct and indirect evidence in the escitalopram-citalopram comparison, with doubts concerning even the reliability of the direct evidence.26

Figure 2

Vibration of effects for the comparison of escitalopram versus citalopram in the treatment of major depressive disorder. Data and code to reproduce the figure are available on the Open Science Framework at https://osf.io/xfy75/. (A) An OR<1 is in favour of escitalopram. In the graphs on the right, dots represent meta-analyses and colours represent densities (red=high; blue=low), with marginal density plots of the distributions. Full methods are detailed at https://doi.org/10.17605/OSF.IO/MB5DY. (B) Example of a network of 12 treatments (in blue) that failed to identify a difference, OR=0.98 (0.84 to 1.15; p=0.823). Treatments in grey are treatments of the full meta-analysis not included in the network. The size of the points represents the number of patients included. (C) Example of a network of 12 treatments (in blue) that identified a difference, OR=0.74 (0.61 to 0.89; p=0.001). Treatments in grey are treatments of the full meta-analysis not included in the network. The size of the points represents the number of patients included.
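
Conceptually, this network-geometry vibration amounts to rerunning the network meta-analysis on every eligible treatment subset that contains both drugs of interest. The sketch below only enumerates candidate geometries on a toy scale: `fit_nma` is a hypothetical placeholder for a real network meta-analysis routine (eg, the netmeta package in R), and the counts reported above also reflect the original study’s own eligibility rules, which this sketch does not reproduce.

```python
# Conceptual sketch only: VoE across network geometries. Each treatment
# subset containing both focus drugs defines one candidate network
# meta-analysis. `fit_nma` is a hypothetical stand-in, left unimplemented.
from itertools import combinations

others = [f"treatment_{i}" for i in range(1, 8)]  # toy set of other treatments
focus = ("escitalopram", "citalopram")

def fit_nma(treatment_set):
    """Hypothetical: fit an NMA restricted to `treatment_set` and return
    the OR for escitalopram versus citalopram."""
    raise NotImplementedError("replace with a real network meta-analysis fit")

candidate_networks = [focus + extra
                      for r in range(len(others) + 1)
                      for extra in combinations(others, r)]
print(len(candidate_networks))                    # 2**7 = 128 toy geometries
# ors = [fit_nma(net) for net in candidate_networks]  # then summarise as above
```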

The VoE framework can be very informative, but it should be handled with care. Even a strong association that appears robust in a VoE analysis could be a false positive. Additionally, care should be given to choosing which parameters should vary and what reasonable and plausible limits to set on the variation examined. Since there is usually no consensus on all potential methodological choices, defining the set of model specifications to consider in a VoE analysis is itself a subjective choice. Moreover, some model specifications examined may not be sensible or valid, for example, if they include collider variables that can bias the effect of interest. In the same vein, the VoE framework is an agnostic approach that explores all possible choices, whereas researchers would usually consider existing frameworks and rationales when making their choices, and some combinations may make less sense than others. Finally, conducting all possible subset analyses within a study can lead to an overwhelming number of analyses, posing computational challenges.

The existence of analytical variability in primary research and in evidence syntheses is an important argument for the registration of statistical analysis plans.17 Registration makes it possible, for primary research and evidence syntheses alike,30 to check whether a study deviated from its initial plan. In addition, registration of evidence syntheses may help limit the conduct of redundant meta-analyses that may end in divergent results, and even more divergent interpretations,31 adding confusion. Detailed statistical analysis plans can be used to prespecify the proposed approach to handling multiplicity, making every choice transparent and limiting outcome-dependent analytical choices. Registration is currently mandatory for clinical trials but still optional for observational research and meta-analyses. Even for clinical trials, mandatory registration does not extend to mandating the public availability of detailed statistical analysis plans in advance.

Conclusion

If used with care, the VoE framework can be a useful tool for exploring and visualising the uncertainties related to a universe of possible analytical choices in primary studies, datasets and meta-analyses. It can increase transparency in the reporting of results arising from different data processing, data/study eligibility and model specification choices, and can help explore controversies in evidence-based medicine such as conflicting meta-analyses on the same topic.

Ethics statements

Patient consent for publication

Ethics approval

Not applicable.

Footnotes

  • X @NaudetFlorian

  • Contributors During the writing of another paper about registration of observational studies, editor Juan Franco invited this educational paper on vibration of effects. FN invited the team of coauthors. CV and FN wrote the first draft. All other authors contributed to revising it critically and agreed on the final content.

  • Funding The author(s) received no specific funding for this work. Publications fees will be paid by Rennes University Hospital.

  • Competing interests CV is a PhD student in the OSIRIS (Open Science to Increase Reproducibility in Science) project. The OSIRIS project has received funding from the European Union’s Horizon Europe research and innovation programme under the grant agreement number 101094725. SH has received funding from the European Union’s Horizon Europe programme, the German Federal Ministry for the Environment, Nature Conservation, Nuclear Safety and Consumer Protection (BMUV) and the LMUExcellent. CP received funding from NIH (NIEHS R01ES0324702 and NIA RF1AG074372). NJD has received funding from the European Union’s Horizon Europe programme, also via the OSIRIS project, the Naji Foundation, the German Federal Ministry of Education and Research (BMBF) and the Fetzer Franklin Memorial Fund, and has been employed on grants from the Mohn-Westlake Foundation, Laura and John Arnold Foundation, Elsevier and the Good Thinking Society in the last 5 years. BT is compensated for consulting with Seed Health and Enzymetrics Biosciences on microbiome study design. FN received funding from the French National Research Agency (ANR-17-CE36-0010), the French Ministry of Health and the French Ministry of Research. He is a work package leader in the OSIRIS project. He is a work package leader for the doctoral network MSCA-DN SHARE-CTD (HORIZON-MSCA-2022-DN-01 101120360), funded by the EU. The work of JPAI has been supported by an unrestricted gift from Sue and Bob O’Donnell to Stanford University.

  • Provenance and peer review Not commissioned; externally peer reviewed.