## Abstract

When analysing and presenting results of randomised clinical trials, trialists rarely report if or how underlying statistical assumptions were validated. To avoid data-driven biased trial results, it should be common practice to prospectively describe the assessments of underlying assumptions. In existing literature, there is no consensus on how trialists should assess and report underlying assumptions for the analyses of randomised clinical trials. With this study, we developed suggestions on how to test and validate underlying assumptions behind logistic regression, linear regression, and Cox regression when analysing results of randomised clinical trials.

Two investigators compiled an initial draft based on a review of the literature. Experienced statisticians and trialists from eight different research centres and trial units then participated in an anonymised consensus process, where we reached agreement on the suggestions presented in this paper.

This paper provides detailed suggestions on 1) which underlying statistical assumptions behind logistic regression, multiple linear regression and Cox regression each should be assessed; 2) how these underlying assumptions may be assessed; and 3) what to do if these assumptions are violated.

We believe that the validity of randomised clinical trial results will increase if our recommendations for assessing and dealing with violations of the underlying statistical assumptions are followed.

- epidemiology
- statistics & research methods

## Introduction

The results of randomised clinical trials and systematic reviews hereof are, and should be, at the top of the hierarchy of evidence.1–5 However, several domains may bias trial results including the choice and conduct of statistical analyses.2 6–8

In order to ensure validity of trial results and under some circumstances to optimise statistical power, most statistical methods do require checking and validation of underlying theoretical assumptions.9 For example, the intervention effect estimated by analysis of covariance (ANCOVA) could under certain circumstances be erroneous if residuals are not normally distributed; and when using ANCOVA, power will often increase if data are log transformed.10 Likewise, if normality is violated and the sample size is small, then the Wilcoxon rank sum test can be three to four times more powerful than the independent samples t-test.9

Previously, we undertook a review of randomised clinical trials published in major medical journals in order to clarify whether the underlying assumptions behind the applied statistical tests were protocolised, assessed and reported.11 The conclusion was that trialists rarely report if or how the underlying assumptions were validated.11 Furthermore, the literature on the topic proposes no clear recommendation on how to assess and report these underlying assumptions when conducting a randomised clinical trial.11

For decades, there has been a positive movement and an ongoing effort to enhance transparency in medical research, for example, through the CONSORT statement and the EQUATOR network.6 12 However, it has not yet become common practice to prospectively report the methods used to assess the underlying assumptions of the statistical models used in the analysis of a randomised clinical trial.11 It also seems essential to address what steps are to be taken if the underlying assumptions are not fulfilled and what criteria are used to decide whether an assumption is violated or not. By checking assumptions, we can strive to ensure that the appropriate statistical methodology is being used.

To facilitate clarity, completeness and transparency of reporting in randomised clinical trials, every step of the trial process, including assessments of assumptions underlying the chosen statistical methods, needs to be thoroughly protocolised, described and reported.11 The aim of this paper is: (1) to consider which underlying assumptions should be assessed when using logistic regression, linear regression and Cox regression during analysis of results from randomised clinical trials; (2) to consider how to assess and validate these underlying assumptions and (3) to consider how to deal with violations of these underlying assumptions.

We primarily focus on randomised clinical trials with two intervention arms, and logistic regression, linear regression and Cox regression as trialists commonly use these analysis methods in randomised clinical trials.11 Furthermore, these regression analysis methods allow essential adjustments for stratification variables used in the randomisation as well as adjustments for baseline covariates.13 14

## Methods

Prior to beginning the development of the suggestions in this study, we performed a systematic survey of the literature to determine the current practice on the reporting of the assessment of the model assumptions for logistic, linear and Cox regression analyses in randomised trials. The results showed that the reporting of the results of model assumptions in trials is suboptimal.11 The development of the recommendations in this study was done in two steps: (1) a systematic survey of the methodological literature to identify candidate assumptions for each method and (2) a consensus study among thirteen experts selected from academic clinical trials centres.

### Systematic survey

The methodological literature was searched to identify candidate assumptions. Relevant databases were searched (PubMed, Cochrane Library, Google Scholar) using the search terms (assumption, statistical, analysis, randomi*) in February 2019. Based on the results of the systematic survey two investigators (TL and JCJ) developed a candidate list of assumptions and compiled an initial draft of the paper including:

- General considerations.
- Which assumptions to assess.
- How to assess if underlying assumptions are violated.
- Potential measures in case the assumptions are violated.

### Consensus study of experts

The initial draft was distributed to selected investigators at different departments and institutions known in our network (see list of coauthors). We applied a Delphi-inspired process focusing on anonymised commenting so that the investigators would not be biased by opinions from other specific investigators. Each investigator at each institution was free to accept, reject, comment or suggest alternative methods, preferably backed by arguments, results of empirical studies, results of simulation studies and other references to justify their comments and suggestions. All correspondence went exclusively and one-to-one through an independent facilitator (AKN). AKN collected all comments, assembled them into a report and wrote a compiled and anonymised summary of the comments. JCJ and TL then commented on the report from the facilitator and thereafter composed a revised draft of how to test for assumptions. The comments, the revised draft and the report from the facilitator were then sent to the external investigators for a next round of anonymised comments. This process was repeated six times until all coauthors could accept the final document.

## Results

### General considerations on which assumptions to assess, what methods to use and potential measures if the assumptions are not fulfilled

We strongly recommend that the protocol or the detailed statistical analysis plan for a trial should specify the key assumptions underpinning the analyses as well as how these assumptions should be assessed and what should be done if the assumptions are violated.11

For all regression analyses, we recommend testing for major interactions between each covariate and the intervention variable. We recommend that the statistician includes, in turn, each possible first order interaction between the included covariates and the intervention variable. For each combination, it should be assessed whether the interaction term is significant and what the effect size is. The predefined threshold for significance should depend on the number of tests; although very restrictive, Bonferroni adjusted thresholds may be considered (0.05 divided by the number of possible interactions). Furthermore, it should be considered whether the interaction is assumed to have a clinically significant effect. If the interaction is concluded to be significant, presenting separate analyses for each level of the relevant variable should be considered (eg, for each site if there is significant interaction between the trial intervention and ‘site’), together with an overall analysis including the interaction term in the model.
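The Bonferroni adjusted threshold described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the consensus recommendations: the function names and the example p-values are hypothetical, and each p-value is assumed to come from a separate model in which one covariate-by-intervention interaction term was added.

```python
def bonferroni_threshold(n_interactions, alpha=0.05):
    """Return alpha divided by the number of possible interactions."""
    if n_interactions < 1:
        raise ValueError("n_interactions must be at least 1")
    return alpha / n_interactions


def flag_interactions(p_values, alpha=0.05):
    """Return the covariates whose interaction p-value is below the threshold.

    p_values maps a covariate name to the p-value of its interaction with the
    intervention variable (assumed to be estimated one interaction at a time).
    """
    threshold = bonferroni_threshold(len(p_values), alpha)
    return sorted(name for name, p in p_values.items() if p < threshold)


# Hypothetical p-values, for illustration only (not trial data).
example_p_values = {"site": 0.004, "sex": 0.030, "age_group": 0.410, "diabetes": 0.200}

print(bonferroni_threshold(len(example_p_values)))  # 0.05 / 4
print(flag_interactions(example_p_values))
```

In this made-up example only ‘site’ falls below the adjusted threshold; whether a flagged interaction is also clinically relevant remains a separate judgement.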

Visual inspection of different types of plots may be used to assess if underlying assumptions are violated (eg, visual inspection of histograms or plots of residuals). There are also several formal statistical tests, which may be used to test if certain underlying assumptions appear to be violated (eg, the Shapiro-Wilk test, Pearson’s χ² test and the Anderson-Darling test).15 16 Visual inspection of plots has the limitation that it requires a subjective assessment of the plot in question and the subjective assessment may not be reliable and replicable.11 The use of formal statistical tests also has limitations.11 Formal tests for normality will, for example, often conclude that data are *not* normally distributed if the data set is large, even when the departure from normality is inconsequential.11 Conversely, if the data set is small, serious departures from normality may not be detected by a formal test due to lack of power and the inherent asymmetry in formal hypothesis testing as discussed below.11
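The dependence of formal normality tests on sample size can be illustrated with a small, deterministic sketch. It uses the Jarque-Bera test, which is not one of the tests named above; it was chosen here only because it can be written with the Python standard library. The sample construction (normal quantiles plus a small quadratic term) and the function names are assumptions made for illustration, and the asymptotic p-value is known to be rough in small samples.

```python
import math
from statistics import NormalDist


def jarque_bera(values):
    """Return (statistic, p_value) for the Jarque-Bera normality test."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    statistic = n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)
    # The chi-squared survival function with 2 degrees of freedom is exp(-x / 2).
    return statistic, math.exp(-statistic / 2)


def mildly_skewed_sample(n, skew_strength=0.05):
    """Deterministic sample: normal quantiles z plus a small term in z squared."""
    std_normal = NormalDist()
    quantiles = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return [z + skew_strength * z * z for z in quantiles]


# Same mild departure from normality, two different sample sizes.
for n in (50, 5000):
    statistic, p_value = jarque_bera(mildly_skewed_sample(n))
    print(f"n={n}: JB={statistic:.2f}, p={p_value:.3g}")
```

With the same mild skew, the test does not reject normality at n=50 but rejects it decisively at n=5000, which is one reason to read a formal test alongside a plot rather than in isolation.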

We generally recommend using both graphical plots and formal statistical tests, and if there are discrepancies between these two assessments, then potential explanations for these discrepancies, and any action taken consequently, should be considered thoroughly and reported.11

Visually interpreting statistical plots is a complex task that requires statistical knowledge and experience.17 In the present paper, we will not consider how to interpret plots, but this is of course a prerequisite for a valid statistical assessment if plots are used.

To limit the potential bias, we suggest that any graphical and formal assessment of assumptions is performed blinded to the randomised interventions to ensure that the ensuing choice of method is not influenced by the effect of the intervention. We also encourage that all the plots used are published as supplementary material as part of the main trial publication. That allows the reader to assess whether the methods are considered adequate. Furthermore, it should be documented how the choice of methods was arrived at. It is necessary that all assessments and statistical analyses are performed while the statistician remains blind to the randomised treatment allocation, that is, intervention groups should be coded as, for example, ‘1’ and ‘2’ and unnecessary variables that might compromise the blinding of the statistician should be excluded from the dataset used in the primary analyses.
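One way the blinded coding of intervention groups could be prepared is sketched below. This is a hypothetical helper, not a procedure prescribed by this paper: the function name, the column names (`arm`, `drug_name`, `group`) and the example rows are all assumptions. The idea is that someone other than the analysing statistician runs it and keeps the returned key.

```python
import random


def blind_dataset(rows, arm_key, drop_keys=(), seed=None):
    """Return (blinded_rows, key) with the two arms recoded as '1' and '2'.

    rows: list of dicts, one per participant.
    arm_key: name of the column holding the real allocation.
    drop_keys: columns that could reveal the allocation; they are removed.
    The key maps each real arm label to its code and should be withheld from
    the analysing statistician until the analyses are finalised.
    """
    arms = sorted({row[arm_key] for row in rows})
    if len(arms) != 2:
        raise ValueError("expected exactly two intervention arms")
    codes = ["1", "2"]
    random.Random(seed).shuffle(codes)  # which arm becomes '1' is arbitrary
    key = dict(zip(arms, codes))
    blinded = []
    for row in rows:
        kept = {k: v for k, v in row.items() if k != arm_key and k not in drop_keys}
        kept["group"] = key[row[arm_key]]
        blinded.append(kept)
    return blinded, key


# Hypothetical rows, for illustration only (not trial data).
example_rows = [
    {"id": 1, "arm": "placebo", "drug_name": "P", "age": 61},
    {"id": 2, "arm": "active", "drug_name": "A", "age": 54},
    {"id": 3, "arm": "active", "drug_name": "A", "age": 70},
    {"id": 4, "arm": "placebo", "drug_name": "P", "age": 48},
]
blinded_rows, allocation_key = blind_dataset(example_rows, "arm", drop_keys=("drug_name",))
print(blinded_rows[0])
```

The blinded copy keeps the grouping intact, so all assumption checks and analyses can be run on codes ‘1’ and ‘2’ before the key is revealed.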

### Statistical reporting

Blinded data on all outcomes should preferably be analysed by two independent statisticians. Two independent statistical reports should be sent to the trial steering committee, and in case of discrepancies between the two statistical reports, possible reasons should be identified and consensus on the most correct result should be sought. A final statistical report should then be prepared, and all three statistical reports should be published as supplementary material.

Our detailed suggestions are presented in three tables, one for each analysis method, each presenting: ‘Assumptions behind the analysis’, ‘How to assess if the assumptions are violated’ and ‘Potential measures if the assumptions are not fulfilled’ (see tables 1–3).

## Discussion

It can always be questioned whether it is possible and valid to present a ‘cookbook’ for statistical analyses. Nevertheless, it is difficult to assess the quality of a methodology or to improve it, if it is not described in detail. The present paper presents detailed suggestions on assessing and dealing with violation of underlying assumptions for three frequently used statistical methods when analysing results of randomised clinical trials. Our recommendations are not exhaustive and following another methodology or systematic plan may also lead to valid results. This is particularly important to emphasise regarding the measures suggested for when the assumptions are *not* met, as several valid options often exist when violations of underlying assumptions occur. Nevertheless, the existing lack of transparency when trialists report how underlying statistical assumptions are assessed is striking, both in published protocols and in trial publications.11 Other methodological aspects than tests for underlying assumptions have received thorough attention for decades.7 It is our intention that this paper should equally heighten attention on this important part of the methodology and be viewed as a supplement to existing recommendations.8 18

When analysing results of randomised clinical trials, the primary objective will often be to clarify if a given intervention is beneficial or not. Other trial objectives might necessitate assessments of assumptions we have not included in our recommendations. As an example, we do plan to assess the proportional hazards assumption between the compared intervention groups. However, we do not recommend assessing the proportional hazards assumption for each covariate (eg, by visually inspecting plots of Schoenfeld residuals of both continuous and categorical covariates19) because violations of this proportional hazards assumption will rarely influence the overall trial conclusions on whether the intervention works or not. If the HRs for each covariate are of interest, then the proportional hazards assumption for this covariate should of course be assessed.

A potential limitation of the present paper is that we did not follow a strict Delphi methodology.20 A Delphi process often includes several live meetings, Skype meetings and telephone conferences where recommendations are discussed, which has several advantages (in depth discussions, dedicated time for the project and so on). However, some researchers might, for example, be more persuasive than others or some researchers might have more charisma or be more famous than others, which might hinder recommending the optimal methodology if the famous researchers are not presenting the optimal viewpoints or arguments. Therefore, we chose to focus on anonymous commenting when considering our recommendations and our methodology is therefore described as a ‘Delphi-inspired’ methodology. Nevertheless, it may be a potential limitation that we did not go through a full Delphi process.

## Conclusion

This paper provides suggestions for assessing the validity of the assumptions underlying commonly used statistical analyses for randomised clinical trials, as well as suggestions for assessing and addressing violations. We believe that the validity of trial results will increase if our recommendations are followed.

### Key messages

- The validity of results from randomised clinical trials is dependent on fulfilment of the underlying assumptions of the statistical analysis.
- There is no good consensus on how trialists should assess and report underlying assumptions for the analyses of randomised clinical trials.
- This paper presents suggestions on which underlying assumptions to test when using logistic regression, linear regression and Cox regression to analyse results of randomised clinical trials.
- The paper provides detailed suggestions on how these underlying assumptions may be assessed and what to do if the assumptions are violated.

## References

## Footnotes

Correction notice This article has been corrected since it was published Online First. Minor formatting issues have been corrected, namely in the abstract.

Contributors JCJ and TL did the planning and initiated the project. All authors developed the framework and participated in writing the article. AKN is the guarantor of the article. All remaining authors are selected statistical professors, experts, methodologists and trialists from different parts of the world. The recommendations presented in this paper are derived from the authors’ extensive experience and expertise in this field as well as from relevant and up to date literature. The corresponding author attests that all listed authors meet authorship criteria and that no others meeting the criteria have been omitted.

Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

Competing interests None declared.

Patient and public involvement statement We did not involve patients or the public in our work.

Provenance and peer review Not commissioned; externally peer reviewed.