Systematic reviews to evaluate causation: an overview of methods and application
Khalid S Khan1,2, Elizabeth Ball1,2, Caroline E Fox3,4, Catherine Meads2

1 Department of Obstetrics and Gynaecology, Royal London Hospital, London, UK
2 Centre for Primary Care and Public Health, Queen Mary University of London, London, UK
3 Birmingham Women's Hospital NHS Foundation Trust, Birmingham, UK
4 Reproduction, Genes and Development, School of Clinical and Experimental Medicine, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK

Correspondence to Khalid S Khan, Centre for Primary Care and Public Health, Queen Mary University of London, Yvonne Carter Building, 58 Turner St, Whitechapel, London, E1 2AB, UK; k.s.khan{at}


Good-quality systematic reviews inform evidence-based decision making, but they usually focus on diagnosis or effectiveness of treatment rather than the aetiology of a condition. The cause of many medical conditions is multifactorial and can be hard to establish1 ,2 but it is the understanding of disease causation that underpins medical education, practice and research. Thus, systematic reviews to evaluate aetiological evidence are much needed. This article outlines methods employed to review causation, collating evidence on the relationship between exposures and putative clinical outcomes.

Assessing causation using systematic reviews

Systematic reviews are robust pieces of research based upon a clearly formulated question, from which it is possible to identify relevant studies, appraise their quality and summarise the evidence using scientific and replicable methods. It is the use of an explicit and systematic approach that differentiates such reviews from anecdotal evidence, expert commentaries and narrative reviews. A causation review requires specific steps to transparently investigate causal criteria (table 1).

Table 1

Common criteria for causation and their assessment through systematic reviews

Causal criteria such as strength, consistency and temporality of association (this is not a comprehensive list) were originally derived from studies in the fields of microbiology3 and epidemiology2 but have now gained wider acceptance within clinical medical research.4–6 Causation reviews are difficult to identify because the searchable databases have no specific medical subject headings (MeSH terms) for indexing them. However, there are MeSH terms for aetiology (and causality), which can be combined with search filters for systematic reviews.7

In addition, evidence for causation of health disorders may be concealed within the results sections or tables of a published review and not necessarily labelled as such. For example, when applying the experimental evidence criterion (see table 1), the presumed causal agent may be removed within a randomised controlled trial. If the condition under investigation is absent after treatment but present in the control group, a causal relationship may be inferred. One such example is the removal of visible areas of endometriosis and restoration of anatomy by division of adhesions to treat pelvic pain associated with endometriosis, which has been investigated in a recent Cochrane review.8

Setting up a causation systematic review

At the outset, any hypothesis concerning strength, consistency and temporality should be specified. The mechanisms behind the development of the condition under investigation should be explored and the disease process must be outlined, including aetiological factors, pathophysiology and clinical manifestations.9 Studying aetio-pathogenesis in this way involves evaluation of the proposed biological pathway in relevant animal, laboratory or human studies.10 Multiple sources should be searched, including sources of unpublished studies, to avoid the considerable risk of publication bias in basic science literature, and the search terms should identify literature on the events in the biological pathway.9–11 Standard systematic review techniques need to be used to ensure rigour in the final conclusions regarding causality.12 ,13

Assessing study design and quality

Epidemiological principles relevant to studying causation without bias include prospective design, measurement of the causal agent, correct temporality of the association and control for confounding, although this list is not exhaustive. In observational studies, adjustment using multivariable modelling is suggested as a way of controlling for known confounding variables. There are as yet no validated quality assessment checklists or scores for causality systematic reviews, so any quality assessment will need to acknowledge the debate on what constitutes good quality in this area. The situation is similar in other new areas for systematic reviews, such as diagnostic yield, where quality assessment checklists are evolving.14

Detailed quality assessment is helpful in exploring heterogeneity and in generating inferences.10 In this regard, study design can be very informative. For example, when randomisation is not feasible, researchers may use a cohort or case-control design. These designs were used to evaluate causation in studies of fetal exposure to potentially harmful maternal drugs.15 Dolovich et al15 reported an increase in major fetal malformations after benzodiazepine exposure in utero in a meta-analysis of case-control studies, but not in a meta-analysis of cohort studies (figure 1). The difference in results between cohort and case-control studies is likely to be explained by recall bias or poor selection of the control group, and weakens our inference about causation.13 ,16

Figure 1

Quality assessment of studies: association of major malformations with prenatal exposure to benzodiazepines (adapted from13 ,15; values of point estimates of OR>1.0 indicate an association of malformations with exposure to benzodiazepines compared to no exposure).

Data synthesis

Formal examination of hypotheses concerning consistency, strength, temporality, dose-response relationship and the biological plausibility of suggested mechanisms facilitates the assessment of causality. Strength of association can be subdivided into magnitude (ie, how large the effect was) and precision (eg, p values and CIs). For example, a meta-analysis of observational studies showed that bicycle helmets reduce the risk of head injuries in cyclists involved in a crash (OR 0.31, 95% CI 0.26 to 0.37).17 The point estimate of the OR and the upper limit of the CI suggest a large, precise effect and, thus, a high degree of confidence in a causal association. In another example, the OR for exposure to benzodiazepines in pregnancy and association with major malformations in the newborn baby was 1.43 (0.89 to 2.31).15 This result suggests a great deal of uncertainty about a causal association, as the CI includes the possibility of no association at all. With regard to magnitude, ORs above 2 or below 0.5 are often considered worthy of further causal exploration if derived from good-quality studies. In a randomised controlled trial (RCT) of folic acid supplementation starting before pregnancy, the association was large (RR 0.28, 95% CI 0.12 to 0.71).18 When generated through randomised trials, such large effects are sufficiently compelling to lead to public policy changes. The precision achieved through meta-analysis does not by itself prove causation. One has to beware of the effect of bias and confounding. Even large effects from observational studies need careful consideration when inferring causation.19
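The distinction between magnitude and precision can be illustrated with a minimal Python sketch applied to the three estimates quoted above. The function name and the returned fields are hypothetical and chosen for illustration only; the thresholds of 2 and 0.5 are the rule of thumb mentioned in the text, and the width of the CI on the log scale is used here as one rough index of precision, which is an assumption rather than a method described in the cited reviews.

```python
import math


def describe_estimate(point, lower, upper):
    """Summarise magnitude and precision of a ratio estimate (OR or RR)."""
    # Magnitude: rule of thumb from the text (ratio above 2 or below 0.5).
    large = point > 2.0 or point < 0.5
    # Precision: does the CI exclude the null value of 1.0?
    excludes_null = lower > 1.0 or upper < 1.0
    # Width of the CI on the log scale (smaller means more precise).
    log_ci_width = math.log(upper) - math.log(lower)
    return {
        "large": large,
        "excludes_null": excludes_null,
        "log_ci_width": log_ci_width,
    }


# Point estimates and CIs as quoted in the text.
estimates = {
    "bicycle helmets and head injury (OR)": (0.31, 0.26, 0.37),
    "benzodiazepines and malformations (OR)": (1.43, 0.89, 2.31),
    "folic acid supplementation trial (RR)": (0.28, 0.12, 0.71),
}

results = {
    label: describe_estimate(*values) for label, values in estimates.items()
}

for label, summary in results.items():
    print(label, summary)
```

Such a summary only screens estimates for further causal exploration; it says nothing about bias or confounding, which still need to be judged from study design and quality.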

Statistical analysis for heterogeneity and graphical representations may be used to explore consistency (figures 1 and 2). Statistical heterogeneity in the associations observed may arise when selected studies show qualitatively opposite results (eg, positive association as well as negative association seen through point estimates of ORs among cohort studies in figure 1). In this case the likelihood of there being a causal association will be low. Statistical heterogeneity in observed effects may also arise when selected studies show qualitatively same results (eg, positive association of different effect sizes). In this case, establishing a causal association may benefit from examination of a ‘dose-response’ relationship.
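One common way to quantify statistical heterogeneity is Cochran's Q with the derived I² statistic. The sketch below assumes a fixed-effect inverse-variance model on the log OR scale and recovers standard errors from 95% CIs; the two sets of study results are hypothetical and are not taken from figure 1 or from any cited review.

```python
import math


def heterogeneity(studies):
    """Return (pooled OR, Cochran's Q, I-squared %) for (OR, lower, upper) tuples."""
    log_ors, weights = [], []
    for odds_ratio, lower, upper in studies:
        # Standard error recovered from the 95% CI on the log scale.
        se = (math.log(upper) - math.log(lower)) / (2 * 1.96)
        log_ors.append(math.log(odds_ratio))
        weights.append(1.0 / se ** 2)
    pooled = sum(w * y for w, y in zip(weights, log_ors)) / sum(weights)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_ors))
    df = len(studies) - 1
    # I-squared: share of variation beyond that expected by chance, floored at 0.
    i_squared = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return math.exp(pooled), q, i_squared


# Hypothetical studies (OR, lower 95% limit, upper 95% limit).
opposite_directions = [(0.60, 0.40, 0.90), (1.80, 1.20, 2.70), (1.00, 0.70, 1.43)]
same_direction = [(1.50, 1.00, 2.25), (1.60, 1.10, 2.33), (1.40, 0.95, 2.06)]

pooled_opp, q_opp, i2_opp = heterogeneity(opposite_directions)
pooled_same, q_same, i2_same = heterogeneity(same_direction)

print(f"opposite directions: Q={q_opp:.2f}, I2={i2_opp:.0f}%")
print(f"same direction:      Q={q_same:.2f}, I2={i2_same:.0f}%")
```

In this illustration, the set with qualitatively opposite results yields a much larger I² than the set in which all studies point the same way, which mirrors the distinction drawn in the paragraph above.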

When feasible, combining study results (meta-analysis) may evaluate strength of association. Subgroup meta-analyses may be useful to explore the effect of study quality, temporality and dose-response relationships. In one example regarding the role of homocysteine in pre-eclampsia (figure 2),20 a subgroup meta-analysis considering temporality reassured us that the association met this causal criterion, even though the association among temporal studies was weaker than that overall.

Figure 2

Temporality of association: subgroup meta-analysis on the role of homocysteine in pre-eclampsia (adapted from20; values of point estimates of weighted mean difference >0 µmol/l indicate an association of hyperhomocysteinaemia with pre-eclampsia).
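A subgroup meta-analysis of this kind can be sketched as follows. The example assumes fixed-effect inverse-variance pooling of mean differences, and the study values and the sampled_before_onset flag are hypothetical; they are not the data behind figure 2.

```python
import math


def pool(studies):
    """Fixed-effect inverse-variance pooled mean difference with 95% CI.

    studies: list of (mean_difference, standard_error) tuples.
    """
    weights = [1.0 / se ** 2 for _, se in studies]
    estimate = sum(w * md for w, (md, _) in zip(weights, studies)) / sum(weights)
    se_pooled = math.sqrt(1.0 / sum(weights))
    return estimate, estimate - 1.96 * se_pooled, estimate + 1.96 * se_pooled


# Hypothetical studies: (mean difference in micromol/l, standard error,
# sampled_before_onset). The flag marks studies in which exposure was
# measured before the outcome developed, ie, temporality is satisfied.
studies = [
    (2.1, 0.6, False),
    (1.8, 0.7, False),
    (0.9, 0.5, True),
    (1.1, 0.8, True),
]

overall = pool([(md, se) for md, se, _ in studies])
temporal = pool([(md, se) for md, se, before in studies if before])

print("overall:  %.2f (%.2f to %.2f)" % overall)
print("temporal: %.2f (%.2f to %.2f)" % temporal)
```

With these illustrative numbers, the temporal subgroup gives a smaller pooled difference than the overall analysis, but its CI still lies above zero, which is the pattern described in the text.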

A biological gradient can be explored through meta-regression, which investigates the effect of one or more study characteristics on the size of the treatment effect, taking precision into account. Hooper et al4 used meta-regression to show the association between modified dietary fat intake and cardiovascular disease. A genuine relation may be inferred when the slope is significantly different from zero. In another example of dose-response (table 2), Ronksley et al6 observed greater cardiac protection with increasing alcohol intake. In addition, demonstration of biological plausibility increases our confidence in a causal association through the existence of a disease mechanism; for example, those who drank moderate amounts of alcohol showed a favourable change in biomarkers of coronary heart disease (higher levels of high-density lipoprotein cholesterol and adiponectin and lower levels of fibrinogen).21

Table 2

Dose-response relationship: pooled RRs (95% CI) for cardiovascular outcomes (number of pooled studies in parentheses after each effect estimate; heterogeneity statistic unavailable)
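The idea of testing whether a dose-response slope differs from zero can be sketched with a simple weighted regression. The example assumes a fixed-effect meta-regression with a single covariate, in which each study's log RR is weighted by the inverse of its variance. The data points are hypothetical, and this is not presented as the model used by Hooper et al or Ronksley et al.

```python
import math


def meta_regression(points):
    """Weighted least-squares slope of effect size on a study-level covariate.

    points: list of (covariate, effect, standard_error) tuples.
    Returns (slope, se_slope, z).
    """
    weights = [1.0 / se ** 2 for _, _, se in points]
    total = sum(weights)
    x_bar = sum(w * x for w, (x, _, _) in zip(weights, points)) / total
    y_bar = sum(w * y for w, (_, y, _) in zip(weights, points)) / total
    sxx = sum(w * (x - x_bar) ** 2 for w, (x, _, _) in zip(weights, points))
    sxy = sum(
        w * (x - x_bar) * (y - y_bar) for w, (x, y, _) in zip(weights, points)
    )
    slope = sxy / sxx
    # Fixed-effect assumption: within-study variances are treated as known.
    se_slope = math.sqrt(1.0 / sxx)
    return slope, se_slope, slope / se_slope


# Hypothetical studies: (exposure level, log RR, standard error of log RR).
points = [
    (5, -0.05, 0.06),
    (10, -0.12, 0.05),
    (15, -0.20, 0.06),
    (20, -0.24, 0.08),
    (30, -0.35, 0.10),
]

slope, se_slope, z = meta_regression(points)
print(f"slope={slope:.4f}, SE={se_slope:.4f}, z={z:.2f}")
print("slope differs from zero at the 5% level:", abs(z) > 1.96)
```

A random-effects meta-regression, which allows for residual between-study variation, would usually give a wider CI around the slope; the fixed-effect version is used here only to keep the sketch short.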


The strength of any causal inference that can be drawn from a review depends upon the rigour of the review methods employed, that is, the validity of the review. This will be based on responses to several questions, such as: was the initial search adequate; was the quality of the included studies adequate; and were the findings both substantive and statistically significant? Through systematic review, both meta-analytic (quantitative) and criteria-based (qualitative) methods can be used together in making causal inferences.

When evaluating the effect of bias and confounding on causal criteria such as strength, consistency and dose-response of association, there is debate about how often results from RCTs and observational studies agree.22 ,23 A conclusion that studies of different designs generate similar estimates of effect is valid only if the two groups of studies are similar in all respects other than the design itself. This is not often the case, and medical advice based on observational studies has frequently been overturned when RCT evidence has emerged, so RCTs should be relied on wherever feasible. For example, our belief about the protective effect of hormone replacement therapy on cardiovascular disease based on observational studies24 ,25 has been overturned by RCT evidence.26 ,27

Traditionally, qualitative narrative review techniques are used to assess causal criteria. A systematic review, with judicious use of meta-analysis, provides an elegant solution to this difficult interpretative problem. When synthesising data to test hypotheses concerning causal criteria, there are no accepted scoring systems to decide whether there is sufficient evidence for causality or not. Ultimately, these assessments are judgments. However, a systematic approach to evidence synthesis and interpretation is likely to generate more transparent, robust inferences than ad hoc considerations. When evidence is found to be lacking, reviews examining causation may be useful in identifying gaps in research.



  • Competing interests None.