Evidence-based medicine’s (EBM’s) traditional methods, especially randomised controlled trials (RCTs) and meta-analyses, along with risk-of-bias tools and checklists, have contributed significantly to the science of COVID-19. But these methods and tools were designed primarily to answer simple, focused questions in a stable context where yesterday’s research can be mapped more or less unproblematically onto today’s clinical and policy questions. They have significant limitations when extended to complex questions about a novel pathogen causing chaos across multiple sectors in a fast-changing global context. Non-pharmaceutical interventions which combine material artefacts, human behaviour, organisational directives, occupational health and safety, and the built environment are a case in point: EBM’s experimental, intervention-focused, checklist-driven, effect-size-oriented and deductive approach has sometimes confused rather than informed debate. While RCTs are important, exclusion of other study designs and evidence sources has been particularly problematic in a context where rapid decision making is needed in order to save lives and protect health. It is time to bring in a wider range of evidence and a more pluralist approach to defining what counts as ‘high-quality’ evidence. We introduce some conceptual tools and quality frameworks from various fields involving what is known as mechanistic research, including complexity science, engineering and the social sciences. We propose that the tools and frameworks of mechanistic evidence, sometimes known as ‘EBM+’ when combined with traditional EBM, might be used to develop and evaluate the interdisciplinary evidence base needed to take us out of this protracted pandemic. Further articles in this series will apply pluralistic methods to specific research questions.
- evidence-based practice
- behavioral medicine
- biomedical engineering
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
Two and a half years into the COVID-19 pandemic, it is time to take stock. What began as—we assumed—an acute respiratory illness rapidly sweeping the world has become a prolonged global crisis with medical, social, economic and political dimensions. The SARS-CoV-2 virus has proved tenacious and shifting. It causes a complex multisystem disease which disproportionately impacts minority ethnic groups and the sick, poor, old and disadvantaged. It produces significant mortality and—in some—prolonged sequelae. Effective and safe vaccines were produced rapidly, but uptake has been patchy and highly transmissible variants continue to spread and mutate. Coordinated disinformation campaigns have weakened the public health response.
Despite a quarter of a million scientific papers on COVID-19, some basic issues remain contested. How exactly does the virus spread? How effective are non-pharmaceutical interventions—masks, distancing, closure of buildings, remote working and learning, lockdowns—in reducing transmission, and what are their trade-offs? How can we make schools, hospitals and other public buildings safe? How can we protect workers and the public without closing down the economy? How can we reduce the shocking inequalities that have characterised this pandemic?
This paper explains why we need to go beyond lip-service to evidential pluralism to fully answer those questions. While acknowledging the primacy of randomised controlled trials (RCTs) and meta-analyses of RCTs for estimating the efficacy of drugs and vaccines, we highlight and extend the arguments made by some scholars within the evidence-based medicine (EBM) movement for ‘EBM+’,1 2 defined as an approach which systematically considers mechanistic evidence (studies which aim to explain which factors and interactions are responsible for a phenomenon3) on a par with probabilistic clinical and epidemiological studies. Our central argument is that for some aspects of the pandemic, especially those characterised by a combination of complexity (multiple variables interacting dynamically with a high degree of uncertainty), urgency (decisions needed in days not years) and threat (the consequences of not acting could be catastrophic), mechanistic evidence has been mission-critical and RCTs difficult or impossible. Thousands of lives were likely lost as a result of what was incorrectly claimed to be an “evidence-based” approach—dismissing or downgrading mechanistic evidence, overvaluing findings from poorly designed or irrelevant RCTs, and advocating for inaction where RCT evidence was lacking. The pandemic is an epistemic opportunity for the EBM movement to come to better understand, debate and embrace EBM+.
Evidence hierarchies: not fit for all purposes
Every student of EBM is taught that the hierarchy of evidence—with meta-analyses of RCTs at the top and so-called ‘less trustworthy’ evidence (case studies, mechanistic evidence, expert opinion) at the bottom (figure 1A)4—is a useful heuristic but does not apply in all circumstances. But guidance to apply the hierarchy flexibly to take account of—for example—the nature of the research question, the degree of complexity involved, whether RCTs of the necessary size and scope are actually feasible, and whether policy decisions can or should be put on hold pending definitive results—has been honoured more in the breach than the observance. From Guyatt et al’s5 paper announcing EBM as a ‘new paradigm’ in medicine to their 2008 Grading of Recommendations, Assessment, Development and Evaluation statement (which offers eight domains of strengths and limitations of the overall evidence base, including observational studies as well as RCTs)6 and the Oxford Centre for EBM 2011 Levels of Evidence Document,7 there remains within EBM orthodoxy an explicitly hierarchical relationship between probabilistic clinical and epidemiological evidence (invariably placed at the top of the hierarchy) and the kinds of evidence which help establish causal mechanisms (invariably placed at or near the bottom).
The cornerstone of EBM as taught to students—ask focused questions in population-intervention (or exposure)-comparison-outcome (PICO) format; search for RCTs and, if these are unavailable, work down the hierarchy of evidence; appraise studies for risk of bias; and combine them using meta-analysis—was developed for a stable context in which predictions based on yesterday’s research map unproblematically onto today’s policy questions (eg, what actions should we take to prevent and manage coronavirus?). Despite their theoretical strengths for some kinds of question, and despite heroic achievements in the pandemic with fast-track platform trials such as RECOVERY,8 PRINCIPLE9 and PANORAMIC,10 RCTs are not a panacea (table 1).11 Modifications of the hierarchy of evidence (eg, separating the layers with dotted or wavy lines, as in figure 1B, and using systematic review as a lens through which primary evidence is interpreted, as in figure 1C)4 fall short of the epistemic shift that is now needed.
Strengthening causal claims through mechanistic evidence
Mechanistic evidence includes a wide range of designs, including in vitro experiments, biomedical imaging, autopsy, established theory, animal experiments, aerosol science, engineering research and simulations.3 Like all evidence, mechanistic studies can be of variable quality and definitiveness. Table 2 shows some preliminary criteria for grading mechanistic evidence, based partly on previous studies,12–14 which would benefit from further elaboration.
No one kind of evidence stands alone. Rather, as encapsulated in the Bradford Hill indicators15–17 (table 3), evidence of mechanisms must be combined with probabilistic evidence from clinical trials (and, where appropriate, evidence from non-randomised comparative and observational studies) to make a strong case for causality.13 This is because real-world circumstances often differ from the ones in which a probabilistic estimate (such as an effect size in an RCT) was demonstrated; evidence supporting a plausible mechanism affirms (or not) that the causal relationship is likely to be stable across settings.13 The presence of high-quality and consistent mechanistic evidence greatly increases the external validity of an RCT finding; absence of such evidence should make us question RCT findings.3 Public health interventions involve two kinds of mechanisms: the upstream causes of disease (for COVID-19, ‘mechanisms involving family structures and interaction patterns, occupational behaviour, urban density, housing occupation and overcrowding, workplace and retail environment structures and organisation, as well as local social, economic and cultural variation’; Aronson JK, page 688)2 and the ‘causes’ through which an intervention might act (eg, attitudes, beliefs, capability, personal resources).2
Mechanistic evidence is inherently explanatory: it helps address the question ‘how might this effect be produced?’. In the early months of the pandemic, the grammar of our research was prematurely fixed and PICO-constrained (eg, ‘what is the effect size of masks in preventing respiratory infections?’). This narrowed our focus onto probabilistic studies of interventions (‘masks-on’), comparisons (‘masks-off’) and predefined outcomes (infections in the wearer) expressed in statistical terms (see the masking example in box 1).18–28 We suppressed our scientific imagination. We failed to wonder sufficiently at the novelty of the disease and the significance of its unique patterns of spread, such as super-spreading events, overdispersion, transmission in the absence of symptoms, and indoor predominance, all of which should have raised mechanistic hypotheses about a predominantly airborne mechanism of transmission.29
Public masking—‘naked statistics’ or evidential pluralism?
Mortality rates differed dramatically between Asian countries which introduced public masking very early in the pandemic and Western countries which did not.18 But was this relationship causal?
The ‘naked statistics view of evidence-based medicine (EBM)’12 would answer this question exclusively from randomised controlled trial (RCT) evidence, rejecting all other study designs as less trustworthy. The DANMASK researchers, for example, randomised 6024 people to being advised to wear masks outside the home or not.19 The primary outcome (infection with SARS-CoV-2) occurred in 42 people (1.8%) in the intervention arm and 53 (2.1%) in the control arm, a difference that was not statistically significant. This finding was described by some in the EBM community thus: ‘now that we have properly rigorous scientific research we can rely on, the evidence shows that wearing masks in the community does not significantly reduce the rates of infection’.20
DANMASK had numerous design flaws21: it was underpowered; it focused only on whether masks protect the wearer rather than whether they reduce overall community transmission; it occurred at a time of extremely low incidence of COVID-19; it had an intervention period of only 1 month; and—as the authors themselves observed19—the 95% CIs were compatible with a 46% reduction to a 23% increase in infection with masking (ie, it was inconclusive rather than negative). An editorial in the Cochrane Database of Systematic Reviews encouraged policy-makers to adopt the precautionary principle and ‘act on incomplete evidence’ for several reasons: the urgency of the situation (and the risks of doing nothing), the multiple complexities and complex chains of causation for this particular behavioural intervention, and the potential benefits, over time, of even a non-statistically significant reduction in transmission rate.22
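The arithmetic behind ‘inconclusive rather than negative’ can be sketched in a few lines of Python. The infection counts (42 and 53) come from the text above; the per-arm denominators of 2392 and 2470 participants with outcome data are our assumption and are not stated in this box. The sketch computes an unadjusted odds ratio with a Wald 95% CI on the log scale, which may differ slightly from the trial’s own reported analysis.

```python
import math

def odds_ratio_wald_ci(events_a, n_a, events_b, n_b, z=1.96):
    """Unadjusted odds ratio (arm A vs arm B) with a Wald CI on the log scale."""
    a, b = events_a, n_a - events_a  # arm A: infected, not infected
    c, d = events_b, n_b - events_b  # arm B: infected, not infected
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Event counts from the text; the denominators (2392, 2470) are ASSUMED.
or_, lo, hi = odds_ratio_wald_ci(42, 2392, 53, 2470)
print(f"OR {or_:.2f} (95% CI {lo:.2f} to {hi:.2f})")
print(f"Compatible with a {1 - lo:.0%} reduction to a {hi - 1:.0%} increase")
# A CI that spans 1 means the data cannot distinguish benefit from harm.
print("Inconclusive" if lo < 1 < hi else "Conclusive")
```

Under these assumptions the interval runs from roughly 0.54 to 1.23, which is the ‘46% reduction to a 23% increase’ range quoted above: the trial could not rule out a substantial benefit, so it is uninformative rather than evidence of no effect.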
An EBM+ approach would consider the DANMASK trial and other RCTs (such as a large community masking study in Bangladesh villages)23 on their merits, but would also take account of studies which throw light on the mechanism by which masking might work, including:
- Laboratory, real-world case studies and mathematical modelling studies supporting an airborne route of transmission.29
- Engineering studies which established the filtration properties of different kinds of masks.24 25
- Studies on the psychological, sociocultural and wider structural causes of whether people actually wear masks.26
When the pandemic began, we immediately sought ‘robust’ evidence from new RCTs.30 But overapplication of this heuristic led to narrowly focused reviews,31 excluded experts from other disciplines, and closed our minds to the mechanistic explanations we could have built from well-conducted laboratory studies,32 animal studies,33 modelling studies,34 engineering studies35 or careful analysis of real-world events36—not to mention prepandemic studies on the transmission of comparable respiratory viruses.37
Dynamic mechanisms: complex interventions in complex systems
A complex intervention (eg, facial protection with masks or respirators) consists of multiple components acting independently and inter-dependently at multiple levels (eg, material artefact, person, group, organisation, system), such that it is difficult to identify the ‘active ingredient’.38 Facial protection can be home-made or produced to formal technical standards; humans may agree—or not—to mask; masking may be socially expected, organisationally required or legally mandated; the threshold case rate for introducing or reintroducing masking may vary; global supply may be unreliable; masking is often paired with other interventions, introducing confounders.
A complex system—such as the one for delivering public health (which includes government, health service providers and the public)—is characterised by a number of features:
- Emergence (the system evolves over time, in ways that cannot be fully anticipated).
- Adaptation (the system adjusts to accommodate change).
- Feedback (either positive, where change in one factor reinforces another factor, or negative, where change in one factor diminishes another).
- Self-organisation (eg, a public health department adapts a national programme to suit local constraints and priorities).39
Particular outcomes emerge as a result of multiple interactions in the system.
Whereas studies of drug and vaccine efficacy can be interpreted within the traditional biomedical paradigm (table 4, column 2),40 findings from such trials may not map to complex, real-world settings for multiple social and behavioural reasons. Studies of how complex interventions generate outcomes in complex systems require a new paradigm (table 4, column 3), with designs that can capture dynamic change, accommodate non-linearity and embrace uncertainty.41
One approach to researching these generative mechanisms is the in-depth case study, a mixed-method design in which data collection, interpretation and synthesis are oriented to producing a rich description of a phenomenon in context.42 Hypotheses are explored and tested by constructing a plausible story (narrative synthesis) in which all data—qualitative and quantitative—are accounted for. Local public health interventions, even when part of national policy, are typically path-dependent (shaped and constrained by what has happened here previously) and emergent (they grow organically over time); they tend to be implemented and improved through testing and iteration in the real world.43 As Ogilvie et al43 have pointed out, this pragmatic approach is not ‘weaker’ than an RCT but complementary to it (figure 2). Observations from real-world case studies inform hypothesis-driven questions about the impact of particular interventions under controlled conditions, which can be tested in RCTs. Conversely, RCTs may produce important findings which inform development and real-world testing of an intervention. These authors contrast the ‘brick wall’ of a meta-analysis (in which every primary study contributes a similar ‘brick’) with the ‘dry stone wall’ of a mixed-method narrative review44 (in which different kinds of primary evidence are combined to illuminate a problem from multiple angles).45
Generative mechanisms in complex systems can also be studied quantitatively using modelling. Models (simplified versions of reality) can be used for a variety of purposes: to describe epidemic growth, estimate the likely impacts of interventions, simulate airflow inside buildings, and synthesise data from multiple sources in a manner analogous to systematic review and meta-analysis.46 Model projections can be useful for presenting plausible counterfactuals and for quantifying the impact of uncertainty. Modelling studies allow policy-makers to consider multiple ‘what if’ scenarios and to assess sensitivity as policy options develop. For communicable disease outbreaks, models can capture the non-independence of events that is the defining attribute of these diseases, and allow quantification of indirect effects on individuals who do not receive a given intervention46 (for example, the creation of herd effects via immunisation).
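A deliberately simple model shows what ‘indirect effects on individuals who do not receive a given intervention’ can mean in practice. The Python sketch below uses the classic final-size relation for an SIR epidemic in a homogeneously mixing population. The reproduction number of 2.5, the coverage levels and the assumption of a vaccine that fully protects everyone who receives it are hypothetical simplifications chosen for illustration, not estimates for COVID-19.

```python
import math

def final_attack_rate(r0, coverage, iterations=2000):
    """Final fraction of *unvaccinated* people infected in a simple SIR epidemic.

    Assumes homogeneous mixing, a vaccine that fully protects those who
    receive it, and a negligible initial number of infections. Solves the
    final-size relation z = 1 - exp(-r_eff * z) by fixed-point iteration,
    where r_eff = r0 * (1 - coverage).
    """
    r_eff = r0 * (1 - coverage)
    z = 1.0  # start above the root so iteration converges to the largest one
    for _ in range(iterations):
        z = 1 - math.exp(-r_eff * z)
    return z

R0 = 2.5  # HYPOTHETICAL basic reproduction number
for coverage in (0.0, 0.4, 0.7):
    risk = final_attack_rate(R0, coverage)
    print(f"coverage {coverage:.0%}: risk to an unvaccinated person {risk:.2f}")
```

Under these assumptions, the risk to an unvaccinated person falls from about 0.89 with no vaccination to about 0.58 when 40% of others are vaccinated, and to essentially zero once coverage exceeds 1 - 1/R0 (60% here). That reduction for people who received nothing is an indirect (herd) effect, which an individually randomised trial is not designed to measure.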
As with all research designs, real-world case studies42 47 and modelling46 can be done well or badly, and the findings of such studies can be more or less relevant to a policy decision that is being contemplated in a different context or setting.
Mechanisms in the physical world: standards as evidence
Engineers use knowledge of the natural world to craft designs which benefit and improve society. That design process takes account of physical and technical properties and capabilities along with social needs and assessment of impact. Central to the science of engineering is a high reliance on standards, which are continuously developed to improve on the state of the art.48 Indeed, quality in engineering evidence is defined in terms of relevant accepted standards and codes (eg, on indoor air quality controls for healthcare facilities).49
New engineering problems are addressed largely by creatively applying and actively improving existing standards and models. Designing medical equipment such as ventilators or diagnostic imaging systems does not begin de novo but with a standard selection of component products built and tested to meet established standards. Such devices are not iteratively constructed and tested to confirm success. Most aspects are determined analytically using numerical models and design tools. Designs are constrained to meet established national standards that dictate performance and safety requirements. RCTs have no place in such design work.
Engineering knowledge rarely generates a single ‘truth’. Rather, it tends to produce a range of options which vary in how well they meet social needs and satisfy design and resource constraints. There is usually more than one way of designing a medical device to meet safety and performance standards. The success of the engineering method is illustrated by the rarity of harmful failures of engineered systems. When rare failures occur, there follows a detailed investigation of causes, and application of that knowledge to future designs.50
Building engineering contributes crucial evidence on how to protect people from airborne pathogens, since the use and impact of personal protective equipment is influenced by (among other things) the quality of indoor air.35 Furthermore, the mechanisms by which filtering facepiece respirators work are long-established and well-understood. Robust certification systems, standards and workplace usage protocols for respirators currently minimise exposure to occupational hazards for millions of workers worldwide. Nobody would propose an RCT comparing these products with less effective protection to ‘prove’ their value in protecting against chemical contamination in a lead smelter—yet such studies are currently being proposed for determining whether respirators protect against biological hazards in healthcare environments.
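One long-standing example of analytical reasoning about airborne risk indoors is the Wells-Riley equation, P = 1 - exp(-I·q·p·t/Q), where I is the number of infectious occupants, q their emission rate of infectious ‘quanta’ per hour, p the breathing rate, t the exposure time and Q the outdoor-air ventilation rate. It is not used in this paper; the Python sketch below is offered only to illustrate the mechanistic logic. It assumes a well-mixed room at steady state, and every parameter value in it is hypothetical.

```python
import math

def wells_riley_risk(infectors, quanta_per_hour, breathing_m3_per_hour,
                     hours, ventilation_m3_per_hour):
    """Probability that one susceptible occupant is infected (Wells-Riley).

    P = 1 - exp(-I * q * p * t / Q), assuming a well-mixed room at steady state.
    """
    dose = (infectors * quanta_per_hour * breathing_m3_per_hour * hours
            / ventilation_m3_per_hour)
    return 1 - math.exp(-dose)

# HYPOTHETICAL scenario: one infectious person at a 2-hour meeting,
# emitting 25 quanta/h; occupants breathe 0.5 m3/h.
for q_vent in (150, 300, 600):  # outdoor-air supply in m3/h
    risk = wells_riley_risk(1, 25, 0.5, 2, q_vent)
    print(f"ventilation {q_vent} m3/h: infection risk {risk:.1%}")
```

Under these assumptions, doubling the outdoor-air supply halves the inhaled dose and roughly halves the risk at low doses: a prediction derived from the physics of dilution rather than from a trial.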
Human mechanisms: social and behavioural evidence
The human (or social) sciences seek to explain why people act in the way they do, taking account of wider influences such as cultural norms or economic constraints. The Medical Research Council’s framework on developing and evaluating complex interventions emphasises the need for developing a programme theory that takes account of the many influences on human (and organisational and system) behaviour.39
One such framework is the capability-opportunity-motivation-behaviour model,51 whose lead developer, Professor Susan Michie, was an adviser to the UK’s Scientific Advisory Group for Emergencies. In relation to wearing masks or respirators, people may not do so because they lack capability (eg, they are not physically able to, or they lack key knowledge: they may wrongly consider themselves at low risk, wrongly believe that cloth masks, surgical masks and respirators are equally effective, or underestimate the importance of a good fit), because they lack motivation (including conscious beliefs and subconscious emotional reactions), or because they lack opportunity (eg, their employer bans masks or social pressures convey negative norms).
Other frameworks, such as those based on social ecology theory, place greater emphasis on the multiple layers of influence which shape and constrain human behaviour (including individual, interpersonal, organisational, community and public policy).52
Without a careful exploration of these wider influences, experimental studies of complex interventions may produce misleading findings.
The pandemic brought complex and fast-changing problems. We have argued that EBM’s traditional methods and quality standards, which favour probabilistic evidence from clinical trials, must be extended to place greater emphasis on other sources of evidence. We are not, however, arguing for an anything-goes approach to evidence or what one reviewer of an earlier draft of this paper called ‘a throwback to the 1970s’—a time before much of the rigorous scholarship of EBM had been built.
An urgent debate is needed within the EBM/evidence-based healthcare community on how, in what circumstances and to what extent EBM should evolve into EBM+. We hope to contribute to that debate with further articles on how an EBM+ approach can enhance and extend EBM’s important contribution to pandemic science.
Patient consent for publication
Thanks to Kevin Hedges, Leyla Asadi, Simon Smith and a colleague who preferred to remain anonymous for helpful comments on previous versions of this paper.
Twitter @trishgreenhalgh, @sameo416
Contributors TG wrote the original draft. All authors contributed to planning, discussing, drafting and refining the manuscript.
Funding TG’s work is partly supported by the NIHR Oxford Biomedical Research Centre (BRC-1215-20008). DC received no specific grant from any funding agency in the public, commercial or not-for-profit sectors. MO received no specific grant from any funding agency in the public, commercial or not-for-profit sectors. RM is funded by the National Health and Medical Research Council Principal Research Fellowship (grant number 1137582). DF is supported by Canadian Institutes for Health Research (2019 COVID-19 rapid research funding OV4-170360).
Competing interests TG is a member of Independent SAGE and an unpaid adviser to Balvi, a philanthropic fund. RM in the last 5 years has received funding from Sanofi and Seqirus for investigator driven research on influenza vaccines. She has been on advisory boards or consulted for COVID-19 vaccines for Seqirus, Janssen and Astra-Zeneca. She has received funding for an industry-linkage grant scheme from mask manufacturer Detmold and consulted for mask manufacturer Ascend Performance Technologies. DF has served on advisory boards related to influenza and SARS-CoV-2 vaccines for Seqirus, Pfizer, Astrazeneca and Sanofi-Pasteur Vaccines, and has served as a legal expert on issues related to COVID-19 epidemiology for the Elementary Teachers Federation of Ontario and the Registered Nurses Association of Ontario. DJC worked as a Clinical Specialist for the Canadian PPE Manufacturers Association between November 2021 and February 2022. She now serves as an executive board member with the Coalition of Healthcare Associated Infection Reduction (CHAIR), a volunteer, not-for-profit group, and is a co-founder of the volunteer grassroots group Community Access to Ventilation Information (CAVI) supporting public library CO2 monitor loaning programmes.
Provenance and peer review Not commissioned; externally peer reviewed.