Introduction: the challenge of knowledge inclusion in guidelines
Evidence-based guidelines, whether national, regional or developed by specialty groups, must search for, and explicitly consider, evidence from sources other than conventional clinical trials and their quantitative data. This need to appraise and include knowledge from a wide variety of sources in guideline development is well recognised.1–3
Although evidence on statistical association—usually from randomised controlled trials (RCTs)—is commonly thought to be the dominant type of knowledge appraised and included, guideline developers frequently use a range of other types of knowledge including the views and experiences of those using and providing health services, understanding of how interventions work (eg, from logic models or realist evaluations), and other information, such as aetiology and the context of care (online supplementary text box 1).
These different types of knowledge are used and needed in many situations, for example, when evidence from RCTs is not available, impossible to obtain, contradictory or inappropriate. They can also be used in conjunction with knowledge from RCTs to provide context, to assess relevance and to understand bias. Furthermore, explicit (written or spoken) knowledge and the more intricate forms of knowledge like experiential and contextual knowledge can help guideline makers to take an approach consistent with the intentions of early evidence-based medicine (EBM) proponents: namely, that best evidence is not restricted to evidence from RCTs and meta-analyses alone.4
However, how to properly appraise (judge) and include (integrate) different kinds of knowledge remains unclear. Agreed methods are not yet available or are in the early stages of development and the need for and use of different kinds of knowledge is not always explicitly acknowledged, which affects the use of guidelines in practice.5 6 International and cultural differences in guideline production practices may further impede developments in appraising and including a broader range of types of knowledge (online supplementary text box 2).
In this paper, we discuss four specific aspects of guideline development to highlight the main challenges identified by the AID Knowledge Working Group through discussions and workshops with guideline developers and users (online supplementary text box 3):
the purpose of guideline development;
the problem of induction;
the dominance of frequency-based reasoning;
the challenge of integrating different sources of knowledge.
In order to do this, we refer to some philosophical concepts around knowledge creation.
The purpose of guideline development
The efforts of the pioneers of the EBM movement were primarily a response to the discovery of the variation problem in population studies. Reducing variation in the care provided at a population level was considered an important way to improve quality for individual patients.7 Hence, epidemiology, the science of studying populations, gained prominence in guidelines, the aims of which are to support decisions for individual patients. Classic epidemiology became clinical epidemiology when it was introduced to the bedside, and within this construct of epidemiology as used in EBM, RCTs came to dominate as the gold standard for intervention studies assessing the causal relation between interventions and effects. The underlying, yet little explored, assumption is that guidelines based on population studies provide the best advice to inform clinical decisions for individual patients or situations.
However, reducing variation is not the only reason for developing guidelines; they are developed for several reasons, of which the most important one is to improve the quality of care. In order to meet the range of needs, guidelines may need different approaches, such as summarising large quantities of knowledge for practising healthcare professionals, serving as an intermediate product for other tools or applications (such as clinical decision support software) or providing implementation guidance. Although not primarily developed for this purpose, guidelines can also serve as tools to legally shield both patients and professionals, to help governments and health insurers allocate scarce resources and to act as governance frameworks for practitioners and governments.
There is also the role of guideline development as a discipline in itself; along with its associated practices and institutions, it provides employment and intellectual interest for many.
There has been surprisingly little research into the purposes of guideline development. One mixed-method study found the purposes of guidelines were: defining norms, summarising evidence, formalising current consensus and/or describing current practices in a handbook-type format.5
Making the purposes of guidelines more explicit may help determine how different types of knowledge could and should be used. For instance, if the aim is to describe current good practice (eg, how services are organised to deliver care), this may be better achieved by drawing on qualitative or mixed-method evaluative research rather than RCTs. If the aim is to assess the effectiveness of a specific treatment or approach, evidence from RCTs or high-quality prospective cohort studies would usually be the primary source of knowledge, with qualitative or mixed-method studies serving to help understand the local context of implementation.
It is important to note, however, that for most guideline developers, the primary purpose remains that of supporting decision making in the clinical encounter. This leads us to the next fundamental aspect of guideline development.
The problem of induction
How do different types of knowledge in guideline development help to make clinical decisions? Some basic concepts from the philosophy of science may help to understand the problem.
Inference, the problem of induction and evasions
In logic, to infer means to conclude from evidence using reasoning.8 In everyday healthcare practice, care professionals and patients reason to reach conclusions about what has happened, to make predictions about what will happen and to decide what to do next. Because of uncertainty in medicine, we usually deal with a specific type of inference, called induction, where the conclusions of our reasoning are not always right even when based on true premises. In philosophy, there is a concern whether this is actually possible, called the problem of induction,8 as introduced by Hume in 1739.9 At its simplest, this means we cannot predict the future with certainty. Although this seems reasonable, we are in fact able to predict the future quite accurately on many occasions in clinical practice. How is this possible? Philosopher of science Ian Hacking8 argues that we never solve the problem of induction, but only evade it by applying different kinds of reasoning to reduce uncertainty and increase our chance of reaching the best possible outcome.
The dominance of frequency-based reasoning
The evasion most dominantly used in guideline development is frequency-type reasoning in the form of systematic reviews, RCTs and observational studies.5 This evades the problem of induction by recognising that ‘although we can’t predict the future for the individual case, we can be “usually” right (eg, 95% of the time)’8 as long as events or cases are frequent enough.
Frequency-based reasoning relies on basic assumptions that have some drawbacks. First, this line of reasoning assumes that reality is dice-like and that we (eg, scientists, guideline developers and healthcare professionals) are rolling the same dice (online supplementary text box 4). Frequency-type reasoning presupposes adequate framing and defining of what is similar and what is not, which is always based on judgement and choice.
Second, frequency-based reasoning aims to find simple causal correlations, independent of context. The question is whether these simple correlations hold true in real life. Different understandings of causality exist that could help us address this drawback.10 For instance, a network of complex causal relationships may be more realistic. This drawback is described as the efficacy paradox: in real life, interference from non-specific effects (different from those controlled for between groups in a trial), measurement artefacts (that mimicked therapeutic effects in the trial) and regression patterns (such as the self-limiting nature of a disease) can outweigh the specific effect found in a trial. This paradox may become especially apparent when inferring in the context of multimorbidity.11
Finally, and most importantly, although frequency-based reasoning works well for frequent events (large groups, many data points and long periods of time), such reasoning faces fundamental limitations when inferring in the single-case scenario: a single patient, a rare disease, a system intervention or a one-off event. This can be particularly challenging when recommendations based on frequency-type evidence alone are deployed to help decision making for individual patients or unique situations, such as a public health response to a disease outbreak.2
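The contrast between frequent events and the single case can be made concrete with a short calculation. The following is a minimal sketch, not a method taken from the guideline literature: it assumes independent cases that all share the same success probability (exactly the 'same dice' assumption discussed above), and the value 0.95 and the function names are hypothetical, chosen only to echo the '95% of the time' example.

```python
import math


def proportion_spread(p: float, n: int) -> float:
    """Standard deviation of the observed success proportion over n cases.

    Assumes n independent cases that each succeed with the same
    probability p (the 'same dice' assumption).
    """
    return math.sqrt(p * (1.0 - p) / n)


def possible_proportions(n: int) -> list[float]:
    """All success proportions that can actually be observed in n cases."""
    return [k / n for k in range(n + 1)]


if __name__ == "__main__":
    p = 0.95  # hypothetical success probability, for illustration only
    for n in (1, 100, 10_000):
        # The spread of the observed proportion shrinks as n grows.
        print(n, round(proportion_spread(p, n), 4))
    # For a single case the only observable outcomes are failure or success.
    print(possible_proportions(1))
```

Under these assumptions the observed proportion clusters ever more tightly around 0.95 as the number of cases grows, whereas for a single case the only possible observations are 0 or 1. The frequency statement describes the series, not the individual outcome.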
Given these drawbacks, it is worth noting that other types of reasoning to evade the epistemological problem of induction exist.
In table 1, several alternative ways of reasoning are listed. They are mainly used in areas where frequency-based reasoning is particularly problematic, for instance in guidelines focusing on complex interventions, public and occupational health, rehabilitation, and social care and welfare.11–13
These different types of reasoning try to help make valid inferences for the single-case scenario, when there is no frequency of events. Many of these were already recognised and stated by Bradford Hill14 in his criteria for causation, but some are newer, such as Annemarie Mol’s logic of care,15 where a practitioner will try something, wait and see, and let unfolding events guide the next step. Using this type of reasoning, the problem of induction is evaded through ‘tinkering’: making incremental changes to improve a situation.
Guidelines can and do support these kinds of evasions by including different types of knowledge. For instance, providing laboratory information about aetiology helps to make an inference based on mechanistic reasoning.16 A description of cases of harm can offer an inference based on the precautionary principle.17 Rethinking how inferences are made in practice may shift the dominance of frequency-based reasoning and its reliance on a restrictive type of knowledge to a broader spectrum of knowledge being used to support different reasoning approaches. The need for using different types of knowledge is illustrated by a large Dutch analysis, which found that knowledge from RCTs far outweighed other knowledge types used, irrespective of the question at hand, thus ignoring important and relevant knowledge from other sources.5 6
The challenge of integration
Making a recommendation for a specific healthcare problem in a specific healthcare system requires the assessment of knowledge not just on its own merits, but importantly its integration with other knowledge. Indeed, EBM is defined as integrating the best evidence with clinical expertise and patient preference.4
However, in the context of medicine, and even more so in that of guideline production, integration of different types of knowledge remains underexplored and undertheorised.
Some areas of evidence synthesis have addressed integration. For example, statistical techniques such as meta-analysis can be used to combine data from different studies, and another range of techniques can be used to synthesise qualitative data.
In guideline development, most of the activities and tools to support high-quality evidence synthesis, such as risk of bias assessment and quality assessment (eg, GRADE), tend to focus primarily on frequency-based reasoning and knowledge. For the assessment of the quality of qualitative evidence, there are limited but relevant initiatives for guideline development in progress, for example, the recently published Grading of Recommendations Assessment, Development, and Evaluation – Confidence in the Evidence from Reviews of Qualitative research (GRADE-CERQual)18 guidance. However, many of these efforts try to achieve integration by synthesising studies that share the same questions and design (eg, a set of qualitative or, more narrowly, ethnographic studies),19 at times appraising18 all such knowledge again in frequentist terms, as with some qualitative evidence synthesis methods20 that ‘emphasize frequencies of the qualitative data they present…undermin[ing] the uniqueness of the qualitative knowledge they proclaim by focusing on frequency and the general patterns’.21
The main issue is that these tools, activities and initiatives aim to integrate similar knowledge, such as data from the same study designs, the same populations or the same outcomes. How different kinds of knowledge are valued, appraised and weighed in relation to each other, for example, regarding effectiveness, efficiency or ethical concerns, is not clearly articulated.
Nonetheless, guideline developers do recognise that other types of knowledge are often used and somehow integrated in practice, particularly when discussing the evidence and formulating recommendations, a step often called ‘judgement’ or ‘considered judgement’.22 23 This is the traditionally less clearly described or analysed black box part of the process that new initiatives try to shed light on, such as NICE’s structured tables linking evidence to recommendations and the GRADE Evidence to Decision frameworks.22
These initiatives appear promising, yet challenges remain. First, findings from ethnography question whether structured frameworks really influence or reflect guideline development processes.24 25 In an ethnographic study of guideline development meetings, Moreira showed that guideline developers formulate guidance by combining different ‘repertoires of evaluation, organised around four different epistemic criteria: robustness, usability, acceptability and adequacy’.24 Importantly, such criteria are deployed at each stage of evidence appraisal: usability, acceptability and adequacy are integral to evidence assessment, rather than being easily categorised as either ‘judgements’ or ‘additional considerations’ as current evidence to decision frameworks suggest.22 Acknowledging the importance of these epistemic skills in evidence appraisal becomes much more important when it is understood that recommendations nearly always draw on different types of knowledge.
Second, bringing knowledge together is not just a process of integration, triangulation and finding a single answer. Knowledge from many sources is often conflicting, and indeed the exploration of opposing ideas is often very important. In social sciences, methods for evidence synthesis of other kinds of data have been developed and assessed in research26 and in practice guideline development,13 but these have not yet been adopted routinely in healthcare guideline development. A process of integration is not just a simple, technical, mechanistic process. Guideline development is a human, social process involving relevant stakeholders in discussion, debate and judgement. Therefore, the guideline development process also relies on a balanced and representative guideline committee that functions well.27
Finally, integrating many types of knowledge is not a process in which anything goes. Some integration processes are likely to be better than others. Guideline development needs to be transparent and consistent so that reality, be it physical or social, can limit the inferences and recommendations made. We need a range of integration approaches depending on our understanding of what is true and real; for example, integration of different knowledge could be based on combinations of coherence (what fits best in a network of other theories), on consensus (what people agree on) and/or on correspondence (what links best to what is believed to be real).
Given the current state of evidence to decision frameworks, there is still little guidance on how to robustly and consistently combine knowledge of different types without using the frequentist understanding of knowledge. A broader discussion within the guideline community is needed about the frameworks used to integrate and include different kinds of knowledge. Considering theories from epistemology and findings from ethnography (online supplementary text box 5) could be instrumental to deepen our understanding of how other types of knowledge can be synthesised and integrated in guideline production.
The development of guideline recommendations is an interactive human process that requires a range of knowledge and experience including, but not exclusively, knowledge from frequency-based research, such as clinical trials. As in the clinical encounter, appraising and including different types of knowledge in guideline development should be used to make better inferences to guide decisions, but in practice, arguments are used to exclude some kinds of knowledge for a range of reasons, including concerns about introducing bias in frequentist reasoning.
In this paper, we present important epistemological reasons to appraise and include a (wide) variety of different types of knowledge to highlight important aspects of guideline development that await further exploration and practical suggestions.
We acknowledge that appraising and including knowledge from a wide variety of sources is likely to be complex and ongoing. Discussions about purpose, reasoning and integration in guideline development will continue. A simple set of tools or methodological quick fixes is unlikely to suffice, and developing criteria for appraising and integrating different knowledge will remain a challenge. However, we believe that much can be done to help guideline developers improve this (now often implicit5) practice that is central to their work. Capacity-building workshops that confront implicit forms of reasoning are one example. AID Knowledge runs such workshops annually at G-I-N conferences. They help to strengthen ties between guideline developers who are concerned about the increasingly rigid methodological constraints on guideline methods at the expense of fostering epistemic sensibilities. It is important for guideline developers to feel they are part of a community of practice that encourages epistemic skill development, rather than a hierarchical community where superior guideline methods are defined by a small group of experts. This will help to keep guideline development innovative and diverse.
Acknowledging that dominant frequentist methods are excellent for some questions but do not fit all knowledge needs is the first step to implementing different kinds of reasoning in guideline development. How to address the diversity in methods for different kinds of questions should be among the top guideline research priorities.
Contributors All authors substantially contributed to the conception or design of the work, or the acquisition, analysis or interpretation of data; drafted the work or revised it critically for important intellectual content; gave final approval of the version published; and agreed to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.
Funding SW received funding from the European Union Seventh Framework Programme (FP7‐PEOPLE‐2013‐COFUND), Grant/Award Number: 609020. TZJ received funding from a fellowship in the Future Research Leaders program of Linköping University. The OA fee was paid for by the Guidelines International Network.
Competing interests SW was a guidelines update standing committee member for NICE from 2014 until 2018. FF was a Board member of Guidelines International Network from 2009 to 2013. He led the Norwegian Guideline Secretariat from 2001 to 2006 and has been a member of several Guideline Working Groups nationally and internationally. CH is guideline co-ordinator of the Netherlands Society of Occupational Medicine. SL worked as a guideline methodologist and developer in Australia. FM worked with NICE on clinical guidelines since 2000 and was Director of the Centre for Clinical Practice (responsible for guidelines) at NICE from 2008 to 2011. BS worked for NICE from 2008 until 2017, advising on the methods of guideline development. She is also the chair of the GIN AID Knowledge Working Group. TZJ was founding co-chair of the AID Knowledge Working Group. He has studied guideline development for about a decade, through multiple externally funded studies.
Provenance and peer review Not commissioned; externally peer reviewed.
Collaborators Stephanie Chang, Pwee Keng Ho, Sonja Kersten, Miranda Langedam, Peter O’Neill, Sarah Richards, Rodrigo Pardo Turriago, Sue Phillips.