Introduction
In the first article, we described the potential flaws that often undermine guideline trustworthiness, despite their evidence-based ‘quality mark’, and mentioned the considerable efforts made to counter this over the past two decades. We highlighted that limited panel composition, poor methodology and conflicts of interest are the main ‘endogenous’ reasons for their weakness to date. Moreover, the ‘corruption’ of primary evidence and the waste of the biomedical research on which they are grounded further contribute to this phenomenon.1
Once published, it is hard to discredit even the most conflicted and untrustworthy guideline,2 mainly because of the asymmetry of the biomedical literature, which does not give adequate resonance to such errors; the inertia of medical practice; the fear of medicolegal claims; and the absence of independent, trustworthy and authoritative agencies that rate, rather than simply collect, guidelines. Therefore, it is crucial not only to inform clinicians that guidelines can be wrong much more commonly than expected, but also to explain to them how to recognise the flaws and errors confidently.
In this paper, we suggest how to achieve this. A broader perspective on the issue, from the research and policymaking standpoint, will also be offered.
Assessing the trustworthiness of a recommendation
Most physicians need to decide about the trustworthiness of a single recommendation, not of an entire guideline. In doing so, while the above-cited extrinsic factors limiting the reliability of evidence-based guidelines must always be considered, it is very helpful to conceptualise trustworthiness as relating to a recommendation's content and/or its methodological value, and to decide on the ‘trustworthiness threshold’ to be adopted, depending on the clinical circumstances encountered (figure 1).
Indeed, assessing the quality of the methods followed to produce the guideline containing that recommendation is a reasonable, preliminary and necessary step. Up to 22 analytical tools exploring the quality of guidelines had been developed by 2005,3 and other frameworks appeared later or were significantly amended. Some of these tools have gained wide popularity, and most of the literature assessing the quality of guidelines has adopted one of these instruments.4–6 They share similar quality domains, and the items to be assessed overlap to a large extent (table 1).
However, the ‘sensitivity’ and ‘specificity’ of these tools have never been assessed prospectively, their items were decided mostly on a consensus basis, and they do not offer a score with a clear cut-off value below which a guideline should be disregarded. The most recent of them, from the Institute of Medicine, sets the standard so high that it has been unfulfilled by every guideline so far measured against it,7 so its practical value is unclear.
Alternatively, ‘red flags’ with an empirical relationship to flawed recommendations have been proposed,8 but they lack external validation, and it is unlikely that this knowledge gap will be closed soon. Guideline repositories such as the National Guideline Clearinghouse have recently set minimum inclusion criteria that guidelines must meet to be accepted into the database. The Guidelines International Network, meanwhile, advocates and encourages public involvement in the evaluation and discussion of guidelines, whereas NICE offers a ‘methodological seal’ to organisations producing guidelines on its behalf through the NICE accreditation programme. The effectiveness of these systems in orienting readers towards the most trustworthy guidelines is, however, unknown.
Nevertheless, the lower the methodological quality of a given guideline, the greater the need to assess its content trustworthiness before adopting a recommendation with confidence, especially where there are high risks of harm to patients and/or high costs deriving from its adoption. Therefore, a complementary and even closer view must be obtained by considering ‘content’ trustworthiness, that is, how, and to what extent, the evidence was correctly rated and translated into the recommendation of interest. When more than one guideline on the same topic exists, searching for any unexplained difference between recommendations considering the same evidence is a popular, indirect (‘proxy’) index of trustworthiness, and reasoned comparisons of similar guidelines are now offered by the National Guideline Clearinghouse website. Unfortunately, concordance of recommendations between guidelines may be falsely reassuring and ultimately wrong,9 while discordance may (legitimately) be due to differing interpretations of the ‘evidence’, the relative importance attributed to the outcomes considered, or variable context-sensitive issues.10 Discrepant and discordant recommendations ultimately lead to further confusion. Instead, assessing the content trustworthiness of a recommendation in depth implies taking into account, among other things: the consistency of the recommendation with the primary evidence on which it was grounded; how evidence gaps were filled in by discussion within the panel; what values and considerations were adopted; from which perspective the recommendation was issued (individual clinical practice or population-wide/policymaking); and whether costs were considered.
The complexity of the factors involved in producing recommendations informed by—rather than simply based on—evidence has been made explicit by the GRADE working group,11 whose method is increasingly (although not universally) adopted by organisations and societies sponsoring guidelines. The GRADE method can also be conveniently used by guideline users needing to scrutinise the trustworthiness of a given recommendation in depth (see table 1).
The essence of this ‘direct’ approach is ‘cross-matching’ a recommendation with the related primary evidence using the GRADE method, in a ‘backward’ reanalysis going from the recommendation to the evidence that produced it. However, this method is extremely time-consuming, requires considerable expertise in evidence-based methodology, and its reproducibility is unknown. Furthermore, it has been evaluated in only a few studies.10,12 Alternatively, content (and methodological) trustworthiness can also be verified through a more affordable assessment of the literature discussing guidelines and the primary studies to which they refer, as well as by following the debate on websites dedicated to guidelines and in letters, editorials and comments in peer-reviewed journals, blogs, social media and so on, if sufficient time (and interest) is dedicated to it. The highest level of content trustworthiness can be achieved only when prospective, convincing observational and/or experimental data about the outcomes of adopting a given recommendation are available (‘ex post trustworthiness’). However, such knowledge is rarely available.2,13–15 Furthermore, no structured effort is being made by anyone, at either an organisational or an individual level, to assess systematically the outcomes of adopting guidelines. Ultimately, ‘modelling’ of projected health outcomes seems necessary,16 and this is increasingly considered a good option for screening programmes and population-wide interventions.
In conclusion, assessing the reliability of guidelines is largely a matter of individual, careful and complex judgement. Tools assessing their methodology are helpful but not exhaustive, and no predefined approach can be adopted to assess their content trustworthiness, especially when robust data are lacking.
Building a guideline user safety bundle
Since there is no single feature that easily distinguishes a trustworthy guideline, and no satisfactory comprehensive assessment tool is available, we suggest a seven-point ‘safety bundle’ to help guideline users recognise wrong guidelines with confidence.
Question context: First of all, framing the question correctly is of paramount importance. In this regard, it must be clear why the recommendation is being sought: whether we need to answer an individual patient's problem; whether we are searching for answers on the effectiveness, safety or costs of population-wide interventions; whether we are setting a standard for quality assessment purposes; or whether we are deciding how to allocate resources properly. Regardless, a clear translation of the generic question into an answerable clinical question is required, according to the most basic and time-honoured evidence-based rule (remembering the PICO acronym: Patients, Intervention, Comparison and Outcomes). For example, ‘should I anticoagulate this patient?’ becomes ‘in adults with non-valvular atrial fibrillation (P), does oral anticoagulation (I), compared with antiplatelet therapy (C), reduce the risk of stroke (O)?’.
Trustworthiness threshold: Next, we should decide the ‘trustworthiness threshold’ for the recommendation we will eventually find, depending on the relevance of the outcomes under consideration, the expected benefits, the costs deriving from adopting that recommendation, the potential risks, and all of the patients' (or other stakeholders') perspectives relevant to that question.
Remit and scope: The remit and scope of the guideline in question also deserve attention. Some guidelines, for example, may include cost considerations and offer a population-wide perspective, whereas others offer a more patient-centred approach. It is therefore useful to find a guideline tailored to the perspective most appropriate to our needs.
Quality assessment: Using a quality assessment tool such as the AGREE, IOM or GIN criteria will increase users' awareness of the potential pitfalls of a guideline. There is empirical evidence that non-adherence to some key features of the guideline-making process increases the risk of wrong recommendations. Among them, conflicts of interest, limited panel composition, poor and opaque methods of evidence rating, and the absence of an external review process all substantially erode the reliability of guidelines and can be used as screening tests of methodological trustworthiness. Since medical specialty guidelines often fall short of these basic standards, they must be considered with special attention.
Content assessment: An assessment of recommendations against their content trustworthiness is also necessary, especially when there is any reasonable doubt about their methodological quality, when the risks and/or costs deriving from their adoption are of special concern, when the available evidence is weak or conflicting, or when a high ‘trustworthiness threshold’ is considered appropriate. We discourage the practice of cross-matching similar guidelines as a surrogate for their content trustworthiness. Instead, we recommend checking carefully whether external lines of evidence support the recommendation and whether outcome data following its adoption are available, as well as using the GRADE method (or at least the broad GRADE principles of evidence rating) to scrutinise content trustworthiness. It should be remembered, again, that the ‘corruption of the evidence’ makes any effort to produce reliable and usable recommendations difficult.
Adoption: Once a reasonable level of certainty about the reliability of that recommendation has been achieved, the next step is to decide whether it is worth adopting it as is, modifying it or disregarding it on a case-by-case basis. The evidence to decision frameworks of the DECIDE project (http://www.decide-collaboration.eu/) may be helpful for this task.
Shared decision: The careful critical assessment of any recommendation resulting from this analysis should not be used for solitary decisions but has to be communicated effectively and explicitly to the patient as part of a shared decision-making process.
Conclusions
Evidence-based guidelines are, in many cases, a valuable tool for physicians willing to adopt the best available biomedical knowledge to improve patient care. In recent years, attention has been paid to translating accurately the general purpose of guidelines to individual patients' needs, and specific frameworks have been developed to make this process explicit, reasonable and consistent with the available evidence.17
Despite their evidence-based quality mark, however, even contemporary guidelines may be flawed, not only because of internal biases but also because of the wider ‘corruption’ and waste of biomedical research under the pressure of commercial interests, excess medicalisation, overtreatment and defensive medicine.
Encountering flawed guidelines is not rare, and we need to assess their trustworthiness more firmly. Unfortunately, given the high number of flawed guidelines, the process leading to the right decision is not straightforward. We suggest the adoption of a guideline user ‘safety bundle’ which considers all of the intrinsic and extrinsic causes of guideline untrustworthiness, to enable an informed and cautious use of guidelines before adopting (and/or adapting) them with confidence. We also strongly encourage effective and explicit communication with the patient, to foster a comprehensive shared decision-making process.18,19
Beyond the individual patient level, untrustworthy guidelines raise serious concerns for teamwork practice, for the design and implementation of care pathways, and for correctly defining the research priority agenda, not to mention their medicolegal implications. Ultimately, discovering a ‘suspicious’ recommendation may be a valuable opportunity for professional and scientific growth, and for reflecting on how care pathways are produced.
Moreover, in an ideal and healthy scientific community, finding an untrustworthy recommendation should elicit an in-depth, unconflicted reassessment of that guideline on an urgent basis, considering the huge potential negative impact on patient safety, the harms and waste from a wider societal perspective, and the devastating consequences of assuming a false equipoise based on flawed guidelines. Furthermore, such guidelines can create a serious obstacle to otherwise necessary research.20
The scientific organisation producing the suspicious guideline should be notified and the medicolegal value of that guideline downgraded. Peer-reviewed journals should be more involved in this critical appraisal process. If the concerns are justified, adequate explanations of the ‘critical incident’ should be offered by the guideline sponsors and expert panellists, the methods of the guideline-making process reassessed where necessary, and the recommendation amended expeditiously.
As Richard Horton, editor of The Lancet, said, guidelines (should) ‘force us to scrutinise primary research literature in ways that we don't normally do’. We agree. That is exactly the opposite of using them as heuristic tricks to respond quickly to complex clinical questions that need a slow, rather than fast, decision-making process.21
Despite the optimistic expectations announced more than two decades ago, primary research and its ‘evidence-based’ synthesis offered by guidelines can be unreliable, flawed and untrustworthy. However, thanks to evidence-based methods, such problems have been highlighted, and a ‘methodological immunisation’22 against wrong guidelines and ‘bad research’23 has been offered as a solution.
Meanwhile, critically appraising the primary evidence to which guidelines refer, and considering to what extent guidelines are trustworthy and relevant for the specific clinical context, remain the sole and ultimate ways of protecting our patients from the risks and harms caused by wrong guidelines.
Footnotes
Twitter Follow Primiano Iannone @primianoiannone
Contributors PI contributed to conception and design of the paper, stewardship of the discussion within the panel of the authors for the production of the safety bundle, critical assessment of the literature about guideline methodology and appraisal, selection of case studies cited; draft of the manuscript. NM, MM and AC made substantial contributions to the acquisition, analysis and interpretation of current methodology of guidelines appraisal; revision of the manuscript for important intellectual content; final approval of the version published. JD, GC and GMP made substantial contributions to the analysis, interpretation of relevant literature considered and of the case studies to be tested with the safety bundle; revision of the manuscript for important intellectual content; final approval of the version published. All of the authors agree to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.
Competing interests None declared.
Provenance and peer review Not commissioned; internally peer reviewed.