
Impact of a GRADE-based medical question answering system on physician behaviour: a randomised controlled trial
Ariel Izcovich, Juan Martín Criniti, Juan Ignacio Ruiz, Hugo Norberto Catalano

Internal Medicine Department, Hospital Alemán de Buenos Aires, Buenos Aires, Argentina

Correspondence to: Dr Ariel Izcovich, Internal Medicine Department, Hospital Alemán de Buenos Aires, Pueyrredon 1640, Buenos Aires 1118, Argentina; ariel.izcovich@gmail.com

Extract

Physicians are frequently faced with questions related to their patients’ care that they cannot answer. A vast number of randomised trials have tested a wide variety of behaviour-change strategies designed to improve practitioners’ use of evidence, but systematic reviews have concluded that the effects are generally small and inconsistent. We conducted a randomised controlled trial to determine whether a question identification and answering system, using structured evidence summaries with recommendations, would change physicians’ behaviour related to the care of their hospitalised patients. The trial was conducted in a secondary-level internal medicine ward. Relevant clinical questions were the units of randomisation; 14 clinicians participated in the study. The question identification and answering system used evidence summaries with recommendations based on the Grades of Recommendation, Assessment, Development and Evaluation (GRADE) approach, and the primary outcome was influence on clinician behaviour (decision/recommendation concordance). During 131 morning reports, 553 questions were identified (4.2 questions per meeting). Of these, 398 were excluded because they were not about diagnostic or therapeutic interventions or because their answers could not have an impact on clinician behaviour, and 31 were excluded because there was not enough time to answer them, leaving 124 included questions. The proportion of clinical decisions concordant with the proposed recommendations was 79% in the intervention arm and 44% in the control arm: relative risk 1.8 (95% CI 1.3 to 2.4); the number of evidence summaries needed to change a care decision for one question raised was 3 (95% CI 2 to 6).
A question identification and answering system was feasible, performed effectively and significantly influenced clinician behaviour related to the care of hospitalised patients. This suggests that interventions that make the best available evidence more accessible and interpretable at the point of care could have a significant impact on the quality of healthcare.


Introduction

Physicians are frequently confronted with questions related to their patients’ care1 that they cannot answer.2,3 Several reasons for this have been identified, including limited time to search for answers, lack of training in critical appraisal of the information found and low expectations of finding a relevant and direct answer.4,5 Improving this situation would be a step towards clinical practice that is consistent with the best evidence. A vast number of randomised trials have tested a wide variety of behaviour-change strategies designed to improve practitioners’ use of evidence, but systematic reviews have concluded that the effects are generally small and inconsistent.6,7 Initiatives aimed at giving physicians access to up-to-date, evidence-based treatment and management recommendations at the point of care are being developed.8 Another intuitively appealing way to achieve such evidence-based practice (EBP) is to provide a question identification and answering service. Such services have been shown to provide satisfactory answers,9 change physicians’ information-seeking behaviour10 and increase general practitioners’ adherence to prescribing discharge medications.11 We conducted a randomised controlled trial (RCT) to determine whether a question identification and answering system using structured evidence summaries with recommendations would influence physicians’ behaviour related to the care of their hospitalised patients.

Methods

Medical problem-solving system

The study was conducted in the Internal Medicine Service of the German Hospital (Hospital Alemán) in Buenos Aires, Argentina, between June 2012 and July 2013, and was approved by the institutional ethics committee (April 2012, ref no. 115). The context in which this study was carried out has been described in a previous publication.12 Throughout the study, a physician specialised in internal medicine, trained and skilled in evidence-based decision-making, and funded by the Internal Medicine Service, identified the medical questions that arose during morning reports, at which all the clinicians (staff and residents) of the Internal Medicine Service meet to discuss cases. Questions were either explicitly formulated by staff or resident physicians or inferred. They were recorded using the PICOT structure (Population/Problem, Intervention, Comparison, Outcome, Type of design that would answer the question)13 to identify keywords for a literature search. To focus on questions whose answers could potentially change the clinician's course of action, we excluded those that met any of the following criteria: (1) answered immediately by someone present at the session, frequently using electronic resources such as UpToDate; (2) not related to a therapeutic or diagnostic intervention; (3) about past decisions (eg, for a patient with stroke in whom an MRI had been performed: would it not have been better to perform a CT scan?). The same physician who collected the questions also subjectively categorised them by the probability of a significant clinical impact on patient-important outcomes, and attempted to answer all of them, one by one, prioritising those he considered most relevant.
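The PICOT structure and the three exclusion criteria above amount to a simple record plus a screening rule. The Python sketch below is illustrative only: the class, field and function names are hypothetical and do not describe any software used in the study, and the example question paraphrases the stroke example in the text.

```python
from dataclasses import dataclass


@dataclass
class ClinicalQuestion:
    """Hypothetical PICOT record for one question raised at morning report."""
    population: str      # P: population/problem
    intervention: str    # I
    comparison: str      # C
    outcome: str         # O
    design: str          # T: type of design that would answer the question
    # Screening attributes (illustrative names, not from the article)
    answered_in_session: bool = False
    about_diagnosis_or_therapy: bool = True
    about_past_decision: bool = False

    def keywords(self) -> list[str]:
        """Use the non-empty P, I, C and O terms as seed keywords for the search."""
        terms = (self.population, self.intervention, self.comparison, self.outcome)
        return [term for term in terms if term]


def is_excluded(q: ClinicalQuestion) -> bool:
    """Meeting any one of the three exclusion criteria excludes the question."""
    return (
        q.answered_in_session                # (1) answered immediately at the session
        or not q.about_diagnosis_or_therapy  # (2) not about a diagnostic/therapeutic intervention
        or q.about_past_decision             # (3) about a decision already taken
    )


stroke_q = ClinicalQuestion(
    population="patient with stroke",
    intervention="CT scan",
    comparison="MRI",
    outcome="",                # not stated in the example
    design="",
    about_past_decision=True,  # the MRI had already been performed
)
print(is_excluded(stroke_q))  # True
```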
The literature search followed an ‘umbrella’ strategy: the first step was to identify clinical practice guidelines and systematic reviews,14 and the second was to identify individual RCTs or observational studies when considered relevant (ie, RCTs published after the identified systematic reviews; figure 1). The search started once the morning report was over and lasted a maximum of 2 h. Using the identified publications, the same clinician analysed all the critical and important outcomes (relative and absolute effects and their confidence levels), judged the balance of benefits and risks and the confidence in that judgement, and translated this information into a recommendation following the Grades of Recommendation, Assessment, Development and Evaluation (GRADE) approach15,16 (see online supplementary appendix 1). Finally, the clinician wrote short, structured evidence summaries containing a brief description of the case that motivated the question, a summary of the evidence gathered, an interpretation of the results (description of best estimates of effect and confidence)17 and the recommendation.

Figure 1

Information searching strategy.

Study participants

All the physicians attending patients in the Internal Medicine Service (six staff and eight residents) participated and were blinded to the purpose and methods of the study until it was over. Although the authors were members of the service during the study, they neither attended patients nor made decisions related to their care, and hence were not considered study participants. Blinding was possible because the question identification and answering process described above had been usual practice in the service since 2009. All the participating physicians had some degree of training in evidence-based problem solving.

Study design

After the responsible physician had answered the questions as described, the questions were randomised to two arms, intervention and control. Randomisation used a pregenerated computer random number list and was stratified by strength of recommendation. A member of the service who usually performs administrative tasks, and who was blinded to the study’s objectives, was in charge of group assignment. She received all the evidence summaries and delivered those written in response to questions assigned to the intervention arm to the clinicians in charge (in hard copy and by email). She archived the summaries written in response to questions assigned to the control arm (figure 2). Two clinicians blinded to arm assignment reviewed patients’ electronic medical records and, when necessary, conducted telephone interviews with patients to assess the following outcomes: (1) decision/recommendation concordance (primary outcome), defined as the proportion of medical decisions that were consistent with the proposed recommendations; (2) in-hospital mortality; (3) transfer to an intensive care unit (ICU); (4) 6-month mortality; (5) 6-month rehospitalisation. We performed a prespecified subgroup analysis of strong versus weak recommendations (see online supplementary appendix 1).
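Stratified randomisation from a pregenerated list can be sketched as follows. The article does not report how the list was built, so this sketch assumes permuted blocks of four within each stratum (strong and weak recommendations); the names `pregenerate_lists` and `Allocator` and the seed value are hypothetical.

```python
import random


def pregenerate_lists(strata, n_blocks, block_size=4, seed=2012):
    """Build one allocation list per stratum from shuffled, balanced blocks (assumed design)."""
    rng = random.Random(seed)
    lists = {}
    for stratum in strata:
        sequence = []
        for _ in range(n_blocks):
            # Each block holds equal numbers of both arms, in random order
            block = (["intervention"] * (block_size // 2)
                     + ["control"] * (block_size // 2))
            rng.shuffle(block)
            sequence.extend(block)
        lists[stratum] = sequence
    return lists


class Allocator:
    """Hands out the next unused assignment from the question's stratum list."""

    def __init__(self, lists):
        self.lists = lists
        self.next_index = {stratum: 0 for stratum in lists}

    def assign(self, strength):
        # Raises IndexError if the pregenerated list for this stratum runs out
        i = self.next_index[strength]
        self.next_index[strength] = i + 1
        return self.lists[strength][i]


allocator = Allocator(pregenerate_lists(["strong", "weak"], n_blocks=30))
arms = [allocator.assign(s) for s in ["strong"] * 44 + ["weak"] * 80]
```

Blocking within strata keeps the arms balanced inside each recommendation strength as questions accrue, which is one common reason to stratify; the study itself may have used a different scheme.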

Sample size, power calculation and statistical analysis

Assuming that, without the intervention, half of the clinicians’ decisions would be in line with the proposed recommendations, and considering an absolute increase in compliance of 25 percentage points (from 50% to 75%) clinically significant, with an α error of 0.05, a total sample size of 116 questions would provide 80% power. Calculations were performed using G*Power 3 software (http://www.psycho.uni-duesseldorf.de/abteilungen/aap/gpower3/).
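The article does not report the exact G*Power test settings. A minimal sketch, assuming the standard normal-approximation formula for comparing two independent proportions (two-sided test, pooled variance under the null hypothesis), gives 58 questions per arm, which matches the reported total of 116:

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Per-arm sample size for two proportions (normal approximation, two-sided)."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    z_beta = z(power)            # about 0.84 for 80% power
    p_bar = (p1 + p2) / 2        # pooled proportion under the null hypothesis
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)


n = n_per_group(0.50, 0.75)  # 50% concordance without the intervention, 75% with it
print(n, 2 * n)  # 58 116
```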

Dichotomous variables were compared using χ2 tests and expressed as relative risks and risk differences. Length of hospital stay was compared using non-parametric tests. Tests of significance were two-tailed, and a p value of less than 0.05 or a 95% CI for the relative risk excluding 1 was considered significant. Calculations were performed using Epi Info V.7.1.4.0. We calculated the number of evidence summaries needed to change a care decision for one question raised as the inverse of the risk difference, and its CI as the inverse of the limits of the risk difference CI. When multiple questions related to the care of the same patient were identified, we included all of them in the primary outcome analysis, because we considered clinicians’ decisions about each question to be independent of one another even when they concerned the same patient. We also performed a sensitivity analysis that considered only one question (the first that arose) per patient. When questions related to the care of the same patient were assigned to different arms, that patient's clinical outcomes (secondary outcomes) were analysed as if the patient had been assigned to the intervention arm.
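The two question-to-patient rules above (first question per patient for the sensitivity analysis; intervention-arm precedence for patient-level secondary outcomes) can be written as two small functions. This is a sketch with hypothetical record fields and example data, not the study's analysis, which was run in Epi Info.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Question:
    """Hypothetical record linking a randomised question to its patient."""
    question_id: int
    patient_id: str
    order: int   # 1 = first question raised about this patient
    arm: str     # "intervention" or "control"


def first_question_per_patient(questions):
    """Sensitivity analysis subset: keep only the earliest question for each patient."""
    first = {}
    for q in questions:
        kept = first.get(q.patient_id)
        if kept is None or q.order < kept.order:
            first[q.patient_id] = q
    return list(first.values())


def patient_arm(questions):
    """Patient-level arm for secondary outcomes: any intervention-arm
    question places the patient in the intervention arm."""
    arms = {}
    for q in questions:
        if arms.get(q.patient_id) != "intervention":
            arms[q.patient_id] = q.arm
    return arms


questions = [
    Question(1, "A", 1, "control"),
    Question(2, "A", 2, "intervention"),
    Question(3, "B", 1, "control"),
]
print([q.question_id for q in first_question_per_patient(questions)])  # [1, 3]
print(patient_arm(questions))  # {'A': 'intervention', 'B': 'control'}
```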

Results

During 131 morning reports, 553 questions (4.2 questions per meeting) were identified. Of these, 398 were excluded because they met the exclusion criteria and 31 because there was not enough time to perform the literature search. In total, 124 questions (62 randomised to each arm), related to the care of 120 inpatients (60 in each group), were included (figure 3) and answered. All the evidence summaries written in response to questions assigned to the intervention arm were delivered to the corresponding treating physicians between 2 and 4 h after the questions were identified. The quality of the evidence found in response to the included questions was judged as very low for 32 (26%), low for 56 (46%), moderate for 32 (25%) and high for 4 (3%), and the recommendations were strong for 44 (35%) questions and weak for 80 (65%). Twenty-five (57%) of the strong recommendations were based on moderate- or high-quality evidence and 19 (43%) on low- or very low-quality evidence (13 based on GRADE paradigmatic situation 1, 2 on paradigmatic situation 2 and 4 on paradigmatic situation 3)16 (see online supplementary appendix 2). Baseline question and patient characteristics were balanced (table 1). The primary outcome (decision/recommendation concordance) occurred for 76 (61.2%) of the included questions, 49 in the intervention arm and 27 in the control arm: relative risk (RR) 1.8 (95% CI 1.3 to 2.4), risk difference 35.4% (95% CI 19.5% to 51.4%), number of evidence summaries needed to change a care decision for one question raised 3 (95% CI 2 to 5; table 2). The sensitivity analysis showed that the results were robust when the primary outcome was calculated using only the first question about each patient (excluding four additional questions).
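The primary-outcome effect measures can be recomputed from the reported counts (49 of 62 concordant decisions in the intervention arm versus 27 of 62 in the control arm). This Python sketch assumes common textbook methods: a log-scale CI for the RR, a Wald CI for the risk difference, an uncorrected Pearson χ2 test, and number-needed limits taken as the inverses of the risk difference limits, as described in Methods. Epi Info may use other methods, so small discrepancies are possible.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def effect_measures(a, n1, c, n2, level=0.95):
    """Compare event proportions a/n1 (intervention) and c/n2 (control)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    p1, p2 = a / n1, c / n2

    # Relative risk; CI computed on the log scale
    rr = p1 / p2
    se_log_rr = sqrt(1 / a - 1 / n1 + 1 / c - 1 / n2)
    rr_ci = (exp(log(rr) - z * se_log_rr), exp(log(rr) + z * se_log_rr))

    # Risk difference with a Wald (normal-approximation) CI
    rd = p1 - p2
    se_rd = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    rd_ci = (rd - z * se_rd, rd + z * se_rd)

    # Summaries needed per changed decision: inverse of the RD and of its CI limits
    # (only meaningful when the whole RD interval lies above zero)
    nnt = 1 / rd
    nnt_ci = (1 / rd_ci[1], 1 / rd_ci[0])

    # Pearson chi-square without continuity correction, 1 degree of freedom
    b, d = n1 - a, n2 - c
    chi2 = (n1 + n2) * (a * d - b * c) ** 2 / (n1 * n2 * (a + c) * (b + d))
    p_value = 2 * (1 - NormalDist().cdf(sqrt(chi2)))  # chi2 with 1 df is z squared

    return {"rr": rr, "rr_ci": rr_ci, "rd": rd, "rd_ci": rd_ci,
            "nnt": nnt, "nnt_ci": nnt_ci, "chi2": chi2, "p_value": p_value}


m = effect_measures(49, 62, 27, 62)  # concordant decisions: intervention vs control
print(f"RR  {m['rr']:.2f} (95% CI {m['rr_ci'][0]:.2f} to {m['rr_ci'][1]:.2f})")
print(f"RD  {100 * m['rd']:.1f}% (95% CI {100 * m['rd_ci'][0]:.1f}% to {100 * m['rd_ci'][1]:.1f}%)")
print(f"NNT {m['nnt']:.1f} (95% CI {m['nnt_ci'][0]:.1f} to {m['nnt_ci'][1]:.1f})")
print(f"chi2 {m['chi2']:.2f}, p = {m['p_value']:.1e}")
```

Under these assumptions the risk difference is about 35.5% (95% CI 19.5% to 51.5%), agreeing with the reported values to within rounding, and the number needed is about 2.8 with limits of about 1.9 and 5.1. Rounding those limits to the nearest integer gives 2 to 5, as in the Results, whereas rounding them up, a common convention for such numbers, gives 2 to 6, as in the abstract. The log-scale RR interval (about 1.33 to 2.48) has a slightly higher upper limit than the published 2.4, which may reflect a different CI method.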
Subgroup analysis showed that providing evidence summaries had a significant impact on recommendation compliance for both strong and weak recommendations, RR 1.9 (95% CI 1.1 to 2.6) and 1.7 (95% CI 1.1 to 2.6), respectively (table 3). No significant differences were observed in any of the secondary outcomes evaluated (table 2).

Table 1

Baseline characteristics

Table 2

Results

Table 3

Subgroup analysis

Figure 3

Group assignment and questions identified/answered.

Discussion

Ubbink et al18 state that “evidence-based practice (EBP) provides a structure for the bedside use of research and consideration of patient values, and preferences to optimise clinical decision-making and to improve patient care”. Although the potential benefits of EBP are widely accepted, research consistently shows an important gap between evidence and practice.19 Identified barriers responsible for this gap include lack of time, resources and/or training.18 Multiple interventions designed to close this gap have been tested, with varying success.20 These interventions can be classified into three groups of complementary approaches: interventions focused on training clinicians to independently find, appraise and apply the best evidence (eg, training programmes, courses); interventions focused on providing the best available evidence at the point of care in a friendly and usable format (eg, producing, disseminating and updating trustworthy guidelines); and interventions that directly address behaviour change (eg, reminders, computerised decision support).21 In this trial, we evaluated an approach that combines the latter two by providing tailored evidence summaries in response to questions that arose during the care of inpatients. The results show that this approach significantly and frequently influenced physicians’ decisions related to their patients’ care.

Compared with the effects reported for other behaviour-change strategies, the proposed intervention could be categorised as highly effective.7,22–24 Although we did not observe a significant impact on clinically important outcomes, this was expected for an intervention intended to improve quality of care by affecting physician behaviour, and hence evidence-based decision-making.25 An impact on clinically important outcomes would be very difficult to prove in this context (very large sample sizes would be needed, and the signal-to-noise ratio is low).26,27 Thus, process measures such as physician behaviour are considered valid surrogates when there is solid evidence of their relation to clinically important outcomes.28 Although the intervention in our study included a wide range of recommendations, most of which were based on low-quality evidence and therefore have a dubious impact on clinically important outcomes, one could argue that improving evidence-based decision-making is a measure of quality in itself.29 Hence, if the proposed intervention provided trustworthy recommendations based on the best available evidence, the observed results could be interpreted as a valid surrogate for improvement in quality of care. Unfortunately, this cannot be asserted because the evaluated system has not yet been validated, which we consider the main weakness of our study.

Subgroup analysis showed that both strong and weak recommendations had a significant impact on clinicians’ decisions. This suggests that clinicians used the information provided as much when the evidence showed that the intervention in question produced clear benefits or harms as when it suggested that benefits and risks were balanced, when the quality of evidence was low or very low, or when there was uncertainty about values and preferences. These findings are consistent with other RCTs in which physicians’ hypothetical course of action (evaluated with surveys) was influenced by strong as well as weak recommendations.30,31

Other studies have evaluated evidence-facilitating services intended to improve quality of care; most of them had observational designs and used surveys to measure impact.32–35 Their results are inconsistent but suggest possible benefits. Two RCTs reported benefits in terms of physicians’ self-reported attitudes towards searching for information and satisfaction.9,10 Another RCT suggested a positive impact based on physicians’ reports of changes in their behaviour.36 A further RCT, which evaluated the impact of inserting an evidence statement into hospital discharge letters, showed a significant increase in general practitioners’ adherence to discharge medications.11 Our study has some novel aspects that could be seen as strengths in light of the existing evidence. The intervention is based on an evidence analysis system (the GRADE approach) that is widely accepted and has been shown to be reproducible and more reliable than intuitive judgement.37 We included recommendations in the evidence summaries; recommendations have been shown to be valuable to clinicians, especially when the quality of evidence is low or very low, and may influence their course of action.31 The RCT design is the optimal strategy for evaluating quality-improvement interventions with the lowest risk of bias.38 We measured physician behaviour objectively, as the proportion of decisions consistent with the proposed recommendations in each arm, instead of using surveys, which are prone to recall bias.

The main threats to confidence in, and applicability of, the observed results are: (1) the findings could differ significantly if the intervention were implemented in different contexts (eg, physicians not trained in evidence-based decision-making could be more reluctant to use the evidence provided than those who participated in this trial); and (2) indirectness of outcome:39 as discussed above, physician behaviour is a surrogate for clinically important outcomes, and its validity as such could be questioned.

Conclusion

A question identification and answering system was feasible, performed effectively and significantly influenced clinicians’ behaviour related to the care of hospitalised patients. The information provided was useful whether it resulted in strong or weak recommendations. These findings suggest that interventions that make the best available evidence more accessible and interpretable at the point of care could have a significant impact on the quality of healthcare.

References

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.


Footnotes

  • Twitter Follow Ariel Izcovich at @IzcovichA

  • Competing interests None.

  • Ethics approval The German Hospital ethics committee.

  • Provenance and peer review Not commissioned; externally peer reviewed.
