General medicine
Which actionable statements qualify as good practice statements in COVID-19 guidelines? A systematic appraisal
  1. Omar Dewidar1,2,
  2. Tamara Lotfi3,4,5,
  3. Miranda Langendam6,
  4. Elena Parmelli7,
  5. Zuleika Saz Parkinson8,
  6. Karla Solo3,4,5,
  7. Derek K Chu3,9,
  8. Joseph L Mathew10,
  9. Elie A Akl3,11,
  10. Romina Brignardello-Petersen3,4,
  11. Reem A Mustafa3,12,
  12. Lorenzo Moja13,
  13. Alfonso Iorio3,4,9,
  14. Yuan Chi14,15,
  15. Carlos Canelo-Aybar16,
  16. Tamara Kredo17,18,
  17. Justine Karpusheff19,
  18. Alexis F Turgeon20,21,
  19. Pablo Alonso-Coello16,
  20. Wojtek Wiercioch3,4,5,
  21. Annette Gerritsen17,
  22. Miloslav Klugar22,
  23. María Ximena Rojas23,
  24. Peter Tugwell24,25,
  25. Vivian Andrea Welch1,2,
  26. Kevin Pottie26,
  27. Zachary Munn27,
  28. Robby Nieuwlaat3,
  29. Nathan Ford28,
  30. Adrienne Stevens3,4,
  31. Joanne Khabsa29,
  32. Zil Nasir3,4,
  33. Grigorios I Leontiadis4,9,
  34. Joerg J Meerpohl30,31,
  35. Thomas Piggott3,4,5,
  36. Amir Qaseem32,
  37. Micayla Matthews3,4,5,
  38. Holger J Schünemann3,4,5,9,33,34
  39. the eCOVID-19 recommendations map collaborators
  1. 1 Methods Centre, Bruyère Research Institute, Ottawa, Ontario, Canada
  2. 2 School of Epidemiology and Public Health, University of Ottawa, Ottawa, Ontario, Canada
  3. 3 Department of Health Research Methods, Evidence and Impact, McMaster University, Hamilton, Ontario, Canada
  4. 4 Michael G DeGroote Cochrane Canada and McMaster GRADE Centres, McMaster University, Hamilton, Ontario, Canada
  5. 5 WHO Collaborating Center for Infectious Diseases, Research Methods and Recommendations, McMaster University, Hamilton, Ontario, Canada
  6. 6 Amsterdam University Medical Centers, University of Amsterdam, Amsterdam, Netherlands
  7. 7 Joint Research Centre, European Commission, Ispra, Italy
  8. 8 Instituto de Salud Carlos III, Agencia de Evaluación de Tecnologías Sanitarias, Madrid, Spain
  9. 9 Department of Medicine, McMaster University, Hamilton, Ontario, Canada
  10. 10 Department of Pediatrics, Post Graduate Institute of Medical Education and Research, Chandigarh, India
  11. 11 Department of Internal Medicine, American University of Beirut, Beirut, Lebanon
  12. 12 Internal Medicine, Division of Nephrology and Hypertension, University of Kansas School of Medicine, Kansas City, Kansas, USA
  13. 13 Department of Health Product Policy and Standards, World Health Organization, Geneva, Switzerland
  14. 14 Yealth Network, Beijing Yealth Technology Co., Ltd, Beijing, China
  15. 15 Cochrane Campbell Global Ageing Partnership, London, UK
  16. 16 Iberoamerican Cochrane Center, Biomedical Research Institute Sant Pau-CIBERESP, Barcelona, Spain
  17. 17 Cochrane South Africa, South African Medical Research Council, Cape Town, Western Cape, South Africa
  18. 18 Clinical Pharmacology, Department of Medicine, Stellenbosch University, Stellenbosch, Western Cape, South Africa
  19. 19 National Institute for Health and Care Excellence, London, UK
  20. 20 Centre de Recherche du Centre Hospitalier Affilié Universitaire de Québec (CHA), CHA-Hôpital de l'Enfant-Jésus, Université Laval, Quebec, Quebec, Canada
  21. 21 Department of Anesthesiology and Critical Care Medicine, Université Laval, Québec City, Québec, Canada
  22. 22 Czech National Centre for Evidence-Based Healthcare and Knowledge Translation, Institute of Biostatistics and Analyses, Masaryk University, Brno, Czech Republic
  23. 23 Department of Clinical Epidemiology and Public Health, Institut d’Investigació Biomèdica Sant Pau IIB Sant Pau, Barcelona, Spain
  24. 24 Department of Medicine, University of Ottawa Faculty of Medicine, Ottawa, Ontario, Canada
  25. 25 Clinical Epidemiology Program, Ottawa Hospital Research Institute, Ottawa, Ontario, Canada
  26. 26 Department of Family Medicine, University of Ottawa, Ottawa, Ontario, Canada
  27. 27 Joanna Briggs Institute, University of Adelaide, Adelaide, South Australia, Australia
  28. 28 Department of HIV, Hepatitis and Sexually Transmitted Infections, World Health Organization, Geneva, Switzerland
  29. 29 Clinical Research Institute, American University of Beirut, Beirut, Lebanon
  30. 30 Cochrane Germany, Cochrane Germany Foundation, Freiburg, Germany
  31. 31 Medical Center - University of Freiburg, Institute for Evidence in Medicine, Freiburg, Germany
  32. 32 American College of Physicians, Philadelphia, Pennsylvania, USA
  33. 33 Department of Biomedical Sciences, Humanitas University, Milan, Italy
  34. 34 Cochrane Canada, Hamilton, Ontario, Canada
Correspondence to Prof Holger J Schünemann, McMaster University Department of Health Research Methods Evidence and Impact, Hamilton, Canada; schuneh{at}mcmaster.ca

Abstract

Objectives To evaluate the development and quality of actionable statements that qualify as good practice statements (GPS) reported in COVID-19 guidelines.

Design and setting Systematic review. We searched MEDLINE, MedSci, China National Knowledge Infrastructure (CNKI), databases of Grading of Recommendations Assessment, Development and Evaluation (GRADE) Guidelines, NICE, WHO and Guidelines International Network (GIN) from March 2020 to September 2021. We included original or adapted recommendations addressing any COVID-19 topic.

Main outcome measures We used GRADE Working Group criteria for assessing the appropriateness of issuing a GPS: (1) clear and actionable; (2) rationale necessitating the message for healthcare practice; (3) practicality of systematically searching for evidence; (4) likely net positive consequences from implementing the GPS and (5) clear link to the indirect evidence. We assessed guideline quality using the Appraisal of Guidelines for Research and Evaluation II tool.

Results 253 guidelines from 44 professional societies issued 3726 actionable statements. We classified 2375 (64%) as GPS, of which 27 (1%) were labelled as GPS by guideline developers. 5 (19%) were labelled as GPS by their authors but did not meet GPS criteria. Of the 2375 GPS, 85% were clear and actionable, 59% provided a rationale necessitating the message for healthcare practice and 24% reported the net positive consequences from implementing the GPS. Systematic collection of evidence was deemed impractical for 13% of the GPS, and 39% explained the chain of indirect evidence supporting GPS development. 173/2375 (7.3%) statements explicitly satisfied all five criteria. The guidelines’ overall quality was poor regardless of the appropriateness of GPS development and labelling.

Conclusions Statements that qualify as GPS are common in COVID-19 guidelines but are characterised by unclear designation and development processes, and methodological weaknesses.

  • COVID-19
  • Evidence-Based Practice
  • Health Services Research

Data availability statement

All data relevant to the study are included in the article or uploaded as online supplemental information. All relevant data included.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.

Summary box

What is already known about this subject?

  • Good practice statements (GPS) (ie, actionable statements about interventions that would do substantially more good than harm or vice versa) do not qualify for rating the certainty of evidence, but are important statements in guidelines. The GRADE Working Group developed five criteria to assess the appropriateness of issuing a GPS.

What are the new findings?

  • Statements that qualify as GPS constitute more than half of the actionable statements in COVID-19 guidelines; there was rarely any appropriate labelling and a lack of transparency in the rationale for their development.

How might it impact clinical practice in the foreseeable future?

  • We provide a structured framework for GPS evaluation. Utilisation of this framework by researchers will help monitor the progress around GPS development and evaluate potential barriers slowing the uptake of available guidance by guideline developers.

Introduction

Several formal approaches have emerged to structure the process of developing health recommendations in guidelines.1 Within guidelines, there are a variety of actionable statements for application by clinicians, consumers and other stakeholders.2 These actionable statements can be further broken down into the categories of formal recommendations, informal recommendations and good practice statements (GPSs). Formal recommendations use the best available evidence and should be developed based on transparent and trustworthy methods.3–6 Such recommendations are the central aim of guideline development. Informal recommendations resemble formal recommendations but lack the use or reporting of rigorous guideline development methods. GPSs, sometimes referred to as best practice statements, form a separate category of actionable statements that are considered important to issue for healthcare practice.2 GPSs differ from formal and informal recommendations as they are not typically based on systematic reviews of the evidence and do not include a rating of the certainty of evidence using approaches such as Grading of Recommendations Assessment, Development and Evaluation (GRADE).7 8 The GRADE approach is the most widely used tool for guideline developers to assess the certainty in effect estimates and subsequently translate the evidence into recommendations using a standardised and transparent evidence to decision framework.7 9 10

Due to the lack of international consensus guidance for GPS development and reporting, they are commonly confused with other GRADEd recommendations. For example, GPSs are frequently reported as strong recommendations with low or very low-quality evidence.11–13 To resolve this confusion, GRADE proposed the following five criteria to assess the appropriateness of issuing a recommendation as a GPS and differentiate them from GRADEd recommendations8: (1) the statement is clear and actionable, (2) the message is necessary regarding healthcare practice, (3) implementation of the statement is likely to result in large net positive consequences, (4) summarisation of evidence would be a poor use of the guideline panel’s time and (5) the rationale connecting the indirect evidence used to support the statement is clear and explicit.

The prevalence and quality of GPS in guideline documents have not been empirically evaluated, particularly during the current COVID-19 pandemic where healthcare professionals, scientific societies and government agencies invested a substantial amount of time and resources in developing clinical practice guidelines to reduce information gaps and improve patient outcomes. Furthermore, the application of the GRADE criteria for GPS has not been operationalised as guidance either for those evaluating guidelines or for developers of GPS. During the development of the global living map of COVID-19 recommendations and portal for contextualisation (eCOVID-19RecMap)14 15 (https://COVID-19.recmap.org), we identified and evaluated GPS for their appropriateness for development to inform clinical practice.

Methods

Search

We systematically searched MEDLINE (PubMed) from 1 March 2020 to 24 September 2021 using the search string: ((practice guideline[PT]) OR (practice guidelines as topic*[MH])) NOT (comment[pt] or editorial[pt] or letter[pt] or interview[pt] or case reports[pt] or news[pt]), with no restrictions on the language of publication, as part of work to build the eCOVID-19RecMap.15 We searched the libraries of ECRI Clinical Guidelines, International Database of GRADE Guidelines (BIGG database), National Institute for Health and Care Excellence (NICE), the World Health Organization (WHO), Centers for Disease Control and Prevention (US CDC) and Guidelines International Network (GIN) using an automated web scraping approach via application programming interfaces (APIs). We also manually searched MedSci and China National Knowledge Infrastructure (CNKI) databases to identify Chinese guidelines.

Additionally, we manually searched websites of the following guideline organisations: Public Health Agency of Canada (PHAC), Scottish Intercollegiate Guidelines Network (SIGN), Canadian Task Force on Preventive Health Care (CTFPHC), European Centres for Disease Control and Prevention (ECDC). We also contacted guidelines developers of all the above organisations to keep us apprised of any new or updated guidelines.15
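The MEDLINE search described above can be expressed as a reproducible query. The following is a minimal sketch, assuming the NCBI E-utilities ESearch endpoint and its standard parameters (db, term, datetype, mindate, maxdate); the article does not state which interface was used to run the search, and the function name build_esearch_url is hypothetical. The sketch only constructs the request URL and does not contact the server.

```python
from urllib.parse import urlencode

# Search string as reported in the Methods, reproduced verbatim.
SEARCH_TERM = (
    "((practice guideline[PT]) OR (practice guidelines as topic*[MH])) "
    "NOT (comment[pt] or editorial[pt] or letter[pt] or interview[pt] "
    "or case reports[pt] or news[pt])"
)

# Assumption: NCBI E-utilities ESearch; not confirmed by the article.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term, start, end, retmax=100):
    """Return an ESearch URL limited to a publication-date window.

    start and end use the YYYY/MM/DD format expected by ESearch.
    """
    params = {
        "db": "pubmed",
        "term": term,
        "datetype": "pdat",  # filter on publication date
        "mindate": start,
        "maxdate": end,
        "retmax": retmax,    # number of record IDs returned per request
        "retmode": "json",
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


# Date window taken from the Methods: 1 March 2020 to 24 September 2021.
url = build_esearch_url(SEARCH_TERM, "2020/03/01", "2021/09/24")
print(url)
```

Keeping the search string and date window in one place makes it easier to rerun the same query when a living map is updated.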

Identifying COVID-19 guidelines

We included guidelines eligible for the eCOVID-19RecMap with the most recent guideline uploaded on 24 September 2021. These guidelines reported original or adapted recommendations and were consistent with the WHO definition of practice guidelines while addressing any topic regarding patients at risk for or infected with COVID-19.16 Online supplemental table S1 describes the definition in detail. We selected guidelines for the eCOVID-19RecMap based on a prioritisation process developed within the eCOVID-19RecMap executive research team (https://COVID-19.recmap.org/about). A topic is a priority if it satisfies one of the following in the COVID-19 context: (1) arises commonly in practice, (2) uncertainty in practice, (3) new evidence to consider, (4) existence of variations in practice, (5) important consequences for high resource use/cost, (6) not adequately addressed in existing guidelines.17 The priority list was refined weekly according to the state of the pandemic at that point in time.

We did not restrict guideline eligibility by population group, organisation, country, guideline quality or language. However, we only extracted and evaluated non-English guidelines that could be translated to English by members of our multinational team. For guidelines with more than one version, we evaluated the most recent update. Guideline eligibility was determined by two researchers independently, with consensus or arbitration for a final decision if needed.

Identifying actionable statements that qualify as GPS

We identified actionable statements from the included guidelines using the framework proposed by Lotfi et al.2 In brief, we qualified a statement as a GPS if it was actionable in isolation with an expected large net benefit, was not GRADEd for strength or certainty of evidence or accompanied by a citation for supporting evidence, and its alternative was judged to be illogical or not to conform with ethical norms.2 Additionally, researchers extracted statements in the guidelines labelled as best practice or GPSs. We used this approach to identify GPS because there is no universally accepted approach for presenting GPS in guidelines and they are often inconsistently labelled.13 18 Two researchers extracted the statements and experts in guideline development reviewed them as a quality control step. In addition, we extracted the source, topic (eg, infection prevention and control, vaccination), intended user and applicable context of each guideline.

Evaluating GPS

We compared the appropriateness of issuing the GPS labelled by guideline developers with statements that qualified as GPS using the five GRADE criteria in table 1.8 We piloted a form using answer options of ‘yes’, ‘probably yes’, ‘probably no’ and ‘no’ and developed instructions for how to use the form (online supplemental figure S1). Trained methodologists held weekly meetings to optimise these judgements by discussing examples from guidelines. We used the following approach for the judgements: researchers selected ‘yes’ and ‘no’ answers when information supporting or opposing the qualification of the statement as GPS, respectively, was explicit in the guideline (any primary document or supplements). We selected ‘probably yes’ and ‘probably no’ when the information supporting or opposing the qualification of the statement as GPS, respectively, was implicit. For the statement to fulfil the GPS criteria, all the criteria ii–v must be answered ‘probably yes’ or ‘yes’. We did not include criterion i as part of the assessment for appropriateness of issuing the statement as GPS since it is a requirement for any recommendation.8 Online supplemental table S3 presents examples of GPS. We then iteratively developed the explanations and signalling questions in table 2 and reordered the original GRADE criteria for the purpose of critical appraisal of GPS. We conducted all the evaluations in duplicate, and an expert in guideline development validated them. We resolved disagreements by consensus in weekly group discussions.
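The decision rule described above can be written down compactly. The following is an illustrative sketch only, not the form or software used in the study: the function names, the dictionary layout and the example judgements are hypothetical. It encodes the stated rule that criteria ii–v must each be answered ‘yes’ or ‘probably yes’, with ‘yes’ reserved for explicitly reported information.

```python
# The four answer options used on the piloted form.
ANSWERS = {"yes", "probably yes", "probably no", "no"}
FAVOURABLE = {"yes", "probably yes"}

# Criterion i (clear and actionable) is excluded from the rule because it
# is a requirement for any recommendation.
REQUIRED = ("ii", "iii", "iv", "v")


def fulfils_gps_criteria(judgements):
    """True if criteria ii-v are all judged 'yes' or 'probably yes'."""
    for criterion in REQUIRED:
        if judgements[criterion] not in ANSWERS:
            raise ValueError(f"unknown answer for criterion {criterion}")
    return all(judgements[c] in FAVOURABLE for c in REQUIRED)


def explicitly_supported(judgements):
    """True if criteria ii-v are all judged 'yes' (explicit information)."""
    return all(judgements[c] == "yes" for c in REQUIRED)


# Hypothetical judgements for a single statement.
example = {
    "i": "yes",
    "ii": "yes",
    "iii": "probably yes",
    "iv": "probably yes",
    "v": "yes",
}

print(fulfils_gps_criteria(example))   # criteria met, partly implicitly
print(explicitly_supported(example))   # not every rationale is explicit
```

Separating the two checks mirrors the distinction drawn in the Results between statements supported by implicit or explicit rationales and those supported explicitly throughout.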

Figure 1

PRISMA chart for guidelines eligible for the eCOVID-19RecMap. BIGG, International Database of GRADE Guidelines; CCITC, Changes of Care in Times of COVID-19; CDC, Centers for Disease Control and Prevention; ECDC, European Centres for Disease Control and Prevention; GIN, Guidelines International Network; NICE, National Institute for Health and Care Excellence; PHAC, Public Health Agency of Canada; SIGN, Scottish Intercollegiate Guidelines Network; PRISMA, Preferred Reporting Items for Systematic Reviews and Meta-Analyses.

Table 1

GRADE criteria for evaluating GPS modified from reference8*

Table 2

Improving the good practice statement evaluation framework

Guidelines quality appraisal

To evaluate if the guidelines were developed with rigorous methods, we critically appraised their development process using the Appraisal of Guidelines for Research and Evaluation (AGREE) II tool for three out of six domains that were deemed important for guideline credibility: scope and purpose, rigour of development and editorial independence.19 The other AGREE domains (stakeholder involvement, clarity of presentation and applicability) were not included in the evaluation as they are not as critical for determining the overall quality of the guideline. Two researchers independently conducted the evaluations of the guidelines and a guideline development expert subsequently reviewed them. Each item was scored on a seven-point scale; a domain scored 0% if both reviewers gave every item a 1 (minimum value) and 100% if both gave every item a 7 (maximum value). We identified a discrepancy when the reviewers’ scores for an item differed by 3 points or more. We resolved these discrepancies by consensus or a third reviewer. The final score per item was calculated as the average of the reviewers’ scores after resolution of any discrepancies. We extracted the information from the guidelines into the GRADEpro (www.gradepro.org) app through a new module that allows the creation of GPS. We then included the GPS in the RecMap (https://covid19.recmap.org/recommendations?recommendationFormality=gps).
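The scoring described above follows the scaled domain score of the AGREE II manual, in which the summed item scores are rescaled between the minimum and maximum possible totals. The following is a minimal sketch of that calculation and of the discrepancy check; the function names and the example ratings are hypothetical and do not come from the study data.

```python
def scaled_domain_score(ratings):
    """AGREE II scaled domain score as a percentage.

    ratings holds one list of item scores (1 to 7) per reviewer, all for
    the same domain, for example [[6, 7, 5], [6, 6, 6]].
    """
    n_reviewers = len(ratings)
    n_items = len(ratings[0])
    obtained = sum(sum(reviewer) for reviewer in ratings)
    minimum = 1 * n_items * n_reviewers  # every item scored 1
    maximum = 7 * n_items * n_reviewers  # every item scored 7
    return 100 * (obtained - minimum) / (maximum - minimum)


def discrepant_items(reviewer_a, reviewer_b, threshold=3):
    """Indices of items whose two scores differ by threshold or more."""
    return [
        index
        for index, (a, b) in enumerate(zip(reviewer_a, reviewer_b))
        if abs(a - b) >= threshold
    ]


# Hypothetical ratings by two reviewers for a three-item domain.
reviewer_a = [6, 7, 5]
reviewer_b = [6, 6, 6]

score = scaled_domain_score([reviewer_a, reviewer_b])
print(round(score, 1))

# Items flagged for consensus or third-reviewer resolution.
print(discrepant_items([6, 2, 5], [6, 5, 4]))
```

Because the formula sums over reviewers, it gives the same result as first averaging the two reviewers’ scores per item and then rescaling.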

Patient and public involvement statement

We partnered with public representatives from the Cochrane Consumer network in the development and conduct of the eCOVID-19RecMap project. The representatives participated in weekly calls of the project executive team, where this project was reviewed for relevance of content, and provided contextual feedback. The representatives were not involved in the extraction and evaluation of the GPS. The larger eCOVID-19RecMap investigator team also reviewed the design and conduct of the project and provided feedback accordingly.

Statistical analysis

Characteristics of the included guidelines and judgements for each of the GPS evaluation criteria were summarised as percentages. Univariate ORs were used to examine the association of guideline and statement characteristics with issuing of GPS. AGREE II scores were calculated according to the AGREE II manual and reported using the median and IQR. All analyses were conducted and all figures produced with R V.4.1.1 software. GPS evaluation and AGREE II scores were stratified by labelling of GPS by guideline developers.
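For readers less familiar with univariate ORs, the calculation from a two-by-two table can be sketched as follows. This is an illustration rather than the authors’ R code: the Wald (log-scale) 95% CI is an assumption, since the article does not report which interval method was used, and the counts in the example are invented.

```python
import math


def odds_ratio(a, b, c, d, z=1.96):
    """Univariate OR with a Wald confidence interval.

    a: statements with the characteristic that qualified as GPS
    b: statements with the characteristic that did not
    c: statements without the characteristic that qualified as GPS
    d: statements without the characteristic that did not
    All four cells must be non-zero.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("OR is undefined when a cell is zero")
    estimate = (a * d) / (b * c)
    # Standard error of log(OR) for the Wald interval (assumed method).
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se)
    upper = math.exp(math.log(estimate) + z * se)
    return estimate, lower, upper


# Invented counts, for illustration only.
estimate, lower, upper = odds_ratio(20, 10, 10, 20)
print(f"OR {estimate:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

The zero-cell check reflects a practical limit of this estimate: when a subgroup contains only one type of statement, the OR cannot be computed.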

Results

Characteristics of eligible guidelines

We identified 4533 records through PubMed, MedSci, handsearching and 11 guideline databases and websites. We excluded 1401 (31%) guidelines after deduplication and title screening, and a further 700 (25%) after screening at full text. Of the identified COVID-19 guidelines, 412 were related to care in the context of COVID-19 and 1746 pertained directly to COVID-19. The guidelines pertaining directly to COVID-19 were eligible for publishing on the eCOVID-19RecMap. Of those guidelines, 253 were extracted and evaluated between the formal launch in November 2020 and September 2021 (figure 1). We identified 2375 of 3726 (64%) statements that qualified as GPS in 200 of 253 (79%) guidelines included on the eCOVID-19RecMap (online supplemental table S2). Those 200 guidelines were included in our analysis. On average, 82% of the statements per guideline (range from 2% to 100%) qualified as GPS.

Characteristics of GPS

Table 3 shows that 64% of the guidelines were published by WHO and 13% by the CDC. One hundred and sixty (80%) guidelines were in the field of public health and 50% were produced for global use. Forty per cent of the GPS provided guidance on infection control while the remaining were on a variety of topics including vaccination, planning and monitoring health services, screening, diagnosis and treatment. The GPS targeted a range of users: 38% were nominally intended for healthcare providers and professionals and 36% targeted public health officials. The remaining GPSs were intended to be used by individuals outside the healthcare setting, patients, caregivers and the public. One guideline was translated from French to English while the remaining guidelines were published in English.

Table 3

Characteristics of included guidelines and good practice statements

Issuing GPS according to guideline characteristics and statement topic

Figure 2 presents the associations of issuing GPS with the guideline organisation, field, region and recommendation topic. Guidelines published in the field of clinical practice were less likely to publish statements that qualify as GPS as compared with formal/informal recommendations, while guidelines in health systems and public health were more likely. Guidelines published by WHO, CDC, PHAC, ECDC and SIGN were more likely to issue statements as GPS with varying strengths of association. GPS were more frequently issued in guidelines published for European-Central Asian use (OR 2.01, 95% CI 1.54 to 2.62). In contrast, guidelines published for global and North American use were less likely to issue statements as GPS. Issuing GPS was more common in statements regarding infection control (OR 1.63, 95% CI 1.37 to 1.93), planning and monitoring (OR 1.32, 95% CI 1.03 to 1.71) and health services and systems (OR 3.05, 95% CI 2.30 to 4.05). Statements considering diagnosis (OR 0.40, 95% CI 0.27 to 0.61), treatment and rehabilitation (OR 0.16, 95% CI 0.12 to 0.20) and screening (OR 0.32, 95% CI 0.22 to 0.47) were less likely to be issued as GPS. Statements concerning vaccination were also associated with being issued as GPS (OR 1.24, 95% CI 1.00 to 1.53).

Figure 2

Association of guideline and statement characteristics with issuing statements that qualify as good practice statements. Reference was issuing actionable statements other than good practice statements. Dashed line corresponds to univariate OR of 1.00. We were not able to evaluate associations with issuing good practice statements for the guideline regions South Asia and East Asian Pacific or for the NICE guideline organisation due to absence of other types of statements. CDC, Centers for Disease Control and Prevention; ECDC, European Centres for Disease Control and Prevention; GPS, good practice statement; NICE, National Institute for Health and Care Excellence; PHAC, Public Health Agency of Canada; SIGN, Scottish Intercollegiate Guidelines Network.

Evaluation of development process of the GPS

Only 27/2375 (1%) of the identified statements that qualified as GPS were actually labelled as GPS by the guideline developers. Of those, 23/27 (85%) statements satisfied all the GPS criteria (ii–v) with implicit and explicit rationales for development. ‘Clear and actionable’ was judged as ‘yes’ in 89%, ‘probably yes’ in 2% and ‘probably no’ in 3.7% (figure 3). For the criterion ‘necessity of the message for healthcare’, 63% of the GPS were judged as ‘yes’ and 37% were judged as ‘probably yes’. Eleven per cent of those GPS were judged as ‘yes’ for the criterion relating to net positive consequences from implementing the statement, while 82% were judged as ‘probably yes’. For the criterion relating to usefulness of collection and summarisation of evidence, 4% of the GPS were judged as ‘yes’, 82% as ‘probably yes’ and 15% as ‘probably no’. Fifty-six per cent provided an explicit statement explaining the chain of indirect evidence supporting the development of the GPS and were judged as ‘yes’ for this criterion. The judgement ‘probably yes’ was assigned to 56% of the GPS for this criterion.

Figure 3

Distribution of judgements for good practice statement (GPS) criteria. Annotations correspond to percentage of statements with their respective judgement. GDG, guideline development group.

The reporting of implicit or explicit rationales supporting the development of statements that qualified as GPS (n=2348) was generally similar to that for statements labelled as GPS by guideline developers. Of those, 2205/2348 (94%) statements satisfied all the GPS criteria (ii–v) with implicit and explicit rationales. Notable differences in the proportion of statements supported with an explicit rationale were found for the criteria ‘statement leads to large net positive consequence’ and ‘summarising evidence is a poor use of a guideline development group’s time’, with more frequent reporting for statements reported as GPS. In contrast, explicit rationales explaining the chain of indirect evidence supporting the development of the GPS were more common for statements not reported as GPS, compared with statements reported as GPS (56% vs 39%, respectively).

Quality of guidelines reporting GPS

The AGREE II evaluation of the six guidelines reporting statements labelled as GPS based on the three domains of interest showed that the overall quality of these guidelines was limited; none of the guidelines scored over 60% for all three domains. Figure 4 shows that the six guidelines with labelled GPS scored a median of 81% (IQR 64–85) in the domain ‘Scope and purpose’, but only 9.4% (IQR 8.3–27) for the domain ‘methodological rigour’ and 0% (IQR 0–0) for the domain ‘editorial independence’. The 194 guidelines reporting statements that qualified as GPS scored similarly. Two of those guidelines scored over 60% for all three domains.

Figure 4

AGREE II assessment (three domains) of guidelines stratified by labelling of good practice statements by guideline developers. Guidelines containing statements labelled by guideline developers as GPS (n=6) and guidelines containing statements that qualify as GPS (n=194). The thickness of the plot represents the kernel density estimation to show the distribution shape of the data. The three lines represent the median and lower (25%) and upper (75%) quartiles based on density estimates. Wider sections of the plot represent a higher probability that guidelines will take on the given value; the slimmer sections represent a lower probability. AGREE, Appraisal of Guidelines for Research and Evaluation; GDG, guideline development group; GPS, good practice statement.

Discussion

Our evaluation of COVID-19 recommendations using a novel classification that anatomises guidelines into actionable statements2 shows that guideline developers include advice that frequently qualifies as GPS (64% of our eligible statements, of which 94% satisfied all the GPS criteria ii–v with implicit and explicit rationales), although developers rarely label them as GPS. Accordingly, the evaluation of GPS development processes proved challenging. Statements were more likely to be issued as GPS in European-Central Asia guidelines in the field of public health, specifically statements concerning infection control, planning and monitoring and health systems. We found only a few GPS that were supported by rationales for their development regardless of how the guideline developers labelled them. Overall, the quality of most guidelines including formal and informal recommendations was poor and, similar to GPS, the recommendations were often not supported by rationales for their development. In particular, the reported editorial independence of the guidelines was very low, which could call their trustworthiness into question. Guidelines to overcome the COVID-19 pandemic would serve healthcare professionals and services better if included GPS were clearly identified and developed through an explicit process. If GPSs are not transparently reported by developers, they are likely to be misinterpreted. Thus, in the accompanying article20, we provide operationalised and structured implementation of GRADE guidance for the development of GPS. Our findings suggest that significant changes are needed in the way guideline developers conduct GPS development. The high prevalence of GPS may be explained by the uncertainty and rapid spread of COVID-19, leading to a lack of direct evidence and immediate need for guidance, reducing the rigour of the guideline development process.

Our evaluation shows that the most poorly described criteria were the net consequences of implementing the statement and the usefulness of summarising and collecting the evidence. For the former, many rationales are presumed to be ‘straightforward’ and based on general knowledge, hence guideline developers may have been reluctant to document this rationale for each statement. For example, in statements regarding infection control (approximately 50% of the statements), the interventions aim to prevent transmission. Although net consequences are not often stated, it is implicitly clear that new cases (and deaths) might be prevented. However, for the latter criterion, the judgement rests on the belief of a guideline panel that they have high confidence in the indirect evidence. Formal documentation is needed to confirm that issuing these statements is justified.

Strengths and limitations

The strengths of this study include the first systematic evaluation of a large sample of COVID-19 GPS irrespective of language, topic, publication source or date of development. We used criteria previously proposed by the GRADE Working Group for GPS but created explanations and signalling questions in addition to response options, which allowed us to differentiate between statements explicitly or implicitly supported by a proper rationale (table 2). All judgements were conducted in duplicate and reviewed by an expert in guideline development after developing guidance for this approach.

Our work has several limitations. First, we did not assess if statements GRADEd as low or very low certainty were GPS rather than formal recommendations. It has been shown that GPS are often incorrectly GRADEd12 18; therefore, despite their abundance in COVID-19 guidelines, the actual proportion of GPS may be even higher. Second, despite the use of the most recent version of each guideline, this evaluation is limited by its cross-sectional nature. Temporal changes in the quality of GPS can be assessed in the future as more updated versions of guidelines and recommendations become available. Third, this is the first time this approach to identifying GPS has been used and, despite face validity using established criteria8 and the rigorous methods applied (eg, duplicate judgements by extensively trained raters and validated by experts in guideline development), further validation is required. Fourth, our assessment depended on the completeness of reporting in the guidelines and not necessarily the guideline conduct or methods. Fifth, we acknowledge that each judgement is informed by the expertise and knowledge of the evaluator, which may have varied. To increase confidence, all judgements were completed by two trained reviewers and verified by an expert in guideline development to validate the decisions methodologically. Our multidisciplinary team also includes content experts with varied clinical knowledge who were engaged when needed.

Comparison with other work

Previous work reported that GPS are commonly issued in non-COVID-19 guidelines.12 18 A retrospective evaluation of discordant recommendations (low or very low confidence in the estimate of effect) in WHO guidelines identified 29 (18%) as GPS. Similarly, a study produced by the Endocrine Society found 43 (35.6%) of discordant statements were GPS, further indicating that GPS are prone to misjudgement.12 18 Our findings show that GPS are prevalent in guidelines and may be even more commonly used during public health emergencies. The COVID-19 crisis may have impacted developers’ ability and capacity to produce more rigorous guidance, forcing them to balance methodological rigour with speed.

Implications for guideline users and developers

First, our study shows that guideline developers should explicitly report the use of GPS in the guideline development process. For statements that are not explicitly labelled, prior work proposed two approaches that use signalling questions to judge whether developing a GPS is justified.8 The first involves identifying that the alternative of the statement is absurd or does not conform with ethical norms. Because the phrasing of the statement may cause confusion when identifying the alternative, this approach may be unreliable for identifying GPS. The second method involves acknowledging that the collection of high-certainty indirect evidence to review and support the statement would be a time-consuming process (criterion iv: summarisation of evidence would be poor use of guideline panel’s time). The latter method requires more expertise and familiarity with the field of the statement. In turn, users can assess if GPS were appropriately developed using our methodology.

Second, most of the guidelines were produced for global use, but guidelines developed in regions other than high-income countries (North America and Europe) were scarce. Thus, implementing the GPS in other settings, especially in low-income settings, may not be feasible. For example, GPS recommending increased surveillance for farm workers and their close contacts, or maintaining indoor humidity between 30% and 50%, are heavily dependent on resources and influenced by organisational aspects.

Third, adherence to our updated guidance for the operationalisation and implementation of GPS development20 may improve transparency in the development and reporting of GPS, help direct guideline developers’ resources and efforts to what is needed, and avoid the inappropriate issuing of GPS. For example, the European Commission Initiative of Breast Cancer Guidelines on Breast Cancer Screening and Diagnosis21 reported their GPS in a supplementary document and provided detailed descriptions of the rationales supporting them.

Implications for research

We evaluated the GPS primarily through information provided in the guideline and the judgement of the evaluators. Our evaluation of COVID-19 GPS using the previously published five criteria for GPS showed us that improvements to the GPS framework are required to ensure reproducible and valid future evaluations of GPS. Our suggested framework for evaluating GPS builds on the judgements with response options that we applied in our evaluation. We also provide a specific order, explanations and signalling questions for using the criteria for GPS evaluation (table 2). For example, the assessment of whether the statement is actionable and clear was placed at the end of the evaluation because it is relevant for all actionable statements rather than specific to GPS, and does not affect the appropriateness of the rationale for developing a GPS. We found that using the criterion ‘summarising evidence would be poor use of guideline panel’s time’ as the first criterion in the evaluation helps to differentiate GPS from other types of actionable statements, although it is sometimes a difficult judgement to make. Further testing of this framework by other research teams is required, along with specific GRADE guidance for the development and evaluation of GPS.

Conclusions

The large number of GPS in COVID-19 guidelines emphasises their importance in guidelines especially during public health emergencies, when there is a need for urgent guidance and there is a lack of direct evidence to inform decision making. Our evaluation shows that improvements are needed in the presentation, transparent reporting and the rationale for GPS development beyond the existing GRADE guidance. Furthermore, we need studies to monitor the progress around GPS development and evaluate potential barriers slowing the uptake of available guidance by guideline developers.

Data availability statement

All data relevant to the study are included in the article or uploaded as online supplemental information.

Ethics statements

Patient consent for publication

Acknowledgments

We would like to acknowledge the research collaborators that were involved in the screening of guidelines and the extraction and evaluation of the GPS included in the eCOVID-19 recommendations map.

References

Supplementary materials

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.


Footnotes

  • Twitter @okdewidar

  • Correction notice This article has been corrected since it first published. ORCID has been added for Miloslav Klugar.

  • Contributors OD, TL, ML, ZSP, EP and HJS contributed to the study conception and design. KS designed and ran the literature searches. OD, JK, ZN and MM screened literature and conducted the data extraction and evaluation. Authors provided feedback on the conceptual approach used in this study. All authors provided critical review, interpretation and approval of the final manuscript. Other members of the eCOVID-19 recommendations map contributed to the screening, data extraction and data validation but did not meet authorship criteria. The corresponding author attests that all listed authors meet authorship criteria and that no others meeting the criteria have been omitted. HJS acts as guarantor and accepts full responsibility for the work and/or the conduct of the study, had access to the data, and controlled the decision to publish.

  • Funding CIHR (FRN VR4-172741 & GA3-177732) for COVID-19 recommendation mapping. AFT is the Chairholder of the Canada Research Chair in Critical Care Neurology and Trauma.

  • Competing interests HJS, AS and VAW report grants from the Canadian Institutes of Health Research during the conduct of the study (FRN VR4-172741 & GA3-177732). RAM reports grants from WHO, grants from ASH, grants from ACR, other from Boehringer Ingelheim International, grants from NIDDK, outside the submitted work. EA, PA-C and HJS report contribution to the development of the original five criteria for assessing the appropriateness of issuing good practice statements. The remaining authors have nothing else to declare.

  • Patient and public involvement Patients and/or the public were involved in the design, or conduct, or reporting, or dissemination plans of this research. Refer to the Methods section for further details.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.
