Living systematic reviews (LSRs) are systematic reviews that are continually updated, incorporating relevant new evidence as it becomes available. LSRs are critical for decision-making in topics where the evidence continues to evolve. It is not feasible to continue to update LSRs indefinitely; however, guidance on when to retire LSRs from the living mode is not clear. We propose triggers for making such a decision. The first trigger is to retire LSRs when the evidence becomes conclusive for the outcomes that are required for decision-making. Conclusiveness of evidence is best determined based on the GRADE certainty of evidence construct, which is more comprehensive than solely relying on statistical considerations. The second trigger to retire LSRs is when the question becomes less pertinent for decision-making as determined by relevant stakeholders, including people affected by the problem, healthcare professionals, policymakers and researchers. LSRs can also be retired from the living mode when new studies are not anticipated to be published on the topic and when resources become unavailable to continue updating. We describe examples of retired LSRs and apply the proposed approach to one LSR about adjuvant tyrosine kinase inhibitors in high-risk renal cell carcinoma, which we retired from the living mode with the publication of its last update.
- evidence-based practice
- systematic reviews as topic
Data availability statement
Data sharing not applicable as no datasets generated and/or analysed for this study. No data are available.
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
What is already known on this topic?
Living systematic reviews are critical for decision-making, but criteria on when their living mode can be retired are not clear.
What this study adds
The first trigger of termination is when the evidence becomes conclusive, which can be based on moderate or high GRADE certainty of evidence.
The second trigger depends on whether the evidence derived from the living systematic review continues to be of utmost relevance to the decision-making of key stakeholders, including people affected by the problem, healthcare professionals, policymakers and researchers.
Additional triggers are when new studies are not anticipated to be published, and when resources for continuing the living mode become unavailable.
How this study might affect research, practice or policy
Systematic reviewers can use these triggers to determine when to retire a living systematic review.
Terminating a living systematic review can shift resources and focus to other topics with less certain evidence or higher priority.
Methodological research is needed to study the impact and validity of these triggers.
Evidence-based healthcare decisions must be based on current information that is systematically collected and synthesised. However, systematic reviews become outdated quickly. One study estimated that in 7% of cases, a signal suggesting the need to update a review could be observed as early as the time of publication.1 Thus, the rationale for living systematic reviews (LSRs) is intuitive and straightforward. LSRs are systematic reviews that are continually updated to incorporate relevant new evidence as it becomes available.2 The decision to initiate an LSR depends on three criteria: the topic is important for decision-making, new studies are being published and the certainty in evidence is low.2 During the COVID-19 pandemic, the need for LSRs has become more apparent, and a few LSRs have been published in high-impact journals.3–10 However, maintaining LSRs requires considerable effort, commitment and cost. LSRs cannot continue forever, and criteria or triggers for retiring LSRs from their living status are needed.
Such triggers for retirement from the living mode can be derived from relevant literature about when to start an LSR and from literature about updating non-LSRs.2 4 11–14 The Cochrane Collaboration suggests that an LSR may transition out of a living mode when the criteria to initiate an LSR are no longer met.4 A survey of 545 closed or stable non-living Cochrane reviews demonstrated that the common rationales authors provided for deeming a topic stable were that no new studies were being published on the topic, that the evidence was conclusive or that the intervention was no longer in general use.11 The justification was deemed insufficient in 49% of the reviews,11 and reanalysis of a sample of 40 reviews demonstrated that there was no clear algorithm or consistent approach for making such a decision.12 In this proposal, we aim to identify potential situations in which LSRs can be retired from the living mode and illustrate these situations through a published example of an LSR that has published its last update and was retired from the living mode.9
Triggers for retiring LSRs from the living mode
We propose four triggers, of which one or more can provide a sufficient rationale for retiring LSRs.
When the evidence becomes conclusive
Determination based on certainty of the evidence
LSRs can be retired from the living mode when the cumulative evidence is judged to be of high certainty for the outcomes that are required for making a decision about the intervention, both benefits and harms. Indeed, high certainty evidence is defined as ‘further research is unlikely to change our confidence in the estimate of effect’.15 The certainty of evidence is a construct that encompasses the domain of imprecision, which is often considered on its own, in statistical terms, as a sign of conclusiveness of evidence (discussed next). However, the construct of certainty of evidence is more comprehensive because it considers additional domains such as the risk of bias, indirectness, inconsistency and the plausibility of publication bias.15 For example, the statistical evidence may be robust, but if it was derived from studies at high risk of bias, the certainty in the evidence would be low and the living mode would still be required. GRADE requires making semiquantitative judgements about the five certainty domains to reach a final certainty rating for each outcome.15 To retire an LSR from the living mode, we need to be certain about most or all outcomes of relevance to patients. It is possible that the living mode may be discontinued for some, but not all, outcomes. Finally, it is plausible that moderate certainty evidence may be sufficient in some cases to decide that the effort of keeping a review in the living mode is not warranted. Such a review can instead be updated when certain triggers or conditions are met,14 but it would not be a true LSR.
Determination based on statistical methods
One potential trigger for discontinuing updating an LSR is when the statistical evidence supporting the observed treatment effect is considered to be robust or conclusive. The advantage of this trigger is that it can be automated in electronic platforms5 7 8 that support LSRs by facilitating literature search, data synthesis and presentation of LSR findings. The disadvantage of this trigger is that it does not address other domains of certainty, such as risk of bias or indirectness, which can be critical for decision-making.
Making a judgement about statistical conclusiveness based on conventional statistical significance (eg, p value) is inadequate because statistical significance can be fragile, that is, it may change based on the outcome of a handful of events.16 17 The risk of type 1 and 2 errors (false-positive and false-negative conclusions, respectively) increases every time a trial is added to a meta-analysis, and the conventional CI that does not account for error inflation may be too narrow. Hence, the need to adjust for type 1 and 2 errors in meta-analysis has been long recognised.18 Simmonds et al present four possible methods that can be applied in a meta-analysis.19 The most commonly cited method is trial sequential analysis (TSA).20 One of the reasons for the popularity of TSA over the other methods is that it was implemented in open-source software.21 This approach accounts for type 1 error by ensuring that the cumulative type 1 error rate (commonly 5%) remains at the predetermined level every time the analysis is updated. The approach also accounts for type 2 error by performing a sample size calculation of the meta-analysis as if the meta-analysis was a clinical trial (using a plausible effect size, a specific type 1 error and the desired level of statistical power). This sample size calculation is sensitive to the chosen effect size and requires stakeholder engagement and consensus. In the TSA method, the Z score (standardised pooled treatment effect) is calculated every time a trial is added to the meta-analysis. If it crosses the alpha spending boundary, then the treatment effect is considered conclusive. In addition, if the Z score ends up in the futility region, it would also suggest that adding new studies is unlikely to lead to a statistically significant effect.19 20
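The TSA steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used by the TSA software: it assumes a fixed-effect inverse-variance model for the pooled Z score and a Lan-DeMets O'Brien-Fleming-type alpha-spending function, neither of which is specified in the text, and all function names are hypothetical. It reports how much of the type 1 error has been spent at a given information fraction; converting spent alpha into the actual monitoring boundary requires recursive numerical integration, which is not shown.

```python
import math
from statistics import NormalDist


def cumulative_z(effects, variances):
    """Pooled Z score after each trial is added (fixed-effect, inverse-variance).

    effects: per-trial effect estimates on an additive scale (eg, log risk ratio)
    variances: per-trial variances of those estimates
    """
    z_scores = []
    sum_w = 0.0
    sum_wy = 0.0
    for y, v in zip(effects, variances):
        w = 1.0 / v                      # inverse-variance weight
        sum_w += w
        sum_wy += w * y
        pooled = sum_wy / sum_w          # pooled effect so far
        se = math.sqrt(1.0 / sum_w)      # its standard error
        z_scores.append(pooled / se)
    return z_scores


def information_fraction(accrued, required):
    """Share of the required information size accrued so far, capped at 1."""
    return min(accrued / required, 1.0)


def alpha_spent_obf(info_fraction, alpha=0.05):
    """Cumulative two-sided alpha spent at a given information fraction.

    Assumption: Lan-DeMets O'Brien-Fleming-type spending function,
    alpha(t) = 2 * (1 - Phi(z_{1 - alpha/2} / sqrt(t))).
    """
    nd = NormalDist()
    t = min(max(info_fraction, 1e-12), 1.0)
    z = nd.inv_cdf(1 - alpha / 2)
    return 2 * (1 - nd.cdf(z / math.sqrt(t)))
```

With this spending function, very little alpha is available early on, so an early Z score must be far more extreme than 1.96 to be considered conclusive; the full 5% is only available once the required information size has been reached.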
The available statistical methods to determine conclusiveness of evidence are based on the null effect. In modern decision-making, we rate certainty in whether an effect estimate is on one side of a threshold or in a particular range.22 23 Thus, future novel statistical methods that incorporate specific thresholds are needed.
When the topic becomes less relevant to stakeholders
It is plausible that a management option or a diagnostic test becomes obsolete, such as a device or a technology that becomes outdated or a drug that is no longer on the market. For example, systematic reviews evaluating non-real-time continuous glucose monitoring systems are less relevant to current clinical practice that emphasises real-time monitoring.24 In the current pandemic, LSRs evaluating hydroxychloroquine can be terminated since the use of this drug, which was a hot topic in 2020, is not of interest to physicians or the public anymore. Lack of interest in hydroxychloroquine may relate to studies that suggested lack of efficacy, increased harms, other treatments showing a clear net benefit, or may relate to other political and social factors25 26 and may also reflect our certainty in its effect (the previous trigger based on certainty).
Policy relevance also affects decisions about updating or retiring LSRs from the living mode. Authors of 2 Cochrane LSRs have reported that decisions made by the US Food and Drug Administration and the WHO have impacted their decisions to update LSRs about convalescent plasma and COVID-19 mitigation in the aviation sector, respectively.27 LSRs about the pandemic that synthesised observational studies or evaluated surrogate outcomes can be terminated (or modified) as randomised trials using patient-important outcomes have become available.27 It is important to recognise that a topic may become less relevant to stakeholders due to perceived clear benefit or perceived lack of benefit, although the evidence may not be conclusive. Hence, terminating an LSR based on this criterion should also consider the first criterion, conclusiveness of evidence.
When new studies are not expected to be published
LSRs may not be needed when research that might impact the conclusions of the review is no longer emerging.4 Such a criterion is difficult to ascertain or conclude with confidence but can be inferred from expert opinion, from recent LSR updates that identify no new or ongoing studies and from a careful review of relevant trial registers. The living mode may also not be relevant when ongoing studies exist but are expected to be completed and published in a relatively long time frame (eg, in 5 years). In that case, the living mode may be paused temporarily, as opposed to being completely terminated.
When required resources become unavailable
Lack of resources represents a pragmatic rationale for retiring LSRs. LSRs require substantial human and financial resources. This includes additional effort from librarians, reviewers and statistical analysts. In addition, LSRs, similar to living guidelines,28 require ongoing conflict of interest management, quality control and stakeholder engagement. The funding needed to support all these efforts may expire or shift towards other topics considered more urgent. Researchers are another important stakeholder group and may become less interested or motivated to maintain such effort or pursue a particular topic. Our LSR on ventilation modes for people with COVID-19 for WHO was retired because there were no ongoing resources (both external and internal funding) to support the large team that was required to continue in the living mode.6 29 30 Our group decided to divert efforts to other important research questions because of two triggers: lack of resources and not expecting new studies to be published on the topic in the near future.
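The four triggers described in this section can be summarised as a simple decision rule. The Python sketch below is only one possible reading of the proposal, not a validated algorithm: the field names, the returned labels and the rule that a review is paused rather than retired when the only trigger is a long wait for ongoing studies are all assumptions made for illustration. Each input remains a judgement made by reviewers and stakeholders, not a computed value.

```python
from dataclasses import dataclass


@dataclass
class LsrStatus:
    # All field names are hypothetical and stand for human judgements.
    evidence_conclusive: bool        # eg, high GRADE certainty for key outcomes
    relevant_to_stakeholders: bool   # question still pertinent for decisions
    new_studies_expected: bool       # relevant studies expected in the near term
    ongoing_studies_long_term: bool  # studies exist but are years from reporting
    resources_available: bool        # funding and team still in place


def retirement_triggers(status):
    """List the proposed triggers that currently apply."""
    triggers = []
    if status.evidence_conclusive:
        triggers.append("evidence conclusive")
    if not status.relevant_to_stakeholders:
        triggers.append("less relevant to stakeholders")
    if not status.new_studies_expected:
        triggers.append("no new studies expected")
    if not status.resources_available:
        triggers.append("resources unavailable")
    return triggers


def suggest_action(status):
    """Return 'continue', 'pause' or 'retire' (labels are illustrative)."""
    triggers = retirement_triggers(status)
    if not triggers:
        return "continue"
    # Assumption: pause instead of retire when the only trigger is that
    # ongoing studies will not report for a long time.
    if triggers == ["no new studies expected"] and status.ongoing_studies_long_term:
        return "pause"
    return "retire"
```

The sketch does not encode the caveat that declining relevance should be weighed together with conclusiveness of the evidence; in practice, the list of applicable triggers is a starting point for discussion and for reporting the rationale in the final update.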
An illustrative example of a terminated LSR
One LSR investigated the effect of adjuvant tyrosine kinase inhibitors (TKIs) on the risk of cancer recurrence and progression to metastases in high-risk renal cell carcinoma.9 The review started in 2018 and was retired from a living mode in 2021. Meta-analysis showed that adjuvant TKIs as compared with observation offered no benefit in overall survival or disease-free survival and significantly increased adverse events. The termination was based on two triggers, conclusiveness of evidence based on certainty of the evidence and on statistical considerations, and relevance to stakeholders (box 1).
Example of making a decision to retire a living systematic review about adjuvant tyrosine kinase inhibitors from the living mode
1. Evidence was judged to be conclusive.
a. Certainty of evidence.
The certainty in evidence using the GRADE approach was judged to be high for the most important outcomes of overall survival, disease-free survival and all-cause grade 3 or above adverse effects. Data were derived from five randomised controlled trials that were judged to be at low risk of bias. There were no concerns about inconsistency, indirectness or evidence of publication bias.
b. Statistical considerations:
Trial sequential analysis (TSA) suggested that the evidence was statistically conclusive. For example, figure 1 for the outcome of overall survival shows TSA with the Z score ending in the futility region, suggesting that further studies are unlikely to show a meaningfully different treatment effect. This analysis was based on data from 6531 patients, which exceeded the optimal information size of 5492. The optimal information size is a sample size calculation based on an assumed type 1 error rate of 5%, power of 80% (ie, type 2 error rate of 20%) and a conservative 15% plausible relative risk reduction (higher relative risk reductions of 20%–30% are more commonly used17 but would lead to a smaller optimal information size).
2. Relevance-based considerations
Most practitioners do not currently consider adjuvant TKIs in clinical practice due to perceived lack of survival benefit and high incidence of toxicity. Contemporary trials are now focused on the use of immune checkpoint inhibitors in this setting, for which the evidence is increasing and qualifies for an LSR.
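The optimal information size mentioned in box 1 can be illustrated with a conventional two-group sample size formula for a binary outcome. The Python sketch below is an approximation under stated assumptions: the control group risk is a hypothetical input that is not reported in the text, no adjustment for between-trial heterogeneity is applied, and the function name is invented for illustration. It therefore does not attempt to reproduce the figure of 5492; it only shows how the type 1 error rate, power and relative risk reduction enter the calculation.

```python
import math
from statistics import NormalDist


def optimal_information_size(control_risk, rrr, alpha=0.05, power=0.80):
    """Approximate total sample size (both arms) for a binary outcome.

    control_risk: assumed event risk in the control group (hypothetical input)
    rrr: plausible relative risk reduction, eg, 0.15 for 15%
    Uses N = 4 * (z_{1-alpha/2} + z_{power})^2 * p * (1 - p) / delta^2,
    where p is the mean of the two group risks and delta their difference.
    """
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)   # two-sided type 1 error
    z_beta = nd.inv_cdf(power)            # power = 1 - type 2 error
    treated_risk = control_risk * (1 - rrr)
    mean_risk = (control_risk + treated_risk) / 2
    delta = control_risk - treated_risk
    n = 4 * (z_alpha + z_beta) ** 2 * mean_risk * (1 - mean_risk) / delta ** 2
    return math.ceil(n)


# Hypothetical control risk of 20%; compare a conservative and a larger RRR.
conservative = optimal_information_size(control_risk=0.20, rrr=0.15)
optimistic = optimal_information_size(control_risk=0.20, rrr=0.25)
```

Because the required size is inversely proportional to the square of the risk difference, the conservative 15% relative risk reduction demands a considerably larger information size than a 25% reduction would, which is why the choice of effect size needs stakeholder input.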
LSRs cannot be indefinitely updated. Systematic reviewers need to prioritise and select a few topics that will continue to be living. In this proposal, we suggest triggers to help with this decision: is the evidence conclusive? Is the decision still relevant to key stakeholders, including people affected by the problem, healthcare professionals, policymakers and researchers? Other pragmatic criteria relate to whether new research is still expected to be published about the topic and if resources are still available to continue the living mode. We deemphasised statistical approaches for determining conclusiveness of the evidence because they are not comprehensive, that is, they do not address other GRADE certainty domains, they are sensitive to assumptions about error rates and plausible effects and are based on determinations relevant to the null effect.15 17
While guidance for reporting LSRs is still under development,31 32 it is important for the LSR to report on triggers for retirement from the living mode. Also, when an LSR is actually being retired from the living mode, the latest version should clarify retirement and provide the justification (ie, the triggers). For example, one LSR studied the impact of angiotensin-converting enzyme inhibitors and angiotensin-receptor blockers on SARS-CoV-2 infections. This LSR published updates that informed readers when a certain key question was retired and provided the rationale.10
It is not clear how many LSRs are in fact living and continue to be updated on a regular basis. Bibliographic databases do not utilise indexing terms (eg, Medical Subject Headings (MeSH) or Emtree terms) to identify LSRs. Hence, identifying LSRs requires searching using text words and word adjacency approaches and manual searching. Although more LSRs have been published during the COVID-19 pandemic, the number of LSRs remains relatively small, and the number of retired LSRs is even smaller. Thus, the main limitation of the proposed criteria is the overall limited experience of scientific journals and evidence users with LSRs. Further methodological development of retirement criteria of LSRs will occur when more LSRs become available. Comparative evaluation of different types of evidence syntheses, for example, rapid review, living review, in terms of accuracy and efficiency, will also inform development of triggers to retire LSRs.
Proposed triggers for retiring LSRs from the living mode are: when the evidence becomes conclusive for the outcomes that are required for decision-making, when the question becomes less pertinent for decision-making as determined by relevant stakeholders, when new studies are not anticipated to be published on the topic, and when resources become unavailable to continue updating. We strongly encourage LSR authors to provide an indication of LSR retirement in the last iteration of the LSR, whether electronically or in print, along with the rationale and the triggers that supported their decision.
Patient consent for publication
Twitter @ibrahimelmikati, @joanne_kh, @IrbazRiaz
Contributors MHM is an epidemiologist and a Professor of Medicine at the Mayo Clinic with expertise in evidence synthesis, guideline development and decision-making. EA, IKEM, JK, RN, HJS and IBR provided expertise in methodology, evidence synthesis, meta-analysis and decision-making. ZW, HC and LL provided statistical expertise. MHM and IBR conceived the idea. MHM is the guarantor of this work.
Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.
Competing interests None declared.
Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.
Provenance and peer review Not commissioned; externally peer reviewed.