Rapid review methods series: Guidance on the use of supportive software
  1. Lisa Affengruber (1, 2)
  2. Barbara Nussbaumer-Streit (1)
  3. Candyce Hamel (3, 4)
  4. Miriam Van der Maten (5)
  5. James Thomas (6)
  6. Chris Mavergames (7)
  7. Rene Spijker (8)
  8. Gerald Gartlehner (1, 9)

On behalf of the Cochrane Rapid Reviews Methods Group

  1. Department for Evidence-based Medicine and Evaluation, Cochrane Austria, University for Continuing Education Krems, Krems, Austria
  2. Department of Family Medicine, Maastricht University, Maastricht, The Netherlands
  3. Canadian Association of Radiologists, Ottawa, Ontario, Canada
  4. School of Epidemiology and Public Health, Faculty of Medicine, University of Ottawa, Ottawa, Ontario, Canada
  5. Knowledge Institute, Dutch Association of Medical Specialists, Utrecht, The Netherlands
  6. University College London, UCL Social Research Institute, London, UK
  7. Cochrane Central Executive Team, London, UK
  8. Cochrane Netherlands, Julius Center for Health Sciences and Primary Care, UMC Utrecht, Utrecht University, Utrecht, The Netherlands
  9. Center for Public Health Methods, RTI International, Research Triangle Park, North Carolina, USA

Correspondence to Lisa Affengruber, Department for Evidence-based Medicine and Evaluation, Cochrane Austria, University for Continuing Education Krems, Krems 3500, Austria; lisa.affengruber{at}donau-uni.ac.at

Abstract

This paper is part of a series of methodological guidance from the Cochrane Rapid Reviews Methods Group. Rapid reviews (RRs) use modified systematic review methods to accelerate the review process while maintaining systematic, transparent and reproducible methods. This paper provides guidance on how to use supportive software for RRs.

We strongly encourage the use of supportive software throughout RR production. Specifically, we recommend (1) using collaborative online platforms that enable working in parallel, allow for real-time project management and centralise review details; (2) using automation software to support, but not entirely replace, a human reviewer and human judgement; and (3) being transparent in reporting the methodology and the potential risk of bias due to the use of supportive software.

  • Methods

Data availability statement

Data sharing not applicable as no datasets generated and/or analysed for this study.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.

WHAT IS ALREADY KNOWN ON THIS TOPIC

  • Various supporting software is currently available to expedite the review process, but guidance on the selection and use of supportive software is limited.

WHAT THIS STUDY ADDS

  • This paper presents an overview of considerations and recommendations for supportive software use for rapid reviews (RRs), covering all stages of an RR and workflow management. Additionally, we provide recommendations for emerging software, recognising that available supportive software may quickly become outdated. When assessing supportive software, we considered its validity, usability and accessibility. Currently, the task of study selection benefits the most from supportive software, as most of the valid software focuses on this task. Moreover, project management is also significantly enhanced through the use of supportive software, streamlining coordination, tracking progress and ensuring effective collaboration among team members.

HOW THIS STUDY MIGHT AFFECT RESEARCH, PRACTICE OR POLICY

  • Review teams should incorporate one or more software tools by considering the scope and complexity of the review topic, the available financial resources, the institutional access to tools, the timeline, the technical know-how within the review team and the limitations of the software.

Introduction

This paper is part of a series from the Cochrane Rapid Review Methods Group, providing methodological guidance for the conduct of rapid reviews (RRs). It addresses considerations around the use of supportive software to manage RRs and its use for specific steps of the review production.

Recently, many applications have been developed to support the different steps of systematic review (SR) production.1 The ‘Systematic Review Toolbox’, a web-based catalogue of tools launched in 2014, provides a list of tools that may be used during various stages of the review process. As of November 2023, the list includes 248 software tools, with some available for free and others requiring payment.2 A recent study showed that using supportive software during the production of evidence synthesis is state-of-the-art for most review teams, but not for all.3 A study by Scott et al, which focused on automation software, found that review teams that use supportive software do so for individual review steps: most frequently at the study selection stage, followed by the data extraction and data synthesis/meta-analysis stages.3 In practice, lack of knowledge about existing tools is the most frequent barrier to implementing supportive software.4 Other factors preventing review teams from using supportive software are the costs, the software’s complexity, a mistrust of its validity, the time constraints of adoption and implementation, a mismatch to workflow and a lack of user-friendliness.3–5 However, supportive software is crucial for evidence synthesis conduct, especially for RRs, as it facilitates an accelerated review process. Table 1 gives an overview of commonly used terms in this guidance and their definitions.

Table 1

Commonly used terms in this guidance paper

Supportive software can provide different levels of automation and decision-making by applying different methods.1 5–7 O’Connor et al5 propose a framework to describe the different levels of automation. Level 1 involves no automation and focuses solely on file management. Level 2 employs semiautomated tools for workflow prioritisation, such as those that prioritise abstracts by relevancy during the study selection process. Level 3 uses semiautomated tools that carry out tasks automatically, but still require human-supervised decisions, for example, the software selects abstracts and a human checks its decisions. Level 4 represents fully automated decision-making.

The recommendations in this guidance paper are evidence informed and based on the results of a forthcoming scoping review that systematically identified evaluation studies of ready-to-use tools for evidence synthesis conduct (https://osf.io/5m2qc). We only included tools that require no programming expertise (ie, no need to work with code, syntax or algorithms) and that are designed to be user-friendly, so that no specialised knowledge is needed to operate them.

In this guidance paper, we focus on supportive software that helps facilitate and expedite the RR process. We selected tools that demonstrated reliable results in evaluation studies and drew on our collective experience within the RR community. The guidance does not claim to be exhaustive. When assessing supportive software, we considered its validity, usability and accessibility. We describe these three criteria hierarchically because, without adequate validity, usability and accessibility are irrelevant. Validity is important in the RR process so that the review team can rely on the software’s outputs with confidence and reduce the need for additional validation steps. Usability refers to how easily the software can be used by the review team, even by those without extensive technical expertise. Accessibility is especially relevant for review teams that have limited resources.

In the following sections, we first address general considerations of software that can be used in the production of evidence syntheses, including RRs. Then we elaborate on how supportive software can help accelerate specific steps of RR production. In addition, we describe what factors to consider when assessing emerging supportive software. Online supplemental appendix 1 provides an overview of the validity, usability and accessibility of the supportive software that has been empirically evaluated.

Table 2 provides an overview of the recommendations for supportive software use, which are discussed in detail in the following sections.

Table 2

Recommendations for use of supportive software in rapid reviews

General recommendations

To produce RRs, we strongly encourage using web-based collaborative platforms, for instance, Google Drive8 or Microsoft OneDrive9 for data storage and collaborative writing, Microsoft Teams10 and Slack11 for communication, and platforms specifically tailored for conducting systematic evidence syntheses, for example, Covidence,12 DistillerSR,13 Rayyan14 or EPPI-Reviewer.15 Such platforms provide a centralised location for project files, allow for real-time project management and amendments, and facilitate working in parallel across geographic boundaries. The benefit of using SR-tailored software is that it enables review teams to perform different steps of the review process (eg, study selection, data extraction and critical appraisal) with progress being tracked in real time within the software. This helps the review team by providing a comprehensive, transparent and up-to-date report of the review process.

Specific recommendations for supportive software use during RR production

Literature search

In the following section, we describe how software can support the literature search. For general advice on developing and conducting a systematic literature search for an RR, please consult the corresponding paper of this methods series.16

Search strategy

Text mining software can be used to support search strategy development.17 Several supportive tools are available and offer different functions. For example, the Yale Medical Subject Heading (MeSH) Analyzer18 curates the article metadata of relevant pasted MEDLINE articles (eg, indexing, MeSH terms among others), the PubMed PubReMiner19 uses simple word frequency analysis of MEDLINE records to identify terms or subject headings, the Systematic Review Accelerator (SRA) offers a selection of tools, such as word frequency analysis, the Polyglot Search Translator (an automated translation of PubMed/Ovid MEDLINE) translates search strings across databases,20 and ChatGPT, a machine learning language model, aids in listing synonyms and alternatives for terminology or concepts.21
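To illustrate what a simple word frequency analysis does, the following Python sketch counts in how many known relevant records each word occurs. It is only a minimal illustration and not the implementation of PubReMiner or the SRA; the example titles, the short stop-word list and the function name `term_frequencies` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def term_frequencies(records, min_records=2):
    """Count in how many records each word occurs (document frequency)."""
    counts = Counter()
    for text in records:
        # Use a set so that a word is counted once per record.
        words = set(re.findall(r"[a-z]+", text.lower())) - STOP_WORDS
        counts.update(words)
    # Keep only terms that occur in at least `min_records` records.
    return [(term, n) for term, n in counts.most_common() if n >= min_records]


# Made-up titles standing in for known relevant MEDLINE records.
records = [
    "Rapid reviews of screening for hypertension in adults",
    "Screening adults for hypertension: a systematic review",
    "Blood pressure screening in primary care",
]

for term, n in term_frequencies(records):
    print(f"{term}\t{n}")
```

Frequent terms are candidates for the search strategy only; as with the tools themselves, an information specialist would still need to judge which terms are useful.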

Validity

All tools appear valid for supporting literature searches, as precision was equal to that of manual searching and error rates were low.7 22 However, only one evaluation study testing specific datasets is currently available for each of the tools7 22 (see online supplemental table 1 for further details on validity).

Usability

Using the PubMed IDs of relevant articles, the Yale MeSH Analyzer18 automatically retrieves article metadata and presents it in a grid, which facilitates the identification of appropriate MeSH terms. Using PubMed IDs or a PubMed search string, the PubMed PubReMiner19 generates frequency tables of bibliographic record fields, which facilitates the identification of appropriate MeSH and search terms, among others. Note that its current functionality does not extend to the analysis of phrases, although phrases are frequently used in English to convey concepts. The SRA Polyglot Search Translator20 allows search queries to be uploaded as .txt files or simply copied as text. All tools18–20 can save time, but the time required to learn how those tools operate should not be underestimated. Gains in efficiency might only happen after a learning curve. If review teams use one of these tools to support their RR search process, we recommend that an experienced information specialist check the tool output.

Accessibility

All tools are free of charge and easily accessible via their homepage.18–20 None completely automates any part of the search process, and manual adjustments will still be necessary.

Deduplication

Traditional citation management software such as EndNote,23 Citavi24 and Zotero25 offers supported deduplication, but studies evaluating this function are lacking. These tools typically conduct a stepwise deduplication process.26 Some duplicates are detected automatically by the software, whereas others must be checked manually (eg, a record where most of the elements are identical, but the formatting of the author lists varies so that the records appear to be different). This stepwise deduplication process can be time intensive, but it provides control over the deduplication process.

For automated deduplication of search results, the SRA Deduplicator,27 or Deduklick,28 and applications in most web-based SR platforms (eg, Covidence,12 DistillerSR,13 EPPI-Reviewer,15 Rayyan14) are available. To save time, we recommend using automated deduplication software, with some additional verification by a human.
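The combination of automated detection and human verification can be sketched as follows. This Python example is a simplified illustration under our own assumptions, not the algorithm of any of the tools named above: records with an identical DOI, or with an identical normalised title and year, are removed automatically, while records that share a title but differ in year are set aside for a human to check. The record fields, the DOI `10.1000/abc` and the function names are hypothetical.

```python
import re


def normalise(text):
    """Lower-case and strip everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def deduplicate(records):
    """Split records into unique records, automatic duplicates and
    possible duplicates that a human should check."""
    seen_doi, seen_title = {}, {}
    unique, duplicates, to_check = [], [], []
    for rec in records:
        doi = rec.get("doi", "").lower()
        title_key = normalise(rec["title"])
        if doi and doi in seen_doi:
            duplicates.append(rec)        # step 1: identical DOI
        elif title_key in seen_title:
            if seen_title[title_key]["year"] == rec["year"]:
                duplicates.append(rec)    # step 2: same title and year
            else:
                to_check.append(rec)      # step 3: left for a human
        else:
            unique.append(rec)
            seen_title[title_key] = rec
            if doi:
                seen_doi[doi] = rec
    return unique, duplicates, to_check


# Made-up records for illustration only.
records = [
    {"title": "Screening for hypertension in adults", "year": 2020, "doi": "10.1000/abc"},
    {"title": "Screening for Hypertension in Adults.", "year": 2020, "doi": "10.1000/ABC"},
    {"title": "Screening for hypertension in adults", "year": 2020},
    {"title": "Screening for hypertension in adults", "year": 2021},
    {"title": "Blood pressure screening in primary care", "year": 2019},
]

unique, duplicates, to_check = deduplicate(records)
print(len(unique), len(duplicates), len(to_check))
```

Keeping a separate list of uncertain cases mirrors the recommendation to have a human verify part of the automated output rather than trusting every removal.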

Validity

Automated deduplication tools may be a valid alternative to manual deduplication, as evaluation studies report low error rates and high precision.7 29 However, we recommend always verifying that a tool’s accuracy, reliability and overall performance have been tested and validated.

Usability

Libraries can be easily uploaded to all tools in different file formats (eg, txt, xml, nbib, ris, ddpe), and the tools do not require an advanced level of technical proficiency.

Accessibility

The SRA Deduplicator27 and Zotero (up to 300 megabytes)25 are free of charge and accessible via their homepage, whereas Deduklick,28 EndNote23 and Citavi24 require payment for use.

Study selection

Manage study selection at the abstract and full-text level

Web-based SR-tailored software platforms (eg, Covidence,12 DistillerSR,13 EPPI-Reviewer,15 Rayyan,14 SyRF30) enable multiple participants to work in parallel, regardless of geographical location. These platforms can help manage and distribute screening tasks among review team members and keep track of records throughout the review process, which can save time and improve efficiency. Such software platforms allow for flexibility if inclusion and exclusion criteria are adapted throughout the screening process.

Semiautomated study selection

Advanced text mining and machine learning tools can help reduce the resources spent on title/abstract screening. The tools offer several functions to support the screening process: they sort/filter abstracts by theme/topic/content or, more advanced, they rank records by their inclusion probability and present records with the highest likelihood of inclusion first or present the inclusion probability for records at the title/abstract level. For title/abstract screening, many tools have been evaluated, for example, ASReview,31 EPPI-Reviewer,32 DistillerSR,33–35 Rayyan,36 37 Research Screener,38 SWIFT-Active Screener39 and SWIFT-Review40 (see online supplemental table 1).
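The idea of ranking records by their inclusion probability can be illustrated with a small Python sketch. It uses a naive Bayes classifier with Laplace smoothing as a stand-in technique chosen for brevity; the tools listed above use their own algorithms, which this example does not reproduce. The titles, labels and function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokens(text):
    return re.findall(r"[a-z]+", text.lower())


def train(labelled):
    """labelled: (text, included) pairs already screened by a human.
    Both included and excluded examples are required."""
    word_counts = {True: Counter(), False: Counter()}
    class_counts = Counter()
    for text, included in labelled:
        class_counts[included] += 1
        word_counts[included].update(tokens(text))
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, class_counts, vocab


def inclusion_probability(text, model):
    word_counts, class_counts, vocab = model
    log_score = {}
    for label in (True, False):
        total = sum(word_counts[label].values())
        score = math.log(class_counts[label] / sum(class_counts.values()))
        for word in tokens(text):
            if word in vocab:  # ignore words never seen in training
                # Laplace smoothing avoids zero probabilities.
                score += math.log(
                    (word_counts[label][word] + 1) / (total + len(vocab))
                )
        log_score[label] = score
    # Normalise the two log scores into a probability of inclusion.
    return 1 / (1 + math.exp(log_score[False] - log_score[True]))


def rank(unscreened, model):
    """Present records with the highest inclusion probability first."""
    return sorted(
        unscreened, key=lambda t: inclusion_probability(t, model), reverse=True
    )


# Made-up screening decisions and unscreened titles.
labelled = [
    ("Screening for hypertension in adults: randomised trial", True),
    ("Blood pressure screening in primary care", True),
    ("Hypertension in pregnant mice", False),
    ("Surgical repair of knee ligaments", False),
]
unscreened = [
    "Knee surgery outcomes in athletes",
    "Randomised trial of blood pressure screening",
]

model = train(labelled)
for title in rank(unscreened, model):
    print(f"{inclusion_probability(title, model):.2f}  {title}")
```

With such a small training set the probabilities are crude; the point is only that every new human decision can be fed back into training so that the ranking of the remaining records improves.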

It is difficult to recommend one specific tool because tool performance cannot be compared directly: most evaluation studies assessing text mining or machine learning tools examined study selection in reviews of specific interventions or clinical conditions, used different algorithms and training sets for predictions, and reported different validity outcomes (see online supplemental table 1). Only a few studies comparing tools in head-to-head testing on the same dataset are available.32 34 41 If the algorithms of the semiautomated or fully automated tools and the datasets used in the studies are not openly available, the evaluation cannot be replicated, and it is impossible to assess the tool’s applicability. In our experience, the performance of semiautomated tools is additionally influenced by the humans training the tool and by the breadth, scope, and topic of the review, among many other factors.

However, we recommend the following steps when using semiautomated supportive software for study selection in RRs:

  1. Start from a clean dataset (ie, all duplicates removed), which the software needs to be trained to make accurate predictions about the inclusion probability. The size of the training set depends on the software. Some tools start ranking as soon as they have one included record, others after they have a specific number of records screened, or after a certain amount of time has passed.

  2. Include known relevant studies at the beginning to train the tool, with the aim of representing the breadth of the eligible studies and not a biased subset. When formulating the research question, compiling the protocol, or conducting the preliminary searches, authors usually find one or more relevant studies.

  3. Pilot the software tool with all team members participating in the study selection task. This can improve the accuracy of inclusion/exclusion decisions and can increase consistency of screening among the review team.

Although automation tools are not currently suitable to entirely replace a human reviewer, these tools can be used to support a human reviewer in the screening process, for example, through ranking records by inclusion probability and filtering options (eg, keywords for exclusion and inclusion). To save time, a modified or stop screening approach can be applied once the software has identified 95% of the relevant studies.39 42 Records missed with this approach could be recovered by checking reference lists of included studies, as shown in a study by Gates et al.43
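The 95% threshold mentioned above can be made concrete with a short calculation. The Python sketch below works retrospectively, as an evaluation study would: it assumes that the relevance of every record is already known and reports what proportion of records had to be screened, in the order presented by the tool, before 95% of the relevant records were found. In a live review the total number of relevant records is unknown, so tools have to estimate when this point is reached; that estimation is not shown here. The function name and the example data are hypothetical.

```python
def proportion_screened_at_recall(ranked_labels, target_percent=95):
    """ranked_labels: True/False relevance of each record, in the order
    in which the tool presented the records for screening."""
    total_relevant = sum(ranked_labels)
    if total_relevant == 0:
        raise ValueError("no relevant records in the dataset")
    # Smallest number of relevant records that reaches the target
    # (integer arithmetic avoids floating-point rounding problems).
    needed = (target_percent * total_relevant + 99) // 100
    found = 0
    for position, relevant in enumerate(ranked_labels, start=1):
        found += relevant
        if found >= needed:
            return position / len(ranked_labels)


# Made-up ranking of 100 records, 20 of which are relevant.
ranked_labels = [True] * 15 + [False] * 10 + [True] * 4 + [False] * 70 + [True]

screened = proportion_screened_at_recall(ranked_labels)
print(f"screened {screened:.0%} of records, work saved {1 - screened:.0%}")
```

The one relevant record ranked last in this example is the kind of record that would be missed by stopping early and might be recovered by checking reference lists.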

Crowd support for study selection

Another solution to support the conduct of RRs is crowdsourcing alone or in combination with machine learning.44–46 By engaging willing contributors, online crowdsourcing platforms can potentially reduce the time that review teams spend on screening tasks. However, such platforms also demand time for necessary training and crowd engagement.

Validity

Semiautomated31–33 35–40 and crowd-supported44–46 approaches are valid tools to support study selection, as the sensitivity, recall and accuracy were mostly high in evaluation studies.

The semiautomated supportive software DistillerSR,33 35 SWIFT-Active Screener,39 SWIFT-Review40 and Rayyan,36 37 together with crowd-supported screening,44–46 showed good performance, identifying 95% of relevant studies after investigators had screened 31%–67% of all references. ASReview,31 EPPI-Reviewer32 and Research Screener38 demonstrated robust performance, identifying 95% of relevant studies after screening 38%–99% of abstracts across reviews.

Three studies by Noel-Storr et al44–46 found that using the Screen4Me service,47 which employs crowdsourcing in combination with machine learning and a pool of previous datasets, was an accurate method (sensitivity: ranging from 94% to 100%) for abstract screening. Web-based SR-specific platforms, such as Covidence12 or SyRF,30 facilitate the review process workflow and offer functions that improve efficiency (eg, prioritisation), but no validity studies are available.
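For readers less familiar with the validity outcomes reported in such studies, the following Python sketch shows how sensitivity and specificity of screening decisions are calculated against a reference standard. The decisions are made up for illustration and are not data from the cited studies; the function name is hypothetical.

```python
def sensitivity_specificity(decisions, reference):
    """decisions and reference are equally long lists of True (include)
    and False (exclude); reference is the reference standard."""
    pairs = list(zip(decisions, reference))
    tp = sum(1 for d, r in pairs if d and r)          # correctly included
    fn = sum(1 for d, r in pairs if not d and r)      # relevant but missed
    tn = sum(1 for d, r in pairs if not d and not r)  # correctly excluded
    fp = sum(1 for d, r in pairs if d and not r)      # wrongly included
    return tp / (tp + fn), tn / (tn + fp)


# Made-up example: 5 relevant and 15 irrelevant records.
reference = [True] * 5 + [False] * 15
decisions = [True] * 4 + [False] + [True] * 3 + [False] * 12

sensitivity, specificity = sensitivity_specificity(decisions, reference)
print(f"sensitivity {sensitivity:.0%}, specificity {specificity:.0%}")
```

For study selection, sensitivity is usually the more important of the two, because a missed relevant record is harder to correct later than a wrongly included one, which is caught at full-text screening.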

Usability

The time investment necessary to learn to use these tools must be considered.31–33 35–40 44–46

Accessibility

For the majority of the above-mentioned tools, payment is required, with the exception of ASReview.48 Screen4Me47 is only available for Cochrane authors, while Rayyan14 is limited to early-career researchers. ASReview48 lacks real-time collaboration functionality, as it is not web-based. It is crucial to account for the costs associated with the deployment of supportive software during the planning phase of an RR.

Full-text retrieval

We recommend using supportive software for obtaining full texts. Manual searching and limited access to subscription-based journals can make obtaining full texts time-consuming and cost prohibitive. To reduce the amount of manual searching in retrieving full texts, supportive software is available, such as the ‘find full text’ feature in EndNote,23 Citavi24 and Zotero.25 Another tool is the browser plug-in Unpaywall49 (only for use in Chrome and Firefox), which displays a green unlock icon in the browser if a free version of the full text is available and allows the user to access the full text by clicking on the icon.

Validity

A retrospective study compared tools that automatically retrieve full texts, including EndNote’s ‘find full text’ feature, with manual searching for full texts.7 However, due to institutional differences in access, the validity outcomes could not be calculated.

Usability

EndNote23 saves the obtained full texts as PDFs in a folder, which facilitates easy upload to a screening tool. However, in our experience, the EndNote feature works only when the DOI is available in the software, and the feature apparently does not support preprint archives. Authors should be aware that the supportive software’s efficiency depends on institutional journal subscriptions and open access. Unpaywall49 is easy to use and facilitates a full-text download, as the PDF launches or a webpage with access to the PDF is opened when the green unlock icon is clicked.

Accessibility

EndNote23 and Citavi24 are available for a fee. Zotero25 and Unpaywall49 are freely available, and Zotero offers extra storage space for a fee.

Data extraction

We recommend using SR-tailored collaborative online platforms, such as Covidence,12 DistillerSR13 or EPPI-Reviewer,15 which facilitate and structure the data extraction process for users by offering flexible and customisable data extraction forms.

Software that aims to semiautomate or fully automate data extraction, such as RobotReviewer,50 is available. RobotReviewer is a machine learning system that automatically identifies and extracts relevant text from randomised controlled trials (RCTs).50

Any automated data extraction tool should be tested within the review and used as an assistant to human reviewers, who can validate the machine learning suggestions and correct as needed.

Validity

Automatically identified text by RobotReviewer was of equal quality to the manually identified text in the Cochrane Reviews.50 However, only one evaluation study is available to assess validity; it examined specific interventions or clinical conditions and used specific training sets for predictions and specific validity outcomes for reporting results. Therefore, we cannot recommend a certain tool. The considerations described in the recommendations for emerging software should be taken into account before use.

Usability

Data extractors should pay particular attention when using automated software for data items more prone to errors (such as outcomes and numerical results), as extraction from tables is not supported. Descriptive data items, such as population and intervention, benefit the most from automated software support. Moreover, these software tools tend to perform better when extracting data from RCTs compared with observational studies. However, the initial training time required may outweigh the reduction in time for data extraction.

Accessibility

Covidence,12 DistillerSR13 and EPPI-Reviewer15 require payment for use, whereas RobotReviewer50 is free of charge.

Critical appraisal

For the critical appraisal of studies, software based on crowd support, such as CrowdCARE,51 or on semiautomation, such as RobotReviewer,52 is available.

Semiautomated critical appraisal

RobotReviewer52 is a machine learning system that automatically assesses some of the risk of bias (RoB) domains in the Cochrane RoB 1.0 tool for RCTs.53 Also, the SR-tailored collaborative online platforms support manual RoB assessment.

Crowd support for critical appraisal

CrowdCARE is an online crowd tool that teaches RoB assessment and facilitates the sharing of appraisals as a repository among a global community of clinicians. After finishing training, users can critically appraise studies.51

Validity

The CrowdCARE tool showed substantial concordance between the expert consensus ratings and the crowd’s mean ratings.54 However, the authors indicate that further exploration of the use of crowdsourcing for RoB assessment is needed, especially across the spectrum of research question types, study designs and healthcare disciplines.54 RobotReviewer52 assesses the RoB of RCTs using four out of seven domains of the Cochrane RoB 1.0 tool.53 However, the Cochrane RoB 1.0 tool was updated and replaced by the RoB 2.0 tool.55 RobotReviewer’s accuracy has been tested by the developers50 and three independent research teams.7 56 57 The authors reported moderate agreement between the tool and human reviewers (range: 46%–100%).7 50 56 57 RobotReviewer7 50 56 57 can be helpful in supporting a reviewer’s RoB assessment by providing an automated assessment of four out of seven domains, which can save time and costs.

We recommend that reviewers always check and validate the RoB assessments offered by the software.
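One simple way to check a tool’s RoB assessments within a review is to compare them with a human reviewer’s judgements on a sample of studies. The Python sketch below computes the percentage agreement and, as an addition that is not taken from the cited evaluations, Cohen’s kappa as a common chance-corrected measure. The judgements are made up and the function name is hypothetical.

```python
from collections import Counter


def agreement(tool, human):
    """Return observed agreement and Cohen's kappa for two equally
    long lists of judgements (eg, 'low', 'high', 'unclear')."""
    n = len(tool)
    observed = sum(t == h for t, h in zip(tool, human)) / n
    tool_freq, human_freq = Counter(tool), Counter(human)
    # Agreement expected by chance from the marginal frequencies.
    expected = sum(tool_freq[c] * human_freq[c] for c in tool_freq) / n**2
    # Kappa is undefined if expected agreement equals 1
    # (both raters used one single category throughout).
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Made-up judgements for one RoB domain across ten trials.
tool = ["low", "low", "high", "low", "high", "low", "low", "high", "low", "low"]
human = ["low", "low", "high", "high", "high", "low", "low", "low", "low", "low"]

observed, kappa = agreement(tool, human)
print(f"agreement {observed:.0%}, kappa {kappa:.2f}")
```

Studies on which tool and human disagree are the ones that most need a second look before the assessment is finalised.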

Usability

CrowdCARE51 supports different study designs, such as SRs, RCTs and case series, while RobotReviewer52 only supports RCTs. As CrowdCARE51 is a repository rated by a crowd, not every article is available with a rating. RobotReviewer52 is a web-based software and easy to use with a drag-and-drop function for PDFs.

Accessibility

The two tools are free of charge.51 52

Assessment of the certainty of evidence

For rating the certainty of evidence (CoE) based on the Grading of Recommendations Assessment, Development and Evaluation (GRADE) framework, the web-based software tools GRADEpro58 and MAGICapp59 are available. As with full SRs, we strongly encourage using these web-based tools for RRs, as they help reviewers apply GRADE in a standardised manner, automatically save data and provide various output styles for summary of findings tables, thereby improving the review production efficiency. Another paper60 of this series provides more detailed guidance on rating the CoE for RRs.

Recommendations for emerging software

In the previous sections, we focused on supportive software that has already proven useful in evaluation studies. With the rise of large language models (eg, ChatGPT and Claude 2), new ways of using AI during the review process are constantly emerging.

When deciding whether to use emerging software, we encourage reviewers to look for validation studies of the tool that used real-world data. In addition, reviewers should check that the validations are applicable in their own domain and should justify why they consider the tool to be valid and reliable for use in their review. Before using emerging software, reviewers should check whether it is regularly updated and whether user feedback is taken into consideration for improvements. They should also understand the cost and licensing terms associated with the software. Some software may be free, while other software requires a subscription or one-time purchase.

Conclusion

Supportive software is essential to facilitate the conduct of all types of evidence synthesis. In particular, RRs should make use of these tools, as they can increase efficiency and expedite steps of the review process. While there exists a wide range of supportive software for all steps of the review process, currently, the task of study selection benefits the most from supportive software, as most of the valid software focuses on this task. Moreover, project management is also significantly enhanced through the use of supportive software, streamlining coordination, tracking progress and ensuring effective collaboration among team members.

We recommend using supportive software throughout the review process and list available tools and evidence by their validity and usability, if available. Additionally, we provide recommendations for emerging software, recognising that available tools may quickly become outdated. Review teams may choose one or more software tools by considering the scope and complexity of the review topic, the available financial resources, the institutional access to tools, the timeline, the technical know-how within the review team and the limitations of the software, as many evaluation studies are conducted on limited datasets and in specific contexts. Although learning the tools takes additional time at the beginning, they offer gains in efficiency once learnt. Review teams can use automated software to support the reviewers, but human judgement should not be replaced entirely.

Ethics statements

Patient consent for publication

Acknowledgments

We would like to thank Sandra Hummel for formatting the manuscript.

References

Supplementary materials

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.

Footnotes

  • Contributors LA, BN-S and GG contributed to the conceptualisation of this paper. LA and BN-S wrote the first draft of the manuscript. All authors critically reviewed, revised and approved the manuscript. LA is responsible for the overall content.

  • Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

  • Competing interests LA is an associate convenor of the RRMG. BNS, CH and GG are co-convenors of the RRMG.

  • Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.