Pilot study on large language models for risk-of-bias assessments in systematic reviews: A(I) new type of bias?
Joseph Barsby (1,2), Samuel Hume (1,3), Hamish AL Lemmey (1,4), Joseph Cutteridge (5,6), Regent Lee (7), Katarzyna D Bera (3,7)

  1. Oxford Medical School, Oxford, UK
  2. Newcastle Upon Tyne Hospitals NHS Foundation Trust, Newcastle Upon Tyne, UK
  3. University of Oxford St Anne's College, Oxford, UK
  4. University of Oxford Magdalen College, Oxford, UK
  5. York and Scarborough Teaching Hospitals NHS Foundation Trust, York, UK
  6. Hull University Teaching Hospitals NHS Trust, Hull, UK
  7. Nuffield Department of Surgical Sciences, University of Oxford, Oxford, UK

Correspondence to Dr Katarzyna D Bera, University of Oxford Nuffield Department of Surgical Sciences, Oxford, UK; katarzyna.bera{at}

Risk-of-bias (RoB) assessment is used to check randomised controlled trials for systematic errors. The RoB tool developed by Cochrane is considered the gold standard for assessing RoB in studies included in systematic reviews, and it is a key part of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines.1 The tool comprises six domains that may signify bias: random sequence generation, allocation concealment, blinding of participants and personnel, attrition bias, reporting bias and other potential biases.2 This assessment is integral to evaluating the quality of evidence; however, it is time-consuming and labour-intensive.

Large language models (LLMs) are a form of generative artificial intelligence (AI) trained on large volumes of data. ChatGPT is an LLM developed by OpenAI that can generate a wide variety of responses to user prompts. Concerns exist around the application of such AI tools in research, including ethical, copyright, plagiarism and cybersecurity risks.3 However, LLMs are increasingly popular with investigators seeking to streamline analyses. Studies have begun investigating the potential role of LLMs in the RoB assessment process.4 5 Given the flexibility and rapidly evolving nature of LLMs, our goal was to explore whether ChatGPT could be used to automate the RoB assessment process without sacrificing accuracy. This study offers an assessment of the applicability of LLMs in systematic reviews as of December 2023.

This study sits within a systematic review (PROSPERO CRD420212479050). Two reviewers (SH and HALL) performed RoB assessments on 15 full-length papers in portable document format (PDF) (table 1). Six domains were assessed independently alongside an added …

  • X @scullingmonkey

  • Contributors Conception of study: JB and KDB. Data collection: JB, JC, KDB, SH, HALL. Data analysis: JB, JC and KDB. Data interpretation: JB, KDB and LR. Drafting of paper: JB, JC and KDB. Approval of final version: all authors. ChatGPT is the subject of this study and was only used as described in the methods and results. ChatGPT was not used in the analysis, drafting or rewriting of the paper.

  • Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.