Barsby et al present a thought-provoking pilot study on the application of large language models (LLMs) to automate risk-of-bias (RoB) assessments in systematic reviews.1 Although LLMs show potential for streamlining evidence synthesis, their integration into medical decision-making raises major ethical concerns that require careful consideration.
Patient safety is paramount. RoB assessments directly shape the quality of the evidence used to guide clinical decisions. As Barsby et al highlight, current LLM performance in RoB assessment remains suboptimal, with both ChatGPT 3.5 and ChatGPT 4 demonstrating only moderate agreement with human assessors.1 Relying prematurely on these models could lead to misinformed judgements, …
Footnotes
Contributors YW conceptualised the study and drafted the manuscript. CL contributed to reviewing and editing the manuscript. Both authors reviewed and approved the final manuscript.
Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.
Competing interests None declared.
Provenance and peer review Not commissioned; internally peer reviewed.