Introduction
Large language models (LLMs), a new family of tools for generating natural language, are likely to radically change how some aspects of healthcare are delivered.1 It is difficult to comprehend the pace at which change is already occurring. For example, ChatGPT, released by OpenAI in November 2022, represented a leap in the capability of artificial intelligence (AI).2 It was followed by the release of the Gemini family of models from Google and Claude-3 from Anthropic, as well as numerous other models with impressive capabilities. The aim of this analysis article is to illustrate how LLMs can already be used to improve healthcare, based on their power to find and convey information. At the same time, we want to alert users to the pitfalls of relying on this form of AI: vigilance will be required.
The reflex to always turn to search engines is already built into how we work. To some extent, we have all become aware of AI applications in speech recognition and automation.3 But LLMs take AI to a new level, enabling conversational analysis and synthesis of available knowledge at a scale and sophistication that, even by the admission of their engineers, is not fully understood.4
Of note, there is significant scope for LLMs to facilitate increased levels of shared decision-making (SDM). SDM is defined as ‘an approach where clinicians and patients share the best available evidence when faced with the task of making decisions and where patients are supported to consider options to achieve informed preferences’.5 In this article, we address the likely consequences of LLMs for SDM.
How LLMs work
The basic building block of LLMs, such as ChatGPT or Gemini, is the ability to predict the probability of the ‘next word’ in a string of words.6 But this somewhat modest-sounding ability unfolds …
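To make the idea of next-word prediction concrete, consider the following minimal sketch. It is for illustration only: it assumes Python with the open-source Hugging Face transformers library and the small, openly available GPT-2 model (a research model, not one of the commercial systems named above), and it simply asks the model to rank candidate next words for a short clinical prompt.

```python
# Minimal sketch: how a language model assigns a probability to
# every candidate "next word" given the words so far.
# Assumes: pip install torch transformers
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "The patient was prescribed a"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    # logits has shape (1, sequence_length, vocabulary_size):
    # a score for every vocabulary token at every position.
    logits = model(**inputs).logits

# The scores at the final position rate every possible continuation;
# softmax converts them into a probability distribution.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)

# Show the five most probable next tokens and their probabilities.
top = torch.topk(next_token_probs, k=5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(token_id))!r}: p = {prob:.3f}")
```

Running this prints the five continuations the model considers most probable, together with their probabilities. Commercial LLMs apply the same underlying principle, repeated word by word, at vastly larger scale.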
Footnotes
X @glynelwyn
Contributors PR and GE initiated this article and were supported by WBW and DB. We reviewed the most recent literature that has examined the use of LLMs in healthcare and evaluated the use of ChatGPT 3.5 in a hypothetical situation. PR has experience as a pharmacist and health economist and works as a machine learning engineer. DB is a retired clinician in California who has personal relevant experience of the condition used as an example in this article. WBW has experience as a physician and health economist; he runs Microsoft’s philanthropic AI for Health programme. GE has researched shared decision-making and developed patient decision-support tools. He is the guarantor of this article. The paper includes use cases for ChatGPT (in boxes).
Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.
Competing interests We have read and understood BMJ policy on declaration of interests and have the following interests to declare: GE developed the concept of comparison tables (Option Grid) as conversation aids to support shared decision-making. These tools were licensed to EBSCO and are now produced as part of the Dynamed Decisions product. PR has no conflicts of interest. WBW is a Microsoft employee and leads Microsoft’s philanthropic AI for Health Research programme, part of Microsoft’s philanthropic AI for Good Lab. In that role, WBW has no sales or revenue obligations or expectations.
Provenance and peer review Not commissioned; externally peer reviewed.