After stratospheric levels of hype, early evidence may be bringing generative artificial intelligence down to Earth.
A series of recent research papers by academic hospitals has revealed significant limitations of large language models (LLMs) in medical settings, undercutting common industry talking points that they will save time and money, and soon liberate clinicians from the drudgery of documentation.
Just in the past week, a study at the University of California, San Diego found that use of an LLM to reply to patient messages did not save clinicians time; another study at Mount Sinai found that popular LLMs are lousy at mapping patients’ illnesses to diagnostic codes; and still another study at Mass General Brigham found that an LLM made safety errors in responding to simulated questions from cancer patients. One reply was potentially lethal.
This article is exclusive to STAT+ subscribers
Unlock this article — and get additional analysis of the technologies disrupting health care — by subscribing to STAT+.
Already have an account? Log in
Already have an account? Log in
To submit a correction request, please visit our Contact Us page.
STAT encourages you to share your voice. We welcome your commentary, criticism, and expertise on our subscriber-only platform, STAT+ Connect