After stratospheric levels of hype, early evidence may be bringing generative artificial intelligence down to Earth.

A series of recent research papers by academic hospitals has revealed significant limitations of large language models (LLMs) in medical settings, undercutting common industry talking points that they will save time and money, and soon liberate clinicians from the drudgery of documentation.

Just in the past week, a study at the University of California, San Diego found that use of an LLM to reply to patient messages did not save clinicians time; another study at Mount Sinai found that popular LLMs are lousy at mapping patients’ illnesses to diagnostic codes; and still another study at Mass General Brigham found that an LLM made safety errors in responding to simulated questions from cancer patients. One reply was potentially lethal.
