News | Radiology Imaging | September 12, 2025

AJR study demonstrates high precision, recall for GPT-4 in radiology reporting.

Report Shows GPT-4 Can Detect, Classify Critical Findings in Radiology Reports

Sept. 10, 2025  — According to ARRS’ American Journal of Roentgenology (AJR), general-purpose large language models (LLMs) such as GPT-4 can detect and classify critical findings in radiology reports with high precision and recall when guided by carefully designed prompt strategies, highlighting their potential to support timely communication in clinical workflows.

“In our evaluation of more than 400 radiology reports, GPT-4 achieved precision of 90% and recall of 87% for true critical findings using a few-shot static prompting approach,” said first author Ish A. Talati, MD, from the department of radiology at Stanford University. “These results suggest that out-of-the-box LLMs may adapt to specialized radiology tasks with minimal data annotation, although further refinement is needed before clinical implementation.”

Talati et al.’s AJR manuscript included 252 radiology reports from the MIMIC-III database and an external test set of 180 chest radiograph reports from CheXpert Plus. Reports were manually reviewed to identify critical findings and categorized as true, known/expected, or equivocal. Various prompting strategies—including zero-shot, few-shot static, and few-shot dynamic—were tested with GPT-4 and Mistral-7B to optimize detection performance.
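The article does not reproduce the study's prompts, so the following is only a rough Python sketch of how the three strategies differ in general. It assumes that "static" means the same fixed examples are placed in every prompt and that "dynamic" means examples are chosen per report by similarity. The instruction wording, the `Example` type, the word-overlap similarity measure, and the sample reports are all hypothetical and are not taken from the AJR manuscript, MIMIC-III, or CheXpert Plus.

```python
from dataclasses import dataclass

# Categories named in the article; "none" is an assumed extra label.
CATEGORIES = ("true", "known/expected", "equivocal")


@dataclass(frozen=True)
class Example:
    report: str
    label: str


def format_prompt(report: str, examples: list[Example]) -> str:
    """Build a prompt; an empty example list gives a zero-shot prompt."""
    lines = [
        "Classify any critical finding in the report as one of: "
        + ", ".join(CATEGORIES) + ", or none.",
        "",
    ]
    for ex in examples:
        lines += [f"Report: {ex.report}", f"Label: {ex.label}", ""]
    lines += [f"Report: {report}", "Label:"]
    return "\n".join(lines)


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity; a simple stand-in for a real retriever."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def pick_dynamic(report: str, pool: list[Example], k: int) -> list[Example]:
    """Choose the k pool examples most similar to the incoming report."""
    ranked = sorted(pool, key=lambda ex: jaccard(report, ex.report), reverse=True)
    return ranked[:k]


# Invented snippets for illustration only.
POOL = [
    Example("new large right pneumothorax", "true"),
    Example("stable known pleural effusion", "known/expected"),
    Example("possible small pneumothorax cannot exclude", "equivocal"),
]

query = "large pneumothorax on the right"
zero_shot = format_prompt(query, [])
few_shot_static = format_prompt(query, POOL)  # same examples every time
few_shot_dynamic = format_prompt(query, pick_dynamic(query, POOL, k=1))
```

In this sketch the only difference between strategies is which examples precede the report, which is why a static set can be reviewed once and reused across every report.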

For true critical findings in the holdout test set, GPT-4 achieved 90.1% precision and 86.9% recall, compared to 75.6% and 77.4% for Mistral-7B. On the external test set, GPT-4 reached 82.6% precision and 98.3% recall, while Mistral-7B achieved 75.0% and 93.1%, respectively. Static few-shot prompting with five examples emerged as the most effective approach for optimizing performance.
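For readers less familiar with the two metrics, precision is the share of flagged reports that truly contain a critical finding, and recall is the share of reports with a critical finding that were flagged. A minimal Python sketch of the standard definitions follows. The counts are hypothetical and chosen only to make the arithmetic easy to follow; they are not the study's confusion-matrix values, which this article does not report.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (precision, recall) from true-positive, false-positive,
    and false-negative counts. Returns 0.0 when a denominator is zero."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts, not from the study:
# 45 correct flags, 5 false alarms, 15 missed findings.
p, r = precision_recall(tp=45, fp=5, fn=15)
print(f"precision={p:.1%} recall={r:.1%}")  # precision=90.0% recall=75.0%
```

Read this way, GPT-4's external-set result of 98.3% recall with 82.6% precision means it missed very few true critical findings but raised more false alarms than on the holdout set.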

“Effective identification of critical findings is essential for patient safety,” Talati and colleagues concluded. “While further technical development is required, these findings underscore the promise of LLMs in improving radiology workflows by augmenting communication of urgent findings.”

A supplement to this AJR accepted manuscript is available here.

