News | Radiology Imaging | September 12, 2025

AJR study demonstrates high precision, recall for GPT-4 in radiology reporting.

Report Shows GPT-4 Can Detect, Classify Critical Findings in Radiology Reports

Sept. 10, 2025  — According to ARRS’ American Journal of Roentgenology (AJR), general-purpose large language models (LLMs) such as GPT-4 can detect and classify critical findings in radiology reports with high precision and recall when guided by carefully designed prompt strategies, highlighting their potential to support timely communication in clinical workflows.

“In our evaluation of more than 400 radiology reports, GPT-4 achieved precision of 90% and recall of 87% for true critical findings using a few-shot static prompting approach,” said first author Ish A. Talati, MD, from the department of radiology at Stanford University. “These results suggest that out-of-the-box LLMs may adapt to specialized radiology tasks with minimal data annotation, although further refinement is needed before clinical implementation.”

Talati et al.’s AJR manuscript included 252 radiology reports from the MIMIC-III database and an external test set of 180 chest radiograph reports from CheXpert Plus. Reports were manually reviewed to identify critical findings and categorized as true, known/expected, or equivocal. Various prompting strategies—including zero-shot, few-shot static, and few-shot dynamic—were tested with GPT-4 and Mistral-7B to optimize detection performance.
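As a rough illustration of what few-shot static prompting can look like, the sketch below assembles one fixed set of five labeled examples ahead of every report to be classified. It is a minimal sketch, not the study's actual prompt: the instruction wording, the extra "none" label, the function name `build_static_prompt`, and the example report snippets are all assumptions invented here for illustration. A dynamic few-shot variant would typically choose the examples separately for each report rather than reusing the same five.

```python
from typing import List, Tuple

# "true", "known/expected" and "equivocal" are the categories named in the article.
# "none" is an assumed extra label for reports with no critical finding.
CATEGORIES = ("true", "known/expected", "equivocal", "none")

# Invented placeholder examples, NOT taken from MIMIC-III, CheXpert Plus, or the study.
STATIC_EXAMPLES: List[Tuple[str, str]] = [
    ("New large right pneumothorax with mediastinal shift.", "true"),
    ("Free intraperitoneal air, concerning for bowel perforation.", "true"),
    ("Unchanged known small pneumothorax, chest tube in place.", "known/expected"),
    ("Possible small pneumothorax, cannot be excluded.", "equivocal"),
    ("No acute cardiopulmonary abnormality.", "none"),
]


def build_static_prompt(
    report: str, examples: List[Tuple[str, str]] = STATIC_EXAMPLES
) -> str:
    """Build a prompt that shows the same fixed examples before the new report."""
    lines = [
        "Classify the critical finding in each radiology report as one of: "
        + ", ".join(CATEGORIES)
        + ".",
        "",
    ]
    # Static few-shot: the example set does not depend on the input report.
    for text, label in examples:
        lines.append(f"Report: {text}")
        lines.append(f"Label: {label}")
        lines.append("")
    # The report to classify goes last, with the label left for the model to fill in.
    lines.append(f"Report: {report}")
    lines.append("Label:")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_static_prompt("Interval development of tension pneumothorax."))
```

The resulting string would then be sent to whichever model is being evaluated; the model call itself is left out because the article does not describe how the models were invoked.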

For true critical findings in the holdout test set, GPT-4 achieved 90.1% precision and 86.9% recall, compared to 75.6% and 77.4% for Mistral-7B. On the external test set, GPT-4 reached 82.6% precision and 98.3% recall, while Mistral-7B achieved 75.0% and 93.1%, respectively. Static few-shot prompting with five examples emerged as the most effective approach for optimizing performance.
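For readers less familiar with the two metrics, precision is the share of flagged findings that were truly critical, and recall is the share of truly critical findings that were flagged. The short sketch below computes both from confusion counts. The counts used are made up for illustration and are not the study's data; the function names are likewise assumptions.

```python
def precision(true_pos: int, false_pos: int) -> float:
    """Fraction of flagged findings that were actually critical."""
    flagged = true_pos + false_pos
    return true_pos / flagged if flagged else 0.0


def recall(true_pos: int, false_neg: int) -> float:
    """Fraction of actual critical findings that were flagged."""
    actual = true_pos + false_neg
    return true_pos / actual if actual else 0.0


if __name__ == "__main__":
    # Hypothetical counts, chosen only to make the arithmetic easy to follow.
    tp, fp, fn = 9, 1, 3
    print(f"precision = {precision(tp, fp):.2f}")  # 9 / (9 + 1) = 0.90
    print(f"recall    = {recall(tp, fn):.2f}")     # 9 / (9 + 3) = 0.75
```

A model with high recall but lower precision, as in the external test set figures above, misses few critical findings but raises more false alerts.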

“Effective identification of critical findings is essential for patient safety,” Talati and colleagues concluded. “While further technical development is required, these findings underscore the promise of LLMs in improving radiology workflows by augmenting communication of urgent findings.”

A supplement to this AJR accepted manuscript is available here.

