Images in a 68-year-old woman with a screen-detected ductal carcinoma in situ with an artificial intelligence (AI) score of 10 on the screening mammograms. (A) Mammogram of right breast from craniocaudal view. (B) Mammogram of right breast from mediolateral oblique view. (C) Craniocaudal digital breast tomosynthesis image of right breast. (D) US image of right breast. AI score is defined as the overall examination-level score from the AI system; a score of 1 indicates a low probability of breast cancer and a score of 10 a high probability. The arrows in A and C indicate the malignancy, and the dotted line in D indicates the tumor diameter. Image courtesy of the Radiological Society of North America.
Mammograms acquired through population-based breast cancer screening programs produce a significant workload for radiologists. AI has been proposed as an automated second reader for mammograms that could help reduce this workload. The technology has shown encouraging results for cancer detection, but evidence related to its use in real screening settings is limited.
In the new study, the largest of its kind to date, Norwegian researchers led by Solveig Hofvind, Ph.D., from the Section for Breast Cancer Screening, Cancer Registry of Norway in Oslo, compared the performance of a commercially available AI system with routine independent double reading as performed in a population-based screening program. The study drew from almost 123,000 examinations performed on more than 47,000 women at four facilities in BreastScreen Norway, the nation’s population-based screening program.
The dataset included 752 cancers detected at screening and 205 interval cancers, or cancers detected between screening rounds. The AI system predicted the risk of cancer on a scale from 1 to 10, with 1 representing the lowest risk and 10 the highest risk. A total of 86.8% (653 of 752) of screen-detected and 44.9% (92 of 205) of interval cancers had the highest AI score of 10.
The researchers created three thresholds to assess the performance of the AI system as a decision-making tool. Using a threshold that mirrors the average individual radiologist rate of positive interpretation, the proportion of screen-detected cancers not selected by the AI system was less than 20%. While the AI system performed well, the study’s reliance on retrospective data means that more research is needed.
“In our study, we assumed that all cancer cases selected by the AI system were detected,” Dr. Hofvind said. “This might not be true in a real screening setting. However, given that assumption, AI will probably be of great value in interpretation of screening mammograms in the future.”
The results showed that screen-detected cancers with low AI scores had more favorable histopathologic characteristics, which are associated with a better prognosis, than those with high AI scores. The opposite pattern was observed for interval cancers. This may indicate that interval cancers with low AI scores are true interval cancers not visible on the screening mammograms.
The high percentage of true negative examinations classified with a low AI score could substantially reduce the interpretive volume, while allowing only a small proportion of cancers to go undetected. By using AI as one of the two readers in a double reading setting, the radiologist could still identify these cancers, the researchers said.
“Based on our results, we expect AI to be of great value in the interpretation of screening mammograms in the future,” Dr. Hofvind said. “We expect the greatest potential to be in reducing the reading volume by selecting negative examinations.”
Although more study is needed before clinical implementation of AI in breast cancer screening, the results of the study help establish a basis for future research, including prospective studies, Dr. Hofvind said.
“We are looking forward to testing out different scenarios for AI using retrospective data and then running a prospective trial,” she said.
For more information: www.rsna.org