Application of Med-PaLM 2 in the refinement of MIMIC-CXR labels
Source: DataCite Commons | Updated: 2025-02-04 | Indexed: 2025-04-16
Download link:
https://physionet.org/content/med-palm2-mimic-cxr/
Description:
MIMIC-CXR is a large, open-source dataset that is widely used in medical AI
research. One of the limitations of this dataset is the lack of ground truth
labels for the chest X-ray studies. Prior work has extracted structured labels
from the MIMIC-CXR radiology report text using CheXpert, a natural language
processing (NLP) model. As comprehensive expert validation of these labels is
cost-prohibitive, there is a need for scalable methods of identifying
NLP-derived labels that would benefit from manual review. We developed
prompts for extracting clinically relevant labels using a clinically
trained large language model, Med-PaLM 2, which we selectively applied to
MIMIC-CXR radiology reports. A subset of cases where the Med-PaLM 2 results
differed from the previously published CheXpert labels was reviewed by three
US board-certified radiologists to establish a ground truth. On these
differing labels, Med-PaLM 2 achieved an accuracy of 66%, compared with 19%
for CheXpert. Our results demonstrate the potential of medically oriented
large language models such as Med-PaLM 2 for both label extraction and
identifying cases for manual review. This dataset contributes 1,378
radiologist-verified ground truth labels to the MIMIC-CXR project.
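
The workflow described above (compare two label sources, send the disagreements to reviewers, then score each source on the disagreeing subset) can be sketched roughly as follows. This is only an illustration: the study IDs, findings, label values, and reviewer votes are invented toy data, and resolving the three radiologists' reads by majority vote is an assumption, since the description does not state how the ground truth was established.

```python
from collections import Counter


def flag_disagreements(labels_a, labels_b):
    """Return the keys present in both sources whose labels differ."""
    shared = labels_a.keys() & labels_b.keys()
    return sorted(k for k in shared if labels_a[k] != labels_b[k])


def majority_vote(votes):
    """Return the label with a strict majority, or None if there is none.

    Majority voting is an assumption made for this sketch only.
    """
    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else None


def accuracy(predicted, truth, keys):
    """Fraction of keys (with a resolved truth) where predicted == truth."""
    scored = [k for k in keys if truth.get(k) is not None]
    if not scored:
        return 0.0
    return sum(predicted[k] == truth[k] for k in scored) / len(scored)


# Hypothetical toy labels keyed by (study_id, finding).
chexpert = {
    ("s1", "Edema"): "positive",
    ("s1", "Pneumonia"): "negative",
    ("s2", "Edema"): "uncertain",
    ("s3", "Edema"): "negative",
}
llm = {
    ("s1", "Edema"): "positive",
    ("s1", "Pneumonia"): "positive",
    ("s2", "Edema"): "negative",
    ("s3", "Edema"): "positive",
}

# Only disagreeing labels go to manual review.
flagged = flag_disagreements(chexpert, llm)

# Hypothetical reads from three reviewers for each flagged label.
reviews = {
    ("s1", "Pneumonia"): ["positive", "positive", "negative"],
    ("s2", "Edema"): ["negative", "negative", "negative"],
    ("s3", "Edema"): ["uncertain", "uncertain", "positive"],
}
truth = {k: majority_vote(reviews[k]) for k in flagged}

llm_acc = accuracy(llm, truth, flagged)
chexpert_acc = accuracy(chexpert, truth, flagged)
print(f"flagged={len(flagged)} llm={llm_acc:.2f} chexpert={chexpert_acc:.2f}")
```

Because a label can take more than two values, both sources can be wrong on the same disagreement, which is why the two accuracies on the reviewed subset need not sum to 100%.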
Provider:
PhysioNet
Created:
2025-01-30



