
Application of Med-PaLM 2 in the refinement of MIMIC-CXR labels

Source: DataCite Commons. Updated: 2025-02-04. Indexed: 2025-04-16.
Download link:
https://physionet.org/content/med-palm2-mimic-cxr/
Description:
MIMIC-CXR is a large, open-source dataset that is widely used in medical AI research. One limitation of this dataset is the lack of ground truth labels for the chest X-ray studies. Prior work has extracted structured labels from the MIMIC-CXR radiology report text using CheXpert, a natural language processing (NLP) model. Because comprehensive expert validation of these labels is cost-prohibitive, there is a need for scalable methods of identifying NLP-derived labels that would benefit from manual review. We have developed prompts for extracting clinically relevant labels using a clinically trained large language model, Med-PaLM 2, which we selectively applied to MIMIC-CXR radiology reports. A subset of cases where the Med-PaLM 2 results differed from the previously published CheXpert labels was reviewed by three US board-certified radiologists to establish a ground truth. On these differing labels, Med-PaLM 2 achieved an accuracy of 66%, compared with 19% for CheXpert. Our results demonstrate the potential use of medically oriented large language models such as Med-PaLM 2 both in label extraction and in identifying cases for manual review. This dataset contributes 1,378 radiologist-verified ground truth labels to the MIMIC-CXR project.
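The disagreement-based selection described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the label encoding, the `(study_id, finding)` keys, the function names, and all values below are hypothetical toy data, not the dataset's actual schema or results. It only shows the general idea of flagging cases where two label sources differ and then scoring each source against reviewer labels on that subset.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical encoding (an assumption, not the dataset's schema):
# 1 = positive, 0 = negative, -1 = uncertain, None = not mentioned.
Key = Tuple[str, str]                 # (study_id, finding)
Labels = Dict[Key, Optional[int]]


def find_disagreements(a: Labels, b: Labels) -> List[Key]:
    """Return the keys labeled by both sources whose labels differ."""
    shared = a.keys() & b.keys()
    return sorted(k for k in shared if a[k] != b[k])


def accuracy_on(keys: Iterable[Key], predicted: Labels, truth: Labels) -> float:
    """Fraction of reviewed keys where the prediction matches the reviewer label."""
    reviewed = [k for k in keys if k in truth]
    if not reviewed:
        raise ValueError("no reviewed cases to score")
    correct = sum(predicted[k] == truth[k] for k in reviewed)
    return correct / len(reviewed)


# Toy labels from two automated sources (made-up values for illustration).
chexpert_labels: Labels = {
    ("s1", "Pneumonia"): 1,
    ("s1", "Edema"): 0,
    ("s2", "Pneumonia"): None,
    ("s3", "Edema"): 1,
}
llm_labels: Labels = {
    ("s1", "Pneumonia"): 0,
    ("s1", "Edema"): 0,
    ("s2", "Pneumonia"): 1,
    ("s3", "Edema"): 0,
}

# Cases to send for manual review: only those where the sources disagree.
to_review = find_disagreements(chexpert_labels, llm_labels)

# Toy reviewer labels for the flagged cases (made-up values).
reviewer_labels: Labels = {
    ("s1", "Pneumonia"): 0,
    ("s2", "Pneumonia"): 1,
    ("s3", "Edema"): -1,
}

llm_accuracy = accuracy_on(to_review, llm_labels, reviewer_labels)
chexpert_accuracy = accuracy_on(to_review, chexpert_labels, reviewer_labels)
```

In this sketch, a label scheme with more than two values means both sources can be wrong on the same flagged case, so the two accuracies on the disagreement subset need not sum to one.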
Provider:
PhysioNet
Created:
2025-01-30