CheXpert5000
Source: arXiv (indexed 2025-09-30)
Download link:
https://gitlab.uni-hannover.de/sontje.ihler/chexpert5000
Description:
This is a publicly available chest X-ray dataset containing 224,316 radiographs from 65,240 patients. The image labels were generated by applying natural language processing (NLP) to the patients' radiology reports. The dataset also provides each patient's age and sex, along with the view position of each radiograph. To keep evaluation valid and fair, the researchers re-split the dataset into training, validation, and test sets by patient ID, taking care to keep the label distribution consistent across the splits. The focus is a subset of 5,000 labeled samples drawn from this large-scale dataset, with medical image classification as the core task.
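The patient-level re-split described above can be illustrated with a minimal sketch. It is not the authors' procedure: the record fields `patient_id` and `labels`, the 80/10/10 fractions, and both function names are assumptions made for illustration. The sketch assigns whole patients to a split so that no patient appears in more than one, and it only reports per-split label rates rather than enforcing a matching distribution.

```python
import random
from collections import defaultdict


def split_by_patient(records, fractions=(0.8, 0.1, 0.1), seed=0):
    """Assign whole patients to train/val/test so no patient spans two splits.

    `records` is a list of dicts with a hypothetical "patient_id" key.
    """
    by_patient = defaultdict(list)
    for rec in records:
        by_patient[rec["patient_id"]].append(rec)

    # Shuffle patients (not images) with a fixed seed for reproducibility.
    patient_ids = sorted(by_patient)
    random.Random(seed).shuffle(patient_ids)

    n = len(patient_ids)
    n_train = int(fractions[0] * n)
    n_val = int(fractions[1] * n)
    groups = {
        "train": patient_ids[:n_train],
        "val": patient_ids[n_train:n_train + n_val],
        "test": patient_ids[n_train + n_val:],
    }
    return {
        name: [rec for pid in ids for rec in by_patient[pid]]
        for name, ids in groups.items()
    }


def positive_rate(split, label):
    """Fraction of records in a split whose (hypothetical) label equals 1."""
    if not split:
        return 0.0
    return sum(rec["labels"][label] for rec in split) / len(split)
```

Comparing `positive_rate` across the three splits for each label is one simple way to check that the label distribution stays roughly consistent; if it does not, the split can be redrawn with another seed or replaced with a stratified grouping method.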
Provider:
Stanford University



