MultiCaRe: An open-source clinical case dataset for medical image classification and multimodal AI applications

Indexed from NIAID Data Ecosystem on 2026-05-02
Download link:
https://zenodo.org/record/10079369
Description:
The dataset contains multimodal data from over 70,000 open-access, de-identified case reports, including metadata, clinical cases, image captions and more than 130,000 images. Images and clinical cases belong to different medical specialties, such as oncology, cardiology, surgery and pathology. The structure of the dataset makes it easy to map each image to its corresponding article metadata, clinical case, caption and image labels. Details of the data structure can be found in the file data_dictionary.csv.

More than 90,000 patients and 280,000 medical doctors and researchers were involved in the creation of the articles included in this dataset. The citation data of each article can be found in the metadata.parquet file. Refer to the examples showcased in this GitHub repository to learn how to make the most of the dataset.

The license of the dataset as a whole is CC BY-NC-SA. However, its individual contents may have less restrictive license types (CC BY, CC BY-NC, CC0). For instance, among the image files, 66K are CC BY, 32K are CC BY-NC-SA, 32K are CC BY-NC, and 20 are CC0.
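The mapping described above (image to clinical case to article metadata) and the per-item license mix can be illustrated with a minimal Python sketch. It does not read the real files. It uses small in-memory toy records, and every field name in it (`article_id`, `case_id`, `file`, `caption`, `labels`, `license`) is a hypothetical stand-in; the actual column names are documented in data_dictionary.csv. It also assumes, for illustration only, that an image inherits the license of its source article.

```python
from collections import Counter

# Toy records standing in for rows of the real dataset files.
# All field names are hypothetical; see data_dictionary.csv for the real ones.
articles = [
    {"article_id": "A1", "title": "Example case report 1", "license": "CC BY"},
    {"article_id": "A2", "title": "Example case report 2", "license": "CC BY-NC"},
]
cases = [
    {"case_id": "A1_01", "article_id": "A1", "case_text": "Placeholder case text."},
    {"case_id": "A2_01", "article_id": "A2", "case_text": "Placeholder case text."},
]
images = [
    {"file": "img_001.jpg", "case_id": "A1_01", "caption": "Placeholder caption.", "labels": ["ct"]},
    {"file": "img_002.jpg", "case_id": "A1_01", "caption": "Placeholder caption.", "labels": ["mri"]},
    {"file": "img_003.jpg", "case_id": "A2_01", "caption": "Placeholder caption.", "labels": ["x_ray"]},
]


def link_images(images, cases, articles):
    """Attach case text and article metadata to each image record."""
    cases_by_id = {c["case_id"]: c for c in cases}
    articles_by_id = {a["article_id"]: a for a in articles}
    linked = []
    for img in images:
        case = cases_by_id[img["case_id"]]
        article = articles_by_id[case["article_id"]]
        linked.append({
            **img,
            "case_text": case["case_text"],
            "article_id": article["article_id"],
            "title": article["title"],
            # Assumption: the image carries its article's license.
            "license": article["license"],
        })
    return linked


def filter_by_license(rows, allowed):
    """Keep only rows whose license is in the allowed set."""
    return [row for row in rows if row["license"] in allowed]


linked = link_images(images, cases, articles)
license_counts = Counter(row["license"] for row in linked)
permissive = filter_by_license(linked, {"CC BY", "CC0"})
```

Filtering on the per-item license matters when a project needs terms less restrictive than the dataset-wide CC BY-NC-SA: in this toy example, `permissive` keeps only the images whose license is CC BY or CC0.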

Created:
2025-03-09