
MedCD: A Medical Clinical Dataset

IEEE · Indexed 2026-04-17
Download link:
https://ieee-dataport.org/documents/medcd-medical-clinical-dataset
Description:
We curate and release a real-world medical clinical dataset, MedCD, for building generative artificial intelligence (AI) applications in the clinical setting. MedCD is one outcome of our longitudinal applied AI research and deployment at a tertiary care hospital in China. First, the dataset is real and comprehensive: it was sourced from real-world electronic health records (EHRs), clinical notes, lab examination reports, and other records. Second, the dataset is large: it contains 1.7 million EHR examples involving more than 250K patients, collected from 30 clinical departments during the first quarter of 2024, a scale comparable to that of MIMIC-IV. The data was de-identified and organized into a format similar to the MIMIC-IV free-text clinical notes. Third, the dataset is designed to accelerate generative AI research and development in healthcare. Beyond its millions of EHR records, MedCD includes supervised data for a variety of fundamental real-world clinical tasks, produced through months of annotation effort by clinicians. Following the general paradigm of generative AI application development, MedCD consists of: (1) unsupervised pretraining data, in which each patient's data is organized as a medical document; (2) supervised fine-tuning data for a wide spectrum of clinical applications, including NER, retrieval, and summarization; and (3) benchmark data for evaluating fundamental clinical tasks such as patient triage and notes generation. We further describe a range of deployed clinical applications that use this data, as reference implementations and baselines. We believe MedCD is, to date, the most comprehensive and largest-scale clinical dataset in Chinese, and the first designed for generative AI research and development in healthcare.
Provided by:
Chen, Ye