CARMEN-I: A resource of anonymized electronic health records in Spanish and Catalan for training and testing NLP tools
Source: DataCite Commons. Updated: 2024-04-24. Indexed: 2024-07-13.
Download link:
https://physionet.org/content/carmen-i/
Resource description:
The CARMEN-I corpus comprises 2,000 clinical records, encompassing discharge
letters, referrals, and radiology reports from Hospital Clinic of Barcelona
between March 2020 and March 2022. These reports, primarily in Spanish with
some Catalan sections, cover COVID-19 patients with diverse comorbidities like
kidney failure, cardiovascular diseases, malignancies, and immunosuppression.
The corpus was thoroughly anonymized, validated, and annotated by experts,
with sensitive data replaced by synthetic equivalents. In a subset of the
corpus, specialists annotated medical concepts: symptoms, diseases,
procedures, medications, species, and humans (including family members).
CARMEN-I serves as a valuable resource for
training and assessing clinical NLP techniques and language models, aiding
tasks like de-identification, concept detection, linguistic modifier
extraction, and document classification. It can also be used to train
researchers in clinical NLP. The corpus is a collaborative effort involving
Barcelona Supercomputing Center's NLP4BIA team, Hospital Clinic, and
Universitat de Barcelona's CLiC group.
Provider:
PhysioNet
Created:
2023-10-06



