ClinSpEn-CC Sample Set: Parallel English-Spanish COVID-19 Clinical Cases
Source: NIAID Data Ecosystem (indexed 2026-03-14)
Download link:
https://zenodo.org/record/6497350
Description:
The ClinSpEn-CC (clinical case) dataset is a collection of parallel English-Spanish (EN-ES) COVID-19 clinical case reports for use in the Biomedical Translation Task of the 2022 Conference on Machine Translation (WMT).
The case reports were carefully selected to cover a wide range of aspects of the disease: different types of patients (babies, children, adults, elderly and pregnant people), different comorbidities (cancer, mental health issues, immunosuppression) and different symptomatology (mild and severe presentations, dermatologic, immunologic and psychiatric manifestations, thrombosis, …). The reports were first translated from English to Spanish by a professional medical translator and then revised by a clinical expert.
ClinSpEn-CC includes a total of 202 case reports, amounting to almost 4,000 sentences. Each report is provided as two files: the Spanish version has a “.es” extension and the English version a “.en” extension. The two files are aligned line by line, so each sentence sits on the same line number in both languages.
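Because the two files of each report share line numbers, sentence pairs can be recovered by reading both files and zipping their lines. The following is a minimal sketch, not an official loader: the function name `load_parallel_cases`, the assumption that paired files share the same base name, and the UTF-8 encoding are all assumptions not stated in the dataset description.

```python
from pathlib import Path


def load_parallel_cases(root):
    """Collect aligned (English, Spanish) sentence pairs per case report.

    Assumes (hypothetically) that each report is stored as <name>.en and
    <name>.es somewhere under `root`, encoded as UTF-8.
    """
    cases = {}
    for en_path in sorted(Path(root).rglob("*.en")):
        es_path = en_path.with_suffix(".es")
        if not es_path.exists():
            continue  # no Spanish counterpart found; skip this report
        en_lines = en_path.read_text(encoding="utf-8").splitlines()
        es_lines = es_path.read_text(encoding="utf-8").splitlines()
        if len(en_lines) != len(es_lines):
            # The description promises line-level alignment, so a length
            # mismatch most likely signals a damaged or incomplete download.
            raise ValueError(f"Line count mismatch for {en_path.name}")
        cases[en_path.stem] = list(zip(en_lines, es_lines))
    return cases


# Example usage (the directory name is hypothetical):
# pairs = load_parallel_cases("clinspen_cc_sample")
# for en_sentence, es_sentence in pairs["some_case_id"]:
#     print(en_sentence, "|||", es_sentence)
```

Raising on a line-count mismatch, rather than truncating silently as `zip` would, keeps misaligned pairs out of training or evaluation data.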
This repository contains a sample set of 50 cases.
Related Links:
- Data website with more information: https://temu.bsc.es/clinspen/
- WMT website (includes schedule, registration, ...): https://www.statmt.org/wmt22/
ClinSpEn SAMPLE SETS:
- ClinSpEn-CC Sample Set (Clinical Cases): https://doi.org/10.5281/zenodo.6497350
- ClinSpEn-CT Sample Set (Clinical Terms): https://doi.org/10.5281/zenodo.6497372
- ClinSpEn-OC Sample Set (Ontology Concepts): https://doi.org/10.5281/zenodo.6497388
ClinSpEn TEST SETS:
- ClinSpEn-CC Test Set (Clinical Cases): https://doi.org/10.5281/zenodo.6948634
- ClinSpEn-CT Test Set (Clinical Terms): https://doi.org/10.5281/zenodo.6948669
- ClinSpEn-OC Test Set (Ontology Concepts): https://doi.org/10.5281/zenodo.6948679
Created:
2023-03-09



