MultiClinSum Dataset: Summarization of Clinical Case Reports in English, Spanish, French and Portuguese

Indexed from NIAID Data Ecosystem on 2026-05-02
Download link:
https://zenodo.org/record/15188951
Resource description:
MultiClinSum Shared Task Dataset

MultiClinSum is a shared task on the automatic summarization of clinical case reports in English, Spanish, French and Portuguese, held as part of the BioASQ workshop at CLEF 2025. The task relies on a corpus of manually selected full clinical case reports and their corresponding summaries, derived from case report publications written in these four languages. Participants may also use any other data source available online, provided they report it.

This version of the data contains the sample set: a small subset of 20 full-text documents in English and their summaries, meant as a sample of the data that will be used in the task. Both the full texts and their summaries are UTF-8 encoded .txt files. They are stored in separate folders, and the two files of each pair have almost identical filenames, with the summary carrying the suffix "_sum".

Resources:
- MultiClinSum website
- BioASQ website

License
This work is licensed under a Creative Commons Attribution 4.0 International License.
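The file layout described above suggests a simple way to pair each full text with its summary when loading the sample set. The sketch below is a minimal illustration, assuming that a summary is named by appending "_sum" to the full text's filename stem (for example, case1.txt and case1_sum.txt) and that the paths of the two folders are passed in explicitly. The function name `pair_documents` and the folder paths are hypothetical and not part of the official distribution; check the actual filenames in the download before relying on this convention.

```python
from pathlib import Path


def pair_documents(fulltext_dir, summary_dir, suffix="_sum"):
    """Pair each full-text .txt file with its summary (hypothetical helper).

    Assumption: a summary file is named like its full text with `suffix`
    appended to the stem, e.g. case1.txt -> case1_sum.txt. This naming is
    inferred from the dataset description, not from an official spec.
    Returns a dict mapping the full text's stem to (full_text, summary).
    """
    fulltext_dir = Path(fulltext_dir)
    summary_dir = Path(summary_dir)
    pairs = {}
    for full_path in sorted(fulltext_dir.glob("*.txt")):
        sum_path = summary_dir / f"{full_path.stem}{suffix}.txt"
        if not sum_path.exists():
            # Skip full texts without a matching summary instead of failing.
            continue
        # All files are documented as UTF-8, so decode them explicitly.
        pairs[full_path.stem] = (
            full_path.read_text(encoding="utf-8"),
            sum_path.read_text(encoding="utf-8"),
        )
    return pairs
```

For example, `pair_documents("path/to/fulltexts", "path/to/summaries")` (placeholder paths) returns a dictionary keyed by document name. Skipping unmatched files, rather than raising an error, keeps the loader usable if the "almost identical" filenames deviate from the assumed pattern for a few documents; comparing the number of returned pairs with the expected 20 is a quick way to spot such mismatches.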
Contact
If you have any questions or suggestions, please contact us at:
- Salvador Lima-López ()
- Martin Krallinger ()

Additional resources and corpora
If you are interested in MultiClinSum, you might also want to check out these corpora and resources:
- DisTEMIST (corpus of disease mentions and normalization to SNOMED CT)
- MedProcNER (corpus of clinical procedure mentions and normalization to SNOMED CT)
- SympTEMIST (corpus of clinical findings and normalization to SNOMED CT)
- DrugTEMIST (corpus of medication mentions)
- CardioCCC (corpus of disease and medication mentions in cardiology texts)
- PharmaCoNER (corpus of medication, drug, chemical substance, gene, protein and vaccine mentions and normalization)
- MEDDOPROF (corpus of mentions of professions, occupations and working status and normalization)
- MEDDOPLACE (corpus of place-related entity mentions, including departments, nationalities, patient movements, etc., and normalization)
- MEDDOCAN (corpus of Personal Health Identifier (PHI) mentions)
- CANTEMIST (corpus of cancer tumor morphology mentions and normalization)
- CodiESP (corpus of clinical case reports with assigned clinical codes from ICD-10, Spanish version)
- LivingNER (corpus of mentions of species, including humans/family members, pathogens, food, etc., and normalization to NCBI Taxonomy)
- SPACCC-POS (corpus of clinical case reports in Spanish annotated with POS tags)
- SPACCC-TOKEN (corpus of clinical case reports in Spanish annotated with token tags (word mention boundaries))
- SPACCC-SPLIT (corpus of clinical case reports in Spanish annotated with sentence boundary tags)
- MESINESP-2 (corpus of manually indexed records with DeCS/MeSH terms, comprising scientific literature abstracts, clinical trials and patent abstracts)
Created:
2025-04-10