Diagnostic Interview Corpus - Translation

Source: DataCite Commons. Updated: 2026-05-05. Indexed: 2026-05-06.
Download link:
https://yareta.unige.ch/archives/03497afb-3903-4d13-bf76-2d361dd3d117
Description:
The Diagnostic Interview Corpus is a multilingual dataset of 12,754 French medical consultation sentences (questions and instructions) with translations into 12 languages and associated UMLS-based semantic glosses. It supports research on low-resource medical machine translation, semantic representation, and pictograph generation.

Languages
- Source: French
- Targets (in translations.csv): Albanian, Modern Standard Arabic, Tunisian Arabic, Moroccan Arabic, Algerian Arabic, Dari (Afghan Persian), Farsi (Iranian Persian), Russian, English, Spanish, Tigrinya, Ukrainian
- Semantic gloss (in translations.csv): French sentences aligned with UMLS glosses (concept sequences + functional tokens)
- Paraphrases (in paraphrases.csv): French paraphrases aligned with the corresponding French source sentences, generated through a grammar-based approach to ensure controlled syntactic variation

Domains and registers
- Medical consultations
- Questions and instructions (e.g., symptom checks, treatment directives)
- Categories by body region (e.g., head, chest, abdomen)

Features
- Parallel multilingual translations created and adapted with clinical experts
- Semantic gloss layer (UMLS CUIs + functional tokens) for pictograph generation
- Patient-centered simplifications and cultural adaptations to improve comprehension

Example
French: Avez-vous des nausées ou des vomissements ?
English: Do you have nausea or vomiting?
UMLS gloss: You | Nausea | or – article | Vomiting | Question

Intended Use
- Low-resource multilingual MT research
- Semantic representation learning (UMLS-based)
- Pictograph translation systems for patients with limited health literacy
- Evaluation of medical-domain MT beyond surface-level accuracy

Acknowledgements
This corpus was developed in the context of the BabelDr and PictoDr projects at the University of Geneva in collaboration with Geneva University Hospitals. This work is part of the PROPICTO project, funded by the Swiss National Science Foundation (N°197864) and the French National Research Agency (ANR-20-CE93-0005). This project also received funding from the "Fondation Privée des Hôpitaux Universitaires de Genève".
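
The UMLS gloss shown in the example is a pipe-separated sequence of concept and functional tokens. As a minimal sketch, assuming every gloss uses the same " | " separator as that single example (the description does not specify the format further, and the column names in translations.csv are not given here), a gloss string could be split into tokens like this. The function name parse_gloss is hypothetical.

```python
def parse_gloss(gloss: str) -> list[str]:
    """Split a pipe-delimited gloss string into trimmed, non-empty tokens.

    Assumption: tokens are separated by '|', as in the one example
    given in the dataset description.
    """
    return [token.strip() for token in gloss.split("|") if token.strip()]


# Gloss string copied from the example in the description.
example = "You | Nausea | or – article | Vomiting | Question"
tokens = parse_gloss(example)
print(tokens)
```

Each resulting token would then correspond to one concept or functional marker, which is the unit a pictograph generation step would presumably consume.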
Provider:
Université de Genève, Yareta
Created:
2025-10-17