li-lab/MultiMed-X

Source: Hugging Face. Updated 2026-02-04; indexed 2026-02-07.
Download link:
https://hf-mirror.com/datasets/li-lab/MultiMed-X
Resource summary:
MultiMed-X is a multilingual benchmark for evaluating medical reasoning on two tasks: natural language inference (NLI) and open-ended question answering (QA). It is designed to assess the reasoning quality, factual accuracy, and localization of large language models in non-English medical settings, with particular emphasis on low-resource languages. The dataset was built by translating two established English medical benchmarks, BioNLI and LiveQA, and having experts validate the translations. It covers 7 non-English languages: Chinese (ZH), Japanese (JA), Korean (KO), Swahili (SW), Thai (TH), Yoruba (YO), and Zulu (ZU). Each instance was independently reviewed and revised by bilingual medical experts to ensure clinical correctness and linguistic naturalness. The data is formatted as a unified table with fields such as id, lang, task, source, text, and label. There are 350 instances per language, approximately 2,450 in total, annotated and validated by about 12 physicians or senior medical students. Intended uses include multilingual medical reasoning evaluation, cross-lingual robustness analysis, and low-resource language benchmarking. The dataset is not intended for clinical deployment or direct medical decision-making.
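The unified table described above lends itself to a quick sanity check before evaluation. The following is a minimal sketch, not the dataset authors' tooling: it assumes each row is a dict carrying the six fields named in the summary (the real schema may include more), that `lang` holds the uppercase codes listed above, and that `task` takes values like `"nli"` and `"qa"`. The helper `summarize` and the sample rows are hypothetical and are not taken from the dataset.

```python
from collections import Counter

# Language codes listed in the summary (assumed to be stored uppercase).
LANGS = {"ZH", "JA", "KO", "SW", "TH", "YO", "ZU"}
# Fields named in the summary; the actual table may contain others.
FIELDS = ("id", "lang", "task", "source", "text", "label")


def summarize(rows):
    """Count rows per (lang, task) after checking fields and language codes."""
    counts = Counter()
    for row in rows:
        missing = [f for f in FIELDS if f not in row]
        if missing:
            raise ValueError(f"row {row.get('id')!r} is missing fields: {missing}")
        if row["lang"] not in LANGS:
            raise ValueError(f"row {row['id']!r} has unexpected lang {row['lang']!r}")
        counts[(row["lang"], row["task"])] += 1
    return counts


# Hypothetical rows for illustration only; values are placeholders.
sample_rows = [
    {"id": "ex-1", "lang": "SW", "task": "nli", "source": "BioNLI",
     "text": "...", "label": "entailment"},
    {"id": "ex-2", "lang": "SW", "task": "qa", "source": "LiveQA",
     "text": "...", "label": ""},
    {"id": "ex-3", "lang": "ZU", "task": "nli", "source": "BioNLI",
     "text": "...", "label": "contradiction"},
]

counts = summarize(sample_rows)

# Size implied by the summary: 350 instances for each of the 7 languages.
expected_total = 350 * len(LANGS)
```

On the full table, comparing the per-language totals against 350 and the overall row count against `expected_total` would flag a truncated or partial download.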
Provider:
li-lab