ekacare/Eka-IndicMTEB

Source: Hugging Face | Updated: 2025-11-19 | Indexed: 2025-11-15
Download link:
https://hf-mirror.com/datasets/ekacare/Eka-IndicMTEB
Description:
Eka-IndicMTEB is an evaluation dataset of Indian multilingual medical terms, designed to measure how well embedding models handle medical terminology across multiple Indic languages and scripts. It contains 2,532 doctor-verified queries that capture the linguistic and domain-specific diversity of the Indian healthcare ecosystem. The medical entities span symptoms, diagnoses, procedures, medications, and related concepts, and are enriched with real-world linguistic variations such as spelling errors, special characters, abbreviations, and colloquial expressions. Languages covered include English, Hindi, Bengali, Tamil, Telugu, Kannada, Marathi, and Malayalam.
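
This listing does not state which metric or protocol the dataset uses, so the following is only a minimal sketch of one common way to score an embedding model on term queries: embed each query, find the nearest candidate concept by cosine similarity, and count how often it matches the verified label. The function names, the top-1 accuracy metric, and the 2-d vectors are all assumptions made up for illustration; none of them come from Eka-IndicMTEB itself.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top1_accuracy(query_vecs, gold_ids, concept_vecs):
    """Fraction of queries whose nearest concept (by cosine) is the gold concept."""
    hits = 0
    for q, gold in zip(query_vecs, gold_ids):
        best = max(concept_vecs, key=lambda cid: cosine(q, concept_vecs[cid]))
        hits += best == gold
    return hits / len(gold_ids)


# Toy 2-d vectors standing in for real model embeddings (hypothetical values,
# not taken from the dataset).
concept_vecs = {"fever": [1.0, 0.0], "cough": [0.0, 1.0]}
query_vecs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
gold_ids = ["fever", "cough", "cough"]

score = top1_accuracy(query_vecs, gold_ids, concept_vecs)
print(f"top-1 accuracy: {score:.3f}")
```

In a real run, the toy vectors would be replaced by the output of the embedding model under test, and the score could be reported per language or per script to show where the model struggles.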
Provider:
ekacare