surrey-nlp/health-QE
Source: Hugging Face. Updated 2026-01-30; indexed 2026-02-07.
Download link:
https://hf-mirror.com/datasets/surrey-nlp/health-QE
Description:
This is a multilingual dataset for translation tasks, pairing English (en) with four Indian languages: Tamil (ta), Gujarati (gu), Marathi (mr), and Hindi (hi). It is tagged "medical" and "biology", which marks it as domain-specific. The dataset provides one configuration per language pair (en-gujarati, en-hindi, en-marathi, en-tamil), each with training, validation, and test splits. Its features are source text, target text, scores, mean, z-score, domain, ID, and language pair, which suggests it is intended for translation quality estimation. The size category places it at medium scale, with between 10,000 and 100,000 examples per configuration.
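The configurations described above could be loaded along the following lines. This is a minimal sketch, not an official loader: it assumes the Hugging Face `datasets` library is installed and that the repository is reachable under the ID in the title. The helper `load_health_qe` and the constants `REPO_ID` and `CONFIGS` are hypothetical names introduced here for illustration; the configuration names are taken from the description, and the exact split and column names should be checked on the loaded object rather than assumed.

```python
# Sketch: load one language-pair configuration of surrey-nlp/health-QE.
# REPO_ID, CONFIGS, and load_health_qe are illustrative names, not part of the dataset.

REPO_ID = "surrey-nlp/health-QE"

# Configuration names as listed in the dataset description.
CONFIGS = ("en-gujarati", "en-hindi", "en-marathi", "en-tamil")


def load_health_qe(config: str):
    """Return all splits of one language-pair configuration.

    Raises ValueError for a configuration that the description does not list.
    """
    if config not in CONFIGS:
        raise ValueError(f"unknown configuration {config!r}; expected one of {CONFIGS}")
    # Imported lazily so the name check above works without the library installed.
    from datasets import load_dataset  # third-party: pip install datasets

    # The second positional argument selects the configuration (language pair).
    return load_dataset(REPO_ID, config)
```

Calling `load_health_qe("en-hindi")` downloads the data on first use; printing the returned object shows the available splits and their column names, which is the safest way to confirm the schema before relying on it.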
Provider:
surrey-nlp



