gaHealth
Source: arXiv (indexed 2025-09-30)
Download link:
https://github.com/seamusl/gaHealth
Description:
The gaHealth corpus is a health-domain dataset for training English-to-Irish (EN-GA) and Irish-to-English (GA-EN) neural machine translation (NMT) models, with a particular emphasis on COVID-19-related data. It is used to evaluate models trained within the adaptNMT framework and allows their performance to be compared with that of models from other research teams. The corpus contains about 16,000 lines of parallel text and is used in combination with 8,000 lines from the LoResMT2021 corpus.
Provider:
AdaptNMT



