chungimungi/iitb-english-hindi-split
Hugging Face | Updated 2024-12-13 | Indexed 2024-12-14
Download link:
https://hf-mirror.com/datasets/chungimungi/iitb-english-hindi-split
Description:
The IIT Bombay English-Hindi corpus contains a parallel English-Hindi corpus as well as a monolingual Hindi corpus, collected from a variety of existing sources and from corpora developed over the years at the Center for Indian Language Technology, IIT Bombay. The corpus has been used in the Workshop on Asian Language Translation Shared Task since 2016 for the Hindi-to-English and English-to-Hindi language pairs, and as a pivot language pair for the Hindi-to-Japanese and Japanese-to-Hindi language pairs.
The corpus received two major updates, in 2020 and 2021, which added roughly 47,000 and 49,400 sentence pairs respectively. Usage guidance is also provided, covering how to import the corpus from the Hugging Face datasets library and how to use BPE tokenization to train an English-Hindi machine translation system.
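Since the description refers to importing the corpus from the Hugging Face datasets library, the following is a minimal sketch of what that might look like. The repository name is taken from this page; the helper names `load_corpus` and `iter_pairs` are hypothetical, and the record layout (a `translation` field holding `en` and `hi` strings) is an assumption based on the upstream IIT Bombay dataset, so inspect the loaded object before relying on it.

```python
from typing import Iterable, Iterator, Mapping, Tuple

# Repository name as listed on this page.
DATASET_ID = "chungimungi/iitb-english-hindi-split"


def load_corpus(dataset_id: str = DATASET_ID):
    """Download the corpus with the Hugging Face `datasets` library.

    Requires `pip install datasets` and network access.
    """
    # Imported lazily so iter_pairs() works without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(dataset_id)


def iter_pairs(
    records: Iterable[Mapping],
    src: str = "en",
    tgt: str = "hi",
) -> Iterator[Tuple[str, str]]:
    """Yield (source, target) sentence pairs from translation records.

    ASSUMPTION: each record looks like
    {"translation": {"en": "...", "hi": "..."}}.
    Adjust the field names if this split uses a different schema.
    """
    for record in records:
        pair = record["translation"]
        yield pair[src], pair[tgt]
```

A typical session would call `corpus = load_corpus()`, print `corpus` to see which splits and columns actually exist, and then pass one split to `iter_pairs` to obtain sentence pairs for tokenization and training.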
Provider:
chungimungi



