IIT Bombay English-Hindi Parallel Corpus
Source: arXiv | Updated: 2018-05-20 | Indexed: 2024-06-21
Download link:
http://www.cfilt.iitb.ac.in/iitb_parallel
Resource description:
The IIT Bombay English-Hindi Parallel Corpus is a large parallel corpus created by the Center for Indian Language Technology at the Indian Institute of Technology Bombay for machine translation research between English and Hindi. It contains 1.49 million parallel segments covering a range of domains and applications, including law, administration, and education. The corpus combines parallel text collected and consolidated from publicly available sources with newly gathered text from internal projects and course projects. It has been used in several machine translation shared tasks, with the aim of improving the quality and efficiency of English-Hindi translation.
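A minimal sketch of how a parallel corpus of this kind is typically read, assuming the data is stored as two line-aligned UTF-8 text files (one English, one Hindi, one segment per line). That layout, the function name read_parallel, and the file names in the usage note are illustrative assumptions, not details taken from this page; check the files obtained from the download link for the actual format.

```python
from itertools import islice
from typing import Iterator, Optional, Tuple


def read_parallel(
    en_path: str, hi_path: str, limit: Optional[int] = None
) -> Iterator[Tuple[str, str]]:
    """Yield (english, hindi) pairs from two line-aligned UTF-8 files.

    Assumption: line i of en_path is the translation of line i of hi_path.
    `limit` caps the number of raw line pairs read, before empty pairs
    are filtered out.
    """
    with open(en_path, encoding="utf-8") as en_file, \
            open(hi_path, encoding="utf-8") as hi_file:
        # zip stops at the shorter file, so a trailing mismatch is ignored.
        for en_line, hi_line in islice(zip(en_file, hi_file), limit):
            en, hi = en_line.strip(), hi_line.strip()
            # Skip pairs where either side is empty.
            if en and hi:
                yield en, hi
```

For example, with hypothetical file names, `for en, hi in read_parallel("train.en", "train.hi", limit=5): print(en, "|||", hi)` would print the first few aligned segments for a quick sanity check.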
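Provider:
Center for Indian Language Technology, Department of Computer Science and Engineering, Indian Institute of Technology Bombay
Created: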
2017-10-09



