Bengali to Assamese parallel corpus
Source: arXiv | Updated: 2015-04-06 | Indexed: 2024-08-06
Download link:
http://arxiv.org/abs/1504.01182v1
Description:
The study uses a dataset called the 'Bengali to Assamese parallel corpus', created jointly by the Royal Academy of Engineering and Technology and Gauhati University. It contains approximately 17,100 parallel Bengali and Assamese sentence pairs, drawn mainly from novels, stories, and material on travel and tourism in India. Building the dataset involved sentence alignment and corpus preparation. It is intended to support research on statistical machine translation from Bengali to Assamese and so to ease communication and understanding between speakers of the two languages.
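Provided by:
Department of Computer Science and Engineering, Royal Academy of Engineering and Technology; Department of Information Technology, Gauhati University
Created: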
2015-04-06