Sentiment-Labeled English-Bengali Parallel Corpus
Source: arXiv | Updated: 2020-07-28 | Indexed: 2024-08-06
Download link:
http://arxiv.org/abs/2007.14074v1
Resource description:
The sentiment-labeled English-Bengali parallel corpus is a dataset of 107,984 parallel sentence pairs created at Jadavpur University. It aims to improve machine translation output by bringing sentiment analysis into the translation pipeline. The corpus was built by collecting English sentences, translating them into Bengali, and annotating them for sentiment using several lexicons; the sentences are divided into simple sentences and other, more complex sentence types. Its construction involved text simplification, sentiment analysis, and the training of neural machine translation models. The dataset is intended mainly for studying how sentiment features affect machine translation performance, particularly translation quality and fluency.
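As a rough illustration of lexicon-based sentiment annotation on parallel pairs, the sketch below labels the English side of each pair by counting lexicon hits and attaches that label to the pair. The word lists, the three-way label set (positive, negative, neutral), the function names, and the example sentences are all assumptions made for illustration; they are not the lexicons, labels, or data used to build the corpus.

```python
# Toy lexicons: hypothetical stand-ins, NOT the lexicons used for the corpus.
POSITIVE = {"good", "happy", "love", "excellent"}
NEGATIVE = {"bad", "sad", "hate", "terrible"}


def label_sentiment(sentence: str) -> str:
    """Label a sentence by comparing positive and negative lexicon hits."""
    # Lowercase each whitespace-separated token and strip edge punctuation.
    tokens = [t.strip(".,!?;:\"'").lower() for t in sentence.split()]
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def tag_pairs(pairs):
    """Attach the English-side sentiment label to each (english, bengali) pair."""
    return [(en, bn, label_sentiment(en)) for en, bn in pairs]


# Illustrative pairs only; these sentences are not taken from the corpus.
example_pairs = [
    ("I love this book.", "আমি এই বইটি ভালোবাসি।"),
    ("The weather is bad.", "আবহাওয়া খারাপ।"),
]

for en, bn, label in tag_pairs(example_pairs):
    print(f"{label}\t{en}\t{bn}")
```

Labeling only the English side keeps the sketch small; a pipeline like the one described could equally consult Bengali lexicons or combine several lexicons by voting.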
Provided by:
Jadavpur University
Created:
2020-07-28



