
Data Augmentation for Low-resource Neural Machine Translation Based on Semantic Related Word Replacement and Grammatical Error Correction

Source: Science Data Bank. Updated: 2021-12-10. Indexed: 2026-04-23.
Download link:
https://www.scidb.cn/en/detail?dataSetId=d83ceeecfbe549b9a0fe8f4202a3b7fd
Resource description:
This paper proposes a data augmentation method for low-resource neural machine translation based on semantically related word replacement and grammatical error correction. First, the low-resource language data is augmented by replacing words with semantically related words. Second, the augmented bilingual parallel corpus is grammatically corrected so that it conforms to linguistic syntax and common-sense reasoning. The results show that the proposed method not only ensures the quantity of the training corpus but also improves its quality, achieves effective data augmentation for low-resource languages, and further improves neural machine translation performance for those languages.
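The two steps described above (related-word replacement, then grammatical correction) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the `RELATED_WORDS` lexicon, the function names, and the single a/an rule that stands in for a real grammatical error correction model are all hypothetical. It also handles only one side of a sentence pair and does not keep the other side of the parallel corpus aligned.

```python
import random

# Hypothetical toy lexicon of semantically related words (an assumption;
# the description does not say how related words are obtained).
RELATED_WORDS = {
    "cat": ["dog", "rabbit"],
    "apple": ["orange", "banana"],
}


def replace_related(tokens, lexicon, rng):
    """Step 1: swap one word found in the lexicon for a related word."""
    candidates = [i for i, tok in enumerate(tokens) if tok in lexicon]
    if not candidates:
        return list(tokens)  # nothing to replace, return an unchanged copy
    i = rng.choice(candidates)
    out = list(tokens)
    out[i] = rng.choice(lexicon[out[i]])
    return out


def correct_articles(tokens):
    """Step 2 stand-in: one toy rule that repairs a/an agreement.

    A real pipeline would use a trained grammatical error correction model.
    """
    out = list(tokens)
    for i in range(len(out) - 1):
        if out[i] in ("a", "an"):
            out[i] = "an" if out[i + 1][0].lower() in "aeiou" else "a"
    return out


def augment(sentence, lexicon=RELATED_WORDS, seed=0):
    """Apply replacement, then correction, to a whitespace-tokenized sentence."""
    rng = random.Random(seed)
    tokens = sentence.split()
    return " ".join(correct_articles(replace_related(tokens, lexicon, rng)))


if __name__ == "__main__":
    print(augment("she ate an apple"))
```

Replacing "apple" with "banana" would leave the ungrammatical "an banana" after step 1. The correction step repairs it to "a banana", which is the kind of cleanup the second stage is meant to provide.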
Provider:
Xiaobing Zhao; School of Information Engineering, Minzu University of China; National Language Resource Monitoring & Research Center of Minority Languages, Minzu University of China
Created:
2021-12-08