
Monolingual Data Augmentation Based Chinese-to-Tibetan Machine Translation

Science Data Bank, updated 2021-12-10, indexed 2026-04-23
Download link:
https://www.scidb.cn/en/detail?dataSetId=fde027e80ae14c30a87ff9fff7be21e3
Description:
Chinese-Tibetan machine translation is valuable for promoting the general use of the national language, for cross-ethnic communication in Tibetan areas, and for the implementation and promotion of policies. Recently, neural machine translation has made great progress on language pairs such as Chinese-English and English-French, helped by large-scale training data and the powerful modeling capabilities of deep learning models. Neural machine translation usually relies on large-scale parallel sentence pairs for training. However, translation between Chinese and minority languages, such as Chinese-Tibetan and Chinese-Mongolian, lacks large-scale bilingual parallel data, which limits translation performance for these language pairs. To address this shortage of training data, this paper takes advantage of the large-scale annotated data and mature natural language processing tools available for Chinese, and proposes using monolingual data augmentation to expand the set of Chinese-Tibetan parallel sentence pairs and thereby improve the quality of Chinese-Tibetan machine translation. The paper applies a variety of data augmentation methods. Results on three main neural machine translation models show that monolingual data augmentation can effectively improve translation quality, which provides a reference for machine translation between Chinese and ethnic languages.
Provided by:
Institute of Ethnology and Anthropology, Chinese Academy of Social Sciences; Congjun Long
Created:
2021-12-08