Monolingual Data Augmentation based Chinese to Tibetan Machine Translation

Name: Monolingual Data Augmentation based Chinese to Tibetan Machine Translation
Creator: Institute of Ethnology and Anthropology, Chinese Academy of Social Sciences; Congjun Long
Published: 2021-12-10
License: No description available

Source: Science Data Bank; updated 2021-12-10; indexed 2026-04-23

Download link:

https://www.scidb.cn/en/detail?dataSetId=fde027e80ae14c30a87ff9fff7be21e3


Description:

Chinese-Tibetan machine translation is of great value for promoting the national common language, facilitating cross-ethnic communication in Tibetan areas, and implementing and promoting policies. In recent years, neural machine translation (NMT) has made great progress on language pairs such as Chinese-English and English-French, driven by large-scale training data and the strong modeling capacity of deep learning models. NMT usually relies on large numbers of parallel sentence pairs for training. However, pairs of Chinese and minority languages, such as Chinese-Tibetan and Chinese-Mongolian, lack large-scale bilingual parallel data, so machine translation between Chinese and these languages performs poorly. To address this shortage of training data, this paper takes advantage of the large-scale annotated data and mature natural language processing tools available for Chinese, and proposes using monolingual data augmentation to expand the set of Chinese-Tibetan parallel sentence pairs and thereby improve Chinese-Tibetan translation quality. Several data augmentation methods are evaluated. Results on three mainstream NMT models show that monolingual data augmentation effectively improves translation quality, providing a reference for machine translation between Chinese and ethnic minority languages.


Created: 2021-12-08
