A dataset of Mongolian-Chinese speech translation

Name: A dataset of Mongolian-Chinese speech translation
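Creator: School of Chinese Ethnic Minority Languages and Literatures, Minzu University of China; Xiaobing Zhao; China University of Political Science and Law; School of Chinese Ethnic Minority Languages and Literatures, Minzu University of China; Borjigin B.Teniger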
Published: 2022-05-07 00:00:00
License: No description available

Science Data Bank; updated 2022-05-07; indexed 2026-04-23

Download link:

https://www.scidb.cn/en/detail?dataSetId=ccb197e059de4d9ea06c351aa3c60bb6

Description:

Because public datasets are scarce, little research has focused on speech translation for minority languages. To address this, the paper constructs a Mongolian-Chinese speech translation dataset named NMLR-Mon2Chs ST. The dataset consists of Mongolian speech, Mongolian text, and Chinese text. First, Mongolian speech was obtained from 36 Mongols aged between 20 and 25, who recorded it on their mobile phones. Then the corresponding Chinese texts were annotated by professionals. To ensure the quality of the dataset, the recordings were preprocessed, including removal of quiet speech, resampling, and normalization. The result is a total of 25 hours of high-quality data, with an average audio duration of 4.2 seconds. The dataset gives researchers a resource for studying speech translation for minority languages.
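
The description names three preprocessing steps (removing quiet speech, resampling, normalization) but gives no parameters or tooling. The following is a minimal pure-Python sketch of what such a pipeline might look like, not the authors' implementation. Everything specific here is an assumption for illustration: the function names, the RMS threshold and frame length, the 16 kHz target rate, the reading of "removing quiet speech" as trimming low-energy leading and trailing frames, and the reading of "normalization" as peak amplitude normalization.

```python
import math

def trim_quiet(samples, frame_len=400, threshold=0.01):
    """Drop leading/trailing frames whose RMS is below threshold (assumed values)."""
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]

    def rms(frame):
        return math.sqrt(sum(x * x for x in frame) / len(frame))

    loud = [i for i, frame in enumerate(frames) if rms(frame) >= threshold]
    if not loud:
        return []  # the whole clip is quiet
    start = loud[0] * frame_len
    end = min((loud[-1] + 1) * frame_len, len(samples))
    return samples[start:end]

def resample_linear(samples, src_rate, dst_rate):
    """Resample by linear interpolation (no anti-aliasing filter; illustration only)."""
    if not samples or src_rate == dst_rate:
        return list(samples)
    n_out = max(1, int(len(samples) * dst_rate / src_rate))
    out = []
    for j in range(n_out):
        pos = j * src_rate / dst_rate          # fractional index into the input
        i = int(pos)
        frac = pos - i
        nxt = samples[min(i + 1, len(samples) - 1)]
        out.append(samples[i] * (1 - frac) + nxt * frac)
    return out

def peak_normalize(samples, target_peak=0.9):
    """Scale so the largest absolute sample equals target_peak (assumed value)."""
    peak = max((abs(x) for x in samples), default=0.0)
    if peak == 0.0:
        return list(samples)
    return [x * target_peak / peak for x in samples]

def preprocess(samples, src_rate, dst_rate=16000):
    """Hypothetical pipeline: trim, then resample, then normalize."""
    trimmed = trim_quiet(samples)
    return peak_normalize(resample_linear(trimmed, src_rate, dst_rate))
```

Samples are assumed to be floats in the range -1.0 to 1.0. In practice an audio library with a proper band-limited resampler would be preferable to the linear interpolation used here, which is kept only to make the sketch dependency-free.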

Provided by:

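School of Chinese Ethnic Minority Languages and Literatures, Minzu University of China; Xiaobing Zhao; China University of Political Science and Law; School of Chinese Ethnic Minority Languages and Literatures, Minzu University of China; Borjigin B.Teniger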

Created:

2021-12-21