
A dataset of Mongolian-Chinese speech translation

Science Data Bank · updated 2022-05-07 · indexed 2026-04-23
Download link:
https://www.scidb.cn/en/detail?dataSetId=ccb197e059de4d9ea06c351aa3c60bb6
Summary:
Due to the lack of public datasets, little research has addressed speech translation for minority languages. To help fill this gap, this paper constructs a Mongolian-Chinese speech translation dataset named NMLR-Mon2Chs ST, which consists of Mongolian speech paired with Mongolian and Chinese text. First, Mongolian speech was collected from 36 Mongolian speakers aged 20 to 25, who recorded it on their mobile phones. The corresponding Chinese texts were then annotated by professionals. To ensure data quality, the audio was preprocessed, including silence removal, resampling, and normalization. The result is 25 hours of high-quality data in total, with an average clip duration of 4.2 seconds. This dataset gives researchers a resource for studying speech translation in minority languages.
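The summary names three preprocessing steps (silence removal, resampling, normalization) but gives no parameters. The sketch below shows one way such a pipeline could look in plain Python. Everything concrete in it is an assumption for illustration, not the dataset authors' settings: the function names are hypothetical, and the 16 kHz target rate, 25 ms frames, -40 dBFS energy threshold, linear-interpolation resampling, and peak normalization to 0.95 are all chosen here rather than taken from the source.

```python
import math
from typing import List


def trim_silence(samples: List[float], sr: int, frame_ms: float = 25.0,
                 threshold_db: float = -40.0) -> List[float]:
    """Drop leading/trailing frames whose RMS level is below threshold_db
    (dBFS, full scale = 1.0). Pauses inside the utterance are kept.
    Frame size and threshold are assumed values."""
    frame_len = max(1, int(sr * frame_ms / 1000))
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]

    def is_voiced(frame: List[float]) -> bool:
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        return rms > 0 and 20 * math.log10(rms) > threshold_db

    voiced = [i for i, f in enumerate(frames) if is_voiced(f)]
    if not voiced:
        return []  # the whole clip is silence
    start = voiced[0] * frame_len
    end = min(len(samples), (voiced[-1] + 1) * frame_len)
    return samples[start:end]


def resample_linear(samples: List[float], src_sr: int, dst_sr: int) -> List[float]:
    """Linear-interpolation resampler (assumption). A production pipeline
    would use a band-limited resampler to avoid aliasing."""
    if src_sr == dst_sr or not samples:
        return list(samples)
    n_out = int(round(len(samples) * dst_sr / src_sr))
    last = len(samples) - 1
    out = []
    for j in range(n_out):
        t = j * src_sr / dst_sr  # position measured in source samples
        i = int(t)
        frac = t - i
        a = samples[min(i, last)]
        b = samples[min(i + 1, last)]
        out.append(a + (b - a) * frac)
    return out


def peak_normalize(samples: List[float], target_peak: float = 0.95) -> List[float]:
    """Scale so the largest absolute sample equals target_peak (assumed
    normalization scheme; loudness/RMS normalization is another option)."""
    peak = max((abs(x) for x in samples), default=0.0)
    if peak == 0.0:
        return list(samples)
    scale = target_peak / peak
    return [x * scale for x in samples]


def preprocess(samples: List[float], src_sr: int, dst_sr: int = 16000) -> List[float]:
    """Hypothetical pipeline: trim silence, resample, then normalize."""
    trimmed = trim_silence(samples, src_sr)
    resampled = resample_linear(trimmed, src_sr, dst_sr)
    return peak_normalize(resampled)


if __name__ == "__main__":
    sr = 44100
    silence = [0.0] * (sr // 2)  # 0.5 s of silence
    tone = [0.3 * math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]  # 1 s tone
    clip = preprocess(silence + tone + silence, sr)
    print(len(clip) / 16000, "seconds after preprocessing")
```

Trimming only the leading and trailing silence keeps natural pauses inside an utterance; dropping every low-energy frame would distort the timing that a speech translation model sees. Normalizing after resampling means the final peak is set on the audio actually stored.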
Provided by:
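School of Chinese Ethnic Minority Languages and Literatures, Minzu University of China; Xiaobing Zhao; China University of Political Science and Law; School of Chinese Ethnic Minority Languages and Literatures, Minzu University of China; Borjigin B.Teniger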
Created:
2021-12-21