TrainMan, TrainCS, DevCS
Source: arXiv. Updated: 2020-07-12. Indexed: 2024-07-25.
Download link:
https://www.datatang.com/competition
Overview:
This dataset, released by Datatang (Beijing) Technology Co., Ltd., comprises three subsets: TrainMan (Mandarin-only speech, 500 hours), TrainCS (Mandarin-English code-switching speech, 200 hours), and DevCS (Mandarin-English code-switching speech, 40 hours). The recordings were collected from smartphone users across 30 provinces in China and cover multiple domains, including entertainment and travel. The dataset is intended primarily for improving Mandarin-English code-switching speech recognition systems, in particular their handling of switches between the two languages within an utterance.
Provider:
Datatang (Beijing) Technology Co., Ltd.
Created:
2020-07-12



