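Datatang: 1,025 Hours of Heavily Accented Mandarin Speech Data Collected by Mobile Phone
ModelScope community (魔搭社区). Updated 2025-12-02. Indexed 2024-05-15.
Download link: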
https://modelscope.cn/datasets/DatatangBeijing/1025Hours-MandarinStrongAccentSpeechDataByMobilePhone
Resource description:
This 1,025-hour dataset of heavily accented Mandarin speech was collected via mobile phones from more than 2,000 native speakers in China. Most speakers come from southern China, with some from northern provinces where heavy accents are common, and the gender distribution is balanced. All recordings are in heavily accented Mandarin with distinct regional characteristics. The content covers several categories, including mobile voice assistant interactions, smart home commands, in-vehicle command words, and digits, closely matching practical application scenarios such as smart home and in-vehicle systems.
Provider:
maas
Created:
2024-05-07
Collected summary
Dataset introduction

Background and challenges
Background overview
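This dataset is a speech resource for testing Chinese speech recognition models. It contains 1,025 hours of heavily accented Mandarin speech collected by mobile phone from more than 2,000 speakers in China, mainly from southern regions, with a balanced gender distribution. The content covers varied scenarios such as mobile assistant interactions, smart home commands, and in-vehicle commands, close to real-world use. The audio is uncompressed mono WAV at 16 kHz, 16-bit.
The content above was collected and summarized by 遇见数据集.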
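The audio format stated in the overview (16 kHz, 16-bit, mono, uncompressed WAV) can be checked against a downloaded file, and it also allows a rough size estimate. The following is a minimal sketch using only Python's standard `wave` module; the function names `matches_stated_format` and `estimated_pcm_bytes` are hypothetical helpers introduced here for illustration and are not part of the dataset or of any ModelScope tooling.

```python
import wave

# Format stated in the dataset overview: 16 kHz, 16-bit, mono, uncompressed WAV.
EXPECTED_RATE_HZ = 16_000
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16 bits per sample
EXPECTED_CHANNELS = 1


def matches_stated_format(path):
    """Return True if the WAV header at `path` matches the stated format.

    Hypothetical helper for illustration; `path` is any local WAV file.
    """
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == EXPECTED_RATE_HZ
            and wav.getsampwidth() == EXPECTED_SAMPLE_WIDTH_BYTES
            and wav.getnchannels() == EXPECTED_CHANNELS
            and wav.getcomptype() == "NONE"  # "NONE" means uncompressed PCM
        )


def estimated_pcm_bytes(hours):
    """Estimate the raw PCM payload size in bytes for `hours` of audio.

    Counts audio samples only; WAV headers, transcripts, and metadata
    are not included.
    """
    bytes_per_second = (
        EXPECTED_RATE_HZ * EXPECTED_SAMPLE_WIDTH_BYTES * EXPECTED_CHANNELS
    )
    return int(hours * 3600 * bytes_per_second)
```

Under these assumptions, `estimated_pcm_bytes(1025)` gives 118,080,000,000 bytes, or roughly 118 GB of raw audio. The actual download size may differ, since the listing does not state how the files are packaged.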



