Datatang: 1,505 Hours of Mandarin Speech Data Collected by Mobile Phone
ModelScope community · Updated 2025-12-25 · Indexed 2024-05-15
Download link:
https://modelscope.cn/datasets/DatatangBeijing/1505Hours-MandarinSpeechbyMobilePhone
Resource description:
This dataset contains 1,505 hours of Mandarin speech collected by mobile phone. It was recorded by 6,278 speakers from 33 provinces of China, including Guangdong, Fujian, Shandong, Jiangsu, Beijing, and Hunan. Of these speakers, 2,980 are male and 3,298 are female. The recordings consist of commonly used spoken sentences, captured in both quiet and noisy environments. All transcripts were transcribed and proofread by professional annotators, with an accuracy of no less than 98%.
Provided by:
maas
Created:
2024-05-06
Collected summary
Dataset introduction

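Background and challenges
Background overview
This dataset contains 1,505 hours of Mandarin speech collected by mobile phone and is intended for testing Mandarin speech recognition models. It was recorded by 6,278 speakers from 33 provinces of China and covers common spoken sentences in both quiet and noisy environments. The audio is 16 kHz uncompressed WAV, and transcription accuracy is no less than 98%.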
The content above was collected and summarized by 遇见数据集.
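
The background overview states that the audio is 16 kHz uncompressed WAV. After downloading, that claim can be spot-checked locally. The following is a minimal sketch using only Python's standard `wave` module. It assumes the audio files carry a `.wav` extension and sit somewhere under a single local directory; the actual archive layout is not described on this page, and the function names `describe_wav` and `find_unexpected_rates` are illustrative, not part of any dataset tooling.

```python
import wave
from pathlib import Path


def describe_wav(path):
    """Read the header of an uncompressed PCM WAV file and return its basic properties."""
    with wave.open(str(path), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "sample_rate_hz": rate,
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "duration_s": frames / rate if rate else 0.0,
        }


def find_unexpected_rates(root, expected_rate_hz=16000):
    """Return paths of .wav files under `root` whose sample rate differs from the expected one.

    Assumption: files use the `.wav` extension; the real directory layout is unknown.
    """
    mismatched = []
    for path in sorted(Path(root).rglob("*.wav")):
        if describe_wav(path)["sample_rate_hz"] != expected_rate_hz:
            mismatched.append(path)
    return mismatched
```

Calling `find_unexpected_rates("path/to/extracted/dataset")` returns an empty list when every file matches the stated 16 kHz rate. Summing `duration_s` over all files gives a rough check against the advertised 1,505 hours.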



