Mandarin Speech Data Collected by Mobile Phone [数据堂 (Datatang)]
Source: OpenDataLab. Updated 2023-12-12; indexed 2024-05-09.
Download link:
https://opendatalab.org.cn/shujutang/shujutang35
Resource description:
This 1,505-hour Mandarin speech dataset was recorded on mobile phones by 6,278 Chinese speakers from 33 provinces, including Guangdong, Fujian, Shandong, Jiangsu, Beijing, and Hunan. Of the speakers, 2,980 are male and 3,298 are female. The recordings consist of common colloquial sentences and cover both quiet and noisy environments. All transcripts were transcribed and proofread by professional annotators, with an accuracy of no less than 98%.
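As a quick sanity check on the figures in the description, the short sketch below recomputes the speaker total, the gender split, and the mean amount of audio per speaker. The function and variable names are illustrative only and are not part of the dataset or any provider tooling. The per-speaker value is a simple average; the listing does not state how recording time is actually distributed across speakers.

```python
# Figures quoted in the dataset description.
TOTAL_HOURS = 1505
MALE_SPEAKERS = 2980
FEMALE_SPEAKERS = 3298
STATED_TOTAL_SPEAKERS = 6278


def summarize(total_hours, male, female):
    """Return speaker count, gender shares, and mean minutes per speaker."""
    speakers = male + female
    return {
        "speakers": speakers,
        "male_share": male / speakers,
        "female_share": female / speakers,
        # Simple average only; real per-speaker durations are not published here.
        "mean_minutes_per_speaker": total_hours * 60 / speakers,
    }


stats = summarize(TOTAL_HOURS, MALE_SPEAKERS, FEMALE_SPEAKERS)
print("speaker total matches listing:", stats["speakers"] == STATED_TOTAL_SPEAKERS)
print(f"male {stats['male_share']:.1%}, female {stats['female_share']:.1%}")
print(f"mean audio per speaker: {stats['mean_minutes_per_speaker']:.1f} min")
```

The male and female counts sum to the stated 6,278 speakers. The split works out to roughly 47.5% male and 52.5% female, and the average comes to about 14.4 minutes of audio per speaker.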
Provider:
shujutang
Created:
2023-12-12
Collected summary
Dataset introduction

Background and challenges
Background overview
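The dataset contains 1,505 hours of Mandarin speech recorded on mobile phones by 6,278 speakers from 33 provinces across China, with a roughly balanced gender split. The recordings consist of common colloquial sentences and cover both quiet and noisy environments, and annotation accuracy is no less than 98%. This is a commercial dataset, available for purchase only through enterprise partnerships.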
The content above was collected and summarized by 遇见数据集.



