Cantonese Mobile Phone Speech Data [数据堂]
Source: OpenDataLab. Updated 2023-12-20; indexed 2024-05-09.
Download link:
https://opendatalab.org.cn/shujutang/shujutang54
Resource description:
This dataset contains 1,652 hours of Cantonese speech recorded on mobile phones by 4,888 speakers from Guangdong Province, all in quiet indoor environments. The recorded content is broad, covering 500,000 common colloquial sentences, including high-frequency Weibo terms and everyday expressions. Each sentence is repeated 1.5 times on average, and the average sentence length is 12.5 Chinese characters. The recordings were made on mainstream Android and Apple (iOS) phones.
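As a rough sanity check on the figures quoted above, the sketch below derives a few per-utterance and per-speaker averages. It assumes that the total number of recorded utterances equals the number of unique sentences times the average repetition count, and that recording time is spread evenly across speakers; neither assumption is stated by the provider, and the function name `derived_stats` is purely illustrative.

```python
# Figures quoted in the dataset description.
TOTAL_HOURS = 1652
SPEAKERS = 4888
UNIQUE_SENTENCES = 500_000
AVG_REPETITIONS = 1.5
AVG_CHARS_PER_SENTENCE = 12.5


def derived_stats():
    """Derive rough averages from the published figures (hypothetical helper)."""
    # Assumption: every unique sentence is recorded AVG_REPETITIONS times on average.
    utterances = UNIQUE_SENTENCES * AVG_REPETITIONS
    # Average audio duration per utterance, including any leading/trailing silence.
    seconds_per_utterance = TOTAL_HOURS * 3600 / utterances
    # Assumption: recording time and utterances are spread evenly across speakers.
    minutes_per_speaker = TOTAL_HOURS * 60 / SPEAKERS
    utterances_per_speaker = utterances / SPEAKERS
    # Characters per second of audio (silence included, so a lower bound on speaking rate).
    chars_per_second = AVG_CHARS_PER_SENTENCE / seconds_per_utterance
    return {
        "utterances": utterances,
        "seconds_per_utterance": seconds_per_utterance,
        "minutes_per_speaker": minutes_per_speaker,
        "utterances_per_speaker": utterances_per_speaker,
        "chars_per_second": chars_per_second,
    }


if __name__ == "__main__":
    for name, value in derived_stats().items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions the figures work out to about 750,000 utterances, roughly 7.9 seconds of audio per utterance, and about 20 minutes of recordings per speaker. These are estimates only, not numbers published with the dataset.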
Provider:
shujutang
Created:
2023-12-20
Collected summary
Dataset introduction

Background and challenges
Background overview
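The dataset contains 1,652 hours of Cantonese speech recorded on mobile phones by 4,888 speakers from Guangdong in quiet indoor settings, covering 500,000 common colloquial sentences such as high-frequency Weibo terms and everyday expressions. The average sentence length is 12.5 characters, each sentence is repeated 1.5 times on average, and the data is compatible with mainstream Android and Apple phones. It is commercial data, available for purchase only through enterprise partnerships.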
The content above was collected and summarized by 遇见数据集.



