Datatang: 1,652 Hours of Cantonese Speech Data Collected by Mobile Phone

Name: Datatang: 1,652 Hours of Cantonese Speech Data Collected by Mobile Phone
Creator: maas
Published: 2026-05-20 15:15:38
License: No description available

ModelScope community: updated 2026-05-20, indexed 2024-05-15

Download link:

https://modelscope.cn/datasets/DatatangBeijing/1652Hours_CantoneseDialectSpeechDataByMobilePhone

Resource description:

This dataset contains 1,652 hours of Cantonese speech recorded on mobile phones by 4,888 speakers from Guangdong Province in quiet indoor environments. The content covers 500,000 commonly used spoken sentences, including high-frequency Weibo (microblog) terms and everyday expressions. Each sentence is recorded 1.5 times on average, and the average sentence length is 12.5 Chinese characters. The data is compatible with mainstream Android and Apple smartphones.
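The figures in the description can be combined into a few rough derived estimates. The sketch below is only back-of-envelope arithmetic: it assumes the total utterance count equals the number of unique sentences times the average repetition count, and that recording time is spread evenly across speakers. Neither assumption is stated by the provider, and the function name `estimate_corpus_stats` is hypothetical.

```python
# Figures quoted in the dataset description.
TOTAL_HOURS = 1652
NUM_SPEAKERS = 4888
NUM_SENTENCES = 500_000
AVG_REPETITIONS = 1.5  # average number of recordings per sentence


def estimate_corpus_stats():
    """Derive rough per-utterance and per-speaker figures (estimates only)."""
    # Assumption: utterance count = unique sentences x average repetitions.
    utterances = NUM_SENTENCES * AVG_REPETITIONS
    total_seconds = TOTAL_HOURS * 3600
    return {
        "utterances": utterances,
        "seconds_per_utterance": total_seconds / utterances,
        # Assumption: recording time is evenly distributed across speakers.
        "minutes_per_speaker": TOTAL_HOURS * 60 / NUM_SPEAKERS,
        "utterances_per_speaker": utterances / NUM_SPEAKERS,
    }


if __name__ == "__main__":
    for name, value in estimate_corpus_stats().items():
        print(f"{name}: {value:,.2f}")
```

These values are useful only as a sanity check when planning storage or training time; the actual per-speaker and per-utterance distributions would have to be read from the dataset's own metadata.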

Provider:

maas

Created:

2024-05-06

Collection Summary

Dataset Introduction