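Chinese-English Mixed Children's Mobile Phone Speech Database
Beijing International Big Data Exchange (北京国际大数据交易所), indexed 2024-03-01
Download link: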
https://webs.bjidex.com/sys-bsc-home/#/bscConsole/tradingMarket/detail?id=245
Resource description:
Each speaker recorded 1 to 4 sessions. The total recording duration is approximately 991.5 hours, including the silent segments at the start and end (each lasting about 500 milliseconds). The total size of this database is 106 GB.
This database comprises 886 male speakers (47%) and 998 female speakers (53%).
The original corpus consists of phonetically rich sentences. Considering the potential cognitive load on speakers when producing these sentences, we carefully selected natural sentences with lengths ranging from 5 to 15 words. The original sentences were sourced from daily scenarios, and any phrases or sentences containing offensive or negative vocabulary were excluded. Our sentence inventory includes approximately 319,200 unique sentences, with each sentence being repeated no more than five times.
In this project, we collected speech data across the following age groups:
- 4–6 years: 201 speakers, accounting for 10.7% of the total cohort
- 7–9 years: 774 speakers, accounting for 41.1% of the total cohort
- 10–12 years: 909 speakers, accounting for 48.2% of the total cohort
- Total: 1,884 speakers, representing 100% of all participants
Based on the characteristics of Standard Mandarin Chinese, China is divided into two dialect regions: Northern Dialect Region and Southern Dialect Region. The regional distribution is as follows:
- Northern Dialect Region: representative provinces and municipalities include Shandong, Tianjin, Beijing and Heilongjiang, with 1,020 speakers, accounting for 54.1% of the total
- Southern Dialect Region: representative provinces and municipalities include Shanghai, Hunan, Jiangxi and Fujian, with 864 speakers, accounting for 45.9% of the total
- Total: 1,884 speakers, representing 100% of all participants
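The gender, age and region breakdowns above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not part of the dataset or its tooling: the names `breakdowns` and `check_breakdown`, and the 0.5 percentage point tolerance, are assumptions made for illustration. Only the counts, percentages and total duration come from the listing.

```python
# Figures quoted in the listing; everything else here is illustrative.
TOTAL_SPEAKERS = 1884
TOTAL_HOURS = 991.5

# Each entry maps a group label to (speaker count, stated percentage).
breakdowns = {
    "gender": {"male": (886, 47.0), "female": (998, 53.0)},
    "age": {"4-6": (201, 10.7), "7-9": (774, 41.1), "10-12": (909, 48.2)},
    "region": {"north": (1020, 54.1), "south": (864, 45.9)},
}


def check_breakdown(groups, total, tolerance=0.5):
    """Return True if the counts sum to `total` and every stated percentage
    is within `tolerance` percentage points of count / total."""
    if sum(count for count, _ in groups.values()) != total:
        return False
    return all(
        abs(100.0 * count / total - pct) <= tolerance
        for count, pct in groups.values()
    )


# One boolean per breakdown (gender, age, region).
results = {
    name: check_breakdown(groups, TOTAL_SPEAKERS)
    for name, groups in breakdowns.items()
}

# Average recorded time per speaker, silence included, in minutes.
avg_minutes_per_speaker = TOTAL_HOURS * 60 / TOTAL_SPEAKERS

if __name__ == "__main__":
    for name, ok in results.items():
        print(f"{name}: {'consistent' if ok else 'inconsistent'}")
    print(f"average per speaker: {avg_minutes_per_speaker:.1f} minutes")
```

Under these assumptions all three breakdowns sum to 1,884 speakers, and the average works out to roughly half an hour of audio per speaker across that speaker's 1 to 4 sessions.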
Provider:
Beijing Haitian Ruisheng Science Technology Ltd. (北京海天瑞声科技股份有限公司)
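Background overview
The database contains 991.5 hours of mixed Chinese-English speech from 1,884 children aged 4 to 12, with a roughly balanced gender ratio, covering China's northern and southern dialect regions. The prompts are about 319,000 natural short sentences of 5 to 15 words each, and the total data volume is 106 GB.
The above content was collected and summarized by 遇见数据集.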



