ChineseChildrenSpeechData
Source: ModelScope community. Updated 2026-05-15; indexed 2024-08-31.
Download link:
https://modelscope.cn/datasets/zuosi041116/ChineseChildrenSpeechData
Resource description:
# Datatang 178-hour Chinese Children's Microphone Speech Collection Dataset
## Dataset Description
This 178-hour Chinese children's microphone speech collection dataset is designed for testing Chinese speech recognition models.
### Dataset Overview
This dataset contains 178 hours of speech recorded with high-fidelity microphones from 739 children in China, with a roughly balanced gender ratio. The recorded content is drawn mainly from children's textbooks, children's storybooks, and numbers, and matches how children use the language. All recordings were made in relatively quiet indoor environments, and all transcripts were produced manually with high accuracy.
### Supported Tasks
Testing tasks for Chinese speech recognition models.
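
The card does not name an evaluation metric. For Mandarin speech recognition, character error rate (CER) is a common choice, so the sketch below assumes it. The function names are illustrative and are not part of any Datatang or ModelScope tooling.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (here: strings of characters)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(references, hypotheses):
    """Total character edits divided by total reference characters."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_errors = 0
    total_chars = 0
    for ref, hyp in zip(references, hypotheses):
        total_errors += edit_distance(ref, hyp)
        total_chars += len(ref)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    return total_errors / total_chars


# Made-up example pair: one substituted character out of six.
cer = character_error_rate(["今天天气很好"], ["今天天汽很好"])
print(f"{cer:.3f}")  # 0.167
```

In practice, noise symbols and identifiers would probably need to be stripped from the reference transcripts before scoring, since a recognizer is not expected to output them.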
## Dataset Format and Structure
### Data Format
Uncompressed mono WAV files, 44.1 kHz sampling rate, 16-bit depth.
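
A quick conformance check against the format stated above can be sketched with Python's standard `wave` module. The helper names and the `EXPECTED_FORMAT` constant are illustrative; the card does not describe the dataset's directory layout or file naming, so the path is left to the caller.

```python
import wave

# Format stated on the dataset card: mono, 16-bit (2 bytes per sample), 44.1 kHz.
EXPECTED_FORMAT = {
    "channels": 1,
    "sample_width_bytes": 2,
    "sample_rate_hz": 44100,
}


def read_wav_format(path):
    """Return the channel count, sample width, and sample rate of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": wav.getframerate(),
        }


def matches_card_format(path):
    """True if the file has the format described on the dataset card."""
    return read_wav_format(path) == EXPECTED_FORMAT


def duration_seconds(path):
    """Duration of the recording, computed from frame count and sample rate."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()
```

Summing `duration_seconds` over all files is one way to compare a download against the stated 178 hours.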
### Recording Equipment
Motu sound card + Avantone microphone; Blue Yeti microphone
### Participant Information
739 Chinese children, of whom 387 (52%) are girls.
### Transcription Content
Text transcripts, timestamps, noise symbols and identifiers.
## Dataset Generation and Annotation
### Original Data
None
### Dataset Annotation
Sentence-level transcription accuracy is 97%; noise symbols and other identifiers are excluded from this figure.
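
The card does not say how the 97% figure was computed. One plausible reading is the fraction of sentences whose transcript exactly matches a verified reference once noise symbols and identifiers are removed. The sketch below assumes that reading. The square-bracket tag format is hypothetical, since the card does not document the dataset's actual noise-symbol notation.

```python
import re

# Hypothetical tag format: anything in square brackets, e.g. "[noise]".
# The dataset's real noise symbols and identifiers may look different.
TAG_PATTERN = re.compile(r"\[[^\]]*\]")


def strip_tags(text):
    """Remove bracketed tags and collapse the whitespace left behind."""
    without_tags = TAG_PATTERN.sub("", text)
    return re.sub(r"\s+", " ", without_tags).strip()


def sentence_accuracy(transcripts, references):
    """Fraction of sentences that match their reference exactly, ignoring tags."""
    if len(transcripts) != len(references):
        raise ValueError("transcripts and references must have the same length")
    if not transcripts:
        raise ValueError("at least one sentence is required")
    correct = sum(
        strip_tags(t) == strip_tags(r) for t, r in zip(transcripts, references)
    )
    return correct / len(transcripts)
```

Under this definition a single wrong character makes the whole sentence count as an error, which is stricter than a character-level measure.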
#### Annotation Process
None
#### Annotators
None
## Copyright Information
Copyright belongs to Datatang. This is a commercial dataset.
## Additional Information
For more details, please refer to: https://www.datatang.com/dataset/26?source=modelscope
Provider:
maas
Created:
2024-08-13



