aidatatang
OpenDataLab | Updated 2026-05-17 | Indexed 2024-05-09
Download link:
https://opendatalab.org.cn/OpenDataLab/aidatatang
Resource description:
The aidatatang corpus is described as follows:
The corpus contains 200 hours of acoustic data, recorded mainly on mobile devices.
600 speakers from different accent regions of China were invited to take part in the recordings.
The transcription accuracy of each sentence is above 98%.
Recordings were made in quiet indoor environments.
The corpus is divided into training, validation, and test sets at a ratio of 7:1:2.
Details such as speech data encoding and speaker information are kept in the metadata files.
Segmented transcripts are also provided.
The corpus is intended to support researchers in speech recognition, machine translation, voiceprint recognition (speaker verification), and other speech-related fields. It is therefore completely free for academic use.
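As a rough illustration of what the 7:1:2 split implies, the sketch below divides the stated 200 hours proportionally. The description does not say whether the ratio is applied to hours, utterances, or speakers, so treating it as a split of total duration is an assumption, and the helper name `split_by_ratio` is hypothetical.

```python
def split_by_ratio(total, ratios=(7, 1, 2)):
    """Divide `total` into parts proportional to `ratios`."""
    denom = sum(ratios)
    return [total * r / denom for r in ratios]


# Assumption: the 7:1:2 ratio is applied to the 200 hours of audio.
hours = split_by_ratio(200)
for name, h in zip(("train", "validation", "test"), hours):
    print(f"{name}: about {h:.0f} hours")
```

Under that assumption the training, validation, and test sets would hold roughly 140, 20, and 40 hours. The actual per-split durations should be taken from the corpus metadata files.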
Provider:
OpenDataLab
Created:
2023-06-25
Collected summary
Dataset introduction

Background and challenges
Background overview
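aidatatang is a public Mandarin Chinese speech corpus containing 200 hours of audio recorded on mobile devices by 600 speakers from different accent regions, with a transcription accuracy above 98%. The dataset is split into training, validation, and test sets at a ratio of 7:1:2, comes with detailed metadata, and is free for academic research in areas such as speech recognition, machine translation, and voiceprint recognition.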
The content above was collected and summarized by 遇见数据集.



