aidatatang

Name: aidatatang
Creator: OpenDataLab
Published: 2026-05-17 09:30:44
License: No description available

OpenDataLab, updated 2026-05-17, indexed 2024-05-09

Download link:

https://opendatalab.org.cn/OpenDataLab/aidatatang

Resource overview:

The aidatatang corpus contains 200 hours of acoustic data, recorded mainly on mobile devices. Six hundred speakers from different accent regions of China took part in the recordings, which were made in quiet indoor environments. The transcription accuracy of each sentence is above 98%. The corpus is divided into training, validation, and test sets at a ratio of 7:1:2. Details such as the speech data encoding and speaker information are kept in the metadata file, and segmented transcripts are also provided. The corpus is intended to support researchers in speech recognition, machine translation, voiceprint (speaker) recognition, and other speech-related fields, and it is free for academic use.
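As a rough illustration of what the 7:1:2 split implies, the sketch below divides a total in proportion to that ratio. The listing does not say whether the ratio applies to audio duration, speakers, or utterances, so both readings shown here are assumptions, and the function name `split_by_ratio` is hypothetical rather than part of any tooling shipped with the corpus.

```python
from fractions import Fraction


def split_by_ratio(total, ratio=(7, 1, 2)):
    """Divide `total` into parts proportional to `ratio` (exact arithmetic)."""
    denom = sum(ratio)
    return [Fraction(total * r, denom) for r in ratio]


# Assumption: the 7:1:2 ratio applies to audio duration (200 hours in total).
train_h, valid_h, test_h = split_by_ratio(200)
print(float(train_h), float(valid_h), float(test_h))  # 140.0 20.0 40.0

# Assumption: the ratio applies to the 600 speakers instead.
train_s, valid_s, test_s = split_by_ratio(600)
print(int(train_s), int(valid_s), int(test_s))  # 420 60 120
```

The actual partition should be read from the corpus's own metadata files; these figures only indicate the expected order of magnitude for each set.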

Provider:

OpenDataLab

Created:

2023-06-25

Collected summary

Dataset introduction