Chinese Speech Recognition Aishell-1 Academic Dataset (Training Set)
Source: ModelScope community (魔搭社区). Updated 2026-06-07; indexed 2024-05-15.
Download link:
https://modelscope.cn/datasets/speech_asr/speech_asr_aishell1_trainsets
Resource description:
AISHELL-ASR0009-OS1 is an open-source Mandarin speech database with 178 hours of recordings. It is a subset of the AISHELL (希尔贝壳) AISHELL-ASR0009 Mandarin speech corpus, whose recording scripts cover 11 domains, including smart home, autonomous driving, and industrial production. Recording took place in a quiet indoor environment using three devices simultaneously: a high-fidelity microphone (44.1 kHz, 16-bit), an Android phone (16 kHz, 16-bit), and an iOS phone (16 kHz, 16-bit). The high-fidelity microphone audio was downsampled to 16 kHz to produce AISHELL-ASR0009-OS1. A total of 400 speakers from different accent regions of China took part. The recordings were transcribed and annotated by professional speech proofreaders and passed strict quality inspection; the transcription text accuracy of the database is above 95%. The data is divided into training, development, and test sets. (Academic research is supported; commercial use is prohibited without permission.)
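The figures in the description allow a quick back-of-the-envelope check of two things: the rational factor behind the 44.1 kHz to 16 kHz downsampling, and the approximate uncompressed size of 178 hours of 16 kHz, 16-bit audio. The following is a minimal sketch using only the Python standard library. It assumes mono audio, which the description does not state, and it ignores file headers and any archive compression; the function names are illustrative, not part of any dataset tooling.

```python
from fractions import Fraction

SOURCE_RATE_HZ = 44_100   # high-fidelity microphone rate, from the description
TARGET_RATE_HZ = 16_000   # rate of the released audio, from the description
BYTES_PER_SAMPLE = 2      # 16-bit samples
TOTAL_HOURS = 178         # total recorded duration, from the description
CHANNELS = 1              # assumption: mono audio (not stated in the description)


def resampling_ratio(source_hz: int, target_hz: int) -> Fraction:
    """Return target/source as a reduced fraction (upsample factor / downsample factor)."""
    return Fraction(target_hz, source_hz)


def raw_pcm_bytes(hours: float, rate_hz: int, bytes_per_sample: int, channels: int) -> int:
    """Size of uncompressed PCM audio in bytes, excluding any file headers."""
    seconds = hours * 3600
    return int(seconds * rate_hz * bytes_per_sample * channels)


ratio = resampling_ratio(SOURCE_RATE_HZ, TARGET_RATE_HZ)
total_bytes = raw_pcm_bytes(TOTAL_HOURS, TARGET_RATE_HZ, BYTES_PER_SAMPLE, CHANNELS)

print(f"resampling ratio: {ratio.numerator}/{ratio.denominator}")  # 160/441
print(f"raw PCM size: {total_bytes / 1e9:.1f} GB")                 # 20.5 GB
```

The reduced ratio 160/441 means a polyphase resampler would upsample by 160 and downsample by 441. The size estimate covers the full 178 hours, so the training split alone would be somewhat smaller.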
Provider:
maas
Created:
2023-01-10
Collected summary
Dataset introduction

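Background and challenges
Background overview
Aishell-1 is a high-quality academic dataset for Chinese speech recognition. It contains 178 hours of multi-domain recordings from 400 speakers with different accents, with transcription accuracy above 95%. The dataset is intended for academic research, is split into training, validation, and test sets, and may not be used commercially without permission.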
The summary above was collected and generated by 遇见数据集.



