LibriSpeech corpus

Name: LibriSpeech corpus
Creator: LibriSpeech
License: No description available

Source: arXiv, indexed 2025-09-30

Download link:

https://github.com/kaldi-asr/kaldi/blob/master/egs/librispeech/s5/run.sh

Description:

This dataset is derived from audiobooks and contains read speech recorded by native English speakers. All audio is sampled at 16 kHz with 16-bit encoding. For acoustic modeling, we used the cleaned "train_960_cleaned" subset, which totals 960 hours. The task focus is acoustic modeling.
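As a rough illustration of what these figures imply, the sketch below estimates the uncompressed PCM size of 960 hours of 16 kHz, 16-bit audio. It assumes single-channel (mono) audio, which the description above does not state, and the helper name `pcm_size_bytes` is hypothetical. The size of the actual distributed files may differ considerably if they are stored in a compressed format.

```python
def pcm_size_bytes(hours, sample_rate_hz=16_000, bits_per_sample=16, channels=1):
    """Uncompressed PCM size in bytes for a given audio duration.

    channels=1 (mono) is an assumption; the dataset description only
    gives the sampling rate (16 kHz) and the bit depth (16-bit).
    """
    seconds = hours * 3600
    bytes_per_sample = bits_per_sample // 8
    return int(seconds * sample_rate_hz * bytes_per_sample * channels)


if __name__ == "__main__":
    total = pcm_size_bytes(960)  # the 960-hour training subset
    print(f"{total:,} bytes (about {total / 1e9:.1f} GB uncompressed)")
```

Under these assumptions, one hour of audio is about 115 MB of raw samples, so the full 960 hours comes to roughly 110 GB before any compression.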

Provider:

LibriSpeech
