speechbrain/LoquaciousSet

Name: speechbrain/LoquaciousSet
Creator: speechbrain
Published: 2026-02-11 14:31:57
License: no description provided

Source: Hugging Face. Updated: 2026-02-11. Indexed: 2025-05-31.

Download link:

https://hf-mirror.com/datasets/speechbrain/LoquaciousSet


Description:

LargeScaleASR is a 25,000-hour English speech recognition dataset available for research and commercial use. It consists of six subsets: large, medium, small, clean, dev, and test. The large, medium, and small subsets contain 25,000, 2,500, and 250 hours, respectively, of transcribed speech that mixes read and spontaneous speech recorded in both clean and noisy conditions. The clean subset contains 13,000 hours of read and spontaneous transcribed speech and excludes the YODA and Peoples Speech data. The dev and test subsets contain 15 and 21 hours, respectively.
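The subset sizes above can be captured in a small lookup table, which makes it easy to pick a training subset for a given compute budget. The following is a minimal sketch, not an official loader: `pick_train_subset` and `stream_subset` are hypothetical helper names, and `stream_subset` assumes that the Hugging Face `datasets` package is installed and that the Hub configuration names match the subset names listed in the description, which this page does not confirm.

```python
# Subset sizes in hours, copied from the description above.
SUBSET_HOURS = {
    "large": 25_000,
    "medium": 2_500,
    "small": 250,
    "clean": 13_000,
    "dev": 15,
    "test": 21,
}

# Subsets intended for training (dev and test are held out for evaluation).
TRAIN_SUBSETS = ("small", "medium", "clean", "large")


def pick_train_subset(max_hours):
    """Return the largest training subset that fits within max_hours, or None."""
    candidates = [s for s in TRAIN_SUBSETS if SUBSET_HOURS[s] <= max_hours]
    return max(candidates, key=SUBSET_HOURS.get) if candidates else None


def stream_subset(config_name):
    """Hypothetical helper: stream one subset from the Hugging Face Hub.

    Assumption: the Hub config names match the subset names in the
    description. Requires the `datasets` package and network access.
    """
    from datasets import load_dataset  # imported lazily so the table works offline

    return load_dataset("speechbrain/LoquaciousSet", config_name, streaming=True)
```

For example, `pick_train_subset(3_000)` selects the medium subset, since it is the largest training subset at or below 3,000 hours. Streaming avoids downloading a full subset before inspecting it, which matters at the 25,000-hour scale.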

Provider:

speechbrain
