tedlium

Name: tedlium
Creator: huggingface.co
License: No description available

Indexed from huggingface.co on 2025-03-24

Download link:

https://huggingface.co/datasets/LIUM/tedlium



Resource description:

Dataset Card for tedlium

Dataset Summary

The TED-LIUM corpus consists of English-language TED talks with transcriptions, sampled at 16 kHz. The three releases of the corpus range from 118 to 452 hours of transcribed speech data.

Example

    from datasets import load_dataset

    tedlium = load_dataset("LIUM/tedlium", "release1")  # for Release 1

    # see structure
    print(tedlium)

    # load audio sample on the fly
    audio_input = tedlium["train"][0]["audio"]  # first decoded…

See the full description on the dataset page: https://huggingface.co/datasets/LIUM/tedlium.
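Since the summary gives the sampling rate (16 kHz) and describes the releases by hours of speech, a small helper can relate the two. The following is a minimal sketch, not part of the dataset card. It assumes each decoded `audio` entry behaves like a mapping with `array` and `sampling_rate` keys, which may differ between versions of the `datasets` library. The helper names `clip_duration_seconds` and `total_hours` are hypothetical, and the clip is synthetic so the snippet runs without downloading anything.

```python
from typing import Any, Iterable, Mapping


def clip_duration_seconds(audio: Mapping[str, Any]) -> float:
    """Return the length of one decoded audio entry in seconds.

    Assumes `audio` has an "array" of samples and a "sampling_rate" in Hz.
    """
    return len(audio["array"]) / audio["sampling_rate"]


def total_hours(clips: Iterable[Mapping[str, Any]]) -> float:
    """Sum the durations of several audio entries and convert to hours."""
    return sum(clip_duration_seconds(clip) for clip in clips) / 3600.0


# Synthetic stand-in for a decoded sample: 32,000 samples of silence at 16 kHz.
fake_clip = {"array": [0.0] * 32_000, "sampling_rate": 16_000}

print(clip_duration_seconds(fake_clip))  # 2.0
```

With a loaded split, the same helper could be applied to entries such as `tedlium["train"][0]["audio"]`, provided the decoded entry matches the assumed layout.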


Provider:

huggingface.co