tedlium
Source: huggingface.co (indexed 2025-03-24)
Download link:
https://huggingface.co/datasets/LIUM/tedlium
Resource description:
Dataset Card for tedlium
Dataset Summary
The TED-LIUM corpus consists of English-language TED talks with transcriptions, with audio sampled at 16 kHz. The three releases of the corpus range from 118 to 452 hours of transcribed speech data.
Example
from datasets import load_dataset
tedlium = load_dataset("LIUM/tedlium", "release1") # for Release 1
# see structure
print(tedlium)
# load audio sample on the fly
audio_input = tedlium["train"][0]["audio"] # first decoded…
See the full description on the dataset page: https://huggingface.co/datasets/LIUM/tedlium.
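Once a sample is decoded, a common first check is how long the clip is. The sketch below assumes the decoded audio is a dict-like object with an "array" of samples and a "sampling_rate" in Hz; that layout is an assumption here and may differ across versions of the datasets library. The helper name duration_seconds and the synthetic clip are hypothetical and used only for illustration, so the snippet runs without downloading the corpus.

```python
def duration_seconds(audio):
    """Return the clip length in seconds.

    Assumes `audio` is dict-like with "array" (a sequence of samples)
    and "sampling_rate" (samples per second). This layout is an
    assumption, not something confirmed by the dataset card.
    """
    return len(audio["array"]) / audio["sampling_rate"]


# Synthetic stand-in for a decoded sample: 32,000 silent samples at 16 kHz.
fake_audio = {"array": [0.0] * 32000, "sampling_rate": 16000}

print(duration_seconds(fake_audio))  # 32000 / 16000 = 2.0
```

If the assumed layout holds, the same helper could be called on audio_input from the example above.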
Provider:
huggingface.co



