BSC-LT/distilled-yodas-spanish
Source: Hugging Face. Updated: 2025-12-15. Indexed: 2025-10-18.
Download link:
https://hf-mirror.com/datasets/BSC-LT/distilled-yodas-spanish
Description:
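Distilled YODAS Spanish is a high-quality dataset containing approximately 8,000 hours of Spanish speech. It was filtered from the YODAS dataset, which contains more than 37,000 hours of Spanish speech. The filtering kept only speech segments between 2 and 30 seconds long that contain at least three words, and the segments were validated with two dedicated Spanish validation models. The dataset is divided into training, validation, and test splits, each containing transcriptions at different consensus levels. It is suited to training and evaluating automatic speech recognition (ASR) and related tasks.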
The Distilled YODAS Spanish dataset is a high-quality subset of the Spanish portion of the YODAS dataset, containing approximately 8,000 hours of Spanish speech. It is designed for Automatic Speech Recognition (ASR) tasks and includes metadata such as audio path, transcription, and other relevant information. The dataset is divided into training, validation, and test splits, each with a specific consensus level for transcription quality. This dataset is valuable for training and evaluating ASR models in Spanish.
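
The duration and word-count criteria mentioned in the description can be illustrated with a minimal Python sketch. This is not the pipeline used by BSC-LT: the `Segment` class, its field names, and the choice of inclusive bounds are assumptions made for illustration, and the model-based validation step is not reproduced here.

```python
from dataclasses import dataclass

# Thresholds taken from the description: 2-30 seconds, at least three words.
# Treating both duration bounds as inclusive is an assumption.
MIN_SECONDS = 2.0
MAX_SECONDS = 30.0
MIN_WORDS = 3


@dataclass
class Segment:
    """Hypothetical record for one speech segment (field names are assumed)."""
    audio_path: str
    duration_s: float
    text: str


def keep_segment(seg: Segment) -> bool:
    """Return True if the segment passes the duration and word-count filters."""
    in_range = MIN_SECONDS <= seg.duration_s <= MAX_SECONDS
    enough_words = len(seg.text.split()) >= MIN_WORDS
    return in_range and enough_words


def filter_segments(segments):
    """Keep only the segments that satisfy both criteria."""
    return [s for s in segments if keep_segment(s)]
```

For example, a 12-second segment transcribed as "esto es una prueba" would be kept, while a 5-second segment transcribed as "hola mundo" would be dropped for having fewer than three words.
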
Provided by:
BSC-LT



