nvidia/Granary
Source: Hugging Face. Updated: 2026-03-12. Indexed: 2025-08-30.
Download link:
https://hf-mirror.com/datasets/nvidia/Granary
Description:
Granary is a large-scale, open-source multilingual speech dataset covering 25 European languages for Automatic Speech Recognition (ASR) and Automatic Speech Translation (AST). It provides approximately 1 million hours of high-quality, pseudo-labeled speech data and supports two main tasks: ASR (transcription) and AST (X→English translation). A multi-stage processing pipeline is applied so that data drawn from different sources stays high-quality and consistent. The dataset also includes information on how to access and use it, as well as instructions for organizing the audio files so that it works correctly.
Provider:
nvidia



