eurospeech-enhanced-dacvae
Source: Hugging Face (updated 2026-03-22, indexed 2026-03-23)
Download link:
https://huggingface.co/datasets/laion/eurospeech-enhanced-dacvae
Description:
EuroSpeech is a speech dataset of European Parliament speeches that has been converted into DAC VAE latent representations. It is stored in WebDataset format as approximately 569 shards (about 2 GB each) and contains over 1.9 million speech samples in multiple languages, including Danish, German, English, Croatian, Italian, Lithuanian, Maltese, Norwegian and Portuguese. Each sample consists of three files: the original audio in FLAC format, the DAC VAE latent representation (a NumPy float32 array of shape [T_latent, 128]), and a JSON file with rich metadata. The latents were generated with Facebook's DACVAE model at an input sampling rate of 48,000 Hz and a latent frame rate of 25 frames per second. The dataset is suited to automatic speech recognition (ASR) and text-to-speech (TTS) tasks, and its metadata includes the text transcription, audio duration, characters per second and other metrics.
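The sample layout described above can be read with only the standard library and NumPy. What follows is a minimal sketch, not an official loader. It assumes each shard is a plain tar file that follows the WebDataset convention (consecutive files sharing a key form one sample) and that the three files use the extensions `flac`, `npy` and `json`. Those extension names, the helper names (`iter_samples`, `latent_matches`, `expected_latent_frames`) and the 2-frame tolerance are assumptions to verify against a real shard. The sketch also spells out the frame arithmetic implied by the description: at 48,000 Hz and 25 latent frames per second, each latent frame covers 48,000 / 25 = 1,920 audio samples (40 ms), so a clip of d seconds should yield roughly 25·d latent frames.

```python
import io
import json
import tarfile

import numpy as np

# Figures taken from the dataset description.
SAMPLE_RATE_HZ = 48_000  # DAC VAE input sampling rate
LATENT_FPS = 25          # latent frames per second
LATENT_DIM = 128         # second dimension of the [T_latent, 128] latent array

# One latent frame summarizes 48,000 / 25 = 1,920 audio samples (40 ms).
SAMPLES_PER_FRAME = SAMPLE_RATE_HZ // LATENT_FPS


def expected_latent_frames(duration_s: float) -> int:
    """Approximate T_latent for a clip of the given duration in seconds."""
    return round(duration_s * LATENT_FPS)


def latent_matches(latent: np.ndarray, duration_s: float, tol_frames: int = 2) -> bool:
    """Check dtype, shape and length of a latent against the clip duration.

    ASSUMPTION: the tolerance absorbs encoder padding/rounding; tune it on real data.
    """
    return bool(
        latent.dtype == np.float32
        and latent.ndim == 2
        and latent.shape[1] == LATENT_DIM
        and abs(latent.shape[0] - expected_latent_frames(duration_s)) <= tol_frames
    )


def _emit(key, files, latent_ext):
    # Skip the initial empty state and incomplete samples.
    if key is None or latent_ext not in files or "json" not in files:
        return
    latent = np.load(io.BytesIO(files[latent_ext]))
    meta = json.loads(files["json"])
    yield key, files.get("flac"), latent, meta


def iter_samples(fileobj, latent_ext: str = "npy"):
    """Stream (key, flac_bytes, latent, metadata) tuples from one shard.

    ASSUMPTION (hypothetical naming): WebDataset convention, i.e. consecutive
    tar members sharing a key (path up to the first dot of the file name)
    form one sample, named <key>.flac, <key>.<latent_ext> and <key>.json.
    """
    current_key, files = None, {}
    # "r|" reads the tar sequentially, so only the current sample's files
    # are kept in memory rather than the whole ~2 GB shard.
    with tarfile.open(fileobj=fileobj, mode="r|") as tar:
        for member in tar:
            if not member.isfile():
                continue
            prefix, _, base = member.name.rpartition("/")
            stem, _, ext = base.partition(".")
            key = f"{prefix}/{stem}" if prefix else stem
            if key != current_key:
                yield from _emit(current_key, files, latent_ext)
                current_key, files = key, {}
            files[ext] = tar.extractfile(member).read()
    yield from _emit(current_key, files, latent_ext)
```

A typical call would be `with open(shard_path, "rb") as f:` followed by `for key, flac_bytes, latent, meta in iter_samples(f): ...`, where `shard_path` is a locally downloaded shard. The JSON field names for the transcript, duration and characters per second are not listed here, so inspect one metadata file before relying on them. In practice the `webdataset` Python library performs the same key-based grouping and can also stream shards from remote URLs.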
Provided by:
LAION e.V.
Created:
2026-03-21



