ESpeech/ESpeech-upvote
Hugging Face | Updated 2025-08-25 | Indexed 2025-09-13
Download link:
https://hf-mirror.com/datasets/ESpeech/ESpeech-upvote
Description:
The Upvote YouTube Audio Dataset contains 296 hours of processed audio segments extracted from the Upvote YouTube channel, along with corresponding metadata. Each audio file is a segment of one of the channel's videos. The dataset is intended for text-to-speech (TTS), automatic speech recognition (ASR), and speech quality assessment tasks. The language is Russian, and the audio is stored as MP3 at a 44.1 kHz sample rate. Each record includes the audio data, file name, segment index, original video name, transcript of the segment, start and end times, word-level timestamps and confidence scores, speaker information, quality metrics, segment structure, and voice activity detection (VAD) information. All available segments are used as the training set.
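As a rough illustration of how the per-segment metadata might be used, the sketch below filters segments by duration and mean word-level confidence. The field names `start`, `end`, `words`, and `confidence` are assumptions made for this example and are not confirmed by the listing; check the actual column names on the dataset page before relying on them. The `stream_train_split` helper assumes the Hugging Face `datasets` library is installed and that the data is exposed as a `train` split.

```python
from statistics import mean
from typing import Any, Dict, Iterable, Iterator

Record = Dict[str, Any]


def segment_duration(record: Record) -> float:
    """Length of a segment in seconds (hypothetical 'start'/'end' keys)."""
    return float(record["end"]) - float(record["start"])


def mean_word_confidence(record: Record) -> float:
    """Average of the word-level confidence scores (hypothetical 'words' key).

    Returns 0.0 when a record carries no word-level entries.
    """
    words = record.get("words") or []
    if not words:
        return 0.0
    return mean(float(w["confidence"]) for w in words)


def filter_segments(
    records: Iterable[Record],
    min_confidence: float = 0.9,
    max_seconds: float = 30.0,
) -> Iterator[Record]:
    """Yield segments that are short enough and confidently transcribed."""
    for record in records:
        duration = segment_duration(record)
        if 0.0 < duration <= max_seconds and mean_word_confidence(record) >= min_confidence:
            yield record


def stream_train_split():
    """Stream the dataset without downloading all 296 hours up front.

    Assumes the `datasets` package is installed; the repository ID is taken
    from the download link above, the split name from the description.
    """
    from datasets import load_dataset

    return load_dataset("ESpeech/ESpeech-upvote", split="train", streaming=True)
```

With real data, something like `filter_segments(stream_train_split())` would be the starting point, after adapting the keys to the actual schema. The thresholds are arbitrary placeholders, not recommendations from the dataset authors.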
Provider:
ESpeech



