
ESpeech/ESpeech-upvote

Hugging Face | Updated 2025-08-25 | Indexed 2025-09-13
Download link:
https://hf-mirror.com/datasets/ESpeech/ESpeech-upvote
Description:
The Upvote YouTube Audio Dataset contains 296 hours of processed audio segments extracted from the Upvote YouTube channel, together with their metadata. Each audio file is one segment of a video from the channel, stored as MP3 at a 44.1 kHz sample rate. The speech is in Russian. The dataset is intended for text-to-speech (TTS), automatic speech recognition (ASR), and speech quality assessment. Each record includes the audio data, file name, segment index, original video name, transcript of the segment, start and end times, word-level timestamps and confidence scores, speaker information, quality metrics, segment structure, and voice activity detection (VAD) information. All available segments are assigned to the training split.
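The metadata described above (start and end times, word-level confidence scores) lends itself to simple filtering before TTS or ASR training. The following is a minimal sketch, not the dataset's documented schema: the column names `start`, `end`, `words`, and `confidence`, the 0 to 1 confidence scale, and the threshold values are all assumptions for illustration, since the listing names the fields only in prose. The repository id and the `train` split come from the listing; the loader needs the `datasets` package and network access.

```python
from statistics import mean

# Hypothetical column names: the listing describes these fields
# but does not give the actual keys used in the dataset.
START, END, WORDS, CONF = "start", "end", "words", "confidence"


def segment_duration(segment):
    """Length of one segment in seconds, from its start and end times."""
    return segment[END] - segment[START]


def mean_word_confidence(segment):
    """Average word-level confidence; 0.0 when the segment has no words."""
    words = segment.get(WORDS) or []
    return mean(w[CONF] for w in words) if words else 0.0


def select_segments(segments, min_confidence=0.9, min_seconds=1.0):
    """Keep segments that are long enough and confidently transcribed.

    Both thresholds are illustrative defaults, not values from the dataset.
    """
    return [
        s for s in segments
        if segment_duration(s) >= min_seconds
        and mean_word_confidence(s) >= min_confidence
    ]


def total_hours(segments):
    """Total duration of the given segments, in hours."""
    return sum(segment_duration(s) for s in segments) / 3600.0


def stream_train_split():
    """Stream the train split without downloading it all up front.

    Requires `pip install datasets`; imported lazily so the helpers
    above work without it.
    """
    from datasets import load_dataset
    return load_dataset("ESpeech/ESpeech-upvote", split="train", streaming=True)
```

Streaming is used in the loader because the corpus is large (296 hours of audio); the filtering helpers operate on plain dictionaries, so they can be checked on a few hand-written records before being applied to real rows once the actual column names are confirmed.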
Provider:
ESpeech