ESpeech-upvote
ModelScope community. Updated 2025-12-05. Indexed 2025-11-22.
Download link:
https://modelscope.cn/datasets/ESpeech/ESpeech-upvote
Description:
# Upvote YouTube Audio Dataset
## Dataset Description
This dataset contains 296 hours of processed audio segments extracted from the "Upvote" YouTube channel, with corresponding metadata. Each audio file is one segment of a channel video, processed at a 44.1 kHz sample rate.
### Dataset Summary
- **Language**: Russian
- **Task**: TTS, ASR, Quality Assessment
- **Audio format**: MP3, 44.1kHz sample rate
- **Structure**: Segmented audio files with JSON metadata
- **Source**: Upvote YouTube channel content
## Dataset Structure
### Data Fields
#### Basic Information
- `audio`: Audio data (44.1kHz sample rate, MP3 format)
- `file_name`: Name of the audio segment file (format: `<original_name>_<idx>.mp3`)
- `segment_index`: Index of the audio segment within the original video
- `original_name`: Original name of the YouTube video recording
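The `file_name` pattern above can be split back into its two parts. The following is a minimal sketch, assuming `<idx>` is a plain decimal integer and that `<original_name>` may itself contain underscores; the helper name `parse_segment_file_name` is hypothetical and not part of the dataset.

```python
import re

# Matches `<original_name>_<idx>.mp3`. Assumption: <idx> is a decimal integer.
# The greedy `.+` keeps any underscores inside <original_name>, so only the
# last `_<digits>` before the extension is treated as the index.
_SEGMENT_RE = re.compile(r"^(?P<original_name>.+)_(?P<idx>\d+)\.mp3$")


def parse_segment_file_name(file_name: str) -> tuple[str, int]:
    """Split a segment file name into (original_name, segment index)."""
    match = _SEGMENT_RE.match(file_name)
    if match is None:
        raise ValueError(f"unexpected segment file name: {file_name!r}")
    return match.group("original_name"), int(match.group("idx"))
```

For example, `parse_segment_file_name("my_video_12.mp3")` yields `("my_video", 12)`. Whether the parsed index always equals the `segment_index` field is not stated in this card, so it is worth checking against the metadata.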
#### Transcription and Timing
- `text`: Transcribed text of the audio segment
- `start`: Start time of the segment in seconds
- `end`: End time of the segment in seconds
- `words`: Word-level timestamps and confidence scores
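Since `start` and `end` are both given in seconds, a segment's duration is their difference. A small sketch, assuming each metadata record has been loaded into a Python dict; the sample records below are made up for illustration and the function names are hypothetical.

```python
def segment_duration(record: dict) -> float:
    """Duration of one segment in seconds, from its `start` and `end` fields."""
    duration = float(record["end"]) - float(record["start"])
    if duration < 0:
        raise ValueError("segment ends before it starts")
    return duration


def total_hours(records) -> float:
    """Total duration of all given segments, in hours."""
    return sum(segment_duration(r) for r in records) / 3600.0


# Made-up records, for illustration only.
records = [
    {"text": "first example", "start": 0.0, "end": 4.5},
    {"text": "second example", "start": 4.5, "end": 10.0},
]
hours = total_hours(records)
```

The internal layout of the `words` field is not described here, so this sketch deliberately leaves it untouched.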
#### Speaker Information
- `speaker`: Speaker identifier (e.g., "SPEAKER_00")
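Labels such as "SPEAKER_00" are typical of per-recording diarization output, so the same label in two different videos may refer to different people. This card does not say either way, so the sketch below takes the cautious route and groups segments by the pair (`original_name`, `speaker`). The function name and the sample records are hypothetical.

```python
from collections import defaultdict


def group_by_speaker(records):
    """Map (original_name, speaker) to the list of segment file names."""
    groups = defaultdict(list)
    for record in records:
        # Assumption: speaker labels are only meaningful within one recording.
        key = (record["original_name"], record["speaker"])
        groups[key].append(record["file_name"])
    return dict(groups)


# Made-up records, for illustration only.
records = [
    {"original_name": "video_a", "speaker": "SPEAKER_00", "file_name": "video_a_0.mp3"},
    {"original_name": "video_a", "speaker": "SPEAKER_01", "file_name": "video_a_1.mp3"},
    {"original_name": "video_b", "speaker": "SPEAKER_00", "file_name": "video_b_0.mp3"},
]
groups = group_by_speaker(records)
```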
#### Quality Metrics
- `emos_overall`: EMOS overall quality score
- `noise_confidence`: Noise detection confidence
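These two fields lend themselves to filtering out low-quality segments. The sketch below assumes that a higher `emos_overall` means better quality and that a higher `noise_confidence` means noise is more likely; neither the score ranges nor these directions are documented in this card, so the thresholds and sample values are purely illustrative.

```python
def passes_quality(record: dict, min_emos: float, max_noise_confidence: float) -> bool:
    """Return True if a record clears both (hypothetical) quality thresholds."""
    emos = record.get("emos_overall")
    noise = record.get("noise_confidence")
    # Records with a missing score are rejected rather than guessed at.
    if emos is None or noise is None:
        return False
    return emos >= min_emos and noise <= max_noise_confidence


# Made-up records and thresholds, for illustration only.
records = [
    {"file_name": "clip_0.mp3", "emos_overall": 4.2, "noise_confidence": 0.05},
    {"file_name": "clip_1.mp3", "emos_overall": 2.1, "noise_confidence": 0.10},
    {"file_name": "clip_2.mp3", "emos_overall": 4.5, "noise_confidence": 0.90},
]
clean = [
    r["file_name"]
    for r in records
    if passes_quality(r, min_emos=3.5, max_noise_confidence=0.5)
]
```

Inspecting the actual score distributions first is a sensible step before choosing thresholds.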

#### Segment Structure
- `num_sentences`: Number of sentences (for merged segments)
- `original_segments`: Original subsegments data (for merged segments)
#### VAD (Voice Activity Detection)
- `vad_trimmed`: Whether VAD trimming was applied
- `vad_start`: VAD start time
- `trim_ratio`: Ratio of trimmed audio
### Data Splits
- **Train**: All available YouTube video segments
## Dataset Creation
### Source Data
The dataset consists of audio content extracted from the "Upvote" YouTube channel. The channel produces various types of content primarily in Russian. Each YouTube video has been processed and segmented into multiple audio clips, with each segment saved as a separate MP3 file along with its transcription and metadata.
## Usage
### Loading the Dataset
Concatenate the split archive parts, then extract the resulting tar file:
```bash
cat upvote_stripped_archive.tar.aa upvote_stripped_archive.tar.ab upvote_stripped_archive.tar.ac > upvote_stripped_archive.tar && tar -xf upvote_stripped_archive.tar
```
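Where a shell is not available, the same two steps can be done from Python's standard library. This is a sketch rather than an official loader: the function name `join_and_extract` is hypothetical, and it assumes the parts carry two-letter suffixes (`.aa`, `.ab`, `.ac`, ...) and sit in a single directory.

```python
import shutil
import tarfile
from pathlib import Path


def join_and_extract(parts_dir, archive_name="upvote_stripped_archive.tar", out_dir="."):
    """Concatenate `<archive_name>.aa`, `.ab`, ... and extract the joined tar."""
    parts_dir = Path(parts_dir)
    # Lexicographic order (.aa, .ab, .ac, ...) is the order the parts were split in.
    parts = sorted(parts_dir.glob(archive_name + ".[a-z][a-z]"))
    if not parts:
        raise FileNotFoundError(f"no parts of {archive_name} found in {parts_dir}")

    joined = parts_dir / archive_name
    with open(joined, "wb") as dst:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, dst)  # streams, so parts are not held in memory

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(joined) as tar:
        tar.extractall(out_dir)
    return joined
```

Calling `join_and_extract("path/to/parts", out_dir="upvote")` would write the joined archive next to its parts and unpack it into `upvote/`. The directory layout inside the archive is not described in this card, so inspect it after extraction before pairing MP3 files with their JSON metadata.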
## Additional Information
### Licensing Information
Apache 2.0 License - see LICENSE file for details.
### Citation Information
```bibtex
@dataset{upvote_youtube_audio_dataset,
title={Upvote YouTube Audio Dataset},
author={Denis Petrov},
year={2025},
url={https://huggingface.co/datasets/ESpeech/ESpeech-upvote/}
}
```
Provider:
maas
Created:
2025-08-28



