ESpeech-webinars2
Source: ModelScope community. Updated 2025-12-05; indexed 2025-08-30.
Download link:
https://modelscope.cn/datasets/ESpeech/ESpeech-webinars2
Resource description:
# Webinar Audio Dataset
## Dataset Description
This dataset contains 850 hours of processed webinar audio segments with corresponding metadata. Each audio file is a segment extracted from a webinar recording and processed at a 44.1kHz sample rate.
### Dataset Summary
- **Language**: Russian
- **Task**: TTS, ASR, Quality Assessment
- **Audio format**: MP3, 44.1kHz sample rate
- **Structure**: Segmented audio files with JSON metadata
## Dataset Structure
### Data Fields
#### Basic Information
- `audio`: Audio data (44.1kHz sample rate, MP3 format)
- `file_name`: Name of the audio segment file (format: `<original_name>_<idx>.mp3`)
- `segment_index`: Index of the audio segment within the original webinar
- `original_name`: Original name of the webinar recording
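Since `file_name` follows the `<original_name>_<idx>.mp3` pattern, the two parts can be recovered from the name alone. The sketch below assumes that `<idx>` is a non-negative decimal integer and that it follows the last underscore in the name; the helper name `split_segment_name` and the sample file name are hypothetical.

```python
from pathlib import Path


def split_segment_name(file_name: str) -> tuple[str, int]:
    """Split '<original_name>_<idx>.mp3' into (original_name, idx).

    Assumption: <idx> is a decimal integer after the LAST underscore,
    so underscores inside <original_name> are preserved.
    """
    stem = Path(file_name).stem  # drop the .mp3 extension
    original_name, sep, idx = stem.rpartition("_")
    if not sep or not idx.isdigit():
        raise ValueError(f"unexpected segment file name: {file_name!r}")
    return original_name, int(idx)


# Hypothetical file name, used only for illustration.
print(split_segment_name("my_webinar_2024_17.mp3"))  # ('my_webinar_2024', 17)
```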
#### Transcription and Timing
- `text`: Transcribed text of the audio segment
- `start`: Start time of the segment in seconds
- `end`: End time of the segment in seconds
- `words`: Word-level timestamps and confidence scores
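Because `start` and `end` are both given in seconds, a segment's duration is simply their difference. The following is a minimal sketch that assumes each metadata record has already been loaded as a Python `dict` with these two keys; the helper names and the sample records are hypothetical, and the layout of `words` is not assumed here.

```python
def segment_duration(record: dict) -> float:
    """Return the segment length in seconds, computed as end - start."""
    start = float(record["start"])
    end = float(record["end"])
    if end < start:
        raise ValueError(f"end ({end}) precedes start ({start})")
    return end - start


def total_hours(records) -> float:
    """Sum the durations of many records and convert seconds to hours."""
    return sum(segment_duration(r) for r in records) / 3600.0


# Hypothetical records, used only for illustration.
records = [
    {"text": "...", "start": 12.5, "end": 20.0},
    {"text": "...", "start": 20.0, "end": 31.25},
]
print(segment_duration(records[0]))
print(total_hours(records))
```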
#### Speaker Information
- `speaker`: Speaker identifier (e.g., "SPEAKER_00")
#### Quality Metrics
- `emos_overall`: EMOS overall quality score (note: this is EMOS, not UTMOS)
- `emos_1`, `emos_2`, `emos_3`: EMOS quality scores
- `noise_confidence`: Noise detection confidence
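The quality fields can be used to select a cleaner subset, for example for TTS training. The sketch below rests on several assumptions not stated in this card: that a higher `emos_overall` means better quality, that a higher `noise_confidence` means the segment is more likely to be noisy, and that the thresholds `3.5` and `0.5` are sensible. Both thresholds are hypothetical placeholders and should be tuned after inspecting the actual score distributions.

```python
def keep_segment(record: dict, min_emos: float = 3.5, max_noise: float = 0.5) -> bool:
    """Return True if a record passes both (hypothetical) quality thresholds.

    Assumptions: higher emos_overall is better; higher noise_confidence
    means noise is more likely. Records missing either field are rejected.
    """
    emos = record.get("emos_overall")
    noise = record.get("noise_confidence")
    if emos is None or noise is None:
        return False
    return emos >= min_emos and noise <= max_noise


# Hypothetical records, used only for illustration.
records = [
    {"file_name": "a_0.mp3", "emos_overall": 4.1, "noise_confidence": 0.1},
    {"file_name": "a_1.mp3", "emos_overall": 2.8, "noise_confidence": 0.2},
    {"file_name": "a_2.mp3", "emos_overall": 4.3, "noise_confidence": 0.9},
]
kept = [r["file_name"] for r in records if keep_segment(r)]
print(kept)
```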

#### Segment Structure
- `num_sentences`: Number of sentences (for merged segments)
- `original_segments`: Original subsegments data (for merged segments)
#### VAD (Voice Activity Detection)
- `vad_trimmed`: Whether VAD trimming was applied
- `vad_start`: VAD start time
- `trim_ratio`: Ratio of trimmed audio
### Data Splits
- **Train**: All available webinar segments
## Dataset Creation
### Source Data
The dataset consists of webinar recordings that have been processed and segmented. Each webinar is split into multiple audio segments, with each segment saved as a separate MP3 file.
## Usage
### Loading the Dataset
Download all archive parts, then concatenate and unpack them:
```bash
cat webinars_stripped_archive.tar.aa webinars_stripped_archive.tar.ab \
    webinars_stripped_archive.tar.ac webinars_stripped_archive.tar.ad \
    webinars_stripped_archive.tar.ae webinars_stripped_archive.tar.af \
    webinars_stripped_archive.tar.ag > webinars_stripped_archive.tar \
  && tar -xf webinars_stripped_archive.tar
```
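On systems without `cat` and `tar`, the same concatenate-then-extract step can be sketched with Python's standard library. This is an illustrative equivalent, not an official loader: the function name `join_and_extract` and the directory arguments are hypothetical, and it assumes all parts sit in one directory and carry two-letter suffixes (`.aa`, `.ab`, ...), so that sorting by name gives the correct order.

```python
import shutil
import tarfile
from pathlib import Path


def join_and_extract(parts_dir, out_dir, prefix="webinars_stripped_archive.tar"):
    """Concatenate <prefix>.aa, <prefix>.ab, ... and extract the resulting tar."""
    parts_dir, out_dir = Path(parts_dir), Path(out_dir)
    # Two-letter suffixes sort lexicographically in the right order.
    parts = sorted(parts_dir.glob(prefix + ".[a-z][a-z]"))
    if not parts:
        raise FileNotFoundError(f"no archive parts found in {parts_dir}")

    combined = parts_dir / prefix
    with combined.open("wb") as dst:
        for part in parts:
            with part.open("rb") as src:
                shutil.copyfileobj(src, dst)  # streams in chunks, not all in memory

    out_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(combined) as tar:
        # Only extract archives you trust; recent Python versions also
        # accept an extraction filter argument for extra safety.
        tar.extractall(out_dir)
    return parts


# Example call (hypothetical paths):
# join_and_extract("downloads", "webinars")
```

Note that the combined archive is written next to the parts, so roughly twice the download size is needed on disk before extraction.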
## Additional Information
### Special Thanks
Special thanks to @bethrezen for providing the webinar dataset.
### Licensing Information
MIT License - see LICENSE file for details.
### Citation Information
```bibtex
@dataset{webinar_audio_dataset,
title={Webinar Audio Dataset},
author={Denis Petrov},
year={2025},
url={https://huggingface.co/datasets/ESpeech/ESpeech-webinars2/}
}
```
Provider: maas
Created: 2025-08-28



