ESpeech-webinars2

Name: ESpeech-webinars2
Creator: maas
Published: 2025-12-05 16:48:29
License: No description provided

ModelScope community. Updated 2025-12-05; indexed 2025-08-30.

Download link:

https://modelscope.cn/datasets/ESpeech/ESpeech-webinars2

Resource description:

# Webinar Audio Dataset

## Dataset Description

This dataset contains 850 hours of processed webinar audio segments with corresponding metadata. Each audio file is a segment extracted from a webinar recording and processed at a 44.1 kHz sample rate.

### Dataset Summary

- **Language**: Russian
- **Task**: TTS, ASR, Quality Assessment
- **Audio format**: MP3, 44.1 kHz sample rate
- **Structure**: Segmented audio files with JSON metadata

## Dataset Structure

### Data Fields

#### Basic Information

- `audio`: Audio data (44.1 kHz sample rate, MP3 format)
- `file_name`: Name of the audio segment file (format: `<original_name>_<idx>.mp3`)
- `segment_index`: Index of the audio segment within the original webinar
- `original_name`: Original name of the webinar recording

#### Transcription and Timing

- `text`: Transcribed text of the audio segment
- `start`: Start time of the segment in seconds
- `end`: End time of the segment in seconds
- `words`: Word-level timestamps and confidence scores

#### Speaker Information

- `speaker`: Speaker identifier (e.g., "SPEAKER_00")

#### Quality Metrics

- `emos_overall`: EMOS overall quality score (this is EMOS, not UTMOS)
- `emos_1`, `emos_2`, `emos_3`: EMOS quality scores
- `noise_confidence`: Noise detection confidence

![design](https://huggingface.co/datasets/ESpeech/ESpeech-webinars2/resolve/main/mos.png)

#### Segment Structure

- `num_sentences`: Number of sentences (for merged segments)
- `original_segments`: Original subsegments data (for merged segments)

#### VAD (Voice Activity Detection)

- `vad_trimmed`: Whether VAD trimming was applied
- `vad_start`: VAD start time
- `trim_ratio`: Ratio of trimmed audio

### Data Splits

- **Train**: All available webinar segments

## Dataset Creation

### Source Data

The dataset consists of webinar recordings that have been processed and segmented. Each webinar is split into multiple audio segments, and each segment is saved as a separate MP3 file.
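The data fields listed under Dataset Structure lend themselves to simple selection rules. The following is a minimal sketch, not the authors' pipeline: it assumes each metadata record is available as a Python dict keyed by the documented field names, that a higher `noise_confidence` means the segment is more likely noisy, and that the thresholds shown (`min_emos`, `max_noise`, `min_seconds`) are purely illustrative. The helper names `split_file_name`, `segment_duration`, and `filter_segments` are hypothetical.

```python
from typing import Any, Dict, Iterable, List, Tuple


def split_file_name(file_name: str) -> Tuple[str, int]:
    """Split '<original_name>_<idx>.mp3' into (original_name, idx)."""
    stem, ext = file_name.rsplit(".", 1)
    if ext.lower() != "mp3":
        raise ValueError(f"expected an .mp3 file name, got {file_name!r}")
    # Split on the LAST underscore so underscores inside original_name survive.
    original_name, idx = stem.rsplit("_", 1)
    return original_name, int(idx)


def segment_duration(record: Dict[str, Any]) -> float:
    """Duration in seconds, computed from the documented start/end fields."""
    return float(record["end"]) - float(record["start"])


def filter_segments(
    records: Iterable[Dict[str, Any]],
    min_emos: float = 3.5,     # illustrative threshold, not from the dataset card
    max_noise: float = 0.5,    # assumes higher noise_confidence = noisier
    min_seconds: float = 1.0,  # illustrative minimum segment length
) -> List[Dict[str, Any]]:
    """Keep records that pass all three (assumed) quality checks."""
    kept = []
    for record in records:
        if record["emos_overall"] < min_emos:
            continue
        if record["noise_confidence"] > max_noise:
            continue
        if segment_duration(record) < min_seconds:
            continue
        kept.append(record)
    return kept
```

For example, `split_file_name("my_webinar_12.mp3")` returns `("my_webinar", 12)`. Suitable threshold values depend on the score distributions in the actual metadata, so they should be chosen after inspecting the data.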
## Usage

### Loading the Dataset

Download all archive parts, then concatenate and unpack them:

```
cat webinars_stripped_archive.tar.aa webinars_stripped_archive.tar.ab webinars_stripped_archive.tar.ac webinars_stripped_archive.tar.ad webinars_stripped_archive.tar.ae webinars_stripped_archive.tar.af webinars_stripped_archive.tar.ag > webinars_stripped_archive.tar && tar -xf webinars_stripped_archive.tar
```

## Additional Information

### Special Thanks

Special thanks to @bethrezen for providing the webinars dataset.

### Licensing Information

MIT License - see LICENSE file for details.

### Citation Information

```bibtex
@dataset{webinar_audio_dataset,
  title={Webinar Audio Dataset},
  author={Denis Petrov},
  year={2025},
  url={https://huggingface.co/datasets/ESpeech/ESpeech-webinars2/}
}
```
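Where `cat` and `tar` are not available (for example on Windows), the concatenate-and-unpack step from the Usage section can be approximated in Python. This is a sketch under assumptions: the parts are assumed to sit in one directory with the `.aa` to `.ag` suffixes shown in the command, and the function names `join_parts` and `extract_archive` are hypothetical helpers, not part of the dataset.

```python
import shutil
import tarfile
from pathlib import Path


def join_parts(parts_dir, archive_name="webinars_stripped_archive.tar"):
    """Concatenate <archive_name>.aa, .ab, ... into one tar file."""
    parts_dir = Path(parts_dir)
    # Two-letter split suffixes sort lexicographically in the right order.
    parts = sorted(parts_dir.glob(archive_name + ".a?"))
    if not parts:
        raise FileNotFoundError(f"no archive parts found in {parts_dir}")
    target = parts_dir / archive_name
    with open(target, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                # Stream in chunks so large parts are never held in memory.
                shutil.copyfileobj(src, out)
    return target


def extract_archive(archive, dest):
    """Unpack the joined tar archive into dest."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive) as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions support extraction filters; "data"
            # rejects unsafe members such as absolute paths.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)


# Example (paths are placeholders):
# archive = join_parts("downloads")
# extract_archive(archive, "webinars")
```

Note that joining the parts writes a second full copy of the archive to disk, as the shell command does, so enough free space for both the parts and the joined file is needed.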

Provider: maas

Created: 2025-08-28
