ESpeech-igm

Name: ESpeech-igm
Creator: maas
Published: 2025-12-05 16:48:29
License: No description available

ModelScope community, updated 2025-12-05, indexed 2025-08-30

Download link:

https://modelscope.cn/datasets/ESpeech/ESpeech-igm

Resource summary:

# IGM YouTube Audio Dataset

## Dataset Description

This dataset contains 220 hours of processed audio segments extracted from the IGM YouTube channel, together with their metadata. Each audio file is a segment of one of IGM's educational videos or lectures, stored at a 44.1 kHz sample rate.

### Dataset Summary

- **Language**: Russian
- **Tasks**: text-to-speech (TTS), automatic speech recognition (ASR), quality assessment
- **Audio format**: MP3, 44.1 kHz sample rate
- **Structure**: Segmented audio files with JSON metadata
- **Source**: IGM YouTube channel content

## Dataset Structure

### Data Fields

#### Basic Information

- `audio`: Audio data (MP3 format, 44.1 kHz sample rate)
- `file_name`: Name of the audio segment file (format: `<original_name>_<idx>.mp3`)
- `segment_index`: Index of the audio segment within the original video
- `original_name`: Original name of the source YouTube video

#### Transcription and Timing

- `text`: Transcribed text of the audio segment
- `start`: Start time of the segment, in seconds
- `end`: End time of the segment, in seconds
- `words`: Word-level timestamps and confidence scores

#### Speaker Information

- `speaker`: Speaker identifier (e.g., `"SPEAKER_00"`)

#### Quality Metrics

- `emos_overall`: Overall EMOS quality score
- `noise_confidence`: Noise detection confidence

![design](https://huggingface.co/datasets/ESpeech/ESpeech-igm/resolve/main/mos.png)

#### Segment Structure

- `num_sentences`: Number of sentences (for merged segments)
- `original_segments`: Data for the original subsegments (for merged segments)

#### VAD (Voice Activity Detection)

- `vad_trimmed`: Whether VAD trimming was applied
- `vad_start`: VAD start time
- `trim_ratio`: Ratio of trimmed audio

### Data Splits

- **Train**: All available YouTube video segments

## Dataset Creation

### Source Data

The dataset consists of audio extracted from the IGM YouTube channel, which produces educational content, lectures, and discussions, primarily in Russian. Each YouTube video was processed and split into multiple audio clips, and each segment is saved as a separate MP3 file along with its transcription and metadata.

## Usage

### Loading the Dataset

The dataset is distributed as a tar archive split into two parts, `igm_archive.tar.aa` and `igm_archive.tar.ab`. Concatenate the parts and extract the result:

```bash
cat igm_archive.tar.aa igm_archive.tar.ab > igm_archive.tar && tar -xf igm_archive.tar
```

### Citation Information

```bibtex
@dataset{igm_youtube_audio_dataset,
  title={IGM YouTube Audio Dataset},
  author={Denis Petrov},
  year={2025},
  url={https://huggingface.co/datasets/ESpeech/ESpeech-igm/}
}
```
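Once the archive is extracted, the metadata fields listed above can be read and filtered with a few lines of Python. The sketch below is illustrative only and is not distributed with the dataset. It assumes, without confirmation from the card, that each segment's metadata is stored as a standalone `.json` file somewhere under the extracted directory, with `start`, `end`, and `emos_overall` as plain JSON numbers. The helper names (`load_segment_metadata`, `parse_file_name`, `select_segments`, `total_hours`) are hypothetical. The sketch splits `file_name` on its last underscore to follow the documented `<original_name>_<idx>.mp3` pattern, keeps segments at or above an `emos_overall` threshold, and sums `end - start` to report hours of audio.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Iterable, Iterator


def load_segment_metadata(root: str) -> Iterator[dict]:
    """Yield one metadata record per .json file found under `root`.

    ASSUMPTION: each segment's metadata lives in its own JSON file inside
    the extracted archive; adapt this if the actual layout differs.
    """
    for path in sorted(Path(root).rglob("*.json")):
        with path.open(encoding="utf-8") as f:
            yield json.load(f)


def parse_file_name(file_name: str) -> tuple[str, int]:
    """Split `<original_name>_<idx>.mp3` into (original_name, idx).

    Splits on the LAST underscore, because video names may themselves
    contain underscores.
    """
    stem = Path(file_name).stem
    original_name, idx = stem.rsplit("_", 1)
    return original_name, int(idx)


def select_segments(records: Iterable[dict], min_emos: float) -> list[dict]:
    """Keep records whose `emos_overall` is present and >= `min_emos`."""
    return [
        r for r in records
        if r.get("emos_overall") is not None and r["emos_overall"] >= min_emos
    ]


def total_hours(records: Iterable[dict]) -> float:
    """Sum segment durations (`end` - `start`, in seconds) and convert to hours."""
    return sum(r["end"] - r["start"] for r in records) / 3600.0
```

For example, `total_hours(select_segments(load_segment_metadata("igm_archive"), min_emos=3.5))` would report how many hours remain above an EMOS score of 3.5. Both the directory name and the threshold are placeholders, not values recommended by the dataset authors.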

Provider:

maas

Created:

2025-08-28
