TTS-AGI/EN_Emilia_Yodas_ScribeEvents

Name: TTS-AGI/EN_Emilia_Yodas_ScribeEvents
Creator: TTS-AGI
Published: 2026-03-28 12:17:09
License: cc-by-4.0

Source: Hugging Face. Updated: 2026-03-28. Indexed: 2026-03-29.

Download link:

https://hf-mirror.com/datasets/TTS-AGI/EN_Emilia_Yodas_ScribeEvents

Description:

---
license: cc-by-4.0
language:
- en
task_categories:
- automatic-speech-recognition
tags:
- vocal-bursts
- scribe-events
- emilia
pretty_name: EN Emilia Yodas - Scribe Events (Filtered)
---

# EN Emilia Yodas - Scribe Events

Filtered subset of [MrDragonFox/EN_Emilia_Yodas_616h](https://huggingface.co/datasets/MrDragonFox/EN_Emilia_Yodas_616h) containing only samples with **ElevenLabs Scribe v1 audio events** (vocal bursts, background sounds, etc.).

## Changes from source

1. **Filtered** to only include rows where `events_scribe` is non-empty (16,017 rows out of 228,265 original)
2. **Bracket format unified**: Round brackets `(laughs)` in `text_scribe` replaced with square brackets `[laughs]` for consistency with vocal burst annotation format

## Example

```
text_scribe: "Can we get the-- yeah, so I have the mic. [laughs] All right, on the back here."
events_scribe: "<laughs>"
```

## Event types include

- Vocal bursts: `<laughs>`, `<sighs>`, `<clears throat>`, `<clicks tongue>`, `<gulps>`, etc.
- Background: `<background noise>`, `<dog barking>`, `<music>`, etc.
- Other: `<pause>`, `<unintelligible>`, `<bleep>`, etc.

## Dataset Structure

Same columns as source dataset:

- `file_id` - Unique identifier
- `audio` - Audio clip (3-30s)
- `text_scribe` - ASR transcription with events in [square brackets]
- `events_scribe` - Scribe event classification
- `text_emilia` - Reference transcription
- `duration` - Audio duration
- `speaker` - Speaker ID
- `language` - "en"
- `dnsmos` - Audio quality score
- Quality metrics: `CE`, `CU`, `PC`, `PQ`

## Source

Derived from [MrDragonFox/EN_Emilia_Yodas_616h](https://huggingface.co/datasets/MrDragonFox/EN_Emilia_Yodas_616h) (CC BY 4.0)
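
## Sketch: filtering and bracket unification

The two changes listed under "Changes from source" could be reproduced roughly as follows. This is a minimal sketch, not the script used to build the dataset: the helper names (`unify_brackets`, `has_events`, `filter_and_normalize`) are hypothetical, `events_scribe` is assumed to be a plain string as in the example, and the regex rewrites every parenthesised span, which may be broader than what the authors actually did.

```python
import re

# Assumption: any non-nested "(...)" span in text_scribe is an event marker.
_ROUND_EVENT = re.compile(r"\(([^()]+)\)")


def unify_brackets(text: str) -> str:
    """Rewrite round-bracket events such as (laughs) as [laughs]."""
    return _ROUND_EVENT.sub(r"[\1]", text)


def has_events(row: dict) -> bool:
    """Return True when the row's events_scribe field is a non-empty string."""
    return bool((row.get("events_scribe") or "").strip())


def filter_and_normalize(rows):
    """Keep only rows with events and normalize their text_scribe brackets."""
    return [
        {**row, "text_scribe": unify_brackets(row["text_scribe"])}
        for row in rows
        if has_events(row)
    ]


# Toy rows standing in for the source dataset (illustrative, not real data).
rows = [
    {
        "text_scribe": "Can we get the-- yeah, so I have the mic. (laughs) "
                       "All right, on the back here.",
        "events_scribe": "<laughs>",
    },
    {"text_scribe": "No events in this one.", "events_scribe": ""},
]

kept = filter_and_normalize(rows)
print(len(kept), kept[0]["text_scribe"])
```

With the Hugging Face `datasets` library, the same two helpers could in principle be passed to `Dataset.filter` and `Dataset.map` after loading the source dataset with `load_dataset`; that wiring is left out here so the sketch runs without network access.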

Provided by:

TTS-AGI
