
TTS-AGI/EN_Emilia_Yodas_ScribeEvents

Source: Hugging Face. Updated 2026-03-28. Indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/TTS-AGI/EN_Emilia_Yodas_ScribeEvents
Description:
---
license: cc-by-4.0
language:
- en
task_categories:
- automatic-speech-recognition
tags:
- vocal-bursts
- scribe-events
- emilia
pretty_name: EN Emilia Yodas - Scribe Events (Filtered)
---

# EN Emilia Yodas - Scribe Events

Filtered subset of [MrDragonFox/EN_Emilia_Yodas_616h](https://huggingface.co/datasets/MrDragonFox/EN_Emilia_Yodas_616h) containing only samples with **ElevenLabs Scribe v1 audio events** (vocal bursts, background sounds, etc.).

## Changes from source

1. **Filtered** to only include rows where `events_scribe` is non-empty (16,017 rows out of 228,265 original)
2. **Bracket format unified**: Round brackets `(laughs)` in `text_scribe` replaced with square brackets `[laughs]` for consistency with vocal burst annotation format

## Example

```
text_scribe: "Can we get the-- yeah, so I have the mic. [laughs] All right, on the back here."
events_scribe: "<laughs>"
```

## Event types include

- Vocal bursts: `<laughs>`, `<sighs>`, `<clears throat>`, `<clicks tongue>`, `<gulps>`, etc.
- Background: `<background noise>`, `<dog barking>`, `<music>`, etc.
- Other: `<pause>`, `<unintelligible>`, `<bleep>`, etc.

## Dataset Structure

Same columns as source dataset:

- `file_id` - Unique identifier
- `audio` - Audio clip (3-30s)
- `text_scribe` - ASR transcription with events in [square brackets]
- `events_scribe` - Scribe event classification
- `text_emilia` - Reference transcription
- `duration` - Audio duration
- `speaker` - Speaker ID
- `language` - "en"
- `dnsmos` - Audio quality score
- Quality metrics: `CE`, `CU`, `PC`, `PQ`

## Source

Derived from [MrDragonFox/EN_Emilia_Yodas_616h](https://huggingface.co/datasets/MrDragonFox/EN_Emilia_Yodas_616h) (CC BY 4.0)
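
The two changes described under "Changes from source" (keeping only rows with a non-empty `events_scribe`, and rewriting `(laughs)` as `[laughs]` in `text_scribe`) could be sketched roughly as follows. This is not the script the dataset authors used. The function names, the regular expression, and the assumption that `events_scribe` is a plain string are all hypothetical, chosen only to illustrate the idea on in-memory rows.

```python
import re

# Hypothetical pattern (an assumption, not the authors' rule): a parenthesised
# run of lowercase words such as "(laughs)" or "(clears throat)".
_ROUND_EVENT = re.compile(r"\(([a-z][a-z ]*)\)")


def unify_brackets(text: str) -> str:
    """Rewrite round-bracket event tags as square-bracket tags."""
    return _ROUND_EVENT.sub(r"[\1]", text)


def has_events(row: dict) -> bool:
    """Return True when the row's events_scribe field is non-empty.

    Assumes events_scribe is a string (or None); whitespace-only counts as empty.
    """
    return bool((row.get("events_scribe") or "").strip())


def filter_and_normalise(rows):
    """Keep rows that carry events and normalise their text_scribe brackets."""
    kept = []
    for row in rows:
        if has_events(row):
            kept.append({**row, "text_scribe": unify_brackets(row["text_scribe"])})
    return kept


# Toy rows for illustration only; these are not taken from the dataset.
rows = [
    {"text_scribe": "so I have the mic. (laughs) All right.", "events_scribe": "<laughs>"},
    {"text_scribe": "No events here.", "events_scribe": ""},
]
kept = filter_and_normalise(rows)
print(kept[0]["text_scribe"])
```

With the Hugging Face `datasets` library, a predicate like `has_events` could be passed to `Dataset.filter` and the bracket rewrite applied through `Dataset.map`. Whether the pattern above covers every event tag in the source data is an open assumption.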
Provider:
TTS-AGI