TTS-AGI/EN_Emilia_Yodas_ScribeEvents
Hugging Face · Updated 2026-03-28 · Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/TTS-AGI/EN_Emilia_Yodas_ScribeEvents
Resource summary:
---
license: cc-by-4.0
language:
- en
task_categories:
- automatic-speech-recognition
tags:
- vocal-bursts
- scribe-events
- emilia
pretty_name: EN Emilia Yodas - Scribe Events (Filtered)
---
# EN Emilia Yodas - Scribe Events
Filtered subset of [MrDragonFox/EN_Emilia_Yodas_616h](https://huggingface.co/datasets/MrDragonFox/EN_Emilia_Yodas_616h) containing only samples with **ElevenLabs Scribe v1 audio events** (vocal bursts, background sounds, etc.).
## Changes from source
1. **Filtered** to include only rows where `events_scribe` is non-empty (16,017 of the original 228,265 rows)
2. **Bracket format unified**: round brackets such as `(laughs)` in `text_scribe` were replaced with square brackets (`[laughs]`) for consistency with the vocal burst annotation format
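The two changes above can be sketched in plain Python as follows. This is a minimal illustration, not the script actually used to build the dataset: it assumes rows are dicts with `text_scribe` and `events_scribe` keys, and it assumes every `(...)` span in `text_scribe` is an event tag (a genuine spoken parenthetical would be converted too). The helper names `unify_brackets` and `filter_and_unify` are hypothetical.

```python
import re

# Matches a round-bracketed tag such as "(laughs)" or "(clears throat)".
# Assumption: every (...) span in text_scribe is an event tag.
_ROUND_TAG = re.compile(r"\(([^()]+)\)")


def unify_brackets(text: str) -> str:
    """Rewrite round-bracket tags as square-bracket tags, e.g. (laughs) -> [laughs]."""
    return _ROUND_TAG.sub(r"[\1]", text)


def filter_and_unify(rows):
    """Yield rows with a non-empty events_scribe, with brackets unified.

    `rows` is any iterable of dicts (a hypothetical stand-in for the
    source dataset's rows).
    """
    for row in rows:
        events = row.get("events_scribe") or ""
        if not events.strip():
            continue  # step 1: drop rows without Scribe events
        out = dict(row)  # copy so the input row is not mutated
        out["text_scribe"] = unify_brackets(row["text_scribe"])  # step 2
        yield out


rows = [
    {"text_scribe": "So I have the mic. (laughs) All right.", "events_scribe": "<laughs>"},
    {"text_scribe": "No events here.", "events_scribe": ""},
]
for kept in filter_and_unify(rows):
    print(kept["text_scribe"])  # So I have the mic. [laughs] All right.
```

A regex substitution keeps the sketch dependency-free; restricting `_ROUND_TAG` to a known list of event names would avoid touching real parentheticals.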
## Example
```
text_scribe: "Can we get the-- yeah, so I have the mic. [laughs] All right, on the back here."
events_scribe: "<laughs>"
```
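In the example, the same event appears twice: as `[laughs]` inline in `text_scribe` and as `<laughs>` in `events_scribe`. A minimal sketch for extracting both and checking that they agree is shown below. It assumes `events_scribe` may hold several concatenated `<...>` tags (the card shows only one), and the helper names are hypothetical; the card does not state that the two fields always match, so treat `agree` as a sanity check rather than a guarantee.

```python
import re

# Assumption: text_scribe marks events as [name]; events_scribe lists them as <name>.
_SQUARE = re.compile(r"\[([^\[\]]+)\]")
_ANGLE = re.compile(r"<([^<>]+)>")


def events_in_text(text_scribe: str) -> list[str]:
    """Return event names found inline in text_scribe, in order."""
    return [m.strip() for m in _SQUARE.findall(text_scribe)]


def events_in_field(events_scribe: str) -> list[str]:
    """Return event names listed in events_scribe, in order."""
    return [m.strip() for m in _ANGLE.findall(events_scribe)]


def agree(row: dict) -> bool:
    """True if both fields name the same events (order-insensitive)."""
    return sorted(events_in_text(row["text_scribe"])) == sorted(
        events_in_field(row["events_scribe"])
    )


row = {
    "text_scribe": "Can we get the-- yeah, so I have the mic. [laughs] All right, on the back here.",
    "events_scribe": "<laughs>",
}
print(events_in_text(row["text_scribe"]))     # ['laughs']
print(events_in_field(row["events_scribe"]))  # ['laughs']
```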
## Event types
Event types include:
- Vocal bursts: `<laughs>`, `<sighs>`, `<clears throat>`, `<clicks tongue>`, `<gulps>`, etc.
- Background: `<background noise>`, `<dog barking>`, `<music>`, etc.
- Other: `<pause>`, `<unintelligible>`, `<bleep>`, etc.
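The grouping above can be turned into a simple lookup, sketched below. The category keys (`vocal_burst`, `background`, `other`) are hypothetical labels for the three groups, and the name sets contain only the examples listed on this card. Because each list ends in "etc.", any event not listed falls back to a hypothetical `unlisted` bucket instead of being guessed.

```python
from collections import Counter

# Only the event names listed on the card; the real sets are larger ("etc.").
CATEGORIES = {
    "vocal_burst": {"laughs", "sighs", "clears throat", "clicks tongue", "gulps"},
    "background": {"background noise", "dog barking", "music"},
    "other": {"pause", "unintelligible", "bleep"},
}


def categorize(event: str) -> str:
    """Map an event tag such as '<laughs>' or '[music]' to a category."""
    name = event.strip().strip("<>[]").strip().lower()
    for category, names in CATEGORIES.items():
        if name in names:
            return category
    return "unlisted"  # not among the examples on the card


def tally(events) -> Counter:
    """Count events per category."""
    return Counter(categorize(e) for e in events)


print(categorize("<laughs>"))    # vocal_burst
print(categorize("[music]"))     # background
print(categorize("<applause>"))  # unlisted
```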
## Dataset Structure
Same columns as the source dataset:
- `file_id` - Unique identifier
- `audio` - Audio clip (3-30s)
- `text_scribe` - ASR transcription with events in [square brackets]
- `events_scribe` - Scribe audio events as `<angle bracket>` tags
- `text_emilia` - Reference transcription
- `duration` - Audio duration
- `speaker` - Speaker ID
- `language` - "en"
- `dnsmos` - Audio quality score
- Quality metrics: `CE`, `CU`, `PC`, `PQ`
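A common use of the `dnsmos` and `duration` columns is to keep only cleaner clips and measure how much audio remains. The sketch below works on rows as plain dicts (the form in which a Hugging Face `datasets.Dataset` yields rows when iterated). The thresholds `min_dnsmos=3.0` and `max_duration=30.0` are illustrative, not values recommended by the dataset authors, and `duration` is assumed to be in seconds, which the card does not state.

```python
def select(rows, min_dnsmos: float = 3.0, max_duration: float = 30.0) -> list[dict]:
    """Keep rows at or above a DNSMOS threshold and within a duration cap.

    Both thresholds are hypothetical defaults chosen for illustration.
    """
    return [
        r for r in rows
        if r["dnsmos"] >= min_dnsmos and r["duration"] <= max_duration
    ]


def total_hours(rows) -> float:
    """Sum durations, assuming they are stored in seconds."""
    return sum(r["duration"] for r in rows) / 3600.0


rows = [
    {"file_id": "a", "duration": 12.0, "dnsmos": 3.4},
    {"file_id": "b", "duration": 24.0, "dnsmos": 2.6},
    {"file_id": "c", "duration": 6.0, "dnsmos": 3.1},
]
kept = select(rows)
print([r["file_id"] for r in kept])  # ['a', 'c']
```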
## Source
Derived from [MrDragonFox/EN_Emilia_Yodas_616h](https://huggingface.co/datasets/MrDragonFox/EN_Emilia_Yodas_616h) (CC BY 4.0)
Provided by:
TTS-AGI



