TTS-AGI/enhanced-emo-snippets-balanced-DACVAE
Hugging Face · Updated 2026-03-21 · Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/TTS-AGI/enhanced-emo-snippets-balanced-DACVAE
Resource description:
---
license: cc-by-4.0
task_categories:
- audio-classification
- text-to-speech
tags:
- emotion
- voice
- DACVAE
- empathic-insight
- balanced-dataset
size_categories:
- 10K<n<100K
---
# Enhanced Emotion Snippets — Balanced DACVAE
A balanced, emotion-bucketed subset of [TTS-AGI/enhanced-audiosnippets-DACVAE](https://huggingface.co/datasets/TTS-AGI/enhanced-audiosnippets-DACVAE),
organized by Empathic Insight Voice+ emotion and voice attribute categories.
## Overview
This dataset provides up to **100 samples per magnitude bucket** for each of the
**40 emotion categories** and **15 voice attribute dimensions** scored by
[Empathic Insight Voice+](https://huggingface.co/laion/Empathic-Insight-Voice-Plus).
## Selection Criteria
### Emotion Categories (40 dimensions)
For each emotion (e.g., Anger, Elation, Sadness, ...):
1. **Dominant emotion filter**: Only samples where this emotion has the **highest value**
among all 40 emotion dimensions are included. This ensures each sample's characteristic
emotion matches the category.
2. **Magnitude bucketing**: Samples are grouped by this emotion's score into buckets of width 1 (e.g., [0,1), [1,2), [2,3), ...).
3. **Quality ranking**: Within each bucket, samples are ranked by `score_speech_quality` (descending)
4. **Top 100**: Up to 100 samples are selected per bucket
### Voice Attributes (15 dimensions)
For each attribute (e.g., Age, Gender, Arousal, ...):
1. **No dominant-emotion filter** (these are orthogonal to emotion)
2. Same bucketing, ranking, and selection as above
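
The selection steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the record layout (a `scores` dict next to a top-level `score_speech_quality`), the function name `select_balanced`, and the tie-breaking rule (the first name in `emotion_names` wins) are all assumptions made for the example.

```python
import math
from collections import defaultdict

def select_balanced(samples, category, emotion_names, require_dominant, per_bucket=100):
    """Return {bucket_floor: [samples]} with at most `per_bucket` samples per bucket."""
    buckets = defaultdict(list)
    for s in samples:
        scores = s["scores"]  # hypothetical layout: {dimension name: score}
        if require_dominant:
            # Keep the sample only if `category` has the highest emotion score.
            dominant = max(emotion_names, key=lambda name: scores[name])
            if dominant != category:
                continue
        # Width-1 bucket [k, k+1); floor also works for negative scores.
        buckets[math.floor(scores[category])].append(s)
    # Rank each bucket by speech quality (descending) and keep the top entries.
    return {
        k: sorted(v, key=lambda s: s["score_speech_quality"], reverse=True)[:per_bucket]
        for k, v in sorted(buckets.items())
    }

# Made-up records, for illustration only.
samples = [
    {"id": "a", "scores": {"Anger": 2.4, "Sadness": 0.3}, "score_speech_quality": 3.1},
    {"id": "b", "scores": {"Anger": 2.9, "Sadness": 1.0}, "score_speech_quality": 4.0},
    {"id": "c", "scores": {"Anger": 0.5, "Sadness": 1.7}, "score_speech_quality": 4.5},
]
# Emotion category: the dominant-emotion filter drops sample "c".
picked = select_balanced(samples, "Anger", ["Anger", "Sadness"], require_dominant=True)
# Voice-attribute style selection: no dominant-emotion filter.
unfiltered = select_balanced(samples, "Sadness", ["Anger", "Sadness"], require_dominant=False)
```

Passing `require_dominant=False` mirrors the voice-attribute case, where every sample is bucketed regardless of its dominant emotion.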
## Data Format
The dataset is in **WebDataset** format (.tar files), with one tar per category.
Each sample contains:
- `{key}.json`: full metadata, including:
  - `sample_id`, `duration`, `caption`, `transcription`
  - `empathic_insight_scores` (all 55 dimensions)
  - `speaker_embedding` (256-d)
  - `emotion_vector`, `detailed_caption`, `bude_whisper_caption`
  - `_bucket_category`, `_bucket_value`, `_bucket_label`
  - `_is_dominant_emotion` (whether dominant-emotion filter was applied)
- `{key}.npy`: DACVAE latent representation (pre-computed)
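
Because each sample is just a `{key}.json` / `{key}.npy` pair inside a tar archive, the pairing can also be done with the Python standard library alone. The sketch below assumes the tars are uncompressed and that a key is the file name minus its final extension; `read_samples` and the tiny in-memory archive are hypothetical, and the field values in the demo are made up.

```python
import io
import json
import os
import tarfile

def read_samples(fileobj):
    """Group tar members by key and return {key: (metadata_dict, npy_bytes)}."""
    grouped = {}
    with tarfile.open(fileobj=fileobj, mode="r:") as tar:
        for member in tar:
            if member.isfile():
                key, ext = os.path.splitext(member.name)
                grouped.setdefault(key, {})[ext] = tar.extractfile(member).read()
    # Keep only complete samples that have both files.
    return {
        key: (json.loads(parts[".json"]), parts[".npy"])
        for key, parts in grouped.items()
        if ".json" in parts and ".npy" in parts
    }

def _add(tar, name, payload):
    """Write one in-memory file into the demo archive."""
    info = tarfile.TarInfo(name)
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

# Build a tiny archive in memory so the sketch runs without downloading anything.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    _add(tar, "000001.json", json.dumps({"sample_id": "000001"}).encode())
    _add(tar, "000001.npy", b"placeholder-bytes")  # stands in for a real .npy payload
buf.seek(0)

samples = read_samples(buf)
meta, latent_bytes = samples["000001"]
```

With a real archive, the returned bytes can be decoded with `numpy.load(io.BytesIO(latent_bytes))`.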
## File Structure
```
data/
Amusement.tar (413 samples, 5 buckets)
Elation.tar (347 samples, 6 buckets)
Pleasure_Ecstasy.tar (202 samples, 5 buckets)
Contentment.tar (326 samples, 4 buckets)
Thankfulness_Gratitude.tar (419 samples, 5 buckets)
Affection.tar (455 samples, 6 buckets)
Infatuation.tar (354 samples, 6 buckets)
Hope_Enthusiasm_Optimism.tar (498 samples, 7 buckets)
Triumph.tar (319 samples, 5 buckets)
Pride.tar (410 samples, 5 buckets)
Interest.tar (400 samples, 4 buckets)
Awe.tar (318 samples, 6 buckets)
Astonishment_Surprise.tar (385 samples, 5 buckets)
Concentration.tar (381 samples, 5 buckets)
Contemplation.tar (401 samples, 5 buckets)
Relief.tar (482 samples, 6 buckets)
Longing.tar (373 samples, 5 buckets)
Teasing.tar (306 samples, 4 buckets)
Impatience_and_Irritability.tar (430 samples, 5 buckets)
Sexual_Lust.tar (323 samples, 6 buckets)
Doubt.tar (310 samples, 5 buckets)
Fear.tar (267 samples, 5 buckets)
Distress.tar (417 samples, 5 buckets)
Confusion.tar (404 samples, 6 buckets)
Embarrassment.tar (273 samples, 4 buckets)
Shame.tar (370 samples, 6 buckets)
Disappointment.tar (415 samples, 5 buckets)
Sadness.tar (437 samples, 6 buckets)
Bitterness.tar (65 samples, 5 buckets)
Contempt.tar (353 samples, 6 buckets)
Disgust.tar (211 samples, 4 buckets)
Anger.tar (427 samples, 7 buckets)
Malevolence_Malice.tar (253 samples, 5 buckets)
Sourness.tar (164 samples, 4 buckets)
Pain.tar (319 samples, 6 buckets)
Helplessness.tar (288 samples, 4 buckets)
Fatigue_Exhaustion.tar (410 samples, 5 buckets)
Emotional_Numbness.tar (401 samples, 5 buckets)
Intoxication_Altered_States_of_Consciousness.tar (349 samples, 6 buckets)
Jealousy_and_Envy.tar (309 samples, 5 buckets)
Valence.tar (834 samples, 10 buckets)
Arousal.tar (489 samples, 7 buckets)
Submissive_vs._Dominant.tar (397 samples, 5 buckets)
Age.tar (504 samples, 6 buckets)
Gender.tar (601 samples, 7 buckets)
Serious_vs._Humorous.tar (509 samples, 7 buckets)
Vulnerable_vs._Emotionally_Detached.tar (507 samples, 6 buckets)
Confident_vs._Hesitant.tar (404 samples, 5 buckets)
Warm_vs._Cold.tar (546 samples, 6 buckets)
Monotone_vs._Expressive.tar (468 samples, 5 buckets)
High-Pitched_vs._Low-Pitched.tar (403 samples, 5 buckets)
Soft_vs._Harsh.tar (404 samples, 6 buckets)
Authenticity.tar (302 samples, 4 buckets)
Recording_Quality.tar (500 samples, 5 buckets)
Background_Noise.tar (400 samples, 4 buckets)
```
## Detailed Statistics
### Emotion Categories
| Category | Dominant Samples | Buckets | Selected | Bucket Distribution |
|----------|-----------------|---------|----------|---------------------|
| Amusement | 47,119 | 5 | 413 | [0,1):100, [1,2):100, [2,3):100, [3,4):100, [4,5):13 |
| Elation | 1,585 | 6 | 347 | [0,1):3, [1,2):100, [2,3):100, [3,4):100, [4,5):37, [5,6):7 |
| Pleasure_Ecstasy | 202 | 5 | 202 | [0,1):4, [1,2):87, [2,3):90, [3,4):20, [4,5):1 |
| Contentment | 1,502 | 4 | 326 | [0,1):85, [1,2):100, [2,3):100, [3,4):41 |
| Thankfulness_Gratitude | 43,120 | 5 | 419 | [0,1):19, [1,2):100, [2,3):100, [3,4):100, [4,5):100 |
| Affection | 10,840 | 6 | 455 | [0,1):100, [1,2):100, [2,3):100, [3,4):100, [4,5):54, [5,6):1 |
| Infatuation | 4,585 | 6 | 354 | [0,1):2, [1,2):100, [2,3):100, [3,4):100, [4,5):50, [5,6):2 |
| Hope_Enthusiasm_Optimism | 42,840 | 7 | 498 | [0,1):28, [1,2):100, [2,3):100, [3,4):100, [4,5):100, [5,6):67, [6,7):3 |
| Triumph | 2,323 | 5 | 319 | [0,1):11, [1,2):100, [2,3):100, [3,4):100, [4,5):8 |
| Pride | 8,000 | 5 | 410 | [0,1):93, [1,2):100, [2,3):100, [3,4):100, [4,5):17 |
| Interest | 1,760,408 | 4 | 400 | [0,1):100, [1,2):100, [2,3):100, [3,4):100 |
| Awe | 822 | 6 | 318 | [0,1):3, [1,2):100, [2,3):100, [3,4):100, [4,5):14, [5,6):1 |
| Astonishment_Surprise | 19,893 | 5 | 385 | [0,1):72, [1,2):100, [2,3):100, [3,4):100, [4,5):13 |
| Concentration | 358,960 | 5 | 381 | [0,1):100, [1,2):100, [2,3):100, [3,4):77, [4,5):4 |
| Contemplation | 35,067 | 5 | 401 | [0,1):100, [1,2):100, [2,3):100, [3,4):100, [4,5):1 |
| Relief | 19,039 | 6 | 482 | [0,1):100, [1,2):100, [2,3):100, [3,4):100, [4,5):80, [5,6):2 |
| Longing | 5,631 | 5 | 373 | [0,1):100, [1,2):100, [2,3):100, [3,4):70, [4,5):3 |
| Teasing | 5,011 | 4 | 306 | [0,1):100, [1,2):100, [2,3):100, [3,4):6 |
| Impatience_and_Irritability | 107,965 | 5 | 430 | [0,1):100, [1,2):100, [2,3):100, [3,4):100, [4,5):30 |
| Sexual_Lust | 5,306 | 6 | 323 | [0,1):15, [1,2):100, [2,3):100, [3,4):100, [4,5):7, [5,6):1 |
| Doubt | 1,353 | 5 | 310 | [0,1):100, [1,2):100, [2,3):100, [3,4):7, [4,5):3 |
| Fear | 3,023 | 5 | 267 | [0,1):25, [1,2):100, [2,3):100, [3,4):41, [4,5):1 |
| Distress | 23,200 | 5 | 417 | [0,1):17, [1,2):100, [2,3):100, [3,4):100, [4,5):100 |
| Confusion | 17,933 | 6 | 404 | [0,1):100, [1,2):100, [2,3):100, [3,4):97, [4,5):6, [5,6):1 |
| Embarrassment | 2,675 | 4 | 273 | [0,1):100, [1,2):100, [2,3):72, [3,4):1 |
| Shame | 1,881 | 6 | 370 | [0,1):3, [1,2):100, [2,3):100, [3,4):100, [4,5):63, [5,6):4 |
| Disappointment | 6,835 | 5 | 415 | [0,1):100, [1,2):100, [2,3):100, [3,4):100, [4,5):15 |
| Sadness | 26,666 | 6 | 437 | [0,1):100, [1,2):100, [2,3):100, [3,4):100, [4,5):36, [5,6):1 |
| Bitterness | 65 | 5 | 65 | [0,1):3, [1,2):41, [2,3):15, [3,4):5, [4,5):1 |
| Contempt | 4,201 | 6 | 353 | [0,1):28, [1,2):100, [2,3):100, [3,4):100, [4,5):24, [6,7):1 |
| Disgust | 2,139 | 4 | 211 | [0,1):8, [1,2):100, [2,3):100, [3,4):3 |
| Anger | 4,976 | 7 | 427 | [0,1):21, [1,2):100, [2,3):100, [3,4):100, [4,5):100, [5,6):5, [8,9):1 |
| Malevolence_Malice | 1,736 | 5 | 253 | [0,1):18, [1,2):100, [2,3):100, [3,4):33, [4,5):2 |
| Sourness | 223 | 4 | 164 | [0,1):34, [1,2):100, [2,3):23, [3,4):7 |
| Pain | 2,683 | 6 | 319 | [0,1):3, [1,2):100, [2,3):100, [3,4):100, [4,5):14, [5,6):2 |
| Helplessness | 1,735 | 4 | 288 | [0,1):36, [1,2):100, [2,3):100, [3,4):52 |
| Fatigue_Exhaustion | 17,533 | 5 | 410 | [0,1):100, [1,2):100, [2,3):100, [3,4):100, [4,5):10 |
| Emotional_Numbness | 20,435 | 5 | 401 | [0,1):100, [1,2):100, [2,3):100, [3,4):100, [6,7):1 |
| Intoxication_Altered_States_of_Consciousness | 10,247 | 6 | 349 | [0,1):35, [1,2):100, [2,3):100, [3,4):100, [4,5):13, [5,6):1 |
| Jealousy_and_Envy | 3,280 | 5 | 309 | [0,1):4, [1,2):100, [2,3):100, [3,4):100, [4,5):5 |
### Voice Attributes
| Attribute | Buckets | Selected | Bucket Distribution |
|-----------|---------|----------|---------------------|
| Valence | 10 | 834 | [-5,-4):33, [-4,-3):100, [-3,-2):100, [-2,-1):100, [-1,0):100, [0,1):100, [1,2):100, [2,3):100, [3,4):100, [4,5):1 |
| Arousal | 7 | 489 | [-1,0):3, [0,1):100, [1,2):100, [2,3):100, [3,4):100, [4,5):83, [5,6):3 |
| Submissive_vs._Dominant | 5 | 397 | [-1,0):100, [0,1):100, [1,2):100, [2,3):96, [3,4):1 |
| Age | 6 | 504 | [0,1):100, [1,2):100, [2,3):100, [3,4):100, [4,5):100, [5,6):4 |
| Gender | 7 | 601 | [-5,-4):1, [-3,-2):100, [-2,-1):100, [-1,0):100, [0,1):100, [1,2):100, [2,3):100 |
| Serious_vs._Humorous | 7 | 509 | [-1,0):100, [0,1):100, [1,2):100, [2,3):100, [3,4):100, [4,5):8, [5,6):1 |
| Vulnerable_vs._Emotionally_Detached | 6 | 507 | [-1,0):100, [0,1):100, [1,2):100, [2,3):100, [3,4):100, [4,5):7 |
| Confident_vs._Hesitant | 5 | 404 | [0,1):100, [1,2):100, [2,3):100, [3,4):100, [4,5):4 |
| Warm_vs._Cold | 6 | 546 | [-3,-2):46, [-2,-1):100, [-1,0):100, [0,1):100, [1,2):100, [2,3):100 |
| Monotone_vs._Expressive | 5 | 468 | [0,1):100, [1,2):100, [2,3):100, [3,4):100, [4,5):68 |
| High-Pitched_vs._Low-Pitched | 5 | 403 | [0,1):100, [1,2):100, [2,3):100, [3,4):100, [4,5):3 |
| Soft_vs._Harsh | 6 | 404 | [-3,-2):2, [-2,-1):100, [-1,0):100, [0,1):100, [1,2):100, [2,3):2 |
| Authenticity | 4 | 302 | [1,2):100, [2,3):100, [3,4):100, [4,5):2 |
| Recording_Quality | 5 | 500 | [0,1):100, [1,2):100, [2,3):100, [3,4):100, [4,5):100 |
| Background_Noise | 4 | 400 | [-1,0):100, [0,1):100, [1,2):100, [2,3):100 |
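
In both tables, the Selected column is the sum of the per-bucket counts and the Buckets column is the number of non-empty buckets. A small parser makes that easy to check. The helper name `parse_distribution` is hypothetical; the two input strings are copied from the Anger and Valence rows.

```python
import re

# Matches entries such as "[0,1):100" or "[-5,-4):33".
BUCKET_RE = re.compile(r"\[(-?\d+),(-?\d+)\):(\d+)")

def parse_distribution(text):
    """Turn '[0,1):100, [1,2):37' into [((0, 1), 100), ((1, 2), 37)]."""
    return [((int(lo), int(hi)), int(n)) for lo, hi, n in BUCKET_RE.findall(text)]

anger = parse_distribution(
    "[0,1):21, [1,2):100, [2,3):100, [3,4):100, [4,5):100, [5,6):5, [8,9):1"
)
valence = parse_distribution(
    "[-5,-4):33, [-4,-3):100, [-3,-2):100, [-2,-1):100, [-1,0):100, "
    "[0,1):100, [1,2):100, [2,3):100, [3,4):100, [4,5):1"
)

anger_total = sum(n for _, n in anger)      # compare with Selected = 427
valence_total = sum(n for _, n in valence)  # compare with Selected = 834
```

Empty buckets are simply absent from a row, which is why Anger jumps from [5,6) to [8,9) while still counting 7 buckets.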
## Source Dataset
- **Source**: [TTS-AGI/enhanced-audiosnippets-DACVAE](https://huggingface.co/datasets/TTS-AGI/enhanced-audiosnippets-DACVAE)
- **Emotion Model**: [Empathic Insight Voice+](https://huggingface.co/laion/Empathic-Insight-Voice-Plus)
- **Audio Codec**: DACVAE (pre-computed latent representations stored as .npy)
## Usage
```python
import webdataset as wds
import json, io, numpy as np
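# Load one emotion category; WebDataset exposes each file of a sample by its extension
ds = wds.WebDataset("data/Anger.tar")
for sample in ds:
    meta = json.loads(sample["json"])  # raw bytes -> metadata dict
    dacvae = np.load(io.BytesIO(sample["npy"]))  # pre-computed DACVAE latent
    print(meta["transcription"], meta["empathic_insight_scores"]["Anger"])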
```
## License
Same as source dataset. See [TTS-AGI/enhanced-audiosnippets-DACVAE](https://huggingface.co/datasets/TTS-AGI/enhanced-audiosnippets-DACVAE).
Provider:
TTS-AGI



