TTS-AGI/enhanced-emo-snippets-balanced-DACVAE

Name: TTS-AGI/enhanced-emo-snippets-balanced-DACVAE
Creator: TTS-AGI
Published: 2026-03-21 13:39:37
License: No description provided

Hugging Face | Updated: 2026-03-21 | Indexed: 2026-03-29

Download link:

https://hf-mirror.com/datasets/TTS-AGI/enhanced-emo-snippets-balanced-DACVAE

Resource overview:

---
license: cc-by-4.0
task_categories:
- audio-classification
- text-to-speech
tags:
- emotion
- voice
- DACVAE
- empathic-insight
- balanced-dataset
size_categories:
- 10K<n<100K
---

# Enhanced Emotion Snippets - Balanced DACVAE

A balanced, emotion-bucketed subset of [TTS-AGI/enhanced-audiosnippets-DACVAE](https://huggingface.co/datasets/TTS-AGI/enhanced-audiosnippets-DACVAE), organized by Empathic Insight Voice+ emotion and voice attribute categories.

## Overview

This dataset provides up to **100 samples per magnitude bucket** for each of the **40 emotion categories** and **15 voice attribute dimensions** scored by [Empathic Insight Voice+](https://huggingface.co/laion/Empathic-Insight-Voice-Plus).

## Selection Criteria

### Emotion Categories (40 dimensions)

For each emotion (e.g., Anger, Elation, Sadness, ...):

1. **Dominant emotion filter**: Only samples where this emotion has the **highest value** among all 40 emotion dimensions are included. This ensures each sample's characteristic emotion matches the category.
2. **Magnitude bucketing**: Samples are grouped into buckets of width 1 (e.g., [0,1), [1,2), [2,3), ...)
3. **Quality ranking**: Within each bucket, samples are ranked by `score_speech_quality` (descending)
4. **Top 100**: Up to 100 samples are selected per bucket

### Voice Attributes (15 dimensions)

For each attribute (e.g., Age, Gender, Arousal, ...):

1. **No dominant-emotion filter** (these are orthogonal to emotion)
2. Same bucketing, ranking, and selection as above

## Data Format

The dataset is in **WebDataset** format (.tar files), with one tar per category.

Each sample contains:

- `{key}.json`: full metadata, including:
  - `sample_id`, `duration`, `caption`, `transcription`
  - `empathic_insight_scores` (all 55 dimensions)
  - `speaker_embedding` (256-d)
  - `emotion_vector`, `detailed_caption`, `bude_whisper_caption`
  - `_bucket_category`, `_bucket_value`, `_bucket_label`
  - `_is_dominant_emotion` (whether dominant-emotion filter was applied)
- `{key}.npy`: DACVAE latent representation (pre-computed)

## File Structure

```
data/
  Amusement.tar (413 samples, 5 buckets)
  Elation.tar (347 samples, 6 buckets)
  Pleasure_Ecstasy.tar (202 samples, 5 buckets)
  Contentment.tar (326 samples, 4 buckets)
  Thankfulness_Gratitude.tar (419 samples, 5 buckets)
  Affection.tar (455 samples, 6 buckets)
  Infatuation.tar (354 samples, 6 buckets)
  Hope_Enthusiasm_Optimism.tar (498 samples, 7 buckets)
  Triumph.tar (319 samples, 5 buckets)
  Pride.tar (410 samples, 5 buckets)
  Interest.tar (400 samples, 4 buckets)
  Awe.tar (318 samples, 6 buckets)
  Astonishment_Surprise.tar (385 samples, 5 buckets)
  Concentration.tar (381 samples, 5 buckets)
  Contemplation.tar (401 samples, 5 buckets)
  Relief.tar (482 samples, 6 buckets)
  Longing.tar (373 samples, 5 buckets)
  Teasing.tar (306 samples, 4 buckets)
  Impatience_and_Irritability.tar (430 samples, 5 buckets)
  Sexual_Lust.tar (323 samples, 6 buckets)
  Doubt.tar (310 samples, 5 buckets)
  Fear.tar (267 samples, 5 buckets)
  Distress.tar (417 samples, 5 buckets)
  Confusion.tar (404 samples, 6 buckets)
  Embarrassment.tar (273 samples, 4 buckets)
  Shame.tar (370 samples, 6 buckets)
  Disappointment.tar (415 samples, 5 buckets)
  Sadness.tar (437 samples, 6 buckets)
  Bitterness.tar (65 samples, 5 buckets)
  Contempt.tar (353 samples, 6 buckets)
  Disgust.tar (211 samples, 4 buckets)
  Anger.tar (427 samples, 7 buckets)
  Malevolence_Malice.tar (253 samples, 5 buckets)
  Sourness.tar (164 samples, 4 buckets)
  Pain.tar (319 samples, 6 buckets)
  Helplessness.tar (288 samples, 4 buckets)
  Fatigue_Exhaustion.tar (410 samples, 5 buckets)
  Emotional_Numbness.tar (401 samples, 5 buckets)
  Intoxication_Altered_States_of_Consciousness.tar (349 samples, 6 buckets)
  Jealousy_and_Envy.tar (309 samples, 5 buckets)
  Valence.tar (834 samples, 10 buckets)
  Arousal.tar (489 samples, 7 buckets)
  Submissive_vs._Dominant.tar (397 samples, 5 buckets)
  Age.tar (504 samples, 6 buckets)
  Gender.tar (601 samples, 7 buckets)
  Serious_vs._Humorous.tar (509 samples, 7 buckets)
  Vulnerable_vs._Emotionally_Detached.tar (507 samples, 6 buckets)
  Confident_vs._Hesitant.tar (404 samples, 5 buckets)
  Warm_vs._Cold.tar (546 samples, 6 buckets)
  Monotone_vs._Expressive.tar (468 samples, 5 buckets)
  High-Pitched_vs._Low-Pitched.tar (403 samples, 5 buckets)
  Soft_vs._Harsh.tar (404 samples, 6 buckets)
  Authenticity.tar (302 samples, 4 buckets)
  Recording_Quality.tar (500 samples, 5 buckets)
  Background_Noise.tar (400 samples, 4 buckets)
```

## Detailed Statistics

### Emotion Categories

| Category | Dominant Samples | Buckets | Selected | Bucket Distribution |
|----------|-----------------|---------|----------|---------------------|
| Amusement | 47,119 | 5 | 413 | [0,1):100, [1,2):100, [2,3):100, [3,4):100, [4,5):13 |
| Elation | 1,585 | 6 | 347 | [0,1):3, [1,2):100, [2,3):100, [3,4):100, [4,5):37, [5,6):7 |
| Pleasure_Ecstasy | 202 | 5 | 202 | [0,1):4, [1,2):87, [2,3):90, [3,4):20, [4,5):1 |
| Contentment | 1,502 | 4 | 326 | [0,1):85, [1,2):100, [2,3):100, [3,4):41 |
| Thankfulness_Gratitude | 43,120 | 5 | 419 | [0,1):19, [1,2):100, [2,3):100, [3,4):100, [4,5):100 |
| Affection | 10,840 | 6 | 455 | [0,1):100, [1,2):100, [2,3):100, [3,4):100, [4,5):54, [5,6):1 |
| Infatuation | 4,585 | 6 | 354 | [0,1):2, [1,2):100, [2,3):100, [3,4):100, [4,5):50, [5,6):2 |
| Hope_Enthusiasm_Optimism | 42,840 | 7 | 498 | [0,1):28, [1,2):100, [2,3):100, [3,4):100, [4,5):100, [5,6):67, [6,7):3 |
| Triumph | 2,323 | 5 | 319 | [0,1):11, [1,2):100, [2,3):100, [3,4):100, [4,5):8 |
| Pride | 8,000 | 5 | 410 | [0,1):93, [1,2):100, [2,3):100, [3,4):100, [4,5):17 |
| Interest | 1,760,408 | 4 | 400 | [0,1):100, [1,2):100, [2,3):100, [3,4):100 |
| Awe | 822 | 6 | 318 | [0,1):3, [1,2):100, [2,3):100, [3,4):100, [4,5):14, [5,6):1 |
| Astonishment_Surprise | 19,893 | 5 | 385 | [0,1):72, [1,2):100, [2,3):100, [3,4):100, [4,5):13 |
| Concentration | 358,960 | 5 | 381 | [0,1):100, [1,2):100, [2,3):100, [3,4):77, [4,5):4 |
| Contemplation | 35,067 | 5 | 401 | [0,1):100, [1,2):100, [2,3):100, [3,4):100, [4,5):1 |
| Relief | 19,039 | 6 | 482 | [0,1):100, [1,2):100, [2,3):100, [3,4):100, [4,5):80, [5,6):2 |
| Longing | 5,631 | 5 | 373 | [0,1):100, [1,2):100, [2,3):100, [3,4):70, [4,5):3 |
| Teasing | 5,011 | 4 | 306 | [0,1):100, [1,2):100, [2,3):100, [3,4):6 |
| Impatience_and_Irritability | 107,965 | 5 | 430 | [0,1):100, [1,2):100, [2,3):100, [3,4):100, [4,5):30 |
| Sexual_Lust | 5,306 | 6 | 323 | [0,1):15, [1,2):100, [2,3):100, [3,4):100, [4,5):7, [5,6):1 |
| Doubt | 1,353 | 5 | 310 | [0,1):100, [1,2):100, [2,3):100, [3,4):7, [4,5):3 |
| Fear | 3,023 | 5 | 267 | [0,1):25, [1,2):100, [2,3):100, [3,4):41, [4,5):1 |
| Distress | 23,200 | 5 | 417 | [0,1):17, [1,2):100, [2,3):100, [3,4):100, [4,5):100 |
| Confusion | 17,933 | 6 | 404 | [0,1):100, [1,2):100, [2,3):100, [3,4):97, [4,5):6, [5,6):1 |
| Embarrassment | 2,675 | 4 | 273 | [0,1):100, [1,2):100, [2,3):72, [3,4):1 |
| Shame | 1,881 | 6 | 370 | [0,1):3, [1,2):100, [2,3):100, [3,4):100, [4,5):63, [5,6):4 |
| Disappointment | 6,835 | 5 | 415 | [0,1):100, [1,2):100, [2,3):100, [3,4):100, [4,5):15 |
| Sadness | 26,666 | 6 | 437 | [0,1):100, [1,2):100, [2,3):100, [3,4):100, [4,5):36, [5,6):1 |
| Bitterness | 65 | 5 | 65 | [0,1):3, [1,2):41, [2,3):15, [3,4):5, [4,5):1 |
| Contempt | 4,201 | 6 | 353 | [0,1):28, [1,2):100, [2,3):100, [3,4):100, [4,5):24, [6,7):1 |
| Disgust | 2,139 | 4 | 211 | [0,1):8, [1,2):100, [2,3):100, [3,4):3 |
| Anger | 4,976 | 7 | 427 | [0,1):21, [1,2):100, [2,3):100, [3,4):100, [4,5):100, [5,6):5, [8,9):1 |
| Malevolence_Malice | 1,736 | 5 | 253 | [0,1):18, [1,2):100, [2,3):100, [3,4):33, [4,5):2 |
| Sourness | 223 | 4 | 164 | [0,1):34, [1,2):100, [2,3):23, [3,4):7 |
| Pain | 2,683 | 6 | 319 | [0,1):3, [1,2):100, [2,3):100, [3,4):100, [4,5):14, [5,6):2 |
| Helplessness | 1,735 | 4 | 288 | [0,1):36, [1,2):100, [2,3):100, [3,4):52 |
| Fatigue_Exhaustion | 17,533 | 5 | 410 | [0,1):100, [1,2):100, [2,3):100, [3,4):100, [4,5):10 |
| Emotional_Numbness | 20,435 | 5 | 401 | [0,1):100, [1,2):100, [2,3):100, [3,4):100, [6,7):1 |
| Intoxication_Altered_States_of_Consciousness | 10,247 | 6 | 349 | [0,1):35, [1,2):100, [2,3):100, [3,4):100, [4,5):13, [5,6):1 |
| Jealousy_&_Envy | 3,280 | 5 | 309 | [0,1):4, [1,2):100, [2,3):100, [3,4):100, [4,5):5 |

### Voice Attributes

| Attribute | Buckets | Selected | Bucket Distribution |
|-----------|---------|----------|---------------------|
| Valence | 10 | 834 | [-5,-4):33, [-4,-3):100, [-3,-2):100, [-2,-1):100, [-1,0):100, [0,1):100, [1,2):100, [2,3):100, [3,4):100, [4,5):1 |
| Arousal | 7 | 489 | [-1,0):3, [0,1):100, [1,2):100, [2,3):100, [3,4):100, [4,5):83, [5,6):3 |
| Submissive_vs._Dominant | 5 | 397 | [-1,0):100, [0,1):100, [1,2):100, [2,3):96, [3,4):1 |
| Age | 6 | 504 | [0,1):100, [1,2):100, [2,3):100, [3,4):100, [4,5):100, [5,6):4 |
| Gender | 7 | 601 | [-5,-4):1, [-3,-2):100, [-2,-1):100, [-1,0):100, [0,1):100, [1,2):100, [2,3):100 |
| Serious_vs._Humorous | 7 | 509 | [-1,0):100, [0,1):100, [1,2):100, [2,3):100, [3,4):100, [4,5):8, [5,6):1 |
| Vulnerable_vs._Emotionally_Detached | 6 | 507 | [-1,0):100, [0,1):100, [1,2):100, [2,3):100, [3,4):100, [4,5):7 |
| Confident_vs._Hesitant | 5 | 404 | [0,1):100, [1,2):100, [2,3):100, [3,4):100, [4,5):4 |
| Warm_vs._Cold | 6 | 546 | [-3,-2):46, [-2,-1):100, [-1,0):100, [0,1):100, [1,2):100, [2,3):100 |
| Monotone_vs._Expressive | 5 | 468 | [0,1):100, [1,2):100, [2,3):100, [3,4):100, [4,5):68 |
| High-Pitched_vs._Low-Pitched | 5 | 403 | [0,1):100, [1,2):100, [2,3):100, [3,4):100, [4,5):3 |
| Soft_vs._Harsh | 6 | 404 | [-3,-2):2, [-2,-1):100, [-1,0):100, [0,1):100, [1,2):100, [2,3):2 |
| Authenticity | 4 | 302 | [1,2):100, [2,3):100, [3,4):100, [4,5):2 |
| Recording_Quality | 5 | 500 | [0,1):100, [1,2):100, [2,3):100, [3,4):100, [4,5):100 |
| Background_Noise | 4 | 400 | [-1,0):100, [0,1):100, [1,2):100, [2,3):100 |

## Source Dataset

- **Source**: [TTS-AGI/enhanced-audiosnippets-DACVAE](https://huggingface.co/datasets/TTS-AGI/enhanced-audiosnippets-DACVAE)
- **Emotion Model**: [Empathic Insight Voice+](https://huggingface.co/laion/Empathic-Insight-Voice-Plus)
- **Audio Codec**: DACVAE (pre-computed latent representations stored as .npy)

## Usage

```python
import io
import json

import numpy as np
import webdataset as wds

# Load one emotion category
ds = wds.WebDataset("data/Anger.tar")
for sample in ds:
    # Each sample maps file extensions to raw bytes
    meta = json.loads(sample["json"])
    dacvae = np.load(io.BytesIO(sample["npy"]))
    print(meta["transcription"], meta["empathic_insight_scores"]["Anger"])
```

## License

Same as source dataset. See [TTS-AGI/enhanced-audiosnippets-DACVAE](https://huggingface.co/datasets/TTS-AGI/enhanced-audiosnippets-DACVAE).
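## Selection Sketch (illustrative)

The four selection steps listed under Selection Criteria (dominant-emotion filter, width-1 bucketing, quality ranking, top 100) can be sketched roughly as follows. This is not the script used to build the dataset: the function name `select_balanced` and the record fields `scores` and `quality` are hypothetical stand-ins (`quality` plays the role of `score_speech_quality`), and keeping samples whose score ties for the maximum is an assumption, since the card does not say how ties are handled.

```python
import math
from collections import defaultdict


def select_balanced(samples, category, emotion_names=None, per_bucket=100):
    """Pick up to `per_bucket` samples per width-1 magnitude bucket.

    `samples` is a list of dicts with two hypothetical fields:
    "scores" (dimension name -> float) and "quality" (float).
    Pass `emotion_names` to apply the dominant-emotion filter;
    pass None for voice attributes, which skip that filter.
    """
    buckets = defaultdict(list)
    for s in samples:
        scores = s["scores"]
        if emotion_names is not None:
            # Dominant-emotion filter: drop the sample unless `category`
            # reaches the highest score among the emotion dimensions
            # (ties are kept; this is an assumption).
            if scores[category] < max(scores[n] for n in emotion_names):
                continue
        # Width-1 bucket [lo, lo + 1); floor also handles negative scores.
        lo = math.floor(scores[category])
        buckets[lo].append(s)

    selected = {}
    for lo, members in sorted(buckets.items()):
        # Rank by quality (descending) and keep the top entries.
        ranked = sorted(members, key=lambda s: s["quality"], reverse=True)
        selected[f"[{lo},{lo + 1})"] = ranked[:per_bucket]
    return selected
```

Calling `select_balanced(samples, "Anger", emotion_names=[...])` would mimic the emotion-category path, while `select_balanced(samples, "Valence")` would mimic the voice-attribute path with no dominance filter.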
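## Checking Bucket Counts (illustrative)

Because each category is a plain tar archive whose `.json` members carry a `_bucket_label` field, a downloaded file can be compared against the bucket distributions listed in the statistics tables without extra dependencies. The sketch below uses only the Python standard library; the helper name `bucket_counts` is hypothetical, and the label is treated as an opaque string because the card does not document its exact format.

```python
import json
import tarfile
from collections import Counter


def bucket_counts(tar_path):
    """Count samples per `_bucket_label` in one category tar."""
    counts = Counter()
    with tarfile.open(tar_path) as tar:
        for member in tar:
            # Only the metadata files are needed; skip the .npy latents.
            if not (member.isfile() and member.name.endswith(".json")):
                continue
            meta = json.load(tar.extractfile(member))
            counts[meta["_bucket_label"]] += 1
    return counts
```

For example, `bucket_counts("data/Anger.tar")` should yield one entry per bucket, and the per-bucket counts can then be compared by hand with the Anger row of the emotion table.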

Provided by:

TTS-AGI
