
TTS-AGI/vocal-bursts-taxonomy-DACVAE

Hugging Face | Updated 2026-03-25 | Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/TTS-AGI/vocal-bursts-taxonomy-DACVAE
Resource description:
---
dataset_info:
  features:
  - name: category
    dtype: string
  - name: prompt
    dtype: string
  - name: gender
    dtype: string
  - name: audio
    dtype: audio
  - name: taxonomy_number
    dtype: int64
  - name: duration
    dtype: float64
  - name: overall_quality
    dtype: float64
  - name: speech_quality
    dtype: float64
  - name: background_quality
    dtype: float64
  - name: top_attributes
    dtype: string
  - name: gemini_match_score
    dtype: int64
  - name: gemini_caption
    dtype: string
  splits:
  - name: preview
    num_examples: 82
  config_name: default
configs:
- config_name: default
  data_files:
  - split: preview
    path: preview.parquet
license: cc-by-4.0
task_categories:
- audio-classification
tags:
- vocal-bursts
- speech
- DACVAE
- CLAP
- audio-embeddings
- speaker-attributes
- gemini-verified
pretty_name: Vocal Bursts Taxonomy (DACVAE + CLAP Scores + Gemini Verified)
size_categories:
- 10K<n<100K
---

# Vocal Bursts Taxonomy: DACVAE + MaestroClap Embeddings & Scores

Processed version of [krishnakalyan3/vocal_bursts_taxonomy_100_clean_wds](https://huggingface.co/datasets/krishnakalyan3/vocal_bursts_taxonomy_100_clean_wds) with DACVAE latents, MaestroClap embeddings, derived attribute/quality/speaker scores, and Gemini-verified labels.

## Overview

| Metric | Value |
|--------|-------|
| Total samples | **16,175** |
| Categories | **82** |
| Genders | male, female |
| Female samples | 8,097 |
| Male samples | 8,078 |

## Gemini Label Verification

Every sample was sent to **Gemini 3.1 Flash Lite** for two independent tasks:

1. **Match scoring**: The audio and the original prompt label are sent together. The model rates how well the label matches the audio: 0 (not at all), 1 (slightly), 2 (very well).
2. **Dense captioning**: The audio is sent alone (without the label). The model produces a concise technical description of the vocal burst.
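The 0 to 2 match scale can be applied as a simple filter once rows are in memory. The following is a minimal sketch, assuming each row is a plain dict that carries the `gemini_match_score` field listed in the card's feature schema; the helper names and the toy rows are illustrative and are not part of the dataset. The card counts scores of 1 or 2 as valid, so `min_score=1` reproduces that rule, while `min_score=2` keeps only the strongest matches.

```python
from collections import Counter

# Meaning of each score, as described in the card.
MATCH_SCORE_MEANING = {0: "not at all", 1: "slightly", 2: "very well"}


def filter_by_match_score(rows, min_score=1):
    """Keep rows whose Gemini match score is at least `min_score`."""
    return [row for row in rows if row["gemini_match_score"] >= min_score]


def score_histogram(rows):
    """Count how many rows fall under each match score."""
    return Counter(row["gemini_match_score"] for row in rows)


# Toy rows for illustration only; these are not real dataset entries.
toy_rows = [
    {"category": "Cough", "gemini_match_score": 2},
    {"category": "Yawn", "gemini_match_score": 1},
    {"category": "Hiss", "gemini_match_score": 0},
]

strict = filter_by_match_score(toy_rows, min_score=2)
lenient = filter_by_match_score(toy_rows, min_score=1)
print(len(strict), len(lenient))
for score, count in sorted(score_histogram(toy_rows).items()):
    print(score, MATCH_SCORE_MEANING[score], count)
```

Keeping the threshold as a parameter leaves the choice between the lenient and the strict reading of the labels to the caller.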
| Match Score | Meaning | Count | Percentage |
|:-----------:|---------|------:|----------:|
| 2 | Very well: label accurately describes audio | 15,082 | 93.2% |
| 1 | Slightly: somewhat related but not accurate | 160 | 1.0% |
| 0 | Not at all: label is completely wrong | 933 | 5.8% |

**15,242 / 16,175 samples (94.2%) verified as valid** (match score 1 or 2).

Total verification cost: **$0.97** (32,350 API calls across 100 parallel threads in ~21 minutes).

## Contents

Each tar file corresponds to one vocal burst category.

## Per-Sample JSON Fields

### Original metadata

| Field | Type | Description |
|-------|------|-------------|
| | string | Original filename from source dataset |
| | float | Audio duration in seconds |
| | string | male or female |
| | string | Vocal burst category name + description |
| | int | Category ID (1–97) |
| | int | Unique sample identifier |
| | int | 44100 Hz |

### DACVAE latent info

| Field | Type | Description |
|-------|------|-------------|
| | int | Number of DACVAE latent frames (T) |
| | int | Latent dimension (128) |

### MaestroClap embeddings & probes

| Field | Type | Description |
|-------|------|-------------|
| | float[512] | MaestroClap audio-audio embedding (L2 normalized) |
| | float[128] | Predicted WaveELM timbre/speaker embedding |
| | dict(53 floats) | 53 emotion/speaker attribute scores (see list below) |
| | float | Duration in seconds predicted from CLAP embedding |
| | float | Consonant/Phoneme Score |
| | float | Background noise quality |
| | float | Content enjoyment rating |
| | float | Overall audio quality |
| | float | Speech quality rating |

### Gemini verification

| Field | Type | Description |
|-------|------|-------------|
| | int | 0=not at all, 1=slightly, 2=very well (label vs. audio match) |
| | string | Brief explanation of the match rating |
| | string | Dense technical vocal burst caption generated without seeing the label |

## Models Used

| Model | Architecture | Output |
|-------|-------------|--------|
| [DACVAE](https://huggingface.co/facebook/dacvae-watermarked) | Facebook DAC-VAE encoder | (T, 128) latent at 25fps |
| [MaestroClap](https://huggingface.co/TTS-AGI/audio-audio-clap-maestrino) | ViT-B/12 on DACVAE latents | 512-dim normalized embedding |
| Attribute Probe | MLP 512→704→704→53+1 | 53 emotion/speaker attributes + duration |
| Speaker Probe | MLP 512→704→704→128 | 128-dim timbre embedding |
| Quality Experts (×5) | MLP 512→64→64→1 | 5 independent quality scores |
| Gemini 3.1 Flash Lite | Google LLM with audio input | Match score (0–2) + dense caption |

## 53 Attribute Dimensions

Affection, Age, Amusement, Anger, Arousal, Astonishment/Surprise, Authenticity, Awe, Bitterness, Concentration, Confident vs. Hesitant, Confusion, Contemplation, Contempt, Contentment, Disappointment, Disgust, Distress, Doubt, Elation, Embarrassment, Emotional Numbness, Fatigue/Exhaustion, Fear, Gender, Helplessness, High-Pitched vs. Low-Pitched, Hope/Enthusiasm/Optimism, Impatience and Irritability, Infatuation, Interest, Intoxication/Altered States, Jealousy & Envy, Longing, Malevolence/Malice, Monotone vs. Expressive, Pain, Pleasure/Ecstasy, Pride, Relief, Sadness, Serious vs. Humorous, Sexual Lust, Shame, Soft vs. Harsh, Sourness, Submissive vs. Dominant, Teasing, Thankfulness/Gratitude, Triumph, Valence, Vulnerable vs. Emotionally Detached, Warm vs. Cold

## Category List

| Category | Description | F Total | F Match=2 | M Total | M Match=2 | Total | Valid |
|----------|-------------|--------:|----------:|--------:|----------:|------:|------:|
| Affirmative Grunt | Affirmative Grunt short sound indicating agreement | 100 | 100 | 100 | 94 | 200 | 194 |
| Ahem | Ahem clearing throat to gain attention or show discomfort | 100 | 73 | 99 | 92 | 199 | 166 |
| Blowing | Blowing a Kiss brief airy sound of sending affection | 99 | 73 | 98 | 51 | 197 | 124 |
| Breathy Giggle | Breathy Giggle soft airy laugh | 99 | 94 | 100 | 70 | 199 | 178 |
| Cackle | Cackle harsh sharp laugh sometimes unsettling | 97 | 74 | 99 | 99 | 196 | 195 |
| Chewing Noises | Chewing Noises crunching or moist sounds from the mouth | 95 | 95 | 98 | 98 | 193 | 193 |
| Childlike Giggle | Childlike Giggle light playful laugh | 100 | 100 | 98 | 98 | 198 | 198 |
| Chuckle | Chuckle soft suppressed laugh showing mild amusement | 100 | 77 | 100 | 80 | 200 | 176 |
| Clears Throat | Clears Throat brief intentional effort to remove phlegm or gain attention | 99 | 98 | 99 | 98 | 198 | 196 |
| Click One s Tongue | Click One’s Tongue short ‘tch’ sound by flicking tongue off palate | 100 | 100 | 98 | 98 | 198 | 198 |
| Clicks Tongue | Clicks Tongue distinct click by tapping tongue against the palate | 100 | 100 | 99 | 99 | 199 | 199 |
| Contented Sigh | Contented Sigh deep exhale expressing satisfaction | 97 | 79 | 99 | 69 | 196 | 148 |
| Convulsive Sob | Convulsive Sob intense crying with heaving breaths | 100 | 99 | 99 | 96 | 199 | 196 |
| Cough | Cough sharp expulsion of air from lungs | 98 | 98 | 94 | 94 | 192 | 192 |
| Coughing | Coughing series of forceful expulsions of air from lungs | 99 | 99 | 100 | 100 | 199 | 199 |
| Deep Breath | Deep Breath large inhalation to calm or refocus | 98 | 92 | 99 | 93 | 197 | 185 |
| Deep Breathing | Deep Breathing slow deliberate inhalations and exhalations for relaxation | 99 | 99 | 99 | 98 | 198 | 197 |
| Displeased Grunt | Displeased Grunt harsh vocalization of dissatisfaction | 99 | 96 | 99 | 98 | 198 | 197 |
| Drinking Noises | Drinking Noises audible liquid intake with gulps or slurps | 100 | 99 | 99 | 99 | 199 | 198 |
| Effort Grunt | Effort Grunt short forceful sound during physical strain | 100 | 99 | 100 | 99 | 200 | 198 |
| Exasperated Sigh | Exasperated Sigh audible breath of frustration | 99 | 95 | 100 | 99 | 199 | 194 |
| Exhausted Groan | Exhausted Groan weary sound indicating fatigue | 97 | 97 | 99 | 99 | 196 | 196 |
| Fast Breathing | Fast Breathing rapid inhalation and exhalation from excitement or stress | 99 | 91 | 100 | 96 | 199 | 198 |
| Fearful Gasp | Fearful Gasp sudden inhale showing shock or anxiety | 98 | 85 | 99 | 85 | 197 | 172 |
| Finger Snaps | Finger Snaps brief sharp click made by friction of thumb and finger | 100 | 100 | 98 | 98 | 198 | 198 |
| Frustrated Groan | Frustrated Groan deep sound of annoyance | 100 | 99 | 98 | 97 | 198 | 198 |
| Growl | Growl low guttural rumble conveying anger | 99 | 98 | 95 | 94 | 194 | 193 |
| Guffaw | Guffaw loud unrestrained laugh of strong amusement | 99 | 99 | 99 | 99 | 198 | 198 |
| Gulps | Gulps short audible swallow from nervousness or ingesting | 98 | 93 | 99 | 89 | 197 | 182 |
| Gurgling | Gurgling bubbling sound in the throat | 98 | 98 | 99 | 98 | 197 | 196 |
| Hand Scratching Head | Hand Scratching Head rubbing hair soft rustling of fingers on scalp | 100 | 100 | 99 | 99 | 199 | 199 |
| Hand Slaps | Hand Slaps solid impact sound of palm against a surface or another hand | 100 | 100 | 100 | 100 | 200 | 200 |
| Heavy Breathing | Heavy Breathing deep labored breaths of fatigue or tension | 100 | 100 | 98 | 98 | 198 | 198 |
| Hiccup | Hiccup spasmodic contraction of diaphragm creating a ‘hic’ sound | 100 | 100 | 100 | 100 | 200 | 200 |
| Hiccups | Hiccups involuntary diaphragmatic spasm producing ‘hic’ sound | 99 | 99 | 97 | 97 | 196 | 196 |
| Hiss | Hiss sustained sibilant sound of disapproval | 100 | 96 | 99 | 99 | 199 | 195 |
| Humming | Humming low continuous vocal tone often tuneful or absent-minded | 98 | 98 | 100 | 100 | 198 | 198 |
| Kissing Noises | Kissing Noises repetitive gentle lip compression for affection | 98 | 98 | 100 | 98 | 198 | 196 |
| Kissing Sounds | Kissing Sounds soft repeated lip contact expressing affection | 98 | 98 | 100 | 99 | 198 | 197 |
| Licking Sound | Licking Sound light quick contact of tongue on surface | 98 | 97 | 100 | 99 | 198 | 196 |
| Lip Smack | Lip Smack soft pop of parted lips indicating anticipation | 100 | 85 | 100 | 86 | 200 | 171 |
| Low Mumble | Low Mumble indistinct vocalization of uncertainty or drowsiness | 100 | 74 | 100 | 77 | 200 | 160 |
| Mournful Wail | Mournful Wail long high cry of grief | 100 | 100 | 100 | 96 | 200 | 200 |
| Nervous Giggle | Nervous Giggle high-pitched laugh indicating anxiety | 100 | 61 | 96 | 54 | 196 | 145 |
| Nervous Gulp | Nervous Gulp audible swallow indicating anxiety | 99 | 63 | 97 | 67 | 196 | 130 |
| Normal Breathing | Normal Breathing regular rhythmic breaths at rest | 100 | 12 | 100 | 16 | 200 | 29 |
| Pain Moan | Pain Moan low vocalization signaling discomfort | 100 | 99 | 99 | 99 | 199 | 198 |
| Panting | Panting rapid breathing due to exertion or excitement | 100 | 98 | 96 | 95 | 196 | 196 |
| Person Whistling | Person Whistling to Get Attention serious sharp whistle | 99 | 99 | 100 | 93 | 199 | 199 |
| Person Whistling Playfully | Person Whistling Playfully lighthearted melodic whistle | 99 | 99 | 99 | 99 | 198 | 198 |
| Pleasure Moan | Pleasure Moan soft prolonged sound of enjoyment | 100 | 100 | 99 | 89 | 199 | 189 |
| Purr | Purr soft continuous vibrating hum showing contentment | 100 | 94 | 98 | 88 | 198 | 182 |
| Quiet Sob | Quiet Sob soft crying with short broken breaths | 100 | 96 | 100 | 98 | 200 | 196 |
| Relief Sigh | Relief Sigh release of tension after stress | 78 | 68 | 98 | 90 | 176 | 159 |
| Resonant Hum | Resonant Hum lower-pitched thoughtful hum | 99 | 98 | 93 | 93 | 192 | 192 |
| Scream | Scream loud high-pitched outburst of extreme emotion | 97 | 97 | 100 | 100 | 197 | 197 |
| Sharp Inhale | Sharp Inhale quick breath of sudden realization | 100 | 59 | 100 | 81 | 200 | 145 |
| Sharp Whistle | Sharp Whistle quick high-pitched attention grabber | 99 | 99 | 100 | 100 | 199 | 199 |
| Shriek | Shriek sharp piercing cry typically of fear | 96 | 96 | 99 | 99 | 195 | 195 |
| Slap Face | Slap Face sharp impact sound of palm against face | 96 | 90 | 97 | 88 | 193 | 178 |
| Slow Breathing | Slow Breathing paced controlled breaths for calm | 99 | 99 | 100 | 100 | 199 | 199 |
| Slurping Noises | Slurping Noises wet sucking sound typically when consuming liquids | 99 | 93 | 99 | 93 | 198 | 186 |
| Smack One s Lips | Smack One’s Lips light pop of lips often from dryness or taste anticipation | 98 | 97 | 98 | 97 | 196 | 194 |
| Smacks Lips | Smacks Lips repetitive lip popping after tasting or anticipating food | 99 | 97 | 99 | 97 | 198 | 194 |
| Snicker | Snicker stifled laugh often conveying mockery or sarcasm | 100 | 77 | 97 | 82 | 197 | 173 |
| Sniff | Sniff audible inhalation through the nose possibly sadness or restraint | 100 | 100 | 100 | 99 | 200 | 199 |
| Snort | Snort quick burst of air through the nose often disbelief | 99 | 76 | 99 | 94 | 198 | 170 |
| Snorting Giggle | Snorting Giggle brief nasal snort combined with laughter | 99 | 99 | 96 | 96 | 195 | 195 |
| Sobs | Sobs audible heavier crying with convulsive gasps | 98 | 93 | 98 | 94 | 196 | 190 |
| Soft Hum | Soft Hum steady tone with closed lips in contentment | 98 | 95 | 100 | 99 | 198 | 198 |
| Soft Whistle | Soft Whistle gentle tuneful blowing of air | 100 | 100 | 100 | 100 | 200 | 200 |
| Spitting | Spitting forceful expulsion of saliva | 99 | 66 | 100 | 71 | 199 | 137 |
| Sucking Noise | Sucking Noise soft pulling sound created by suction | 100 | 100 | 100 | 99 | 200 | 199 |
| Surprised Gasp | Surprised Gasp quick intake of breath in astonishment | 98 | 98 | 97 | 97 | 195 | 195 |
| Swallows | Swallows audible movement of liquid down the throat | 99 | 97 | 99 | 99 | 198 | 196 |
| Tongue Click | Tongue Click distinct clack made by tongue against palate | 100 | 100 | 99 | 99 | 199 | 199 |
| Trembling Whimper | Trembling Whimper faint cry indicating fear or pain | 98 | 95 | 100 | 96 | 198 | 191 |
| Tsk | Tsk clicking sound of reprimand or annoyance | 99 | 98 | 99 | 98 | 198 | 196 |
| Whispered Mumble | Whispered Mumble very quiet muddled speech sounds | 98 | 94 | 96 | 88 | 194 | 182 |
| Wistful Sigh | Wistful Sigh soft exhale tinged with longing | 100 | 100 | 89 | 86 | 189 | 186 |
| Wolf Whistle | Wolf Whistle two-note whistle expressing admiration | 99 | 97 | 99 | 98 | 198 | 195 |
| Yawn | Yawn involuntary wide-mouthed breath showing tiredness | 99 | 96 | 88 | 87 | 187 | 183 |
| **TOTAL** | | **8,097** | **7,539** | **8,078** | **7,543** | **16,175** | **15,242** |

> **Valid** = samples with match score 1 or 2 (label is at least slightly accurate).

## Usage

## Source

Derived from [krishnakalyan3/vocal_bursts_taxonomy_100_clean_wds](https://huggingface.co/datasets/krishnakalyan3/vocal_bursts_taxonomy_100_clean_wds).
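The Total and Valid columns of the category table make it easy to spot categories where the original label often disagrees with the audio. The following is a minimal sketch that ranks a handful of rows copied from the table by their valid rate; the 0.8 cutoff and the helper name `low_agreement` are assumptions chosen for illustration and do not come from the dataset card.

```python
# (category, total samples, valid samples) copied from the category table.
TABLE_ROWS = [
    ("Normal Breathing", 200, 29),
    ("Blowing", 197, 124),
    ("Nervous Gulp", 196, 130),
    ("Hand Slaps", 200, 200),
    ("Cough", 192, 192),
]


def low_agreement(rows, threshold=0.8):
    """Return (category, valid_rate) pairs below `threshold`, lowest first.

    The default threshold is an arbitrary illustrative cutoff.
    """
    rated = [(name, valid / total) for name, total, valid in rows]
    flagged = [pair for pair in rated if pair[1] < threshold]
    return sorted(flagged, key=lambda pair: pair[1])


flagged = low_agreement(TABLE_ROWS)
for name, rate in flagged:
    print(f"{name}: {rate:.1%} valid")
```

With these rows, Normal Breathing stands out with only 29 of 200 samples counted as valid, which suggests treating that category with extra care or excluding it when clean labels matter.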
Provided by:
TTS-AGI