TTS-AGI/vocal-bursts-taxonomy-DACVAE
Source: Hugging Face | Updated 2026-03-25 | Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/TTS-AGI/vocal-bursts-taxonomy-DACVAE
Resource summary:
---
dataset_info:
  features:
  - name: category
    dtype: string
  - name: prompt
    dtype: string
  - name: gender
    dtype: string
  - name: audio
    dtype: audio
  - name: taxonomy_number
    dtype: int64
  - name: duration
    dtype: float64
  - name: overall_quality
    dtype: float64
  - name: speech_quality
    dtype: float64
  - name: background_quality
    dtype: float64
  - name: top_attributes
    dtype: string
  - name: gemini_match_score
    dtype: int64
  - name: gemini_caption
    dtype: string
  splits:
  - name: preview
    num_examples: 82
  config_name: default
configs:
- config_name: default
  data_files:
  - split: preview
    path: preview.parquet
license: cc-by-4.0
task_categories:
- audio-classification
tags:
- vocal-bursts
- speech
- DACVAE
- CLAP
- audio-embeddings
- speaker-attributes
- gemini-verified
pretty_name: Vocal Bursts Taxonomy (DACVAE + CLAP Scores + Gemini Verified)
size_categories:
- 10K<n<100K
---
# Vocal Bursts Taxonomy — DACVAE + MaestroClap Embeddings & Scores
Processed version of [krishnakalyan3/vocal_bursts_taxonomy_100_clean_wds](https://huggingface.co/datasets/krishnakalyan3/vocal_bursts_taxonomy_100_clean_wds) with DACVAE latents, MaestroClap embeddings, derived attribute/quality/speaker scores, and Gemini-verified labels.
## Overview
| Metric | Value |
|--------|-------|
| Total samples | **16,175** |
| Categories | **82** |
| Genders | male, female |
| Female samples | 8,097 |
| Male samples | 8,078 |
## Gemini Label Verification
Every sample was sent to **Gemini 3.1 Flash Lite** for two independent tasks:
1. **Match scoring**: The audio and the original prompt label are sent together, and the model rates how well the label matches the audio: 0 (not at all), 1 (slightly), or 2 (very well).
2. **Dense captioning**: The audio is sent alone, without the label, and the model produces a concise technical description of the vocal burst.
| Match Score | Meaning | Count | Percentage |
|:-----------:|---------|------:|----------:|
| 2 | Very well — label accurately describes audio | 15,082 | 93.2% |
| 1 | Slightly — somewhat related but not accurate | 160 | 1.0% |
| 0 | Not at all — label is completely wrong | 933 | 5.8% |
**15,242 / 16,175 samples (94.2%) verified as valid** (match score 1 or 2).
Total verification cost: **$0.97** (32,350 API calls across 100 parallel threads in ~21 minutes).
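
The percentages and the call count in this section follow directly from the per-score counts. A minimal Python sketch that recomputes them, assuming one API call per task per sample (the variable names are illustrative and are not dataset fields):

```python
# Counts per Gemini match score, copied from the table in this section.
match_counts = {2: 15082, 1: 160, 0: 933}

total = sum(match_counts.values())  # 16175 samples

# Share of each score as a percentage, rounded to one decimal place.
percentages = {score: round(100 * n / total, 1) for score, n in match_counts.items()}

# "Valid" means a match score of 1 or 2.
valid = match_counts[1] + match_counts[2]
valid_pct = round(100 * valid / total, 1)

# Assumption: two requests per sample (match scoring + dense captioning).
api_calls = 2 * total

print(percentages)       # {2: 93.2, 1: 1.0, 0: 5.8}
print(valid, valid_pct)  # 15242 94.2
print(api_calls)         # 32350
```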
## Contents
Each tar file corresponds to one vocal burst category and contains:
## Per-Sample JSON Fields
### Original metadata
| Field | Type | Description |
|-------|------|-------------|
| | string | Original filename from source dataset |
| | float | Audio duration in seconds |
| | string | `male` or `female` |
| | string | Vocal burst category name + description |
| | int | Category ID (1–97) |
| | int | Unique sample identifier |
| | int | 44100 Hz |
### DACVAE latent info
| Field | Type | Description |
|-------|------|-------------|
| | int | Number of DACVAE latent frames (T) |
| | int | Latent dimension (128) |
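
The Models Used section lists DACVAE latents at 25 frames per second with 128 dimensions, so the frame count T can be estimated from a clip's duration. A rough sketch, assuming a partial final frame is rounded up (the card does not state the rounding or padding rule, and `expected_latent_shape` is a hypothetical helper, not part of any library):

```python
import math

LATENT_FPS = 25   # frame rate listed for DACVAE in this card
LATENT_DIM = 128  # latent dimension listed in this card

def expected_latent_shape(duration_s: float) -> tuple[int, int]:
    """Estimate the (T, 128) latent shape for a clip of the given duration.

    Assumption: a partial final frame is rounded up. The real encoder may
    pad or truncate differently, so treat T as approximate.
    """
    if duration_s < 0:
        raise ValueError("duration must be non-negative")
    frames = math.ceil(duration_s * LATENT_FPS)
    return frames, LATENT_DIM

print(expected_latent_shape(2.0))  # (50, 128)
```

Comparing this estimate with the stored frame count is a cheap sanity check when reading the latents, as long as a difference of a frame or so is tolerated.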
### MaestroClap embeddings & probes
| Field | Type | Description |
|-------|------|-------------|
| | float[512] | MaestroClap audio-audio embedding (L2 normalized) |
| | float[128] | Predicted WaveELM timbre/speaker embedding |
| | dict(53 floats) | 53 emotion/speaker attribute scores (see list below) |
| | float | Duration in seconds predicted from CLAP embedding |
| | float | Consonant/Phoneme Score |
| | float | Background noise quality |
| | float | Content enjoyment rating |
| | float | Overall audio quality |
| | float | Speech quality rating |
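
Because the MaestroClap embedding is L2 normalized, the cosine similarity between two samples reduces to a plain dot product. A small pure-Python sketch, with toy 3-dimensional vectors standing in for the 512-dimensional embeddings (the helper names are illustrative):

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]

def cosine_similarity_unit(a, b):
    """Cosine similarity for vectors that are already unit length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    return sum(x * y for x, y in zip(a, b))

# Toy vectors; the real embeddings have 512 dimensions.
a = l2_normalize([3.0, 4.0, 0.0])
b = l2_normalize([4.0, 3.0, 0.0])
print(round(cosine_similarity_unit(a, b), 2))  # 0.96
```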
### Gemini verification
| Field | Type | Description |
|-------|------|-------------|
| | int | 0=not at all, 1=slightly, 2=very well (label vs. audio match) |
| | string | Brief explanation of the match rating |
| | string | Dense technical vocal burst caption generated without seeing the label |
## Models Used
| Model | Architecture | Output |
|-------|-------------|--------|
| [DACVAE](https://huggingface.co/facebook/dacvae-watermarked) | Facebook DAC-VAE encoder | (T, 128) latent at 25fps |
| [MaestroClap](https://huggingface.co/TTS-AGI/audio-audio-clap-maestrino) | ViT-B/12 on DACVAE latents | 512-dim normalized embedding |
| Attribute Probe | MLP 512→704→704→53+1 | 53 emotion/speaker attributes + duration |
| Speaker Probe | MLP 512→704→704→128 | 128-dim timbre embedding |
| Quality Experts (×5) | MLP 512→64→64→1 | 5 independent quality scores |
| Gemini 3.1 Flash Lite | Google LLM with audio input | Match score (0–2) + dense caption |
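
To get a feel for the size of the probe heads, the layer widths in the table can be turned into parameter counts. The sketch below assumes plain fully connected layers with bias terms and nothing else (no normalization or other extra parameters), which the card does not confirm; `mlp_param_count` is a hypothetical helper.

```python
def mlp_param_count(layer_sizes):
    """Count weights and biases for a stack of fully connected layers."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# Layer widths copied from the table above.
probes = {
    "attribute_probe": [512, 704, 704, 53 + 1],  # 53 attributes + duration
    "speaker_probe": [512, 704, 704, 128],
    "quality_expert": [512, 64, 64, 1],          # one of the five experts
}

for name, sizes in probes.items():
    print(name, mlp_param_count(sizes))
```

Under these assumptions each probe stays below one million parameters, small next to the encoder that produces the 512-dimensional input.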
## 53 Attribute Dimensions
Affection, Age, Amusement, Anger, Arousal, Astonishment/Surprise, Authenticity, Awe, Bitterness, Concentration, Confident vs. Hesitant, Confusion, Contemplation, Contempt, Contentment, Disappointment, Disgust, Distress, Doubt, Elation, Embarrassment, Emotional Numbness, Fatigue/Exhaustion, Fear, Gender, Helplessness, High-Pitched vs. Low-Pitched, Hope/Enthusiasm/Optimism, Impatience and Irritability, Infatuation, Interest, Intoxication/Altered States, Jealousy & Envy, Longing, Malevolence/Malice, Monotone vs. Expressive, Pain, Pleasure/Ecstasy, Pride, Relief, Sadness, Serious vs. Humorous, Sexual Lust, Shame, Soft vs. Harsh, Sourness, Submissive vs. Dominant, Teasing, Thankfulness/Gratitude, Triumph, Valence, Vulnerable vs. Emotionally Detached, Warm vs. Cold
## Category List
| Category | Description | F Total | F Match=2 | M Total | M Match=2 | Total | Valid |
|----------|-------------|--------:|----------:|--------:|----------:|------:|------:|
| Affirmative Grunt | Affirmative Grunt short sound indicating agreement | 100 | 100 | 100 | 94 | 200 | 194 |
| Ahem | Ahem clearing throat to gain attention or show discomfort | 100 | 73 | 99 | 92 | 199 | 166 |
| Blowing | Blowing a Kiss brief airy sound of sending affection | 99 | 73 | 98 | 51 | 197 | 124 |
| Breathy Giggle | Breathy Giggle soft airy laugh | 99 | 94 | 100 | 70 | 199 | 178 |
| Cackle | Cackle harsh sharp laugh sometimes unsettling | 97 | 74 | 99 | 99 | 196 | 195 |
| Chewing Noises | Chewing Noises crunching or moist sounds from the mouth | 95 | 95 | 98 | 98 | 193 | 193 |
| Childlike Giggle | Childlike Giggle light playful laugh | 100 | 100 | 98 | 98 | 198 | 198 |
| Chuckle | Chuckle soft suppressed laugh showing mild amusement | 100 | 77 | 100 | 80 | 200 | 176 |
| Clears Throat | Clears Throat brief intentional effort to remove phlegm or gain attention | 99 | 98 | 99 | 98 | 198 | 196 |
| Click One s Tongue | Click One’s Tongue short ‘tch’ sound by flicking tongue off palate | 100 | 100 | 98 | 98 | 198 | 198 |
| Clicks Tongue | Clicks Tongue distinct click by tapping tongue against the palate | 100 | 100 | 99 | 99 | 199 | 199 |
| Contented Sigh | Contented Sigh deep exhale expressing satisfaction | 97 | 79 | 99 | 69 | 196 | 148 |
| Convulsive Sob | Convulsive Sob intense crying with heaving breaths | 100 | 99 | 99 | 96 | 199 | 196 |
| Cough | Cough sharp expulsion of air from lungs | 98 | 98 | 94 | 94 | 192 | 192 |
| Coughing | Coughing series of forceful expulsions of air from lungs | 99 | 99 | 100 | 100 | 199 | 199 |
| Deep Breath | Deep Breath large inhalation to calm or refocus | 98 | 92 | 99 | 93 | 197 | 185 |
| Deep Breathing | Deep Breathing slow deliberate inhalations and exhalations for relaxation | 99 | 99 | 99 | 98 | 198 | 197 |
| Displeased Grunt | Displeased Grunt harsh vocalization of dissatisfaction | 99 | 96 | 99 | 98 | 198 | 197 |
| Drinking Noises | Drinking Noises audible liquid intake with gulps or slurps | 100 | 99 | 99 | 99 | 199 | 198 |
| Effort Grunt | Effort Grunt short forceful sound during physical strain | 100 | 99 | 100 | 99 | 200 | 198 |
| Exasperated Sigh | Exasperated Sigh audible breath of frustration | 99 | 95 | 100 | 99 | 199 | 194 |
| Exhausted Groan | Exhausted Groan weary sound indicating fatigue | 97 | 97 | 99 | 99 | 196 | 196 |
| Fast Breathing | Fast Breathing rapid inhalation and exhalation from excitement or stress | 99 | 91 | 100 | 96 | 199 | 198 |
| Fearful Gasp | Fearful Gasp sudden inhale showing shock or anxiety | 98 | 85 | 99 | 85 | 197 | 172 |
| Finger Snaps | Finger Snaps brief sharp click made by friction of thumb and finger | 100 | 100 | 98 | 98 | 198 | 198 |
| Frustrated Groan | Frustrated Groan deep sound of annoyance | 100 | 99 | 98 | 97 | 198 | 198 |
| Growl | Growl low guttural rumble conveying anger | 99 | 98 | 95 | 94 | 194 | 193 |
| Guffaw | Guffaw loud unrestrained laugh of strong amusement | 99 | 99 | 99 | 99 | 198 | 198 |
| Gulps | Gulps short audible swallow from nervousness or ingesting | 98 | 93 | 99 | 89 | 197 | 182 |
| Gurgling | Gurgling bubbling sound in the throat | 98 | 98 | 99 | 98 | 197 | 196 |
| Hand Scratching Head | Hand Scratching Head rubbing hair soft rustling of fingers on scalp | 100 | 100 | 99 | 99 | 199 | 199 |
| Hand Slaps | Hand Slaps solid impact sound of palm against a surface or another hand | 100 | 100 | 100 | 100 | 200 | 200 |
| Heavy Breathing | Heavy Breathing deep labored breaths of fatigue or tension | 100 | 100 | 98 | 98 | 198 | 198 |
| Hiccup | Hiccup spasmodic contraction of diaphragm creating a ‘hic’ sound | 100 | 100 | 100 | 100 | 200 | 200 |
| Hiccups | Hiccups involuntary diaphragmatic spasm producing ‘hic’ sound | 99 | 99 | 97 | 97 | 196 | 196 |
| Hiss | Hiss sustained sibilant sound of disapproval | 100 | 96 | 99 | 99 | 199 | 195 |
| Humming | Humming low continuous vocal tone often tuneful or absent-minded | 98 | 98 | 100 | 100 | 198 | 198 |
| Kissing Noises | Kissing Noises repetitive gentle lip compression for affection | 98 | 98 | 100 | 98 | 198 | 196 |
| Kissing Sounds | Kissing Sounds soft repeated lip contact expressing affection | 98 | 98 | 100 | 99 | 198 | 197 |
| Licking Sound | Licking Sound light quick contact of tongue on surface | 98 | 97 | 100 | 99 | 198 | 196 |
| Lip Smack | Lip Smack soft pop of parted lips indicating anticipation | 100 | 85 | 100 | 86 | 200 | 171 |
| Low Mumble | Low Mumble indistinct vocalization of uncertainty or drowsiness | 100 | 74 | 100 | 77 | 200 | 160 |
| Mournful Wail | Mournful Wail long high cry of grief | 100 | 100 | 100 | 96 | 200 | 200 |
| Nervous Giggle | Nervous Giggle high-pitched laugh indicating anxiety | 100 | 61 | 96 | 54 | 196 | 145 |
| Nervous Gulp | Nervous Gulp audible swallow indicating anxiety | 99 | 63 | 97 | 67 | 196 | 130 |
| Normal Breathing | Normal Breathing regular rhythmic breaths at rest | 100 | 12 | 100 | 16 | 200 | 29 |
| Pain Moan | Pain Moan low vocalization signaling discomfort | 100 | 99 | 99 | 99 | 199 | 198 |
| Panting | Panting rapid breathing due to exertion or excitement | 100 | 98 | 96 | 95 | 196 | 196 |
| Person Whistling | Person Whistling to Get Attention serious sharp whistle | 99 | 99 | 100 | 93 | 199 | 199 |
| Person Whistling Playfully | Person Whistling Playfully lighthearted melodic whistle | 99 | 99 | 99 | 99 | 198 | 198 |
| Pleasure Moan | Pleasure Moan soft prolonged sound of enjoyment | 100 | 100 | 99 | 89 | 199 | 189 |
| Purr | Purr soft continuous vibrating hum showing contentment | 100 | 94 | 98 | 88 | 198 | 182 |
| Quiet Sob | Quiet Sob soft crying with short broken breaths | 100 | 96 | 100 | 98 | 200 | 196 |
| Relief Sigh | Relief Sigh release of tension after stress | 78 | 68 | 98 | 90 | 176 | 159 |
| Resonant Hum | Resonant Hum lower-pitched thoughtful hum | 99 | 98 | 93 | 93 | 192 | 192 |
| Scream | Scream loud high-pitched outburst of extreme emotion | 97 | 97 | 100 | 100 | 197 | 197 |
| Sharp Inhale | Sharp Inhale quick breath of sudden realization | 100 | 59 | 100 | 81 | 200 | 145 |
| Sharp Whistle | Sharp Whistle quick high-pitched attention grabber | 99 | 99 | 100 | 100 | 199 | 199 |
| Shriek | Shriek sharp piercing cry typically of fear | 96 | 96 | 99 | 99 | 195 | 195 |
| Slap Face | Slap Face sharp impact sound of palm against face | 96 | 90 | 97 | 88 | 193 | 178 |
| Slow Breathing | Slow Breathing paced controlled breaths for calm | 99 | 99 | 100 | 100 | 199 | 199 |
| Slurping Noises | Slurping Noises wet sucking sound typically when consuming liquids | 99 | 93 | 99 | 93 | 198 | 186 |
| Smack One s Lips | Smack One’s Lips light pop of lips often from dryness or taste anticipation | 98 | 97 | 98 | 97 | 196 | 194 |
| Smacks Lips | Smacks Lips repetitive lip popping after tasting or anticipating food | 99 | 97 | 99 | 97 | 198 | 194 |
| Snicker | Snicker stifled laugh often conveying mockery or sarcasm | 100 | 77 | 97 | 82 | 197 | 173 |
| Sniff | Sniff audible inhalation through the nose possibly sadness or restraint | 100 | 100 | 100 | 99 | 200 | 199 |
| Snort | Snort quick burst of air through the nose often disbelief | 99 | 76 | 99 | 94 | 198 | 170 |
| Snorting Giggle | Snorting Giggle brief nasal snort combined with laughter | 99 | 99 | 96 | 96 | 195 | 195 |
| Sobs | Sobs audible heavier crying with convulsive gasps | 98 | 93 | 98 | 94 | 196 | 190 |
| Soft Hum | Soft Hum steady tone with closed lips in contentment | 98 | 95 | 100 | 99 | 198 | 198 |
| Soft Whistle | Soft Whistle gentle tuneful blowing of air | 100 | 100 | 100 | 100 | 200 | 200 |
| Spitting | Spitting forceful expulsion of saliva | 99 | 66 | 100 | 71 | 199 | 137 |
| Sucking Noise | Sucking Noise soft pulling sound created by suction | 100 | 100 | 100 | 99 | 200 | 199 |
| Surprised Gasp | Surprised Gasp quick intake of breath in astonishment | 98 | 98 | 97 | 97 | 195 | 195 |
| Swallows | Swallows audible movement of liquid down the throat | 99 | 97 | 99 | 99 | 198 | 196 |
| Tongue Click | Tongue Click distinct clack made by tongue against palate | 100 | 100 | 99 | 99 | 199 | 199 |
| Trembling Whimper | Trembling Whimper faint cry indicating fear or pain | 98 | 95 | 100 | 96 | 198 | 191 |
| Tsk | Tsk clicking sound of reprimand or annoyance | 99 | 98 | 99 | 98 | 198 | 196 |
| Whispered Mumble | Whispered Mumble very quiet muddled speech sounds | 98 | 94 | 96 | 88 | 194 | 182 |
| Wistful Sigh | Wistful Sigh soft exhale tinged with longing | 100 | 100 | 89 | 86 | 189 | 186 |
| Wolf Whistle | Wolf Whistle two-note whistle expressing admiration | 99 | 97 | 99 | 98 | 198 | 195 |
| Yawn | Yawn involuntary wide-mouthed breath showing tiredness | 99 | 96 | 88 | 87 | 187 | 183 |
| **TOTAL** | | **8,097** | **7,539** | **8,078** | **7,543** | **16,175** | **15,242** |
> **Valid** = samples with match score 1 or 2 (label is at least slightly accurate).
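
The Total and Valid columns make it easy to rank categories by label reliability. A minimal sketch over a handful of rows copied from the table (the tuple layout and the `valid_rate` helper are illustrative choices):

```python
# (category, total, valid) for a few rows copied from the table above.
rows = [
    ("Normal Breathing", 200, 29),
    ("Blowing", 197, 124),
    ("Nervous Gulp", 196, 130),
    ("Hand Slaps", 200, 200),
]

def valid_rate(total, valid):
    """Fraction of samples with match score 1 or 2."""
    return valid / total if total else 0.0

# Sort ascending so the least reliable categories come first.
ranked = sorted(rows, key=lambda r: valid_rate(r[1], r[2]))
for name, total, valid in ranked:
    print(f"{name}: {valid_rate(total, valid):.1%}")
# Normal Breathing: 14.5%
# Blowing: 62.9%
# Nervous Gulp: 66.3%
# Hand Slaps: 100.0%
```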
## Usage
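
A minimal sketch of one way to read the `preview` split declared in the YAML header and keep only Gemini-verified rows. It assumes the Hugging Face `datasets` package is installed and the repository is reachable; `load_preview` and `keep_valid` are hypothetical helpers written for this example, and only the column names come from the card.

```python
def load_preview():
    """Load the `preview` split declared in the card's YAML header.

    Requires the Hugging Face `datasets` package and network access.
    """
    from datasets import load_dataset  # lazy import: keep_valid works without it
    return load_dataset("TTS-AGI/vocal-bursts-taxonomy-DACVAE", split="preview")

def keep_valid(rows, min_score=1):
    """Yield rows whose `gemini_match_score` is at least `min_score`."""
    for row in rows:
        if row["gemini_match_score"] >= min_score:
            yield row

# Offline illustration with hand-written rows that reuse the card's column names.
sample_rows = [
    {"category": "Cough", "gender": "female", "gemini_match_score": 2},
    {"category": "Normal Breathing", "gender": "male", "gemini_match_score": 0},
]
print([r["category"] for r in keep_valid(sample_rows)])  # ['Cough']
```

With network access, `keep_valid(load_preview())` applies the same filter to the real preview rows; passing `min_score=2` keeps only samples whose label matched very well.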
## Source
Derived from [krishnakalyan3/vocal_bursts_taxonomy_100_clean_wds](https://huggingface.co/datasets/krishnakalyan3/vocal_bursts_taxonomy_100_clean_wds).
Provided by:
TTS-AGI



