yuvalhazan2/Acoustic-Emotion-Vocal-Signature

Name: yuvalhazan2/Acoustic-Emotion-Vocal-Signature
Creator: yuvalhazan2
Published: 2026-04-06 15:56:42
License: No description available

Hugging Face | Updated 2026-04-06 | Indexed 2026-04-12

Download link:

https://hf-mirror.com/datasets/yuvalhazan2/Acoustic-Emotion-Vocal-Signature

Description:

---
pretty_name: Acoustic Predictors of Emotional States and Distress Signatures
license: mit
task_categories:
- tabular-classification
language:
- en
size_categories:
- 10K<n<100K
---

<video src="https://huggingface.co/datasets/yuvalhazan2/Acoustic-Emotion-Vocal-Signature/resolve/main/video11898157055.mp4" controls="controls" style="max-width: 720px;"></video>

# Acoustic Predictors of Emotional States and Distress Signatures

## 1. Project Overview

This dataset is sourced from Kaggle (Speech Emotion Detection Dataset) and contains features extracted from the RAVDESS and TESS databases. It consists of approximately 10,000 rows and 12 features.

- Core Features: Key attributes include Pitch (Hz), Intensity (dB), and 13 Mel-Frequency Cepstral Coefficients (MFCCs).
- Clinical Application: This research supports the identification of a "vocal signature" for patient distress in therapy sessions.

## 2. Research Question

To what extent can specific acoustic features (such as Pitch and MFCCs) predict a speaker's emotional state, and which of these features is the strongest predictor for identifying negative emotions?

## 3. Data Cleaning & Preprocessing

- Integrity Check: A structural audit confirmed zero missing values and zero duplicate entries, so the dataset is complete and every row is unique.
- MFCCs Parsing: The MFCCs column was stored as a string (object dtype). A custom parsing function converted it to a numerical (float64) column by extracting the mean coefficient.
- Scaling: Because feature magnitudes differ widely (e.g., Pitch ~201 vs. Jitter < 0.1), StandardScaler was applied to all acoustic and sentiment features.
- Normalization: After scaling, every feature sits on a common scale (mean=0, std=1), which prevents large-magnitude features from biasing the model.
- Categorical Audit: Columns such as Mood, Gender, Age Group, and Language were audited for typos and found to be standardized.
- Date Parsing: No date parsing was required, as this is an acoustic-only dataset.

## 4. Key Research Decisions

Outlier Handling:

- Detection: Global outlier detection was performed using box plots on the standardized data.
- Observation: While Pitch, Jitter, and Shimmer remained consistent, the MFCCs feature exhibited several statistical outliers at both ends.
- Decision: I decided to retain all identified outliers, keeping the full dataset size of 10,000 samples.
- Justification: This decision aligns with the goal of identifying "vocal signatures" of distress. High-intensity negative emotions, like acute distress or anger, often manifest as extreme physiological values. Statistical outliers in MFCCs likely represent peak indicators of emotional arousal rather than errors. Removing these points would discard significant "distress signature" data.

![image](https://cdn-uploads.huggingface.co/production/uploads/69cbdbb6407cdddba50e6b44/9HBpARqanurJt90ghar7w.png)

## 5. Descriptive Statistics & Pattern Discovery

This section summarizes the statistical findings and acoustic patterns that define the "vocal signature" of patient distress.

- Distribution Analysis (Histograms):
  - Pitch (Multi-modal Distribution): The presence of multiple peaks indicates distinct vocal regimes (e.g., zones for quiet/depressed speech versus high-energy/distressed speech). This structure establishes Pitch as the baseline of the vocal signature.
  - MFCCs (Normal Distribution): This feature follows a classic bell curve around the mean. This suggests that MFCCs reflect the natural, consistent variance of human vocal texture, independent of raw emotional intensity.

![image](https://cdn-uploads.huggingface.co/production/uploads/69cbdbb6407cdddba50e6b44/FiL9c2biOKy0qK_J--zO0.png)

- Variance Analysis (Box Plot): Identifying Arousal Levels
  - High-Arousal vs. Low-Arousal Distress: Negative emotions are identified through statistical extremes. 'Sadness' clusters at the lower extreme (Z < −1.5), while emotions such as 'Anger' and 'Fear' surge toward the upper extreme (Z > 1.5).
  - Objective Indicator of Crisis: These findings indicate that distress is not a fixed value but a significant deviation from the mean pitch (|Z| > 1.5), providing an objective marker for identifying moments of crisis in a therapeutic context.

![image](https://cdn-uploads.huggingface.co/production/uploads/69cbdbb6407cdddba50e6b44/N6yoL9APS4sdlR7YSJiyD.png)

- Feature Interaction (Scatter Plot):
  - The Banding Effect: The plot displays a clear vertical stratification of emotional states along the X-axis. This strong separation visually supports Pitch as the strongest acoustic predictor in this dataset.
  - The Role of MFCCs: While Pitch separates the broad emotional categories, MFCCs serve as the spectral texture component. They provide the nuance required for finer classification within specific pitch ranges.

![image](https://cdn-uploads.huggingface.co/production/uploads/69cbdbb6407cdddba50e6b44/TxobR_GNEl6zZtstTp_T-.png)

- Correlation Matrix (Heatmap):
  - Zero Correlation (r≈0): There is almost no linear dependence between Pitch and MFCCs.
  - Non-Redundancy: This independence indicates that each feature provides unique information. Pitch determines the intensity of distress (arousal), while MFCCs provide the qualitative "texture" needed to identify the specific type of emotion.

![image](https://cdn-uploads.huggingface.co/production/uploads/69cbdbb6407cdddba50e6b44/ZIit8cbGQ1ROmbAOVO-fO.png)

## 6. Visual Research Analysis

Q1: What is the precise "shape" of the vocal density for each emotion?

- Insight: High-arousal distress (Angry, Surprised) is "tall" and pushed toward the ceiling, while low-arousal distress (Sad, Bored) is compressed toward the floor.
- Result: These unique density profiles establish Pitch as the baseline for identifying emotional "breaking points".

![image](https://cdn-uploads.huggingface.co/production/uploads/69cbdbb6407cdddba50e6b44/MX9TTELzRClo5IxMhVh92.png)

Q2: Which feature exhibits the highest average magnitude during negative emotions?

- Insight: Pitch shows the highest average absolute Z-score (~1.2) compared to Shimmer, Jitter, or MFCCs.
- Result: Pitch is identified as the most reactive feature and the primary "red flag" for distress detection.

![image](https://cdn-uploads.huggingface.co/production/uploads/69cbdbb6407cdddba50e6b44/EKD_jWp2Aa1eUvFno97bx.png)

Q3: How do Pitch and MFCCs interact at the margins of emotional arousal?

- Insight: Pitch acts as the "arousal switch" that separates categories, while MFCCs remain statistically stable across moods.
- Result: Pitch leads the distress signature, while MFCCs provide the qualitative "texture" unique to the speaker.

![image](https://cdn-uploads.huggingface.co/production/uploads/69cbdbb6407cdddba50e6b44/s1CcVXn2cuHlXM7dfKrCe.png)

Q4: Is the dataset balanced enough to represent a diverse range of "Vocal Signatures"?

- Insight: The dataset has a high level of class balance, with each of the 10 moods representing ~10% of the samples.
- Result: Acoustic thresholds for distress are therefore derived from an equally weighted sample set, with roughly 1,000 samples per mood.

![image](https://cdn-uploads.huggingface.co/production/uploads/69cbdbb6407cdddba50e6b44/C0Qna7DjCyWb_G59wFdlK.png)

## 7. Final Conclusions

- Primary Predictor: Pitch is identified as the strongest predictor for the intensity of negative emotions and for differentiating types of distress.
- The "Vocal Signature": The complete signature is obtained by combining Pitch with MFCCs as an independent, complementary variable.
- Clinical Impact: This multi-feature framework supports objective identification of emotional crises in clinical environments.
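The MFCC parsing and scaling steps described in section 3 can be sketched roughly as follows. This is a minimal illustration, not the author's actual code: the column names `pitch_hz` and `mfccs`, the assumption that the MFCCs string is a Python-style list literal, and the sample values are all hypothetical. The card names StandardScaler; the sketch reproduces the same z-score formula (population standard deviation) with the standard library so it runs without extra dependencies.

```python
import ast
import statistics


def parse_mfcc_mean(raw: str) -> float:
    """Parse a stringified list of MFCC coefficients and return their mean."""
    # Assumption: the string looks like "[-10.0, 2.0, 5.0]".
    coefficients = ast.literal_eval(raw)
    return float(statistics.fmean(coefficients))


def standardize(values: list[float]) -> list[float]:
    """Z-score a column using the population std (ddof=0), as StandardScaler does."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


# Hypothetical rows; the values are illustrative, not taken from the dataset.
rows = [
    {"pitch_hz": 180.0, "mfccs": "[-10.0, 2.0, 5.0]"},
    {"pitch_hz": 200.0, "mfccs": "[-6.0, 3.0, 6.0]"},
    {"pitch_hz": 220.0, "mfccs": "[-12.0, 1.0, 2.0]"},
]

mfcc_means = [parse_mfcc_mean(row["mfccs"]) for row in rows]
pitch_z = standardize([row["pitch_hz"] for row in rows])
mfcc_z = standardize(mfcc_means)
```

After this step both columns have mean 0 and standard deviation 1, so a pitch around 200 Hz and a sub-unit feature such as Jitter become directly comparable.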
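The |Z| > 1.5 marker from the box-plot analysis and the "average absolute Z-score" comparison from Q2 could be computed along these lines. Again this is a sketch under stated assumptions: the field names (`mood`, `pitch_z`, `mfcc_z`), the mood labels, the grouping of moods into a "negative" set, and the sample values are hypothetical and only illustrate the calculation.

```python
import statistics

AROUSAL_THRESHOLD = 1.5  # |Z| cut-off quoted in the box-plot analysis


def classify_pitch_deviation(z: float, threshold: float = AROUSAL_THRESHOLD) -> str:
    """Label a standardized pitch value by the direction of its deviation."""
    if z > threshold:
        return "high-arousal"
    if z < -threshold:
        return "low-arousal"
    return "within-range"


def mean_abs_z(samples: list[dict], feature: str, moods: set[str]) -> float:
    """Average absolute z-score of one feature over the selected moods."""
    values = [abs(s[feature]) for s in samples if s["mood"] in moods]
    return statistics.fmean(values)


# Hypothetical standardized samples, not real rows from the dataset.
samples = [
    {"mood": "angry", "pitch_z": 1.8, "mfcc_z": 0.2},
    {"mood": "sad", "pitch_z": -1.7, "mfcc_z": -0.1},
    {"mood": "happy", "pitch_z": 0.3, "mfcc_z": 0.4},
    {"mood": "fearful", "pitch_z": 1.6, "mfcc_z": -0.3},
]
NEGATIVE_MOODS = {"angry", "sad", "fearful"}  # assumed grouping

labels = [classify_pitch_deviation(s["pitch_z"]) for s in samples]
pitch_reactivity = mean_abs_z(samples, "pitch_z", NEGATIVE_MOODS)
mfcc_reactivity = mean_abs_z(samples, "mfcc_z", NEGATIVE_MOODS)
```

Comparing `pitch_reactivity` with `mfcc_reactivity` mirrors the Q2 question of which feature moves furthest from its mean during negative emotions.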

Provider:

yuvalhazan2
