Emotional Crowd Sound

Name: Emotional Crowd Sound
Creator: IEEE Dataport
License: Not specified

Indexed from ieee-dataport.org on 2025-01-22

Download link:

https://ieee-dataport.org/open-access/emotional-crowd-sound


Description:

Crowds express emotions as a collective individual, which is evident from the sounds a crowd produces at particular events, e.g., collective booing, laughing, or cheering at sports matches, movies, theaters, concerts, political demonstrations, and riots. Crowd sounds can be characterized by frequency-amplitude features using analysis techniques similar to those applied to individual voices, where deep learning classification is applied to spectrogram images derived from sound transformations.

We present the first dataset for applying a technique based on generating sound spectrograms from fixed-length fragments extracted from original audio clips recorded at high-attendance events, where the crowd acts as a collective individual. Transfer learning can then be applied to a neural network, either novel or pre-trained on low-level features using extensive datasets of visual knowledge. The original sound clips are filtered and normalized in amplitude for correct spectrogram generation, and the domain-specific features are fine-tuned on the resulting spectrograms.

This dataset includes the complete data of the study, so that each step can be reproduced.

Files in the dataset:

step0, original files: Approval 39, Disapproval 14, Neutral 15.

step1, normalization: Approval 39, Disapproval 14, Neutral 15. We normalized the loudness of the dataset to −23 Loudness Units (LUFS), following the EBU R128 standard, and filtered the sound to the 20–20,000 Hz range.

step2, sound blocks: Approval 1787, Disapproval 388, Neutral 7340. We divided the sound files into blocks with a 1 s block length, a 0.25 s shifting window, and a 0.75 s overlap between consecutive blocks. We removed 37 silent blocks.

step3, spectrogram images: The blocks of the three emotional classes have been transformed into spectrogram images in four frequency scales: bark (0-3.5 kHz), erb (2-4 kHz), log (0.02-2 kHz), and mel (4-6 kHz). For each scale: Approval 1787, Disapproval 388, Neutral 7340. Spectrograms were generated using the spgrambw spectrogram-drawing function. We used the Jet colormap with 64 colors and generated PNG images using a 400-sample Hamming window and a frame increment of 4.5 milliseconds.

step4, train and test spectrograms: Training: Approval 1429, Disapproval 310, Neutral 5872. Test: Approval 358, Disapproval 78, Neutral 1468.
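
The step2 segmentation (1 s blocks advanced by a 0.25 s shifting window, giving 0.75 s of overlap) can be sketched in Python as below. This is only an illustration under stated assumptions, not the authors' code: the function name split_into_blocks and the 1000 Hz sample rate in the usage lines are hypothetical, and the description does not say how a trailing partial block was handled, so this sketch simply drops it.

```python
import numpy as np

def split_into_blocks(signal, sample_rate, block_s=1.0, hop_s=0.25):
    """Cut a 1-D signal into fixed-length, overlapping blocks.

    block_s and hop_s default to the step2 values (1 s blocks, 0.25 s shift).
    A trailing fragment shorter than one block is dropped (an assumption).
    """
    signal = np.asarray(signal)
    block_len = int(round(block_s * sample_rate))
    hop_len = int(round(hop_s * sample_rate))
    if len(signal) < block_len:
        # Not enough samples for even one block.
        return np.empty((0, block_len), dtype=signal.dtype)
    n_blocks = 1 + (len(signal) - block_len) // hop_len
    starts = np.arange(n_blocks) * hop_len
    return np.stack([signal[s:s + block_len] for s in starts])

# Usage with a synthetic 3-second ramp at a hypothetical 1000 Hz sample rate.
sample_rate = 1000
signal = np.arange(3 * sample_rate, dtype=np.float32)
blocks = split_into_blocks(signal, sample_rate)
print(blocks.shape)
```

Consecutive rows of the result share 75% of their samples, which is why the number of blocks is roughly four times the clip length in seconds.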

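
As a quick consistency check on the step4 counts, the short Python sketch below verifies that the training and test counts add up to the per-class block counts and reports the training fraction for each class. The reading of the split as roughly 80/20 is inferred from the numbers only; the description does not state how the split was made, and the helper name split_report is hypothetical.

```python
# Per-class counts as listed in the description (step2/step3 totals, step4 split).
total = {"Approval": 1787, "Disapproval": 388, "Neutral": 7340}
train = {"Approval": 1429, "Disapproval": 310, "Neutral": 5872}
test = {"Approval": 358, "Disapproval": 78, "Neutral": 1468}

def split_report(total, train, test):
    """Summarize how the train/test counts relate to the class totals."""
    report = {}
    for label, n in total.items():
        report[label] = {
            # Do training and test counts account for every block?
            "sums_match": train[label] + test[label] == n,
            # Share of the class assigned to training.
            "train_fraction": train[label] / n,
            # Is the training count exactly 80% of the class, rounded down?
            "train_is_floor_80pct": train[label] == (n * 4) // 5,
        }
    return report

report = split_report(total, train, test)
for label, row in report.items():
    print(label, row)
```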

Provider:

IEEE Dataport
