
Replication Data for: Potential and Pitfalls of Audio-as-Data: alignment, features and classification models

Source: DataONE. Updated: 2025-11-21. Indexed: 2025-11-29.
Download link:
https://search.dataone.org/view/sha256:7dc35711a151b3839a82b9e326e02e8314f93632f5e6c900ea67ee352124926c
Description:
Political science is a field rich in multimodal information sources, from televised debates to parliamentary briefings. This paper bridges a gap between computer and political science in multimodal data analysis using audio. The adoption of multimodal analyses in political science (e.g., video/audio with text-as-data approaches) has been relatively slow due to unequal distribution of computational power and skills needed. We provide solutions to challenges encountered when analyzing audio, advancing potential for multimodal data analysis in political science. Using a dataset of all televised US presidential debates from 1960-2020, we focus on three features encountered when analyzing audio data: low level descriptors (LLDs) like pitch or energy, Mel-frequency cepstral coefficients (MFCCs), and audio embeddings/encodings like Wav2Vec. We showcase four applications: a) forced alignment of audio-text using MFCCs, timestamping transcripts and speaker information; b) speech characterization using LLDs; c) custom-made classification models with audio embeddings and MFCCs; and d) emotional recognition models using Wav2Vec for classification of discrete emotions and their valence-arousal-dominance. We provide explanations to help understand how these features can be applied for different political research questions and advice on vigilance to naive interpretation, for both experienced researchers and those who want to start working with audio.
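To make the "low level descriptors" mentioned in the description more concrete, the following is a minimal sketch of two common frame-level LLDs, RMS energy and zero-crossing rate, computed on a synthetic tone. It is not taken from the replication package: the function names, the 200 Hz test tone, and the 25 ms window with 10 ms hop are illustrative assumptions, and real debate audio would first have to be loaded from file with an audio library.

```python
import math

def frame_signal(samples, frame_len, hop):
    """Split samples into overlapping frames; trailing partial frames are dropped."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def rms_energy(frame):
    """Root-mean-square amplitude of one frame (a simple energy descriptor)."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def synth_tone(freq_hz, amplitude, sample_rate, n_samples):
    """Hypothetical stand-in for real audio: a pure sine tone."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(n_samples)]

SAMPLE_RATE = 16000  # assumed sampling rate in Hz
tone = synth_tone(200.0, 0.5, SAMPLE_RATE, SAMPLE_RATE)  # one second of audio

# Assumed framing: 25 ms windows (400 samples) with a 10 ms hop (160 samples).
frames = frame_signal(tone, frame_len=400, hop=160)
energies = [rms_energy(f) for f in frames]
zcrs = [zero_crossing_rate(f) for f in frames]

print(f"{len(frames)} frames")
print(f"mean RMS energy: {sum(energies) / len(energies):.4f}")
print(f"mean zero-crossing rate: {sum(zcrs) / len(zcrs):.4f}")
```

For a pure sine, the RMS energy should sit near the amplitude divided by the square root of 2, and the zero-crossing rate near twice the frequency divided by the sampling rate. Checking a feature extractor against a signal with known properties like this is one way to guard against the naive interpretation the description warns about.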
Created:
2025-11-24