Indian English Speech Emotion Dataset
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://data.mendeley.com/datasets/mtk28hgc6x
Description:
Overview
Despite the global diversity of English accents, there is a notable scarcity of publicly available datasets for sentiment analysis focusing on Indian English speech. Existing speech emotion recognition datasets predominantly feature Western accents, limiting the development of region-specific models for Indian English, which is characterized by unique prosodic and phonetic variations. This dataset addresses this gap by providing a robust collection of Indian English speech samples tailored for emotion recognition, enabling advancements in culturally relevant AI applications.
Research Hypothesis
Acoustic features such as pitch, energy, and prosodic cues extracted directly from Indian English speech signals can effectively predict a speaker’s emotional state without requiring speech-to-text transcription. These signals contain distinguishable patterns for emotions like Happy, Sad, Angry, and Neutral, which can be learned by deep learning models such as Long Short-Term Memory (LSTM) networks.
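As a rough illustration of the hypothesis above, the sketch below computes two of the named cues, frame energy and pitch, directly from raw samples with no transcription step. It is a minimal standard-library example, not the authors' feature pipeline: the function names, the 50 ms frame and 25 ms hop, and the 75 to 400 Hz pitch search range are all assumptions chosen for illustration.

```python
import math


def frame_signal(samples, frame_len, hop):
    """Split a list of float samples into overlapping frames."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def rms_energy(frame):
    """Root-mean-square energy of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def autocorr_pitch(frame, sample_rate, fmin=75.0, fmax=400.0):
    """Estimate fundamental frequency (Hz) from the autocorrelation peak.

    Returns 0.0 for silent frames. fmin and fmax are illustrative bounds,
    not values taken from the dataset description.
    """
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    if max_lag <= min_lag or not any(frame):
        return 0.0
    best_lag, best_score = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Plain (biased) autocorrelation sum: shorter lags are slightly
        # favoured, which helps avoid picking a multiple of the true period.
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else 0.0


def extract_features(samples, sample_rate, frame_ms=50, hop_ms=25):
    """Return one (energy, pitch_hz) pair per frame, in time order."""
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    return [(rms_energy(f), autocorr_pitch(f, sample_rate))
            for f in frame_signal(samples, frame_len, hop)]
```

The result is a time-ordered sequence of per-frame feature pairs, which is the kind of input a sequence model such as an LSTM consumes. For real recordings, a dedicated audio library would be much faster than these pure-Python loops.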
Data Collection and Gathering
The dataset consists of audio recordings from native Indian English speakers of both genders, drawn from diverse regions of India to capture a variety of accents. Recordings were made in controlled, quiet environments to ensure high audio quality, with participants speaking short, natural sentences designed to express four emotions: Happy, Sad, Angry, and Neutral. The original dataset comprises 1,000 samples, evenly distributed across the four emotions (250 per emotion).
To enhance model robustness and generalizability, data augmentation techniques (pitch shifting, time stretching, and controlled noise addition) were applied, expanding the dataset to 3,000 samples. The augmented dataset remains balanced across emotions and is organized into labeled folders (Happy, Sad, Angry, Neutral) to facilitate supervised learning.
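The sketch below illustrates two simple waveform augmentations of this kind. It is not the authors' procedure: `add_noise` adds white Gaussian noise at a target signal-to-noise ratio, and `change_speed` resamples by linear interpolation, which changes duration and pitch together and so only approximates the separate pitch-shift and time-stretch operations named above. The 20 dB SNR, the 0.9 to 1.1 speed range, and the choice of two variants per clip (one possible reading of the 1,000 to 3,000 expansion) are assumptions.

```python
import math
import random


def add_noise(samples, snr_db, rng):
    """Add white Gaussian noise at roughly the requested SNR (in dB)."""
    if not samples:
        return []
    signal_power = sum(x * x for x in samples) / len(samples)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in samples]


def change_speed(samples, rate):
    """Resample by linear interpolation; rate > 1 shortens the clip.

    Duration and pitch change together here, so this is a stand-in for,
    not an implementation of, true pitch shifting or time stretching.
    """
    out_len = int(len(samples) / rate)
    out = []
    for j in range(out_len):
        pos = j * rate
        i = int(pos)
        frac = pos - i
        nxt = samples[min(i + 1, len(samples) - 1)]
        out.append(samples[i] * (1.0 - frac) + nxt * frac)
    return out


def augment(samples, rng):
    """Return two illustrative variants of one clip (assumed settings)."""
    noisy = add_noise(samples, snr_db=20.0, rng=rng)
    resped = change_speed(samples, rate=rng.uniform(0.9, 1.1))
    return [noisy, resped]
```

Passing an explicit `random.Random(seed)` instance keeps the augmented set reproducible across runs.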
Notable Findings
An LSTM-based model trained solely on acoustic features, without reliance on textual transcription, achieved a classification accuracy of 85% on the original dataset. With augmented data included, accuracy rose to 96%, a gain of 11 percentage points. These results highlight the effectiveness of acoustic cues in capturing emotional states in Indian English speech and the important role of data augmentation in improving model performance.
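For reference, classification accuracy is the fraction of samples whose predicted emotion matches the true label. The sketch below shows that calculation together with a four-class confusion matrix, which reveals which emotions are confused with each other. The helper names are hypothetical, and any labels passed in a quick trial would be toy values, not results from this dataset.

```python
from collections import Counter

EMOTIONS = ["Happy", "Sad", "Angry", "Neutral"]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and equal in length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def confusion_matrix(y_true, y_pred, labels=EMOTIONS):
    """Rows are true labels, columns are predicted labels."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]
```

Because the dataset is described as balanced across the four emotions, overall accuracy is a reasonable headline metric; the confusion matrix adds the per-emotion detail.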
Data Interpretation and Usage
This dataset is a valuable resource for researchers and practitioners in speech emotion recognition and related fields. Potential applications include:
1. Developing and benchmarking transcription-free speech emotion recognition models.
2. Analyzing acoustic and prosodic patterns unique to Indian English emotional speech.
3. Building sentiment-aware applications such as voice assistants, call center analytics, and mental health monitoring tools.
4. Investigating the impact of data augmentation on acoustic-based sentiment model performance.
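As a starting point for the uses listed above, the sketch below indexes the labeled folders (Happy, Sad, Angry, Neutral) into (path, label) pairs and makes a per-emotion train/test split. The folder names come from the description; the `.wav` extension, the dataset root path, the 20% test fraction, and the function names are assumptions, since the listing does not specify the file layout in detail.

```python
import random
from pathlib import Path

EMOTIONS = ("Happy", "Sad", "Angry", "Neutral")


def index_dataset(root, extensions=(".wav",)):
    """Return (path, label) pairs found under root/<Emotion>/, sorted by path."""
    items = []
    for label in EMOTIONS:
        folder = Path(root) / label
        if not folder.is_dir():
            continue  # tolerate a missing class folder
        for path in sorted(folder.iterdir()):
            if path.suffix.lower() in extensions:
                items.append((path, label))
    return items


def stratified_split(items, test_fraction=0.2, seed=0):
    """Split so each emotion keeps the same test fraction."""
    rng = random.Random(seed)
    by_label = {}
    for path, label in items:
        by_label.setdefault(label, []).append((path, label))
    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test
```

When augmented clips are used, it may be worth keeping every variant of one original recording in the same split so that the test set contains only unseen utterances; this sketch does not attempt that grouping because the file naming scheme is not documented here.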
Created:
2025-06-09



