
SEIRDB

ModelScope community · Updated 2025-08-02 · Indexed 2025-03-08
Download link:
https://modelscope.cn/datasets/pengzhendong/SEIRDB
Resource description:
# Speech Emotion Intensity Recognition Database (SEIR-DB)

## Dataset Description

- **Homepage:**
- **Repository:**
- **Paper:**
- **Leaderboard:**
- **Point of Contact: gabegiangi@gmail.com**

### Dataset Summary

The SEIR-DB is a comprehensive, multilingual speech emotion intensity recognition dataset containing over 600,000 instances from various sources. It is designed to support tasks related to speech emotion recognition and emotion intensity estimation. The database includes languages such as English, Russian, Mandarin, Greek, Italian, and French.

### Supported Tasks and Leaderboards

The SEIR-DB is suitable for:

- **Speech Emotion Recognition** (classification of discrete emotional states)
- **Speech Emotion Intensity Estimation** (a subset of this dataset, where intensity is rated from 1–5)

#### SPEAR (8 emotions – 375 hours)

[SPEAR (Speech Emotion Analysis and Recognition System)](mailto:gabegiangi@gmail.com) is an **ensemble model** and serves as the SER **benchmark** for this dataset. Below is a comparison of its performance against the best fine-tuned pre-trained model (WavLM Large):

| WavLM Large Test Accuracy | SPEAR Test Accuracy | Improvement |
|---------------------------|---------------------|-------------|
| 87.8%                     | 90.8%               | +3.0%       |

More detailed metrics for **SPEAR**:

| Train Accuracy (%) | Validation Accuracy (%) | Test Accuracy (%) |
|--------------------|-------------------------|-------------------|
| 99.8%              | 90.4%                   | 90.8%             |

---

## Languages

SEIR-DB encompasses multilingual data, featuring languages such as English, Russian, Mandarin, Greek, Italian, and French.

## Dataset Structure

### Data Instances

The raw data collection comprises over 600,000 data instances (375 hours). Users of the database can access the raw audio data, which is stored in subdirectories of the data directory (in their respective datasets).
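
As a rough illustration of the raw-data layout described above, the sketch below counts audio files per source dataset. It assumes that each immediate subdirectory of the data directory corresponds to one source dataset and that the audio files carry a `.wav` extension; neither detail is confirmed by this card, and the function name `count_wavs_per_source` is hypothetical.

```python
from collections import Counter
from pathlib import Path


def count_wavs_per_source(data_dir):
    """Count .wav files under each immediate subdirectory of data_dir.

    Assumption: one subdirectory per source dataset (not confirmed by the card).
    """
    root = Path(data_dir)
    counts = Counter()
    for wav_path in root.rglob("*.wav"):
        relative = wav_path.relative_to(root)
        # Files sitting directly in data_dir belong to no source dataset; skip them.
        if len(relative.parts) > 1:
            # The first path component below data_dir is treated as the source name.
            counts[relative.parts[0]] += 1
    return dict(counts)
```

Calling `count_wavs_per_source("data")` on a local copy would return a dictionary mapping each subdirectory name to its number of audio files, which can help check a download against the instance counts quoted here.
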
After processing, cleaning, and formatting, the dataset contains approximately 120,000 training instances with an average audio utterance length of 3.8 seconds.

### Data Fields

- **ID**: unique sample identifier
- **WAV**: path to the audio file, located in the data directory
- **EMOTION**: annotated emotion
- **INTENSITY**: annotated intensity (ranging from 1-5), where 1 denotes low intensity, and 5 signifies high intensity; 0 indicates no annotation
- **LENGTH**: duration of the audio utterance

### Data Splits

The data is divided into train, test, and validation sets, located in the respective JSON manifest files.

- **Train**: 80%
- **Validation**: 10%
- **Test**: 10%

For added flexibility, unsplit data is also available in `data.csv` to allow custom splits.

## Dataset Creation

### Curation Rationale

The SEIR-DB was curated to maximize the volume of data instances, addressing a significant limitation in speech emotion recognition (SER) experimentation: the lack of emotion data and the small size of available datasets. This database aims to resolve these issues by providing a large volume of emotion-annotated data that is cleanly formatted for experimentation.

### Source Data

The dataset was compiled from various sources.

### Annotations

#### Annotation process

For details on the annotation process, please refer to the source for each dataset, as they were conducted differently. However, the entire database is human-annotated.

#### Who are the annotators?

Please consult the source documentation for information on the annotators.

### Personal and Sensitive Information

No attempt was made to remove personal and sensitive information, as consent and recordings were not obtained internally.

## Considerations for Using the Data

### Social Impact of Dataset

The SEIR-DB dataset can significantly impact the research and development of speech emotion recognition technologies by providing a large volume of annotated data.
These technologies have the potential to enhance various applications, such as mental health monitoring, virtual assistants, customer support, and communication devices for people with disabilities.

### Discussion of Biases

During the dataset cleaning process, efforts were made to balance the database concerning the number of samples for each dataset, emotion distribution (with a greater focus on primary emotions and less on secondary emotions), and language distribution. However, biases may still be present.

### Other Known Limitations

No specific limitations have been identified at this time.

## Additional Information

### Dataset Curators

Gabriel Giangi - Concordia University - Montreal, QC Canada - [gabegiangi@gmail.com](mailto:gabegiangi@gmail.com)

### Licensing Information

This dataset can be used for research and academic purposes. For commercial purposes, please contact [gabegiangi@gmail.com](mailto:gabegiangi@gmail.com).

### Citation Information

- Aljuhani, R. H., Alshutayri, A., & Alahdal, S. (2021). Arabic speech emotion recognition from Saudi dialect corpus. IEEE Access, 9, 127081-127085.
- Basu, S., Chakraborty, J., & Aftabuddin, M. (2017). Emotion recognition from speech using convolutional neural network with recurrent neural network architecture. In ICCES.
- Baevski, A., Zhou, H. H., & Collobert, R. (2020). Wav2vec 2.0: A framework for self-supervised learning of speech representations. In NeurIPS.
- Busso, C., Bulut, M., Lee, C. C., Kazemzadeh, A., Mower, E., Kim, S., ... & Narayanan, S. (2008). IEMOCAP: Interactive emotional dyadic motion capture database. In LREC.
- Cao, H., Cooper, D. G., Keutmann, M. K., Gur, R. C., Nenkova, A., & Verma, R. (2014). CREMA-D: Crowd-Sourced Emotional Multimodal Actors Dataset. IEEE Transactions on Affective Computing, 5, 377-390.
- Chopra, S., Mathur, P., Sawhney, R., & Shah, R. R. (2021). Meta-Learning for Low-Resource Speech Emotion Recognition. In ICASSP.
- Costantini, G., Iaderola, I., Paoloni, A., & Todisco, M. (2014).
EMOVO Corpus: an Italian Emotional Speech Database. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14) (pp. 3501-3504). European Language Resources Association (ELRA). Reykjavik, Iceland. http://www.lrec-conf.org/proceedings/lrec2014/pdf/591_Paper.pdf
- Duville, M. M., Alonso-Valerdi, L. M., & Ibarra-Zarate, D. I. (2022). Mexican Emotional Speech Database (MESD). Mendeley Data, V5. doi: 10.17632/cy34mh68j9.5
- Gournay, P., Lahaie, O., & Lefebvre, R. (2018). A Canadian French Emotional Speech Dataset (1.1) [Data set]. ACM Multimedia Systems Conference (MMSys'18), Amsterdam, The Netherlands. Zenodo. https://doi.org/10.5281/zenodo.1478765
- Kandali, A., Routray, A., & Basu, T. (2008). Emotion recognition from Assamese speeches using MFCC features and GMM classifier. In TENCON.
- Kondratenko, V., Sokolov, A., Karpov, N., Kutuzov, O., Savushkin, N., & Minkin, F. (2022). Large Raw Emotional Dataset with Aggregation Mechanism. arXiv preprint arXiv:2212.12266.
- Kwon, S. (2021). MLT-DNet: Speech emotion recognition using 1D dilated CNN based on multi-learning trick approach. Expert Systems with Applications, 167, 114177.
- Lee, Y., Lee, J. W., & Kim, S. (2019). Emotion recognition using convolutional neural network and multiple feature fusion. In ICASSP.
- Li, Y., Baidoo, C., Cai, T., & Kusi, G. A. (2019). Speech emotion recognition using 1D CNN with no attention. In ICSEC.
- Lian, Z., Tao, J., Liu, B., Huang, J., Yang, Z., & Li, R. (2020). Context-Dependent Domain Adversarial Neural Network for Multimodal Emotion Recognition. In Interspeech.
- Livingstone, S. R., & Russo, F. A. (2018). The Ryerson audio-visual database of emotional speech and song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English. PLoS ONE, 13(5), e0196391.
- Peng, Z., Li, X., Zhu, Z., Unoki, M., Dang, J., & Akagi, M. (2020).
Speech emotion recognition using 3D convolutions and attention-based sliding recurrent networks with auditory front-ends. IEEE Access, 8, 16560-16572.
- Poria, S., Hazarika, D., Majumder, N., Naik, G., Cambria, E., & Mihalcea, R. (2019). MELD: A multimodal multi-party dataset for emotion recognition in conversations. In ACL.
- Schneider, A., Baevski, A., & Collobert, R. (2019). Wav2vec: Unsupervised pre-training for speech recognition. In ICLR.
- Schuller, B., Rigoll, G., & Lang, M. (2010). Speech emotion recognition: Features and classification models. In Interspeech.
- Sinnott, R. O., Radulescu, A., & Kousidis, S. (2013). Surrey audiovisual expressed emotion (SAVEE) database. In AVEC.
- Vryzas, N., Kotsakis, R., Liatsou, A., Dimoulas, C. A., & Kalliris, G. (2018). Speech emotion recognition for performance interaction. Journal of the Audio Engineering Society, 66(6), 457-467.
- Vryzas, N., Matsiola, M., Kotsakis, R., Dimoulas, C., & Kalliris, G. (2018, September). Subjective Evaluation of a Speech Emotion Recognition Interaction Framework. In Proceedings of the Audio Mostly 2018 on Sound in Immersion and Emotion (p. 34). ACM.
- Wang, Y., Yang, Y., Liu, Y., Chen, Y., Han, N., & Zhou, J. (2019). Speech emotion recognition using a combination of CNN and RNN. In Interspeech.
- Yoon, S., Byun, S., & Jung, K. (2018). Multimodal speech emotion recognition using audio and text. In SLT.
- Zhang, R., & Liu, M. (2020). Speech emotion recognition with self-attention. In ACL.

### Contributions

Gabriel Giangi - Concordia University - Montreal, QC Canada - [gabegiangi@gmail.com](mailto:gabegiangi@gmail.com)
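
Returning to the Data Fields and Data Splits sections: since unsplit data is offered in `data.csv` for custom splits, the following is a minimal sketch of one way to build an 80/10/10 split and to select intensity-annotated samples. It assumes `data.csv` has a header row whose column names match the documented fields (`ID`, `WAV`, `EMOTION`, `INTENSITY`, `LENGTH`); the actual file layout is not specified in this card, and the helper names `load_rows`, `has_intensity`, and `split_rows` are hypothetical. It is not the procedure used to produce the official manifests.

```python
import csv
import io
import random


def load_rows(csv_text):
    """Parse CSV text with a header row into a list of dicts (assumed layout)."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def has_intensity(row):
    """True when INTENSITY holds a rating; the card states 0 means no annotation."""
    return int(row["INTENSITY"]) != 0


def split_rows(rows, train=0.8, validation=0.1, seed=0):
    """Shuffle rows reproducibly and cut them into train/validation/test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * validation)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],  # the remainder becomes the test set
    )
```

With a local copy, `rows = load_rows(open("data.csv", encoding="utf-8").read())` followed by `split_rows([r for r in rows if has_intensity(r)])` would give a split restricted to the intensity-rated subset. A plain shuffle does not balance emotions or languages across splits; a stratified split may be preferable given the bias discussion above.
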

Provided by: maas
Created: 2025-03-05