Emo-Emilia
Source: ModelScope community. Updated 2025-12-05; indexed 2025-09-13.
Download link:
https://modelscope.cn/datasets/ASLP-lab/Emo-Emilia
Description:
C<sup>2</sup>SER: [Paper](https://arxiv.org/abs/2502.18186) | [Code](https://github.com/zxzhao0/C2SER) | [HuggingFace](https://huggingface.co/collections/ASLP-lab/c2ser-67bc735d820403e7969fe8a0)
## Emo-Emilia Dataset
To better simulate real-world conditions, we introduce a new speech emotion recognition (SER) test set, **Emo-Emilia**.
Specifically, we apply the automated labeling approach to annotate Emilia, a large-scale, multilingual, and diverse speech generation dataset with over 100,000 hours of speech that covers a wide range of emotional contexts.
We then manually verify the emotion labels: each utterance is checked by at least two experts to ensure both accuracy and reliability. The final test set, Emo-Emilia, consists of 1,400 samples: 100 samples for each of seven emotion categories (angry, happy, fearful, surprised, neutral, sad, and disgusted) in each of two languages, Chinese and English (700 samples per language).
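The composition described above is a balanced grid of 2 languages × 7 emotions × 100 samples. As a minimal sketch, the snippet below spells out that arithmetic and checks a list of `(language, emotion)` labels against it. The language codes `"zh"` and `"en"` and the helper names are assumptions made for illustration; they are not taken from the dataset files.

```python
from collections import Counter
from itertools import product

# Emotion names as listed in the dataset description.
EMOTIONS = ["angry", "happy", "fearful", "surprised", "neutral", "sad", "disgusted"]
# Assumed language codes (hypothetical; the real files may use other labels).
LANGUAGES = ["zh", "en"]
PER_CELL = 100  # samples per (language, emotion) pair


def expected_counts():
    """Return the expected sample count for every (language, emotion) cell."""
    return {cell: PER_CELL for cell in product(LANGUAGES, EMOTIONS)}


def unbalanced_cells(labels):
    """Map each cell whose observed count differs from the expected count
    to a tuple (observed, expected). Unexpected cells have expected == 0."""
    observed = Counter(labels)
    expected = expected_counts()
    return {
        cell: (observed.get(cell, 0), expected.get(cell, 0))
        for cell in expected.keys() | observed.keys()
        if observed.get(cell, 0) != expected.get(cell, 0)
    }


total = sum(expected_counts().values())   # 2 * 7 * 100 = 1400
per_language = total // len(LANGUAGES)    # 700
```

An empty result from `unbalanced_cells` means the labels match the stated 100-per-cell design.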
Emo-Emilia is a subset of the Emilia dataset. The original Emilia dataset can be accessed [here](https://emilia-dataset.github.io/Emilia-Demo-Page/).
You can download the Emo-Emilia data from HuggingFace [here](https://huggingface.co/datasets/ASLP-lab/Emo-Emilia). Additional per-audio information can be found in the `./Emo-Emilia/Emo-Emilia-ALL.jsonl` file.
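Since the metadata ships as a JSONL file (one JSON object per line), it can be inspected with the Python standard library alone. The sketch below is one possible way to do so. The field names inside the file are not documented here, so the tally key is passed in as a parameter rather than hard-coded, and any key name you try (such as `"emotion"`) is an assumption until you have looked at the actual records.

```python
import json
from collections import Counter
from pathlib import Path


def read_jsonl(path):
    """Yield one parsed JSON object per non-empty line of a JSONL file."""
    with Path(path).open(encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                yield json.loads(line)
            except json.JSONDecodeError as err:
                raise ValueError(f"{path}:{line_no}: invalid JSON") from err


def tally(records, key):
    """Count records by the value stored under `key` (a caller-supplied,
    possibly hypothetical field name). Records lacking it go to '<missing>'."""
    return Counter(record.get(key, "<missing>") for record in records)


if __name__ == "__main__":
    path = Path("./Emo-Emilia/Emo-Emilia-ALL.jsonl")
    if path.exists():
        records = list(read_jsonl(path))
        print(len(records), "records")
        if records:
            # Print the real field names before assuming any of them.
            print(sorted(records[0].keys()))
```

Printing the keys of the first record first avoids guessing at the schema; once the real field names are known, `tally(records, key)` gives per-category counts.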
For more information, please refer to our paper "**Steering Language Model to Stable Speech Emotion Recognition via Contextual Perception and Chain of Thought**" and our GitHub repository: [C<sup>2</sup>SER](https://github.com/zxzhao0/C2SER).
Provider:
maas
Created:
2025-09-04



