Emilia-with-Emotion-Annotations

Name: Emilia-with-Emotion-Annotations
Creator: maas
Published: 2025-12-05 16:53:27
License: No description available

ModelScope community: updated 2025-12-05, indexed 2025-11-29

Download link:

https://modelscope.cn/datasets/laion/Emilia-with-Emotion-Annotations

Resource description:

### **Dataset Card for Emilia with Emotion Annotations**

#### **Dataset Description**

This dataset is an enhanced version of the Emilia dataset, enriched with detailed emotion annotations. The annotations were generated using models from the EmoNet suite to provide deeper insight into the emotional content of speech. This work is based on the research and models described in the blog post "Do They See What We See?".

The annotations include 54 scores for each sample, covering a wide range of emotional and paralinguistic attributes, as well as an emotion caption generated by the BUD-E Whisper model. The goal is to enable more nuanced research and development in emotionally intelligent AI.

#### **Dataset Structure & Access**

The dataset includes the original Emilia audio data, with the addition of the new emotion annotations, provided in WebDataset format. Currently, the dataset is distributed across five Hugging Face repositories:

* `laion/Emilia-with-Emotion-Annotations`
* `laion/Emilia-with-Emotion-Annotations2`
* `laion/Emilia-with-Emotion-Annotations3`
* `laion/Emilia-with-Emotion-Annotations4`
* `laion/Emilia-with-Emotion-Annotations5`

To access the complete dataset, you must gather the data from all five repositories. (Note: We plan to merge these into a single repository in the coming days with an even better annotated version.)

The original `.tar` files for the Emilia dataset are also included. Files belonging to the YODAS subset can be identified by a suffix in their filenames.

#### **Dataset Statistics**

This combined dataset comprises approximately **215,600 hours** of speech, merging the original Emilia dataset with a large portion of the YODAS dataset. The inclusion of YODAS significantly expands the linguistic diversity and the total volume of data.
The language distribution is broken down as follows:

| Language | Emilia Duration (hours) | Emilia-YODAS Duration (hours) | Total Duration (hours) |
| :--- | :--- | :--- | :--- |
| English | 46.8k | 92.2k | 139.0k |
| Chinese | 49.9k | 0.3k | 50.3k |
| German | 1.6k | 5.6k | 7.2k |
| French | 1.4k | 7.4k | 8.8k |
| Japanese | 1.7k | 1.1k | 2.8k |
| Korean | 0.2k | 7.3k | 7.5k |
| **Total** | **101.7k** | **113.9k** | **215.6k** |

#### **Interpretation of Scores**

The models predict raw scores for 40 emotional categories and 14 attribute dimensions. For the emotional categories, these raw scores are also used to calculate a normalized Softmax probability, indicating the relative likelihood of each emotion.

| Attribute | Range | Description |
| :--- | :--- | :--- |
| **Valence** | -3 to +3 | -3: Ext. Negative, +3: Ext. Positive, 0: Neutral |
| **Arousal** | 0 to 4 | 0: Very Calm, 4: Very Excited, 2: Neutral |
| **Dominance** | -3 to +3 | -3: Ext. Submissive, +3: Ext. Dominant, 0: Neutral |
| **Age** | 0 to 6 | 0: Infant/Toddler, 2: Teenager, 4: Adult, 6: Very Old |
| **Gender** | -2 to +2 | -2: Very Masculine, +2: Very Feminine, 0: Neutral/Unsure |
| **Humor** | 0 to 4 | 0: Very Serious, 4: Very Humorous, 2: Neutral |
| **Detachment** | 0 to 4 | 0: Very Vulnerable, 4: Very Detached, 2: Neutral |
| **Confidence** | 0 to 4 | 0: Very Confident, 4: Very Hesitant, 2: Neutral |
| **Warmth** | -2 to +2 | -2: Very Cold, +2: Very Warm, 0: Neutral |
| **Expressiveness** | 0 to 4 | 0: Very Monotone, 4: Very Expressive, 2: Neutral |
| **Pitch** | 0 to 4 | 0: Very High-Pitched, 4: Very Low-Pitched, 2: Neutral |
| **Softness** | -2 to +2 | -2: Very Harsh, +2: Very Soft, 0: Neutral |
| **Authenticity** | 0 to 4 | 0: Very Artificial, 4: Very Genuine, 2: Neutral |
| **Recording Quality** | 0 to 4 | 0: Very Low, 4: Very High, 2: Decent |
| **Background Noise** | 0 to 3 | 0: No Noise, 3: Intense Noise |

#### **Citation**

If you use this dataset, please cite the original Emilia dataset paper as well as the EmoNet-Voice paper.

```bibtex
@inproceedings{emilialarge,
  author={He, Haorui and Shang, Zengqiang and Wang, Chaoren and Li, Xuyuan and Gu, Yicheng and Hua, Hua and Liu, Liwei and Yang, Chen and Li, Jiaqi and Shi, Peiyang and Wang, Yuancheng and Chen, Kai and Zhang, Pengyuan and Wu, Zhizheng},
  title={Emilia: A Large-Scale, Extensive, Multilingual, and Diverse Dataset for Speech Generation},
  booktitle={arXiv:2501.15907},
  year={2025}
}

@article{emonet_voice_2025,
  author={Schuhmann, Christoph and Kaczmarczyk, Robert and Rabby, Gollam and Friedrich, Felix and Kraus, Maurice and Nadi, Kourosh and Nguyen, Huu and Kersting, Kristian and Auer, Sören},
  title={EmoNet-Voice: A Fine-Grained, Expert-Verified Benchmark for Speech Emotion Detection},
  journal={arXiv preprint arXiv:2506.09827},
  year={2025}
}
```
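
To make the score interpretation above more concrete, here is a minimal sketch of the two operations the card describes: turning raw emotion-category scores into Softmax probabilities, and reading an attribute score against its documented range. The emotion labels (`amusement`, `anger`, `sadness`) and the helper `rescale_to_unit` are hypothetical; the card does not list the 40 category names or say whether any temperature is applied, so a plain softmax is assumed. The attribute ranges are copied from the table.

```python
import math


def softmax(raw_scores: dict[str, float]) -> dict[str, float]:
    """Convert raw category scores into probabilities that sum to 1."""
    peak = max(raw_scores.values())  # subtract the max for numerical stability
    exps = {name: math.exp(score - peak) for name, score in raw_scores.items()}
    total = sum(exps.values())
    return {name: value / total for name, value in exps.items()}


# (low, high) ranges as documented in the attribute table.
ATTRIBUTE_RANGES = {
    "Valence": (-3, 3),
    "Arousal": (0, 4),
    "Dominance": (-3, 3),
    "Age": (0, 6),
    "Gender": (-2, 2),
    "Humor": (0, 4),
    "Detachment": (0, 4),
    "Confidence": (0, 4),
    "Warmth": (-2, 2),
    "Expressiveness": (0, 4),
    "Pitch": (0, 4),
    "Softness": (-2, 2),
    "Authenticity": (0, 4),
    "Recording Quality": (0, 4),
    "Background Noise": (0, 3),
}


def rescale_to_unit(attribute: str, score: float) -> float:
    """Map an attribute score onto [0, 1], clipping to the documented range.

    Hypothetical convenience helper, not part of the dataset tooling.
    """
    low, high = ATTRIBUTE_RANGES[attribute]
    clipped = min(max(score, low), high)
    return (clipped - low) / (high - low)


if __name__ == "__main__":
    # Hypothetical raw scores for three made-up category labels.
    raw = {"amusement": 2.1, "anger": -0.3, "sadness": 0.4}
    print(softmax(raw))
    print(rescale_to_unit("Valence", 0.0))
```

Rescaling to a common interval is only one possible way to compare attributes with different ranges; note that the neutral point differs per attribute (0 for Valence, 2 for Arousal), so a rescaled value of 0.5 corresponds to neutral only for the symmetric and 0-to-4 scales.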

Provider: maas

Created: 2025-10-04
