EmoNet-Face-Big

Name: EmoNet-Face-Big
Creator: maas
Published: 2025-12-29 16:51:38
License: No description provided

ModelScope community. Updated: 2025-12-29. Indexed: 2025-11-03.

Download link:

https://modelscope.cn/datasets/laion/EmoNet-Face-Big


Resource description:

# EmoNet-Face: A Fine-Grained, Expert-Annotated Benchmark for Facial Emotion Recognition

## Dataset Summary

**EmoNet-Face** is a comprehensive benchmark suite designed to address critical gaps in facial emotion recognition (FER). Current benchmarks often have a narrow emotional spectrum, lack demographic diversity, and use uncontrolled imagery. EmoNet-Face provides a robust foundation for developing and evaluating AI systems with a deeper, more nuanced understanding of human emotions. This work was accepted at NeurIPS 2025.

### Key Contributions (as recognized by reviewers):

* **A Novel 40-Category Taxonomy:** A fine-grained emotion taxonomy meticulously derived from foundational psychological research (the "Handbook of Emotions") to capture a rich spectrum of human emotional states.
* **High-Quality Synthetic Data:** Large-scale datasets generated with state-of-the-art text-to-image models, ensuring clear, full-face expressions with controlled, balanced representation across ethnicity, age, and gender.
* **Rigorous Expert Annotation:** All evaluation and fine-tuning datasets are annotated by psychology experts, ensuring high-quality, trustworthy labels.
* **Reduced Risk:** As a fully synthetic dataset, EmoNet-Face mitigates the privacy and consent risks associated with datasets of real individuals.

This repository contains the **EmoNet-Face Big** dataset.

## Dataset Structure

### Data Fields

Each instance in the dataset includes the following fields:

* `image`: A PIL-compatible image object of the facial expression.
* `prompt`: The full text prompt used to generate the image, including emotion and demographic details.
* `demographics`: A dictionary containing the parsed demographic attributes (e.g., `ethnicity`, `gender`, `age`).
* `annotations`: The emotion annotations for the image. The structure varies by dataset split.
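
The field list above can be checked with a few lines of Python. The following is a minimal sketch, assuming each record behaves like a plain dictionary however it was loaded; the helper names (`EXPECTED_FIELDS`, `missing_fields`, `demographic_counts`) and the sample records are hypothetical and not part of the dataset or any library.

```python
from collections import Counter
from typing import Any, Iterable, Mapping

# Field names taken from the "Data Fields" list; everything else is illustrative.
EXPECTED_FIELDS = ("image", "prompt", "demographics", "annotations")


def missing_fields(record: Mapping[str, Any]) -> list[str]:
    """Return the expected field names that are absent from a record."""
    return [name for name in EXPECTED_FIELDS if name not in record]


def demographic_counts(records: Iterable[Mapping[str, Any]], attribute: str) -> Counter:
    """Count how often each value of one demographic attribute occurs."""
    counts: Counter = Counter()
    for record in records:
        # Records that lack the attribute are grouped under "unknown".
        value = record.get("demographics", {}).get(attribute, "unknown")
        counts[value] += 1
    return counts


# Hypothetical records: the values are invented for illustration only.
sample_records = [
    {"image": None, "prompt": "placeholder prompt",
     "demographics": {"gender": "female", "age": 34}, "annotations": {}},
    {"image": None, "prompt": "placeholder prompt",
     "demographics": {"gender": "male", "age": 61}, "annotations": {}},
    {"image": None, "prompt": "placeholder prompt",
     "demographics": {"gender": "female", "age": 27}, "annotations": {}},
]

if __name__ == "__main__":
    print(missing_fields(sample_records[0]))
    print(demographic_counts(sample_records, "gender"))
```

Counting one attribute at a time like this is one simple way to inspect how balanced a loaded subset is across `ethnicity`, `gender`, or `age`.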
### Data Splits and Annotation Format

* **For `emonet-face-hq` (Benchmark/Test Set):**
    * **Purpose:** This dataset is the primary benchmark for evaluation.
    * **Size:** 2,500 images.
    * **Annotations:** Each image was annotated by four different psychology experts. The `annotations` field contains a list of dictionaries, where each dictionary represents one expert's ratings across all 40 emotion categories on a continuous intensity scale from 0 (absent) to 7 (very strong).
* **For `emonet-face-binary` (Fine-tuning Set):**
    * **Purpose:** This dataset is designed for fine-tuning models.
    * **Size:** 19,999 images.
    * **Annotations:** Annotations were collected via a multi-stage binary agreement protocol. The `annotations` field contains binary labels (present/absent) for specific emotions, confirmed by up to three experts to ensure high-consensus positive and negative samples.
* **For `emonet-face-big` (Pre-training Set):**
    * **Purpose:** This large-scale dataset is intended for model pre-training.
    * **Size:** 203,201 images.
    * **Annotations:** The `annotations` field contains synthetically generated labels from a VLM (Gemini-2.5-Flash) designed to provide broad coverage across the 40-category taxonomy.

## Dataset Creation

### Curation Rationale

The dataset was created to overcome the limitations of existing FER benchmarks, which often lack emotional granularity, demographic diversity, and annotation quality. By using synthetic imagery, we can control for these factors while eliminating contextual confounders and privacy risks. All images underwent a manual expert review to filter for quality and artifacts. While labor-intensive, this rigorous curation was essential for creating a gold-standard benchmark and distinguishes EmoNet-Face from noisier, automatically collected datasets.
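
As an illustration of the `emonet-face-hq` annotation format described under Data Splits (a list of per-expert dictionaries with intensities from 0 to 7), the sketch below averages the experts' ratings per emotion and ranks the result. It assumes each dictionary maps an emotion name to a number; the function names and the emotion labels `joy`, `anger`, and `surprise` are placeholders, not confirmed keys from the 40-category taxonomy.

```python
from statistics import mean
from typing import Mapping, Sequence


def mean_ratings(expert_ratings: Sequence[Mapping[str, float]]) -> dict[str, float]:
    """Average each emotion's intensity over the experts who rated it."""
    emotions = sorted({emotion for ratings in expert_ratings for emotion in ratings})
    return {
        emotion: mean(r[emotion] for r in expert_ratings if emotion in r)
        for emotion in emotions
    }


def top_emotions(averages: Mapping[str, float], k: int = 3) -> list[tuple[str, float]]:
    """Return the k emotions with the highest mean intensity, strongest first."""
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)[:k]


# Hypothetical ratings from four experts; labels and values are invented.
example_annotations = [
    {"joy": 6.0, "anger": 0.0, "surprise": 2.0},
    {"joy": 5.0, "anger": 1.0, "surprise": 2.0},
    {"joy": 7.0, "anger": 0.0, "surprise": 3.0},
    {"joy": 6.0, "anger": 1.0, "surprise": 1.0},
]

if __name__ == "__main__":
    averages = mean_ratings(example_annotations)
    print(averages)
    print(top_emotions(averages, k=2))
```

A plain mean is only one possible aggregation; a median or a per-expert normalization may suit categories where the experts disagree more.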
### Source Data and Image Generation

The images are fully synthetic and were generated using state-of-the-art text-to-image models: **Midjourney v6** (under a paid subscription) and **FLUX.1 [dev]**. The generation prompts were systematically engineered to ensure a balanced distribution of emotions across diverse demographics (14 ethnic groups, ages 20-80, and three gender identities).

### Annotations

Annotations for `emonet-face-hq` and `emonet-face-binary` were provided by a team of 13 annotators with verified academic degrees in psychology. The annotation was performed on a custom, open-source platform.

## Considerations for Using the Data

### Social Impact and Responsible Use

This dataset is intended for academic and research purposes to advance the development of fair, nuanced, and empathetic AI systems.

#### **Prohibited Uses**

This dataset and any models trained on it **are not intended for and must not be used in** sensitive, high-stakes domains where misinterpretation could lead to significant harm. In compliance with emerging regulations like the EU AI Act (Article 5(1)(f)), prohibited uses include, but are not limited to:

* Emotion recognition in the workplace or educational institutions.
* Real-time or post-hoc surveillance in public spaces.
* Systems for law enforcement, border control, or asylum applications.
* Credit scoring, insurance risk assessment, or hiring.
* Any application that could lead to manipulation, deception, or unlawful discrimination.

#### **User Responsibility**

Downstream users are solely responsible for ensuring their applications comply with all applicable laws, regulations, and ethical guidelines. The permissive license of this dataset does not override these legal and ethical obligations.

### Other Known Limitations

* **Synthetic-to-Real Generalization:** While the dataset is synthetic, models trained on it have shown strong generalization to real-world data. To address this common concern, we evaluated our EmpathicInsight-Face model on the real-world FERD and AffectNet datasets, with the following results:

| Dataset       | Anger | Contempt | Disgust | Fear   | Happy  | Neutral | Sad   | Surprise | **Avg.**  |
|---------------|-------|----------|---------|--------|--------|---------|-------|----------|-----------|
| **FERD**      | 73.68 | 31.58    | 78.95   | 100.00 | 100.00 | 84.21   | 78.95 | 78.95    | **78.29** |
| **AffectNet** | 77.05 | 28.75    | 40.53   | 69.08  | 99.25  | 78.96   | 83.94 | 98.70    | **75.72** |

* **Subjectivity of Emotion:** Some emotion categories show lower inter-annotator agreement. This is not a flaw but a feature, reflecting the inherent subjectivity and psychological complexity of emotion perception. A benchmark with 100% agreement would be unrealistic. EmoNet-Face captures this genuine ambiguity, making it a more robust tool for training AI.
* **Static Images:** The dataset consists of static images and does not capture temporal cues (e.g., microexpressions, the evolution of an expression). This is a valuable direction for future work.
* **Cross-Cultural Scope:** While we controlled for diverse ethnic representation in the imagery, the 40-category taxonomy is primarily grounded in Western-centric psychological literature. Its universality is an open question and an important area for future cross-cultural research.

## Additional Information

### Licensing Information

This dataset is licensed under the **Creative Commons Attribution 4.0 (CC BY 4.0) license**. We are confident in our right to apply this license based on the terms of the source text-to-image models used for generation:

* **Midjourney:** Images were created under a paid subscription. The Midjourney Terms of Service (Section 4, "Content Rights") state: *"You own all Assets You create with the Services to the fullest extent possible under applicable law."* This ownership grants us the right to release these images under CC BY 4.0.
* **FLUX.1 [dev]:** The license for this model explicitly distinguishes between the model (non-commercial use) and its outputs. The FLUX.1 [dev] license (Section 2.d, "Outputs") states: *"We claim no ownership rights in and to the Outputs... You may use Output for any purpose (including for commercial purposes)..."*

### Citation Information

If you use this dataset in your research, please cite our paper:

```bibtex
@misc{emonetface2025,
  title={EmoNet-Face: An Expert-Annotated Benchmark for Synthetic Emotion Recognition},
  author={Christoph Schuhmann and Robert Kaczmarczyk and Gollam Rabby and Felix Friedrich and Maurice Kraus and Krishna Kalyan and Kourosh Nadi and Huu Nguyen and Kristian Kersting and Sören Auer},
  year={2025},
  eprint={2505.20033},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2505.20033},
}
```


Provider:

maas

Created:

2025-10-04
