emonet-face-binary

Name: emonet-face-binary
Creator: maas
Published: 2025-11-27 16:53:38
License: No description provided

ModelScope community. Updated 2025-11-27. Indexed 2025-11-03.

Download link:

https://modelscope.cn/datasets/laion/emonet-face-binary


Resource description:

# EmoNet-Face: A Fine-Grained, Expert-Annotated Benchmark for Facial Emotion Recognition

## Dataset Summary

**EmoNet-Face** is a comprehensive benchmark suite designed to address critical gaps in facial emotion recognition (FER). Current benchmarks often have a narrow emotional spectrum, lack demographic diversity, and use uncontrolled imagery. EmoNet-Face provides a robust foundation for developing and evaluating AI systems with a deeper, more nuanced understanding of human emotions. This work was accepted at NeurIPS 2025.

### Key Contributions (as recognized by reviewers):

* **A Novel 40-Category Taxonomy:** A fine-grained emotion taxonomy meticulously derived from foundational psychological research (the "Handbook of Emotions") to capture a rich spectrum of human emotional states.
* **High-Quality Synthetic Data:** Large-scale datasets generated with state-of-the-art text-to-image models, ensuring clear, full-face expressions with controlled, balanced representation across ethnicity, age, and gender.
* **Rigorous Expert Annotation:** All evaluation and fine-tuning datasets are annotated by psychology experts, ensuring high-quality, trustworthy labels.
* **Reduced Risk:** As a fully synthetic dataset, EmoNet-Face mitigates the privacy and consent risks associated with datasets of real individuals.

This repository contains the **EmoNet-Face Binary** dataset.

## Dataset Structure

### Data Fields

Each instance in the dataset includes the following fields:

* `image`: A PIL-compatible image object of the facial expression.
* `prompt`: The full text prompt used to generate the image, including emotion and demographic details.
* `demographics`: A dictionary containing the parsed demographic attributes (e.g., `ethnicity`, `gender`, `age`).
* `annotations`: The emotion annotations for the image. The structure varies by dataset split.
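As a minimal sketch, a record can be sanity-checked against the four fields listed above before any training code touches it. The sample records, their values, and the helper `missing_fields` are hypothetical and exist only for illustration; the card does not specify a loader or the inner layout of `demographics` and `annotations`, so those are assumptions.

```python
from typing import Any, Dict, List

# Field names taken from the "Data Fields" list above.
EXPECTED_FIELDS = ("image", "prompt", "demographics", "annotations")


def missing_fields(record: Dict[str, Any]) -> List[str]:
    """Return the expected field names that are absent from a record."""
    return [name for name in EXPECTED_FIELDS if name not in record]


# Hypothetical record for illustration only; all values are invented and the
# inner layout of `demographics` and `annotations` is an assumption.
sample_record = {
    "image": None,  # would be a PIL-compatible image object in practice
    "prompt": "placeholder prompt text",
    "demographics": {"ethnicity": "placeholder", "gender": "placeholder", "age": 40},
    "annotations": {},
}

# A deliberately incomplete record to show what the check reports.
incomplete_record = {"image": None, "prompt": "placeholder prompt text"}

print(missing_fields(sample_record))
print(missing_fields(incomplete_record))
```

Running the check once per split is usually enough, since the card states that only the structure of `annotations` varies between splits.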
### Data Splits and Annotation Format

* **For `emonet-face-hq` (Benchmark/Test Set):**
    * **Purpose:** This dataset is the primary benchmark for evaluation.
    * **Size:** 2,500 images.
    * **Annotations:** Each image was annotated by four different psychology experts. The `annotations` field contains a list of dictionaries, where each dictionary represents one expert's ratings across all 40 emotion categories on a continuous intensity scale from 0 (absent) to 7 (very strong).
* **For `emonet-face-binary` (Fine-tuning Set):**
    * **Purpose:** This dataset is designed for fine-tuning models.
    * **Size:** 19,999 images.
    * **Annotations:** Annotations were collected via a multi-stage binary agreement protocol. The `annotations` field contains binary labels (present/absent) for specific emotions, confirmed by up to three experts to ensure high-consensus positive and negative samples.
* **For `emonet-face-big` (Pre-training Set):**
    * **Purpose:** This large-scale dataset is intended for model pre-training.
    * **Size:** 203,201 images.
    * **Annotations:** The `annotations` field contains synthetically generated labels from a VLM (Gemini-2.5-Flash) designed to provide broad coverage across the 40-category taxonomy.

## Dataset Creation

### Curation Rationale

The dataset was created to overcome the limitations of existing FER benchmarks, which often lack emotional granularity, demographic diversity, and annotation quality. By using synthetic imagery, we can control for these factors while eliminating contextual confounders and privacy risks. All images underwent a manual expert review to filter for quality and artifacts. While labor-intensive, this rigorous curation was essential for creating a gold-standard benchmark and distinguishes EmoNet-Face from noisier, automatically collected datasets.
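Referring back to the `emonet-face-hq` annotation format described under Data Splits (a list with one dictionary of ratings per expert), averaging each emotion across experts can be sketched as follows. This is an illustration under assumptions, not the authors' evaluation code: the function name `mean_expert_ratings`, the emotion keys, and the rating values are invented, and it assumes each dictionary maps an emotion name directly to a numeric rating, which the card does not confirm.

```python
from collections import defaultdict
from typing import Dict, List


def mean_expert_ratings(annotations: List[Dict[str, float]]) -> Dict[str, float]:
    """Average each emotion's intensity (0 to 7 scale) across expert dictionaries."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for expert_ratings in annotations:
        for emotion, rating in expert_ratings.items():
            # The card states the scale runs from 0 (absent) to 7 (very strong).
            if not 0 <= rating <= 7:
                raise ValueError(f"rating {rating} for {emotion!r} is outside 0..7")
            totals[emotion] += rating
            counts[emotion] += 1
    return {emotion: totals[emotion] / counts[emotion] for emotion in totals}


# Hypothetical ratings from four experts on two illustrative emotions
# (assumed key layout; the real schema may nest or name these differently).
example_annotations = [
    {"joy": 6.0, "anger": 0.0},
    {"joy": 5.0, "anger": 1.0},
    {"joy": 7.0, "anger": 0.0},
    {"joy": 6.0, "anger": 1.0},
]

print(mean_expert_ratings(example_annotations))
```

A plain mean is only one possible aggregation; since the card notes that some categories have lower inter-annotator agreement, keeping the per-expert ratings alongside the mean may be preferable.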
### Source Data and Image Generation

The images are fully synthetic and were generated using state-of-the-art text-to-image models: **Midjourney v6** (under a paid subscription) and **FLUX.1 [dev]**. The generation prompts were systematically engineered to ensure a balanced distribution of emotions across diverse demographics (14 ethnic groups, ages 20-80, and three gender identities).

### Annotations

Annotations for `emonet-face-hq` and `emonet-face-binary` were provided by a team of 13 annotators with verified academic degrees in psychology. The annotation was performed on a custom, open-source platform.

## Considerations for Using the Data

### Social Impact and Responsible Use

This dataset is intended for academic and research purposes to advance the development of fair, nuanced, and empathetic AI systems.

#### **Prohibited Uses**

This dataset and any models trained on it **are not intended for and must not be used in** sensitive, high-stakes domains where misinterpretation could lead to significant harm. In compliance with emerging regulations like the EU AI Act (Article 5(1)(f)), prohibited uses include, but are not limited to:

* Emotion recognition in the workplace or educational institutions.
* Real-time or post-hoc surveillance in public spaces.
* Systems for law enforcement, border control, or asylum applications.
* Credit scoring, insurance risk assessment, or hiring.
* Any application that could lead to manipulation, deception, or unlawful discrimination.

#### **User Responsibility**

Downstream users are solely responsible for ensuring their applications comply with all applicable laws, regulations, and ethical guidelines. The permissive license of this dataset does not override these legal and ethical obligations.

### Other Known Limitations

* **Synthetic-to-Real Generalization:** While the dataset is synthetic, models trained on it have shown strong generalization to real-world data. To address this common concern, we evaluated our EmpathicInsight-Face model on the real-world FERD and AffectNet datasets, with the following results:

  | | Anger | Contempt | Disgust | Fear | Happy | Neutral | Sad | Surprise | **Avg.** |
  |---------------|-------|----------|---------|--------|--------|---------|-------|----------|-----------|
  | **FERD** | 73.68 | 31.58 | 78.95 | 100.00 | 100.00 | 84.21 | 78.95 | 78.95 | **78.29** |
  | **AffectNet** | 77.05 | 28.75 | 40.53 | 69.08 | 99.25 | 78.96 | 83.94 | 98.70 | **75.72** |

* **Subjectivity of Emotion:** Some emotion categories show lower inter-annotator agreement. This is not a flaw but a feature, reflecting the inherent subjectivity and psychological complexity of emotion perception. A benchmark with 100% agreement would be unrealistic. EmoNet-Face captures this genuine ambiguity, making it a more robust tool for training AI.
* **Static Images:** The dataset consists of static images and does not capture temporal cues (e.g., microexpressions, the evolution of an expression). This is a valuable direction for future work.
* **Cross-Cultural Scope:** While we controlled for diverse ethnic representation in the imagery, the 40-category taxonomy is primarily grounded in Western-centric psychological literature. Its universality is an open question and an important area for future cross-cultural research.

## Additional Information

### Licensing Information

This dataset is licensed under the **Creative Commons Attribution 4.0 (CC BY 4.0) license**. We are confident in our right to apply this license based on the terms of the source text-to-image models used for generation:

* **Midjourney:** Images were created under a paid subscription. The Midjourney Terms of Service (Section 4, "Content Rights") state: *"You own all Assets You create with the Services to the fullest extent possible under applicable law."* This ownership grants us the right to release these images under CC BY 4.0.
* **FLUX.1 [dev]:** The license for this model explicitly distinguishes between the model (non-commercial use) and its outputs. The FLUX.1 [dev] license (Section 2.d, "Outputs") states: *"We claim no ownership rights in and to the Outputs... You may use Output for any purpose (including for commercial purposes)..."*

### Citation Information

If you use this dataset in your research, please cite our paper:

```bibtex
@misc{emonetface2025,
  title={EmoNet-Face: An Expert-Annotated Benchmark for Synthetic Emotion Recognition},
  author={Christoph Schuhmann and Robert Kaczmarczyk and Gollam Rabby and Felix Friedrich and Maurice Kraus and Krishna Kalyan and Kourosh Nadi and Huu Nguyen and Kristian Kersting and Sören Auer},
  year={2025},
  eprint={2505.20033},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2505.20033},
}
```


Provider:

maas

Created:

2025-10-21
