FLUX-Reason-6M

ModelScope community · Updated 2026-05-22 · Indexed 2025-09-20
Download link:
https://modelscope.cn/datasets/AI-ModelScope/FLUX-Reason-6M
Resource description:
# FLUX-Reason-6M

FLUX-Reason-6M is a massive, 6-million-scale text-to-image dataset engineered to instill complex reasoning capabilities in generative models. This dataset was created to bridge the performance gap between open-source and leading closed-source text-to-image systems.

This dataset contains:

* **6 million high-quality, reasoning-focused images** synthesized by the state-of-the-art FLUX.1-dev model.
* **20 million bilingual (English and Chinese) descriptions**, providing a rich, multi-faceted annotation for each image.
* Pioneering **Generation Chain-of-Thought (GCoT)** prompts that provide detailed, step-by-step breakdowns of the image generation process, moving beyond simple descriptions to explain compositional and semantic logic.
* A systematic organization across **six key reasoning characteristics**: Imagination, Entity, Text rendering, Style, Affection, and Composition.

The creation of this dataset was a significant undertaking, requiring **15,000 A100 GPU days**. We are releasing it to provide the community with a resource previously unattainable outside of large industrial labs.

See our [paper](https://flux-reason-6m.github.io/) for more details!

## Dataset Architectural Design

The core of FLUX-Reason-6M is its multidimensional framework, designed to teach models the foundational principles of visual reasoning. Each image is annotated with multiple labels and caption types.

### The Six Characteristics

* **Imagination**: Captions and images representing surreal, fantastical, or abstract concepts that push beyond literal interpretations (e.g., “a city made of glass where rivers of light flow”).
* **Entity**: Focuses on knowledge-grounded depiction of specific real-world objects, beings, or named entities with high fidelity (e.g., “Lionel Messi dribbling past defenders in the World Cup final”).
* **Text rendering**: Addresses the common weakness of text generation in images, providing clean data for typographic control with explicit instructions on content, style, and placement.
* **Style**: A diverse library of artistic and photographic styles, with captions explicitly referencing art movements, visual techniques, and the aesthetics of famous artists.
* **Affection**: Connects abstract emotional concepts to concrete visual representations, using evocative language to describe a mood, feeling, or atmosphere.
* **Composition**: Emphasizes the precise spatial arrangement and interaction of objects within a scene, using explicit compositional language (e.g., under, behind, next to).

### Generation Chain-of-Thought (GCoT)

GCoT is the cornerstone of our dataset. While standard captions describe *what* is in an image, GCoT captions elucidate *how* and *why* the image is constructed. These detailed, step-by-step reasoning chains deconstruct the semantic and compositional logic of the image, providing powerful intermediate supervisory signals for training robust reasoning capabilities.

## Associated Benchmark: PRISM-Bench

To measure the reasoning capabilities taught by our dataset, we also introduce **PRISM-Bench**. It is a comprehensive evaluation standard with seven distinct tracks (the six characteristics plus a challenging **Long Text** track using GCoT prompts). The benchmark leverages advanced vision-language models (GPT-4.1 and Qwen2.5-VL-72B) to provide nuanced, human-aligned assessments of prompt-image alignment and image aesthetics.

### PRISM-Bench Evaluation Results

The full leaderboard is available [here](https://flux-reason-6m.github.io/#leaderboard). The benchmark effectively differentiates state-of-the-art models, revealing critical performance gaps and areas for improvement. Below is a summary of results evaluated by GPT-4.1.

| # | Model | Source | Date | Overall (Align) | Overall (Aes) | Overall (Avg) | Imagination (Align) | Imagination (Aes) | Imagination (Avg) | Entity (Align) | Entity (Aes) | Entity (Avg) | Text rendering (Align) | Text rendering (Aes) | Text rendering (Avg) | Style (Align) | Style (Aes) | Style (Avg) | Affection (Align) | Affection (Aes) | Affection (Avg) | Composition (Align) | Composition (Aes) | Composition (Avg) | Long text (Align) | Long text (Aes) | Long text (Avg) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | GPT-Image-1 [High] 🥇 | [Link](https://platform.openai.com/docs/guides/image-generation?image-generation-model=gpt-image-1) | 2025-09-10 | 86.9 | 85.6 | **86.3** | 86.2 | 86.6 | 86.4 | 90.0 | 86.3 | 88.2 | 68.8 | 80.1 | 74.5 | 92.8 | 93.3 | 93.1 | 90.7 | 90.9 | 90.8 | 96.2 | 89.4 | 92.8 | 83.8 | 72.8 | 78.3 |
| 2 | Gemini2.5-Flash-Image 🥈 | [Link](https://deepmind.google/models/gemini/image/) | 2025-09-10 | 87.1 | 83.4 | **85.3** | 92.4 | 84.8 | 88.6 | 87.0 | 81.3 | 84.2 | 65.2 | 74.1 | 69.7 | 90.5 | 90.8 | 90.7 | 96.0 | 88.2 | 92.1 | 92.5 | 88.5 | 90.5 | 85.9 | 76.2 | 81.1 |
| 3 | Qwen-Image 🥉 | [Link](https://huggingface.co/Qwen/Qwen-Image) | 2025-09-10 | 81.1 | 78.6 | **79.9** | 80.5 | 78.6 | 79.6 | 79.3 | 73.2 | 76.3 | 54.3 | 68.9 | 61.6 | 84.5 | 88.7 | 86.6 | 91.6 | 89.1 | 90.4 | 93.7 | 86.9 | 90.3 | 83.8 | 65.1 | 74.5 |
| 4 | SEEDream 3.0 | [Link](https://seed.bytedance.com/zh/tech/seedream3_0) | 2025-09-10 | 80.5 | 78.7 | **79.6** | 77.3 | 76.4 | 76.9 | 80.2 | 73.8 | 77.0 | 56.1 | 70.2 | 63.2 | 83.9 | 87.4 | 85.7 | 89.3 | 90.3 | 89.8 | 93.3 | 86.3 | 89.8 | 83.2 | 66.7 | 75.0 |
| 5 | HiDream-I1-Full | [Link](https://huggingface.co/HiDream-ai/HiDream-I1-Full) | 2025-09-10 | 76.1 | 75.6 | **75.9** | 74.4 | 75.6 | 75.0 | 74.4 | 72.4 | 73.4 | 58.2 | 70.4 | 64.3 | 81.4 | 84.8 | 83.1 | 90.1 | 88.8 | 89.5 | 90.1 | 85.4 | 87.8 | 63.8 | 52.0 | 57.9 |
| 6 | FLUX.1-Krea-dev | [Link](https://huggingface.co/black-forest-labs/FLUX.1-Krea-dev) | 2025-09-10 | 74.3 | 75.1 | **74.7** | 71.5 | 73.0 | 72.3 | 69.5 | 67.5 | 68.5 | 47.5 | 61.3 | 54.4 | 80.8 | 83.5 | 82.2 | 84.0 | 90.3 | 87.2 | 90.9 | 85.8 | 88.4 | 76.2 | 64.1 | 70.2 |
| 7 | FLUX.1-dev | [Link](https://huggingface.co/black-forest-labs/FLUX.1-dev) | 2025-09-10 | 72.4 | 74.9 | **73.7** | 68.1 | 74.0 | 71.1 | 70.7 | 71.2 | 71.0 | 48.1 | 64.5 | 56.3 | 72.3 | 80.5 | 76.4 | 88.3 | 91.1 | 89.7 | 89.0 | 84.6 | 86.8 | 70.6 | 58.5 | 64.6 |
| 8 | SD3.5-Large | [Link](https://huggingface.co/stabilityai/stable-diffusion-3.5-large) | 2025-09-10 | 73.9 | 73.5 | **73.7** | 73.3 | 71.2 | 72.3 | 76.7 | 71.9 | 74.3 | 52.0 | 65.8 | 58.9 | 77.1 | 84.2 | 80.7 | 87.1 | 85.2 | 86.2 | 87.0 | 84.7 | 85.9 | 64.3 | 51.7 | 58.0 |
| 9 | HiDream-I1-Dev | [Link](https://huggingface.co/HiDream-ai/HiDream-I1-Dev) | 2025-09-10 | 70.3 | 70.0 | **70.2** | 68.2 | 69.7 | 69.0 | 72.0 | 67.0 | 69.5 | 53.4 | 64.1 | 58.8 | 68.7 | 78.6 | 73.7 | 84.2 | 83.1 | 83.7 | 87.6 | 79.8 | 83.7 | 58.1 | 47.5 | 52.8 |
| 10 | SD3.5-Medium | [Link](https://huggingface.co/stabilityai/stable-diffusion-3.5-medium) | 2025-09-10 | 70.1 | 68.9 | **69.5** | 69.5 | 73.0 | 71.3 | 72.8 | 63.7 | 68.3 | 33.3 | 50.1 | 41.7 | 77.4 | 80.3 | 78.9 | 84.9 | 85.5 | 85.2 | 89.4 | 79.2 | 84.3 | 63.3 | 50.5 | 56.9 |

## Explore the Resource

We are publicly releasing the entire dataset, benchmark, and evaluation suite to lower the financial and computational barriers to entry, enabling researchers worldwide to build and test more capable generative models.

* [Project Website](https://flux-reason-6m.github.io/)
* [Paper](https://flux-reason-6m.github.io/#leaderboard)
* [Dataset](https://huggingface.co/datasets/LucasFang/FLUX-Reason-6M)
* [Code](https://github.com/rongyaofang/prism-bench)

## Citation

If you find our work useful, please consider citing us!
```bibtex
@article{fang2025flux,
  title={FLUX-Reason-6M & PRISM-Bench: A Million-Scale Text-to-Image Reasoning Dataset and Comprehensive Benchmark},
  author={Fang, Rongyao and Yu, Aldrich and Duan, Chengqi and Huang, Linjiang and Bai, Shuai and Cai, Yuxuan and Wang, Kun and Liu, Si and Liu, Xihui and Li, Hongsheng},
  journal={arXiv preprint arXiv:2509.09680},
  year={2025}
}
```
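
## Reading the Leaderboard Columns

The README does not spell out how the `Avg` and `Overall` columns in the PRISM-Bench results table are derived. The published figures appear consistent with a simple reading: each `Avg` is the mean of `Align` and `Aes` rounded half-up to one decimal, and each `Overall` value is the mean of the seven track scores. The sketch below checks that reading against the GPT-Image-1 row. The function names `round1`, `pair_avg`, and `overall` are illustrative, not part of the PRISM-Bench code, and the official scores may be computed from unrounded values.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Iterable, Union

Number = Union[str, Decimal]


def round1(value: Decimal) -> Decimal:
    """Round to one decimal place, with ties rounded up (assumed convention)."""
    return value.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


def pair_avg(align: Number, aes: Number) -> Decimal:
    """Mean of an alignment score and an aesthetics score."""
    return round1((Decimal(align) + Decimal(aes)) / 2)


def overall(track_scores: Iterable[Number]) -> Decimal:
    """Mean of the per-track scores for one metric (Align or Aes)."""
    scores = [Decimal(s) for s in track_scores]
    return round1(sum(scores) / len(scores))


# GPT-Image-1 row, in track order: Imagination, Entity, Text rendering,
# Style, Affection, Composition, Long text (values copied from the table).
align = ["86.2", "90.0", "68.8", "92.8", "90.7", "96.2", "83.8"]
aes = ["86.6", "86.3", "80.1", "93.3", "90.9", "89.4", "72.8"]

overall_align = overall(align)
overall_aes = overall(aes)
overall_avg = pair_avg(overall_align, overall_aes)

print(overall_align, overall_aes, overall_avg)
```

`Decimal` is used instead of `float` because Python's built-in `round` resolves ties to the nearest even digit, which would turn 86.25 into 86.2 rather than the 86.3 shown in the table.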

Provider:
maas
Created:
2025-09-13