Echo-4o-Image

Name: Echo-4o-Image
Creator: maas
Published: 2026-01-06 16:43:22
License: no description provided

ModelScope community; updated 2026-01-06; indexed 2025-08-23

Download link:

https://modelscope.cn/datasets/AI-ModelScope/Echo-4o-Image

Resource description:

# Echo-4o-Image Dataset

[Paper](https://huggingface.co/papers/2508.09987) | [Project Page](https://yejy53.github.io/Echo-4o) | [Code](https://github.com/yejy53/Echo-4o)

## Introduction

Echo-4o-Image is a 180K-scale synthetic dataset generated by GPT-4o, designed to advance open-source models in image generation. While real-world image datasets are valuable, synthetic images offer crucial advantages, especially in addressing blind spots in real-world coverage:

* **Complementing Rare Scenarios:** Synthetic data can generate examples for scenarios less represented in real-world datasets, such as surreal fantasy or multi-reference image generation, which are common in user queries.
* **Clean and Controllable Supervision:** Unlike real-world data, which often contains complex background noise and misalignment between text and image, synthetic images provide pure backgrounds and long-tailed supervision signals, facilitating more accurate text-to-image alignment.

This dataset was instrumental in fine-tuning the unified multimodal generation baseline Bagel to obtain Echo-4o, demonstrating strong performance across standard benchmarks. Furthermore, Echo-4o-Image consistently enhances other foundation models (e.g., OmniGen2, BLIP3-o), highlighting its strong transferability.

## Echo-4o-Image Dataset Details

Echo-4o-Image is a large-scale synthetic dataset distilled from GPT-4o, containing approximately 179,000 samples. It spans three distinct task types:

* **38K surreal fantasy generation tasks:** Designed to address imaginative content.
* **73K multi-reference image generation tasks:** For scenarios requiring multiple visual cues.
* **68K complex instruction execution tasks:** To improve adherence to detailed textual prompts.

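The three per-task counts add up to the quoted total of roughly 179K samples, which is what the "180K-scale" figure in the introduction rounds to. The short sketch below checks that sum and derives each task's share of the dataset. The dictionary keys and the helper name `task_shares` are illustrative choices, not identifiers from the dataset, and the counts are the rounded figures from the list above, so the shares are approximate.

```python
# Approximate per-task sample counts, in thousands, as quoted in the task list.
# The key names are illustrative labels, not identifiers used by the dataset.
TASK_COUNTS_K = {
    "surreal_fantasy": 38,
    "multi_reference": 73,
    "complex_instruction": 68,
}


def task_shares(counts):
    """Return each task's fraction of the total sample count."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


total_k = sum(TASK_COUNTS_K.values())
shares = task_shares(TASK_COUNTS_K)

print(f"total: ~{total_k}K samples")
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

On these rounded figures, multi-reference generation is the largest slice at about 41% of the samples, with complex instruction execution at about 38% and surreal fantasy at about 21%.
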
For better visualization, an online gallery showcasing representative samples from our dataset is available: [Online Gallery](https://yejy53.github.io/Echo-4o/)

## Data Structure

The dataset typically organizes data within compressed packages (e.g., `.tar.gz` files referenced in `configs`). Inside these packages, data is arranged as follows:

```
- package_idx/
--- package_idx.json  # metadata for samples in this package
--- images/
----- 00001.png
----- 00002.png
...
```

## Usage

This dataset can be used to train and fine-tune text-to-image models, extending capabilities to support multi-reference datasets.

### Training

The training process extends existing frameworks (e.g., Bagel's capabilities).

1. **Data Preparation:** Follow data preparation guidelines, ensuring multi-reference data adheres to the expected format.
2. **Training Process:** Training scripts use interfaces and parameters similar to established models (e.g., Bagel), allowing for seamless integration with existing training commands and configurations.

### Inference

* **Text-to-Image Tasks:** For standard text-to-image generation, follow the inference process of base models (e.g., Bagel).
* **Multi-Reference Tasks:** Specific examples and guides for tasks involving multiple references are provided in the [official GitHub repository](https://github.com/yejy53/Echo-4o).

### Code and Supporting Files

The associated GitHub repository provides crucial supporting files for working with the dataset:

* **Attributes and Subjects:** `./code/attributes_and_subjects.json` contains dictionaries defining various attributes and subjects used in the dataset.
* **Range-sensitive filtering:** `./code/range_sensitive_filter.json` contains metadata for data filtering, and `./code/data_filter.py` converts it for use in dataloaders.
* **Data Loader:** `./code/dataloader.py` provides an example of how to load the data into image pairs, incorporating filtering and balanced resampling.

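As a minimal sketch of the package layout shown under Data Structure, the snippet below reads one package that has already been extracted from its `.tar.gz` archive. It is not the repository's `dataloader.py`. It assumes the metadata file is named after its package directory and that images sit in an `images/` subfolder, as the tree suggests. The function name `load_package` is hypothetical, and because the JSON schema is not documented here, the metadata is returned as parsed without interpreting any fields.

```python
import json
from pathlib import Path


def load_package(package_dir):
    """Load the metadata JSON and list the PNG files of one extracted package.

    Assumed layout (taken from the tree in the dataset card):
        <package_idx>/<package_idx>.json
        <package_idx>/images/*.png
    """
    package_dir = Path(package_dir)

    # Assumption: the metadata file shares its name with the package directory.
    metadata_path = package_dir / f"{package_dir.name}.json"
    with metadata_path.open(encoding="utf-8") as f:
        metadata = json.load(f)  # schema is not documented; returned as-is

    # Sort by file name so 00001.png, 00002.png, ... come back in order.
    images = sorted((package_dir / "images").glob("*.png"))
    return metadata, images
```

Calling `load_package("path/to/some_package")` on an extracted package directory returns the parsed metadata together with the sorted image paths. How metadata entries map to image files depends on the actual JSON contents, so that pairing is left to the caller.
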
## Evaluation Benchmarks

The paper introduces two novel benchmarks for rigorously evaluating image generation capabilities:

* **GenEval++:** Increases instruction complexity and uses an automated evaluator (powered by GPT-4.1) to mitigate score saturation and provide a more accurate assessment of text-to-image instruction following.
* **Imagine-Bench:** Focuses on imaginative content, offering a comprehensive evaluation of conceptual creativity and visual consistency across dimensions like fantasy fulfillment, identity preservation, and aesthetic quality.

Detailed guides for these benchmarks can be found in the [EVAL section of the GitHub repository](https://github.com/yejy53/Echo-4o/blob/main/EVAL.md).

## Acknowledgements

We would like to thank the following open-source projects and research works:

* [Bagel](https://github.com/ByteDance-Seed/Bagel)
* [BLIP3o](https://github.com/JiuhaiChen/BLIP3o)
* [OmniGen2](https://github.com/VectorSpaceLab/OmniGen2?tab=readme-ov-file)

## Citation

If you find this dataset or the associated work useful for your research, please cite the paper:

```bib
@article{ye2025echo4o,
  title={Echo-4o: Harnessing the Power of GPT-4o Synthetic Images for Improved Image Generation},
  author={Junyan Ye, Dongzhi Jiang, Zihao Wang, Leqi Zhu, Zhenghao Hu, Zilong Huang, Jun He, Zhiyuan Yan, Jinghua Yu, Hongsheng Li, Conghui He, Weijia Li},
  journal={https://arxiv.org/abs/2508.09987},
  year={2025},
}
```

Provider: maas

Created: 2025-08-18
