
Echo-4o-Image

ModelScope community; updated 2026-01-06; indexed 2025-08-23
Download link:
https://modelscope.cn/datasets/AI-ModelScope/Echo-4o-Image
Resource description:
# Echo-4o-Image Dataset

[Paper](https://huggingface.co/papers/2508.09987) | [Project Page](https://yejy53.github.io/Echo-4o) | [Code](https://github.com/yejy53/Echo-4o)

## Introduction

Echo-4o-Image is a 180K-scale synthetic dataset generated by GPT-4o, designed to advance open-source models in image generation. While real-world image datasets are valuable, synthetic images offer crucial advantages, especially in addressing blind spots in real-world coverage:

* **Complementing Rare Scenarios:** Synthetic data can generate examples for scenarios less represented in real-world datasets, such as surreal fantasy or multi-reference image generation, which are common in user queries.
* **Clean and Controllable Supervision:** Unlike real-world data, which often contains complex background noise and misalignment between text and image, synthetic images provide pure backgrounds and long-tailed supervision signals, facilitating more accurate text-to-image alignment.

This dataset was instrumental in fine-tuning the unified multimodal generation baseline Bagel to obtain Echo-4o, demonstrating strong performance across standard benchmarks. Furthermore, Echo-4o-Image consistently enhances other foundation models (e.g., OmniGen2, BLIP3-o), highlighting its strong transferability.

## Echo-4o-Image Dataset Details

Echo-4o-Image is a large-scale synthetic dataset distilled from GPT-4o, containing approximately 179,000 samples. It spans three distinct task types:

* **38K surreal fantasy generation tasks:** Designed to address imaginative content.
* **73K multi-reference image generation tasks:** For scenarios requiring multiple visual cues.
* **68K complex instruction execution tasks:** To improve adherence to detailed textual prompts.

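
As a quick sanity check on the task split listed above, the three counts can be summed and converted to shares. This is a minimal sketch that uses only the rounded figures quoted in this section (38K, 73K, 68K); the dictionary keys are illustrative labels, not identifiers taken from the dataset.

```python
# Rounded per-task sample counts as quoted above, in thousands.
# The key names are illustrative labels, not dataset identifiers.
task_counts_k = {
    "surreal_fantasy": 38,
    "multi_reference": 73,
    "complex_instruction": 68,
}

total_k = sum(task_counts_k.values())

# Share of each task type as a percentage of the rounded total.
shares = {
    name: round(100 * count / total_k, 1)
    for name, count in task_counts_k.items()
}

print(f"total: {total_k}K samples")
for name, pct in shares.items():
    print(f"{name}: {pct}%")
```

The rounded counts add up to 179K, which matches the "approximately 179,000 samples" figure; the "180K-scale" wording in the introduction is the same number rounded up.
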

For better visualization, an online gallery showcasing representative samples from our dataset is available: [Online Gallery](https://yejy53.github.io/Echo-4o/)

## Data Structure

The dataset typically organizes data within compressed packages (e.g., `.tar.gz` files referenced in `configs`). Inside these packages, data is arranged as follows:

```
- package_idx/
--- package_idx.json  # metadata for samples in this package
--- images/
----- 00001.png
----- 00002.png
...
```

## Usage

This dataset can be used to train and fine-tune text-to-image models, extending capabilities to support multi-reference datasets.

### Training

The training process extends existing frameworks (e.g., Bagel's capabilities).

1. **Data Preparation:** Follow data preparation guidelines, ensuring multi-reference data adheres to the expected format.
2. **Training Process:** Training scripts use interfaces and parameters similar to established models (e.g., Bagel), allowing for seamless integration with existing training commands and configurations.

### Inference

* **Text-to-Image Tasks:** For standard text-to-image generation, follow the inference process of base models (e.g., Bagel).
* **Multi-Reference Tasks:** Specific examples and guides for tasks involving multiple references are provided in the [official GitHub repository](https://github.com/yejy53/Echo-4o).

### Code and Supporting Files

The associated GitHub repository provides crucial supporting files for working with the dataset:

* **Attributes and Subjects:** `./code/attributes_and_subjects.json` contains dictionaries defining various attributes and subjects used in the dataset.
* **Range-sensitive filtering:** `./code/range_sensitive_filter.json` contains metadata for data filtering, and `./code/data_filter.py` converts it for use in dataloaders.
* **Data Loader:** `./code/dataloader.py` provides an example of how to load the data into image pairs, incorporating filtering and balanced resampling.

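
For a first look at one extracted package before turning to the repository's own `./code/dataloader.py`, the directory layout shown under "Data Structure" can be read with the standard library alone. The sketch below is not the repository's loader. The function name `load_package` is hypothetical, and it assumes that `package_idx` is a placeholder for the package name and that the JSON file shares that name. The metadata is returned as parsed JSON without interpretation, because its schema is not described here.

```python
import json
from pathlib import Path


def load_package(package_dir):
    """Return (metadata, image_paths) for one extracted package directory.

    Assumed layout (from the "Data Structure" section):
        <package_idx>/<package_idx>.json
        <package_idx>/images/*.png
    """
    package_dir = Path(package_dir)

    # Assumption: the metadata file is named after the package directory.
    metadata_path = package_dir / f"{package_dir.name}.json"
    with metadata_path.open(encoding="utf-8") as f:
        metadata = json.load(f)

    # Sort so that 00001.png, 00002.png, ... come back in a stable order.
    image_paths = sorted((package_dir / "images").glob("*.png"))
    return metadata, image_paths


# Example call (the path is a placeholder for an extracted package):
# metadata, image_paths = load_package("path/to/package_dir")
```

Keeping the metadata opaque is deliberate: how records map to images, and which images serve as references for the multi-reference tasks, should be taken from the files themselves or from the repository's loader, not guessed.
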

## Evaluation Benchmarks

The paper introduces two novel benchmarks for rigorously evaluating image generation capabilities:

* **GenEval++:** Increases instruction complexity and uses an automated evaluator (powered by GPT-4.1) to mitigate score saturation and provide a more accurate assessment of text-to-image instruction following.
* **Imagine-Bench:** Focuses on imaginative content, offering a comprehensive evaluation of conceptual creativity and visual consistency across dimensions like fantasy fulfillment, identity preservation, and aesthetic quality.

Detailed guides for these benchmarks can be found in the [EVAL section of the GitHub repository](https://github.com/yejy53/Echo-4o/blob/main/EVAL.md).

## Acknowledgements

We would like to thank the following open-source projects and research works:

* [Bagel](https://github.com/ByteDance-Seed/Bagel)
* [BLIP3o](https://github.com/JiuhaiChen/BLIP3o)
* [OmniGen2](https://github.com/VectorSpaceLab/OmniGen2?tab=readme-ov-file)

## Citation

If you find this dataset or the associated work useful for your research, please cite the paper:

```bib
@article{ye2025echo4o,
  title={Echo-4o: Harnessing the Power of GPT-4o Synthetic Images for Improved Image Generation},
  author={Junyan Ye and Dongzhi Jiang and Zihao Wang and Leqi Zhu and Zhenghao Hu and Zilong Huang and Jun He and Zhiyuan Yan and Jinghua Yu and Hongsheng Li and Conghui He and Weijia Li},
  journal={arXiv preprint arXiv:2508.09987},
  url={https://arxiv.org/abs/2508.09987},
  year={2025},
}
```

Provider:
maas
Created:
2025-08-18