UNO-1M

ModelScope community, updated 2026-01-09, indexed 2025-09-06
Download link:
https://modelscope.cn/datasets/bytedance-research/UNO-1M
![image](./assets/uno1m.webp)

<h3 align="center"> Less-to-More Generalization: Unlocking More Controllability by In-Context Generation </h3>

<p align="center">
<a href="https://github.com/bytedance/UNO"><img alt="Build" src="https://img.shields.io/github/stars/bytedance/UNO"></a>
<a href="https://bytedance.github.io/UNO/"><img alt="Build" src="https://img.shields.io/badge/Project%20Page-UNO-blue"></a>
<a href="https://arxiv.org/abs/2504.02160"><img alt="Build" src="https://img.shields.io/badge/arXiv%20paper-UNO-b31b1b.svg"></a>
<a href="https://huggingface.co/bytedance-research/UNO"><img src="https://img.shields.io/static/v1?label=%F0%9F%A4%97%20Hugging%20Face&message=Model&color=green"></a>
<a href="https://huggingface.co/datasets/bytedance-research/UNO-1M"><img src="https://img.shields.io/static/v1?label=%F0%9F%A4%97%20Hugging%20Face&message=Dataset&color=yellow"></a>
<a href="https://huggingface.co/spaces/bytedance-research/UNO-FLUX"><img src="https://img.shields.io/static/v1?label=%F0%9F%A4%97%20Hugging%20Face&message=demo&color=orange"></a>
</p>

## Overview

UNO-1M is a large dataset (~1M paired images) constructed by the in-context generation pipeline introduced in the [UNO](https://arxiv.org/abs/2504.02160) paper. Its advantages include highly diverse categories (>365 categories), high-resolution images (around 1024x1024), variable resolutions (different aspect ratios), high quality (produced by state-of-the-art text-to-image models), and high subject consistency (filtered by VLM-filter CoT). You can train on this dataset to reproduce the [UNO model](https://huggingface.co/bytedance-research/UNO) or build your own state-of-the-art subject-driven model. We now open-source the entire dataset to benefit research.

## Label Format

| Key name | Type | Description |
| ---------------- | ------ | ---------------------------------------------- |
| `img_path1` | `str` | Reference image information (first image). |
| `img_path2` | `str` | Reference image information (second image). |
| `caption` | `dict` | Image caption and subject word. |
| `vlm_filter_cot` | `dict` | The CoT answer of VLM-filter. |

## Dataset Structure

### Directory Structure

```bash
output_root/
├── images/
│   ├── split1.tar.gz
│   ├── split2.tar.gz
│   └── ...
├── labels/
│   ├── split1.json
│   ├── split2.json
│   └── ...
```

After extraction:

```bash
output_root/
├── images/
│   ├── split1/
│   │   ├── object365_w1024_h1536_split_Bread_0_0_1_725x1024.png
│   │   ├── object365_w1024_h1536_split_Bread_0_0_2_811x1024.png
│   │   └── ...
│   └── ...
├── labels/
│   ├── split1.json
│   ├── split2.json
│   └── ...
...
```

## Usage

UNO-1M contains rich label information, and we preserve the breakdown of the consistency score as well as the final consistency scores. It can be applied to:

- Text-to-image generation
- Subject-driven generation
- Scored-filter training
- Consistency reward model training

**Note:** For subject-driven generation, we recommend using data with a consistency score greater than or equal to 3.5 (the key in JSON is `score_final`). In the UNO paper, we use perfect score (score 4) data for training. You can refer to our technical report for more details.
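
The score threshold recommended in the note can be applied with a few lines of Python. The following is a minimal sketch, not the authors' tooling: the helper names `filter_by_consistency` and `load_labels` are hypothetical, and it assumes that each label file (for example `labels/split1.json`) holds a JSON array of records with `score_final` nested under `vlm_filter_cot`. The on-disk layout of the label files is not documented here, so check it before relying on `load_labels`.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterable, List, Union

Record = Dict[str, Any]


def filter_by_consistency(records: Iterable[Record], min_score: float = 3.5) -> List[Record]:
    """Keep records whose vlm_filter_cot.score_final is at least min_score.

    Records without a numeric score_final are dropped rather than guessed at.
    """
    kept = []
    for record in records:
        score = record.get("vlm_filter_cot", {}).get("score_final")
        if isinstance(score, (int, float)) and score >= min_score:
            kept.append(record)
    return kept


def load_labels(label_file: Union[str, Path]) -> List[Record]:
    """Load one label file.

    Assumption (not confirmed by the dataset card): the file is a JSON
    array of records shaped like the example record in this README.
    """
    with open(label_file, "r", encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, list):
        raise ValueError(f"expected a JSON array in {label_file}, got {type(data).__name__}")
    return data


if __name__ == "__main__":
    # In-memory records stand in for a real label file.
    demo = [
        {"img_path1": "split1/a_1.png", "vlm_filter_cot": {"score_final": 4.0}},
        {"img_path1": "split1/b_1.png", "vlm_filter_cot": {"score_final": 3.5}},
        {"img_path1": "split1/c_1.png", "vlm_filter_cot": {"score_final": 2.5}},
    ]
    recommended = filter_by_consistency(demo)            # score_final >= 3.5
    perfect_only = filter_by_consistency(demo, 4.0)      # perfect-score subset
    print(len(recommended), len(perfect_only))
```

Passing `min_score=4.0` selects the perfect-score subset that the note says was used for training in the UNO paper, while the default of 3.5 matches the recommendation for subject-driven generation.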
You can see an example below:

```json
{
  "img_path1": "split1/class_generation_w1024_h1536_split_v1_Food_0_0_1_793x1024.png",
  "img_path2": "split1/class_generation_w1024_h1536_split_v1_Food_0_0_2_743x1024.png",
  "caption": {
    "img_path1": "A bowl of beef stew with carrots and garnished with a sprig of parsley, placed on a white surface.",
    "img_path2": "A bowl of beef stew with carrots and garnished with a sprig of parsley, placed on a wooden table surrounded by autumn leaves.",
    "judgment": "same",
    "subject": [
      "beef stew with carrots"
    ]
  },
  "vlm_filter_cot": {
    "score_part": {
      "Beef Chunks": 4.0,
      "Carrots": 3.0,
      "Appearance of the Stew": 4.0,
      "Garnish": 4.0
    },
    "score_final": 3.5
  }
}
```

## Citation

If you find our dataset helpful, please consider citing our work:

```bibtex
@article{wu2025less,
  title={Less-to-more generalization: Unlocking more controllability by in-context generation},
  author={Wu, Shaojin and Huang, Mengqi and Wu, Wenxu and Cheng, Yufeng and Ding, Fei and He, Qian},
  journal={arXiv preprint arXiv:2504.02160},
  year={2025}
}
```

Provider: maas
Created: 2025-08-25