UNO-1M

Name: UNO-1M
Creator: maas
Published: 2026-01-09 11:59:49
License: No description available

Source: ModelScope community. Updated 2026-01-09; indexed 2025-09-06.

Download link:

https://modelscope.cn/datasets/bytedance-research/UNO-1M


Resource description:

![image](./assets/uno1m.webp)

<h3 align="center"> Less-to-More Generalization: Unlocking More Controllability by In-Context Generation </h3>

<p align="center">
<a href="https://github.com/bytedance/UNO"><img alt="Build" src="https://img.shields.io/github/stars/bytedance/UNO"></a>
<a href="https://bytedance.github.io/UNO/"><img alt="Build" src="https://img.shields.io/badge/Project%20Page-UNO-blue"></a>
<a href="https://arxiv.org/abs/2504.02160"><img alt="Build" src="https://img.shields.io/badge/arXiv%20paper-UNO-b31b1b.svg"></a>
<a href="https://huggingface.co/bytedance-research/UNO"><img src="https://img.shields.io/static/v1?label=%F0%9F%A4%97%20Hugging%20Face&message=Model&color=green"></a>
<a href="https://huggingface.co/datasets/bytedance-research/UNO-1M"><img src="https://img.shields.io/static/v1?label=%F0%9F%A4%97%20Hugging%20Face&message=Dataset&color=yellow"></a>
<a href="https://huggingface.co/spaces/bytedance-research/UNO-FLUX"><img src="https://img.shields.io/static/v1?label=%F0%9F%A4%97%20Hugging%20Face&message=demo&color=orange"></a>
</p>

## Overview

UNO-1M is a large dataset (~1M paired images) constructed by the in-context generation pipeline introduced in the [UNO](https://arxiv.org/abs/2504.02160) paper. Its advantages include highly diverse categories (>365 categories), high-resolution images (around 1024x1024), variable resolutions (different aspect ratios), high quality (produced by state-of-the-art text-to-image models), and high subject consistency (filtered by VLM-filter CoT). You can train on this dataset to reproduce the [UNO model](https://huggingface.co/bytedance-research/UNO) or build your own state-of-the-art subject-driven model. We now open-source the entire dataset to benefit research.

## Label Format

| Key name         | Type   | Description                                    |
| ---------------- | ------ | ---------------------------------------------- |
| `img_path1`      | `str`  | Reference image information (first image).     |
| `img_path2`      | `str`  | Reference image information (second image).    |
| `caption`        | `dict` | Image caption and subject word.                |
| `vlm_filter_cot` | `dict` | The CoT answer of VLM-filter.                  |

## Dataset Structure

### Directory Structure

```bash
output_root/
├── images/
│   ├── split1.tar.gz
│   ├── split2.tar.gz
│   └── ...
├── labels/
│   ├── split1.json
│   ├── split2.json
│   └── ...
```

After extraction:

```bash
output_root/
├── images/
│   ├── split1/
│   │   ├── object365_w1024_h1536_split_Bread_0_0_1_725x1024.png
│   │   ├── object365_w1024_h1536_split_Bread_0_0_2_811x1024.png
│   │   └── ...
│   └── ...
├── labels/
│   ├── split1.json
│   ├── split2.json
│   └── ...
...
```

## Usage

UNO-1M contains rich label information, and we preserve the breakdown of the consistency score as well as the final consistency scores. It can be applied to:

- Text-to-image generation
- Subject-driven generation
- Scored-filter training
- Consistency reward model training

**Note:** For subject-driven generation, we recommend using data with a consistency score greater than or equal to 3.5 (the key in JSON is `score_final`). In the UNO paper, we use perfect score (score 4) data for training. You can refer to our technical report for more details.

You can see an example below:

```json
{
  "img_path1": "split1/class_generation_w1024_h1536_split_v1_Food_0_0_1_793x1024.png",
  "img_path2": "split1/class_generation_w1024_h1536_split_v1_Food_0_0_2_743x1024.png",
  "caption": {
    "img_path1": "A bowl of beef stew with carrots and garnished with a sprig of parsley, placed on a white surface.",
    "img_path2": "A bowl of beef stew with carrots and garnished with a sprig of parsley, placed on a wooden table surrounded by autumn leaves.",
    "judgment": "same",
    "subject": [
      "beef stew with carrots"
    ]
  },
  "vlm_filter_cot": {
    "score_part": {
      "Beef Chunks": 4.0,
      "Carrots": 3.0,
      "Appearance of the Stew": 4.0,
      "Garnish": 4.0
    },
    "score_final": 3.5
  }
}
```

## Citation

If you find our dataset helpful, please consider citing our work:

```bibtex
@article{wu2025less,
  title={Less-to-more generalization: Unlocking more controllability by in-context generation},
  author={Wu, Shaojin and Huang, Mengqi and Wu, Wenxu and Cheng, Yufeng and Ding, Fei and He, Qian},
  journal={arXiv preprint arXiv:2504.02160},
  year={2025}
}
```
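
The score-based selection recommended in the Usage note can be sketched in Python as follows. This is a minimal sketch, not the authors' training code. It assumes that each `labels/splitN.json` file holds a JSON list of records shaped like the label example (a dict of such records is also tolerated), and that `img_path1` and `img_path2` are relative to the extracted `images/` directory. The helper names `load_records`, `filter_by_score`, and `resolve_pair` are hypothetical.

```python
import json
from pathlib import Path


def load_records(label_file):
    """Load one label file.

    Assumption: the file is a JSON list of records shaped like the
    label example, or a dict whose values are such records.
    """
    with open(label_file, encoding="utf-8") as f:
        data = json.load(f)
    return list(data.values()) if isinstance(data, dict) else list(data)


def filter_by_score(records, min_score=3.5):
    """Keep records whose vlm_filter_cot.score_final is at least min_score.

    Records without a score_final entry are dropped.
    """
    kept = []
    for rec in records:
        score = rec.get("vlm_filter_cot", {}).get("score_final")
        if score is not None and score >= min_score:
            kept.append(rec)
    return kept


def resolve_pair(record, images_root):
    """Join the relative img_path1/img_path2 entries onto images_root."""
    root = Path(images_root)
    return root / record["img_path1"], root / record["img_path2"]


if __name__ == "__main__":
    # Made-up records for illustration only; real ones come from load_records().
    sample = [
        {"img_path1": "split1/a_1.png", "img_path2": "split1/a_2.png",
         "vlm_filter_cot": {"score_final": 4.0}},
        {"img_path1": "split1/b_1.png", "img_path2": "split1/b_2.png",
         "vlm_filter_cot": {"score_final": 3.0}},
    ]
    for rec in filter_by_score(sample, min_score=3.5):
        print(resolve_pair(rec, "output_root/images"))
```

Passing `min_score=4.0` would select only perfect-score pairs, which is the setting the note says was used in the UNO paper; the default of 3.5 follows the recommendation for subject-driven generation.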


Provider:

maas

Created:

2025-08-25
