OpenGVLab/AS-V2

Name: OpenGVLab/AS-V2
Creator: OpenGVLab
Published: 2024-03-21 14:17:14
License: apache-2.0 (per the dataset card; the catalog entry itself gives no description)

Source: Hugging Face. Updated 2024-03-21; indexed 2024-03-04.

Download link:

https://hf-mirror.com/datasets/OpenGVLab/AS-V2
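To fetch a single file rather than browse the page, a direct URL can be built from the link above. This is a minimal sketch, assuming the mirror follows the Hugging Face `resolve/<revision>/<filename>` URL layout and that the files sit on a branch named `main`; neither detail is stated on this page, and `file_url` is a hypothetical helper.

```python
from urllib.parse import quote

# Assumption: the mirror uses the Hugging Face "resolve" URL layout and the
# files live on a branch named "main". Neither is stated on this page.
MIRROR = "https://hf-mirror.com"
REPO_ID = "OpenGVLab/AS-V2"


def file_url(filename: str, revision: str = "main") -> str:
    """Build a direct-download URL for one file of the dataset repository."""
    return f"{MIRROR}/datasets/{REPO_ID}/resolve/{quote(revision)}/{quote(filename)}"


if __name__ == "__main__":
    print(file_url("rec_conversation_22k.json"))
```

The resulting URL can be passed to any HTTP client; the larger files are better streamed to disk than read into memory.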


Resource description:

---
license: apache-2.0
---

# The All-Seeing Project V2

We release the training data utilized for the All-Seeing Project V2 in this repository.

- `llava_v1_5_mix665k_asmv2_format.json`: the instruction tuning data used in Stage 1.
- `as_pretrain_10m.json`: the filtered 10M samples in AS-1B, which are used in the pretraining phase of Stage 2.
- `as_mix_4m.json`: the instruction tuning data used in Stage 2.
- `rec_conversation_22k.json`: the conversation data of AS-V2.
- `rec_detailed_description.json`: the detailed description data of AS-V2.
- `rec_region_captioning.json`: the region description data of AS-V2.

***NOTE***:

- AS-V2 has been integrated into `as_mix_4m.json`.
- The bounding boxes in `rec_conversation_22k.json`, `rec_detailed_description.json`, and `rec_region_captioning.json` have been preprocessed to fit square padding. See `rec_conversation_22k_wo_square_pad.json`, `rec_detailed_description_wo_square_pad.json`, and `rec_region_captioning_wo_square_pad.json` for the data without square-pad preprocessing.

See our [paper](https://arxiv.org/abs/2402.19474) and [project](https://github.com/OpenGVLab/all-seeing) for more details!

# Citation

If you find our work useful in your research, please consider citing:

```BibTeX
@article{wang2023allseeing,
  title={The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World},
  author={Wang, Weiyun and Shi, Min and Li, Qingyun and Wang, Wenhai and Huang, Zhenhang and Xing, Linjie and Chen, Zhe and Li, Hao and Zhu, Xizhou and Cao, Zhiguo and others},
  journal={arXiv preprint arXiv:2308.01907},
  year={2023}
}
@article{wang2024allseeing_v2,
  title={The All-Seeing Project V2: Towards General Relation Comprehension of the Open World},
  author={Wang, Weiyun and Ren, Yiming and Luo, Haowen and Li, Tiantong and Yan, Chenxiang and Chen, Zhe and Wang, Wenhai and Li, Qingyun and Lu, Lewei and Zhu, Xizhou and others},
  journal={arXiv preprint arXiv:2402.19474},
  year={2024}
}
```


Provided by:

OpenGVLab

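Summary of original information

The All-Seeing Project V2 dataset overview

Data files

llava_v1_5_mix665k_asmv2_format.json: instruction tuning data used in Stage 1.
as_pretrain_10m.json: 10 million samples filtered from AS-1B, used in the pretraining phase of Stage 2.
as_mix_4m.json: instruction tuning data used in Stage 2.
rec_conversation_22k.json: conversation data of AS-V2.
rec_detailed_description.json: detailed description data of AS-V2.
rec_region_captioning.json: region description data of AS-V2.

Notes

AS-V2 has been integrated into as_mix_4m.json.
The bounding boxes in rec_conversation_22k.json, rec_detailed_description.json, and rec_region_captioning.json have been preprocessed to fit square padding. The files without square-pad preprocessing are rec_conversation_22k_wo_square_pad.json, rec_detailed_description_wo_square_pad.json, and rec_region_captioning_wo_square_pad.json.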
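The naming rule in the note above (each of the three `rec_*` files has a `_wo_square_pad` counterpart) can be captured in a small helper. This is only an illustrative sketch: `wo_square_pad_name` and `SQUARE_PAD_FILES` are hypothetical names, and the helper does nothing beyond string manipulation on the file names listed on this page.

```python
# The three files whose bounding boxes were preprocessed for square padding,
# as listed in the note above.
SQUARE_PAD_FILES = (
    "rec_conversation_22k.json",
    "rec_detailed_description.json",
    "rec_region_captioning.json",
)


def wo_square_pad_name(filename: str) -> str:
    """Return the name of the variant without square-pad preprocessing."""
    if filename not in SQUARE_PAD_FILES:
        raise ValueError(f"no '_wo_square_pad' variant is listed for {filename!r}")
    stem, ext = filename.rsplit(".", 1)
    return f"{stem}_wo_square_pad.{ext}"


if __name__ == "__main__":
    for name in SQUARE_PAD_FILES:
        print(name, "->", wo_square_pad_name(name))
```

Rejecting unlisted names keeps the helper from inventing file names, such as a `_wo_square_pad` variant of `as_mix_4m.json`, that the page never mentions.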

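Collected summary

Dataset introduction

Construction

The OpenGVLab/AS-V2 dataset was created in the context of open-world visual recognition and understanding, with the aim of extending general relation comprehension. It was built in several stages: AS-1B was first filtered down to 10M samples for the pretraining phase, and several fine-tuning sources, including the LLaVA-1.5 instruction mix, were then integrated. Specifically, Stage 1 uses llava_v1_5_mix665k_asmv2_format.json for instruction tuning, while Stage 2 draws on as_mix_4m.json and other sources. The release also includes rec_conversation_22k.json, rec_detailed_description.json, and rec_region_captioning.json, which are dedicated to conversation, detailed description, and region description. The bounding boxes in these three files have been preprocessed to fit square padding, and unprocessed versions are provided alongside them.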

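Characteristics

The main strengths of AS-V2 are its breadth and its fine granularity. It integrates multimodal instruction-tuning data and adds conversation and description data designed specifically for relation comprehension. The bounding boxes are preprocessed for square padding, which improves compatibility with mainstream vision-language models, and the unprocessed originals are also provided for other research needs. The dataset covers the full pipeline from pretraining to instruction tuning and relies on filtering to maintain sample quality, so that models can achieve more precise visual-semantic alignment in open-world scenes, particularly on complex relation-reasoning tasks.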

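Usage

Researchers can choose a subset of AS-V2 according to the training stage. For pretraining, load as_pretrain_10m.json for large-scale visual feature learning. For instruction tuning, use llava_v1_5_mix665k_asmv2_format.json (Stage 1) and as_mix_4m.json (Stage 2) to build a multi-task setup. For a specific task such as region description or detailed description, use rec_region_captioning.json or rec_detailed_description.json directly. Note that the bounding boxes provided by default have been preprocessed for square padding; if the original coordinates are needed, use the files whose names contain wo_square_pad. All data is stored as JSON, which makes it easy to integrate with common deep learning frameworks. See the paper and the project code for implementation details.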
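The stage-based selection described above might be sketched as follows. The phase labels in `FILES_BY_PHASE` are hypothetical, and the code assumes each file holds a top-level JSON array of records. The page does not document the record schema, so the sketch only inspects record counts and keys instead of reading specific fields.

```python
import json
from pathlib import Path

# File roles as listed on this page; the dictionary keys are hypothetical labels.
FILES_BY_PHASE = {
    "stage1_instruction_tuning": ["llava_v1_5_mix665k_asmv2_format.json"],
    "stage2_pretraining": ["as_pretrain_10m.json"],
    "stage2_instruction_tuning": ["as_mix_4m.json"],
}


def load_records(path):
    """Load one annotation file; assumes the top level is a JSON array."""
    with Path(path).open(encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, list):
        raise TypeError(f"expected a JSON array in {path}, got {type(data).__name__}")
    return data


def summarize(records):
    """Report the record count and the keys of the first record, if any."""
    first = records[0] if records else {}
    keys = sorted(first) if isinstance(first, dict) else []
    return {"count": len(records), "first_keys": keys}
```

A typical call would be `summarize(load_records("rec_region_captioning.json"))` on a locally downloaded file. Loading the multi-million-sample files with `json.load` holds everything in memory at once, so a streaming parser may be preferable for those.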

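Background and challenges

Background overview

In open-world visual understanding, enabling models not only to recognize objects but also to understand the complex relations among them is a key step toward general artificial intelligence. AS-V2 (All-Seeing Project V2), released by the OpenGVLab team in 2024 and led by researchers from Shanghai AI Laboratory and other institutions, targets panoptic visual recognition and open-world relation comprehension. Building on AS-1B, it selects 10 million pretraining samples and integrates 4 million instruction-tuning samples, covering region description, detailed description, and conversation tasks. The release helps move multimodal large models from simple object detection toward deeper semantic relation reasoning and provides a data foundation for more capable vision-language systems, with broad impact in academia and industry.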

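Current challenges

The problem AS-V2 addresses is that traditional vision datasets focus on classifying or localizing isolated objects, whereas understanding the interactions, spatial relations, and logical relations among objects in real scenes is more complex and lacks large-scale annotation. During construction, the team had to balance data quality against scale: the raw AS-1B data is noisy, so a filtering step selects 10 million high-quality samples to keep the pretraining phase effective. To support different tasks, a unified instruction format was also needed, and the bounding boxes were preprocessed for square padding to unify input size, while versions without this preprocessing were retained for flexibility. These steps add engineering complexity to data construction, but they also help AS-V2 generalize better on relation-comprehension tasks.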
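To make the square-padding step concrete, the sketch below shows how a box could be shifted when a non-square image is padded to a square. It rests on assumptions that this page does not confirm: boxes are taken to be `(x1, y1, x2, y2)` in pixels, and the padding is taken to be centered. The function names are hypothetical, and the actual convention used by the dataset should be checked against the project code.

```python
def _centered_offsets(width, height):
    """Padding added on the left and top when centering the image in a square."""
    side = max(width, height)
    return (side - width) // 2, (side - height) // 2, side


def pad_box_to_square(box, width, height):
    """Shift an (x1, y1, x2, y2) pixel box into the square-padded image frame.

    Assumes centered padding; returns the shifted box and the square side.
    """
    pad_x, pad_y, side = _centered_offsets(width, height)
    x1, y1, x2, y2 = box
    return (x1 + pad_x, y1 + pad_y, x2 + pad_x, y2 + pad_y), side


def unpad_box(box, width, height):
    """Inverse mapping: from the padded square back to the original image."""
    pad_x, pad_y, _ = _centered_offsets(width, height)
    x1, y1, x2, y2 = box
    return (x1 - pad_x, y1 - pad_y, x2 - pad_x, y2 - pad_y)
```

For a 200 by 100 image, the square side is 200 and 50 pixels of padding are added above and below, so only the y coordinates move. Under these assumptions, the `wo_square_pad` files would correspond to coordinates before this shift.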

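Common scenarios

Classic use cases

In research at the intersection of vision and language, OpenGVLab/AS-V2 is widely used to train and evaluate multimodal large models, especially for generalized visual understanding in open-world scenes. Classic use cases include instruction-based visual dialogue, region-level description generation, and fine-grained object relation reasoning. With the structured instruction-tuning data (such as as_mix_4m.json) and the region description data (such as rec_region_captioning.json), researchers can improve a model's ability to capture the semantic links between local image regions and the global scene, moving multimodal systems from "seeing" toward "understanding".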

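Academic problems addressed

The dataset's core academic contribution is addressing the bottleneck of visual relation comprehension in the open world. Traditional datasets focus on object classification or simple attribute recognition, whereas AS-V2 introduces large numbers of fine-grained relation annotations (spatial, interaction, and logical relations between objects), allowing models to learn relation reasoning that generalizes beyond a closed vocabulary. It mitigates the misjudgments that vision-language models make in complex scenes when they lack relational priors, provides a key data foundation for building a truly "all-seeing" visual system, and has advanced research directions such as visual relation detection and scene graph generation.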

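Derived related work

AS-V2 has given rise to a series of influential follow-up work. The All-Seeing Project models (such as ASMv2) are pretrained and instruction-tuned directly on this dataset and reach leading relation-comprehension accuracy on several public benchmarks (such as Visual Genome and GQA). Later work such as RelationGPT further uses the data structure of AS-V2 to design chained relation-reasoning modules, enabling dynamic relation reasoning in multi-turn dialogue. The dataset has also been used to improve the semantic awareness of vision foundation models (such as SAM), leading to joint frameworks that output segmentation masks together with relation descriptions, and it has become an important basic resource for open-world visual understanding.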

The content above was collected and summarized by 遇见数据集.
