AS-100M

Name: AS-100M
Creator: maas
Published: 2025-12-04 16:19:35
License: no description provided

ModelScope community; updated 2025-12-04, indexed 2024-12-28

Download link:

https://modelscope.cn/datasets/OpenGVLab/AS-100M

Resource description:

# AS-100M

AS-100M is a subset of AS-1B. We release this dataset in both [COCO format](https://huggingface.co/datasets/Weiyun1025/AS-100M/tree/main/coco_format) and [JSONL format](https://huggingface.co/datasets/Weiyun1025/AS-100M/tree/main/jsonl_format).

***NOTE***: The bbox format in the COCO format is `xywh`, while in the JSONL format, it is `x1y1x2y2`.

## Introduction

We present the All-Seeing Project with:

[***All-Seeing 1B (AS-1B) dataset***](https://huggingface.co/datasets/Weiyun1025/AS-100M): we propose a new large-scale dataset (AS-1B) for open-world panoptic visual recognition and understanding, built with an economical semi-automatic data engine that combines off-the-shelf vision/language models with human feedback.

[***All-Seeing Model (ASM)***](https://huggingface.co/Weiyun1025/All-Seeing-Model-FT): we develop a unified vision-language foundation model (ASM) for open-world panoptic visual recognition and understanding. Aligned with LLMs, ASM supports versatile image-text retrieval and generation tasks and demonstrates impressive zero-shot capability.

<img width="820" alt="image" src="https://github.com/OpenGVLab/all-seeing/assets/8529570/e43ab8db-6437-46f1-8aa1-c95f012e9147">

Figure 1: Overview and comparison of our All-Seeing project with other popular large foundation models.

## Dataset Overview

AS-1B contains over 1 billion regions annotated with semantic tags, question-answering pairs, and detailed captions. It covers 3.5 million common and rare real-world concepts and has 132.2 billion tokens that describe the concepts and their attributes.

<img width="800" alt="image" src="https://github.com/OpenGVLab/all-seeing/assets/8529570/adac37ed-312f-4f11-ba8a-6bc62067438f">

Some examples

<img width="800" alt="image" src="https://github.com/OpenGVLab/all-seeing/assets/8529570/fcf6ab07-c4ba-441c-aa6c-111c769f75b1">

Please see our [paper](https://arxiv.org/abs/2308.01907) for more details.
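Because the two release formats use different box conventions, mixing annotations from both requires a conversion step. The following is a minimal sketch, assuming `xywh` means top-left corner plus width and height (the usual COCO convention) and `x1y1x2y2` means top-left and bottom-right corners. The function names are illustrative and not part of the dataset tooling.

```python
from typing import List, Sequence


def xywh_to_xyxy(box: Sequence[float]) -> List[float]:
    """Convert [x, y, width, height] to [x1, y1, x2, y2]."""
    x, y, w, h = box
    return [x, y, x + w, y + h]


def xyxy_to_xywh(box: Sequence[float]) -> List[float]:
    """Convert [x1, y1, x2, y2] to [x, y, width, height]."""
    x1, y1, x2, y2 = box
    return [x1, y1, x2 - x1, y2 - y1]


if __name__ == "__main__":
    coco_box = [10, 20, 30, 40]          # assumed COCO-style xywh box
    corner_box = xywh_to_xyxy(coco_box)  # same box as corner coordinates
    print(corner_box)
    print(xyxy_to_xywh(corner_box))
```

Converting everything to a single convention once, at load time, avoids silently mixing widths with corner coordinates later in a pipeline.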
## Model Architecture

The All-Seeing Model (ASM) is a unified framework for panoptic visual recognition and understanding, including image/region-text retrieval, image/region recognition, captioning, and question-answering.

<img width="820" alt="image" src="https://github.com/OpenGVLab/all-seeing/assets/8529570/8995e88c-6381-452f-91e4-05d68a2795fc">

## License

This project is released under the [Apache 2.0 license](LICENSE).

## Citation

If you find our work useful in your research, please consider citing:

```BibTeX
@article{wang2023allseeing,
  title={The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World},
  author={Wang, Weiyun and Shi, Min and Li, Qingyun and Wang, Wenhai and Huang, Zhenhang and Xing, Linjie and Chen, Zhe and Li, Hao and Zhu, Xizhou and Cao, Zhiguo and others},
  journal={arXiv preprint arXiv:2308.01907},
  year={2023}
}
@article{wang2024allseeing_v2,
  title={The All-Seeing Project V2: Towards General Relation Comprehension of the Open World},
  author={Wang, Weiyun and Ren, Yiming and Luo, Haowen and Li, Tiantong and Yan, Chenxiang and Chen, Zhe and Wang, Wenhai and Li, Qingyun and Lu, Lewei and Zhu, Xizhou and others},
  journal={arXiv preprint arXiv:2402.19474},
  year={2024}
}
```

Provider: maas

Created: 2024-12-26
