OpenGVLab/AS-Core

Source: Hugging Face. Updated: 2024-03-21. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/OpenGVLab/AS-Core
Resource description:
---
license: apache-2.0
---

# AS-Core

AS-Core is the human-verified subset of AS-1B.

- `semantic_tag_1m.json`: the human-verified annotations for semantic tags.
- `region_vqa_1m.jsonl`: the human-verified annotations for region VQA.
- `region_caption_400k.jsonl`: the region captions generated by paraphrasing the region question-answer pairs.

***NOTE***: The bbox format is `x1y1x2y2`.

## Introduction

We present the All-Seeing Project with:

[***All-Seeing 1B (AS-1B) dataset***](https://huggingface.co/datasets/Weiyun1025/AS-100M): we propose a new large-scale dataset (AS-1B) for open-world panoptic visual recognition and understanding, using an economical semi-automatic data engine that combines the power of off-the-shelf vision/language models and human feedback.

[***All-Seeing Model (ASM)***](https://huggingface.co/Weiyun1025/All-Seeing-Model-FT): we develop a unified vision-language foundation model (ASM) for open-world panoptic visual recognition and understanding. Aligning with LLMs, our ASM supports versatile image-text retrieval and generation tasks, demonstrating impressive zero-shot capability.

<img width="820" alt="image" src="https://github.com/OpenGVLab/all-seeing/assets/8529570/e43ab8db-6437-46f1-8aa1-c95f012e9147">

Figure 1: Overview and comparison of our All-Seeing project with other popular large foundation models.

<!--
## Online Demo

**All-Seeing Model demo** is available [here](https://openxlab.org.cn/apps/detail/wangweiyun/All-Seeing-Model-Demo).

**Dataset Browser** is available [here](https://openxlab.org.cn/apps/detail/wangweiyun/All-Seeing-Dataset-Browser).

https://github.com/OpenGVLab/all-seeing/assets/47669167/9b5b32d1-863a-4579-b576-b82523f2205e
-->

## Dataset Overview

AS-1B contains over 1 billion regions annotated with semantic tags, question-answering pairs, and detailed captions. It covers a wide range of 3.5 million common and rare concepts in the real world, and has 132.2 billion tokens that describe the concepts and their attributes.

<img width="800" alt="image" src="https://github.com/OpenGVLab/all-seeing/assets/8529570/adac37ed-312f-4f11-ba8a-6bc62067438f">

Some examples

<img width="800" alt="image" src="https://github.com/OpenGVLab/all-seeing/assets/8529570/fcf6ab07-c4ba-441c-aa6c-111c769f75b1">

Please see our [paper](https://arxiv.org/abs/2308.01907) to learn more details.

## Model Architecture

The All-Seeing model (ASM) is a unified framework for panoptic visual recognition and understanding, including image/region-text retrieval, image/region recognition, captioning, and question-answering.

<img width="820" alt="image" src="https://github.com/OpenGVLab/all-seeing/assets/8529570/8995e88c-6381-452f-91e4-05d68a2795fc">

## License

This project is released under the [Apache 2.0 license](LICENSE).

# Citation

If you find our work useful in your research, please consider citing:

```BibTeX
@article{wang2023allseeing,
  title={The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World},
  author={Wang, Weiyun and Shi, Min and Li, Qingyun and Wang, Wenhai and Huang, Zhenhang and Xing, Linjie and Chen, Zhe and Li, Hao and Zhu, Xizhou and Cao, Zhiguo and others},
  journal={arXiv preprint arXiv:2308.01907},
  year={2023}
}
@article{wang2024allseeing_v2,
  title={The All-Seeing Project V2: Towards General Relation Comprehension of the Open World},
  author={Wang, Weiyun and Ren, Yiming and Luo, Haowen and Li, Tiantong and Yan, Chenxiang and Chen, Zhe and Wang, Wenhai and Li, Qingyun and Lu, Lewei and Zhu, Xizhou and others},
  journal={arXiv preprint arXiv:2402.19474},
  year={2024}
}
```
Provided by:
OpenGVLab
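Summary of Original Information

AS-Core Dataset Overview

AS-Core is the human-verified subset of the AS-1B dataset.

Data Files

  • semantic_tag_1m.json: human-verified semantic tag annotations.
  • region_vqa_1m.jsonl: human-verified region visual question answering (VQA) annotations.
  • region_caption_400k.jsonl: region captions produced by paraphrasing the region question-answer pairs.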
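
The two `.jsonl` files listed above can be read one record at a time rather than loaded whole. The following is a minimal sketch, assuming the files follow the usual JSON Lines convention of one JSON object per line; the helper names `iter_jsonl` and `peek_keys` are hypothetical, and the field names inside each record are not documented here, so the sketch discovers them instead of assuming them.

```python
import json
from typing import Any, Dict, Iterator, Set


def iter_jsonl(path: str) -> Iterator[Dict[str, Any]]:
    """Yield one parsed JSON object per non-empty line of a JSON Lines file.

    Assumption: each line holds a single JSON object (a dict).
    """
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


def peek_keys(path: str, limit: int = 100) -> Set[str]:
    """Collect the keys seen in the first `limit` records to inspect the schema."""
    keys: Set[str] = set()
    for i, record in enumerate(iter_jsonl(path)):
        if i >= limit:
            break
        keys.update(record.keys())
    return keys
```

Calling `peek_keys("region_vqa_1m.jsonl")` on a local copy would show which fields the annotations actually carry. `semantic_tag_1m.json` has a `.json` extension, so it is probably a single JSON document that needs `json.load` instead; that is also an assumption to verify against the file.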

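Dataset Overview

The AS-1B dataset contains over 1 billion annotated regions, covering semantic tags, question-answer pairs, and detailed captions. It covers 3.5 million common and rare real-world concepts and includes 132.2 billion tokens describing these concepts and their attributes.

Annotation Format

  • The bounding box format is x1y1x2y2.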
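
Many tools expect boxes as `(x, y, width, height)` instead, so a conversion is often needed. The sketch below assumes that `x1y1x2y2` means the top-left corner `(x1, y1)` followed by the bottom-right corner `(x2, y2)`, which is the common reading of this notation; whether the coordinates are in pixels or normalized is not stated in the source. The function names are hypothetical.

```python
from typing import Sequence, Tuple

Number = float


def xyxy_to_xywh(box: Sequence[Number]) -> Tuple[Number, Number, Number, Number]:
    """Convert (x1, y1, x2, y2) corners to (x, y, width, height).

    Assumption: (x1, y1) is the top-left and (x2, y2) the bottom-right corner.
    """
    x1, y1, x2, y2 = box
    if x2 < x1 or y2 < y1:
        raise ValueError(f"corners are out of order: {tuple(box)}")
    return (x1, y1, x2 - x1, y2 - y1)


def box_area(box: Sequence[Number]) -> Number:
    """Area of an (x1, y1, x2, y2) box, in squared coordinate units."""
    _, _, w, h = xyxy_to_xywh(box)
    return w * h
```

For example, `xyxy_to_xywh((10, 20, 50, 80))` returns `(10, 20, 40, 60)`, and `box_area((10, 20, 50, 80))` returns `2400`.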

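License

The dataset is released under the Apache 2.0 license.

Citation

If this dataset is useful in your research, please consider citing the following papers: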

@article{wang2023allseeing,
  title={The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World},
  author={Wang, Weiyun and Shi, Min and Li, Qingyun and Wang, Wenhai and Huang, Zhenhang and Xing, Linjie and Chen, Zhe and Li, Hao and Zhu, Xizhou and Cao, Zhiguo and others},
  journal={arXiv preprint arXiv:2308.01907},
  year={2023}
}
@article{wang2024allseeing_v2,
  title={The All-Seeing Project V2: Towards General Relation Comprehension of the Open World},
  author={Wang, Weiyun and Ren, Yiming and Luo, Haowen and Li, Tiantong and Yan, Chenxiang and Chen, Zhe and Wang, Wenhai and Li, Qingyun and Lu, Lewei and Zhu, Xizhou and others},
  journal={arXiv preprint arXiv:2402.19474},
  year={2024}
}

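Collected Summary
Dataset Introduction
Background and Challenges
Background Overview
AS-Core is the human-verified subset of the AS-1B dataset. It contains three types of annotation data (semantic tags, region visual question answering, and region captions) to support open-world panoptic visual recognition and understanding tasks. The dataset uses the x1y1x2y2 bounding box format and is released under the Apache 2.0 license.
The content above was collected and summarized by 遇见数据集.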