SPRIGHT-T2I/spright

Name: SPRIGHT-T2I/spright
Creator: SPRIGHT-T2I
Published: 2024-10-09 10:05:58
License: Not specified

Source: Hugging Face. Updated 2024-10-09; indexed 2024-04-19.

Download link:

https://hf-mirror.com/datasets/SPRIGHT-T2I/spright

Summary:

SPRIGHT (**SP**atially **RIGHT**) is the first large-scale vision-language dataset focused on spatial relationships. It was built by relabeling approximately 6 million images drawn from four widely used datasets: CC12M, Segment Anything, COCO Validation, and LAION Aesthetics. This dataset includes the relabeled CC12M and Segment Anything data; the COCO data is released separately. LAION images are not released because the parent images are currently private. Each sample consists of an image, its captions (a general caption and a spatial caption), and metadata (image width and height, source dataset, and original ID). The spatial captions were generated synthetically with LLaVA-1.5-13B and validated with FAITHScore, GPT-4(V), and human annotation.

Provided by:

SPRIGHT-T2I

Summary of original information

Dataset overview

Dataset name

SPRIGHT (SPatially RIGHT)

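Dataset description

SPRIGHT is the first large-scale vision-language dataset focused on spatial relationships. It was built by relabeling approximately 6 million images from four widely used datasets: CC12M, Segment Anything, COCO Validation, and LAION Aesthetics.

This repository contains the relabeled data from CC12M and Segment Anything. The COCO data is hosted separately. LAION images are not released because their parent images are currently private.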

Dataset sources

CC-12M

2.3 million images were relabeled. Images with a resolution below 768 were filtered out.
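The card does not say how "resolution below 768" is measured. Below is a minimal sketch of such a filter. It assumes the threshold applies to the shorter image side in pixels, which is an assumption and not the documented rule. `ImageRecord`, `keep_image`, and `filter_records` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass

MIN_RESOLUTION = 768  # threshold stated on the card


@dataclass
class ImageRecord:
    """Hypothetical record; the real CC12M pipeline's data types are not documented."""
    key: str
    width: int
    height: int


def keep_image(rec: ImageRecord, min_resolution: int = MIN_RESOLUTION) -> bool:
    # ASSUMPTION: "resolution" means the shorter side, so an image is kept
    # only if both width and height are at least `min_resolution` pixels.
    return min(rec.width, rec.height) >= min_resolution


def filter_records(records, min_resolution: int = MIN_RESOLUTION):
    """Return only the records that pass the resolution filter, in order."""
    return [r for r in records if keep_image(r, min_resolution)]
```

Consider `records = [ImageRecord("a", 1024, 768), ImageRecord("b", 640, 480), ImageRecord("c", 2000, 700)]`. Here `filter_records(records)` keeps only `"a"`. Replacing `min` with `max` in `keep_image` would implement the other plausible reading, where only the longer side must reach 768.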

Segment Anything

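3.5 million images were relabeled. Images containing blurred faces were filtered out with the OWLv2 object detector. General captions were generated with the CoCa model.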

Dataset structure

Sample composition

Each tar file contains 10,000 samples. Each sample consists of:

Image file: "{idx}.jpg"
Captions (general and spatial): "{idx}.json"
Metadata (image width, height, source dataset, and original ID): "{idx}.metadata.json"
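The file layout above matches the WebDataset convention: every file of a sample shares a key, and those files sit next to each other in the tar. Below is a minimal stdlib-only reader that assumes this adjacency, which this page does not state. `iter_samples` is a hypothetical helper. The field names inside the JSON files are not documented here, so the reader returns the parsed JSON as-is instead of indexing specific keys.

```python
import json
import tarfile


def iter_samples(tar_path):
    """Yield one dict per sample from a SPRIGHT-style tar shard.

    ASSUMPTION: WebDataset-style layout. All files of a sample are adjacent
    and share a key (the file name up to its first "."), so "42.jpg",
    "42.json", and "42.metadata.json" all belong to sample "42".
    """
    current_key, current = None, {}
    # "r|*" opens the tar as a forward-only stream, so a 10,000-sample
    # shard is never held in memory all at once.
    with tarfile.open(tar_path, mode="r|*") as tar:
        for member in tar:
            if not member.isfile():
                continue
            head, _, tail = member.name.rpartition("/")
            stem, _, ext = tail.partition(".")
            key = f"{head}/{stem}" if head else stem
            if key != current_key:
                # A new key means the previous sample is complete.
                if current:
                    yield {"key": current_key, **current}
                current_key, current = key, {}
            data = tar.extractfile(member).read()
            if ext == "jpg":
                current["image_bytes"] = data  # raw JPEG bytes, not decoded
            elif ext == "json":
                current["captions"] = json.loads(data)  # general + spatial captions
            elif ext == "metadata.json":
                current["metadata"] = json.loads(data)  # width, height, source, ID
            # Files with any other extension are ignored.
    if current:
        yield {"key": current_key, **current}
```

For example, `for s in iter_samples(path): print(s["key"], s.get("metadata"))` prints each sample's key and its parsed metadata. This page does not list the shard filenames, so `path` must point to a shard you have already downloaded. Dedicated loaders such as the `webdataset` library also read this layout. The stdlib version above simply avoids the extra dependency.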

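Dataset creation

Data generation

Synthetic spatial captions were generated with LLaVA-1.5-13B using a dedicated prompt, producing the SPRIGHT dataset.

Dataset validation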

FAITHScore: average accuracy of 88.9%.
GPT-4(V), small-scale evaluation: mean rating 6.41, median rating 7.0.
Crowdsourced human annotation of 3,000 images: average accuracy of 66.57%.
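These checks report different statistics: an accuracy (a fraction of judgements marked correct) and the mean and median of a rating. Below is a minimal sketch of how such summaries are computed with Python's `statistics` module. It uses made-up toy inputs, not SPRIGHT data. `summarize_ratings` and `accuracy` are hypothetical helper names. The toy ratings also illustrate the GPT-4(V) result: a mean below the median is consistent with a few low ratings pulling the average down.

```python
from statistics import mean, median


def summarize_ratings(ratings):
    """Return the (mean, median) of numeric ratings, e.g. per-image judge scores."""
    return mean(ratings), median(ratings)


def accuracy(judgements):
    """Fraction of judgements marked correct (True), e.g. per-caption human labels."""
    if not judgements:
        raise ValueError("need at least one judgement")
    return sum(judgements) / len(judgements)


# Made-up toy data: a single low score drags the mean below the median.
toy_ratings = [2, 7, 7, 8, 8]
print(summarize_ratings(toy_ratings))  # (6.4, 7)
print(accuracy([True, True, False, True]))  # 0.75
```

The median is robust to a small number of outliers, and the mean is not. Reporting both, as the GPT-4(V) evaluation does, shows whether the rating distribution is skewed.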