
Salesforce/blip3-grounding-50m

Hugging Face · Updated 2025-02-03 · Indexed 2024-12-14
Download link:
https://hf-mirror.com/datasets/Salesforce/blip3-grounding-50m
Resource summary:

The BLIP3-GROUNDING-50M dataset is designed to enhance the ability of Vision-Language Models (VLMs) to ground semantic concepts in visual features. This ability is crucial for tasks such as object detection, semantic segmentation, and understanding referring expressions (e.g., "the object to the left of the dog"). Traditional datasets often lack the granularity these tasks require, which makes it hard for models to accurately localize and interpret objects in complex visual scenes.

The dataset consists of 50 million images curated from the Datacomp-1B dataset, each annotated with detailed grounding information derived from state-of-the-art open-world image tagging and object detection models. Grounding information is provided in three formats, each capturing a different level of localization detail:

1. Bounding Box Coordinates: `<bbox>x1, y1, x2, y2</bbox>`
2. Textual Description: "Starts at (x1, y1) and extends up to (x2, y2)"
3. Positional Context: "Top-left corner of the image"

The primary goal of BLIP3-GROUNDING-50M is to improve the cross-modal reasoning capabilities of VLMs by supplying them with enriched visual and grounding data, and thereby to advance their performance on tasks that require precise object localization and semantic understanding within complex images.
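To make the three grounding formats concrete, here is a minimal Python sketch that renders one box in each format and parses `<bbox>` tags back out of a caption. It is an illustration under stated assumptions, not the dataset's own tooling: the helper names (`to_bbox_tag`, `to_text_description`, `to_positional_context`, `parse_bbox_tags`) are hypothetical, coordinates are assumed to be normalized to [0, 1], and the positional-context rule (a 3x3 grid over the box center) is an assumed scheme that merely reproduces the "Top-left corner of the image" style shown above.

```python
import re
from typing import List, Tuple

# Hypothetical box type: (x1, y1, x2, y2), ASSUMED normalized to [0, 1].
Box = Tuple[float, float, float, float]

# One number with optional surrounding spaces; four of them, comma-separated,
# inside <bbox>...</bbox> (format 1 as written in the dataset description).
_NUM = r"\s*(-?\d+(?:\.\d+)?)\s*"
BBOX_RE = re.compile("<bbox>" + ",".join([_NUM] * 4) + "</bbox>")


def to_bbox_tag(box: Box) -> str:
    """Format 1: bounding-box coordinates wrapped in <bbox> tags."""
    x1, y1, x2, y2 = box
    return f"<bbox>{x1:g}, {y1:g}, {x2:g}, {y2:g}</bbox>"


def to_text_description(box: Box) -> str:
    """Format 2: the same box expressed as a sentence."""
    x1, y1, x2, y2 = box
    return f"Starts at ({x1:g}, {y1:g}) and extends up to ({x2:g}, {y2:g})"


def to_positional_context(box: Box) -> str:
    """Format 3: coarse location of the box center on an ASSUMED 3x3 grid."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    row = "top" if cy < 1 / 3 else ("bottom" if cy >= 2 / 3 else "middle")
    col = "left" if cx < 1 / 3 else ("right" if cx >= 2 / 3 else "center")
    if row == "middle" and col == "center":
        return "Center of the image"
    if row == "middle":
        return f"{col.capitalize()} side of the image"
    if col == "center":
        return f"{row.capitalize()} of the image"
    return f"{row.capitalize()}-{col} corner of the image"


def parse_bbox_tags(text: str) -> List[Box]:
    """Recover every (x1, y1, x2, y2) tuple from <bbox> tags in a caption."""
    return [tuple(float(v) for v in m.groups()) for m in BBOX_RE.finditer(text)]


if __name__ == "__main__":
    box = (0.05, 0.1, 0.3, 0.35)
    print(to_bbox_tag(box))            # <bbox>0.05, 0.1, 0.3, 0.35</bbox>
    print(to_text_description(box))    # Starts at (0.05, 0.1) and extends up to (0.3, 0.35)
    print(to_positional_context(box))  # Top-left corner of the image
    print(parse_bbox_tags("A dog <bbox>0.05, 0.1, 0.3, 0.35</bbox>."))
```

The three renderings trade precision for readability: the tag keeps exact coordinates, the sentence keeps them in natural language, and the positional phrase keeps only a coarse region. Check the actual files for the coordinate convention before reusing this.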
Provided by:
Salesforce
Dataset Introduction
Background and Challenges
Background Overview
The BLIP3-GROUNDING-50M dataset is a comprehensive resource for training Vision-Language Models, offering 50 million images annotated with detailed grounding information to improve object localization and semantic understanding. It includes multi-granularity bounding boxes and is distributed as Parquet files for easy access and processing.
The above content was collected and summarized by 遇见数据集.