rp-yu/VPT_Datasets

Name: rp-yu/VPT_Datasets
Creator: rp-yu
Published: 2025-03-11 07:34:22
License: No description available

Source: Hugging Face. Updated: 2025-03-11. Indexed: 2025-08-30.

Download link:

https://hf-mirror.com/datasets/rp-yu/VPT_Datasets

Description:

The Visual Perception Token Datasets comprise the training and evaluation data for Visual Perception Tokens. The training set is built from the LLaVA-1.5 and visual-CoT datasets and covers four task types: Text/OCR-Related VQA, Spatial Reasoning, General VQA, and Fine-Grained VQA. The Text/OCR-Related VQA and Spatial Reasoning tasks supply training samples for the Region Selection Token, while the General VQA and Fine-Grained VQA tasks supply training samples for the DINO Feature Tokens. The evaluation data consists of the test splits of the training datasets plus three zero-shot datasets that are not included in training.
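The task-to-token pairing described above can be sketched as a small lookup table. This is only an illustration of the stated pairing: the names `TokenType`, `TASK_TO_TOKEN`, and `token_for_task` are hypothetical, and the string labels are taken from the description rather than from any field in the dataset files.

```python
from enum import Enum


class TokenType(Enum):
    """The two token families named in the description."""
    REGION_SELECTION = "Region Selection Token"
    DINO_FEATURE = "DINO Feature Token"


# Pairing as stated in the description. The keys are illustrative labels,
# not column names or config names from the actual dataset.
TASK_TO_TOKEN = {
    "Text/OCR-Related VQA": TokenType.REGION_SELECTION,
    "Spatial Reasoning": TokenType.REGION_SELECTION,
    "General VQA": TokenType.DINO_FEATURE,
    "Fine-Grained VQA": TokenType.DINO_FEATURE,
}


def token_for_task(task: str) -> TokenType:
    """Return the token family a task type contributes training samples to."""
    try:
        return TASK_TO_TOKEN[task]
    except KeyError:
        raise ValueError(f"unknown task type: {task!r}") from None
```

For example, `token_for_task("Spatial Reasoning")` returns `TokenType.REGION_SELECTION`, while an unrecognized label raises `ValueError`.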

Provider:

rp-yu
