RunsenXu/VSR

Name: RunsenXu/VSR
Creator: RunsenXu
Published: 2025-12-08 10:08:51
License: No description available

Source: Hugging Face. Updated 2025-12-08; indexed 2025-12-20.

Download link:

https://hf-mirror.com/datasets/RunsenXu/VSR

Description:

---
language:
- en
task_categories:
- question-answering
- visual-question-answering
pretty_name: VSR (Parquet)
dataset_info:
  features:
  - name: index
    dtype: string
  - name: question
    dtype: string
  - name: question_type
    dtype: string
  - name: answer
    dtype: string
  - name: image
    sequence:
      dtype: image
  - name: image_file
    sequence:
      dtype: string
  - name: id
    dtype: string
  - name: text
    dtype: string
  - name: gt_value
    dtype: bool
  - name: relation
    dtype: string
  - name: subj
    dtype: string
  - name: obj
    dtype: string
  splits:
  - name: test
configs:
- config_name: default
  data_files:
  - split: test
    path: VSR_Zero_Shot_Test.parquet
---

## VSR (Parquet + TSV)

This repo provides a Parquet-converted [VSR](https://github.com/cambridgeltl/visual-spatial-reasoning) dataset and a TSV formatted for vlmevalkit.

### Contents

- `VSR_Zero_Shot_Test.parquet`
  - Columns:
    - `question` (string): keeps the `<image>` placeholders from the original `text` and appends the options and post prompt (see below)
    - `question_type` (string)
    - `answer` (string; `"A"` for True, `"B"` for False)
    - `image` (list[image]): image bytes aligned with the `<image>` order
    - `id` (string)
    - `gt_value` (bool; original True/False)
    - `relation` (string)
    - `subj` (string)
    - `obj` (string)
    - `image_file` (list[string]; original image file names)
- `VSR_Zero_Shot_Test.tsv` (for vlmevalkit)
  - Columns:
    - `index` (string; from `id`)
    - `category` (string; from `question_type`)
    - `image` (string)
      - single image → base64 string
      - multiple images → JSON array string of base64 strings
      - no image → empty string
    - `question` (string)
    - `answer` (string; `"A"` or `"B"`)
    - `A` (string; literal `"True"`)
    - `B` (string; literal `"False"`)
    - other fields mirrored from jsonl: `id`, `question_type`, `relation`, `subj`, `obj`, `image_file`, etc.

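The three encodings of the TSV `image` column described above (single base64 string, JSON array string of base64 strings, empty string) can be handled by one small helper. The following is a minimal sketch, not the repo's own code: the function name `decode_tsv_images` is hypothetical, and it assumes a multi-image cell always starts with `[`, which is safe because `[` is not part of the base64 alphabet.

```python
import base64
import json


def decode_tsv_images(cell: str) -> list[bytes]:
    """Decode one TSV `image` cell into a list of raw image bytes.

    Hypothetical helper; covers the three cases described for the TSV:
    empty string, single base64 string, JSON array of base64 strings.
    """
    cell = cell.strip()
    if not cell:
        # No image: empty string.
        return []
    if cell.startswith("["):
        # Multiple images: JSON array string of base64 strings.
        return [base64.b64decode(item) for item in json.loads(cell)]
    # Single image: plain base64 string.
    return [base64.b64decode(cell)]


# Tiny self-check with placeholder bytes instead of real image data.
single_cell = base64.b64encode(b"img-0").decode("ascii")
multi_cell = json.dumps([single_cell, base64.b64encode(b"img-1").decode("ascii")])
single_images = decode_tsv_images(single_cell)
multi_images = decode_tsv_images(multi_cell)
no_images = decode_tsv_images("")
```

Returning a list in every case keeps the result aligned with the order of `<image>` placeholders in `question`, regardless of how many images a row carries.
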
### How we build `question` from the original VSR

Each original record contains:

```json
{"id": "...", "image": ["000000085637.jpg"], "text": "<image>\nThe bed is under the suitcase.", "gt_value": true, "question_type": "vsr", "relation": "under", "subj": "bed", "obj": "suitcase"}
```

We construct the final `question` as:

1) Take the original `text` (which already contains `<image>` placeholders).
2) Append the fixed options block:

```
Options:
A. True
B. False
```

3) Append the post prompt (default):

```
Is this statement True or False? Answer with the option's letter.
```

So, the final `question` looks like:

```
<image>
The bed is under the suitcase.
Options:
A. True
B. False
Is this statement True or False? Answer with the option's letter.
```

The `answer` is `"A"` if `gt_value` is `true`, otherwise `"B"`.

### Notes

- `<image>` placeholders are preserved in `question` and used to interleave images and text inside vlmevalkit prompts.
- Options (`A. True`, `B. False`) and the post prompt are embedded into `question`, so dataset consumers do not need to add choices externally.
- TSV uses base64-encoded images (string or JSON array string), while Parquet stores raw image bytes (`list[image]`).
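The construction of `question` and `answer` described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the script actually used to build the dataset: the names `build_question`, `build_answer`, `OPTIONS_BLOCK`, and `POST_PROMPT` are hypothetical, and joining the parts with a single newline is an assumption about the exact whitespace.

```python
# Fixed strings taken from the description above; the newline layout is assumed.
OPTIONS_BLOCK = "Options:\nA. True\nB. False"
POST_PROMPT = "Is this statement True or False? Answer with the option's letter."


def build_question(record: dict, post_prompt: str = POST_PROMPT) -> str:
    """Original `text` + options block + post prompt (hypothetical helper)."""
    # Assumption: parts are separated by a single newline.
    return "\n".join([record["text"], OPTIONS_BLOCK, post_prompt])


def build_answer(record: dict) -> str:
    """Map the boolean `gt_value` to the option letter."""
    return "A" if record["gt_value"] else "B"


# The example record shown in the description.
record = {
    "id": "...",
    "image": ["000000085637.jpg"],
    "text": "<image>\nThe bed is under the suitcase.",
    "gt_value": True,
    "question_type": "vsr",
    "relation": "under",
    "subj": "bed",
    "obj": "suitcase",
}

question = build_question(record)
answer = build_answer(record)
```

Because the options and post prompt are already baked into `question`, a consumer of the Parquet or TSV files would not need to call anything like this; the sketch only shows how a raw VSR record maps to the published fields.
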

Provider:

RunsenXu
