RunsenXu/VSR
Hugging Face | Updated 2025-12-08 | Indexed 2025-12-20
Download link:
https://hf-mirror.com/datasets/RunsenXu/VSR
Description:
---
language:
- en
task_categories:
- question-answering
- visual-question-answering
pretty_name: VSR (Parquet)
dataset_info:
  features:
  - name: index
    dtype: string
  - name: question
    dtype: string
  - name: question_type
    dtype: string
  - name: answer
    dtype: string
  - name: image
    sequence:
      dtype: image
  - name: image_file
    sequence:
      dtype: string
  - name: id
    dtype: string
  - name: text
    dtype: string
  - name: gt_value
    dtype: bool
  - name: relation
    dtype: string
  - name: subj
    dtype: string
  - name: obj
    dtype: string
  splits:
  - name: test
configs:
- config_name: default
  data_files:
  - split: test
    path: VSR_Zero_Shot_Test.parquet
---
## VSR (Parquet + TSV)
This repo provides a Parquet-converted [VSR](https://github.com/cambridgeltl/visual-spatial-reasoning) dataset and a TSV formatted for vlmevalkit.
### Contents
- `VSR_Zero_Shot_Test.parquet`
  - Columns:
    - `question` (string): keeps the `<image>` placeholders from the original `text` and appends the options and post prompt (see below)
    - `question_type` (string)
    - `answer` (string; `"A"` for True, `"B"` for False)
    - `image` (list[image]): image bytes aligned with the `<image>` order
    - `id` (string)
    - `gt_value` (bool; original True/False)
    - `relation` (string)
    - `subj` (string)
    - `obj` (string)
    - `image_file` (list[string]; original image file names)
- `VSR_Zero_Shot_Test.tsv` (for vlmevalkit)
  - Columns:
    - `index` (string; from `id`)
    - `category` (string; from `question_type`)
    - `image` (string)
      - single image → base64 string
      - multiple images → JSON array string of base64 strings
      - no image → empty string
    - `question` (string)
    - `answer` (string; `"A"` or `"B"`)
    - `A` (string; literal `"True"`)
    - `B` (string; literal `"False"`)
    - other fields mirrored from the source JSONL: `id`, `question_type`, `relation`, `subj`, `obj`, `image_file`, etc.
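The three cases for the TSV `image` cell can be illustrated with a small round-trip sketch. The helper names `encode_image_cell` and `decode_image_cell` are hypothetical and are not part of this repo or of vlmevalkit; the code only assumes the cell layout listed above.

```python
import base64
import json


def encode_image_cell(images: list[bytes]) -> str:
    """Encode raw image bytes into the TSV `image` cell layout (hypothetical helper)."""
    encoded = [base64.b64encode(img).decode("ascii") for img in images]
    if not encoded:
        return ""               # no image -> empty string
    if len(encoded) == 1:
        return encoded[0]       # single image -> bare base64 string
    return json.dumps(encoded)  # multiple images -> JSON array string


def decode_image_cell(cell: str) -> list[bytes]:
    """Invert encode_image_cell and return a list of raw image bytes."""
    if not cell:
        return []
    # The base64 alphabet has no '[', so a leading '[' marks a JSON array.
    if cell.startswith("["):
        return [base64.b64decode(item) for item in json.loads(cell)]
    return [base64.b64decode(cell)]
```

Returning a list in every case lets downstream code treat single-image and multi-image rows the same way.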
### How we build `question` from the original VSR
Each original record contains:
```json
{"id": "...", "image": ["000000085637.jpg"], "text": "<image>\nThe bed is under the suitcase.", "gt_value": true, "question_type": "vsr", "relation": "under", "subj": "bed", "obj": "suitcase"}
```
We construct the final `question` as:
1) Take the original `text` (which already contains `<image>` placeholders).
2) Append the fixed options block:
```
Options:
A. True
B. False
```
3) Append the post prompt (default):
```
Is this statement True or False? Answer with the option's letter.
```
So, the final `question` looks like:
```
<image>
The bed is under the suitcase.
Options:
A. True
B. False
Is this statement True or False? Answer with the option's letter.
```
The `answer` is `"A"` if `gt_value` is `true`, otherwise `"B"`.
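The construction steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the actual conversion script: the function names are hypothetical, and joining the three parts with single newlines is an assumption inferred from the example output shown above.

```python
OPTIONS_BLOCK = "Options:\nA. True\nB. False"
POST_PROMPT = "Is this statement True or False? Answer with the option's letter."


def build_question(text: str, post_prompt: str = POST_PROMPT) -> str:
    """Append the options block and the post prompt to the original `text`."""
    # Assumption: parts are separated by single newlines, as in the example.
    return "\n".join([text, OPTIONS_BLOCK, post_prompt])


def build_answer(gt_value: bool) -> str:
    """Map the original boolean label to an option letter."""
    return "A" if gt_value else "B"


# The sample record from this section (the elided `id` is kept as-is).
record = {
    "id": "...",
    "image": ["000000085637.jpg"],
    "text": "<image>\nThe bed is under the suitcase.",
    "gt_value": True,
    "question_type": "vsr",
    "relation": "under",
    "subj": "bed",
    "obj": "suitcase",
}

question = build_question(record["text"])
answer = build_answer(record["gt_value"])
```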
### Notes
- `<image>` placeholders are preserved in `question` and used to interleave images and text inside vlmevalkit prompts.
- Options (`A. True`, `B. False`) and the post prompt are embedded into `question`, so dataset consumers do not need to add choices externally.
- TSV uses base64-encoded images (string or JSON array string), while Parquet stores raw image bytes (`list[image]`).
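To make the interleaving note concrete, here is one way a consumer might split `question` at its `<image>` placeholders and pair each placeholder with the corresponding entry of `image`. The `interleave` function and the `{"type": ..., "value": ...}` segment layout are illustrative assumptions, not a description of vlmevalkit's internals.

```python
import re


def interleave(question: str, images: list) -> list[dict]:
    """Split `question` at `<image>` placeholders and attach images in order."""
    # The capture group keeps the placeholders in the split result.
    parts = re.split(r"(<image>)", question)
    if parts.count("<image>") != len(images):
        raise ValueError("number of <image> placeholders and images differ")

    image_iter = iter(images)
    segments = []
    for part in parts:
        if part == "<image>":
            segments.append({"type": "image", "value": next(image_iter)})
        elif part.strip():
            # Drop whitespace-only fragments, keep the rest as text.
            segments.append({"type": "text", "value": part.strip()})
    return segments
```

Checking the placeholder count up front surfaces rows where `image` and `question` are out of sync, rather than silently dropping an image.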
Provider:
RunsenXu



