RunsenXu/VSR

Source: Hugging Face. Updated 2025-12-08; indexed 2025-12-20.
Download link:
https://hf-mirror.com/datasets/RunsenXu/VSR
Description:
---
language:
- en
task_categories:
- question-answering
- visual-question-answering
pretty_name: VSR (Parquet)
dataset_info:
  features:
  - name: index
    dtype: string
  - name: question
    dtype: string
  - name: question_type
    dtype: string
  - name: answer
    dtype: string
  - name: image
    sequence:
      dtype: image
  - name: image_file
    sequence:
      dtype: string
  - name: id
    dtype: string
  - name: text
    dtype: string
  - name: gt_value
    dtype: bool
  - name: relation
    dtype: string
  - name: subj
    dtype: string
  - name: obj
    dtype: string
  splits:
  - name: test
configs:
- config_name: default
  data_files:
  - split: test
    path: VSR_Zero_Shot_Test.parquet
---

## VSR (Parquet + TSV)

This repo provides a Parquet-converted [VSR](https://github.com/cambridgeltl/visual-spatial-reasoning) dataset and a TSV formatted for vlmevalkit.

### Contents

- `VSR_Zero_Shot_Test.parquet`
  - Columns:
    - `question` (string): adds `<image>` placeholders (from the original `text`) and appends options + post prompt (see below)
    - `question_type` (string)
    - `answer` (string; `"A"` for True, `"B"` for False)
    - `image` (list[image]): image bytes aligned with the `<image>` order
    - `id` (string)
    - `gt_value` (bool; original True/False)
    - `relation` (string)
    - `subj` (string)
    - `obj` (string)
    - `image_file` (list[string]; original image file names)

- `VSR_Zero_Shot_Test.tsv` (for vlmevalkit)
  - Columns:
    - `index` (string; from `id`)
    - `category` (string; from `question_type`)
    - `image` (string)
      - single image → base64 string
      - multiple images → JSON array string of base64 strings
      - no image → empty string
    - `question` (string)
    - `answer` (string; `"A"` or `"B"`)
    - `A` (string; literal `"True"`)
    - `B` (string; literal `"False"`)
    - other fields mirrored from jsonl: `id`, `question_type`, `relation`, `subj`, `obj`, `image_file`, etc.
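
The three encodings of the TSV `image` column listed above can be handled by one small decoder. The following is a minimal sketch, not the repo's or vlmevalkit's own loader: the helper name `decode_image_field` is hypothetical, and it assumes a JSON array cell always starts with `[` (a character outside the standard base64 alphabet).

```python
import base64
import json
from typing import List


def decode_image_field(field: str) -> List[bytes]:
    """Decode a TSV `image` cell into a list of raw image bytes.

    Hypothetical helper covering the three documented cases:
    empty string, single base64 string, JSON array of base64 strings.
    """
    # No image: the cell is an empty string.
    if not field:
        return []
    # Multiple images: a JSON array string. '[' never occurs in
    # standard base64, so it is a safe discriminator (assumption).
    if field.lstrip().startswith("["):
        return [base64.b64decode(item) for item in json.loads(field)]
    # Single image: a plain base64 string.
    return [base64.b64decode(field)]


# Usage with placeholder payloads standing in for real image bytes.
one = base64.b64encode(b"img-1").decode("ascii")
two = base64.b64encode(b"img-2").decode("ascii")
single = decode_image_field(one)
multi = decode_image_field(json.dumps([one, two]))
empty = decode_image_field("")
```

Returning a list in every case keeps the result aligned with the Parquet `image` column, which is also a list.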

### How we build `question` from the original VSR

Each original record contains:

```json
{"id": "...", "image": ["000000085637.jpg"], "text": "<image>\nThe bed is under the suitcase.", "gt_value": true, "question_type": "vsr", "relation": "under", "subj": "bed", "obj": "suitcase"}
```

We construct the final `question` as:

1) Take the original `text` (which already contains `<image>` placeholders).
2) Append the fixed options block:
```
Options:
A. True
B. False
```
3) Append the post prompt (default):
```
Is this statement True or False? Answer with the option's letter.
```

So, the final `question` looks like:

```
<image>
The bed is under the suitcase.
Options:
A. True
B. False
Is this statement True or False? Answer with the option's letter.
```

The `answer` is `"A"` if `gt_value` is `true`, otherwise `"B"`.

### Notes

- `<image>` placeholders are preserved in `question` and used to interleave images and text inside vlmevalkit prompts.
- Options (`A. True`, `B. False`) and the post prompt are embedded into `question`, so dataset consumers do not need to add choices externally.
- TSV uses base64-encoded images (string or JSON array string), while Parquet stores raw image bytes (`list[image]`).
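
The `question` and `answer` construction described above can be sketched in a few lines of Python. This is an illustration rather than the actual conversion script: the function names are hypothetical, and joining the three parts with a single newline is an assumption, since the README does not state the exact separator.

```python
# Fixed options block and default post prompt, as quoted in the README.
OPTIONS_BLOCK = "Options:\nA. True\nB. False"
POST_PROMPT = "Is this statement True or False? Answer with the option's letter."


def build_question(text: str, post_prompt: str = POST_PROMPT) -> str:
    """Append the options block and post prompt to the original `text`.

    Assumption: the parts are separated by a single newline.
    """
    return "\n".join([text, OPTIONS_BLOCK, post_prompt])


def build_answer(gt_value: bool) -> str:
    """Map the original boolean label to the option letter."""
    return "A" if gt_value else "B"


# The example record from the README.
record = {
    "id": "...",
    "image": ["000000085637.jpg"],
    "text": "<image>\nThe bed is under the suitcase.",
    "gt_value": True,
    "question_type": "vsr",
    "relation": "under",
    "subj": "bed",
    "obj": "suitcase",
}

question = build_question(record["text"])
answer = build_answer(record["gt_value"])
print(question)
print(answer)
```

Because the options and post prompt are already embedded, a consumer can pass `question` to a model unchanged and compare the predicted letter with `answer`.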

Provider:
RunsenXu