lifuguan/SAT
Source: Hugging Face. Updated 2025-11-28; indexed 2025-12-20.
Download link:
https://hf-mirror.com/datasets/lifuguan/SAT
Resource description:
---
license: mit
configs:
- config_name: default
  data_files:
  - split: train
    path: "SAT_train.parquet"
  - split: static
    path: "SAT_static.parquet"
  - split: val
    path: "SAT_val.parquet"
  - split: test
    path: "SAT_test.parquet"
dataset_info:
  features:
  - name: image_bytes
    list:
      dtype: image
  - name: question
    dtype: string
  - name: answers
    list:
      dtype: string
  - name: question_type
    dtype: string
  - name: correct_answer
    dtype: string
task_categories:
- question-answering
size_categories:
- 100K<n<1M
---
# SAT: Spatial Aptitude Training for Multimodal Language Models
[Project Page](https://arijitray1993.github.io/SAT/)

To use the dataset, first make sure you have Python 3.10 and Hugging Face `datasets` version 3.0.2 (`pip install datasets==3.0.2`). The example also uses Pillow (`PIL`) to decode the images:
```python
import io

from datasets import load_dataset
from PIL import Image  # required by Image.open below

split = "val"
dataset = load_dataset("array/SAT", batch_size=128)
example = dataset[split][10]  # the item at index 10
# 'image_bytes' is a list of images: some questions use one image, some use two
images = [Image.open(io.BytesIO(im_bytes)) for im_bytes in example['image_bytes']]
question = example['question']
answer_choices = example['answers']
correct_answer = example['correct_answer']
```
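
Once an example is loaded, its `question`, `answers`, and `correct_answer` fields can be turned into a multiple-choice prompt and scored. The sketch below shows one possible way to do that. The lettered layout and the helper names `format_prompt` and `is_correct` are illustrative assumptions, not the authors' evaluation protocol, and the record is made up so that the snippet runs without downloading anything.

```python
import string


def format_prompt(example: dict) -> str:
    """Render the question and its answer choices as a lettered prompt."""
    lines = [example["question"]]
    for letter, choice in zip(string.ascii_uppercase, example["answers"]):
        lines.append(f"{letter}. {choice}")
    return "\n".join(lines)


def is_correct(example: dict, prediction: str) -> bool:
    """Compare a predicted answer with `correct_answer`, ignoring case and outer whitespace."""
    return prediction.strip().lower() == example["correct_answer"].strip().lower()


# Made-up record with the same text fields as a SAT example (values are hypothetical).
sample = {
    "question": "Is the chair to the left or right of the table?",
    "answers": ["left", "right"],
    "correct_answer": "left",
}
print(format_prompt(sample))
print(is_correct(sample, " Left "))
```

The string comparison here is deliberately simple; a stricter or more lenient matching rule may suit a given model's output format better.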
The available `split` choices are:
- `train`: (175K image QA pairs) Train split of SAT data that includes both static relationships and dynamic spatial QAs involving object and scene motion. For motion-based questions, there are two images.
- `static`: (127K image QA pairs) Train split of SAT data that includes _only_ static QAs. Always has one image only.
- `val`: (4K image QA pairs) Synthetic validation split.
- `test`: (150 image QA pairs) Real-image dynamic test set.
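
The split descriptions state that motion-based questions come with two images and static ones with a single image, so the length of `image_bytes` is a quick way to inspect the mix in a split. A minimal sketch of such a tally, together with a count per `question_type`, is given here. `summarize` is a hypothetical helper, and the rows are made-up stand-ins (including the `question_type` values) rather than real data.

```python
from collections import Counter


def summarize(examples):
    """Tally examples by number of images and by question_type."""
    image_counts = Counter(len(ex["image_bytes"]) for ex in examples)
    type_counts = Counter(ex["question_type"] for ex in examples)
    return image_counts, type_counts


# Stand-in rows; the byte strings and question_type values are hypothetical.
rows = [
    {"image_bytes": [b"img0"], "question_type": "type_a"},
    {"image_bytes": [b"img1", b"img2"], "question_type": "type_b"},
    {"image_bytes": [b"img3"], "question_type": "type_a"},
]
image_counts, type_counts = summarize(rows)
print(dict(image_counts))
print(dict(type_counts))
```

On a loaded split, passing the split itself in place of `rows` should work the same way, at the cost of reading every row.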
If you find this data useful, please consider citing:
```
@misc{ray2025satdynamicspatialaptitude,
title={SAT: Dynamic Spatial Aptitude Training for Multimodal Language Models},
author={Arijit Ray and Jiafei Duan and Ellis Brown and Reuben Tan and Dina Bashkirova and Rose Hendrix and Kiana Ehsani and Aniruddha Kembhavi and Bryan A. Plummer and Ranjay Krishna and Kuo-Hao Zeng and Kate Saenko},
year={2025},
eprint={2412.07755},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2412.07755},
}
```
Provider:
lifuguan



