ModelScope dataset page: https://modelscope.cn/datasets/AI-ModelScope/SPARK (indexed 2024-08-31, updated 2025-11-12)
# ⚡ SPARK (multi-vision Sensor Perception And Reasoning benchmarK)
[**🌐 github**](https://github.com/top-yun/SPARK) | [**🤗 Dataset**](https://huggingface.co/datasets/topyun/SPARK) | [**📃 Paper**](https://arxiv.org/abs/2408.12114)
## Dataset Details
<p align="center">
<img src="https://raw.githubusercontent.com/top-yun/SPARK/main/resources/examples.png" width="800px">
</p>
SPARK is a benchmark designed to narrow the information gap between ordinary images and multi-vision sensor data. We automatically generated 6,248 vision-language test samples to evaluate multi-vision sensory perception and multi-vision sensory reasoning, which tests proficiency in physical sensor knowledge. The samples span several question formats and cover different types of sensor-related questions.
## Uses
You can download the dataset with the Hugging Face `datasets` library. Note that the benchmark samples are loaded from the `train` split:
```python
from datasets import load_dataset
test_dataset = load_dataset("topyun/SPARK", split="train")
```
We also provide two example evaluation scripts: one for open models ([**test.py**](https://github.com/top-yun/SPARK/blob/main/test.py)) and one for closed models ([**test_closed_models.py**](https://github.com/top-yun/SPARK/blob/main/test_closed_models.py)). You can run them as shown below.
If you have 4 GPUs and want to run the experiment with llava-1.5-7b, you can do the following:
```bash
accelerate launch --config_file utils/ddp_accel_fp16.yaml \
    --num_processes=4 \
    test.py \
    --batch_size 1 \
    --model llava
```
When running a closed model, make sure to insert your API key into the [**config.py**](https://github.com/top-yun/SPARK/blob/main/config.py) file.
If you have 1 GPU and want to run the experiment with gpt-4o, you can do the following:
```bash
# Number of GPUs to use (1 in this example)
n_gpu=1
accelerate launch --config_file utils/ddp_accel_fp16.yaml \
    --num_processes=$n_gpu \
    test_closed_models.py \
    --batch_size 8 \
    --model gpt \
    --multiprocess True
```
### Tips
The evaluation method we've implemented simply checks whether 'A', 'B', 'C', 'D', 'yes', or 'no' appears at the beginning of the model's response.
If the model you're evaluating starts its answer with other text (e.g., "'B'ased on ..." or "'C'onsidering ..."), the first letter can be mistaken for an answer choice. You can resolve this by adding "Do not include any additional text." at the end of the prompt.
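The failure mode described above can be illustrated with a minimal sketch. This is not the code from `test.py`; the helper names `build_prompt`, `lenient_match`, and `strict_match` are hypothetical, and the stricter regex check is only one possible alternative to changing the prompt.

```python
import re

# Suffix suggested in the Tips; the helper names below are hypothetical.
SUFFIX = "Do not include any additional text."


def build_prompt(question: str) -> str:
    """Append the instruction that discourages extra text in the answer."""
    return f"{question.rstrip()} {SUFFIX}"


def lenient_match(response: str, answer: str) -> bool:
    """Prefix check in the spirit of the Tips: does the response start with the answer?"""
    text = response.strip().strip("'\"")
    return text.lower().startswith(answer.lower())


def strict_match(response: str, answer: str) -> bool:
    """Stricter variant: the leading label must not be followed by another letter."""
    text = response.strip().strip("'\"")
    # 'yes'/'no' are matched case-insensitively; choice letters must be uppercase.
    m = re.match(r"((?i:yes|no)|[A-D])(?![A-Za-z])", text)
    return m is not None and m.group(1).lower() == answer.lower()


if __name__ == "__main__":
    reply = "Based on the image, the answer is A"
    print(lenient_match(reply, "B"))  # the prefix check accepts this reply as 'B'
    print(strict_match(reply, "B"))   # the stricter check rejects it
    print(build_prompt("Which sensor captured this image?"))
```

Appending the suffix to the prompt keeps the evaluation logic unchanged, whereas a stricter matcher changes how responses are scored, so results from the two approaches may not be directly comparable.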
### Source Data
#### Data Collection and Processing
These instructions are built from five public datasets: [MS-COCO](https://arxiv.org/abs/1405.0312), [M3FD](https://arxiv.org/abs/2203.16220v1), [Dog&People](https://public.roboflow.com/object-detection/thermal-dogs-and-people), [RGB-D scene dataset](https://arxiv.org/abs/2110.11590), and [UNIFESP X-ray Body Part Classifier Competition dataset](https://www.kaggle.com/competitions/unifesp-x-ray-body-part-classifier).
## Citation
**BibTeX:**
```bibtex
@misc{yu2024sparkmultivisionsensorperception,
title={SPARK: Multi-Vision Sensor Perception and Reasoning Benchmark for Large-scale Vision-Language Models},
author={Youngjoon Yu and Sangyun Chung and Byung-Kwan Lee and Yong Man Ro},
year={2024},
eprint={2408.12114},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2408.12114},
}
```
## Contact
[SangYun Chung](https://sites.google.com/view/sang-yun-chung/profile): jelarum@kaist.ac.kr
Provider: maas
Created: 2024-08-27



