SPARK

Name: SPARK
Creator: maas
Published: 2025-11-12 16:16:42
License: No description available

ModelScope community | Updated 2025-11-12 | Indexed 2024-08-31

Download link:

https://modelscope.cn/datasets/AI-ModelScope/SPARK


Resource description:

# ⚡ SPARK (multi-vision Sensor Perception And Reasoning benchmarK)

[**🌐 github**](https://github.com/top-yun/SPARK) | [**🤗 Dataset**](https://huggingface.co/datasets/topyun/SPARK) | [**📃 Paper**](https://arxiv.org/abs/2408.12114)

## Dataset Details

<p align="center">
  <img src="https://raw.githubusercontent.com/top-yun/SPARK/main/resources/examples.png" :height="400px" width="800px">
</p>

SPARK is designed to reduce the fundamental information gap between images and multi-vision sensors. We automatically generated 6,248 vision-language test samples to investigate multi-vision sensory perception and multi-vision sensory reasoning. The samples probe physical sensor knowledge proficiency across different formats and cover different types of sensor-related questions.

## Uses

You can download the dataset as follows:

```python
from datasets import load_dataset

# Load the SPARK samples from the Hugging Face Hub
test_dataset = load_dataset("topyun/SPARK", split="train")
```

We also provide two example evaluation scripts: one for open models ([**test.py**](https://github.com/top-yun/SPARK/blob/main/test.py)) and one for closed models ([**test_closed_models.py**](https://github.com/top-yun/SPARK/blob/main/test_closed_models.py)). You can run them as shown below.

If you have 4 GPUs and want to run the experiment with llava-1.5-7b, you can do the following:

```bash
accelerate launch --config_file utils/ddp_accel_fp16.yaml \
    --num_processes=4 \
    test.py \
    --batch_size 1 \
    --model llava
```

When running a closed model, make sure to insert your API key into the [**config.py**](https://github.com/top-yun/SPARK/blob/main/config.py) file.

If you have 1 GPU and want to run the experiment with gpt-4o, you can do the following (set `n_gpu` to the number of available GPUs, 1 in this case):

```bash
accelerate launch --config_file utils/ddp_accel_fp16.yaml \
    --num_processes=$n_gpu \
    test_closed_models.py \
    --batch_size 8 \
    --model gpt \
    --multiprocess True
```

### Tips

The evaluation method we've implemented simply checks whether 'A', 'B', 'C', 'D', 'yes', or 'no' appears at the beginning of the model's response. If the model you're evaluating starts its answer with other text that happens to begin with one of these strings (e.g., "'B'ased on ..." or "'C'onsidering ..."), the response is scored against the wrong option. You can resolve this by adding "Do not include any additional text." at the end of the prompt.

### Source Data

#### Data Collection and Processing

These instructions are built from five public datasets: [MS-COCO](https://arxiv.org/abs/1405.0312), [M3FD](https://arxiv.org/abs/2203.16220v1), [Dog&People](https://public.roboflow.com/object-detection/thermal-dogs-and-people), [RGB-D scene dataset](https://arxiv.org/abs/2110.11590), and [UNIFESP X-ray Body Part Classifier Competition dataset](https://www.kaggle.com/competitions/unifesp-x-ray-body-part-classifier).

## Citation

**BibTeX:**

```bibtex
@misc{yu2024sparkmultivisionsensorperception,
  title={SPARK: Multi-Vision Sensor Perception and Reasoning Benchmark for Large-scale Vision-Language Models},
  author={Youngjoon Yu and Sangyun Chung and Byung-Kwan Lee and Yong Man Ro},
  year={2024},
  eprint={2408.12114},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2408.12114},
}
```

## Contact

[SangYun Chung](https://sites.google.com/view/sang-yun-chung/profile): jelarum@kaist.ac.kr
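
To illustrate the answer check described under Tips, here is a minimal sketch of a prefix-based matcher and of the suggested prompt suffix. It is not taken from `test.py`. The names `leading_label` and `with_suffix` are hypothetical, and the case handling (letters matched as uppercase, yes/no matched case-insensitively) is an assumption made for this example.

```python
from typing import Optional

# Assumption: the suffix text is the one quoted in the Tips section.
SUFFIX = "Do not include any additional text."


def leading_label(response: str) -> Optional[str]:
    """Return the answer label the response starts with, or None.

    Hypothetical helper: mimics a check that only looks at the
    beginning of the response, as the Tips section describes.
    """
    text = response.strip()
    lowered = text.lower()
    # yes/no answers, matched case-insensitively (assumption)
    for label in ("yes", "no"):
        if lowered.startswith(label):
            return label
    # multiple-choice letters, matched as uppercase (assumption)
    for label in ("A", "B", "C", "D"):
        if text.startswith(label):
            return label
    return None


def with_suffix(prompt: str) -> str:
    """Append the instruction that discourages extra text in the answer."""
    return f"{prompt.rstrip()} {SUFFIX}"


# A verbose answer is misread: the intended option is C, but the
# response starts with "B" (from "Based").
print(leading_label("Based on the thermal image, the answer is C"))  # B
print(leading_label("C"))  # C
print(with_suffix("Which sensor captured this image?"))
```

The first call shows why a prefix-only check is fragile for chatty models, and why constraining the output through the prompt is the simplest fix when the scoring code itself is left unchanged.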


Provider:

maas

Created:

2024-08-27
