SpaceJudgeDataset
ModelScope community. Updated 2025-12-05; indexed 2025-10-11.
Download link:
https://modelscope.cn/datasets/remyxai/SpaceJudgeDataset
Resource description:

# SpaceJudge Dataset
The SpaceJudge Dataset uses [prometheus-vision](https://github.com/prometheus-eval/prometheus-vision) to apply
a rubric that assesses the quality of responses to spatial VQA queries on a 1-5 Likert scale, prompting
[SpaceLLaVA](https://huggingface.co/remyxai/SpaceLLaVA) to act as the judge (VLM-as-a-Judge).
[Open in Colab](https://colab.research.google.com/drive/1zOxSpMIjfWM6desF5Ai-iIk1szhlurUW?usp=sharing)
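
Prometheus-style judges typically write their feedback first and end with a `[RESULT]` marker followed by the integer score. The sketch below assumes that output format (this card does not confirm the exact format stored in the dataset) and uses a hypothetical helper name, `parse_judge_score`, to pull the 1-5 Likert score out of a judge response while rejecting anything off the scale.

```python
import re
from typing import Optional

# Assumption: the judge output ends with "[RESULT] <score>", as in Prometheus-style
# feedback. The exact format used in the SpaceJudge Dataset may differ.
_RESULT_PATTERN = re.compile(r"\[RESULT\]\s*([0-9]+)")


def parse_judge_score(judge_output: str) -> Optional[int]:
    """Return the 1-5 Likert score from a judge response, or None if absent or invalid."""
    matches = _RESULT_PATTERN.findall(judge_output)
    if not matches:
        return None
    # Use the last marker, in case the feedback text quotes an earlier one.
    score = int(matches[-1])
    return score if 1 <= score <= 5 else None


if __name__ == "__main__":
    example = "Feedback: The response correctly estimates the distance between the chairs. [RESULT] 4"
    print(parse_judge_score(example))
```

Returning `None` instead of raising keeps malformed judge outputs easy to filter out when building a training set.
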
The assessments are made on images from the [OpenSpaces](https://huggingface.co/datasets/remyxai/OpenSpaces) dataset in order to
distill the 13B VLM judge into smaller models such as [Florence-2](https://huggingface.co/collections/microsoft/florence-6669f44df0d87d9c3bfb76de)
by introducing a new `<JUDGE>` task.
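
Florence-2 selects its behavior with a task token at the start of the text prompt (for example `<CAPTION>` or `<OD>`), so a new `<JUDGE>` task can be framed as a text-to-text pair conditioned on the image. The sketch below is a hypothetical illustration of that framing, not the authors' training code: the `JudgeRecord` fields, the prompt wording, and the `to_seq2seq_pair` helper are all assumptions, not the dataset's actual schema.

```python
from dataclasses import dataclass
from typing import Tuple

TASK_TOKEN = "<JUDGE>"


# Hypothetical record layout: these field names are illustrative and are not
# taken from the SpaceJudge Dataset schema.
@dataclass
class JudgeRecord:
    question: str  # spatial VQA question about an OpenSpaces image
    response: str  # candidate answer being assessed
    feedback: str  # judge's written rationale
    score: int     # 1-5 Likert score assigned by the 13B judge


def to_seq2seq_pair(record: JudgeRecord) -> Tuple[str, str]:
    """Build a (prompt, target) text pair for fine-tuning a smaller VLM on a <JUDGE> task.

    The image is passed to the model separately; only the text side is built here.
    """
    if not 1 <= record.score <= 5:
        raise ValueError(f"score must be on the 1-5 Likert scale, got {record.score}")
    prompt = f"{TASK_TOKEN} Question: {record.question} Response: {record.response}"
    # Assumption: keep the judge's feedback-then-score ordering in the target.
    target = f"{record.feedback} [RESULT] {record.score}"
    return prompt, target


if __name__ == "__main__":
    rec = JudgeRecord(
        question="How far is the lamp from the sofa?",
        response="About 1.5 meters.",
        feedback="The estimate is plausible given the scene.",
        score=4,
    )
    print(to_seq2seq_pair(rec))
```

Placing the score at the end of the target lets the smaller model imitate the judge's rationale before committing to a number, and the score can still be recovered with a simple pattern match.
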
## Citations
```
@misc{lee2024prometheusvision,
title={Prometheus-Vision: Vision-Language Model as a Judge for Fine-Grained Evaluation},
author={Seongyun Lee and Seungone Kim and Sue Hyun Park and Geewook Kim and Minjoon Seo},
year={2024},
eprint={2401.06591},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```

Provided by:
maas
Created:
2025-10-09



