WangYipu2002/CrossPoint-Bench

Name: WangYipu2002/CrossPoint-Bench
Creator: WangYipu2002
Published: 2025-12-07 06:21:28
License: No description available

Hugging Face · Updated 2025-12-07 · Indexed 2025-12-20

Download link:

https://hf-mirror.com/datasets/WangYipu2002/CrossPoint-Bench

Description:

---
task_categories:
- visual-question-answering
- image-to-text
language:
- en
size_categories:
- 1K<n<10K
pretty_name: CrossPoint-Bench
tags:
- cross-view
- spatial-understanding
configs:
- config_name: default
  data_files: CrossPoint-Bench.jsonl
---

# CrossPoint-Bench

[![arXiv](https://img.shields.io/badge/arXiv-2512.04686-b31b1b.svg?logo=arxiv&logoColor=white)](https://arxiv.org/abs/2512.04686)
[![GitHub](https://img.shields.io/badge/GitHub-WangYipu2002/CrossPoint-181717.svg?logo=github&logoColor=white)](https://github.com/WangYipu2002/CrossPoint)

<p align="center">
  <img src="CrossPoint-Bench.png" alt="CrossPoint-Bench Overview" width="70%">
</p>

**CrossPoint-Bench** is a comprehensive benchmark for evaluating Vision-Language Models (VLMs) on cross-view point correspondence tasks. It assesses models' spatial understanding and their ability to establish correspondences between different viewpoints.

### Dataset Structure

```
CrossPoint-Bench/
├── CrossPoint-Bench.jsonl        # Main benchmark data file
└── image/
    ├── origin_image/             # Original scene images organized by scene ID
    │   ├── scene0000_02/
    │   ├── 0bd6b209/
    │   └── ...
    └── visual_image/             # Annotated visualization images
        ├── scene0000_02/
        ├── 0bd6b209/
        └── ...
```

### Data Fields

Each instance in `CrossPoint-Bench.jsonl` contains:

- `idx`: Unique identifier
- `type`: Task type (e.g., "Correspondence-Pointing", "Fine-grained Grounding")
- `images`: List of image paths
- `question`: Question text
- `answer`: Ground truth answer

### Task Types

CrossPoint-Bench contains **1,000 samples** across **4 task types**:

1. **Fine-grained Grounding** (161 samples): Input an image and an instruction; output the coordinates of the referred target.
2. **Visibility Reasoning** (220 samples): Input an image and a point; output whether the point is visible in another view.
3. **Correspondence-Judgement** (156 samples): Input an image and a point; select the correct correspondence from multiple candidates in another view.
4. **Correspondence-Pointing** (463 samples): Input an image and a point; predict the exact coordinates of the corresponding point in another view.

### Evaluation

For evaluation scripts and detailed instructions, please visit the [CrossPoint GitHub repository](https://github.com/WangYipu2002/CrossPoint).

## Citation

If you find CrossPoint-Bench useful for your research, please cite:

```bibtex
@article{wang2025crosspoint,
  title={Towards Cross-View Point Correspondence in Vision-Language Models},
  author={Wang, Yipu and Ji, Yuheng and Liu, Yuyang and Zhou, Enshen and Yang, Ziqiang and Tian, Yuxuan and Qin, Ziheng and Liu, Yue and Tan, Huajie and Chi, Cheng and Ma, Zhiyuan and Zeng, Daniel Dajun and Zheng, Xiaolong},
  journal={arXiv preprint arXiv:2512.04686},
  year={2025}
}
```

## Contact

For questions or issues, please open an issue on our [GitHub repository](https://github.com/WangYipu2002/CrossPoint).
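
### Example: Reading the JSONL File

As a rough illustration of the data fields listed in the card above, the sketch below parses JSON Lines records and tallies them by `type`. It uses only the Python standard library. The helper names `parse_jsonl` and `count_by_type` are hypothetical, and the two sample records are invented placeholders rather than real benchmark entries; the exact format of `answer` is not specified in the card, so it is treated as an opaque value here.

```python
import json
from collections import Counter
from typing import Dict, Iterable, List

# Field names as documented in the dataset card.
EXPECTED_FIELDS = ("idx", "type", "images", "question", "answer")


def parse_jsonl(lines: Iterable[str]) -> List[Dict]:
    """Parse JSON Lines text into a list of records, skipping blank lines."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # Fail early if a record lacks one of the documented fields.
        missing = [name for name in EXPECTED_FIELDS if name not in record]
        if missing:
            raise KeyError(f"record is missing fields: {missing}")
        records.append(record)
    return records


def count_by_type(records: Iterable[Dict]) -> Counter:
    """Count how many records belong to each task type."""
    return Counter(record["type"] for record in records)


if __name__ == "__main__":
    # Placeholder records for illustration only (not taken from the benchmark).
    sample_lines = [
        '{"idx": 0, "type": "Visibility Reasoning", "images": ["a.jpg", "b.jpg"], '
        '"question": "placeholder question", "answer": "placeholder answer"}',
        '{"idx": 1, "type": "Correspondence-Pointing", "images": ["c.jpg", "d.jpg"], '
        '"question": "placeholder question", "answer": "placeholder answer"}',
    ]
    for task_type, n in count_by_type(parse_jsonl(sample_lines)).most_common():
        print(f"{task_type}: {n}")
```

To run this against the real file, pass an open file handle instead of the sample list, for example `parse_jsonl(open("CrossPoint-Bench.jsonl", encoding="utf-8"))`, assuming the file has been downloaded to the working directory. If the card's counts hold, the tally should sum to 1,000 across the four task types.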

Provider:

WangYipu2002
