
WangYipu2002/CrossPoint-Bench

Source: Hugging Face. Updated 2025-12-07; indexed 2025-12-20.
Download link:
https://hf-mirror.com/datasets/WangYipu2002/CrossPoint-Bench
Resource description:
---
task_categories:
- visual-question-answering
- image-to-text
language:
- en
size_categories:
- 1K<n<10K
pretty_name: CrossPoint-Bench
tags:
- cross-view
- spatial-understanding
configs:
- config_name: default
  data_files: CrossPoint-Bench.jsonl
---

# CrossPoint-Bench

[![arXiv](https://img.shields.io/badge/arXiv-2512.04686-b31b1b.svg?logo=arxiv&logoColor=white)](https://arxiv.org/abs/2512.04686)
[![GitHub](https://img.shields.io/badge/GitHub-WangYipu2002/CrossPoint-181717.svg?logo=github&logoColor=white)](https://github.com/WangYipu2002/CrossPoint)

<p align="center">
  <img src="CrossPoint-Bench.png" alt="CrossPoint-Bench Overview" width="70%">
</p>

**CrossPoint-Bench** is a comprehensive benchmark for evaluating Vision-Language Models (VLMs) on cross-view point correspondence tasks. It assesses a model's spatial understanding and its ability to establish correspondences between different viewpoints.

### Dataset Structure

```
CrossPoint-Bench/
├── CrossPoint-Bench.jsonl      # Main benchmark data file
└── image/
    ├── origin_image/           # Original scene images organized by scene ID
    │   ├── scene0000_02/
    │   ├── 0bd6b209/
    │   └── ...
    └── visual_image/           # Annotated visualization images
        ├── scene0000_02/
        ├── 0bd6b209/
        └── ...
```

### Data Fields

Each instance in `CrossPoint-Bench.jsonl` contains:

- `idx`: Unique identifier
- `type`: Task type (e.g., "Correspondence-Pointing", "Fine-grained Grounding")
- `images`: List of image paths
- `question`: Question text
- `answer`: Ground truth answer

### Task Types

CrossPoint-Bench contains **1,000 samples** across **4 task types**:

1. **Fine-grained Grounding** (161 samples): Input an image and an instruction; output the coordinates of the referred target.
2. **Visibility Reasoning** (220 samples): Input an image and a point; output whether the point is visible in another view.
3. **Correspondence-Judgement** (156 samples): Input an image and a point; select the correct correspondence from multiple candidates in another view.
4. **Correspondence-Pointing** (463 samples): Input an image and a point; predict the exact coordinates of the corresponding point in another view.

### Evaluation

For evaluation scripts and detailed instructions, please visit the [CrossPoint GitHub repository](https://github.com/WangYipu2002/CrossPoint).

## Citation

If you find CrossPoint-Bench useful for your research, please cite:

```bibtex
@article{wang2025crosspoint,
  title={Towards Cross-View Point Correspondence in Vision-Language Models},
  author={Wang, Yipu and Ji, Yuheng and Liu, Yuyang and Zhou, Enshen and Yang, Ziqiang and Tian, Yuxuan and Qin, Ziheng and Liu, Yue and Tan, Huajie and Chi, Cheng and Ma, Zhiyuan and Zeng, Daniel Dajun and Zheng, Xiaolong},
  journal={arXiv preprint arXiv:2512.04686},
  year={2025}
}
```

## Contact

For questions or issues, please open an issue on our [GitHub repository](https://github.com/WangYipu2002/CrossPoint).
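
### Example: reading the benchmark file

The record layout listed under Data Fields can be read with the Python standard library alone. The sketch below is not taken from the CrossPoint repository: the helper names (`load_records`, `load_file`, `count_by_type`) are hypothetical, the sample records are made-up placeholders that only illustrate the field names, and it assumes that the `type` strings in the file match the four task-type names in this card exactly.

```python
import json
from collections import Counter

# Per-type sample counts as stated in the dataset card.
# Assumption: the `type` strings in the file match these names exactly.
EXPECTED_COUNTS = {
    "Fine-grained Grounding": 161,
    "Visibility Reasoning": 220,
    "Correspondence-Judgement": 156,
    "Correspondence-Pointing": 463,
}


def load_records(lines):
    """Parse JSON Lines input: one JSON object per non-empty line."""
    return [json.loads(line) for line in lines if line.strip()]


def load_file(path="CrossPoint-Bench.jsonl"):
    """Read the benchmark file from a local copy of the dataset."""
    with open(path, encoding="utf-8") as f:
        return load_records(f)


def count_by_type(records):
    """Count records per task type using the `type` field."""
    return Counter(record["type"] for record in records)


# Made-up placeholder records; real image paths, questions, and answer
# formats in the benchmark will differ.
SAMPLE_LINES = [
    '{"idx": 0, "type": "Visibility Reasoning", "images": ["a.png", "b.png"], '
    '"question": "placeholder", "answer": "placeholder"}',
    '{"idx": 1, "type": "Correspondence-Pointing", "images": ["c.png", "d.png"], '
    '"question": "placeholder", "answer": "placeholder"}',
    '{"idx": 2, "type": "Correspondence-Pointing", "images": ["e.png", "f.png"], '
    '"question": "placeholder", "answer": "placeholder"}',
]

sample_counts = count_by_type(load_records(SAMPLE_LINES))
```

With a local copy of the dataset, `count_by_type(load_file())` can be compared against `EXPECTED_COUNTS` as a quick sanity check that the download is complete. If the comparison fails, the first thing to verify is whether the `type` strings are spelled as assumed here.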
Provided by:
WangYipu2002