WangYipu2002/CrossPoint-378K

Name: WangYipu2002/CrossPoint-378K
Creator: WangYipu2002
Published: 2025-12-08 15:39:42
License: No description provided

Source: Hugging Face; updated 2025-12-08; indexed 2025-12-20

Download link:

https://hf-mirror.com/datasets/WangYipu2002/CrossPoint-378K

Description:

---
language:
- en
license: mit
size_categories:
- 100K<n<1M
task_categories:
- image-to-text
- visual-question-answering
tags:
- cross-view
pretty_name: CrossPoint-378K
---

# CrossPoint-378K Dataset

[![arXiv](https://img.shields.io/badge/arXiv-2512.04686-b31b1b.svg?logo=arxiv&logoColor=white)](https://arxiv.org/abs/2512.04686) [![GitHub](https://img.shields.io/badge/GitHub-WangYipu2002/CrossPoint-181717.svg?logo=github&logoColor=white)](https://github.com/WangYipu2002/CrossPoint)

## Overview

CrossPoint-378K is a large-scale dataset for cross-view point correspondence. It contains 378K training samples designed to improve the ability of vision-language models to establish point correspondences across views.

<p align="center">
  <img src="CrossPoint-378K.png" width="80%">
</p>

## Dataset Structure

The dataset has the following file structure:

```
CrossPoint-378K/
├── CrossPoint-378K.json     # Main data file (ShareGPT format)
├── image/                   # Original images directory
│   └── [scene_id]/          # Scene ID directory
│       └── [images]         # Scene images
└── visual_image/            # Annotated images directory
    └── [scene_id]/          # Scene ID directory
        └── [images]         # Annotated images with visual markers
```

## Data Format

The dataset follows the **ShareGPT format** with the following structure:

### JSON Format Example

```json
{
  "type": "single_spatial_understanding",
  "images": [
    "CrossPoint-378K/image/00a231a370/DSC05031.JPG"
  ],
  "messages": [
    {
      "content": "<image>\nWhat does the point at [56, 323] refer to?",
      "role": "user"
    },
    {
      "content": "It corresponds to the white window handle in the image.",
      "role": "assistant"
    }
  ]
}
```

### Field Descriptions

- **type**: Task type (e.g., `single_spatial_understanding`, `cross_correspondence`)
- **images**: List of image paths. As the example shows, each path begins with `CrossPoint-378K/`, so paths resolve relative to the directory that contains the dataset folder
- **messages**: Conversation in ShareGPT format
  - **role**: Either `user` or `assistant`
  - **content**: Message content, where `<image>` tokens indicate image positions

## Dataset Statistics

## Usage

For training scripts and detailed instructions, please visit the [GitHub repository](https://github.com/WangYipu2002/CrossPoint).

## Citation

If you use CrossPoint-378K in your research, please cite:

```bibtex
@article{wang2025crosspoint,
  title={Towards Cross-View Point Correspondence in Vision-Language Models},
  author={Wang, Yipu and Ji, Yuheng and Liu, Yuyang and Zhou, Enshen and Yang, Ziqiang and Tian, Yuxuan and Qin, Ziheng and Liu, Yue and Tan, Huajie and Chi, Cheng and Ma, Zhiyuan and Zeng, Daniel Dajun and Zheng, Xiaolong},
  journal={arXiv preprint arXiv:2512.04686},
  year={2025}
}
```
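The ShareGPT-style records described under Data Format can be sanity-checked with a short Python script before training. The sketch below is a minimal, hypothetical helper and is not part of the official CrossPoint code. It assumes that `CrossPoint-378K.json` is a single JSON array of records shaped like the example, that each `<image>` token corresponds to exactly one entry in `images`, that points appear in prompts as `[x, y]` integer pairs (the card does not state the coordinate convention), and that image paths resolve from the directory containing `CrossPoint-378K/`. All function names (`load_records`, `check_record`, `resolve_images`, `extract_points`, `summarize`) are illustrative.

```python
import json
import re
from collections import Counter
from pathlib import Path

# Hypothetical pattern for point coordinates such as "[56, 323]".
# Assumption: points are written as two comma-separated integers in brackets,
# as in the card's example; the coordinate convention itself is not specified.
POINT_RE = re.compile(r"\[(\d+),\s*(\d+)\]")


def load_records(json_path):
    """Load the main data file (assumed to be one JSON array of records)."""
    with open(json_path, "r", encoding="utf-8") as f:
        return json.load(f)


def check_record(record):
    """Return a list of problems found in one ShareGPT-style record."""
    problems = []
    images = record.get("images", [])
    messages = record.get("messages", [])
    # Assumption: one <image> token per entry in "images".
    n_tokens = sum(m.get("content", "").count("<image>") for m in messages)
    if n_tokens != len(images):
        problems.append(f"{n_tokens} <image> tokens but {len(images)} images")
    # The card lists only "user" and "assistant" as valid roles.
    for m in messages:
        if m.get("role") not in ("user", "assistant"):
            problems.append(f"unexpected role: {m.get('role')!r}")
    return problems


def resolve_images(record, parent_dir):
    """Join each image path onto parent_dir, the folder containing CrossPoint-378K/."""
    return [Path(parent_dir) / p for p in record.get("images", [])]


def extract_points(text):
    """Return every (x, y) integer pair written as [x, y] in a message."""
    return [(int(x), int(y)) for x, y in POINT_RE.findall(text)]


def summarize(records):
    """Count records per task type ("type" field)."""
    return Counter(r.get("type", "<missing>") for r in records)
```

For example, `summarize(load_records("CrossPoint-378K/CrossPoint-378K.json"))` would report how many samples belong to each task type, and running `check_record` over every record flags samples whose `<image>` token count does not match their `images` list. For the actual training pipeline, the scripts in the GitHub repository remain the reference.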

Provided by:

WangYipu2002
