
WangYipu2002/CrossPoint-378K

Hugging Face · Updated 2025-12-08 · Indexed 2025-12-20
Download link:
https://hf-mirror.com/datasets/WangYipu2002/CrossPoint-378K
Resource description:
---
language:
- en
license: mit
size_categories:
- 100K<n<1M
task_categories:
- image-to-text
- visual-question-answering
tags:
- cross-view
pretty_name: CrossPoint-378K
---

# CrossPoint-378K Dataset

[![arXiv](https://img.shields.io/badge/arXiv-2512.04686-b31b1b.svg?logo=arxiv&logoColor=white)](https://arxiv.org/abs/2512.04686) [![GitHub](https://img.shields.io/badge/GitHub-WangYipu2002/CrossPoint-181717.svg?logo=github&logoColor=white)](https://github.com/WangYipu2002/CrossPoint)

## Overview

CrossPoint-378K is a large-scale dataset for cross-view point correspondence. It contains 378K training samples designed to improve the cross-view point-correspondence capabilities of vision-language models.

<p align="center">
  <img src="CrossPoint-378K.png" width="80%">
</p>

## Dataset Structure

The dataset has the following file structure:

```
CrossPoint-378K/
├── CrossPoint-378K.json   # Main data file (ShareGPT format)
├── image/                 # Original images directory
│   └── [scene_id]/        # Scene ID directory
│       └── [images]       # Scene images
└── visual_image/          # Annotated images directory
    └── [scene_id]/        # Scene ID directory
        └── [images]       # Annotated images with visual markers
```

## Data Format

The dataset follows the **ShareGPT format** with the following structure:

### JSON Format Example

```json
{
  "type": "single_spatial_understanding",
  "images": [
    "CrossPoint-378K/image/00a231a370/DSC05031.JPG"
  ],
  "messages": [
    {
      "content": "<image>\nWhat does the point at [56, 323] refer to?",
      "role": "user"
    },
    {
      "content": "It corresponds to the white window handle in the image.",
      "role": "assistant"
    }
  ]
}
```

### Field Descriptions

- **type**: Task type (e.g., `single_spatial_understanding`, `cross_correspondence`)
- **images**: List of image paths relative to the dataset root
- **messages**: Conversation in ShareGPT format
  - **role**: Either `user` or `assistant`
  - **content**: Message content, where `<image>` tokens indicate image positions

## Dataset Statistics

## Usage

For training scripts and detailed instructions, please visit the [GitHub repository](https://github.com/WangYipu2002/CrossPoint).

## Citation

If you use CrossPoint-378K in your research, please cite:

```bibtex
@article{wang2025crosspoint,
  title={Towards Cross-View Point Correspondence in Vision-Language Models},
  author={Wang, Yipu and Ji, Yuheng and Liu, Yuyang and Zhou, Enshen and Yang, Ziqiang and Tian, Yuxuan and Qin, Ziheng and Liu, Yue and Tan, Huajie and Chi, Cheng and Ma, Zhiyuan and Zeng, Daniel Dajun and Zheng, Xiaolong},
  journal={arXiv preprint arXiv:2512.04686},
  year={2025}
}
```
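The ShareGPT-format records described in the dataset card can be read with standard-library Python. The sketch below is not part of the official CrossPoint tooling. The names `Sample`, `parse_record` and `load_samples` are hypothetical. It makes three assumptions:

- `CrossPoint-378K.json` is a top-level JSON array of records.
- Point coordinates appear in prompts as `[x, y]` integer pairs, as in the example record.
- Image paths should be joined onto the directory that *contains* the `CrossPoint-378K/` folder, because the example path begins with that prefix.

The sketch only keeps the first user/assistant exchange of each record.

```python
import json
import re
from dataclasses import dataclass
from pathlib import Path

IMAGE_TOKEN = "<image>"
# Assumption: points are written as "[x, y]" integer pairs, as in the card's example prompt.
POINT_RE = re.compile(r"\[\s*(\d+)\s*,\s*(\d+)\s*\]")


@dataclass
class Sample:  # hypothetical container, not an official CrossPoint type
    task_type: str
    image_paths: list[Path]
    prompt: str
    answer: str
    points: list[tuple[int, int]]


def parse_record(record: dict, data_parent: Path) -> Sample:
    """Turn one ShareGPT-style record into a Sample (first user/assistant exchange only)."""
    messages = record["messages"]
    user_turns = [m["content"] for m in messages if m["role"] == "user"]
    answers = [m["content"] for m in messages if m["role"] == "assistant"]
    if not user_turns or not answers:
        raise ValueError("record needs at least one user turn and one assistant turn")

    # Every <image> token in the user turns should map to one entry in "images".
    n_tokens = sum(turn.count(IMAGE_TOKEN) for turn in user_turns)
    if n_tokens != len(record["images"]):
        raise ValueError(f"{n_tokens} <image> token(s) but {len(record['images'])} image path(s)")

    prompt = user_turns[0]
    points = [(int(x), int(y)) for x, y in POINT_RE.findall(prompt)]
    # Assumption: paths start with "CrossPoint-378K/", so join them onto the folder that contains it.
    image_paths = [data_parent / p for p in record["images"]]
    return Sample(record["type"], image_paths, prompt, answers[0], points)


def load_samples(json_path: Path) -> list[Sample]:
    """Load CrossPoint-378K.json, assumed to be a top-level JSON array of records."""
    with open(json_path, encoding="utf-8") as f:
        records = json.load(f)
    # json_path is <parent>/CrossPoint-378K/CrossPoint-378K.json, so <parent> is two levels up.
    return [parse_record(r, json_path.parent.parent) for r in records]


if __name__ == "__main__":
    example = {  # the example record from the Data Format section
        "type": "single_spatial_understanding",
        "images": ["CrossPoint-378K/image/00a231a370/DSC05031.JPG"],
        "messages": [
            {"content": "<image>\nWhat does the point at [56, 323] refer to?", "role": "user"},
            {"content": "It corresponds to the white window handle in the image.", "role": "assistant"},
        ],
    }
    sample = parse_record(example, Path("/data"))
    print(sample.points)  # [(56, 323)]
    print(sample.answer)
```

The sketch checks that the number of `<image>` tokens matches the `images` list before any image is opened. This catches misaligned records early. For the actual training pipeline, the scripts in the GitHub repository remain the reference.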
Provided by:
WangYipu2002