
Charlie019/CourtSI-1M

Source: Hugging Face. Updated 2026-03-11. Indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/Charlie019/CourtSI-1M
Resource description:
---
license: apache-2.0
task_categories:
- visual-question-answering
language:
- en
size_categories:
- 1M<n<10M
---

<h1><img width="4%"/><i>Stepping VLMs onto the Court</i>: Benchmarking Spatial Intelligence in Sports</h1>

<a href="https://arxiv.org/abs/2603.09896" target="_blank"> <img alt="arXiv" src="https://img.shields.io/badge/arXiv-CourtSI-red?logo=arxiv" height="20" /> </a>
<a href="https://visionary-laboratory.github.io/CourtSI/" target="_blank"> <img alt="Website" src="https://img.shields.io/badge/🌎_Website-CourtSI-blue.svg" height="20" /> </a>
<a href="https://github.com/Visionary-Laboratory/CourtSI" target="_blank"> <img alt="GitHub Repo" src="https://img.shields.io/badge/GitHub-CourtSI-black?logo=github" height="20" /> </a>

## Abstract

Sports have long attracted broad attention as they push the limits of human physical and cognitive capabilities. Amid growing interest in spatial intelligence for vision-language models (VLMs), sports provide a natural testbed for understanding high-intensity human motion and dynamic object interactions. To this end, we present CourtSI, the first large-scale spatial intelligence dataset tailored to sports scenarios. CourtSI contains over 1M QA pairs, organized under a holistic taxonomy that systematically covers spatial counting, distance measurement, localization, and relational reasoning, across representative net sports including badminton, tennis, and table tennis. Leveraging well-defined court geometry as metric anchors, we develop a semi-automatic data engine to reconstruct sports scenes, enabling scalable curation of CourtSI. In addition, we introduce CourtSI-Bench, a high-quality evaluation benchmark comprising 3,686 QA pairs with rigorous human verification. We evaluate 25 proprietary and open-source VLMs on CourtSI-Bench, revealing a remaining human–AI performance gap and limited generalization from existing spatial intelligence benchmarks. These findings indicate that sports scenarios expose limitations in spatial intelligence capabilities captured by existing benchmarks. Further, fine-tuning Qwen3-VL-8B on CourtSI improves accuracy on CourtSI-Bench by 23.5 percentage points. The adapted model also generalizes effectively to CourtSI-Ext, an evaluation set built on a similar but unseen sport, and demonstrates enhanced spatial-aware commentary generation. Together, these findings demonstrate that CourtSI provides a scalable pathway toward advancing spatial intelligence of VLMs in sports.

## CourtSI

CourtSI is a large-scale dataset designed to study spatial intelligence in sports environments for Vision-Language Models (VLMs). It provides large-scale training data for supervised fine-tuning (SFT) of vision-language models. The dataset contains over **1M QA pairs**, built upon a holistic spatial taxonomy that includes **spatial counting, distance measurement, localization, and relational reasoning**.
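
Because the QA pairs are organized under a four-way taxonomy, results are naturally reported per category. The following is a minimal sketch of such a tally, not the authors' evaluation code. The record fields `category`, `answer`, and `prediction`, the function name `accuracy_by_category`, and the exact-match scoring rule are all assumptions made for illustration; the actual CourtSI schema and metrics may differ, and numeric tasks such as distance measurement would likely need a tolerance-based metric instead.

```python
from collections import defaultdict

# The four task families named in the dataset description.
CATEGORIES = (
    "spatial counting",
    "distance measurement",
    "localization",
    "relational reasoning",
)


def accuracy_by_category(records):
    """Return {category: fraction correct} for the categories present.

    Scoring is case-insensitive exact match after trimming whitespace.
    This rule is an assumption, not the benchmark's official metric.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        cat = rec["category"]  # hypothetical field name
        total[cat] += 1
        if rec["prediction"].strip().lower() == rec["answer"].strip().lower():
            correct[cat] += 1
    return {cat: correct[cat] / total[cat] for cat in total}


# Made-up toy records, only to show the expected input shape.
toy_records = [
    {"category": "spatial counting", "answer": "2", "prediction": "2"},
    {"category": "spatial counting", "answer": "4", "prediction": "3"},
    {"category": "localization", "answer": "left", "prediction": "Left "},
]
scores = accuracy_by_category(toy_records)
```

Keeping the tally per category rather than as one pooled number makes it easier to see which part of the taxonomy a fine-tuned model actually improves on.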
Provided by:
Charlie019