
Spatial-SSRL-81k

ModelScope · Updated 2026-01-01 · Indexed 2025-11-08
Download link:
https://modelscope.cn/datasets/Shanghai_AI_Laboratory/Spatial-SSRL-81k
Description:
# Spatial-SSRL-81k

📖<a href="https://arxiv.org/abs/2510.27606">Paper</a> | 🏠<a href="https://github.com/InternLM/Spatial-SSRL">Github</a> | 🤗<a href="https://huggingface.co/internlm/Spatial-SSRL-7B">Spatial-SSRL-7B Model</a> | 🤗<a href="https://huggingface.co/internlm/Spatial-SSRL-Qwen3VL-4B">Spatial-SSRL-Qwen3VL-4B Model</a> | 🤗<a href="https://huggingface.co/datasets/internlm/Spatial-SSRL-81k">Spatial-SSRL-81k Dataset</a> | 📰<a href="https://huggingface.co/papers/2510.27606">Daily Paper</a>

Spatial-SSRL-81k is a training dataset for enhancing spatial understanding in large vision-language models (LVLMs). It contains 81,053 samples drawn from five self-supervised pretext tasks, which provide simple, intrinsic supervision that lets RLVR (reinforcement learning with verifiable rewards) scale efficiently.

## 📢 News

- 🚀 [2025/11/24] We have released the [🤗Spatial-SSRL-Qwen3VL-4B Model](https://huggingface.co/internlm/Spatial-SSRL-Qwen3VL-4B), initialized from Qwen3-VL-4B-Instruct.
- 🚀 [2025/11/03] You can now try out Spatial-SSRL-7B on [🤗Spatial-SSRL Space](https://huggingface.co/spaces/yuhangzang/Spatial-SSRL).
- 🚀 [2025/11/03] We have released the [🤗Spatial-SSRL-7B Model](https://huggingface.co/internlm/Spatial-SSRL-7B) and the [🤗Spatial-SSRL-81k Dataset](https://huggingface.co/datasets/internlm/Spatial-SSRL-81k).
- 🚀 [2025/11/02] We have released the [🏠Spatial-SSRL Repository](https://github.com/InternLM/Spatial-SSRL).

## 🌈 Overview

We are thrilled to introduce <strong>Spatial-SSRL</strong>, a novel self-supervised RL paradigm aimed at enhancing the spatial understanding of LVLMs. After Qwen2.5-VL-7B is optimized with Spatial-SSRL, the model shows stronger spatial intelligence across seven spatial understanding benchmarks in both image and video settings.

<p style="text-align: center;">
<img src="assets/teaser_1029final.png" alt="Teaser" width="100%">
</p>

Spatial-SSRL is a <strong>lightweight</strong>, tool-free framework that is naturally compatible with the RLVR training paradigm and easy to extend to a multitude of pretext tasks.
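RLVR compatibility rests on every pretext-task sample having an answer that can be checked automatically. The following is a minimal sketch of a binary, exact-match reward of that kind; it is not the reward used to train Spatial-SSRL. The `<answer>...</answer>` tag convention and the function names are hypothetical, and the sketch assumes ground-truth answers are short strings such as an option letter.

```python
import re


def normalize_answer(text: str) -> str:
    """Lower-case, trim, and collapse whitespace so trivial formatting
    differences do not change the reward."""
    return re.sub(r"\s+", " ", text.strip().lower())


def extract_answer(response: str) -> str:
    """Return the content of the last <answer>...</answer> pair, or the whole
    response if no tags are present. The tag convention is hypothetical."""
    matches = re.findall(r"<answer>(.*?)</answer>", response, flags=re.DOTALL)
    return matches[-1] if matches else response


def verifiable_reward(response: str, ground_truth: str) -> float:
    """Binary reward: 1.0 if the extracted answer matches the label produced
    by the pretext task after normalization, otherwise 0.0."""
    predicted = normalize_answer(extract_answer(response))
    return 1.0 if predicted == normalize_answer(ground_truth) else 0.0
```

Because the label comes from the pretext task itself (for example, which of two regions is closer in an RGB-D image), no human annotator or external tool is needed to score a response; a stricter or partial-credit reward could replace the exact match if a task's answers are structured.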
Five tasks are currently formulated in the framework, requiring only ordinary RGB and RGB-D images. <strong>We welcome you to join Spatial-SSRL by contributing effective pretext tasks that further strengthen the capabilities of LVLMs!</strong>

<p style="text-align: center;">
<img src="assets/pipeline_1029final.png" alt="Pipeline" width="100%">
</p>

## 💡 Highlights

- 🔥 **Highly Scalable:** Spatial-SSRL curates its data from ordinary raw RGB and RGB-D images instead of richly annotated public datasets or manual labels, making it highly scalable.
- 🔥 **Cost-effective:** The entire pipeline needs neither human labels nor API calls to general-purpose LVLMs, which keeps Spatial-SSRL cost-effective.
- 🔥 **Lightweight:** Prior approaches to spatial understanding rely heavily on annotations from external tools, which introduce inherent errors into the training data and add cost. In contrast, Spatial-SSRL is completely tool-free and can easily be extended to more self-supervised tasks.
- 🔥 **Naturally Verifiable:** The intrinsic supervisory signals determined by the pretext objectives are naturally verifiable, which aligns Spatial-SSRL well with the RLVR paradigm.

<p style="text-align: center;">
<img src="assets/comparison_1029final.png" alt="Teaser" width="100%">
</p>

## 🖼️ Task examples

<p style="text-align: center;">
<img src="assets/task1.png" alt="Teaser" width="100%">
</p>

<p style="text-align: center;">
<img src="assets/task2.png" alt="Teaser" width="100%">
</p>

<p style="text-align: center;">
<img src="assets/task3.png" alt="Teaser" width="100%">
</p>

<p style="text-align: center;">
<img src="assets/task4.png" alt="Teaser" width="100%">
</p>

## 🛠️ Usage

All question-answering pairs are in `spatialssrl.parquet`, and the images are in `images.zip`. The images are organized into five folders, each corresponding to one 2D or 3D pretext task. If you are interested, see the 📖<a href="https://arxiv.org/abs/2510.27606">Paper</a> for the formulation of each task.
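As a starting point, here is a minimal sketch for inspecting the two released files. It assumes the five task folders sit at the root of `images.zip` (pass `level=1` if they are nested one directory deeper), treats `.png`, `.jpg` and `.jpeg` as image files, and reads `spatialssrl.parquet` with pandas plus a parquet engine such as pyarrow. The helper names are hypothetical, and because the parquet schema is not documented here, no column names are assumed.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath

# Suffixes counted as images; an assumption, since the card does not list formats.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def count_images_per_task(names, level=0):
    """Count image entries per task folder, given archive member names.

    `level` is the path depth of the task folders: 0 if they sit at the
    archive root, 1 if they are wrapped in one extra directory.
    """
    counts = Counter()
    for name in names:
        path = PurePosixPath(name)
        if name.endswith("/") or path.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip directory entries and non-image files
        if len(path.parts) <= level + 1:
            continue  # file lies above the task-folder level
        counts[path.parts[level]] += 1
    return dict(counts)


def count_images_in_zip(zip_path="images.zip", level=0):
    """Open the image archive and count images per task folder."""
    with zipfile.ZipFile(zip_path) as zf:
        return count_images_per_task(zf.namelist(), level=level)


def load_qa_pairs(parquet_path="spatialssrl.parquet"):
    """Load the question-answering pairs into a pandas DataFrame.

    Needs pandas and a parquet engine (pyarrow or fastparquet). The column
    names are undocumented here, so inspect `df.columns` after loading.
    """
    import pandas as pd  # imported lazily so the zip helpers work without pandas

    return pd.read_parquet(parquet_path)
```

If the archive layout matches the description above, `count_images_in_zip()` should return five entries, one per task folder, and `load_qa_pairs().columns` lists the fields stored for each question-answering pair. Keeping the counting logic separate from the file I/O lets it be checked on a plain list of names without downloading the archive.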
## ✒️ Citation

If you find this dataset useful, please cite:

```
@article{liu2025spatial,
  title={Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning},
  author={Liu, Yuhong and Zhang, Beichen and Zang, Yuhang and Cao, Yuhang and Xing, Long and Dong, Xiaoyi and Duan, Haodong and Lin, Dahua and Wang, Jiaqi},
  journal={arXiv preprint arXiv:2510.27606},
  year={2025}
}
```

## 📄 License

![Code License](https://img.shields.io/badge/Code%20License-Apache_2.0-green.svg) ![Data License](https://img.shields.io/badge/Data%20License-CC%20By%20NC%204.0-red.svg)

**Usage and License Notices**: The data and code are intended and licensed for research use only.
Provider: maas
Created: 2025-11-04