Spatial-SSRL-81k

Name: Spatial-SSRL-81k
Creator: maas
Published: 2026-01-01 17:17:53
License: No description available

ModelScope Community. Updated 2026-01-01. Indexed 2025-11-08.

Download link:

https://modelscope.cn/datasets/Shanghai_AI_Laboratory/Spatial-SSRL-81k

Resource overview:

# Spatial-SSRL-81k

📖<a href="https://arxiv.org/abs/2510.27606">Paper</a> | 🏠<a href="https://github.com/InternLM/Spatial-SSRL">Github</a> | 🤗<a href="https://huggingface.co/internlm/Spatial-SSRL-7B">Spatial-SSRL-7B Model</a> | 🤗<a href="https://huggingface.co/internlm/Spatial-SSRL-Qwen3VL-4B">Spatial-SSRL-Qwen3VL-4B Model</a> | 🤗<a href="https://huggingface.co/datasets/internlm/Spatial-SSRL-81k">Spatial-SSRL-81k Dataset</a> | 📰<a href="https://huggingface.co/papers/2510.27606">Daily Paper</a>

Spatial-SSRL-81k is a training dataset for enhancing spatial understanding in large vision-language models (LVLMs). It contains 81,053 samples spanning five self-supervised pretext tasks, offering simple, intrinsic supervision that scales RLVR efficiently.

## 📢 News

- 🚀 [2025/11/24] We have released the [🤗Spatial-SSRL-Qwen3VL-4B Model](https://huggingface.co/internlm/Spatial-SSRL-Qwen3VL-4B), initialized from Qwen3-VL-4B-Instruct.
- 🚀 [2025/11/03] You can now try out Spatial-SSRL-7B on the [🤗Spatial-SSRL Space](https://huggingface.co/spaces/yuhangzang/Spatial-SSRL).
- 🚀 [2025/11/03] We have released the [🤗Spatial-SSRL-7B Model](https://huggingface.co/internlm/Spatial-SSRL-7B) and the [🤗Spatial-SSRL-81k Dataset](https://huggingface.co/datasets/internlm/Spatial-SSRL-81k).
- 🚀 [2025/11/02] We have released the [🏠Spatial-SSRL Repository](https://github.com/InternLM/Spatial-SSRL).

## 🌈 Overview

We are thrilled to introduce Spatial-SSRL, a novel self-supervised RL paradigm aimed at enhancing LVLM spatial understanding. After optimizing Qwen2.5-VL-7B with Spatial-SSRL, the model exhibits stronger spatial intelligence across seven spatial understanding benchmarks in both image and video settings.

<img src="assets/teaser_1029final.png" alt="Teaser" width="100%">

Spatial-SSRL is a lightweight, tool-free framework that is naturally compatible with the RLVR training paradigm and easy to extend to a multitude of pretext tasks. Five tasks are currently formulated in the framework, requiring only ordinary RGB and RGB-D images. We welcome contributions of effective pretext tasks to Spatial-SSRL to further strengthen the capabilities of LVLMs!

<img src="assets/pipeline_1029final.png" alt="Pipeline" width="100%">

## 💡 Highlights

- 🔥 **Highly Scalable:** Spatial-SSRL curates data from ordinary raw RGB and RGB-D images instead of richly annotated public datasets or manual labels, making it highly scalable.
- 🔥 **Cost-effective:** The entire pipeline needs neither human labels nor API calls to general LVLMs, which keeps Spatial-SSRL cost-effective.
- 🔥 **Lightweight:** Prior approaches to spatial understanding rely heavily on annotations from external tools, which introduces inherent errors into the training data and adds cost. In contrast, Spatial-SSRL is completely tool-free and can easily be extended to more self-supervised tasks.
- 🔥 **Naturally Verifiable:** Intrinsic supervisory signals determined by pretext objectives are naturally verifiable, aligning Spatial-SSRL well with the RLVR paradigm.

<img src="assets/comparison_1029final.png" alt="Teaser" width="100%">

## 🖼️ Task examples

<img src="assets/task1.png" alt="Teaser" width="100%">
<img src="assets/task2.png" alt="Teaser" width="100%">
<img src="assets/task3.png" alt="Teaser" width="100%">
<img src="assets/task4.png" alt="Teaser" width="100%">

## 🛠️ Usage

You can find all question-answering pairs in `spatialssrl.parquet` and the images in `images.zip`. The images are organized in five folders, each corresponding to a 2D or 3D pretext task. See the 📖<a href="https://arxiv.org/abs/2510.27606">Paper</a> for the formulation of each task.
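As a starting point for exploring the two files named above, here is a minimal Python sketch. It makes several assumptions that this card does not confirm: that both files have been downloaded into the working directory, that `pandas` with a parquet engine such as `pyarrow` is installed, and that the images use common extensions. The helper names `load_qa_pairs` and `count_images_per_folder` are hypothetical, and the parquet column names are not documented here, so the sketch only inspects them rather than relying on any.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath

# Assumed set of image extensions; adjust after inspecting the archive.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp", ".bmp"}


def load_qa_pairs(parquet_path):
    """Load the question-answering pairs into a DataFrame.

    Column names are not documented in this card, so inspect
    `df.columns` before relying on any particular field.
    """
    import pandas as pd  # requires pandas plus a parquet engine (e.g. pyarrow)

    return pd.read_parquet(parquet_path)


def count_images_per_folder(zip_path):
    """Count image files per parent directory inside the archive.

    Grouping by parent directory avoids assuming whether the five
    task folders sit at the archive root or under a wrapper folder.
    """
    counts = Counter()
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # skip explicit directory entries
            path = PurePosixPath(info.filename)
            if path.suffix.lower() in IMAGE_EXTS:
                counts[str(path.parent)] += 1
    return dict(counts)
```

Calling `load_qa_pairs("spatialssrl.parquet")` and printing `list(df.columns)` shows the actual schema, while `count_images_per_folder("images.zip")` should list the task folders with their image counts if the archive matches the layout described above.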
## ✒️ Citation

If you find this dataset useful, please kindly cite:

```
@article{liu2025spatial,
  title={Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning},
  author={Liu, Yuhong and Zhang, Beichen and Zang, Yuhang and Cao, Yuhang and Xing, Long and Dong, Xiaoyi and Duan, Haodong and Lin, Dahua and Wang, Jiaqi},
  journal={arXiv preprint arXiv:2510.27606},
  year={2025}
}
```

## 📄 License

![Code License](https://img.shields.io/badge/Code%20License-Apache_2.0-green.svg) ![Data License](https://img.shields.io/badge/Data%20License-CC%20By%20NC%204.0-red.svg)

**Usage and License Notices**: The data and code are intended and licensed for research use only.


Provider: maas
Created: 2025-11-04
