Spatial-SSRL-81k
ModelScope community. Updated 2026-01-01; indexed 2025-11-08.
Download link:
https://modelscope.cn/datasets/Shanghai_AI_Laboratory/Spatial-SSRL-81k
# Spatial-SSRL-81k
📖<a href="https://arxiv.org/abs/2510.27606">Paper</a>| 🏠<a href="https://github.com/InternLM/Spatial-SSRL">Github</a> |🤗<a href="https://huggingface.co/internlm/Spatial-SSRL-7B">Spatial-SSRL-7B Model</a> |
🤗<a href="https://huggingface.co/internlm/Spatial-SSRL-Qwen3VL-4B">Spatial-SSRL-Qwen3VL-4B Model</a> |
🤗<a href="https://huggingface.co/datasets/internlm/Spatial-SSRL-81k">Spatial-SSRL-81k Dataset</a> | 📰<a href="https://huggingface.co/papers/2510.27606">Daily Paper</a>
Spatial-SSRL-81k is a training dataset for enhancing spatial understanding in large vision-language models (LVLMs). It contains 81,053 samples spanning five self-supervised pretext tasks, offering simple, intrinsic supervision that scales RLVR (reinforcement learning with verifiable rewards) efficiently.
## 📢 News
- 🚀 [2025/11/24] We have released the [🤗Spatial-SSRL-Qwen3VL-4B Model](https://huggingface.co/internlm/Spatial-SSRL-Qwen3VL-4B), initialized from Qwen3-VL-4B-Instruct.
- 🚀 [2025/11/03] Now you can try out Spatial-SSRL-7B on [🤗Spatial-SSRL Space](https://huggingface.co/spaces/yuhangzang/Spatial-SSRL).
- 🚀 [2025/11/03] We have released the [🤗Spatial-SSRL-7B Model](https://huggingface.co/internlm/Spatial-SSRL-7B), and [🤗Spatial-SSRL-81k Dataset](https://huggingface.co/datasets/internlm/Spatial-SSRL-81k).
- 🚀 [2025/11/02] We have released the [🏠Spatial-SSRL Repository](https://github.com/InternLM/Spatial-SSRL).
## 🌈 Overview
We are thrilled to introduce <strong>Spatial-SSRL</strong>, a novel self-supervised RL paradigm aimed at enhancing LVLM spatial understanding.
By optimizing Qwen2.5-VL-7B with Spatial-SSRL, the model exhibits stronger spatial intelligence across seven spatial understanding benchmarks in both image and video settings.
<p style="text-align: center;">
<img src="assets/teaser_1029final.png" alt="Teaser" width="100%">
</p>
Spatial-SSRL is a <strong>lightweight</strong>, tool-free framework that is naturally compatible with the RLVR training paradigm and easy to extend to a multitude of pretext tasks.
Five tasks are currently formulated in the framework, requiring only ordinary RGB and RGB-D images. <strong>And we welcome you to join Spatial-SSRL with effective pretext tasks to further strengthen the capabilities of LVLMs!</strong>
<p style="text-align: center;">
<img src="assets/pipeline_1029final.png" alt="Pipeline" width="100%">
</p>
## 💡 Highlights
- 🔥 **Highly Scalable:** Spatial-SSRL curates its data from ordinary raw RGB and RGB-D images instead of richly annotated public datasets or manual labels, making it highly scalable.
- 🔥 **Cost-effective:** The entire pipeline requires neither human labels nor API calls to general-purpose LVLMs, which keeps Spatial-SSRL cost-effective.
- 🔥 **Lightweight:** Prior approaches to spatial understanding rely heavily on annotations from external tools, which introduces inherent errors into the training data and adds cost. In contrast, Spatial-SSRL is completely tool-free and can easily be extended to more self-supervised tasks.
- 🔥 **Naturally Verifiable:** Intrinsic supervisory signals determined by pretext objectives are naturally verifiable, aligning Spatial-SSRL well with the RLVR paradigm.
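
To make the "naturally verifiable" point concrete, here is a minimal sketch of how a pretext task can generate its own label and a checkable reward. It uses a hypothetical patch-shuffling task chosen purely for illustration; it is not claimed to be one of the five tasks in this dataset, and the names `make_shuffle_sample` and `verifiable_reward` are assumptions, not part of the Spatial-SSRL codebase. See the paper for the actual task formulations.

```python
import random


def make_shuffle_sample(patches, rng):
    """Shuffle patches and return (shuffled_patches, order).

    order[i] is the original index of the patch shown at position i,
    so the label comes from the shuffling itself, with no annotation.
    """
    order = list(range(len(patches)))
    rng.shuffle(order)
    shuffled = [patches[i] for i in order]
    return shuffled, order


def verifiable_reward(predicted_order, true_order):
    """Binary reward (assumed form): 1.0 only when the prediction matches exactly."""
    return 1.0 if list(predicted_order) == list(true_order) else 0.0


# Stand-in "patches"; in practice these would be image crops.
rng = random.Random(0)
patches = ["p0", "p1", "p2", "p3"]
shuffled, order = make_shuffle_sample(patches, rng)

# Undo the shuffle with the ground-truth order to confirm the label is consistent.
restored = [None] * len(patches)
for position, source_index in enumerate(order):
    restored[source_index] = shuffled[position]
```

Because the ground truth is produced by the transformation itself, the reward can be checked mechanically, which is what an RLVR-style training loop needs.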
<p style="text-align: center;">
<img src="assets/comparison_1029final.png" alt="Teaser" width="100%">
</p>
## 🖼️ Task examples
<p style="text-align: center;">
<img src="assets/task1.png" alt="Teaser" width="100%">
</p>
<p style="text-align: center;">
<img src="assets/task2.png" alt="Teaser" width="100%">
</p>
<p style="text-align: center;">
<img src="assets/task3.png" alt="Teaser" width="100%">
</p>
<p style="text-align: center;">
<img src="assets/task4.png" alt="Teaser" width="100%">
</p>
## 🛠️ Usage
You can find all question-answer pairs in `spatialssrl.parquet` and the images in `images.zip`. The images are organized in five folders, each corresponding to a 2D or 3D pretext task.
See the formulation of each task in 📖<a href="https://arxiv.org/abs/2510.27606">Paper</a> if you are interested.
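
As a quick sanity check after downloading, the sketch below counts the images in each folder of `images.zip` without extracting it. It assumes the images sit directly inside their task folders and use common suffixes (`.png`, `.jpg`, `.jpeg`); the folder names are not documented here, so the function simply reports whatever it finds. `count_images_per_folder` is a helper name introduced for this example.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath

# Assumed image formats; adjust if the archive uses others.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def count_images_per_folder(zip_path):
    """Return {folder: image_count} for image files in the archive."""
    counts = Counter()
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            path = PurePosixPath(name)
            if path.suffix.lower() in IMAGE_SUFFIXES:
                # Group by the immediate parent folder of each image.
                counts[str(path.parent)] += 1
    return dict(counts)
```

Calling `count_images_per_folder("images.zip")` should list five folders if the archive matches the description above. The question-answer pairs can be loaded with `pandas.read_parquet("spatialssrl.parquet")` (which needs a Parquet engine such as `pyarrow`); since the column names are not listed here, inspect `df.columns` before relying on any particular field.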
## ✒️ Citation
If you find this dataset useful, please kindly cite:
```
@article{liu2025spatial,
title={Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning},
author={Liu, Yuhong and Zhang, Beichen and Zang, Yuhang and Cao, Yuhang and Xing, Long and Dong, Xiaoyi and Duan, Haodong and Lin, Dahua and Wang, Jiaqi},
journal={arXiv preprint arXiv:2510.27606},
year={2025}
}
```
## 📄 License
 
**Usage and License Notices**: The data and code are intended and licensed for research use only.
Provider: maas
Created: 2025-11-04



