
RoboSpatial-Home

Source: ModelScope community (updated 2025-11-25, indexed 2025-11-22)
Download link:
https://modelscope.cn/datasets/comefly/RoboSpatial-Home
Description:
# RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics

[**🌐 Homepage**](https://chanh.ee/RoboSpatial/) | [**📖 arXiv**](https://arxiv.org/abs/2411.16537) | [**🛠️ Data Gen**](https://github.com/NVlabs/RoboSpatial) | [**🧪 Eval Code**](https://github.com/chanhee-luke/RoboSpatial-Eval)

## 🔥 **Core spatial understanding benchmark used by [Qwen3-VL](https://qwen.ai/blog?id=99f0335c4ad9ff6153e517418d48535ab6d8afef&from=research.latest-advancements-list) and [Gemini Robotics](https://storage.googleapis.com/deepmind-media/gemini-robotics/Gemini-Robotics-1-5-Tech-Report.pdf)!**

## ⚠️ Important Note (08/14/25)

**Annotation Correction:** In the context category, the microwave question was corrected from "below" to "above" to fix an annotation error.

## Dataset Description

We introduce RoboSpatial-Home, a new spatial reasoning benchmark designed to evaluate vision-language models (VLMs) in real-world indoor environments for robotics. It consists of 350 spatial reasoning questions paired with crowd-sourced RGBD images captured with a handheld iPhone camera equipped with a depth sensor. Each image is annotated with three types of spatial relationship questions (spatial configuration, spatial context, and spatial compatibility), providing a comprehensive evaluation of spatial understanding in robotic applications.

## Dataset Structure

RoboSpatial-Home consists of QA annotations paired with RGB and depth images. Each entry has the following fields:

- `category`: The spatial reasoning category for the entry (configuration, context, or compatibility).
- `question`: The spatial reasoning question.
- `answer`: The human-annotated answer.
- `img`: The RGB image from an iPhone 13 Pro Max.
- `depth_image`: The corresponding depth image from an iPhone 13 Pro Max.
- `mask`: (If available) The corresponding segmentation mask for spatial context questions.
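Every entry carries a `category` field with one of three values. A quick sanity check after loading is therefore to count the entries per category. The sketch below assumes each entry behaves like a Python mapping with the fields listed above. The helper name `count_by_category` is hypothetical and is not part of the dataset tooling.

```python
from collections import Counter
from typing import Iterable, Mapping

# Category values listed in the Dataset Structure section.
VALID_CATEGORIES = {"configuration", "context", "compatibility"}


def count_by_category(entries: Iterable[Mapping]) -> Counter:
    """Count entries per `category` value.

    Hypothetical helper: each entry is assumed to be a mapping with a
    `category` key. Unknown categories raise ValueError so that schema
    surprises are caught early instead of being silently counted.
    """
    counts: Counter = Counter()
    for entry in entries:
        category = entry["category"]
        if category not in VALID_CATEGORIES:
            raise ValueError(f"unexpected category: {category!r}")
        counts[category] += 1
    return counts


if __name__ == "__main__":
    # Toy stand-ins for real rows; only the `category` field is filled in.
    toy_entries = [
        {"category": "configuration"},
        {"category": "context"},
        {"category": "configuration"},
    ]
    counts = count_by_category(toy_entries)
    for name in sorted(VALID_CATEGORIES):
        print(name, counts[name])
```

On real data, apply the helper to a loaded split (a `datasets.Dataset`, whose rows iterate as dicts). Do not pass it the `DatasetDict` returned by `load_dataset`, because iterating a `DatasetDict` yields split names rather than rows. Reading only the column, as in `Counter(split["category"])`, avoids decoding the images but skips the validation step.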
## QA Types

- Spatial Configuration: Determines the relative positioning of objects (e.g., "*Is the mug to the left of the laptop?*").
- Spatial Context: Identifies vacant areas in relation to a reference object (e.g., "*Identify empty space to the left of the bowl.*").
- Spatial Compatibility: Assesses whether an object can fit within a specified area (e.g., "*Can the chair be placed in front of the desk?*").

## Load Dataset

You can load the RoboSpatial-Home dataset in two ways:

1. Using the Hugging Face `datasets` library:

   ```python
   from datasets import load_dataset

   dataset_name = 'chanhee-luke/RoboSpatial-Home'
   CATEGORY = 'configuration'  # or 'context' / 'compatibility'
   data = load_dataset(dataset_name, CATEGORY)
   ```

   where `CATEGORY` is one of the spatial reasoning categories: `configuration`, `context`, `compatibility`. If the category argument is omitted, the entire dataset is loaded.

2. Downloading locally with the script:

   If you prefer to work with local files, the RoboSpatial-Eval repo provides a [script](https://github.com/chanhee-luke/RoboSpatial-Eval/blob/master/download_benchmark.py):

   ```bash
   python download_benchmark.py [OUTPUT_FOLDER_PATH]
   ```

   This downloads the dataset for debugging or for setups that don't use the Hugging Face `datasets` library. If no output path is provided, the dataset is saved to `./RoboSpatial-Home` by default.

## Dataset Creation

The data for RoboSpatial-Home was manually collected and annotated by graduate-level students in computer science.

## Disclaimers

⚠️ Disclaimer: The images in this dataset were collected from the real homes of real individuals. When using or distributing this dataset, ensure that privacy and ethical considerations are upheld. Redistribute images with caution to respect the privacy of the original contributors.
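The QA types described above suggest two answer shapes. The example configuration and compatibility questions are yes/no questions. Context questions ask for an empty region, and a `mask` is provided for them.

The sketch below shows one plausible way to score predictions under those assumptions:

- A yes/no prediction is compared with the annotated answer after normalizing the text.
- A predicted (x, y) point for a context question counts as correct if it lands on a nonzero mask pixel.

This is not the official metric; the RoboSpatial-Eval repository holds the actual evaluation code. The function names, the first-word yes/no rule, and the pixel-coordinate convention are all assumptions made for illustration.

```python
import math
from typing import Optional, Sequence, Tuple

import numpy as np


def normalize_yes_no(text: str) -> Optional[str]:
    """Map free-form text to "yes" or "no" by its first word, else None.

    Assumption: a yes/no verdict, if present, is the first word.
    """
    words = text.strip().lower().split()
    if not words:
        return None
    first = words[0].strip(".,!?;:")
    return first if first in ("yes", "no") else None


def yes_no_correct(prediction: str, answer: str) -> bool:
    """True if the prediction and the annotated answer agree on yes/no."""
    pred = normalize_yes_no(prediction)
    return pred is not None and pred == normalize_yes_no(answer)


def point_in_mask(point: Tuple[float, float], mask: np.ndarray) -> bool:
    """True if an (x, y) pixel coordinate falls on a nonzero mask pixel.

    Assumptions: x indexes columns, y indexes rows, origin at the top-left,
    and pixel (col, row) covers [col, col + 1) x [row, row + 1).
    Works for 2-D masks and for H x W x C masks (any nonzero channel counts).
    """
    x, y = point
    col, row = math.floor(x), math.floor(y)
    height, width = mask.shape[:2]
    if not (0 <= row < height and 0 <= col < width):
        return False
    return bool(np.any(mask[row, col]))


def accuracy(flags: Sequence[bool]) -> float:
    """Fraction of True values; 0.0 for an empty sequence."""
    return sum(flags) / len(flags) if flags else 0.0


if __name__ == "__main__":
    # Toy 4 x 6 mask whose nonzero block stands in for an annotated empty region.
    toy_mask = np.zeros((4, 6), dtype=np.uint8)
    toy_mask[1:3, 0:2] = 1
    results = [
        point_in_mask((0.5, 1.5), toy_mask),   # inside the toy region
        point_in_mask((5.0, 3.0), toy_mask),   # outside the toy region
        yes_no_correct("Yes, it is.", "yes"),  # verdict matches
    ]
    print(accuracy(results))
```

If a mask is decoded as a PIL image, convert it first with `np.asarray(entry["mask"])`. Treating any nonzero channel as "inside" keeps the check usable whether the mask is stored as single-channel or RGB.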
## Contact

- Luke Song: song.1855@osu.edu

## Citation

**BibTeX:**

```bibtex
@inproceedings{song2025robospatial,
  author    = {Song, Chan Hee and Blukis, Valts and Tremblay, Jonathan and Tyree, Stephen and Su, Yu and Birchfield, Stan},
  title     = {{RoboSpatial}: Teaching Spatial Understanding to {2D} and {3D} Vision-Language Models for Robotics},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2025},
  note      = {Oral Presentation},
}
```

Provided by: maas
Created: 2025-11-03