ScanNet-SG: A Large-Scale Dataset for 3D Scene Graph Alignment
Source: DataCite Commons. Updated 2026-04-10; indexed 2026-04-25.
Download link:
https://data.4tu.nl/datasets/bebe8bd4-cf91-4f86-a28a-87cb870f6cea/2
Description:
3D scene graph alignment establishes correspondences between graphs built from partially overlapping observations, enabling robots to maintain consistent object-level representations across time and agents. It arises in two complementary settings: frame-to-scan (F2S), for aligning partial observations to a global map, and subscan-to-subscan (S2S), for aligning independently built submaps. Existing datasets support only small-scale S2S alignment, with limited object diversity and no vision–language representations, leaving large-scale, open-set benchmarks for both F2S and S2S largely unexplored. In this work, we propose an automated annotation pipeline that constructs open-set 3D scene graphs from RGB-D images and poses by integrating foundation models with point cloud processing tools. Applying this pipeline to ScanNet, we build ScanNet-SG, a large-scale benchmark for 3D scene graph alignment. ScanNet-SG contains over 700k alignment samples and covers 509 object categories from ScanNet labels and over 3k categories from GPT-4o-based tagging. Each object node is enriched with semantic labels, BERT embeddings, vision–language features, object point clouds, and 3D bounding boxes. With large-scale, multimodal data and support for both F2S and S2S settings, ScanNet-SG offers a comprehensive benchmark for training and evaluating robust 3D scene graph alignment in open-world environments.
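To make the alignment task concrete, the sketch below shows one simple way that object nodes from two graphs (for example a frame graph and a scan graph in the F2S setting) could be put into correspondence. It is an illustration only: the `ObjectNode` class, its field names, the `match_nodes` function, and the `min_sim` threshold are all hypothetical, do not reflect the dataset's actual file schema, and the mutual-nearest-neighbour rule used here is a generic baseline, not the method described by the dataset authors. The short embedding vectors stand in for the BERT or vision–language features attached to each node.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class ObjectNode:
    """Hypothetical object node; field names are illustrative, not the dataset schema."""
    node_id: int
    label: str
    embedding: list[float]  # stand-in for a BERT or vision-language feature


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def match_nodes(src, dst, min_sim=0.5):
    """Return (src_id, dst_id) pairs that are mutual nearest neighbours.

    A pair is kept only if each node is the other's best match and their
    similarity reaches min_sim (an assumed, untuned threshold).
    """
    if not src or not dst:
        return []
    sims = [[cosine(s.embedding, d.embedding) for d in dst] for s in src]
    # Best destination index for every source node, and vice versa.
    best_dst = [max(range(len(dst)), key=lambda j: row[j]) for row in sims]
    best_src = [max(range(len(src)), key=lambda i: sims[i][j])
                for j in range(len(dst))]
    pairs = []
    for i, j in enumerate(best_dst):
        if best_src[j] == i and sims[i][j] >= min_sim:
            pairs.append((src[i].node_id, dst[j].node_id))
    return pairs


# Toy F2S-style example: a partial frame graph against a larger scan graph.
frame = [
    ObjectNode(0, "chair", [1.0, 0.0, 0.1]),
    ObjectNode(1, "lamp", [0.0, 1.0, 0.0]),
]
scan = [
    ObjectNode(10, "table", [0.0, 0.1, 1.0]),
    ObjectNode(11, "chair", [0.9, 0.1, 0.0]),
    ObjectNode(12, "lamp", [0.1, 1.0, 0.1]),
]
pairs = match_nodes(frame, scan)
print(pairs)  # [(0, 11), (1, 12)]
```

In this toy example the scan's table node stays unmatched, which mirrors the partial-overlap situation the description mentions: not every node in one graph has a counterpart in the other.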
Provider:
4TU.ResearchData
Created:
2026-04-10



