ReferIt3D
OpenDataLab | Updated 2026-05-17 | Indexed 2024-05-09
Download link:
https://opendatalab.org.cn/OpenDataLab/ReferIt3D
Description:
In this work, we introduce the problem of identifying common objects in real-world 3D scenes using referring language. We focus on a challenging setting in which the referred object belongs to a fine-grained object class and the underlying scene contains multiple instances of that class. Because existing 3D-oriented language resources are scarce and unsuitable for this task, we first develop two large-scale, complementary vision-language datasets: i) Sr3D, which contains 83.5K template-based utterances that use spatial relationships between fine-grained object classes to locate the referred object in a scene; and ii) Nr3D, which contains 41.5K natural, free-form utterances collected by deploying a two-player object reference game in 3D scenes. Given utterances from either dataset, human listeners can identify the referred object with high accuracy (>86% and 92%, respectively). Leveraging the introduced data, we develop novel neural listeners that can understand object-centric natural language and identify the referred object directly in a 3D scene. A key technical contribution is a method for combining linguistic and geometric information (in the form of 3D point clouds) to create multimodal (3D) neural listeners. Importantly, we show that architectures that facilitate object-to-object communication via graph neural networks outperform context-agnostic alternatives, and that fine-grained object classification is an important bottleneck for language-assisted 3D object identification.
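The template-based setting described for Sr3D can be illustrated with a minimal sketch. It is not the authors' generation pipeline: the `SceneObject` fields, the "closest to" template wording, and the helper names are all hypothetical, chosen only to show how a spatial relation between a target class and an anchor class can single out one instance among several of the same class.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneObject:
    """Hypothetical scene entry; field names are assumptions, not the dataset schema."""
    object_id: int
    label: str      # fine-grained class name, e.g. "office chair"
    center: tuple   # (x, y, z) centroid of the object's points


def make_closest_utterance(target_label, anchor_label):
    """Fill an illustrative template (wording is an assumption)."""
    return f"the {target_label} closest to the {anchor_label}"


def resolve_closest(scene, target_label, anchor_label):
    """Return the target-class instance nearest to the unique anchor.

    Returns None when the reference is not well posed: the anchor must be
    unique, and the target class needs at least two instances so that the
    scene contains distractors of the same class.
    """
    anchors = [o for o in scene if o.label == anchor_label]
    candidates = [o for o in scene if o.label == target_label]
    if len(anchors) != 1 or len(candidates) < 2:
        return None
    anchor = anchors[0]
    return min(candidates, key=lambda o: math.dist(o.center, anchor.center))


scene = [
    SceneObject(0, "office chair", (0.0, 0.0, 0.0)),
    SceneObject(1, "office chair", (3.0, 0.0, 0.0)),
    SceneObject(2, "desk", (2.5, 0.5, 0.0)),
]
utterance = make_closest_utterance("office chair", "desk")
target = resolve_closest(scene, "office chair", "desk")
print(utterance, "->", target.object_id)
```

Here the two chairs are distractors for each other, so the class name alone cannot identify the target and the spatial relation to the desk is what resolves the reference.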
Provider:
OpenDataLab
Created:
2022-09-01
Collected Summary
Dataset Introduction

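Background and Challenges
Background Overview
ReferIt3D is a vision-language dataset focused on object identification in 3D scenes. It comprises two subsets, Sr3D and Nr3D, which provide template-based and natural free-form utterances, respectively, for training multimodal neural listeners that combine language with 3D geometric information to localize fine-grained objects. The dataset was released in 2020 by Stanford University and KAUST to address language-assisted object identification in real-world 3D scenes.
The content above was collected and summarized by 遇见数据集.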



