MV-ScanQA and TripAlign
Source: arXiv, 2025-08-15 (updated 2025-11-27)
Download link:
https://matthewdm0816.github.io/tripalign-mvscanqa/
Description:
MV-ScanQA is a 3D question answering (QA) dataset designed to evaluate multi-view scene understanding and compositional reasoning; 68% of its questions require integrating information from multiple views. TripAlign is a large-scale 2D-3D-language corpus of over one million triples. It uses 2D views as an intermediary to naturally group multiple contextually related 3D objects, which yields denser multi-object annotations. Together, the two datasets target two limitations of existing 3D vision-language datasets, namely weak coverage of multi-view reasoning and sparse annotation, and offer new challenges and opportunities for training and evaluating 3D vision-language models.
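This listing does not describe TripAlign's actual file format or construction pipeline. As a minimal sketch, assuming each 3D object comes with the set of 2D views in which it is visible, the "2D view as intermediary" idea can be illustrated by inverting object-to-view visibility so that each view bundles all co-visible objects into one multi-object triple. The names `Triple`, `group_objects_by_view`, `build_triples`, and all field names are hypothetical, not part of the dataset's release.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Triple:
    """Hypothetical 2D-3D-language triple (not TripAlign's real schema):
    one 2D view, the 3D objects it shows, and a text description."""
    view_id: str
    object_ids: list[str]
    text: str


def group_objects_by_view(visible_in: dict[str, set[str]]) -> dict[str, list[str]]:
    """Invert object -> views visibility into view -> objects, so each 2D view
    bundles every 3D object that is co-visible in it."""
    by_view: dict[str, list[str]] = defaultdict(list)
    for obj_id, views in visible_in.items():
        for view_id in views:
            by_view[view_id].append(obj_id)
    # Sort views and objects so the output is deterministic.
    return {view: sorted(objs) for view, objs in sorted(by_view.items())}


def build_triples(visible_in: dict[str, set[str]], captions: dict[str, str]) -> list[Triple]:
    """Emit one triple per view that has a caption (captions are assumed input);
    a single triple can therefore annotate several related objects at once."""
    grouped = group_objects_by_view(visible_in)
    return [
        Triple(view_id=view, object_ids=objs, text=captions[view])
        for view, objs in grouped.items()
        if view in captions
    ]


if __name__ == "__main__":
    visibility = {
        "chair_1": {"view_a", "view_b"},
        "table_3": {"view_a"},
        "lamp_7": {"view_b", "view_c"},
    }
    print(group_objects_by_view(visibility))
    print(build_triples(visibility, {"view_a": "A chair next to a table."}))
```

Grouping by view rather than annotating each object in isolation is what lets a single caption cover several objects; whether TripAlign applies any filtering (for example, a minimum number of objects per view) is not stated here.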
Provided by:
Peking University
Created:
2025-08-15



