ULIP-ShapeNet Triplets (ULIP-Objaverse Triplets)
Source: OpenDataLab | Updated: 2023-10-20 | Indexed: 2024-05-09
Download link:
https://opendatalab.org.cn/OpenDataLab/ULIP_ShapeNet_Triplets_ULIP_Objaverse_Triplets
Description:
Recent advances in multimodal pre-training have shown promising results in 3D representation learning by aligning features across the 3D modality, its 2D counterpart, and the corresponding language modality. However, the methods that existing multimodal pre-training frameworks use to collect multimodal data for 3D applications lack scalability and comprehensiveness, which may limit the full potential of multimodal learning. The main bottleneck lies in the scalability and comprehensiveness of the language modality. To address this bottleneck, we introduce ULIP-2, a multimodal pre-training framework that leverages state-of-the-art multimodal large language models (LLMs), pre-trained on extensive knowledge, to automatically generate holistic language counterparts for 3D objects. We conduct experiments on two large-scale datasets, Objaverse and ShapeNet55, and release the generated tri-modal triplet datasets (3D point cloud, image, language), named "ULIP-Objaverse Triplets" and "ULIP-ShapeNet Triplets" respectively. ULIP-2 requires only the 3D data itself and no manual annotation effort, which demonstrates its scalability. It achieves significant improvements in downstream zero-shot classification on ModelNet40 (74% top-1 accuracy). Furthermore, ULIP-2 sets a new record on the real-world ScanObjectNN benchmark (91.5% overall accuracy) while using only 1.4 million parameters (about 10 times fewer than the current SOTA), marking a breakthrough in scalable multimodal 3D representation learning without human annotations. Code and datasets are available at this https URL.
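As a rough illustration of what a (3D point cloud, image, language) triplet and zero-shot classification over aligned embeddings might look like, here is a minimal Python sketch. It is not the ULIP-2 code and does not reflect the actual file layout of the released datasets: the `Triplet` field names, the file name, the caption, and the toy embedding values are all hypothetical, and cosine similarity stands in for whatever scoring the real model uses.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Dict, List, Sequence, Tuple


@dataclass
class Triplet:
    """One hypothetical tri-modal sample; field names are illustrative only."""
    points: List[Tuple[float, float, float]]  # 3D point cloud as (x, y, z) tuples
    image_path: str                           # a rendered 2D view of the object
    caption: str                              # generated language description


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))


def zero_shot_classify(shape_emb: Sequence[float],
                       text_embs: Dict[str, Sequence[float]]) -> str:
    """Return the class whose text embedding is closest to the 3D embedding."""
    return max(text_embs, key=lambda name: cosine(shape_emb, text_embs[name]))


# Toy embeddings standing in for encoder outputs (made-up numbers).
text_embs = {"chair": [1.0, 0.1, 0.0], "airplane": [0.0, 1.0, 0.2]}
sample = Triplet(points=[(0.0, 0.0, 0.0), (0.1, 0.5, 0.2)],
                 image_path="render_000.png",
                 caption="a wooden chair with four legs")
shape_emb = [0.9, 0.2, 0.1]  # pretend output of a 3D encoder for `sample`
predicted = zero_shot_classify(shape_emb, text_embs)
```

The idea is that once the three modalities share an embedding space, an unseen shape can be labelled by comparing its embedding with the text embeddings of candidate class names, without any class-specific training.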
Provided by:
qiangqiang
Created:
2023-10-20