Objaverse-LVIS, ScanObjectNN, ModelNet40

Name: Objaverse-LVIS, ScanObjectNN, ModelNet40
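Creator: Keio University, National Institute of Advanced Industrial Science and Technology (AIST), University of Tsukuba, TICO-AIST Cooperative Research Laboratory for Advanced Logistics, University of Technology Nuremberg, University of Oxford
Published: 2025-01-16 11:54:06
License: No description available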

Source: arXiv; updated 2025-01-16; indexed 2025-02-25

Download link:

http://arxiv.org/abs/2501.09278v1

Resource summary:

This paper uses three benchmark datasets for zero-shot 3D classification: Objaverse-LVIS, ScanObjectNN, and ModelNet40. The institutions listed above are those of the paper's authors; the three benchmarks themselves come from earlier work. Objaverse-LVIS and ModelNet40 are collections of 3D object models (ModelNet40 covers 40 categories of CAD models), while ScanObjectNN contains scanned real-world objects and therefore tests recognition under real-world conditions. Building datasets like these requires real-world scanning or manual creation of CAD models, which is costly and time-consuming, so 3D data remains scarce compared with 2D data. 3D understanding matters in robotics, manufacturing, autonomous driving, and other fields; the paper's method, TeGA, expands limited real-world 3D datasets with synthetic data and measures the resulting gains in zero-shot 3D classification on these three benchmarks.

Created:

2025-01-16

Collected summary

Dataset introduction

Construction method

TeGA (Text-guided Geometric Augmentation) is a method for expanding 3D datasets with synthetic data: it uses text-to-3D generative models to create training data for language-image-3D pretraining. It first feeds the category names of the real dataset to a text-to-3D model such as Point-E to generate point clouds, then renders images from these point clouds, and finally applies a consistency filtering strategy that keeps only generated samples that match the input text. Because the data is generated from text, no additional human data collection or annotation is required, and the synthetic samples effectively augment the real dataset.
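The rendering step can be illustrated with a short Python sketch. The source does not say which renderer TeGA uses, so the hypothetical `render_depth` function below is only a stand-in: a plain orthographic depth projection of an (N, 3) NumPy point cloud, assuming the Point-E generation step (not shown) has already produced the points. Rotating the cloud before each call would give additional views.

```python
import numpy as np


def render_depth(points: np.ndarray, size: int = 64) -> np.ndarray:
    """Render an (N, 3) point cloud as a size x size orthographic depth image.

    Hypothetical stand-in renderer. The camera looks along +z, so each pixel
    keeps the smallest z value that falls into it (a simple z-buffer);
    pixels hit by no point stay at +inf.
    """
    # Normalize into the unit cube; a uniform scale keeps the aspect ratio.
    mins = points.min(axis=0)
    extent = float((points.max(axis=0) - mins).max()) or 1.0
    normed = (points - mins) / extent

    # Map x and y to integer pixel indices.
    cols = np.clip(np.rint(normed[:, 0] * (size - 1)).astype(int), 0, size - 1)
    rows = np.clip(np.rint(normed[:, 1] * (size - 1)).astype(int), 0, size - 1)

    depth = np.full((size, size), np.inf)
    # Unbuffered minimum: every point landing on the same pixel joins the min.
    np.minimum.at(depth, (rows, cols), normed[:, 2])
    return depth


# Usage: a random cloud stands in for a Point-E sample.
rng = np.random.default_rng(0)
cloud = rng.uniform(-1.0, 1.0, size=(2048, 3))
image = render_depth(cloud, size=32)
print(image.shape)  # (32, 32)
```

A real pipeline would likely use a proper point renderer with shading; the z-buffer projection is chosen here only because it is short and dependency-free.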

Features

The key feature of TeGA is that it generates synthetic 3D data aligned with text prompts, which language-image-3D pretraining depends on. The consistency filtering strategy discards generated samples whose semantics or shape do not match the input text, preventing misalignment between the text, image, and 3D modalities. The paper reports that adding TeGA data improves zero-shot 3D classification on the Objaverse-LVIS, ScanObjectNN, and ModelNet40 benchmarks, indicating that the method helps address the scarcity of 3D data.
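A minimal sketch of a consistency check, in Python. The source only says that filtering keeps generated data that matches its input text; the rule below (keep a sample only if its rendered image is closest to its own category prompt, and at least as similar as a hypothetical threshold) is an assumption, not the paper's exact criterion. The embeddings are assumed to come from a pretrained image-text encoder such as CLIP, computed elsewhere.

```python
import numpy as np


def consistency_filter(image_emb, text_emb, labels, threshold=0.5):
    """Return a boolean mask marking generated samples to keep (hypothetical rule).

    image_emb: (N, D) embeddings of rendered views of the generated samples.
    text_emb:  (C, D) embeddings of one text prompt per category.
    labels:    (N,) category index each sample was generated from.
    threshold: hypothetical minimum cosine similarity to the sample's own prompt.
    """
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    sims = img @ txt.T                           # (N, C) cosine similarities
    best = sims.argmax(axis=1)                   # closest prompt per sample
    own = sims[np.arange(len(labels)), labels]   # similarity to own prompt
    return (best == labels) & (own >= threshold)


# Usage with toy embeddings (one-hot prompts for 3 categories).
text_emb = np.eye(3)
image_emb = np.array([[0.9, 0.1, 0.0],     # generated as 0, looks like 0
                      [0.0, 0.2, 0.9],     # generated as 1, looks like 2
                      [0.6, 0.5, 0.62]])   # generated as 2, weak match
labels = np.array([0, 1, 2])
print(consistency_filter(image_emb, text_emb, labels, threshold=0.8))
# [ True False False]
```

Requiring both the top-1 match and a minimum similarity rejects clear mismatches (the second sample) as well as samples that only weakly resemble their prompt (the third).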

Usage

To use TeGA, one must first select a text-to-3D generative model, such as Point-E, and input the category names of the real dataset as text prompts to generate point clouds. Then, images are rendered from these point clouds, and the consistency filtering strategy is applied to ensure alignment with the input text. The synthetic dataset is then combined with the real dataset to train a model for zero-shot 3D classification. The method is effective in enhancing the generalization ability of the model, even with limited real training data.
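The last two steps, merging the data and pretraining, might look roughly like the Python sketch below. Both function names are hypothetical. The loss assumes a CLIP-style symmetric contrastive (InfoNCE) objective, which is common in language-image-3D pretraining; the encoders that produce the embeddings are omitted, an image-to-3D term would be computed the same way, and the temperature of 0.07 is a common default rather than a value taken from the paper.

```python
import numpy as np


def merge_datasets(real, synthetic, seed=0):
    """Concatenate real and filtered synthetic samples, then shuffle.

    Each sample is tagged with its source so results can be analyzed later.
    """
    merged = [(s, "real") for s in real] + [(s, "synthetic") for s in synthetic]
    order = np.random.default_rng(seed).permutation(len(merged))
    return [merged[i] for i in order]


def info_nce(pc_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss between matched point-cloud and text embeddings.

    Row i of pc_emb and row i of txt_emb form a positive pair; every other
    row in the batch serves as a negative.
    """
    a = pc_emb / np.linalg.norm(pc_emb, axis=1, keepdims=True)
    b = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = a @ b.T / temperature          # (B, B) scaled cosine similarities
    idx = np.arange(len(logits))

    def _cross_entropy(l):
        # Numerically stable log-softmax over each row; targets are the diagonal.
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[idx, idx].mean()

    # Average the 3D-to-text and text-to-3D directions.
    return 0.5 * (_cross_entropy(logits) + _cross_entropy(logits.T))


# Usage: matched pairs should give a lower loss than mismatched ones.
batch = merge_datasets(["real_0", "real_1"], ["syn_0"])
emb = np.eye(4)
print(info_nce(emb, emb) < info_nce(emb, emb[[1, 2, 3, 0]]))  # True
```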

Background and challenges

Background overview

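In 3D vision, zero-shot recognition models need large amounts of training data to generalize. In 3D classification, however, collecting 3D data and annotations is costly and time-consuming, a significant barrier compared with 2D vision. Generative models have recently made unprecedented progress in producing synthetic data, and recent studies have shown the potential of using generated data as training data. Against this background, Kohei Torimi et al. published the paper "Text-guided Synthetic Geometric Augmentation for Zero-shot 3D Understanding" on arXiv on January 16, 2025, proposing Text-guided Geometric Augmentation (TeGA), a method for expanding 3D datasets with synthetic data. Tailored to language-image-3D pretraining, TeGA uses a generative text-to-3D model to enhance and extend limited 3D datasets, with the aim of addressing the scarcity of 3D data and paving the way for zero-shot 3D vision applications.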

Current challenges

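The main challenges TeGA faces are: 1) collecting and annotating 3D data is costly and time-consuming; 2) generative models may fail to produce 3D data that accurately matches the text prompt, causing modality misalignment during training; 3) maintaining both the quality and the quantity of the data while limiting social biases or other unintended content that synthetic data may introduce.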

Common scenarios

Typical use cases

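Objaverse-LVIS, ScanObjectNN, and ModelNet40 are widely used for 3D object recognition and classification, especially for evaluating zero-shot learning. TeGA expands limited 3D training data with text-guided synthetic 3D data, enabling robust zero-shot 3D classification on these benchmarks even when real training data is limited.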
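Zero-shot classification on these benchmarks can be sketched as nearest-prompt matching: each category name is turned into a text prompt, and a 3D embedding is assigned the category whose prompt embedding is most similar. The Python sketch below assumes the embeddings were already computed by a pretrained 3D encoder and text encoder (not shown); the prompt wording and function names are hypothetical. Top-1 accuracy is included because it is the metric commonly reported on these benchmarks. For ModelNet40, `class_names` would hold its 40 category names.

```python
import numpy as np


def zero_shot_classify(pc_emb, class_text_emb, class_names):
    """Label each 3D embedding with the class whose prompt embedding is most
    cosine-similar; no classifier is trained on the target categories."""
    a = pc_emb / np.linalg.norm(pc_emb, axis=1, keepdims=True)
    b = class_text_emb / np.linalg.norm(class_text_emb, axis=1, keepdims=True)
    return [class_names[i] for i in (a @ b.T).argmax(axis=1)]


def top1_accuracy(predicted, target):
    """Fraction of samples whose predicted label equals the ground truth."""
    return sum(p == t for p, t in zip(predicted, target)) / len(target)


# Usage with toy embeddings; real ones would come from the pretrained 3D
# encoder and from hypothetical prompts such as "a point cloud of a chair".
class_names = ["chair", "table", "airplane"]
class_text_emb = np.eye(3)
pc_emb = np.array([[0.8, 0.1, 0.1],
                   [0.2, 0.1, 0.9]])
preds = zero_shot_classify(pc_emb, class_text_emb, class_names)
print(preds)                                     # ['chair', 'airplane']
print(top1_accuracy(preds, ["chair", "table"]))  # 0.5
```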

Academic problems addressed

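Objaverse-LVIS, ScanObjectNN, and ModelNet40 provide standard data resources for 3D object recognition and classification, easing the scarcity of 3D data. TeGA goes further by generating text-guided synthetic 3D data to expand limited training sets, enabling robust zero-shot 3D classification with limited real training data and opening new possibilities for 3D vision applications.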

Derived related work

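Objaverse-LVIS, ScanObjectNN, and ModelNet40 are used in many related works, such as MixCon3D and ULIP-2. TeGA builds on this line of work by adding text-guided synthetic 3D data to the training set, providing stronger support for 3D vision applications. TeGA also points to new research directions, such as more effective data expansion methods and improved text-guided 3D generative models.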

The content above was collected and summarized by the 遇见数据集 dataset platform.
