tiange/Cap3D

Name: tiange/Cap3D
Creator: tiange
Published: 2024-05-27 00:02:11
License: not listed on this page (the dataset card below states ODC-By 1.0)

Source: Hugging Face. Updated 2024-05-27; indexed 2024-03-04.

Download link:

https://hf-mirror.com/datasets/tiange/Cap3D


Resource description:

---
license: odc-by
viewer: false
task_categories:
- text-to-3d
---

## Dataset Description

- **Paper:** [Scalable 3D Captioning with Pretrained Models](https://arxiv.org/abs/2306.07279)
- **Paper:** [View Selection for 3D Captioning via Diffusion Ranking](https://arxiv.org/abs/2404.07984)
- **Repository**: [Github_Cap3D](https://github.com/crockwell/Cap3D)
- **Repository**: [Github_DiffuRank](https://github.com/tiangeluo/DiffuRank)
- **Project**: [Project](https://cap3d-um.github.io/)

This repository hosts data for [Scalable 3D Captioning with Pretrained Models](https://cap3d-um.github.io/) and [View Selection for 3D Captioning via Diffusion Ranking](http://arxiv.org/abs/2404.07984), including descriptive **captions** for 3D objects in [Objaverse](https://arxiv.org/abs/2212.08051), [Objaverse-XL](https://arxiv.org/pdf/2307.05663.pdf), and [ABO](https://arxiv.org/abs/2110.06199). This repo also includes **point clouds** and **rendered images with camera, depth, and MatAlpha information** of Objaverse objects, as well as their Shap-E latent codes. All the captions and data provided by our papers are released under the ODC-By 1.0 license.

## Usage

Please download and unzip files from [**Page**](https://huggingface.co/datasets/tiange/Cap3D/tree/main) according to your usage. Below is a table listing file descriptions, followed by example Python scripts for data loading.

| Filename | Description |
| -------------------------------------- | ------------------------------------------------------------ |
| **Cap3D_automated_Objaverse_full.csv** | By integrating text descriptions initially generated by [Cap3D](https://arxiv.org/abs/2306.07279) and subsequently refined by [DiffuRank](https://arxiv.org/abs/2404.07984), we have produced a total of **1,006,782** 3D-caption pairs. Of these, **785,150** pairs are for the [Objaverse](https://arxiv.org/abs/2212.08051) dataset, and the remaining 221,632 are for the [Objaverse-XL](https://arxiv.org/pdf/2307.05663.pdf) dataset (specifically the high-quality subset described in Section 4.1, Alignment Finetuning, of [Objaverse-XL](https://proceedings.neurips.cc/paper_files/paper/2023/file/70364304877b5e767de4e9a2a511be0c-Paper-Datasets_and_Benchmarks.pdf)). For the object identifier in the left column, strings with a length of 32 characters are UIDs from Objaverse 1.0 (retrieved using `import objaverse; uids = objaverse.load_uids()`). Strings with a length of 64 characters are SHA256 hashes provided by Objaverse-XL. |
| Cap3D_automated_Objaverse_no3Dword.csv | Combines the text descriptions generated by [Cap3D](https://arxiv.org/abs/2306.07279), resulting in **661,577** 3D-caption pairs for the Objaverse dataset. All captions and related 3D objects here have commercial-friendly licenses (including CC-BY 4.0, CC-BY-SA 4.0, and CC0 1.0). We also filter out objects with potential ethical issues (e.g., identifiable face scans, NSFW content). The original captions are densely packed with "3D-model" terminology, potentially limiting their utility in applications like embodied AI. As such, we've created a version with minimized 3D-related words. For example, "A 3D model of a black and yellow samurai sword" ➡️ "a black and yellow samurai sword". This is our NeurIPS version. |
| **PointCloud_zips** | Provided by [Cap3D](https://arxiv.org/abs/2306.07279) and [DiffuRank](https://arxiv.org/abs/2404.07984): **1,006,782** point clouds (16,384 colorful points each) extracted from Objaverse objects, saved as `.ply` files. |
| PointCloud_pt_zips | Point clouds saved as torch.Tensor `.pt` files, providing faster loading speed than `.ply`. |
| **RenderedImage_perobj_zips** | Provided by [DiffuRank](https://arxiv.org/abs/2404.07984): **1,006,782** rendered images for Objaverse objects. Unzipping `compressed_imgs_perobj_xx.zip` yields multiple zip files, each containing **20** rendered images along with camera details (intrinsic & extrinsic), depth data, and masks ([one example](https://huggingface.co/datasets/tiange/Cap3D/tree/main/RenderedImage_perobj_zips/example_zipfile)). Please specify the unzip path, such as `unzip ed51a51909ee46c780db3a85e821feb2.zip -d ed51a51909ee46c780db3a85e821feb2`. More information is available [here](https://huggingface.co/datasets/tiange/Cap3D/blob/main/RenderedImage_perobj_zips/README.md). |
| misc | Miscellaneous files such as human-authored captions, ABO captions, finetuned models, and Shap-E latent codes. Please refer to this [README](https://huggingface.co/datasets/tiange/Cap3D/blob/main/misc/README.md). |

```python
# Load our captions
import pandas as pd

captions = pd.read_csv('Cap3D_automated_Objaverse_full.csv', header=None)

## captions:
##                                                          0                                                  1
## 0                         ed51a51909ee46c780db3a85e821feb2  Matte green rifle with a long barrel, stock, a...
## 1                         9110b606f6c547b2980fcb3c8c4b6a1c  Rustic single-story building with a weathered ...
## 2                         80d9caaa1fa04502af666135196456e1  a pair of purple and black swords with white h...
## 3                         28d43a218cd8466a8c1f82b29b71e314  3D model of a cluttered outdoor scene with veg...
## 4                         75582285fab442a2ba31733f9c8fae66  Floating terrain piece with grassy landscape a...
## ...                                                    ...                                                ...
## 1002417  3623e74f34c1c3c523af6b2bb8ffcbe2d2dce897ef61b9...  Abstract 3D composition with human figures and...
## 1002418  64e9f7b7a1fc4c4ec56ed8b5917dfd610930043ac5e15f...  3D object with a rough, irregular pink surface...
## 1002419  fcd089d6a237fee21dfd5f0d6d9b74b2fd1150cdc61c7f...  Bright pink abstract 3D model of a building wi...
## 1002420  f812dc980050f2d5f4b37df2a8620372f810dd6456a5f2...  Monochromatic gray 3D model of a stylized huma...
## 1002421  77c09500b4d8e4b881e1ce6929d56c23658b87173c0996...  Modular futuristic spacecraft with red and ora...

## To obtain the caption for a specific UID
caption = captions[captions[0] == '80d9caaa1fa04502af666135196456e1'][1].values[0]

# Load point clouds (unzip https://huggingface.co/datasets/tiange/Cap3D/tree/main/PointCloud_pt_zips)
import torch

pts = torch.load('Cap3D_pcs_pt/80d9caaa1fa04502af666135196456e1.pt')

## pts.shape == torch.Size([6, 16384])
```

If you have any questions, please contact [Tiange](mailto:tiange.cs@gmail.com) or [Chris](mailto:cnris@umich.edu).

## Citation Information

If you find our data or code useful, please consider citing:

```bibtex
@article{luo2023scalable,
  title={Scalable 3D Captioning with Pretrained Models},
  author={Luo, Tiange and Rockwell, Chris and Lee, Honglak and Johnson, Justin},
  journal={arXiv preprint arXiv:2306.07279},
  year={2023}
}

@article{luo2024view,
  title={View Selection for 3D Captioning via Diffusion Ranking},
  author={Luo, Tiange and Johnson, Justin and Lee, Honglak},
  journal={arXiv preprint arXiv:2404.07984},
  year={2024}
}
```

Please cite the ***Objaverse*** and ***ABO*** papers accordingly if you use related data.

```
@inproceedings{deitke2023objaverse,
  title={Objaverse: A universe of annotated 3d objects},
  author={Deitke, Matt and Schwenk, Dustin and Salvador, Jordi and Weihs, Luca and Michel, Oscar and VanderBilt, Eli and Schmidt, Ludwig and Ehsani, Kiana and Kembhavi, Aniruddha and Farhadi, Ali},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={13142--13153},
  year={2023}
}

@article{deitke2024objaverse,
  title={Objaverse-xl: A universe of 10m+ 3d objects},
  author={Deitke, Matt and Liu, Ruoshi and Wallingford, Matthew and Ngo, Huong and Michel, Oscar and Kusupati, Aditya and Fan, Alan and Laforte, Christian and Voleti, Vikram and Gadre, Samir Yitzhak and others},
  journal={Advances in Neural Information Processing Systems},
  volume={36},
  year={2024}
}

@inproceedings{collins2022abo,
  title={Abo: Dataset and benchmarks for real-world 3d object understanding},
  author={Collins, Jasmine and Goel, Shubham and Deng, Kenan and Luthra, Achleshwar and Xu, Leon and Gundogdu, Erhan and Zhang, Xi and Vicente, Tomas F Yago and Dideriksen, Thomas and Arora, Himanshu and others},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={21126--21136},
  year={2022}
}
```
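The Usage table states that 32-character identifiers are Objaverse 1.0 UIDs and 64-character identifiers are Objaverse-XL SHA256 hashes. The sketch below shows one way that rule could be applied when splitting the caption file by source. The function name `classify_identifier` and the `"unknown"` fallback are illustrative assumptions, not part of the dataset's tooling, and the check relies on string length alone.

```python
def classify_identifier(identifier: str) -> str:
    """Guess the source collection of a Cap3D identifier from its length.

    Hypothetical helper: the dataset card only documents the two lengths,
    so anything else is reported as "unknown".
    """
    value = identifier.strip()
    if len(value) == 32:
        return "objaverse"       # Objaverse 1.0 UID
    if len(value) == 64:
        return "objaverse-xl"    # SHA256 hash provided by Objaverse-XL
    return "unknown"


# UID taken from the caption preview in the Usage section.
print(classify_identifier("80d9caaa1fa04502af666135196456e1"))  # prints "objaverse"
```

With the captions loaded through pandas as in the Usage section, the same function could be applied to column `0` (for example with `captions[0].map(classify_identifier)`) to separate the two subsets.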


Provider:

tiange

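Summary of original information

Dataset description

The dataset is associated with the following:

Papers:
- Scalable 3D Captioning with Pretrained Models
- View Selection for 3D Captioning via Diffusion Ranking
Code repositories:
- Github_Cap3D
- Github_DiffuRank
Project page: Project

The dataset contains the following:

Descriptive captions: captions for 3D objects in Objaverse, Objaverse-XL, and ABO.
Point clouds: 1,006,782 point clouds extracted from Objaverse objects (16,384 colored points each), saved as .ply files.
Rendered images: rendered images with camera, depth, and MatAlpha information.
Shap-E latent codes: Shap-E latent codes for Objaverse objects.

All data and captions are released under the ODC-By 1.0 license.

File descriptions

Filename	Description
Cap3D_automated_Objaverse_full.csv	Contains 1,006,782 3D-caption pairs, of which 785,150 are for the Objaverse dataset and the rest are for the Objaverse-XL dataset.
Cap3D_automated_Objaverse_no3Dword.csv	Contains 661,577 3D-caption pairs for the Objaverse dataset, with 3D-related words reduced in the captions.
PointCloud_zips	Contains 1,006,782 point clouds, saved as `.ply` files.
PointCloud_pt_zips	Point clouds saved as torch.Tensor `.pt` files, which load faster.
RenderedImage_perobj_zips	Contains 1,006,782 rendered-image files; each object has 20 images plus related information.
misc	Contains other files, such as human-authored captions, ABO captions, finetuned models, and Shap-E latent codes.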
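The dataset card asks users to give each per-object archive in RenderedImage_perobj_zips its own output directory (its example is `unzip ed51a51909ee46c780db3a85e821feb2.zip -d ed51a51909ee46c780db3a85e821feb2`). A rough Python equivalent using only the standard library might look like the following. The helper name `extract_per_object_zip` and the `output_root` parameter are assumptions made for illustration.

```python
import zipfile
from pathlib import Path


def extract_per_object_zip(zip_path, output_root):
    """Extract one per-object archive into output_root/<archive name without .zip>/.

    Hypothetical helper mirroring `unzip <id>.zip -d <id>` from the dataset card.
    """
    zip_path = Path(zip_path)
    target_dir = Path(output_root) / zip_path.stem
    target_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
    return target_dir
```

For example, `extract_per_object_zip("ed51a51909ee46c780db3a85e821feb2.zip", ".")` would place that object's files under `./ed51a51909ee46c780db3a85e821feb2/`, which keeps files from different objects from mixing in one directory.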

Usage

Download and unzip the files you need from the page. Example code for loading the data:

# Load captions
import pandas as pd
captions = pd.read_csv('Cap3D_automated_Objaverse_full.csv', header=None)

# Load point clouds
import torch
pts = torch.load('Cap3D_pcs_pt/80d9caaa1fa04502af666135196456e1.pt')
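The dataset card reports that a loaded point cloud has shape `[6, 16384]` and describes the points as colored. The sketch below separates the six channels into two groups of three. Note the assumption: the source does not state the channel order, so treating rows 0 to 2 as x, y, z and rows 3 to 5 as r, g, b is a guess that should be checked against the data. The helper `split_point_cloud` is hypothetical and uses only indexing that works on both nested lists and tensors.

```python
def split_point_cloud(pts):
    """Split a (6, N) point cloud into two groups of three channels.

    ASSUMPTION (not confirmed by the dataset card): rows 0-2 hold x, y, z
    coordinates and rows 3-5 hold r, g, b colors.
    """
    if len(pts) != 6:
        raise ValueError(f"expected 6 channels, got {len(pts)}")
    xyz, rgb = pts[:3], pts[3:]
    return xyz, rgb


# Tiny stand-in with 2 points; a tensor of shape [6, 16384] slices the same way.
toy = [[0.0, 1.0], [0.0, 1.0], [0.0, 1.0], [0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]
xyz, rgb = split_point_cloud(toy)
print(len(xyz), len(rgb))  # prints "3 3"
```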

Citation information

If you find our data or code useful, please consider citing the following papers:

@article{luo2023scalable,
  title={Scalable 3D Captioning with Pretrained Models},
  author={Luo, Tiange and Rockwell, Chris and Lee, Honglak and Johnson, Justin},
  journal={arXiv preprint arXiv:2306.07279},
  year={2023}
}

@article{luo2024view,
  title={View Selection for 3D Captioning via Diffusion Ranking},
  author={Luo, Tiange and Johnson, Justin and Lee, Honglak},
  journal={arXiv preprint arXiv:2404.07984},
  year={2024}
}

If you use related data, please also cite the Objaverse and ABO papers accordingly.

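Collected summary

Dataset introduction

Construction

tiange/Cap3D was built in two stages. Cap3D first generates initial text descriptions of the 3D objects, and DiffuRank then refines those descriptions to improve their accuracy and detail. The process covers 1,006,782 3D objects from Objaverse and Objaverse-XL, and the resulting captions are paired with each object's point cloud and rendered images to form a combined dataset.

Features

The main features of tiange/Cap3D are its scale and diversity. It contains captions for more than one million 3D objects, ranging from everyday items to complex scenes. Besides text, it provides point clouds and rendered images for the objects; the images come with camera parameters, depth information, and MatAlpha information, giving researchers both visual and geometric data.

Usage

To use tiange/Cap3D, first download and unzip the relevant files from the Hugging Face page. The dataset ships several file types: CSV caption files, point clouds in PLY and PT formats, and ZIP archives of rendered images. The data can be loaded from Python, for example by reading the caption files with pandas or loading the point clouds with PyTorch. README files and example code are also provided.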
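The dataset card looks up a single caption by filtering the whole pandas table. When many lookups are needed, one alternative is to read the two-column CSV once into a dictionary keyed by identifier. The following is a minimal sketch using only the standard library; the helper name `build_caption_index` is hypothetical, and the sample rows reuse the truncated caption previews shown on the dataset card (the trailing `...` is part of that preview, not the real caption).

```python
import csv
import io


def build_caption_index(csv_file):
    """Map each identifier (column 0) to its caption (column 1).

    Hypothetical helper; assumes a header-less, two-column CSV as described
    on the dataset card.
    """
    index = {}
    for row in csv.reader(csv_file):
        if len(row) >= 2:
            index[row[0]] = row[1]
    return index


# Two rows copied from the truncated preview on the dataset card.
sample = io.StringIO(
    'ed51a51909ee46c780db3a85e821feb2,"Matte green rifle with a long barrel, stock, a..."\n'
    "80d9caaa1fa04502af666135196456e1,a pair of purple and black swords with white h...\n"
)
index = build_caption_index(sample)
print(index["80d9caaa1fa04502af666135196456e1"])
```

For the real file, something like `with open('Cap3D_automated_Objaverse_full.csv', newline='', encoding='utf-8') as f: index = build_caption_index(f)` should work, assuming the file is UTF-8 encoded. The trade-off is memory: the dictionary holds every caption at once.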

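Background and challenges

Background overview

tiange/Cap3D sits at the intersection of 3D vision and natural language processing. It was introduced in 2023 by Tiange Luo, Chris Rockwell, and co-authors to address caption generation for 3D objects. By combining pretrained models with diffusion ranking, Cap3D provides detailed captions for 3D objects in Objaverse, Objaverse-XL, and ABO, together with point clouds and rendered images, which adds semantic information to existing 3D data and gives related research a reusable resource.

Current challenges

Building a dataset of this kind still involves several challenges. First, generating high-quality 3D captions requires complex models and substantial compute. Second, the diversity and breadth of the data make processing difficult, especially at the scale of a million 3D objects. Third, ensuring that generated captions are both accurate and semantically rich remains an open problem. These issues affect the quality of the dataset and can limit its use in practice.

Common scenarios

Typical use cases

With its 3D object captions and rendered images, tiange/Cap3D is a resource for text-to-3D (text-to-3d) research. It contains 1,006,782 pairs of 3D objects and captions, plus point clouds and rendered images, which together support building and evaluating 3D captioning models. Because its captions are produced with pretrained models and refined by diffusion ranking, it is particularly suited to applications that need accurate descriptions of complex 3D content.

Academic problems addressed

tiange/Cap3D targets the problem of generating captions for 3D objects. Manual annotation is slow and expensive. Cap3D instead generates and refines captions automatically, which offers a scalable way to caption large 3D collections. The dataset also brings 3D vision and natural language processing closer together and supports new algorithms and models in both areas.

Related work

The dataset is tied to two papers by Luo et al. "Scalable 3D Captioning with Pretrained Models" proposes captioning 3D objects with pretrained models, and "View Selection for 3D Captioning via Diffusion Ranking" refines view selection with diffusion ranking to improve caption accuracy. Together they provide the methodological basis for the dataset and for follow-up research.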

The content above was collected and summarized by 遇见数据集.
