---
license: odc-by
viewer: false
task_categories:
- text-to-3d
---
## Dataset Description
- **Paper:** [Scalable 3D Captioning with Pretrained Models](https://arxiv.org/abs/2306.07279)
- **Paper:** [View Selection for 3D Captioning via Diffusion Ranking](https://arxiv.org/abs/2404.07984)
- **Repository**: [Github_Cap3D](https://github.com/crockwell/Cap3D)
- **Repository**: [Github_DiffuRank](https://github.com/tiangeluo/DiffuRank)
- **Project**: [Project](https://cap3d-um.github.io/)

This repository hosts data for [Scalable 3D Captioning with Pretrained Models](https://cap3d-um.github.io/) and [View Selection for 3D Captioning via Diffusion Ranking](http://arxiv.org/abs/2404.07984), including descriptive **captions** for 3D objects in [Objaverse](https://arxiv.org/abs/2212.08051), [Objaverse-XL](https://arxiv.org/pdf/2307.05663.pdf), and [ABO](https://arxiv.org/abs/2110.06199). It also includes **point clouds** and **rendered images with camera, depth, and MatAlpha information** for Objaverse objects, as well as their Shap-E latent codes. All captions and data provided by our papers are released under the ODC-By 1.0 license.
## Usage
Please download and unzip the files you need from this [**Page**](https://huggingface.co/datasets/tiange/Cap3D/tree/main). The table below describes each file and is followed by example Python scripts for data loading.
| Filename | Description |
| -------------------------------------- | ------------------------------------------------------------ |
| **Cap3D_automated_Objaverse_full.csv** | By integrating text descriptions initially generated by [Cap3D](https://arxiv.org/abs/2306.07279) and subsequently refined by [DiffuRank](https://arxiv.org/abs/2404.07984), we have produced a total of **1,006,782** 3D-caption pairs. Of these, **785,150** pairs are for the [Objaverse](https://arxiv.org/abs/2212.08051) dataset, and the remainder are for the [Objaverse-XL](https://arxiv.org/pdf/2307.05663.pdf) dataset (specifically the high-quality subset described in Section 4.1, "Alignment Finetuning", of [Objaverse-XL](https://proceedings.neurips.cc/paper_files/paper/2023/file/70364304877b5e767de4e9a2a511be0c-Paper-Datasets_and_Benchmarks.pdf)). For the object identifiers in the left column, 32-character strings are UIDs from Objaverse 1.0 (retrieved using `import objaverse; uids = objaverse.load_uids()`), and 64-character strings are SHA256 hashes provided by Objaverse-XL. |
| Cap3D_automated_Objaverse_no3Dword.csv | Text descriptions generated by [Cap3D](https://arxiv.org/abs/2306.07279), resulting in **661,577** 3D-caption pairs for the Objaverse dataset. All captions and related 3D objects here have commercial-friendly licenses (including CC-BY 4.0, CC-BY-SA 4.0, and CC0 1.0). We also filter out objects with potential ethical issues (e.g., identifiable face scans, NSFW content). The original captions are densely packed with "3D-model" terminology, which may limit their utility in applications such as embodied AI, so we created a version with minimal 3D-related wording. For example, "A 3D model of a black and yellow samurai sword" ➡️ "a black and yellow samurai sword". This is our NeurIPS version. |
| **PointCloud_zips** | Provided by [Cap3D](https://arxiv.org/abs/2306.07279) and [DiffuRank](https://arxiv.org/abs/2404.07984): **1,006,782** point clouds (16,384 colored points each) extracted from Objaverse objects, saved as `.ply` files. |
| PointCloud_pt_zips | Point clouds saved as `torch.Tensor` `.pt` files, which load faster than `.ply` files. |
| **RenderedImage_perobj_zips** | Provided by [DiffuRank](https://arxiv.org/abs/2404.07984): **1,006,782** rendered images for Objaverse objects. Unzipping `compressed_imgs_perobj_xx.zip` yields multiple zip files, each of which contains **20** rendered images along with camera details (intrinsic & extrinsic), depth data, and masks ([one example](https://huggingface.co/datasets/tiange/Cap3D/tree/main/RenderedImage_perobj_zips/example_zipfile)). Please specify the unzip path, for example `unzip ed51a51909ee46c780db3a85e821feb2.zip -d ed51a51909ee46c780db3a85e821feb2`. More information is available [here](https://huggingface.co/datasets/tiange/Cap3D/blob/main/RenderedImage_perobj_zips/README.md). |
| misc | Miscellaneous files such as human-authored captions, ABO captions, finetuned models, and Shap-E latent codes. Please refer to this [README](https://huggingface.co/datasets/tiange/Cap3D/blob/main/misc/README.md). |
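
The `unzip <identifier>.zip -d <identifier>` step for the per-object rendered-image archives can also be scripted. The following is a minimal sketch using Python's standard `zipfile` module, assuming each per-object archive is named `<identifier>.zip` as in the example above. The helper name `extract_per_object` and the `renders` output folder are hypothetical and not part of the dataset tooling.

```python
import zipfile
from pathlib import Path


def extract_per_object(zip_path, out_root="."):
    """Extract <identifier>.zip into <out_root>/<identifier>/ and return that folder.

    Hypothetical helper that mirrors `unzip <identifier>.zip -d <identifier>`.
    """
    zip_path = Path(zip_path)
    # The target folder is named after the archive, i.e. the object identifier.
    target = Path(out_root) / zip_path.stem
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return target
```

For instance, `extract_per_object('ed51a51909ee46c780db3a85e821feb2.zip', out_root='renders')` would place the archive's contents under `renders/ed51a51909ee46c780db3a85e821feb2/`.
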
```python
# load our captions
import pandas as pd
captions = pd.read_csv('Cap3D_automated_Objaverse_full.csv', header=None)
## captions:
## 0 1
## 0 ed51a51909ee46c780db3a85e821feb2 Matte green rifle with a long barrel, stock, a...
## 1 9110b606f6c547b2980fcb3c8c4b6a1c Rustic single-story building with a weathered ...
## 2 80d9caaa1fa04502af666135196456e1 a pair of purple and black swords with white h...
## 3 28d43a218cd8466a8c1f82b29b71e314 3D model of a cluttered outdoor scene with veg...
## 4 75582285fab442a2ba31733f9c8fae66 Floating terrain piece with grassy landscape a...
## ... ... ...
## 1002417 3623e74f34c1c3c523af6b2bb8ffcbe2d2dce897ef61b9... Abstract 3D composition with human figures and...
## 1002418 64e9f7b7a1fc4c4ec56ed8b5917dfd610930043ac5e15f... 3D object with a rough, irregular pink surface...
## 1002419 fcd089d6a237fee21dfd5f0d6d9b74b2fd1150cdc61c7f... Bright pink abstract 3D model of a building wi...
## 1002420 f812dc980050f2d5f4b37df2a8620372f810dd6456a5f2... Monochromatic gray 3D model of a stylized huma...
## 1002421 77c09500b4d8e4b881e1ce6929d56c23658b87173c0996... Modular futuristic spacecraft with red and ora...
## to obtain the caption for a specific UID
caption = captions[captions[0] == '80d9caaa1fa04502af666135196456e1'][1].values[0]
# load point clouds (unzip https://huggingface.co/datasets/tiange/Cap3D/tree/main/PointCloud_pt_zips)
import torch
pts = torch.load('Cap3D_pcs_pt/80d9caaa1fa04502af666135196456e1.pt')
## pts.shape == torch.Size([6, 16384])
```
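
The caption file mixes two kinds of identifiers: 32-character Objaverse 1.0 UIDs and 64-character Objaverse-XL SHA256 hashes. A minimal sketch for telling them apart by length is shown here. The function name `identifier_source` and the labels it returns are illustrative choices, not something defined by the dataset.

```python
def identifier_source(identifier):
    """Label an identifier by its length, following the convention in the file table.

    The returned labels are illustrative and not defined by the dataset.
    """
    length = len(identifier.strip())
    if length == 32:
        return "objaverse-1.0"  # UID from Objaverse 1.0
    if length == 64:
        return "objaverse-xl"   # SHA256 hash from Objaverse-XL
    return "unknown"            # anything else does not match either convention


print(identifier_source("80d9caaa1fa04502af666135196456e1"))
```

With the `captions` DataFrame loaded above, `captions[0].map(identifier_source).value_counts()` should give the number of rows per source.
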
If you have any questions, please contact [Tiange](mailto:tiange.cs@gmail.com) or [Chris](mailto:cnris@umich.edu).
## Citation Information
If you find our data or code useful, please consider citing:
```bibtex
@article{luo2023scalable,
title={Scalable 3D Captioning with Pretrained Models},
author={Luo, Tiange and Rockwell, Chris and Lee, Honglak and Johnson, Justin},
journal={arXiv preprint arXiv:2306.07279},
year={2023}
}
@article{luo2024view,
title={View Selection for 3D Captioning via Diffusion Ranking},
author={Luo, Tiange and Johnson, Justin and Lee, Honglak},
journal={arXiv preprint arXiv:2404.07984},
year={2024}
}
```
Please also cite the ***Objaverse*** and ***ABO*** papers if you use the related data.
```bibtex
@inproceedings{deitke2023objaverse,
title={Objaverse: A universe of annotated 3d objects},
author={Deitke, Matt and Schwenk, Dustin and Salvador, Jordi and Weihs, Luca and Michel, Oscar and VanderBilt, Eli and Schmidt, Ludwig and Ehsani, Kiana and Kembhavi, Aniruddha and Farhadi, Ali},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={13142--13153},
year={2023}
}
@article{deitke2024objaverse,
title={Objaverse-xl: A universe of 10m+ 3d objects},
author={Deitke, Matt and Liu, Ruoshi and Wallingford, Matthew and Ngo, Huong and Michel, Oscar and Kusupati, Aditya and Fan, Alan and Laforte, Christian and Voleti, Vikram and Gadre, Samir Yitzhak and others},
journal={Advances in Neural Information Processing Systems},
volume={36},
year={2024}
}
@inproceedings{collins2022abo,
title={Abo: Dataset and benchmarks for real-world 3d object understanding},
author={Collins, Jasmine and Goel, Shubham and Deng, Kenan and Luthra, Achleshwar and Xu, Leon and Gundogdu, Erhan and Zhang, Xi and Vicente, Tomas F Yago and Dideriksen, Thomas and Arora, Himanshu and others},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={21126--21136},
year={2022}
}
```