MeshCoderDataset
Source: ModelScope community (updated 2026-01-02, indexed 2025-11-15)
Download link: https://modelscope.cn/datasets/InternRobotics/MeshCoderDataset
# MeshCoderDataset
This is the dataset released with the MeshCoder paper.
## MeshCoder
[Project page](https://daibingquan.github.io/MeshCoder)
[Paper](https://huggingface.co/papers/2508.14879)
[Model](https://huggingface.co/InternRobotics/MeshCoder)
[Bingquan Dai*](https://openreview.net/profile?id=%7EBingQuan_Dai1), [Li Luo*](https://openreview.net/profile?id=%7ELuo_Li1), [Qihong Tang](https://openreview.net/profile?id=%7EQihong_Tang1),
[Jie Wang](https://roywangj.github.io/), [Xinyu Lian](https://openreview.net/profile?id=~Xinyu_Lian1), [Hao Xu](https://hoytxu.me/), [Minghan Qin](https://minghanqin.github.io/), [Xudong Xu](https://sheldontsui.github.io/), [Bo Dai](https://daibo.info/), [Haoqian Wang<sup>†</sup>](https://www.sigs.tsinghua.edu.cn/whq_en/main.htm), [Zhaoyang Lyu<sup>†</sup>](https://zhaoyanglyu.github.io/), [Jiangmiao Pang](https://oceanpang.github.io/) <br />
\* Equal contribution <br />
<sup>†</sup> Corresponding author <br />
Project lead: Zhaoyang Lyu
## Overview
MeshCoder is a framework that converts 3D point clouds into editable Blender Python scripts, enabling programmatic reconstruction and editing of complex human-made objects. It overcomes prior limitations by developing expressive APIs for modeling intricate geometries, building a large-scale dataset of 1 million object-code pairs across 41 categories, and training a multimodal LLM to generate accurate, part-segmented code from point clouds. The approach outperforms existing methods in reconstruction quality, supports intuitive shape and topology editing via code modifications, and enhances 3D reasoning capabilities in LLMs.
## Usage
For model checkpoints, see https://huggingface.co/InternRobotics/MeshCoder/.
For installation, training, and inference instructions, see the GitHub repository: https://github.com/InternRobotics/MeshCoder.
<!--
<p>
<strong>MeshCoder</strong> is a large-scale paired object-code dataset with structured, editable code. This dataset comprises approximately <strong>100 thousands diverse 3D objects</strong> and their corresponding <strong>Blender Python scripts</strong> , covering <strong>40 common object categories</strong>. This is significantly larger than existing small-scale datasets that rely on limited Domain-Specific Languages (DSLs).
</p>
<img src="assets/teaser.png" alt="Teaser" width=100% > -->
## 🔑 Key Features
<div class="section">
<p>MeshCoder integrates a wide variety of complex objects, generating paired Blender Python code that describes each object in semantic parts. This ensures:
</p>
<ul>
<li>📊
<strong>Large scale
</strong>: 100 thousand object-code pairs spanning 40 categories, with objects composed of up to 100+ parts.
<!-- An additional synthetic part dataset with ~10 million part-code pairs is also included. -->
</li>
<li>💻
<strong>Structured & Editable Code
</strong>: Generates human-readable Blender Python scripts decomposed into distinct semantic parts, enabling intuitive geometric and topological editing.
</li>
<li>🧊
<strong>Intricate Geometries
</strong>: Built on a comprehensive set of expressive Blender Python APIs capable of synthesizing complex shapes (e.g., via translation, bridge loops, booleans, and arrays) far beyond simple primitives.
</li>
</ul>
</div>
## ⚙️ Getting Started
### Download the Dataset
To download the full dataset, you can use the following code. If you encounter any issues, please refer to the official Hugging Face documentation.
```shell
# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install
# When prompted for a password, use an access token with write permissions.
# Generate one from your settings: https://huggingface.co/settings/tokens
git clone https://huggingface.co/datasets/InternRobotics/MeshCoderDataset
# If you want to clone without large files - just their pointers
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/datasets/InternRobotics/MeshCoderDataset
```
<!-- If you only want to download a specific dataset, such as `splitaloha`, you can use the following code.
```
# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install
# Initialize an empty Git repository
git init MeshCoderDataset
cd MeshCoderDataset
# Set the remote repository
git remote add origin https://huggingface.co/datasets/InternRobotics/MeshCoderDataset
# Enable sparse-checkout
git sparse-checkout init
# Pull the data
git pull origin main
``` -->
### Dataset Structure
To simplify uploading and save storage, the point clouds are zipped per object class. The directory tree looks like this:
```shell
MeshCoderDataset/
|-- dataset_train/               # training set of point clouds
|   |-- ArmChairFactory.zip      # zip file for each object class
|   |-- BarChairFactory.zip
|   |-- .../
|-- dataset_val/                 # validation set of point clouds
|   |-- ArmChairFactory.zip
|   |-- BarChairFactory.zip
|   |-- .../
|-- dataset_test/                # test set of point clouds
|   |-- ArmChairFactory.zip
|   |-- BarChairFactory.zip
|   |-- .../
|-- train.json
|-- val.json
|-- test.json
|-- train_show.jsonl             # for the Hugging Face rendering
|-- val_show.jsonl
|-- test_show.jsonl
```
Unzipping takes a while. Once it finishes, the directory tree looks like the one below, which matches the `pcd_path` entries in the JSON files:
```shell
MeshCoderDataset/
|-- dataset_train/               # training set of point clouds
|   |-- ArmChairFactory_1/       # point clouds for ArmChair
|   |-- .../                     # many other object classes
|-- dataset_val/                 # validation set of point clouds
|   |-- ArmChairFactory_1/       # point clouds for ArmChair
|   |-- .../
|-- dataset_test/                # test set of point clouds
|   |-- ArmChairFactory_1/       # point clouds for ArmChair
|   |-- .../
|-- train.json
|-- val.json
|-- test.json
```
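The unzipping step can be scripted. The following is a minimal sketch using only the Python standard library. It assumes that each archive should be extracted in place, into the split directory that contains it, and that the root folder is named `MeshCoderDataset`; the function names `unzip_split` and `unzip_dataset` are hypothetical helpers, not part of the MeshCoder codebase.

```python
import zipfile
from pathlib import Path


def unzip_split(split_dir: Path, remove_archives: bool = False) -> list:
    """Extract every *.zip found directly in split_dir and return the archives processed."""
    archives = sorted(split_dir.glob("*.zip"))
    for archive in archives:
        with zipfile.ZipFile(archive) as zf:
            # Assumption: archive members are stored relative to the split directory,
            # so extracting in place yields folders such as ArmChairFactory_1/.
            zf.extractall(split_dir)
        if remove_archives:
            archive.unlink()  # optionally free disk space after extraction
    return archives


def unzip_dataset(root: Path, splits=("dataset_train", "dataset_val", "dataset_test")) -> int:
    """Unzip all split directories under root; returns the number of archives extracted."""
    count = 0
    for split in splits:
        split_dir = root / split
        if split_dir.is_dir():
            count += len(unzip_split(split_dir))
    return count
```

For example, `unzip_dataset(Path("MeshCoderDataset"))` processes all three splits. If the extracted folders do not line up with the `pcd_path` values, check how the members are laid out inside the archives before adjusting the target directory.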
Note that `pcd_path` holds relative paths. Entries in the train set have the following layout:
```json
[
{
"id": "10879",
"name": "cocktail table",
"category": "TableCocktailFactory",
"script": "import bpy\nfrom math import radians, pi\nfrom bpy_lib import *\n\ndelete_all()\n\n# object name: cocktail table\n# part_1: leg\ncreate_polygon(name='hexagon_1', sides=6, radius=0.02)\ncreate_curve(name='leg_1', profile_name='hexagon_1', control_points=[[-0.32, -1.0, -0.32], [-0.29, 0.28, -0.29], [-0.28, 0.94, -0.28]], points_radius=[1.0, 2.07, 2.07], handle_type=[1, 1, 1, 1, 1, 1], thickness=0.0, fill_caps='both')\n\n# part_2: leg\ncreate_polygon(name='hexagon_2', sides=6, radius=0.02)\ncreate_curve(name='leg_2', profile_name='hexagon_2', control_points=[[-0.32, -1.0, 0.32], [-0.29, 0.26, 0.29], [-0.28, 0.94, 0.28]], points_radius=[1.0, 2.22, 2.22], handle_type=[1, 1, 1, 1, 1, 1], thickness=0.0, fill_caps='both')\n\n# part_3: leg\ncreate_polygon(name='hexagon_3', sides=6, radius=0.02)\ncreate_curve(name='leg_3', profile_name='hexagon_3', control_points=[[0.32, -1.0, -0.32], [0.29, 0.28, -0.29], [0.28, 0.94, -0.28]], points_radius=[1.0, 2.07, 2.07], handle_type=[1, 1, 1, 1, 1, 1], thickness=0.0, fill_caps='both')\n\n# part_4: leg\ncreate_polygon(name='hexagon_4', sides=6, radius=0.02)\ncreate_curve(name='leg_4', profile_name='hexagon_4', control_points=[[0.32, -1.0, 0.32], [0.29, 0.29, 0.29], [0.28, 0.94, 0.28]], points_radius=[1.0, 2.22, 2.22], handle_type=[1, 1, 1, 1, 1, 1], thickness=0.0, fill_caps='both')\n\n# part_5: strecher\ncreate_primitive(name='strecher_5', primitive_type='cylinder', location=[-0.0, -0.09, -0.0], scale=[0.02, 0.02, 0.42], rotation=[0.01, -0.38, -0.0, 0.92])\n\n# part_6: strecher\ncreate_primitive(name='strecher_6', primitive_type='cylinder', location=[0.0, -0.09, 0.0], scale=[0.02, 0.02, 0.42], rotation=[0.66, 0.27, 0.27, 0.65])\n\n# part_7: table top\ncreate_primitive(name='table top_7', primitive_type='cube', location=[-0.0, 0.97, -0.0], scale=[0.4, 0.4, 0.03], rotation=[0.5, -0.5, 0.5, 0.5])\nbevel(name='table top_7', width=0.09, segments=8)",
"pcd_path": [
"dataset_train/TableCocktailFactory_1_other/point_cloud/10879.npz",
"dataset_train/TableCocktailFactory_1_other/point_cloud_gt/10879.npz"
],
"cd_loss": [
1e-08,
0.00012
]
}
...
]
```
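Each `script` marks its semantic parts with `# part_N: name` comments, so the parts can be separated with plain string processing. The sketch below groups the code lines under each part header. It assumes every part starts with such a comment, as in the example above; `split_script_by_part` is a hypothetical helper, and the sample string is an excerpt of the cocktail table script.

```python
import re

# Matches part headers such as "# part_7: table top".
PART_HEADER = re.compile(r"^# part_(\d+): (.+)$")


def split_script_by_part(script: str) -> list:
    """Group the non-empty code lines of a script under their part headers."""
    parts = []
    current = None
    for line in script.splitlines():
        match = PART_HEADER.match(line.strip())
        if match:
            current = {"index": int(match.group(1)), "name": match.group(2), "code": []}
            parts.append(current)
        elif current is not None and line.strip():
            # Lines before the first header (imports, delete_all()) are skipped.
            current["code"].append(line)
    return parts


example_script = (
    "import bpy\nfrom bpy_lib import *\n\ndelete_all()\n\n"
    "# object name: cocktail table\n"
    "# part_5: strecher\n"
    "create_primitive(name='strecher_5', primitive_type='cylinder', "
    "location=[-0.0, -0.09, -0.0], scale=[0.02, 0.02, 0.42], "
    "rotation=[0.01, -0.38, -0.0, 0.92])\n\n"
    "# part_7: table top\n"
    "create_primitive(name='table top_7', primitive_type='cube', "
    "location=[-0.0, 0.97, -0.0], scale=[0.4, 0.4, 0.03], "
    "rotation=[0.5, -0.5, 0.5, 0.5])\n"
    "bevel(name='table top_7', width=0.09, segments=8)"
)

for part in split_script_by_part(example_script):
    print(part["index"], part["name"], len(part["code"]))
# 5 strecher 1
# 7 table top 2
```

The part names are kept exactly as they appear in the data, including spellings such as `strecher`.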
Entries in the val and test sets have the following layout:
```json
[
{
"id": 41627,
"name": "Microwave",
"category": "MicrowaveFactory",
"script": "import bpy\nfrom math import radians, pi\nfrom bpy_lib import *\n\ndelete_all()\n\n# object name: Microwave\n# part_1: door\ncreate_primitive(name='door_1', primitive_type='cube', location=[0.78, -0.0, 0.22], scale=[0.78, 0.51, 0.05], rotation=[0.71, 0.0, 0.7, -0.0])\n\n# part_2: body\ncreate_primitive(name='body_2', primitive_type='cube', location=[-0.05, 0.0, 0.0], scale=[1.0, 0.5, 0.78], rotation=[0.0, 0.71, -0.0, 0.71])\ncreate_primitive(name='Bool2_2', primitive_type='cube', location=[-0.05, 0.0, 0.25], scale=[0.69, 0.43, 0.79], rotation=[0.0, 0.71, -0.0, 0.71])\nboolean_operation(name1='body_2', name2='Bool2_2', operation='DIFFERENCE')\n\n# part_3: sidedoor\ncreate_primitive(name='sidedoor_3', primitive_type='cube', location=[0.78, -0.0, -0.78], scale=[0.5, 0.22, 0.05], rotation=[0.5, 0.5, -0.5, -0.5])\n\n# part_4: plate\ncreate_curve(name='curve_4', control_points=[[0.0, 0.0, 0.0], [0.235, 0.0, 0.0], [0.372, 0.059, 0.0], [0.489, 0.117, 0.0]], handle_type=[0, 3, 0, 0, 0, 0])\nbezier_rotation(name='plate_4', profile_name='curve_4', location=[0.02, -0.43, 0.24], rotation=[0.54, -0.55, 0.46, 0.45], thickness=0.0)",
"completeness": 1.0,
"rec_parts": [
"0.obj",
"1.obj",
"2.obj",
"3.obj"
],
"ori_parts": [
"0.obj",
"1.obj",
"2.obj",
"3.obj"
],
"cd_loss": [
0.00469
],
"pcd_path": [
"dataset_val/MicrowaveFactory_3/point_cloud_gt/41627.npz"
]
}
...
]
```
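Because every `script` is ordinary Python source, it can be inspected without Blender or `bpy_lib` installed. The sketch below uses the standard `ast` module to count which modeling functions a script calls. `count_api_calls` is a hypothetical helper, and the sample string is an excerpt of the Microwave script above; nothing is executed, the code is only parsed.

```python
import ast
from collections import Counter


def count_api_calls(script: str) -> Counter:
    """Count calls to plain function names (e.g. create_primitive) in a script."""
    tree = ast.parse(script)  # parses only; the script is never run
    calls = Counter()
    for node in ast.walk(tree):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            calls[node.func.id] += 1
    return calls


example_script = (
    "from bpy_lib import *\n\ndelete_all()\n\n"
    "# part_2: body\n"
    "create_primitive(name='body_2', primitive_type='cube', "
    "location=[-0.05, 0.0, 0.0], scale=[1.0, 0.5, 0.78], "
    "rotation=[0.0, 0.71, -0.0, 0.71])\n"
    "create_primitive(name='Bool2_2', primitive_type='cube', "
    "location=[-0.05, 0.0, 0.25], scale=[0.69, 0.43, 0.79], "
    "rotation=[0.0, 0.71, -0.0, 0.71])\n"
    "boolean_operation(name1='body_2', name2='Bool2_2', operation='DIFFERENCE')"
)

print(sorted(count_api_calls(example_script).items()))
# [('boolean_operation', 1), ('create_primitive', 2), ('delete_all', 1)]
```

Aggregating these counters over a whole split gives a rough picture of how often each modeling operation appears per category.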
For the train split, `pcd_path` lists two point clouds: one sampled from the mesh generated by the code (e.g. `xxx/point_cloud/41627.npz`) and one sampled from the mesh of the original Infinigen indoor object (e.g. `xxx/point_cloud_gt/41627.npz`). All `pcd_path` entries are relative paths. In the `cd_loss` list, the first value is the loss of the code-sampled point cloud against itself, which is set to 1e-8 by default; the second is computed between the point cloud sampled from the code-generated mesh and the point cloud sampled from the Infinigen mesh.
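A small sketch of how an entry's point clouds might be resolved against the dataset root. It assumes that `pcd_path` and `cd_loss` are aligned by index, which matches the train example but is not stated explicitly, and it tells the two sources apart by the parent folder name. `describe_point_clouds` is a hypothetical helper.

```python
from pathlib import Path


def describe_point_clouds(entry: dict, root: Path) -> list:
    """Pair each pcd_path of an entry with its cd_loss value and a source label."""
    records = []
    # Assumption: the i-th cd_loss value belongs to the i-th pcd_path.
    for rel_path, loss in zip(entry["pcd_path"], entry["cd_loss"]):
        path = root / rel_path  # pcd_path is relative to the dataset root
        # point_cloud    -> sampled from the code-generated mesh
        # point_cloud_gt -> sampled from the Infinigen mesh
        source = "infinigen" if path.parent.name == "point_cloud_gt" else "code"
        records.append({"path": path, "source": source, "cd_loss": loss})
    return records


train_entry = {
    "id": "10879",
    "pcd_path": [
        "dataset_train/TableCocktailFactory_1_other/point_cloud/10879.npz",
        "dataset_train/TableCocktailFactory_1_other/point_cloud_gt/10879.npz",
    ],
    "cd_loss": [1e-08, 0.00012],
}

for record in describe_point_clouds(train_entry, Path("MeshCoderDataset")):
    print(record["source"], record["cd_loss"], record["path"].name)
```

The same function works for val and test entries, which carry a single `point_cloud_gt` path and a single `cd_loss` value.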
<!-- ## 📋 TODO List
- [x] Release 100 thousands samples from the MeshCoderDataset.
-->
## Join Us
We are seeking engineers, interns, researchers, and PhD candidates. If you have an interest in 3D content generation, please send your resume to lvzhaoyang@pjlab.org.cn.
## 🧷 Citation
```BibTex
@article{dai2025meshcoder,
title={MeshCoder: LLM-Powered Structured Mesh Code Generation from Point Clouds},
author={Dai, Bingquan and Luo, Li Ray and Tang, Qihong and Wang, Jie and Lian, Xinyu and Xu, Hao and Qin, Minghan and Xu, Xudong and Dai, Bo and Wang, Haoqian and others},
journal={arXiv preprint arXiv:2508.14879},
year={2025}
}
```
## 📄 License
<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/80x15.png" /></a>
This work is licensed under the <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.
Provider: maas
Created: 2025-11-11



