DongXintong/HistCAD
Source: Hugging Face. Updated 2026-04-10; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/DongXintong/HistCAD
Resource overview:
# HistCAD Dataset
HistCAD is a geometrically constrained, parametric history-based CAD dataset for editable CAD modeling, text-to-CAD, and design-intent-aware evaluation.
This repository/package contains a public release of HistCAD. It includes a processed subset of the full dataset described in the paper, together with native STEP exports, rendered images, and a CSV annotation table.

*Random samples from HistCAD-DeepCAD (left), HistCAD-Fusion360 (center), and HistCAD-Industrial (right).*
## Overview
HistCAD represents CAD models as executable, constraint-aware parametric histories. Compared with sequence-only CAD datasets, HistCAD explicitly exposes sketch geometry, geometric constraints, and downstream feature operations in a unified history format.
The full dataset described in the paper contains:
- **HistCAD-Academic**: 162k sequences
- **HistCAD-Industrial**: 8k sequences
- **Total**: 170k executable modeling sequences
This release contains a subset of the full HistCAD collection. Coverage may differ across JSON histories, STEP files, rendered images, and text annotations.
## Subset IDs
In the current release convention, subset IDs map to the three HistCAD partitions as follows:
- `0001`-`0099`: **HistCAD-DeepCAD**
- `0100`: **HistCAD-Fusion360**
- `0101`: **HistCAD-Industrial**
HistCAD-DeepCAD and HistCAD-Fusion360 together form the academic portion of HistCAD.
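The mapping above can be sketched as a small lookup. This is only an illustration of the current release convention; the function names `partition_for_subset` and `is_academic` are hypothetical helpers, not part of any HistCAD tooling.

```python
def partition_for_subset(subset_id: str) -> str:
    """Map a four-digit subset ID such as "0042" to its HistCAD partition name."""
    n = int(subset_id)
    if 1 <= n <= 99:
        return "HistCAD-DeepCAD"
    if n == 100:
        return "HistCAD-Fusion360"
    if n == 101:
        return "HistCAD-Industrial"
    raise ValueError(f"unknown subset ID: {subset_id!r}")


def is_academic(subset_id: str) -> bool:
    """DeepCAD and Fusion360 subsets together form the academic portion."""
    return partition_for_subset(subset_id) != "HistCAD-Industrial"
```

If a later release changes the ID ranges, only the three comparisons need updating.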
## Package Contents
This release contains four data modalities:
1. **Parametric modeling sequences** in JSON format
2. **STEP files** exported from executable CAD histories
3. **Rendered images** from multiple canonical viewpoints
4. **Text annotations** in CSV format
A typical packaged release layout is:
```text
HistCAD_release/
|-- JSON/
|   |-- histcad_sequences_deepcad_0001-0099.zip
|   |-- histcad_sequences_fusion360_0100.zip
|   `-- histcad_sequences_industrial_0101.zip
|-- STEP/
|   |-- histcad_step_deepcad_0001-0099.zip
|   |-- histcad_step_fusion360_0100.zip
|   `-- histcad_step_industrial_0101.zip
|-- rendered_images/
|   |-- histcad_rendered_images_front.zip
|   |-- histcad_rendered_images_back.zip
|   |-- histcad_rendered_images_left.zip
|   |-- histcad_rendered_images_right.zip
|   |-- histcad_rendered_images_top.zip
|   |-- histcad_rendered_images_bottom.zip
|   |-- histcad_rendered_images_iso_front_right_top.zip
|   `-- histcad_rendered_images_iso_back_left_top.zip
`-- annotations/
    `-- annotations.csv
```
Each zip stores files directly from the subset-relative path, without an extra `HistCAD/sequences` or `HistCAD/step` prefix. When multiple modalities are available for the same sample, they share the same sample stem. For example:
- Inside `histcad_sequences_deepcad_0001-0099.zip`: `0001/00010008.json`
- Inside `histcad_step_deepcad_0001-0099.zip`: `0001/00010008.step`
- Inside `histcad_rendered_images_iso_front_right_top.zip`: `0001/00010008_iso_front_right_top_.png`
If a packaging run is configured to include only some of the modalities, the archives for the remaining modalities may still be emitted as empty placeholder zip files so that the folder structure stays stable.
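Because samples share a stem across archives, the stems in one archive can be listed without extracting it. The following is a minimal sketch using Python's standard `zipfile` module. The helper name `sample_stems` is hypothetical, and the sketch assumes an empty placeholder is a valid zip with no members rather than a zero-byte file (which `zipfile` would reject with `BadZipFile`).

```python
import zipfile


def sample_stems(archive, suffix: str) -> set[str]:
    """Return subset-relative stems (e.g. "0001/00010008") for members ending in `suffix`.

    `archive` may be a filesystem path or a binary file-like object.
    """
    stems = set()
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            # Directory entries end with "/" and therefore never match a file suffix.
            if name.endswith(suffix):
                stems.add(name[: -len(suffix)])
    return stems
```

For example, `sample_stems("JSON/histcad_sequences_deepcad_0001-0099.zip", ".json")` would yield stems that can be compared with those from the matching STEP archive using the suffix `".step"`. An empty placeholder archive simply yields an empty set.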
## JSON Sequence Format
Each JSON file stores a **parametric modeling history** as a list of ordered operations. A simplified sketch-based entry looks like:
```json
[
  {
    "coordinate_system": {
      "Euler Angles": [0.0, 0.0, 0.0],
      "Translation Vector": [0.0, 0.0, 0.0]
    },
    "sketch": {
      "line_1": {"start": [-30.0, -30.0], "end": [30.0, -30.0]},
      "line_2": {"start": [30.0, -30.0], "end": [30.0, 30.0]},
      "line_3": {"start": [30.0, 30.0], "end": [-30.0, 30.0]},
      "line_4": {"start": [-30.0, 30.0], "end": [-30.0, -30.0]}
    },
    "constraints": {
      "Coincident": [
        ["line_3.start", "line_2.end"],
        ["line_3.end", "line_4.start"],
        ["line_1.end", "line_2.start"],
        ["line_1.start", "line_4.end"]
      ],
      "Horizontal": ["line_1"],
      "Length": [
        ["line_3", "60 mm"],
        ["line_4", "60 mm"]
      ],
      "Parallel": [
        ["line_1", "line_3"],
        ["line_2", "line_4"]
      ],
      "Perpendicular": [
        ["line_1", "line_2"]
      ]
    },
    "towards": 6.0,
    "opposite": 0.0,
    "operation": "NewBody"
  }
]
```
For sketch-based extrusion steps:
- `towards` is the extrusion distance along the sketch normal
- `opposite` is the extrusion distance against the sketch normal
- `operation` is the Boolean/body mode, typically one of `NewBody`, `Join`, `Cut`, or `Intersect`
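The extrusion fields above can be read with a few lines of Python. This is a minimal sketch, not an official loader: the helper name `summarize_extrusions` is hypothetical, and treating any entry that has both a `sketch` and a `towards` key as a sketch-based extrusion is an assumption drawn from the simplified example.

```python
# Boolean/body modes named in this README; other values may occur in the data.
EXTRUDE_MODES = {"NewBody", "Join", "Cut", "Intersect"}


def summarize_extrusions(history: list[dict]) -> list[dict]:
    """Summarize the sketch-based extrusion steps of one parsed JSON history."""
    summary = []
    for index, step in enumerate(history):
        # Assumption: extrusion steps carry both "sketch" and "towards".
        if "sketch" not in step or "towards" not in step:
            continue
        summary.append({
            "step": index,
            "operation": step["operation"],
            # Total thickness is the sum of both directions along the normal.
            "total_depth": step["towards"] + step.get("opposite", 0.0),
            "num_primitives": len(step["sketch"]),
            "known_mode": step["operation"] in EXTRUDE_MODES,
        })
    return summary
```

A history loaded with `json.load` can be passed in directly. Entries without these keys, such as fillets or chamfers, are skipped rather than treated as errors.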
Other history entries use operation-specific fields. Common examples include:
- **Revolve**: `start`, `end`, `axis`, `operation`
- **Helix sweep**: `axis`, `pitch`, `turns`, `handedness`, `operation`
- **Fillet**: `near_points`, `radius`, `operation = "Fillet"`
- **Chamfer**: `near_points`, `plane`, `dist`, `angle`, `operation = "Chamfer"`
Sketch primitives may include:
- lines
- circles
- arcs
- ellipses
- elliptical arcs
- NURBS B-splines
HistCAD explicitly supports 19 geometric constraint types:
- `Coincident`
- `Parallel`
- `Perpendicular`
- `Horizontal`
- `Vertical`
- `Tangent`
- `Equal`
- `Concentric`
- `Fix`
- `Normal`
- `Midpoint`
- `Mirror`
- `Angle`
- `Diameter`
- `Radius`
- `MajorRadius`
- `MinorRadius`
- `Length`
- `Distance`
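One way to sanity-check a history against the list above is to count its constraints by type and flag any name outside the 19 documented ones. The sketch below assumes, as in the simplified example, that `constraints` maps each type name to a list of entries. The helper names are hypothetical.

```python
from collections import Counter

# The 19 constraint types listed in this README.
CONSTRAINT_TYPES = frozenset({
    "Coincident", "Parallel", "Perpendicular", "Horizontal", "Vertical",
    "Tangent", "Equal", "Concentric", "Fix", "Normal", "Midpoint", "Mirror",
    "Angle", "Diameter", "Radius", "MajorRadius", "MinorRadius", "Length",
    "Distance",
})


def count_constraints(history: list[dict]) -> Counter:
    """Count constraint entries per type across all steps of one history."""
    counts = Counter()
    for step in history:
        for ctype, entries in step.get("constraints", {}).items():
            counts[ctype] += len(entries)
    return counts


def unknown_constraint_types(counts: Counter) -> set[str]:
    """Return constraint names that are not among the documented types."""
    return set(counts) - CONSTRAINT_TYPES
```

Steps without a `constraints` field contribute nothing to the counts, so the same function can be run over whole histories.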
## STEP Files
Each STEP file is a geometry export corresponding to an executable HistCAD history. These files are provided for downstream geometry processing, visualization, and evaluation in tools that do not directly execute the parametric histories.
For some subsets, STEP coverage may be more complete than that of the other modalities.
## Rendered Images
Rendered images are organized by viewpoint. Current viewpoints include:
- `front`
- `back`
- `left`
- `right`
- `top`
- `bottom`
- `iso_front_right_top`
- `iso_back_left_top`
As with other modalities, render coverage may vary across subsets.
## Annotation CSV Format
The current column convention is:
```text
uid,Modeling Process,Geometric Feature,Functional Type,NLT
```
Field meanings:
- `uid`: unique sample identifier, usually in the form `subset_id/sample_id`
- `Modeling Process`: history-grounded description of the CAD construction process
- `Geometric Feature`: summary of the main geometric structures
- `Functional Type`: short functional or semantic category
- `NLT`: natural-language transcription, providing a fuller natural-language description of the part or assembly
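The annotation table can be read with Python's standard `csv` module. This is a minimal sketch under stated assumptions: the helper names are hypothetical, the header row is assumed to match the column convention above, and `split_uid` only handles the usual `subset_id/sample_id` form.

```python
import csv


def read_annotations(fileobj) -> dict[str, dict[str, str]]:
    """Index annotation rows by `uid`; later duplicates overwrite earlier ones."""
    rows = {}
    for row in csv.DictReader(fileobj):
        rows[row["uid"]] = row
    return rows


def split_uid(uid: str) -> tuple[str, str]:
    """Split "subset_id/sample_id" into its two parts.

    If the uid contains no "/", the second element is an empty string.
    """
    subset_id, _, sample_id = uid.partition("/")
    return subset_id, sample_id
```

When reading from disk, open the file with `newline=""` as the `csv` documentation recommends; the UTF-8 encoding is an assumption, for example `open("annotations/annotations.csv", newline="", encoding="utf-8")`.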
## Notes
- This release contains a subset of the full HistCAD dataset described in the accompanying paper.
- Different modalities are not always perfectly aligned for every sample.
- If you need strict one-to-one alignment between JSON, STEP, rendered_images, and text, please verify availability for each sample before building a benchmark.
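The availability check suggested above amounts to a set intersection over per-modality sample stems. The sketch below assumes the stems have already been collected (for example `"0001/00010008"` from each archive and from the CSV `uid` column) and that the identifiers are spelled the same way in every modality. The function names are hypothetical.

```python
def aligned_samples(stems_by_modality: dict[str, set[str]]) -> set[str]:
    """Return the stems that are present in every modality."""
    sets = list(stems_by_modality.values())
    return set.intersection(*sets) if sets else set()


def missing_report(stems_by_modality: dict[str, set[str]]) -> dict[str, set[str]]:
    """For each modality, list stems that exist elsewhere but are missing from it."""
    all_stems = set().union(*stems_by_modality.values())
    return {name: all_stems - stems for name, stems in stems_by_modality.items()}
```

A benchmark that needs strict one-to-one alignment would keep only `aligned_samples(...)`, while `missing_report(...)` shows which modality is responsible for each dropped sample.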
## License
HistCAD is released under the **MIT License**.
## Citation
This release reflects a newer internal revision of HistCAD; please cite the public arXiv version until the updated manuscript is released.
```bibtex
@misc{dong2025histcadgeometricallyconstrainedparametric,
title={HistCAD: Geometrically Constrained Parametric History-based CAD Dataset},
author={Xintong Dong and Chuanyang Li and Chuqi Han and Peng Zheng and Jiaxin Jing and Yanzhi Song and Zhouwang Yang},
year={2025},
eprint={2602.19171},
archivePrefix={arXiv},
primaryClass={cs.GR},
url={https://arxiv.org/abs/2602.19171}
}
```
Provider:
DongXintong
Collected summary
Dataset introduction

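Construction
In computer-aided design (CAD), conventional sequence datasets often omit design intent and geometric constraints, which makes them a poor basis for editable modeling and design reasoning. HistCAD addresses this by representing each CAD model as an executable, constraint-aware parametric history. Every modeling step is recorded in a unified JSON format that holds the coordinates and parameters of the sketch geometry (lines, arcs, splines, and so on) and explicitly annotates 19 geometric constraint types, including coincident, parallel, perpendicular, and tangent. Each sequence also states its Boolean mode (new body, join, cut, or intersect) and supports feature operations such as revolve, helix sweep, fillet, and chamfer. To add further modalities, the team exports a native STEP geometry file for each history, rendered images from eight standard viewpoints, and text annotations in CSV format, yielding a multimodal, cross-indexable CAD collection. The data come from academic and industrial sources (DeepCAD, Fusion360, and real industrial models) and total about 170k executable modeling sequences, offering both scale and diversity for follow-up research.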
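Features
HistCAD is distinguished by combining geometric-constraint awareness with a parametric history representation. Unlike datasets that record only operation sequences, it embeds the full constraint network in every sketch step, so a model can be reproduced in appearance and also understood and edited in terms of design intent. This structured history suits text-based CAD generation, because a language model can interpret the constraint relations and map them to concrete geometric behavior. Notably, the dataset introduces a differentiated data split: it separates the training, validation, and test sets by geometric structure, according to whether sequences share a sketch profile and constraint topology. This avoids information leakage and gives a stricter benchmark for evaluating how well a model understands design intent. Multimodal alignment also helps cross-modal analysis and generation research, since the JSON history, STEP solid, multi-view renders, and natural-language description of a sample share the same identifier. The industrial source brings real-world shape complexity, while the academic subset keeps the dataset accessible to researchers.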
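Usage
The multimodal design makes the dataset suitable for a broad range of CAD research. For text-to-CAD generation, the natural-language descriptions in the CSV annotations can serve as input and the corresponding JSON histories as the output target, so that a sequence generation model learns the mapping from semantics to modeling steps. For editable modeling, the constraint information in the JSON files and the STEP geometry can be used to train models that predict the geometry resulting from a design change, or that apply local edits driven by a user's text instruction. For evaluation, the rendered images and CSV annotations can be combined with geometric comparison and design-intent consistency analysis to develop finer-grained metrics. Coverage can differ across modalities, so check the multimodal alignment of each sample before use. The data are packaged as zip archives organized by subset: JSON histories sit under `JSON/`, STEP files under `STEP/`, rendered images under `rendered_images/`, and the CSV annotation file gathers the text annotations for the samples. To load the full multimodal data for a model, index by subset ID and sample ID.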
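Background and challenges
Background overview
In CAD, representing a 3D model as an editable history of geometric construction is a key route to design-intent understanding and intelligent modeling. HistCAD, released in 2025 by the team of Prof. Zhouwang Yang at the University of Science and Technology of China, aims to overcome the limitation that conventional CAD datasets record only model sequences and ignore geometric constraints and parametric dependencies. The full dataset contains 170k executable modeling sequences in academic and industrial subsets: HistCAD-Academic is drawn from public sources such as DeepCAD and Fusion360, and HistCAD-Industrial contains 8k real industrial models. The core idea is a unified constraint-aware history format that explicitly expresses sketch geometry, constraint relations, and downstream feature operations, providing standardized benchmark data and training resources for editable CAD modeling, text-to-CAD, and design-intent evaluation. The dataset advances deep learning research on parametric modeling and lays groundwork for addressing the interpretability and editability of 3D models, with relevance to computer graphics and intelligent design.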
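Current challenges
The domain challenge HistCAD addresses is that conventional CAD representations do not model geometric constraints and design intent explicitly, so existing sequence datasets struggle to support model editing and design reasoning. HistCAD therefore introduces 19 geometric constraint types (coincident, parallel, perpendicular, and others) together with sketch primitives covering lines, circles, arcs, ellipses, elliptical arcs, and NURBS B-splines, and uses a unified history format to capture the full parametric workflow from sketch to Boolean operation. During construction, the team had to reconcile inconsistent formats across data sources, in particular converting the sequences from DeepCAD and Fusion360 into a standardized, constraint-aware structure. They also had to keep STEP exports, multi-view renders, and text annotations aligned across the 170k sequences, while the public release retains the incomplete coverage of some subsets to reflect the real distribution. In addition, annotating the industrial subset required designing the functional-type and natural-language description fields, which preserves semantic accuracy but raises annotation cost and makes quality control harder.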
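Common scenarios
Classic use cases
The core use of HistCAD is reconstructing and editing constraint-based parametric CAD modeling histories. Unlike conventional CAD datasets that contain only operation sequences, HistCAD exposes sketch geometry, geometric constraints, and downstream feature operations in a unified history format, which makes the generation process interpretable and editable. The dataset is commonly used to train and evaluate algorithms that generate complete, executable modeling histories from text or partial geometric input, supporting research on editable CAD modeling, text-to-CAD generation, and design-intent-aware evaluation.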
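Practical applications
In practice, HistCAD can be integrated into intelligent CAD systems to help designers quickly generate part models that satisfy given geometric constraints. For example, from a natural-language description (such as "a cuboid bracket with fillets") or a partial sketch, a model trained on HistCAD can complete the full modeling history and constraint relations and produce a STEP file suitable for manufacturing. The dataset can also support model retrieval in industrial design, assembly constraint generation, and automated pre-processing for finite element analysis, which can improve design efficiency and the level of automation.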
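Derived related work
The release of HistCAD has prompted related research, including constraint-aware CAD generation models, cross-modal text-to-CAD methods, and design-intent evaluation metrics. For example, researchers have used its geometric constraints and operation sequences to develop variational autoencoders or diffusion models that generate editable CAD models, and have proposed new evaluation criteria such as constraint consistency and history traceability. The dataset also serves as a benchmark for modeling industrial parts with complex geometric constraints, providing a data foundation for more intelligent CAD systems.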
The content above was collected and summarized by 遇见数据集.



