BenchCAD/cad_sft_training
Source: Hugging Face. Updated 2026-04-23. Indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/BenchCAD/cad_sft_training
Description:
Unified SFT corpus for CAD code generation, built from two sources that share the identical BenchCAD-style code shell for downstream SFT.

Sources:
- recode: 982,847 rows from filapro/cad-recode-v1.5, rewritten from the compact single-line style to the multi-line bench-shell style (`result = (...)` + `show_object(result)`).
- text2cad: 90,492 rows from the Text2CAD cadquery subset, given the same rewrite, with the original natural-language descriptions preserved.

Splits:
- recode: train 981,865; val 982.
- text2cad: train 76,238; val 6,457; test 7,797.

Columns:
- uid (string): source identifier.
- code (string): CadQuery source in BenchCAD shell style.
- description (string, text2cad only): natural-language description.

Semantic preservation: the rewrite is a pure AST reformatting with no numerical changes, verified via mesh equivalence on random samples.
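As a rough illustration of what "pure AST reformatting with no numerical changes" can mean in practice, the sketch below compares a compact one-line snippet with a bench-shell version of the same model using only Python's standard `ast` module. The two sample snippets, the variable name `r` in the compact form, and the helper names `numeric_literals` and `last_assigned_expr` are illustrative assumptions, not rows or tooling from the dataset. This is also not the mesh-equivalence check described above; it is only a cheaper syntactic check under those assumptions.

```python
import ast

# Hypothetical compact single-line sample (assumed style, not an actual dataset row).
compact = (
    "import cadquery as cq\n"
    "r=cq.Workplane('XY').box(10,20,5).faces('>Z').workplane().hole(4)\n"
)

# The same model in the multi-line bench-shell style: result = (...) + show_object(result).
shell = (
    "import cadquery as cq\n"
    "\n"
    "result = (\n"
    '    cq.Workplane("XY")\n'
    "    .box(10, 20, 5)\n"
    '    .faces(">Z")\n'
    "    .workplane()\n"
    "    .hole(4)\n"
    ")\n"
    "show_object(result)\n"
)


def numeric_literals(source: str) -> list:
    """Return all int/float constants in the source, sorted for comparison."""
    values = [
        node.value
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.Constant)
        and isinstance(node.value, (int, float))
        and not isinstance(node.value, bool)
    ]
    return sorted(values)


def last_assigned_expr(source: str) -> str:
    """Dump the right-hand side of the last top-level assignment.

    ast.dump ignores layout, parentheses, and quote style, so two sources
    that differ only in formatting produce the same string.
    """
    assigns = [n for n in ast.parse(source).body if isinstance(n, ast.Assign)]
    return ast.dump(assigns[-1].value)


# The snippets are only parsed, never executed, so CadQuery need not be installed.
print(numeric_literals(compact) == numeric_literals(shell))
print(last_assigned_expr(compact) == last_assigned_expr(shell))
```

A check like this only shows that the expression tree and its numbers are unchanged; confirming that the resulting geometry matches still requires executing the code and comparing meshes.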
Provider:
BenchCAD



