
yongqiqng/CivilInstruct-Sample

Source: Hugging Face. Updated 2026-04-10; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/yongqiqng/CivilInstruct-Sample
Resource description:
---
license: cc-by-nc-4.0
task_categories:
- text-generation
language:
- en
tags:
- code
- structural-engineering
- openseespy
- scientific-modeling
- instruction-tuning
- reinforcement-learning
size_categories:
- 1K<n<10K
configs:
- config_name: sft
  data_files:
  - split: train
    path: data/sft_train.parquet
  - split: validation
    path: data/sft_val.parquet
- config_name: rl
  data_files:
  - split: train
    path: data/rl_train.parquet
  - split: test
    path: data/rl_test.parquet
---

# CivilInstruct-Sample

A 10% sample of the **CivilInstruct** dataset from the paper *Rethinking Scientific Modeling: Toward Physically Consistent and Simulation-Executable Programmatic Generation*.

This sample is released for **demonstration and reproducibility** of the AutoBM training pipeline. The full dataset (10,912 samples) will be released upon paper publication.

## Overview

CivilInstruct is a domain-specific instruction dataset for training LLMs to generate executable, physically consistent **OpenSeesPy** structural modeling code from natural language building specifications. It integrates four data sources:

| Part | Description | Full | Sample (this repo) |
|------|-------------|------|--------------------|
| Part 1 | Fine-grained supervised data (OpenSeesPy API tutorials) | 3,881 | ~388 |
| Part 2 | Parameterized generated long code data | 3,100 | ~310 |
| Part 3 | Execution error-oriented debugging CoT data | 3,500 | ~350 |
| Part 4 | Physics-informed expert data (with ground-truth periods) | 512 | ~51 |

## Subsets

### `sft`: Supervised Fine-Tuning

Used for Stage I training (instruction tuning).

| Split | Samples | Fields |
|-------|---------|--------|
| `train` | 989 | `instruction`, `response` |
| `validation` | 20 | `instruction`, `response` |

### `rl`: Reinforcement Learning (SPC-GRPO)

Used for Stage II training with Multi-Granularity Hybrid Reward (MGHR).

| Split | Samples | Fields |
|-------|---------|--------|
| `train` | 45 | `data_source`, `prompt`, `ability`, `reward_model`, `extra_info` |
| `test` | 5 | `data_source`, `prompt`, `ability`, `reward_model`, `extra_info` |

The `reward_model.ground_truth` field contains the structural fundamental period (T1) used for physics-aware reward grading.

## Usage

```python
from datasets import load_dataset

# SFT data
sft_train = load_dataset("yongqiqng/CivilInstruct-Sample", "sft", split="train")
print(sft_train[0])

# RL data
rl_train = load_dataset("yongqiqng/CivilInstruct-Sample", "rl", split="train")
print(rl_train[0])
```

## Related

- Paper: [arXiv:2602.07083](https://arxiv.org/abs/2602.07083)
- Code: [github.com/Jovanqing/AutoBM](https://github.com/Jovanqing/AutoBM)
- Model: [Jovanqing/Seed-Coder-8B-Reasoning-SFT](https://huggingface.co/Jovanqing/Seed-Coder-8B-Reasoning-SFT)
- Full dataset: *Coming soon upon paper publication*

## Citation

```bibtex
@article{jiang2026rethinking,
  title={Rethinking Scientific Modeling: Toward Physically Consistent and Simulation-Executable Programmatic Generation},
  author={Jiang, Yongqing and Wang, Jianze and Shen, Zhiqi and Lin, Zhenghong and Wang, Jiayuan and Yang, Yijian and Dai, Kaoshan and Luo, Haoran},
  journal={arXiv preprint arXiv:2602.07083},
  year={2026}
}
```

## License

This sample is released under **CC BY-NC 4.0** (non-commercial). For commercial inquiries, please contact the authors.
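
The field lists given for the `sft` and `rl` splits can be checked against a downloaded copy. The following is a minimal sketch, not part of the AutoBM code: the names `EXPECTED`, `missing_columns`, and `check_repo` are hypothetical helpers introduced here, and the column sets are copied from the split tables in this card.

```python
# Documented columns per (config, split), copied from the tables in this card.
SFT_FIELDS = {"instruction", "response"}
RL_FIELDS = {"data_source", "prompt", "ability", "reward_model", "extra_info"}

EXPECTED = {
    ("sft", "train"): SFT_FIELDS,
    ("sft", "validation"): SFT_FIELDS,
    ("rl", "train"): RL_FIELDS,
    ("rl", "test"): RL_FIELDS,
}


def missing_columns(config, split, column_names):
    """Return the documented columns that are absent from column_names, sorted."""
    return sorted(EXPECTED[(config, split)] - set(column_names))


def check_repo(repo_id="yongqiqng/CivilInstruct-Sample"):
    """Load every documented split and print its size and any missing columns."""
    # Imported lazily so missing_columns() works without the `datasets` package.
    from datasets import load_dataset

    for config, split in EXPECTED:
        ds = load_dataset(repo_id, config, split=split)
        print(config, split, len(ds), missing_columns(config, split, ds.column_names))
```

Calling `check_repo()` needs network access and the `datasets` package. An empty list in the last printed position means every documented field is present in that split.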
Provider:
yongqiqng