fchis/laravel-buildspec-training
Source: Hugging Face. Updated 2026-03-28; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/fchis/laravel-buildspec-training
Resource description:
---
license: mit
language:
- en
- php
tags:
- laravel
- php
- code-generation
- fine-tuning
- buildspec
task_categories:
- text-generation
pretty_name: Laravel 13.x BuildSpec → Code Training Dataset
size_categories:
- n<1K
---
# Laravel 13.x BuildSpec → Code Training Dataset
Training data for fine-tuning code generation models to convert structured **BuildSpec JSON** objects into Laravel 13.x PHP files.
This dataset was used to train [fchis/Laravel-13x-Qwen2.5-Coder-7B-Instruct-LoRA-Spec](https://huggingface.co/fchis/Laravel-13x-Qwen2.5-Coder-7B-Instruct-LoRA-Spec).
## What is BuildSpec?
BuildSpec is a structured JSON format that unambiguously describes a single Laravel artifact (model, migration, controller, resource, form_request, or pest_test). Instead of asking the model to interpret natural language, you give it an exact specification:
```json
{
"laravel_version": "13.x",
"artifact": "model",
"class": "Subscriber",
"namespace": "App\\Models",
"table": "subscribers",
"has_factory": true,
"soft_deletes": false,
"fillable": ["email", "name", "status", "subscribed_at"],
"casts": {"subscribed_at": "datetime"},
"relationships": [],
"scopes": [
{"name": "active", "column": "status", "value": "active"}
]
}
```
The model outputs the complete PHP file — no markdown, no explanations.
## Why BuildSpec?
The spec approach **shifts the error type** from semantic hallucinations (hard to fix) to specification gaps (catchable by the spec compiler before generation):
| Approach | Error type | Fix method |
|----------|-----------|-----------|
| Natural language prompt | Model invents things not asked for | Runtime debugging |
| BuildSpec JSON | Wrong field names, missing spec fields | Spec compiler validates before generation |
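
To make "validates before generation" concrete, here is a minimal sketch of a pre-generation check. The required-field tuple and the `validate_spec` function are assumptions made for illustration; they are not taken from `spec_compiler.py`, whose actual rules may differ. Only the list of artifact types comes from this card.

```python
# Hypothetical pre-generation check; NOT the real spec_compiler.py.

# Artifact types listed in this dataset card.
ARTIFACTS = {"model", "migration", "controller", "resource", "form_request", "pest_test"}

# Assumed for illustration: fields every spec must carry.
REQUIRED_FIELDS = ("artifact", "class", "namespace")


def validate_spec(spec: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means the spec passed."""
    problems = []
    for field in REQUIRED_FIELDS:
        if field not in spec:
            problems.append(f"missing field: {field}")
    artifact = spec.get("artifact")
    if artifact is not None and artifact not in ARTIFACTS:
        problems.append(f"unknown artifact: {artifact}")
    return problems


if __name__ == "__main__":
    # A spec with a gap is reported before any PHP is generated.
    print(validate_spec({"artifact": "model", "class": "Post"}))
```

The point of the design is that every failure here is a plain, mechanical message about the spec, rather than a behavioral bug discovered later in generated PHP.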
## Dataset Statistics
| Split | Examples | Artifacts |
|-------|----------|-----------|
| train | 49 | model(14), resource(8), controller(8), form_request(10), pest_test(5), migration(4) |
| valid | 5 | mixed |
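
The per-artifact counts can in principle be re-derived from the records themselves. The sketch below assumes each record follows the chat format described in the Format section and that the user message is exactly one BuildSpec JSON string; `count_artifacts` is a hypothetical helper, not part of the published tooling.

```python
import json
from collections import Counter

# Train-split counts as stated in the table above.
TRAIN_COUNTS = {
    "model": 14, "resource": 8, "controller": 8,
    "form_request": 10, "pest_test": 5, "migration": 4,
}


def count_artifacts(records: list[dict]) -> Counter:
    """Count examples per artifact type, reading the BuildSpec from the user message."""
    counts = Counter()
    for record in records:
        user_msg = next(m for m in record["messages"] if m["role"] == "user")
        spec = json.loads(user_msg["content"])  # assumes the content is pure JSON
        counts[spec["artifact"]] += 1
    return counts


if __name__ == "__main__":
    # The stated per-artifact counts add up to the stated split size.
    print(sum(TRAIN_COUNTS.values()))  # prints 49
```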
## Format
Each example uses the OpenAI chat format:
```json
{
"messages": [
{"role": "system", "content": "You are a Laravel 13.x PHP code generator..."},
{"role": "user", "content": "{...BuildSpec JSON...}"},
{"role": "assistant", "content": "<?php\nnamespace App\\Models;\n..."}
]
}
```
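
As a sketch of how one record in this format could be assembled, the snippet below pairs a BuildSpec with its target PHP source and serializes examples as JSON Lines. The helper names are hypothetical, and the system prompt is kept as the elided placeholder shown above because the full prompt is not reproduced in this card. Storing one record per line is also an assumption, chosen because `mlx-lm` reads `train.jsonl` and `valid.jsonl` from its `--data` directory.

```python
import json

# Placeholder only: the card elides the full system prompt with "...".
SYSTEM_PROMPT = "You are a Laravel 13.x PHP code generator..."


def make_example(spec: dict, php_source: str) -> dict:
    """Pair one BuildSpec with its target PHP file in chat format."""
    return {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": json.dumps(spec)},
            {"role": "assistant", "content": php_source},
        ]
    }


def to_jsonl(examples: list[dict]) -> str:
    """Serialize examples as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(e) for e in examples) + "\n"


if __name__ == "__main__":
    example = make_example(
        {"artifact": "model", "class": "Post", "namespace": "App\\Models"},
        "<?php\nnamespace App\\Models;\n",
    )
    print(to_jsonl([example]), end="")
```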
## Artifact Reference
### model
```json
{
"artifact": "model",
"class": "Post",
"namespace": "App\\Models",
"table": "posts",
"has_factory": true,
"soft_deletes": false,
"fillable": ["title", "body", "user_id"],
"casts": {"published_at": "datetime"},
"relationships": [
{"type": "BelongsTo", "model": "User", "method": "author", "foreign_key": "user_id"},
{"type": "HasMany", "model": "Comment", "method": "comments"},
{"type": "BelongsToMany", "model": "Tag", "method": "tags"}
],
"scopes": [
{"name": "published", "column": "status", "value": "published"}
]
}
```
### controller
```json
{
"artifact": "controller",
"class": "PostController",
"namespace": "App\\Http\\Controllers\\Api",
"model": "Post",
"resource": "PostResource",
"form_request": "StorePostRequest",
"validation_mode": "form_request",
"eager_load": ["author", "tags"],
"paginate": 15,
"filters": ["status"],
"many_to_many": {"relation": "tags", "input_key": "tag_ids"}
}
```
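
A controller spec names other classes (`model`, `resource`, `form_request`), so a batch of specs can be checked for dangling references before generation. The following is an illustrative sketch only: `missing_references` is a hypothetical helper, and this card does not state whether the real spec compiler performs such a check.

```python
# Hypothetical cross-reference check over a batch of specs.

# Controller keys whose values name a class defined by a spec of the
# same-named artifact type (for example, "resource" -> a resource spec).
REFERENCE_KEYS = ("model", "resource", "form_request")


def missing_references(specs: list[dict]) -> list[str]:
    """Report classes a controller spec names that no spec in the batch defines."""
    defined = {(spec.get("artifact"), spec.get("class")) for spec in specs}
    missing = []
    for spec in specs:
        if spec.get("artifact") != "controller":
            continue
        for key in REFERENCE_KEYS:
            name = spec.get(key)
            if name and (key, name) not in defined:
                missing.append(f"{spec['class']}.{key} -> {name}")
    return missing


if __name__ == "__main__":
    batch = [
        {"artifact": "controller", "class": "PostController", "model": "Post",
         "resource": "PostResource", "form_request": "StorePostRequest"},
        {"artifact": "model", "class": "Post"},
        {"artifact": "resource", "class": "PostResource"},
    ]
    print(missing_references(batch))
```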
### form_request (with conditional rules)
```json
{
"artifact": "form_request",
"class": "StoreEventRequest",
"namespace": "App\\Http\\Requests",
"rules": {
"title": ["required", "string", "max:255"],
"venue_id": [],
"event_date": ["required", "date"]
},
"conditional_rules": {
"venue_id": {
"POST": ["required", "integer", "exists:venues,id"],
"PUT": ["sometimes", "integer", "exists:venues,id"]
},
"event_date": {
"POST": ["after:now"]
}
}
}
```
**Important**: Never use literal conditional tokens like `"required_on_post"` in `rules[]`. Use `conditional_rules{}` instead. The [spec compiler](https://github.com/florinel-chis/laravel-ai-gen) rejects them.
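
One plausible reading of `conditional_rules` is that the method-specific rules are appended to a field's base `rules` for the matching HTTP method. The sketch below implements that reading and a simple detector for literal conditional tokens. Both the merge order and the token pattern (generalized from `required_on_post`) are assumptions; the card does not specify how the generated PHP applies these rules or exactly which tokens the compiler rejects.

```python
import re

# Assumed token shape, generalized from the "required_on_post" example above.
CONDITIONAL_TOKEN = re.compile(r"_on_(post|put|patch)$")


def find_conditional_tokens(spec: dict) -> list[str]:
    """List rule strings in `rules` that look like literal conditional tokens."""
    return [
        f"{field}: {rule}"
        for field, rules in spec.get("rules", {}).items()
        for rule in rules
        if CONDITIONAL_TOKEN.search(rule)
    ]


def rules_for_method(spec: dict, method: str) -> dict[str, list[str]]:
    """Append the method-specific rules to each field's base rules."""
    merged = {field: list(rules) for field, rules in spec.get("rules", {}).items()}
    for field, by_method in spec.get("conditional_rules", {}).items():
        merged.setdefault(field, []).extend(by_method.get(method, []))
    return merged
```

Under this reading, the `StoreEventRequest` spec above would yield `["required", "date", "after:now"]` for `event_date` on `POST` and only `["required", "date"]` on `PUT`.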
### resource
```json
{
"artifact": "resource",
"class": "PostResource",
"namespace": "App\\Http\\Resources",
"fields": [
{"key": "id", "source": "id"},
{"key": "title", "source": "title"},
{"key": "created_at", "source": "created_at"}
],
"loaded_relations": [
{"key": "author", "resource": "AuthorResource", "type": "make"},
{"key": "tags", "resource": "TagResource", "type": "collection"}
]
}
```
### pest_test
```json
{
"artifact": "pest_test",
"class": "Post",
"namespace": "Tests\\Feature",
"model": "App\\Models\\Post",
"endpoints": [
{"method": "GET", "path": "/api/posts", "action": "index"},
{"method": "POST", "path": "/api/posts", "action": "store"},
{"method": "GET", "path": "/api/posts/{id}", "action": "show"},
{"method": "PUT", "path": "/api/posts/{id}", "action": "update"},
{"method": "DELETE", "path": "/api/posts/{id}", "action": "destroy"}
],
"required_on_create": ["title", "body"]
}
```
## Full Pipeline
See [laravel-ai-gen](https://github.com/florinel-chis/laravel-ai-gen) for the complete pipeline:
```bash
# NL → specs → compile → generate PHP files
python3 pipeline_spec.py "Create a REST API for managing blog posts with tags"
```
Pipeline stages:
1. **Planner** (`planner.py`): NL description → BuildSpec JSON array (few-shot)
2. **Compiler** (`spec_compiler.py`): validates + normalizes each spec
3. **Generator**: each spec → PHP file (this model)
4. **Syntax check**: `php -l` on all written files
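
The four stages above can be sketched as a small orchestration function. This is not the code in `pipeline_spec.py`: the function names, the callable-based stage interface, and the one-file-per-class naming are all assumptions for illustration. The only external command used is `php -l`, which exits with status 0 when a file has no syntax errors.

```python
import subprocess
from pathlib import Path
from typing import Callable


def php_lint(path: Path) -> bool:
    """Run `php -l` on one file; True when PHP reports no syntax errors."""
    result = subprocess.run(["php", "-l", str(path)], capture_output=True, text=True)
    return result.returncode == 0


def run_pipeline(
    description: str,
    plan: Callable[[str], list[dict]],        # stage 1: NL -> list of specs
    compile_spec: Callable[[dict], dict],     # stage 2: validate + normalize
    generate: Callable[[dict], str],          # stage 3: spec -> PHP source
    out_dir: Path,
    lint: Callable[[Path], bool] = php_lint,  # stage 4: syntax check
) -> dict[str, bool]:
    """Write one PHP file per spec and return {file name: lint passed}."""
    out_dir.mkdir(parents=True, exist_ok=True)
    results = {}
    for spec in plan(description):
        spec = compile_spec(spec)
        # Assumed naming scheme; two specs with the same class would collide.
        path = out_dir / f"{spec['class']}.php"
        path.write_text(generate(spec), encoding="utf-8")
        results[path.name] = lint(path)
    return results
```

Passing the stages in as callables keeps the sketch independent of any particular planner or model backend, and lets the lint step be swapped out where `php` is not installed.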
## Training Details
- **Base model**: `Qwen2.5-Coder-7B-Instruct` (4-bit quantized via MLX)
- **Method**: LoRA (rank 8, `--num-layers 8`)
- **Training framework**: `mlx-lm`
- **Hardware**: Apple M2 Pro 16GB
- **Iterations**: 225 cumulative (5 rounds: 100+25+25+25+50)
- **Final val_loss**: 0.065
- **Command**:
```bash
mlx_lm.lora \
--model mlx-community/Qwen2.5-Coder-7B-Instruct-4bit \
--train \
--data data_spec \
--adapter-path adapters_spec \
--batch-size 1 --iters 100 \
--learning-rate 1e-5 \
--num-layers 8 \
--max-seq-length 1400
```
## Results
Tested on a 3-app benchmark (Subscriber API, Book Library, Event Management):
| Metric | Score |
|--------|-------|
| PHP syntax valid | 26/26 (100%) |
| Eval perfect (0 bugs) | 26/26 (100%) |
| Pest tests pass | 20/20 (100%) |
| Manual fixes needed | 3 (SoftDeletes trait, JsonResource import, repetition loop) |
## Citation
```bibtex
@misc{laravel-buildspec-2026,
author = {Florinel Chis},
title = {Laravel 13.x BuildSpec to Code Training Dataset},
year = {2026},
publisher = {Hugging Face},
url = {https://huggingface.co/datasets/fchis/laravel-buildspec-training}
}
```
Provider:
fchis
Collected Summary
Dataset Introduction

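Construction
In software engineering, automated code generation is steadily becoming a key technique for improving development efficiency. This dataset was built specifically for fine-tuning code generation models to convert structured BuildSpec JSON objects into PHP files for the Laravel 13.x framework. Its construction relies on a pipeline: a planner first turns a natural-language description into a BuildSpec JSON array, a compiler then validates and normalizes each spec, and a generator model finally outputs the complete PHP file. The dataset comprises a training split and a validation split totaling 54 examples and covers several Laravel artifact types, including models, resources, and controllers, to support the syntactic correctness and functional completeness of the generated code.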
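Features
The dataset's defining feature is the BuildSpec approach: a structured JSON format describes each Laravel artifact explicitly, which converts the semantic hallucinations a model might produce into specification gaps that a compiler can catch. The dataset covers models, controllers, form requests, resources, and Pest tests, and each artifact type defines detailed attribute fields such as namespace, table name, fillable fields, and relationship mappings. Records follow the OpenAI chat format: each contains a system prompt, the BuildSpec JSON as user input, and the PHP code as the assistant's output, giving model training a clear task context and a high-quality supervision signal.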
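Usage
In practice, the dataset is used mainly to fine-tune code generation models for automated conversion from BuildSpec to Laravel PHP code. Using the provided training examples, users can apply parameter-efficient fine-tuning methods such as LoRA to a base model such as Qwen2.5-Coder-7B-Instruct. After training, the model accepts a conforming BuildSpec JSON input and directly outputs a complete PHP file with no additional explanation. Developers can also combine it with the companion pipeline tooling for end-to-end generation from natural-language requirements to final code files, using PHP syntax checks and Pest tests to verify the quality and reliability of the output.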
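Background and Challenges
Background Overview
In software engineering, automated code generation aims to improve development efficiency and code quality. The laravel-buildspec-training dataset was created by Florinel Chis in 2026 and focuses on code generation for the Laravel 13.x framework. Its core research question is how to convert a structured BuildSpec JSON specification precisely into executable PHP files, reducing the semantic ambiguity that natural-language descriptions can introduce. By adopting a spec-driven approach, the dataset moves code generation models away from vague prompts and toward generation from explicit specifications, with implications for the development of automation tools in PHP web development.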
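Current Challenges
The dataset addresses a key challenge in code generation: ensuring that generated code is accurate and compilable. Traditional code generation from natural language often suffers from semantic hallucination, where the model produces functionality that was not requested, leading to runtime errors that are hard to debug. The BuildSpec approach shifts the error type to missing spec fields or wrong field names, which a compiler can catch before generation. Building the dataset has its own challenges, however: the spec structure must be carefully designed to cover models, controllers, resources, and other Laravel artifacts, and specs and code in the training data must be strictly aligned to avoid generating code that is invalid or violates framework conventions.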
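Common Scenarios
Classic Use Case
At the intersection of software engineering and artificial intelligence, code generation tasks often face semantic ambiguity. Through structured BuildSpec JSON specs, this dataset provides precise conversion examples for fine-tuning large language models, enabling them to turn an explicit architectural description directly into PHP source code that conforms to Laravel 13.x conventions. This replaces the openness of natural-language instructions with the determinism of structured data. It is typically applied to automatically generating models, controllers, resource classes, and other core Laravel components, improving the accuracy and consistency of generated code.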
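Derived and Related Work
The complete toolchain built around this dataset, `laravel-ai-gen`, is a representative derived work: it implements an end-to-end pipeline from natural-language requirements to deployable code. The pipeline comprises a planner, a spec compiler, a code generator, and a syntax checker, and demonstrates an engineering path for structured code generation. The model fine-tuned on this dataset, `Laravel-13x-Qwen2.5-Coder-7B-Instruct-LoRA-Spec`, is another important derived result: it validates the effectiveness of fine-tuning a large code model for a specific domain with techniques such as LoRA, and offers a reproducible blueprint and methodological reference for building dedicated code generators for other frameworks or languages.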
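Recent Research on the Dataset
Latest Research Directions
At the intersection of software engineering and artificial intelligence, code generation is evolving toward structured, verifiable approaches. By introducing the BuildSpec JSON specification, the laravel-buildspec-training dataset turns natural-language requirements into precise structured descriptions, with the aim of reducing semantic hallucination errors when large language models generate code. This approach shifts the error type from hard-to-debug semantic deviations to specification gaps that a compiler can catch, improving the reliability and maintainability of generated code. Current research focuses on how to use structured-spec datasets of this kind, together with lightweight fine-tuning techniques such as LoRA, to achieve high-precision, zero-error automatic code generation within a specific framework such as Laravel, and on building end-to-end verification pipelines from requirement descriptions to runnable code, offering a new practical paradigm for low-code development and AI-assisted programming.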
The content above was collected and summarized by 遇见数据集.



