
Leveraging LLMs for Multi-File DSL Code Generation: An Industrial Case Study

Source: Figshare. Updated: 2026-02-26. Indexed: 2026-04-28.
Download link:
https://figshare.com/articles/dataset/_i_Leveraging_LLMs_for_Multi-File_DSL_Code_Generation_An_i_i_Industrial_Case_Study_i_/31423292
Description:
Large language models (LLMs) perform strongly on general-purpose code generation, yet their applicability to enterprise domain-specific languages (DSLs) remains underexplored, especially for repository-scale change generation spanning multiple files and folder structures from a single natural-language (NL) instruction. We report an industrial case study at BMW that adapts code-oriented LLMs to generate and modify project-root DSL artifacts for an Xtext-based DSL that drives downstream Java/TypeScript code generation. We develop an end-to-end pipeline for dataset construction, multi-file task representation, model adaptation, and evaluation. To support repository-scale generation in a single response, we linearize DSL folder hierarchies into structured JSON while preserving file paths and contents, enabling the learning of cross-file dependencies. We evaluate two instruction-tuned code LLMs (Qwen2.5-Coder and DeepSeek-Coder, 7B) under three configurations: baseline prompting, one-shot in-context learning, and parameter-efficient fine-tuning (QLoRA). Beyond standard similarity metrics, we introduce task-specific measures that assess edit correctness and repository structural fidelity. Fine-tuning yields the largest gains across models and metrics, achieving high exact-match accuracy, substantial edit similarity, and structural fidelity of 1.00 on our held-out set for multi-file outputs. One-shot in-context learning, by comparison, provides smaller but consistent improvements over baseline prompting. We further validate practical utility via an expert developer survey and an execution-based check using the existing code generator.
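The linearization step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the JSON schema (a `files` list of `path`/`content` entries), the `.dsl` file suffix, and the function names `linearize`, `restore`, and `structural_fidelity` are all assumptions made for illustration. In particular, the description does not define how structural fidelity is computed; the sketch uses a simple Jaccard overlap of file paths as a stand-in.

```python
import json
import tempfile
from pathlib import Path


def linearize(root: Path, suffix: str = ".dsl") -> str:
    """Serialize every file under `root` with the given suffix into one JSON string.

    The schema {"files": [{"path": ..., "content": ...}]} and the ".dsl"
    suffix are hypothetical; the source does not specify them.
    """
    files = [
        {
            "path": p.relative_to(root).as_posix(),  # keep the folder hierarchy
            "content": p.read_text(encoding="utf-8"),
        }
        for p in sorted(root.rglob(f"*{suffix}"))
        if p.is_file()
    ]
    return json.dumps({"files": files}, indent=2)


def restore(payload: str, root: Path) -> None:
    """Write a linearized payload back to disk as a folder tree."""
    for entry in json.loads(payload)["files"]:
        target = root / entry["path"]
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(entry["content"], encoding="utf-8")


def structural_fidelity(predicted: str, reference: str) -> float:
    """Assumed stand-in metric: Jaccard overlap of the two sets of file paths."""
    pred = {f["path"] for f in json.loads(predicted)["files"]}
    ref = {f["path"] for f in json.loads(reference)["files"]}
    if not pred and not ref:
        return 1.0
    return len(pred & ref) / len(pred | ref)
```

Under these assumptions, a model response in the same JSON shape could be passed to `restore` to materialize the generated project, and `structural_fidelity(response, reference)` would return 1.0 only when both contain exactly the same set of file paths.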
Created: 2026-02-26