
Deconstructing Model Collapse in Software Engineering Tasks: A Multi-Granularity Empirical Study with Open-Source LLMs

Source: DataCite Commons. Updated 2025-09-11; indexed 2026-02-09.
Download link:
https://figshare.com/articles/dataset/The_Self-Inflicted_Collapse_How_Recursive_Training_Undermines_Large_Language_Models_in_Automated_Software_Engineering_Tasks/28559318/3
Description:
<b>Large Language Models (LLMs)</b> have become indispensable for <b>automated software engineering (SE) tasks</b> such as code generation, vulnerability detection, and code summarization. Yet their long-term robustness is strongly shaped by training methodology. A particularly risky practice is <b>recursive self-training</b>, in which models are repeatedly fine-tuned on their own generated outputs. Although this strategy is often adopted to compensate for scarce human-annotated data, it risks <b>model collapse</b>: a degenerative process in which output quality, diversity, and reliability degrade across generations.

This paper provides the <b>first multi-granularity empirical study</b> of model collapse in SE tasks. Using <b>open-source LLMs</b> (LLaMA-3 at 1B, 3B, 8B, and 70B; LLaMA-4 Scout, 17B MoE; and Qwen-3 at 0.5B, 1.8B, 7B, 14B, and 72B), we design controlled recursive training experiments across three benchmarks: <b>HumanEval</b> (code generation, evaluated with pass@1 and BLEU-4), <b>ReVeal</b> (vulnerability detection, evaluated with F1/precision/recall), and <b>CodeSearchNet</b> (code summarization, evaluated with BLEU-4 and ROUGE-L). Models are trained under three regimes (<b>real-only</b>, <b>synthetic-only</b>, and <b>hybrid</b>) for up to <b>ten recursive generations</b>. We then analyze collapse dynamics at multiple granularities: <b>task-level degradation, data distribution drift (perplexity/entropy), and mitigation effectiveness</b>.

Our findings show that <b>synthetic-only recursive training leads to sharp degradation</b>, especially in smaller models, while hybrid strategies and quality filtering significantly slow collapse but cannot eliminate it entirely. These results demonstrate that collapse in SE is not a simple extension of language collapse but a <b>domain-specific phenomenon</b>, driven by the structural and security-critical nature of code.

This study contributes: (1) a <b>systematic framework</b> to reveal model collapse in SE tasks; (2) <b>empirical evidence</b> across model scales, training regimes, and tasks; and (3) <b>validated mitigation strategies</b> (hybrid training, filtering, and diversity preservation) with practical implications for building stable LLM pipelines in software engineering.
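
The three training regimes and the entropy-based drift measure described above can be illustrated with a toy simulation. The sketch below is not the authors' pipeline: it assumes a unigram token-frequency model as a stand-in for an LLM, a Zipf-like "real" distribution over 50 tokens, and hypothetical names (`run`, `entropy_bits`, `real_fraction`) chosen only for illustration. It records, per generation, the entropy of the fitted model and the number of distinct tokens it still covers.

```python
import math
import random
from collections import Counter


def entropy_bits(counts):
    """Shannon entropy (in bits) of the empirical distribution in `counts`."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c > 0)


def run(regime, generations=10, n=200, real_fraction=0.5, seed=0):
    """Toy recursive training; returns [(entropy, distinct_tokens), ...] per generation.

    Assumption: a unigram frequency table stands in for the trained model.
    """
    rng = random.Random(seed)
    vocab = list(range(50))
    # Assumed Zipf-like "real" data distribution (illustrative only).
    real_weights = [1.0 / (rank + 1) for rank in vocab]

    def sample_real(k):
        return rng.choices(vocab, weights=real_weights, k=k)

    def sample_model(model, k):
        tokens = list(model)
        return rng.choices(tokens, weights=[model[t] for t in tokens], k=k)

    # Generation 0 is always fitted on real data.
    model = Counter(sample_real(n))
    history = [(entropy_bits(model), len(model))]

    for _ in range(generations):
        if regime == "real-only":
            data = sample_real(n)
        elif regime == "synthetic-only":
            data = sample_model(model, n)
        elif regime == "hybrid":
            k = int(n * real_fraction)
            data = sample_real(k) + sample_model(model, n - k)
        else:
            raise ValueError(f"unknown regime: {regime!r}")
        model = Counter(data)  # "fine-tune" = refit token frequencies
        history.append((entropy_bits(model), len(model)))
    return history


if __name__ == "__main__":
    for regime in ("real-only", "synthetic-only", "hybrid"):
        final_entropy, final_support = run(regime)[-1]
        print(f"{regime:15s} entropy={final_entropy:.3f} bits, distinct tokens={final_support}")
```

In the synthetic-only regime of this toy setup the number of distinct tokens can never grow, because the model can only emit tokens it has already seen; mixing in real data (the hybrid regime) can reintroduce lost tokens. This mirrors the qualitative claim in the abstract only loosely and says nothing about the magnitudes reported in the study.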
Provider:
figshare
Created:
2025-09-11