Supplementary materials of the paper entitled: "Failure-Aware Enhancements for Large Language Model (LLM) Code Generation: An Empirical study on Decision Framework"

Name: Supplementary materials of the paper entitled: "Failure-Aware Enhancements for Large Language Model (LLM) Code Generation: An Empirical study on Decision Framework"
Creator: Zenodo
Published: 2025-11-18 05:58:58
License: No description available

Source: Zenodo; updated 2025-11-18; indexed 2026-05-26

Download link:

https://zenodo.org/doi/10.5281/zenodo.17637007

Resource description:

Supplementary materials of the paper entitled: "Failure-Aware Enhancements for Large Language Model (LLM) Code Generation: An Empirical study on Decision Framework"

1. data_overview.xlsx
Description: Structured Excel workbook containing the research data for all 25 projects: project metadata, GitHub links, a side-by-side comparison of completion metrics across the human, direct prompting, and progressive prompting approaches, and a summary statistics sheet with aggregate performance metrics.

2. ChartOne.xlsx
Description: Detailed project-level data for all 25 GitHub projects analyzed in Phase 1, including project characteristics (domain, languages, task count) and task completion rates for human implementations, direct prompting, and progressive prompting.

3. ChartTwo.xlsx
Description: Summary statistics comparing performance across human implementations, direct prompting, and progressive prompting, including average completion rate, median completion rate, and the distribution of high-performing projects (90% to 100% completion).

4. ChartThree.xlsx
Description: Head-to-head comparison showing how often progressive prompting outperformed direct prompting (84% of projects), the average improvement (16.4%), and the largest improvement observed (42.9%, on Project 21).

5. ChartFour.xlsx
Description: Completion rates stratified by project complexity (simple: 5 or fewer tasks; medium: 6 to 15 tasks; complex: 16 or more tasks), showing that progressive prompting performs consistently across all complexity categories.

6. ChartFive.xlsx
Description: Code generation efficiency metrics for direct versus progressive prompting, including average lines of code generated (13,533 vs. 25,325 LOC), average number of prompts required (1.8 vs. 20.4), and tasks completed per prompt (5.4 vs. 0.6).

7. ChartSix.xlsx
Description: Failure reduction by category when progressive prompting is compared with direct prompting: business logic (75% reduction), external services (80%), database CRUD (100%), user authentication (80%), and UI components (100%).

8. ChartSeven.xlsx
Description: Comprehensive failure taxonomy across all three approaches (human, direct, progressive), categorizing incomplete tasks by type (business logic: 59% of failures; external services: 11%; database CRUD: 9.6%; user authentication: 8.2%; UI components: 5.5%; and others).

9. DataUpdate.pdf
Description: Comprehensive documentation of all 25 GitHub projects, showing task breakdowns, human implementation baselines, and detailed completion results for both direct and progressive prompting, including lines of code, completion rates, and specific missing features.

10. Enhancement.pdf
Description: Results from testing three enhancement strategies (Self-Critique, Multi-Model collaboration, and RAG) on seven challenging projects where progressive prompting left results incomplete, showing improvement percentages, prompt counts, and time requirements for each method.

11. EnhanOne.xlsx
Description: Project-level results for six challenging projects (P2, P8, P10, P14, P18, P25) evaluated with three enhancement methods (Self-Critique, Multi-Model, RAG), showing completion rates and identifying the best-performing method for each project.

12. EnhanTwo.xlsx
Description: Summary comparison of the enhancement methods, showing average completion rates, the percentage of projects reaching 100% completion, method win counts, and efficiency metrics (average time and prompts used) across all six projects.

13. EnhanThree.xlsx
Description: Detailed time-efficiency analysis for each project, comparing baseline progressive prompting time against all three enhancement methods, with efficiency calculated as minutes per percentage point of improvement, and identifying the fastest method.

14. EnhanFour.xlsx
Description: Failure-type analysis categorizing the six projects by their primary failure characteristic (CRUD, business logic, external services, DevOps/infrastructure, domain-specific) and identifying the best-performing enhancement method for each category, with success rates.

15. EnhanFive.xlsx
Description: Decision framework recommendations mapping failure types to recommended enhancement methods, with expected success rates, average time requirements, confidence levels, and example projects for each failure category.

16. EnhanSix.xlsx
Description: Complete results for all six enhancement projects, showing task counts, failure types, baseline progressive prompting performance, and full metrics (completion rate, time, prompts) for the Self-Critique, Multi-Model, and RAG methods.
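The complexity bins (item 5) and the two efficiency metrics (items 6 and 13) reduce to simple formulas. The following Python sketch is a hypothetical illustration only: it assumes that "tasks completed per prompt" is tasks completed divided by prompts used, and that "minutes per percentage point improvement" is a method's time divided by its completion-rate gain over the baseline progressive prompting run. The function names are not part of the published materials, and the source does not say how zero-gain cases are handled.

```python
# Hypothetical helpers illustrating the metric definitions described above.
# None of these names come from the published spreadsheets.


def complexity_level(task_count: int) -> str:
    """Bin a project by task count: simple (<=5), medium (6-15), complex (16+)."""
    if task_count < 1:
        raise ValueError("task_count must be positive")
    if task_count <= 5:
        return "simple"
    if task_count <= 15:
        return "medium"
    return "complex"


def tasks_per_prompt(tasks_completed: float, prompts: float) -> float:
    """Assumed definition: tasks completed divided by the number of prompts used."""
    if prompts <= 0:
        raise ValueError("prompts must be positive")
    return tasks_completed / prompts


def minutes_per_point(minutes: float, baseline_pct: float, enhanced_pct: float) -> float:
    """Assumed definition: minutes spent per percentage point of completion-rate gain."""
    gain = enhanced_pct - baseline_pct
    if gain <= 0:
        # Assumption: with no improvement the ratio is undefined, so report infinity.
        return float("inf")
    return minutes / gain
```

For example, with hypothetical inputs, complexity_level(16) returns "complex", and minutes_per_point(30, 70.0, 85.0) returns 2.0 (30 minutes for a 15-point gain). Returning infinity for a non-positive gain is a design choice of this sketch that avoids division by zero and ranks a method that did not improve as the least efficient.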

Provider:

Zenodo

Created:

2025-11-18
