Enhancing Agile Development: A Comparative Analysis of LLMs for User Story Generation (Dataset)

Name: Enhancing Agile Development: A Comparative Analysis of LLMs for User Story Generation (Dataset)
Creator: Satyam Thakur; Karim Elish; Sathish Chandra Akula; Sunim Acharya; Gervonte Fowler
License: Not specified

Source: IEEE; indexed 2026-04-17

Download link:

https://ieee-dataport.org/documents/enhancing-agile-development-comparative-analysis-llms-user-story-generation-dataset

Description:

The growing integration of Artificial Intelligence (AI) into software engineering is reshaping how requirements are captured, refined, and managed. In Agile development, user story creation and estimation remain among the most labor-intensive tasks, often marked by inefficiency and inconsistency, which in turn creates opportunities for automation using Large Language Models (LLMs). This study presents a systematic evaluation of 20 state-of-the-art LLMs in generating user stories and estimating story points for Agile Project Management. Unlike prior studies that focus narrowly on individual models or prompting strategies, we develop a comprehensive benchmarking framework that integrates Retrieval-Augmented Generation (RAG), Declarative Self-improving Python (DSPy) with MIPROv2 optimization, and fine-tuned agent-based methods. Our methodology introduces a novel composite scoring system that jointly assesses semantic accuracy, structural quality, estimation precision, readability, time efficiency, cost-effectiveness, and model stability. We conduct extensive experiments using the Neo dataset, a large-scale corpus of 40,014 user stories across 34 real-world projects, and complement them with stakeholder surveys to validate quantitative findings against expert human judgment. The models demonstrated varying strengths, with semantic accuracy, as measured by ROUGE-L Scores, ranging from 14.75% to 18.19% in alignment with human-authored titles. For estimation precision, Binary Points Scores varied significantly from 30% to 80% across the evaluated models, highlighting distinct, model-specific capabilities in story point estimation. Crucially, a subsequent stakeholder survey validated these quantitative findings, further revealing that a strategic combination of leading models outperforms any single-model approach.
This work provides actionable insights into the performance characteristics of different LLMs when applied to fundamental Agile tasks, demonstrating that a Mixture of Multimodal Interaction Experts approach can effectively balance performance, cost, and practical applicability in real-world environments.
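
For readers unfamiliar with the ROUGE-L metric cited in the description, the following is a minimal sketch of how a ROUGE-L F1 score between a generated title and a human-authored title can be computed from the longest common subsequence (LCS) of their tokens. It assumes lowercase whitespace tokenization and the balanced F1 variant; the description does not state which tokenizer or ROUGE-L variant the authors used, and the function names and example titles here are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L F1 over lowercase whitespace tokens (an assumed tokenization)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of generated tokens in the LCS
    recall = lcs / len(ref)      # share of reference tokens in the LCS
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical titles, not taken from the dataset.
    generated = "add user login page"
    human = "create login page for users"
    print(f"ROUGE-L F1: {rouge_l_f1(generated, human):.4f}")
```

In this example the LCS is "login page" (2 tokens), giving precision 2/4 and recall 2/5. Scores reported by a library implementation may differ slightly because of stemming or different tokenization rules.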

Provided by:

Satyam Thakur; Karim Elish; Sathish Chandra Akula; Sunim Acharya; Gervonte Fowler
