User Story Ambiguity Dataset: A Comprehensive Research Resource

NIAID Data Ecosystem, indexed 2026-05-02

Download link:

https://data.mendeley.com/datasets/wz9spjy4v5

Description:

This dataset represents the largest empirical collection of user story ambiguities, encompassing 12,847 authentic user stories from eight companies spanning the finance, healthcare, e-commerce, telecommunications, and manufacturing domains. The collection addresses a critical gap in requirements engineering research by providing systematically annotated real-world data for investigating ambiguity patterns in agile development environments.

The dataset reveals significant organisational variation, with ambiguity rates ranging from 15.3% to 67.8% across companies, reflecting differences in agile maturity and domain complexity. Seven distinct ambiguity types were identified, with semantic ambiguities being the most prevalent (34.2%), followed by scope ambiguities (28.7%) and actor ambiguities (19.4%). This distribution indicates the most common sources of requirements confusion in practice.

The dataset is structured across five interconnected sheets and includes attributes covering team characteristics, project outcomes, and temporal progression. The temporal analysis shows a 23.4% average improvement in story quality over 12-month periods, providing empirical evidence of organisational learning effects in requirements practices. Ambiguity rates correlate negatively with team experience (r=-0.73) and positively with domain complexity (r=0.52), and the annotations are supported by high inter-rater reliability (α=0.77).

The collection serves multiple research purposes, from training machine learning models for automated ambiguity detection to validating requirements engineering frameworks across different organisational contexts. It enables researchers to conduct comparative studies, develop evidence-based tools, and advance the understanding of requirements quality in agile environments, making it a valuable asset for the empirical software engineering community.
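The per-company ambiguity rates and the correlation with team experience mentioned above could be recomputed along the following lines. This is only a minimal sketch: the record layout, the function names `ambiguity_rates` and `pearson_r`, and the toy values are assumptions made for illustration and are not taken from the dataset, whose actual sheet and column names are not given in this listing.

```python
from collections import defaultdict
from math import sqrt


def ambiguity_rates(stories):
    """Return {company: share of its stories flagged as ambiguous}.

    `stories` is an iterable of (company, is_ambiguous) pairs; this
    layout is hypothetical, not the dataset's documented schema.
    """
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for company, is_ambiguous in stories:
        totals[company] += 1
        flagged[company] += int(is_ambiguous)
    return {company: flagged[company] / totals[company] for company in totals}


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Toy values invented for illustration only; they are NOT dataset results.
toy_stories = [
    ("A", True), ("A", False), ("A", False), ("A", False),
    ("B", True), ("B", True), ("B", False), ("B", False),
    ("C", True), ("C", True), ("C", True), ("C", False),
]
toy_experience_years = {"A": 8.0, "B": 5.0, "C": 2.0}

rates = ambiguity_rates(toy_stories)
companies = sorted(rates)
r = pearson_r(
    [toy_experience_years[c] for c in companies],
    [rates[c] for c in companies],
)
print(rates, round(r, 2))
```

With the real data, the pairs would presumably come from whichever sheet holds the story-level annotations, and the experience figures from the sheet describing team characteristics.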

Created:

2025-07-11
