Identifying User Stories in Issues records

Mendeley Data | Updated 2024-01-31 | Indexed 2024-06-26
Download link:
https://data.mendeley.com/datasets/bw9md35c29
Description:
Nowadays, most software development companies have adopted agile methodologies, which recommend capturing requirements as user stories. In practice, however, user stories are often poorly written and exhibit inherent quality defects. Moreover, a project's user stories are commonly buried among large volumes of issue requests logged in software quality tracking systems, which makes them difficult to process later. To address these defects and produce high-quality requirements, a current trend is to apply computational linguistics techniques to first identify user stories and then process them.

To train the models, data were taken from public sources containing issues from real software development projects. These sources contain positive examples, i.e. user stories in the format "As a (type of user), I want (goal), [so that (some reason)]", and negative examples (erroneous user stories, or sentences whose syntax resembles a user story but whose purpose is different). To obtain a larger dataset suitable for testing the models, an algorithm was implemented that generates additional examples by splitting positive examples into random parts and mixing those parts, using TensorFlow's Tokenizer. Each example was then assigned to its class manually; because there was no record of previously classified data, this step may have introduced some human labeling error into the model.

The resulting dataset contains 7997 examples in total: 2618 positive and the remaining 5379 negative. The task is therefore a binary classification problem, in which issues identified as user stories form the positive class and all other issues form the negative class.
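The split-and-mix augmentation described above is not published as code, so the following Python sketch is only an illustration of the idea, not the dataset authors' algorithm. It cuts each positive example into random contiguous parts and joins the leading parts of one example with the trailing parts of a different one. The function names (`tokenize`, `split_random`, `mix_examples`), the cap of three parts per example, and the prefix/suffix mixing rule are hypothetical. `tokenize` is a standard-library stand-in that roughly mimics the default lowercasing and punctuation filtering of the Keras `Tokenizer` rather than calling TensorFlow. The outputs are unlabeled candidates; in the dataset, labels were assigned by hand.

```python
import random
from typing import List

# Rough stand-in for the default `filters` of the Keras Tokenizer
# (an assumption): these characters are replaced by spaces before splitting.
_FILTERS = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n'
_TABLE = str.maketrans({c: " " for c in _FILTERS})


def tokenize(text: str) -> List[str]:
    """Lowercase, replace filtered characters with spaces, split on whitespace."""
    return text.lower().translate(_TABLE).split()


def split_random(tokens: List[str], rng: random.Random,
                 max_parts: int = 3) -> List[List[str]]:
    """Cut a token list into 2..max_parts contiguous, non-empty parts (hypothetical scheme)."""
    if len(tokens) < 2:
        return [tokens]
    n_parts = rng.randint(2, min(max_parts, len(tokens)))
    # Choose n_parts - 1 distinct interior cut points, then slice between them.
    cuts = sorted(rng.sample(range(1, len(tokens)), n_parts - 1))
    bounds = [0] + cuts + [len(tokens)]
    return [tokens[a:b] for a, b in zip(bounds, bounds[1:])]


def mix_examples(stories: List[str], n_new: int, seed: int = 0,
                 max_parts: int = 3) -> List[str]:
    """Generate n_new candidate sentences by joining leading parts of one
    story with trailing parts of a different story."""
    if len(stories) < 2:
        raise ValueError("need at least two stories to mix")
    rng = random.Random(seed)  # seeded for reproducible output
    pieces = [split_random(tokenize(s), rng, max_parts) for s in stories]
    generated = []
    for _ in range(n_new):
        a, b = rng.sample(pieces, 2)    # two distinct source stories
        i = rng.randint(1, len(a))      # keep a[:i] (at least one part)
        j = rng.randint(0, len(b) - 1)  # append b[j:] (at least one part)
        tokens = [t for part in a[:i] + b[j:] for t in part]
        generated.append(" ".join(tokens))
    return generated


if __name__ == "__main__":
    positives = [
        "As a user, I want to reset my password, so that I can regain access.",
        "As an admin, I want to export usage reports.",
        "As a visitor, I want to browse the catalog, so that I can compare products.",
    ]
    for sentence in mix_examples(positives, n_new=3, seed=42):
        print(sentence)
```

Fixing the seed makes the generated set reproducible, which matters when every generated candidate must later be reviewed and labeled manually.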
Created:
2024-01-31