Online Appendix of the paper "Requirements Information in Backlog Items: Content Analysis"
Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://zenodo.org/record/10643449
Resource description:
This online appendix contains the materials that support the verifiability of the results presented in the paper. In the paper, we investigate to what extent and how requirements are represented in backlogs stored in JIRA.
Appendix structure
The appendix includes multiple files organized into subfolders, which are described below.
Root folder
The script merge_nvivo_datasets.py merges the different Nvivo files (located in subfolder "Tagged data\Category per item"). It creates an Excel file indicating which categories occur per item (0 or 1 only).
The script link_codes_to_df.py uses the file exported by merge_nvivo_datasets.py to indicate per item whether a category occurs more than once.
The notebook get_results.ipynb creates the results based on the datasets produced by merge_nvivo_datasets.py and link_codes_to_df.py. It produces the files RQ1.xlsx, RQ1_boxplot.pdf, RQ2.xlsx, RQ3_0.xlsx, RQ3_1.xlsx, and RQ4.xlsx.
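The two per-item encodings described above (category occurs: 0 or 1; category occurs more than once) can be illustrated with a minimal sketch. This is not the code of merge_nvivo_datasets.py or link_codes_to_df.py; the input layout (one (item, category) pair per tagged text span), the item identifiers, and the function name build_indicators are assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical input: one (item_id, category) pair per tagged text span.
tagged_spans = [
    ("ISSUE-1", "high_user"),
    ("ISSUE-1", "high_user"),
    ("ISSUE-1", "motivation"),
    ("ISSUE-2", "low_system"),
]


def build_indicators(spans):
    """Return two item -> category -> 0/1 tables.

    occurs:   1 if the category was tagged at least once in the item.
    repeated: 1 if the category was tagged more than once in the item.
    """
    counts = Counter(spans)  # missing (item, category) keys count as 0
    items = sorted({item for item, _ in spans})
    categories = sorted({category for _, category in spans})
    occurs = {
        i: {c: int(counts[(i, c)] >= 1) for c in categories} for i in items
    }
    repeated = {
        i: {c: int(counts[(i, c)] > 1) for c in categories} for i in items
    }
    return occurs, repeated


occurs, repeated = build_indicators(tagged_spans)
```

Keeping the two tables separate mirrors the split between the two scripts: the first table only records presence, while the second carries the extra information about repeated categories.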
Subfolder "Coding scheme"
The file scheme issue requirements.docx defines the tagging guidelines that we used for annotating the datasets.
Subfolder "results"
Subsubfolder "Projects Description"
This folder includes a single Excel file called project_description.xlsx that contains descriptive information regarding the 14 analyzed projects. In particular, the file includes the following columns:
Project name: the project name we use in the paper
Original size: the number of JIRA issues in the dataset
Sample size: the number of JIRA issues that we retained in our sample
Epic, Feature, ..., Other: counts of the issues in the sample split by issue label assigned by the dev team (extracted from JIRA)
Subsubfolder "RQ1"
This subfolder focuses on RQ1: To what extent do the backlog item labels chosen by practitioners reflect the requirements expressed in the items?
The folder includes an Excel file called RQ1.xlsx that presents the following columns per project:
project_name: the name of the project
Task with req: the number of issues with task labels that include at least one requirement
Task-labels: the number of issues with task labels
perc Task: the percentage of task-labeled issues with at least one requirement
RR with req: the number of issues with requirement labels that include at least one requirement
RR-labels: the number of issues with requirement labels
perc RR: the percentage of requirement-labeled issues with at least one requirement
Then, there are additional rows and columns that calculate average and standard deviation for various splits of the data.
The folder also includes a boxplot visualization RQ1_boxplot.pdf that contrasts the ratio of requirements in task-labeled items vs. requirements-labeled items.
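The relation between the count columns and the percentage columns of RQ1.xlsx can be sketched as follows. The numbers are invented, the helper name perc_with_req is hypothetical, and the 0-100 scale is an assumption (the file may store the value as a fraction instead).

```python
def perc_with_req(with_req, labeled):
    """Percentage of labeled issues that contain at least one requirement.

    Returns None when a project has no issues with the given label,
    so that an empty split is not reported as 0%.
    """
    if labeled == 0:
        return None
    return 100.0 * with_req / labeled


# Hypothetical row for one project; the counts are made up for illustration.
row = {
    "project_name": "ExampleProject",
    "Task with req": 12,
    "Task-labels": 40,
    "RR with req": 27,
    "RR-labels": 30,
}
row["perc Task"] = perc_with_req(row["Task with req"], row["Task-labels"])
row["perc RR"] = perc_with_req(row["RR with req"], row["RR-labels"])
```

With these invented counts, perc Task is 30.0 and perc RR is 90.0.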
Subsubfolder "RQ2"
This subfolder focuses on RQ2: What categories of requirements information are more commonly used?
The folder includes an Excel file called RQ2.xlsx that reports the frequency of all combinations of granularity (high, medium, low) and requirement type (functional user-oriented, functional system-oriented, non-functional). The file presents one project per line, and it includes the following columns:
project_name: the name of the project
Total req: the total number of requirements that we tagged in that project
high_nfr, low_user, ..., medium_system: counts for each of the nine combinations of granularity and requirement type, per project
The folder also includes a boxplot visualization RQ2_boxplot.pdf that contrasts the various combinations of granularity and requirement type, divided by proprietary vs. open source projects.
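As an illustration of how the nine count columns relate to the two tagging dimensions, the sketch below derives the column names from the cross product of granularity and requirement type and counts tags for one project. Only high_nfr, low_user, and medium_system are named in this description; the remaining six names, their order, and the function count_combinations are assumptions.

```python
from collections import Counter
from itertools import product

GRANULARITIES = ("high", "medium", "low")
# Assumed short names for: functional user-oriented, functional
# system-oriented, and non-functional requirements.
TYPES = ("user", "system", "nfr")

# Nine column names such as "high_nfr" or "medium_system".
COLUMNS = [f"{g}_{t}" for g, t in product(GRANULARITIES, TYPES)]


def count_combinations(tags):
    """Count the tags of one project per granularity x type combination.

    Tags that are not one of the nine combinations (for example a
    motivation tag) are ignored.
    """
    counts = Counter(tags)
    row = {column: counts[column] for column in COLUMNS}
    row["Total req"] = sum(row.values())
    return row


# Hypothetical tags of a single project.
example_row = count_combinations(["high_nfr", "high_nfr", "low_user"])
```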
Subsubfolder "RQ3"
This subfolder focuses on RQ3: How often does a single backlog item include multiple requirements?
The folder includes multiple Excel files.
The file RQ3_0.xlsx includes two sheets. The first sheet Data includes information about how often various combinations of tags (granularity x type) occur in each project. Each row indicates how often one or more combinations appear in a given project. The sheet has the following columns:
project_name: the project name
multi_label: one or more combinations of tags that appear in that project. For example, ('high_nfr', 'medium_system') refers to the co-occurrence of the tags (high granularity, NFR) and (medium granularity, system-oriented FR) in the same issue.
count: how many times the combination defined in multi_label occurs in the project
multiple?: whether the occurrence in that row denotes multiple types of requirements in the same issue
The second sheet HowOften is a pivot table that summarizes the results from Data in a form that makes it easy to compare the various projects.
The file RQ3_1.xlsx addresses research sub-question RQ3.1: What different requirements categories do co-occur more often in a backlog item? The file presents an ordered list (from the most frequent to the least frequent) of the combinations of tags (granularity x type) where multiple combinations occur in the same JIRA issue. The file includes the following columns:
multi_label: one or more combinations of tags. For example, ('high_nfr', 'medium_system') refers to the co-occurrence of the tags (high granularity, NFR) and (medium granularity, system-oriented FR) in the same issue.
Total Count: in how many issues the combination of multi_label occurs over all projects
Control, Service, ..., Red_Hat_Developer_Website_v2: in how many issues the combination of multi_label occurs in each project
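The multi_label counts described for RQ3 can be sketched with a small example. The project names, the issue data, and both function names are hypothetical, and the sketch makes a simplifying assumption: an issue counts as containing multiple requirements only when it carries at least two distinct combinations, so repeated occurrences of the same combination are not considered here.

```python
from collections import Counter

# Hypothetical input: one (project, tags) pair per issue.
issues = [
    ("ProjA", ["high_nfr", "medium_system"]),
    ("ProjA", ["medium_system", "high_nfr"]),
    ("ProjA", ["low_user"]),
    ("ProjB", ["high_nfr", "medium_system"]),
]


def multi_label(tags):
    """Order-independent key for the set of combinations in one issue."""
    return tuple(sorted(set(tags)))


def cooccurrence_counts(issue_list):
    """Count multi_label keys per project and over all projects.

    The overall counter only keeps keys with more than one distinct
    combination, i.e. issues that mix requirement categories.
    """
    per_project = Counter()
    for project, tags in issue_list:
        per_project[(project, multi_label(tags))] += 1

    total = Counter()
    for (_project, label), n in per_project.items():
        if len(label) > 1:
            total[label] += n
    return per_project, total


per_project, total = cooccurrence_counts(issues)
# Ordered from the most frequent to the least frequent combination.
ranking = total.most_common()
```

Sorting the tags before building the key ensures that ('high_nfr', 'medium_system') and ('medium_system', 'high_nfr') are counted as the same combination.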
Subsubfolder "RQ4"
This subfolder focuses on RQ4: To what extent are requirements complemented by a motivation for their existence?
The folder includes an Excel file called RQ4.xlsx that reports how often JIRA issues with at least one requirement contain a motivation. The file shows, per project (column B), how many issues exist with a requirement (column D), how many include a motivation (column C), and the percentage (column E).
Subfolder "select_sample"
The file select_sample_projects.py creates the samples from the raw datasets. Run "python select_sample_projects.py 0" to sample the datasets from "raw projects - Public Jira data", or run "python select_sample_projects.py 1" to sample the datasets from "raw projects - TAWOS". The Excel file possible_projects.xlsx contains the longlist of different OSS projects.
Subsubfolder "convert_oss"
This subfolder contains two empty folders, where the raw datasets of Montgomery et al. (2022) (as JSON files) and the TAWOS dataset of Tawosi et al. (2022) (as an xlsx file) can be stored.
The script convert_oss_json.py can be used to filter the raw dataset from Montgomery et al. (2022). This requires placing the raw data for each repository individually in raw projects - Public Jira data and the epic links separately in the subfolder epic_links.
Subsubfolder "samples"
After running select_sample_projects.py, this subfolder will contain all samples (exported as Excel files). By default, the Excel files include the sample that we used in our paper.
Subfolder "Shortlist OSS projects"
This folder contains the file final phase project selection.xlsx, which gives an overview of the independent tagging that we used to shortlist the open source projects.
Subfolder "Tagged data"
Subsubfolder "Category per item"
For each of the OSS projects, we include an export from Nvivo showing the tagged categories per item (only indicating 1 or 0).
Subsubfolder "raw tags"
For each of the combinations (type x granularity) that we tagged (see the scheme in the folder Coding scheme), we include an export from Nvivo that shows the tagged text. There are 10 files (one for each of the nine combinations plus one file for the motivation tags), and each file includes the tagged text from the open source projects. For confidentiality reasons, we cannot share the tagged text from the proprietary projects.
Reference to data
Montgomery, L., Lüders, C., & Maalej, W. (2022, May). An alternative issue tracking dataset of public jira repositories. In Proceedings of the 19th International Conference on Mining Software Repositories (pp. 73-77).
Tawosi, V., Al-Subaihin, A., Moussa, R., & Sarro, F. (2022, May). A versatile dataset of agile open source software projects. In Proceedings of the 19th International Conference on Mining Software Repositories (pp. 707-711).
Reference
Ashley T. van Can and Fabiano Dalpiaz. Requirements Information in Backlog Items: Content Analysis. Proceedings of the 30th International Working Conference on Requirements Engineering: Foundation for Software Quality, 2024.
Created:
2024-02-10



