Datasets and scripts related to the paper: "*Can Generative AI Help us in Open Coding of Software Engineering Data?*"
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/13134103
Resource description:
This replication package contains datasets and scripts related to the paper: "Can Generative AI Help us in Open Coding of Software Engineering Data?"
The replication package is organized into two directories:
manual_analysis: This directory contains all sheets used to perform the manual analysis for RQ1, RQ2, and RQ3.
stats: This directory contains all datasets, scripts, and result metrics used for the quantitative analyses of RQ1 and RQ2.
In the following, we describe the content of each directory:
manual_analysis
manual_analysis_rq1: This directory contains all sheets used to perform manual analysis for RQ1 (independent and incremental coding).
The sub-directory incremental_coding contains .csv files for all datasets (DL_Faults_COMMIT_incremental.csv, DL_Faults_ISSUE_incremental.csv, DL_Fault_SO_incremental.csv, DRL_Challenges_incremental.csv and Functional_incremental.csv). All these .csv files contain the following columns:
Link: The link to the instances
Prompt: Prompt used as input to GPT-4-Turbo
ID: Instance ID
FinalTag: Tag assigned by the human in the original paper
Chatgpt_output_memory: Output of GPT-4-Turbo with incremental coding
Chatgpt_output_memory_clean: (only for the DL Faults datasets) output of GPT-4-Turbo considering only the label assigned, excluding the text
Author1: Label assigned by the first author
Author2: Label assigned by the second author
FinalOutput: Label assigned after the resolution of the conflicts
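As a quick consistency check on these files, the agreement between the Author1 and Author2 columns can be measured with Cohen's kappa. The sketch below is a minimal, standard-library-only illustration, not the authors' script. It assumes comma-delimited files with a header row, and the SAMPLE rows and label values are hypothetical stand-ins for a real file such as DRL_Challenges_incremental.csv.

```python
import csv
import io
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items both raters labelled identically.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)


def annotator_agreement(csv_text):
    """Kappa between the Author1 and Author2 columns (assumed header names)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    author1 = [row["Author1"].strip() for row in rows]
    author2 = [row["Author2"].strip() for row in rows]
    return cohen_kappa(author1, author2)


# Hypothetical rows; a real file has more columns, which DictReader ignores here.
SAMPLE = """ID,Author1,Author2,FinalOutput
1,Matched,Matched,Matched
2,Unmatched,Matched,Matched
3,More Specific,More Specific,More Specific
4,Unmatched,Unmatched,Unmatched
"""

print(round(annotator_agreement(SAMPLE), 3))
```

To use it on an actual file, read the file's contents (the UTF-8 encoding is an assumption) and pass them instead of SAMPLE.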
The sub-directory independent_coding contains .csv files for all datasets (DL_Faults_COMMIT_independent.csv, DL_Faults_ISSUE_independent.csv, DL_Fault_SO_independent.csv, DRL_Challenges_independent.csv and Functional_independent.csv), containing the following columns:
Link: The link to the instances
Prompt: Prompt used as input to GPT-4-Turbo
ID: Specific ID for the instance
FinalTag: Tag assigned by the human in the original paper
Chatgpt_output: Output of GPT-4-Turbo with independent coding
Chatgpt_output_clean: (only for DL Faults datasets) output of GPT-4-Turbo considering only the label assigned, excluding the text
Author1: Label assigned by the first author
Author2: Label assigned by the second author
FinalOutput: Label assigned after the resolution of the conflicts.
The directory also contains sheets listing the inconsistencies that remained after conflict resolution. The sub-directory inconsistency_incremental_coding contains .csv files with the following columns:
Dataset: The dataset considered
Human: The label assigned by the human in the original paper
Machine: The label assigned by GPT-4-Turbo
Classification: The final label assigned by the authors after resolving the conflicts. Multiple classifications for a single instance are separated by a comma “,”
Final: final label assigned after the resolution of the incompatibilities
Similarly, the sub-directory inconsistency_independent_coding contains a .csv file with the same columns as before, but this is for the case of independent coding.
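Because the Classification column can hold several comma-separated labels for one instance, it has to be split before labels are counted. A minimal sketch, assuming standard CSV quoting (so a multi-label cell is wrapped in double quotes); the rows below are hypothetical:

```python
import csv
import io
from collections import Counter


def classification_counts(csv_text):
    """Count every label in the multi-valued Classification column."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Split on the comma separator and drop surrounding whitespace.
        labels = (part.strip() for part in row["Classification"].split(","))
        counts.update(label for label in labels if label)
    return counts


# Hypothetical rows; the quoted cell holds two labels for a single instance.
SAMPLE = """Dataset,Human,Machine,Classification,Final
DL_Faults,Wrong tensor shape,Shape mismatch,"More Specific, Matched",Matched
DRL_Challenges,Reward design,Reward shaping,Matched,Matched
"""

print(classification_counts(SAMPLE))
```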
manual_analysis_rq2: This directory contains .csv files for all datasets (DL_Faults_redundant_tag.csv, DRL_Challenges_redundant_tag.csv, Functional_redundant_tag.csv) to perform manual analysis for RQ2.
The DL_Faults_redundant_tag.csv file contains the following columns:
Tags Redundant: tags identified as redundant by GPT-4-Turbo
Matched: the authors' assessment of whether the tags flagged as redundant actually match (i.e., are truly redundant)
FinalTag: final tag assigned by the authors after the resolution of the conflict
The Functional_redundant_tag.csv file contains the same columns as DL_Faults_redundant_tag.csv.
The DRL_Challenges_redundant_tag.csv file is organized as follows:
Tags Suggested: The final tag suggested by GPT-4-Turbo
Tags Redundant: tags identified as redundant by GPT-4-Turbo
Matched: the authors' assessment of whether the redundant tags match the suggested tag
FinalTag: final tag assigned by the authors after the resolution of the conflict
The sub-directory code_consolidation_mapping_overview contains .csv files (DL_Faults_rq2_overview.csv, DRL_Challenges_rq2_overview.csv, Functional_rq2_overview.csv) organized as follows:
Initial_Tags: list of the unique initial tags assigned by GPT-4-Turbo for each dataset
Mapped_tags: list of tags mapped by GPT-4-Turbo
Unmatched_tags: list of tags left unmatched by GPT-4-Turbo
Aggregating_tags: list of consolidated tags
Final_tags: list of final tags after the consolidation task
prompt_for_each_rq: This directory contains (i) the history of the prompts used for each dataset (prompts_history.txt); (ii) all final prompts used for the analysis of each dataset, i.e., the prompt used for incremental coding, the prompt used in RQ2 to consolidate redundant codes, and the prompt used in RQ3 to create the taxonomy (generic_prompt.txt); and (iii) .csv files indicating, for each dataset, the link and the prompt used (prompt_DL_Faults_COMMIT.csv, prompt_DL_Faults_ISSUE.csv, prompt_DL_Faults_SO.csv, prompt_DRL_Challenges.csv). For the Functional dataset, the .csv file (prompt_Functional.csv) instead contains the Question, the Answer, and the Prompt used.
rq3: This directory contains the taxonomies obtained from GPT-4-Turbo for the DL Faults and DRL Challenges datasets (taxonomy_DL_Faults.txt, taxonomy_DRL_Challenges.txt).
stats
RQ1: contains the script and datasets used to compute the RQ1 metrics. The analysis considers all possible combinations of the Matched, More Abstract, More Specific, and Unmatched categories.
RQ1_Stats.ipynb is a Python Jupyter notebook that computes the RQ1 metrics. To use it, as explained in the notebook, change the values of the variables in the first code block.
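One plausible reading of "all possible combinations" is that each non-empty subset of the four categories is treated in turn as the set of acceptable outcomes. Under that assumption, the sketch below computes the share of instances whose consolidated evaluation (FinalOutput) falls in each subset. It is illustrative only; the notebook may define the combinations differently, and the example evaluations are hypothetical.

```python
from itertools import combinations

CATEGORIES = ("Matched", "More Abstract", "More Specific", "Unmatched")


def combination_rates(final_outputs):
    """Share of instances whose evaluation falls in each non-empty category subset."""
    if not final_outputs:
        raise ValueError("need at least one evaluation")
    n = len(final_outputs)
    rates = {}
    # Enumerate subsets of size 1..4, preserving the order of CATEGORIES.
    for size in range(1, len(CATEGORIES) + 1):
        for combo in combinations(CATEGORIES, size):
            hits = sum(label in combo for label in final_outputs)
            rates[combo] = hits / n
    return rates


# Hypothetical FinalOutput values for four instances.
outputs = ["Matched", "More Specific", "Unmatched", "Matched"]
rates = combination_rates(outputs)
print(rates[("Matched",)])
print(rates[("Matched", "More Specific")])
```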
independent-prompting: Contains the datasets related to the independent prompting. Each line contains the following fields:
Link: Link to the artifact being tagged
Prompt: Prompt sent to GPT-4-Turbo
FinalTag: Artifact coding from the replicated study
chatgpt_output_text: GPT-4-Turbo output
chatgpt_output: Codes parsed from the GPT-4-Turbo output
Author1: Annotator 1 evaluation of the coding
Author2: Annotator 2 evaluation of the coding
FinalOutput: Consolidated evaluation
incremental-prompting: Contains the datasets related to the incremental prompting (same format as independent prompting)
results: contains files with the RQ1 quantitative results. The files are named RQ1_<Dataset>_<Prompt method>_<Excluding Negatives>_<MetricAggregation>.csv, where Dataset is the dataset name, Prompt method indicates whether results are for independent or incremental prompting, Excluding Negatives (for datasets where this applies) indicates whether results were obtained by excluding negative instances, and MetricAggregation (where it applies) indicates how metrics were aggregated (macro or weighted average). The files report columns for the Dataset, the Matching type, Accuracy, Precision, Recall, F1 Score, and Cohen's Kappa.
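The difference between the macro and weighted aggregations can be sketched for F1 in a generic single-label setting: macro averaging gives every class the same weight, while weighted averaging weights each class by its support (its number of true instances). This is a standard-library-only sketch with hypothetical labels; how the notebook derives true and predicted labels from the annotated files is not described here.

```python
from collections import Counter


def per_class_f1(y_true, y_pred, label):
    """F1 for one class, treating it as positive and all others as negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def aggregated_f1(y_true, y_pred, average="macro"):
    """Macro- or support-weighted average of per-class F1 scores."""
    support = Counter(y_true)
    labels = sorted(set(y_true) | set(y_pred))
    scores = {lab: per_class_f1(y_true, y_pred, lab) for lab in labels}
    if average == "macro":
        return sum(scores.values()) / len(labels)
    if average == "weighted":
        return sum(scores[lab] * support[lab] for lab in labels) / len(y_true)
    raise ValueError("average must be 'macro' or 'weighted'")


# Hypothetical labels: class A is frequent, B and C are rare.
Y_TRUE = ["A", "A", "B", "C"]
Y_PRED = ["A", "B", "B", "C"]
print(aggregated_f1(Y_TRUE, Y_PRED, "macro"))
print(aggregated_f1(Y_TRUE, Y_PRED, "weighted"))
```

With imbalanced classes the two averages diverge, which is why results files report the aggregation method explicitly.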
RQ2: contains the script used to compute the RQ2 metrics, the datasets it uses, and its output.
RQ2_SetStats.ipynb is the Python Jupyter notebook that performs the analyses. It takes as input the following types of files, all contained in this directory:
RQ1 Data Files (RQ1_DLFaults_Issues.csv, RQ1_DLFaults_Commits.csv, and RQ1_DLFaults_SO.csv, joined into a single file, RQ1_DLFaults.csv). These are the same files used in RQ1.
Mapping Files (RQ2_Mappings_DRL.csv, RQ2_Mappings_Functional.csv, RQ2_Mappings_DLFaults.csv). These contain the mappings between human tags (HumanTags) and GPT-4-Turbo tags (Final Tags), together with the type of match (MatchType).
Additional codes created during consolidation (RQ2_newCodes_DRL.csv, RQ2_newCodes_Functional.csv, RQ2_newCodes_DLFaults.csv), annotated with the matching information: new code, old code, human code, match type
Set files (RQ2_Sets_DRL.csv, RQ2_Sets_Functional.csv, RQ2_Sets_DLFaults.csv). Each file contains the following columns:
HumanTags: List of tags from the original dataset
InitialTags: Set of tags from RQ1
ConsolidatedTags: Tags that have been consolidated
FinalTags: Final set of tags (results of RQ2, used in RQ3)
NewTags: New tags created during consolidation
RQ2_Set_Metrics.csv: Reports the RQ2 output metrics (Precision, Recall, F1-Score, Jaccard).
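The set-level metrics compare a set of human tags with a set of GPT-4-Turbo tags. A minimal sketch, assuming exact tag identity after an optional renaming step; the real Mapping Files also record a MatchType, which this sketch ignores, and the tags and mapping below are hypothetical:

```python
def set_metrics(human_tags, model_tags, mapping=None):
    """Precision, recall, F1 and Jaccard between two tag sets."""
    mapping = mapping or {}
    human = set(human_tags)
    # Rename model tags to their human counterparts where a mapping exists.
    model = {mapping.get(tag, tag) for tag in model_tags}
    overlap = len(human & model)
    precision = overlap / len(model) if model else 0.0
    recall = overlap / len(human) if human else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    union = human | model
    jaccard = overlap / len(union) if union else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "jaccard": jaccard}


# Hypothetical tag sets and mapping.
HUMAN = ["Tensor shape", "API misuse", "GPU error", "Training config"]
MODEL = ["Shape mismatch", "API misuse", "Dependency version"]
MAPPING = {"Shape mismatch": "Tensor shape"}
print(set_metrics(HUMAN, MODEL, MAPPING))
```

Because the mapped tags are collected into a set, two GPT-4-Turbo tags mapped to the same human tag are counted once.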
Created: 2025-01-07



