Datasets and scripts related to the paper: "Can Generative AI Help us in Open Coding of Software Engineering Data?"

NIAID Data Ecosystem, indexed 2026-05-02

Download link:

https://zenodo.org/record/13134103


Resource description:

This replication package contains datasets and scripts related to the paper: "Can Generative AI Help us in Open Coding of Software Engineering Data?"

The replication package is organized into two directories:

- manual_analysis: This directory contains all sheets used to perform the manual analysis for RQ1, RQ2, and RQ3.
- stats: This directory contains all datasets, scripts, and results metrics used for the quantitative analyses of RQ1 and RQ2.

In the following, we describe the content of each directory.

manual_analysis

manual_analysis_rq1: This directory contains all sheets used to perform manual analysis for RQ1 (independent and incremental coding).

The sub-directory incremental_coding contains .csv files for all datasets (DL_Faults_COMMIT_incremental.csv, DL_Faults_ISSUE_incremental.csv, DL_Fault_SO_incremental.csv, DRL_Challenges_incremental.csv and Functional_incremental.csv). All these .csv files contain the following columns:

- Link: The link to the instances
- Prompt: Prompt used as input to GPT-4-Turbo
- ID: Instance ID
- FinalTag: Tag assigned by the human in the original paper
- Chatgpt_output_memory: Output of GPT-4-Turbo with incremental coding
- Chatgpt_output_memory_clean: (only for the DL Faults datasets) output of GPT-4-Turbo considering only the label assigned, excluding the text
- Author1: Label assigned by the first author
- Author2: Label assigned by the second author
- FinalOutput: Label assigned after the resolution of the conflicts

The sub-directory independent_coding contains .csv files for all datasets (DL_Faults_COMMIT_independent.csv, DL_Faults_ISSUE_independent.csv, DL_Fault_SO_independent.csv, DRL_Challenges_independent.csv and Functional_independent.csv), containing the following columns:

- Link: The link to the instances
- Prompt: Prompt used as input to GPT-4-Turbo
- ID: Specific ID for the instance
- FinalTag: Tag assigned by the human in the original paper
- Chatgpt_output: Output of GPT-4-Turbo with independent coding
- Chatgpt_output_clean: (only for the DL Faults datasets) output of GPT-4-Turbo considering only the label assigned, excluding the text
- Author1: Label assigned by the first author
- Author2: Label assigned by the second author
- FinalOutput: Label assigned after the resolution of the conflicts

Also, the sub-directory contains sheets with inconsistencies after resolving conflicts. The directory inconsistency_incremental_coding contains .csv files with the following columns:

- Dataset: The dataset considered
- Human: The label assigned by the human in the original paper
- Machine: The label assigned by GPT-4-Turbo
- Classification: The final label assigned by the authors after resolving the conflicts. Multiple classifications for a single instance are separated by a comma ","
- Final: Final label assigned after the resolution of the incompatibilities

Similarly, the sub-directory inconsistency_independent_coding contains a .csv file with the same columns as before, but for the case of independent coding.

manual_analysis_rq2: This directory contains .csv files for all datasets (DL_Faults_redundant_tag.csv, DRL_Challenges_redundant_tag.csv, Functional_redundant_tag.csv) to perform manual analysis for RQ2.
The DL_Faults_redundant_tag.csv file contains the following columns:

- Tags Redundant: Tags identified as redundant by GPT-4-Turbo
- Matched: Inspection by the authors to check whether the redundant tags match or not
- FinalTag: Final tag assigned by the authors after the resolution of the conflict

The Functional_redundant_tag.csv file contains the same columns.

The DRL_Challenges_redundant_tag.csv file is organized as follows:

- Tags Suggested: The final tag suggested by GPT-4-Turbo
- Tags Redundant: Tags identified as redundant by GPT-4-Turbo
- Matched: Inspection by the authors to check whether the redundant tags match the suggested tags or not
- FinalTag: Final tag assigned by the authors after the resolution of the conflict

The sub-directory code_consolidation_mapping_overview contains .csv files (DL_Faults_rq2_overview.csv, DRL_Challenges_rq2_overview.csv, Functional_rq2_overview.csv) organized as follows:

- Initial_Tags: List of the unique initial tags assigned by GPT-4-Turbo for each dataset
- Mapped_tags: List of tags mapped by GPT-4-Turbo
- Unmatched_tags: List of unmatched tags by GPT-4-Turbo
- Aggregating_tags: List of consolidated tags
- Final_tags: List of final tags after the consolidation task

prompt_for_each_rq: This directory contains:

- (i) the history of prompts used in each dataset (prompts_history.txt)
- (ii) all final prompts used for the analysis of each dataset, the prompt used for incremental coding, the prompt used in RQ2 to consolidate redundant codes, and the prompt used in RQ3 to create the taxonomy (generic_prompt.txt)
- (iii) all .csv files indicating, for each dataset, the link and the prompt used (prompt_DL_Faults_COMMIT.csv, prompt_DL_Faults_ISSUE.csv, prompt_DL_Faults_SO.csv, prompt_DRL_Challenges.csv).
For the Functional dataset, the .csv file instead contains the Question, Answer, and Prompt used (prompt_Functional.csv).

rq3: This directory contains the taxonomies obtained from GPT-4-Turbo for the DL Faults and for the DRL Challenges (taxonomy_DL_Faults.txt, taxonomy_DRL_Challenges.txt).

stats

RQ1: contains the script and datasets used to compute metrics for RQ1. The analysis calculates all possible combinations between Matched, More Abstract, More Specific, and Unmatched.

- RQ1_Stats.ipynb is a Python Jupyter notebook to compute the RQ1 metrics. To use it, as explained in the notebook, it is necessary to change the values of the variables contained in the first code block.
- independent-prompting: Contains the datasets related to the independent prompting. Each line contains the following fields:
  - Link: Link to the artifact being tagged
  - Prompt: Prompt sent to GPT-4-Turbo
  - FinalTag: Artifact coding from the replicated study
  - chatgpt_output_text: GPT-4-Turbo output
  - chatgpt_output: Codes parsed from the GPT-4-Turbo output
  - Author1: Annotator 1 evaluation of the coding
  - Author2: Annotator 2 evaluation of the coding
  - FinalOutput: Consolidated evaluation
- incremental-prompting: Contains the datasets related to the incremental prompting (same format as independent prompting).
- results: contains files for the RQ1 quantitative results. The files are named RQ1_<Dataset>_<Prompt method>_<Excluding Negatives>_<MetricAggregation>.csv, where Dataset is the dataset name, Prompt method indicates whether results are for independent or incremental prompting, Excluding Negatives (for datasets where this applies) indicates whether results have been obtained by excluding negative instances, and MetricAggregation (where it applies) indicates how metrics have been aggregated (macro or weighted average). The files report columns indicating the Dataset, the Matching type, the Accuracy, Precision, Recall, F1 Score, and Cohen's Kappa.

RQ2: contains the script used to compute metrics for RQ2, the datasets it uses, and its output.
RQ2_SetStats.ipynb is the Python Jupyter notebook to perform the analyses. The script takes as input the following types of files, contained in the directory:

- RQ1 Data Files (RQ1_DLFaults_Issues.csv, RQ1_DLFaults_Commits.csv, and RQ1_DLFaults_SO.csv, joined in a single .csv, RQ1_DLFaults.csv). These are the same files used in RQ1.
- Mapping Files (RQ2_Mappings_DRL.csv, RQ2_Mappings_Functional.csv, RQ2_Mappings_DLFaults.csv). These contain the mappings between human tags (HumanTags) and GPT-4-Turbo tags (Final Tags), together with the type of matching (MatchType).
- Additional codes created during the consolidation (RQ2_newCodes_DRL.csv, RQ2_newCodes_Functional.csv, RQ2_newCodes_DLFaults.csv), annotated with the matching: new code, old code, human code, match type.
- Set files (RQ2_Sets_DRL.csv, RQ2_Sets_Functional.csv, RQ2_Sets_DLFaults.csv). Each file contains the following columns:
  - HumanTags: List of tags from the original dataset
  - InitialTags: Set of tags from RQ1
  - ConsolidatedTags: Tags that have been consolidated
  - FinalTags: Final set of tags (results of RQ2, used in RQ3)
  - NewTags: New tags created during consolidation

RQ2_Set_Metrics.csv: Reports the RQ2 output metrics (Precision, Recall, F1-Score, Jaccard).
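
To illustrate the kind of set-level metrics reported in RQ2_Set_Metrics.csv, the following is a minimal sketch and not the code of RQ2_SetStats.ipynb. It assumes the human tag set is the reference, that tags are compared by exact string equality (the notebook may instead rely on the mapping files and match types), and that empty sets yield 0. The function name set_metrics and the example tags are hypothetical.

```python
def set_metrics(human_tags, final_tags):
    """Compare two collections of tags as sets.

    Assumption: human_tags is treated as the reference set and
    final_tags as the predicted set; tags match only if identical.
    """
    human = set(human_tags)
    final = set(final_tags)
    overlap = human & final
    union = human | final

    # Precision: share of final tags that also appear among human tags.
    precision = len(overlap) / len(final) if final else 0.0
    # Recall: share of human tags recovered in the final tags.
    recall = len(overlap) / len(human) if human else 0.0
    # F1: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    # Jaccard: size of the intersection over size of the union.
    jaccard = len(overlap) / len(union) if union else 0.0

    return {
        "Precision": precision,
        "Recall": recall,
        "F1-Score": f1,
        "Jaccard": jaccard,
    }


if __name__ == "__main__":
    # Hypothetical tags, used only to exercise the function.
    human = ["wrong tensor shape", "missing layer", "gpu usage"]
    final = ["missing layer", "gpu usage", "api change", "data loading"]
    for name, value in set_metrics(human, final).items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions, duplicates in either list are ignored because both inputs are converted to sets before comparison.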

Created:

2025-01-07
