Data Package for "A Platform-Agnostic Approach for Automatically Identifying Real-Life Performance Issue Reports with Heuristic Linguistic Patterns"
Indexed by NIAID Data Ecosystem on 2026-05-02
Download link:
https://zenodo.org/record/10944185
Resource description:
This Zenodo repository contains the data supporting the findings of the journal paper "A Platform-Agnostic Approach for Automatically Identifying Real-Life Performance Issue Reports with Heuristic Linguistic Patterns", published in IEEE Transactions on Software Engineering. It includes:
Heuristic Linguistic Pattern Set: lists the 80 heuristic linguistic patterns (HLPs) we derived from Apache's JIRA issue tracking system. The "Category" column gives each pattern's type: LEX (lexical pattern), STR (structural pattern), SEM (semantic pattern), or PRF (profiling pattern). The "Name" column gives the descriptive name we assigned to each pattern, and the "Definition" column gives the pattern's detailed content.
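The pattern set's three columns map naturally onto a small record type. The sketch below assumes the pattern table has been exported to CSV with the headers "Category", "Name", and "Definition". The CSV export, the `Pattern` class, and the helper functions are hypothetical illustrations and are not part of the data package.

```python
import csv
import io
from collections import Counter
from dataclasses import dataclass

# Category codes used in the pattern set's "Category" column.
CATEGORY_NAMES = {
    "LEX": "lexical",
    "STR": "structural",
    "SEM": "semantic",
    "PRF": "profiling",
}


@dataclass(frozen=True)
class Pattern:
    category: str    # one of the codes in CATEGORY_NAMES
    name: str        # descriptive name of the pattern
    definition: str  # detailed content of the pattern


def load_patterns(csv_text: str) -> list[Pattern]:
    """Parse CSV rows with Category/Name/Definition headers into Pattern records."""
    patterns = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        code = row["Category"].strip().upper()
        if code not in CATEGORY_NAMES:
            raise ValueError(f"unknown category code: {code!r}")
        patterns.append(Pattern(code, row["Name"].strip(), row["Definition"].strip()))
    return patterns


def count_by_category(patterns: list[Pattern]) -> dict[str, int]:
    """Count patterns per category, keyed by the long category name."""
    counts = Counter(p.category for p in patterns)
    return {CATEGORY_NAMES[code]: n for code, n in counts.items()}
```

For example, `count_by_category(load_patterns(text))` returns the number of patterns in each category. For the full set, these counts should sum to 80.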
Manual Tagging Results: the manual_tagging.xlsx spreadsheet contains both sentence-level and issue-level manual tagging results for three datasets: 'Dataset-1: Apache Jira's Homologous Evaluation', 'Dataset-2: Apache Jira's Heterologous Evaluation', and 'Dataset-3: Other Platform's Evaluation'. The results are split into sentence-level tabs ("Dataset-1 Sen", "Dataset-2 Sen", "Dataset-3 Sen") and issue-level tabs ("Dataset-1 Issue", "Dataset-2 Issue", "Dataset-3 Issue").
RQ Findings:
This section contains detailed findings for six research questions (RQ1 to RQ6).
The RQ1 tab evaluates our HLP-based approach by reporting the precision, recall, and F1-Score of eight classifiers alongside the corresponding values for the baseline methods, at both the sentence and issue levels of automatic tagging.
The RQ2 tab illustrates the precision, recall, and F1-Score of eight classifiers under two training conditions: a balanced training dataset (BT+HLP) and an imbalanced training dataset (UBT+HLP). These outcomes are contrasted with the equivalent values from baseline methods, also trained under balanced (BT+BLM) and imbalanced (UBT+BLM) conditions. The results are shown at both sentence and issue levels for automatic tagging.
The RQ3 tab evaluates the dataset transferability of our HLP-based approach in comparison to the baseline methods. It reports precision, recall, and F1-Score for eight classifiers under two training/testing conditions. The 'D1/D1' condition measures performance on the same dataset used for training. The 'D1/D3' condition measures performance after transfer to a different dataset. The tab also includes 'Avg Change' and 'p-value' sections, which summarize the change in each metric between the two conditions and its statistical significance.
The RQ4 tab presents a direct comparison between strict and fuzzy HLP matching approaches, assessed through precision, recall, and F1-Score metrics across eight issue classifiers.
The RQ5 tab examines the influence of sentence order on the accuracy of eight classifiers within our approach. It shows the change in precision, recall, and F1-Score when the sentence order feature is taken into consideration versus when it is not.
The RQ6 tab explores the impact of feature selection algorithms on both issue-level and sentence-level tagging accuracy. It presents the average precision, recall, and F1-Score for three configurations: Boruta, Recursive Feature Elimination (RFE), and all 80 features without selection.
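Every RQ tab reports precision, recall, and F1-Score. The following sketch shows how these are computed for a binary tagging task. It treats "performance issue" as the positive class and returns 0.0 when a denominator is zero. Both choices are assumptions, since the package does not state how its metrics were averaged.

```python
from collections.abc import Sequence


def precision_recall_f1(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> tuple[float, float, float]:
    """Binary precision, recall, and F1 for the given positive label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    # Return 0.0 instead of dividing by zero when a denominator is empty.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For example, take y_true = [1, 1, 1, 0, 0, 0] and y_pred = [1, 1, 0, 1, 0, 0]. This gives 2 true positives, 1 false positive, and 1 false negative, so all three metrics equal 2/3.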
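RQ2 contrasts a balanced training dataset (BT) with an imbalanced one (UBT). The data package does not describe how the balanced set was built. Random undersampling of the larger class is one common option, and it is shown here only as an assumption. The sketch uses a fixed seed so the subset is reproducible.

```python
import random
from collections.abc import Hashable, Sequence
from typing import TypeVar

T = TypeVar("T")


def undersample(
    items: Sequence[T], labels: Sequence[Hashable], seed: int = 0
) -> list[tuple[T, Hashable]]:
    """Randomly drop examples from larger classes until every class has the same size."""
    if len(items) != len(labels) or not items:
        raise ValueError("items and labels must be non-empty and equal in length")
    by_label: dict[Hashable, list[T]] = {}
    for item, label in zip(items, labels):
        by_label.setdefault(label, []).append(item)
    smallest = min(len(group) for group in by_label.values())
    rng = random.Random(seed)  # fixed seed makes the subset reproducible
    balanced = [
        (item, label)
        for label, group in by_label.items()
        for item in rng.sample(group, smallest)
    ]
    rng.shuffle(balanced)  # avoid returning examples grouped by class
    return balanced
```

Undersampling discards data from the larger class. Oversampling the smaller class is the usual alternative when training data is scarce.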
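The RQ3 tab's 'Avg Change' is read here as the mean per-classifier difference between the 'D1/D3' and 'D1/D1' scores. The package does not name the statistical test behind its 'p-value'. The sketch below therefore substitutes an exact two-sided paired sign-flip permutation test, which is cheap with only eight classifier pairs (2^8 = 256 sign patterns). Both the interpretation of 'Avg Change' and the choice of test are assumptions.

```python
from itertools import product
from statistics import mean


def avg_change(same: list[float], transferred: list[float]) -> float:
    """Mean per-classifier change from the 'D1/D1' score to the 'D1/D3' score."""
    if len(same) != len(transferred) or not same:
        raise ValueError("need two non-empty score lists of equal length")
    return mean(t - s for s, t in zip(same, transferred))


def paired_permutation_p(same: list[float], transferred: list[float]) -> float:
    """Exact two-sided sign-flip permutation test on the paired differences."""
    if len(same) != len(transferred) or not same:
        raise ValueError("need two non-empty score lists of equal length")
    diffs = [t - s for s, t in zip(same, transferred)]
    observed = abs(sum(diffs))
    extreme = total = 0
    # Under the null hypothesis each difference is equally likely to have either
    # sign, so enumerate all 2**n sign assignments.
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        # Small tolerance guards against floating-point round-off.
        if abs(sum(sg * d for sg, d in zip(signs, diffs))) >= observed - 1e-12:
            extreme += 1
    return extreme / total
```

If all eight classifiers drop by the same amount after transfer, only the two all-same-sign patterns are as extreme as the observed sum, so the p-value is 2/256.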
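RQ4 compares strict and fuzzy HLP matching, but the matching procedure itself is defined in the paper, not in this description. The hypothetical sketch below shows one way the distinction could work for a plain-phrase pattern. Strict matching requires the phrase verbatim, ignoring case. Fuzzy matching also accepts a word window of the same length whose `difflib.SequenceMatcher` similarity ratio reaches a threshold. The 0.8 threshold is an arbitrary illustrative value.

```python
from difflib import SequenceMatcher


def strict_match(phrase: str, sentence: str) -> bool:
    """Strict: the phrase must appear verbatim (ignoring case)."""
    return phrase.lower() in sentence.lower()


def fuzzy_match(phrase: str, sentence: str, threshold: float = 0.8) -> bool:
    """Fuzzy: also accept a same-length word window that is similar enough."""
    if strict_match(phrase, sentence):
        return True
    target = " ".join(phrase.lower().split())
    n = len(target.split())
    words = sentence.lower().split()
    for i in range(len(words) - n + 1):
        window = " ".join(words[i:i + n])
        # ratio() is 2*M/T, where M is the number of matched characters
        # and T is the total length of both strings.
        if SequenceMatcher(None, target, window).ratio() >= threshold:
            return True
    return False
```

With these definitions, `strict_match("takes too long", "it takes to long")` is `False`. `fuzzy_match` with the same arguments is `True`, because the typo only slightly lowers the similarity ratio.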
Qualitative Analysis:
This spreadsheet contains the data supporting the qualitative analysis in Section 6.1 of the paper. It is organized into tabs, each dedicated to one research question (RQ):
Tab "RQ-1" shows performance issue reports that were correctly detected by XGBoost, the best-performing model of our Heuristic Linguistic Pattern (HLP) approach, but were missed by BERT, the best-performing model of the baseline method. These cases highlight the advantage of our approach in identifying subtle performance issues.
Tab "RQ-2" continues the exploration of performance issue reports, presenting cases with specific details (to be added).
Tab "RQ-3" shows performance issue reports that XGBoost, the best-performing model of our HLP approach, detected but BERT, the best-performing baseline model, missed. The comparison covers two conditions: evaluation on Dataset 1, which the models were trained on, and evaluation on Dataset 3, which they were not trained on. These cases illustrate the robustness and adaptability of our model.
Tab "RQ-4" lists performance issue reports identified only when Fuzzy HLP Matching is used in our HLP approach. These cases show how fuzzy matching extends issue detection beyond strict matching.
Tab "RQ-5" lists performance issue reports identified only when the Issue HLP Matrix is applied in our approach. These cases show how the matrix-based analysis isolates specific performance concerns.
Tab "RQ-6" lists performance issue reports detected only when feature selection is incorporated into our HLP approach. These cases show how feature selection improves the precision of performance issue identification.
LLM Experiment Data: presents the tagging outcomes of Large Language Models (LLMs), specifically ChatGPT-3.5 and ChatGPT-4, across three distinct datasets: 'Dataset-1: Apache Jira's Homologous Evaluation', 'Dataset-2: Apache Jira's Heterologous Evaluation', and 'Dataset-3: Evaluation on Other Platforms'. The results are organized into three separate tabs: 'Dataset-1 Issue', 'Dataset-2 Issue', and 'Dataset-3 Issue'.
ChatGPT Operation Python Script: automates the evaluation and tagging of issue reports stored in Excel files using Large Language Models (LLMs) such as ChatGPT-3.5 and ChatGPT-4. It notes that administrative rights are required to modify the files and describes how responses are read from and written to the Excel files. Its key functions query the LLM with issue descriptions, process the responses, and update the spreadsheet with 'Yes' or 'No' labels and explanatory reasons. This supports an organized review of LLM performance across the datasets.
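The script's core step can be sketched as follows: query the model with an issue description, then turn the reply into a 'Yes'/'No' label plus a reason. The prompt wording, the expected reply format, and the function names are hypothetical. The model call is passed in as a plain function (`ask`) so the sketch runs without network access. Reading from and writing to the Excel files is omitted.

```python
import re
from typing import Callable

# Hypothetical prompt wording; the package's actual prompt is not reproduced here.
PROMPT_TEMPLATE = (
    "Is the following issue report a performance issue? "
    "Answer 'Yes' or 'No', then give a one-sentence reason.\n\n{issue}"
)

# A leading yes/no word, optional punctuation, then the reason.
_ANSWER = re.compile(r"^\s*(yes|no)\b[\s.,:;-]*(.*)$", re.IGNORECASE | re.DOTALL)


def parse_response(text: str) -> tuple[str, str]:
    """Split a model reply into a 'Yes'/'No' label and the remaining reason."""
    match = _ANSWER.match(text)
    if match is None:
        return "Unparsed", text.strip()
    return match.group(1).capitalize(), match.group(2).strip()


def tag_issues(issues: list[str], ask: Callable[[str], str]) -> list[tuple[str, str]]:
    """Query the model once per issue and collect (label, reason) pairs."""
    return [parse_response(ask(PROMPT_TEMPLATE.format(issue=issue))) for issue in issues]
```

Replies that do not begin with 'Yes' or 'No' are labeled 'Unparsed'. This lets them be reviewed by hand rather than silently miscounted.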
Created:
2024-07-06



