Flakify: A Black-Box, Language Model-based Predictor for Flaky Tests – Replication Package

NIAID Data Ecosystem2026-03-13 收录

下载链接：

https://zenodo.org/record/6994691

下载链接

链接失效反馈

官方服务：

资源简介：

This is the replication package associated with the paper: Flakify: A Black-Box, Language Model-based Predictor for Flaky Tests. We explain how to use it to reproduce the results reported in the paper. A maintainable version of this replication package is available on GitHub (https://github.com/uOttawa-Nanda-Lab/Flakify). Flakify Test Smell Detector This is a step-by-step guideline to detect test smells in the source code of test cases and retain statements that match them. Requirements: Eclipse IDE (the version we used was 2021-12) The libraries (the .jar files in the lib\ directory) Input Files: This is a list of input files that are required to accomplish this step: dataset/FlakeFlagger/FlakeFlagger_filtered_dataset.csv dataset/FlakeFlagger/FlakeFlagger_class_files/ dataset/IDoFT/IDoFT_filtered_dataset.csv dataset/IDoFT/IDoFT_class_files/ The dataset/FlakeFlagger/FlakeFlagger_filtered_dataset.csv and dataset/IDoFT/IDoFT_filtered_dataset.csv are used to obtain the label (flaky=1 or non-flaky=0) and project name for each test case parsed from dataset/FlakeFlagger/FlakeFlagger_class_files/ and dataset/IDoFT/IDoFT_class_files/, respectively. Output Files: dataset/FlakeFlagger/FlakeFlagger_dataset.csv dataset/FlakeFlagger/FlakeFlagger_test_cases_full_code/ dataset/FlakeFlagger/FlakeFlagger_test_cases_preprocessed_code/ dataset/IDoFT/IDoFT_dataset.csv dataset/IDoFT/IDoFT_test_cases_full_code/ dataset/IDoFT/IDoFT_test_cases_preprocessed_code/ Replicating the experiment To detect test smells and retain only code statements related to them, the src/FlakifySmellsDetector.java file should be compiled and run using the Eclipse IDE by having all the .jar files in the classpath. The pre-generated executable Jar file src/FlakifySmellsDetector.jar can be executed using the shell script src/FlakifySmellsDetector.sh after changing paths for each dataset as needed, using the following commands: bash FlakifySmellsDetector.sh FlakeFlagger bash FlakifySmellsDetector.sh IDoFT It will generate the dataset required to run Flakify's flaky test prediction model for the datasets given as input. The class file containing each of the test cases is then parsed to produce the corresponding full code and pre-processed code of the test case. The full and pre-processed source code of all test cases are also combined and saved in a CSV file, along with test smells found, project names, and labels. Flakify Replication This is the guideline for replicating the experiments we used to evaluate Flakify for classifying test cases as flaky and non-flaky using both cross-validation and per-project validation. Requirements: This is a list of all required python packages: python =3.8.5 imbalanced_learn= 0.8.1 numpy= 1.19.5 pandas= 1.3.3 transformer= 4.10.2 torch=1.5.0 scikit_learn= 0.22.1 Input Files: This is a list of input files that are required to accomplish this step: dataset/FlakeFlagger/Flakify_FlakeFlagger_dataset.csv dataset/IDoFT/Flakify_IDoFT_dataset.csv This file contains the full code and pre-processed code of the test cases in both FlakeFlagger and IDOFT datasets, along with their ground truth labels (flaky and non-flaky). Output File: results/Flakify_cross_validation_results_on_FlakeFlagger_dataset.csv results/Flakify_per_project_results_on_FlakeFlagger_dataset.csv results/Flakify_model_trained_on_FlakeFlagger_dataset.pt results/Flakify_cross_validation_results_on_IDoFT_dataset.csv results/Flakify_per_project_results_on_IDoFT_dataset.csv results/Flakify_model_trained_on_IDoFT_dataset.pt Replicating Flakify experiments Cross-Validation To run the Flakify experiment using cross-validation on the two datasets, navigate to src\ folder and run the following commands: bash Flakify_predictor_cross_validation.sh FlakeFlagger bash Flakify_predictor_cross_validation.sh IDoFT This will generate the classification results into results/Flakify_cross_validation_results_on_FlakeFlagger_dataset.csv and results/Flakify_cross_validation_results_on_IDoFT_dataset.csv for the cross-validation experiments on both datasets. It will also save the weights of the two models trained on the FlakeFlagger and IDoFT datasets into results/Flakify_model_trained_on_FlakeFlagger_dataset.pt and results/Flakify_model_trained_on_IDoFT_dataset.pt, respectively. Per-project Validation To run the Flakify experiment using per-project validation on the two datasets, navigate to src\ folder and run the following commands: bash Flakify_predictor_per_project.sh FlakeFlagger bash Flakify_predictor_per_project.sh IDoFT This will generate the classification results into results/Flakify_per_project_results_on_FlakeFlagger_dataset.csv and results/Flakify_per_project_results_on_IDoFT_dataset.csv for the whole per-project validation experiments on both datasets. FlakeFlagger Replication This is the guideline for replicating the experiments we used to evaluate the two versions of FlakeFlagger, white-box and black-box, for classifying test cases as flaky and non-flaky using cross-validation on the FlakeFlagger dataset. Requirements: This is a list of all required python packages: python =3.8.5 imbalanced_learn= 0.8.1 pandas= 1.3.3 scikit_learn= 0.22.1 Input File: This is a list of input files that are required to accomplish this step: dataset/FlakeFlagger/FlakeFlagger_filtered_dataset.csv dataset/FlakeFlagger/FlakeFlaggerFeaturesTypes.csv dataset/FlakeFlagger/Information_gain_per_feature.csv Output File: results/FlakeFlagger_black-box_results.csv results/FlakeFlagger_white-box_results.csv Replicating FlakeFlagger experiments To run the FlakeFlagger experiments, navigate to src\ folder and run the following command: bash FlakeFlagger_predictor.sh white-box bash FlakeFlagger_predictor.sh black-box This will generate the classification results into results/FlakeFlagger_white-box_results.csv and results/FlakeFlagger_black-box_results.csv for both white-box and black-box experiments, respectively.

创建时间：

2022-08-16

5,000+

优质数据集

54 个

任务类型

进入经典数据集