Reproduction Package for Bachelor's Thesis 'Evaluation of JVM Garbage Collectors for CPAchecker'

Indexed by NIAID Data Ecosystem on 2026-05-02

Download link:

https://zenodo.org/record/13468616

Resource description:

Reproduction Package: Evaluation of JVM Garbage Collectors for CPAchecker

This is a reproduction artifact for the bachelor's thesis "Evaluation of JVM Garbage Collectors for CPAchecker", which allows our experimental evaluation to be reproduced. It includes the modified source code of CPAchecker used in our experiments, all benchmark definitions for BenchExec, tables with the complete results, and the raw measurement data. Additionally, we provide the scripts and programs used to process these results.

The set of verification tasks of SV-COMP24 is not included in this package. It can be downloaded from Zenodo under 10.5281/zenodo.10669722.

Contents:
benchmark/: the selected subset of SV-COMP24 verification tasks, categorized by property.
benchmark-definitions/: all benchmark definitions for BenchExec.
cpachecker/: the source code of CPAchecker, as modified for our experiments.
results/: all raw measurement data and tables with the complete results, organized according to the sections of the thesis.
scripts/: the programs and scripts used to process the data.

The garbage collection logs are provided in a separate file due to technical limitations, as their file names were too long.

Preparing the Evaluation:
Ensure that you use the version of CPAchecker included in this artifact, as we have removed the default 15-minute CPU time limit from its property settings. Additionally, we have slightly modified the cpa.sh script to add the flag "jvm-arguments", which allows multiple JVM flags to be passed to BenchExec as a single string. Please ensure that BenchExec is set up correctly; detailed setup instructions are available in the BenchExec project repository. Reproducing all experiments requires 8 CPU units and 15 GB of RAM.

Performing the Evaluation:
The benchmark definitions, provided as .xml files, can be executed with BenchExec using the benchmark.py script.
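Running every definition one after another can be scripted. The sketch below only builds the command lines; it assumes the artifact is extracted so that CPAchecker's benchmark.py wrapper sits at cpachecker/scripts/benchmark.py and the definitions are under benchmark-definitions/. Both paths, the helper name build_commands, and the working directory are assumptions, not part of the package description, and any extra BenchExec options you need must be added by hand.

```python
from pathlib import Path

# Assumed locations inside the extracted artifact; adjust as needed.
# CPAchecker ships a benchmark.py wrapper around BenchExec in scripts/.
BENCHMARK_SCRIPT = Path("cpachecker/scripts/benchmark.py")
DEFINITIONS_DIR = Path("benchmark-definitions")


def build_commands(definition_files, script=BENCHMARK_SCRIPT):
    """Return one argument list per benchmark definition, in a stable order."""
    return [[str(script), str(xml)] for xml in sorted(definition_files)]


if __name__ == "__main__":
    if DEFINITIONS_DIR.is_dir():
        for cmd in build_commands(DEFINITIONS_DIR.rglob("*.xml")):
            # Print only; pass cmd to subprocess.run(cmd, check=True) to execute.
            print(" ".join(cmd))
```

Printing instead of executing keeps the sketch side-effect free, so the command list can be reviewed before committing 8 CPU units and 15 GB of RAM to a long run.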
Processing the Results:
TaskFilter.jar, ResultFilter.jar, and ResultFilterShort.jar can be executed from the command line. Each program takes the path to a .csv file as input, passed directly as a flag. For instance, ResultFilter.jar can be executed with the following command:

java -jar ResultFilter.jar -"<../../ParallelGCTimeRatio19.table.csv>"

TaskFilter.jar determines the subset of the SV-COMP24 verification tasks by excluding tasks that exceeded 1800 seconds of CPU time and tasks whose garbage collection time accounted for less than 2 percent of the wall time. It specifically requires AllVerificationTasks.table.csv as input, because this file contains 8 columns, the eighth of which holds the required gctime information.

ResultFilter.jar and ResultFilterShort.jar process the CSV files of individual benchmark runs to determine the number of timeouts and "out of memory" errors. They calculate the average CPU time over all tasks, excluding those that ended in an "out of memory" error and counting each timeout as 900 seconds of CPU time. ResultFilter.jar applies to CSV files with 9 columns, ResultFilterShort.jar to CSV files with 7 columns.

age_distribution.py and gc_events.py can also be executed from the command line. Each program takes GC logs as input, specifically the path to a .files directory. For instance, age_distribution.py can be executed with the following command:

python age_distribution.py <../../AgeLoggingSubset900sAllLoggingsG1GC.files>

age_distribution.py generates a plot showing the age distribution of bytes. gc_events.py counts the total number of GC events and, separately, the number of young collections, full collections, and concurrent marking cycles.
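The filtering and averaging rules of TaskFilter.jar and ResultFilter.jar, as stated above, can be sketched in Python. This is not the jars' code: the Run record, its field names, and the status strings "TIMEOUT" and "OUT OF MEMORY" (as BenchExec reports them) are assumptions, and parsing the 7-, 8-, or 9-column CSV layouts is left out.

```python
from dataclasses import dataclass

MAX_CPU_SECONDS = 1800.0      # TaskFilter: tasks above this CPU time are dropped
MIN_GC_SHARE = 0.02           # TaskFilter: GC time must be >= 2 % of wall time
TIMEOUT_CPU_SECONDS = 900.0   # ResultFilter: CPU time assumed for each timeout


@dataclass
class Run:
    """One row of a results table (hypothetical field names)."""
    task: str
    status: str
    cputime: float
    walltime: float
    gctime: float = 0.0


def filter_tasks(runs):
    """Keep tasks within the CPU time limit whose GC share of wall time is >= 2 %."""
    return [
        r.task
        for r in runs
        if r.cputime <= MAX_CPU_SECONDS
        and r.walltime > 0
        and r.gctime / r.walltime >= MIN_GC_SHARE
    ]


def summarize(runs):
    """Return (timeouts, OOM errors, mean CPU time without OOM runs, timeouts as 900 s)."""
    timeouts = sum(1 for r in runs if r.status == "TIMEOUT")
    ooms = sum(1 for r in runs if r.status == "OUT OF MEMORY")
    cpu_times = [
        TIMEOUT_CPU_SECONDS if r.status == "TIMEOUT" else r.cputime
        for r in runs
        if r.status != "OUT OF MEMORY"
    ]
    average = sum(cpu_times) / len(cpu_times) if cpu_times else float("nan")
    return timeouts, ooms, average
```

Replacing each timeout's measured CPU time with a fixed 900 seconds means runs that overshot the limit all contribute the same penalty, while out-of-memory runs are excluded from the mean entirely.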
For G1GC, concurrent undo cycles are counted as well, to verify that the counts of the individual collection types add up to the overall number of GC events.

To determine the average values over the commonly solved subset, review the union tables for each section. For all configurations, filter the table so that only correctly solved tasks are displayed; the table then reports the average value of each measure.

categorial_regression_cputime.py, categorial_regression_walltime.py, and categorial_regression_memory.py can be executed without any input path. These scripts compute the results of the categorical linear regression.
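The bookkeeping described for gc_events.py, including the consistency check with G1GC's concurrent undo cycles, can be sketched as follows. This is not the script itself: it assumes HotSpot unified logging lines that tag each collection as "GC(n)" and use the messages "Pause Young", "Pause Full", "Concurrent Mark Cycle", and "Concurrent Undo Cycle"; the helper names are hypothetical, and mixed G1 pauses are counted as young here.

```python
import re
from collections import Counter

# Unified JVM logging tags each collection with an id such as "GC(12)".
GC_ID = re.compile(r"GC\((\d+)\)")

# Assumed message markers, checked in this order for each collection id.
CATEGORIES = [
    ("young", "Pause Young"),
    ("full", "Pause Full"),
    ("concurrent_mark", "Concurrent Mark Cycle"),
    ("concurrent_undo", "Concurrent Undo Cycle"),
]


def count_gc_events(lines):
    """Return (number of distinct GC ids, Counter of classified collection types)."""
    all_ids = set()
    kinds = {}
    for line in lines:
        match = GC_ID.search(line)
        if match is None:
            continue
        gc_id = match.group(1)
        all_ids.add(gc_id)
        if gc_id in kinds:
            continue  # already classified by an earlier line
        for kind, marker in CATEGORIES:
            if marker in line:
                kinds[gc_id] = kind
                break
    return len(all_ids), Counter(kinds.values())


def adds_up(total, counts):
    """Consistency check: the individual collection types must sum to the total."""
    return sum(counts.values()) == total
```

If a collector emits an event type outside the listed markers, its id stays unclassified and adds_up returns False, which is exactly the mismatch the undo-cycle counting is meant to rule out for G1GC.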

Created:

2024-10-27