Artefact to our paper "An Empirical Study of Automated Unit Test Generation for Python"
Download link:
https://zenodo.org/record/6838657
Description:
Artefact for “An Empirical Study of Automated Unit Test Generation for Python”
Together with our paper “An Empirical Study of Automated Unit Test Generation for Python”, we provide this artefact for future use.
Pynguin Version
We used Pynguin 0.25.2 for our experiments. The releases of Pynguin are archived by Zenodo, too. Pynguin 0.25.2 is available under DOI 10.5281/zenodo.6836225.
Preparation of the Environment
We use the poetry dependency-management tool to manage all dependencies for this artefact. Install it if you have not done so yet. Then let poetry create a virtual environment for the experiment by executing poetry install.
Execution of the Experiment
The execution scripts make several assumptions that are based on our infrastructure. We maintain a SLURM cluster infrastructure that defines different constraints for different machines.
Furthermore, we assume some paths to be present: we assume every computing machine to have writable mount points at /local/${USER} and /local/hdd/${USER}. On our machines, both are mount points on the local hard disk/SSD of the computing machines. Additionally, we have a shared mount /scratch/${USER}, which is mounted via NFS from a central file server. This mount point is also mounted on all computing machines.
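Before submitting jobs, it may help to confirm that these paths exist on a machine. The following is a minimal sketch and not part of the artefact: the helper names expected_mounts and missing_or_readonly are hypothetical, and it assumes that the Python user name returned by getpass.getuser() matches the ${USER} shell variable.

```python
import getpass
import os
from pathlib import Path


def expected_mounts(user: str) -> list[Path]:
    """Return the three per-user paths that the execution scripts assume."""
    return [
        Path("/local") / user,
        Path("/local/hdd") / user,
        Path("/scratch") / user,
    ]


def missing_or_readonly(paths: list[Path]) -> list[Path]:
    """Return every path that is not an existing, writable directory."""
    return [p for p in paths if not (p.is_dir() and os.access(p, os.W_OK))]


if __name__ == "__main__":
    # Assumption: getpass.getuser() yields the same name as ${USER}.
    problems = missing_or_readonly(expected_mounts(getpass.getuser()))
    for path in problems:
        print(f"not a writable directory: {path}")
    if not problems:
        print("all expected mount points are writable")
```

The check only inspects the machine it runs on, so on a cluster it would have to be run on each computing machine, for example as a short SLURM job.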
We assume the created and packaged Docker image to be located at /scratch/lukasczy/pynguin.tar. You can change this path by editing the XML files. These XML files contain the basic definitions of the jobs: they specify the SLURM constraint, the version of the Pynguin Docker container, the used Pynguin configurations as well as the modules used for the experiments. These modules have to reside under projects, as they come with this artefact.
The Python script execution.py generates the actual run scripts from the XML file. It generates all scripts necessary to run a SLURM array job consisting of all runs for the experiment. Further general settings for the SLURM array job are present in this file.
The Bash script run_experiment.sh executes the full execution pipeline; one has to set the variable EXPERIMENT_NAME to match the name of the XML file whose experiment definition shall be executed.
Important: Executing the full experiment can take several days, depending on your computing infrastructure! We therefore provide the raw result CSVs for further inspection.
Data Analysis
All raw data resides in the data folder:
loc_data.csv contains all information about the lines of code in each module. This file was created using the extract_locs_and_types.py script in the root folder. Please note that executing this script requires the cloc utility to be available on your system's PATH.
types.csv and types_per_module.csv contain type information extracted from the modules at different levels of granularity. They are also generated using the aforementioned script.
results-assertion.csv.xz contains the raw results from the experiment for RQ3 that evaluates the effectiveness of the assertions.
results.csv.xz contains the raw results from the experiment for RQ1 and RQ2.
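The compressed result files can be inspected without unpacking them first. The sketch below uses only the Python standard library; the function name read_header_and_count is hypothetical, and it assumes that the files are comma-separated, UTF-8 encoded, and start with a header row. It makes no assumption about the column names.

```python
import csv
import lzma
from pathlib import Path


def read_header_and_count(path: Path) -> tuple[list[str], int]:
    """Return the column names and the number of data rows of an .xz CSV."""
    # Assumption: comma-separated, UTF-8, first row is the header.
    with lzma.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])
        row_count = sum(1 for _ in reader)
    return header, row_count


if __name__ == "__main__":
    for name in ("results.csv.xz", "results-assertion.csv.xz"):
        path = Path("data") / name
        if path.exists():
            header, rows = read_header_and_count(path)
            print(f"{name}: {rows} rows, {len(header)} columns")
        else:
            print(f"{name}: not found under data/")
```

Streaming the rows this way avoids holding the full file in memory, which might matter for the larger result file.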
We provide the Jupyter Notebook that generated the plots, tables, and various LaTeX macros in the notebooks folder. Please note that if you want to re-execute this notebook, you might have to change the PAPER_EXPORT_PATH constant in cell [2] to a suitable location on your machine. Executing this notebook requires a TeX installation on your system because the plots are generated using pdflatex and matplotlib's pgf backend.
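The artefact relies on three external command-line tools: poetry for the environment, cloc for extract_locs_and_types.py, and pdflatex for the notebook. A small check such as the following can reveal missing tools up front. It is a sketch only; the function name missing_tools is hypothetical, and it tests merely whether an executable with that name is found on the PATH, not whether its version is suitable.

```python
import shutil


def missing_tools(tools: tuple[str, ...]) -> list[str]:
    """Return the tool names for which no executable is found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # Tool names taken from this README; versions are not checked.
    needed = ("poetry", "cloc", "pdflatex")
    absent = missing_tools(needed)
    if absent:
        print("missing on PATH: " + ", ".join(absent))
    else:
        print("all required tools were found")
```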
Further Data
The folder projects contains all the projects in the versions stated in our paper. The folder run-logs contains all the run logs from our experiment executions.
Created:
2022-07-15



