Computational Artifacts for Performance Feedback Autoscaling Experiments with Workloads of Workflows in Apache Airflow

NIAID Data Ecosystem, indexed 2026-05-02

Download link:

https://zenodo.org/record/2635572

Resource description:

These computational artifacts accompany the software artifacts (DOI: 10.5281/zenodo.2635571). They consist of the following files.

`experiments.pdf` lists all experiments conducted with the Airflow system. Experiment IDs are not sequential because some experiments had to be rerun, among other reasons; only successful results are reported. The file lists the configuration of each experiment, e.g., the number of processed workflows, the name of the workload used, the user budgets, and the PFA settings.

`db.tar.gz` contains directories with Airflow database snapshots and autoscaler logs. The directory names correspond to the experiments listed in `experiments.pdf`. Each experiment directory contains the autoscaler log and a full copy of the PostgreSQL database directory, taken just after the experiment finished. The database name is `airflow` and the user name is `ailyushk`. Within each database, most of the paper-related data are stored in the `stat_log` table. The scripts for extracting data from these databases are available in the software artifacts under `tools/analysis`.

`gurobi.tar.gz` contains the results obtained from the Gurobi solver when solving the MIP model.

`pdf.tar.gz` contains all figures in PDF format, including those that appear in neither the paper nor the technical report. The scripts for creating these plots are delivered as software artifacts.

`csv.tar.gz` contains the analysis results extracted from the Airflow database snapshots. These files are used to create the plots in the `pdf` directory; the scripts for doing so are delivered as software artifacts.

`wl1.tar.gz` is the first synthetic realistic workload (WL I), with three subsets of 200 workflows each (`1_0`, `1_1`, `1_2`). Each subset directory contains the file with the interarrival times, `interarrivals.txt`, and the file with the IDs of the workflows in the subset, `workload.txt`. The `dags` directory contains the Python-based Airflow descriptors, together with CSV files that summarise the same descriptors for simpler analysis. The scripts for extracting workload statistics from these CSV files are available in the software artifacts under `tools/analysis`. The `inputs` directory contains the initial input files for each workflow. The `dax` directory contains the original DAX files obtained from the generator: https://github.com/pegasus-isi/WorkflowGenerator/tree/master/bharathi/src/simulation/generator

`wl2.tar.gz` is the second synthetic realistic workload (WL II), with three subsets of 200 workflows each (`4_0`, `4_1`, `4_2`). Its structure is similar to that of `wl1.tar.gz`, except that the `dax` directory is omitted because WL II uses the same DAX structures as WL I.

`wl3.tar.gz` is a small synthetic workload based on WL I, used for the experiment with the MIP solver. It contains three subsets of 5 workflows each, all in the `3_0` directory, so its structure differs from that of WL I and WL II. The input data files are empty. The IDs of the workflows forming each subset are stored in the files `workload_1.txt`, `workload_2.txt`, and `workload_3.txt`.
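The database snapshots in `db.tar.gz` are raw PostgreSQL data directories, so reading them requires starting a server on a copy of one. The Python sketch below builds the command lines for that. It starts the server, runs a sanity query against `stat_log` and stops the server again. It rests on several assumptions that the description does not confirm:

* A PostgreSQL release with the same major version as the snapshot is on `PATH`.
* The copied data directory is owned by the invoking OS user with restrictive permissions, which PostgreSQL requires.
* The snapshot's authentication settings allow a local connection as `ailyushk`.

The helper name `snapshot_commands`, the port, the `restore.log` location and the data-directory path are hypothetical. Only the database name, the user name and the `stat_log` table come from the description.

```python
from pathlib import Path


def snapshot_commands(data_dir: str, port: int = 5433) -> list[list[str]]:
    """Return argv lists that start a server on one extracted snapshot,
    run a sanity query against `stat_log`, and stop the server.

    Hypothetical helper: `data_dir`, `port`, and the log file location are
    illustrative assumptions, not names taken from the artifacts.
    """
    data = Path(data_dir)
    log_file = data.parent / "restore.log"  # keep the server log outside the data dir
    return [
        # Start PostgreSQL on the copied data directory, on a non-default port.
        ["pg_ctl", "start", "-D", str(data), "-l", str(log_file),
         "-o", f"-p {port}"],
        # `airflow` and `ailyushk` are the database and user named in the description.
        ["psql", "-h", "localhost", "-p", str(port), "-U", "ailyushk",
         "-d", "airflow", "-c", "SELECT count(*) FROM stat_log;"],
        # Shut the server down again once the data has been extracted.
        ["pg_ctl", "stop", "-D", str(data)],
    ]
```

Each list can be passed in order to `subprocess.run(cmd, check=True)`. Returning argument lists rather than shell strings avoids quoting problems. It also keeps the helper testable without a PostgreSQL installation.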
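Each WL I and WL II subset pairs a list of workflow IDs (`workload.txt`) with interarrival times (`interarrivals.txt`). The Python sketch below turns one subset into a submission schedule by taking a running sum of the interarrival times. The description does not specify the file format, so the following are assumptions:

* Both files hold one whitespace-separated entry per workflow, in matching order.
* Each interarrival time is measured from the previous submission.

The offsets keep whatever time unit the file uses. The helper names `build_schedule` and `load_schedule` are hypothetical.

```python
from itertools import accumulate
from pathlib import Path


def build_schedule(interarrivals_text: str, workload_text: str) -> list[tuple[str, float]]:
    """Pair each workflow ID with its submission offset.

    Assumes one entry per workflow in both inputs, in the same order, with each
    interarrival measured from the previous submission. Offsets are in the same
    (unstated) time unit as `interarrivals.txt`.
    """
    gaps = [float(tok) for tok in interarrivals_text.split()]
    ids = workload_text.split()
    if len(gaps) != len(ids):
        raise ValueError(f"{len(ids)} workflow IDs but {len(gaps)} interarrival times")
    # A running sum turns the gaps into offsets from the start of the run.
    return list(zip(ids, accumulate(gaps)))


def load_schedule(subset_dir: str) -> list[tuple[str, float]]:
    """Read one extracted subset directory such as `1_0` (path is an assumption)."""
    d = Path(subset_dir)
    return build_schedule(
        (d / "interarrivals.txt").read_text(),
        (d / "workload.txt").read_text(),
    )
```

Parsing is separated from file access, so the pairing logic can be checked without the extracted archive. If the real files turn out to hold one fewer interarrival than workflow IDs, the length check fails loudly rather than silently misaligning the schedule.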

Created:

2024-07-24