Dataset (2025) for article "Resource Optimization with MPI Process Malleability for Dynamic Workloads in HPC Clusters"
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/14812021
Resource description:
This dataset was generated and used in the publication "Resource Optimization with MPI Process Malleability for Dynamic Workloads in HPC Clusters." The dataset is organized into three stages: raw data, preprocessed data, and processed data.
Each workload execution includes log files, application code, executables, and launching scripts. Execution times are extracted from log files, specifically from "slurm-dmr_*.out" and "slurm-dmr_*.info", where "*" represents a number corresponding to a specific job execution.
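As an illustration of how these log files could be located, here is a minimal Python sketch. It assumes only the file-name pattern stated above ("slurm-dmr_<number>.out" and "slurm-dmr_<number>.info"); the helper names parse_log_name and find_job_logs are hypothetical and are not part of the dataset's own scripts. The internal format of the logs is not described here, so the sketch does not attempt to extract execution times.

```python
import re
from pathlib import Path

# Assumption: log files are named exactly "slurm-dmr_<number>.out" or
# "slurm-dmr_<number>.info", as stated in the description above.
LOG_NAME = re.compile(r"^slurm-dmr_(\d+)\.(out|info)$")


def parse_log_name(filename):
    """Return (job_number, suffix) for a matching log name, else None."""
    match = LOG_NAME.match(Path(filename).name)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


def find_job_logs(root):
    """Group log files under root by (directory, job number).

    The result maps (parent_dir, job_number) to a dict such as
    {"out": Path(...), "info": Path(...)}. The directory is part of the
    key so that equal job numbers in different executions do not collide.
    """
    jobs = {}
    for path in Path(root).rglob("slurm-dmr_*"):
        parsed = parse_log_name(path.name)
        if parsed is None or not path.is_file():
            continue
        job_number, suffix = parsed
        jobs.setdefault((path.parent, job_number), {})[suffix] = path
    return jobs
```

Calling find_job_logs("raw_data") on an extracted copy of the dataset would then return one entry per job execution, each holding the paths of its .out and .info files where both exist.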
Dataset Structure:
1. raw_data
Contains the output files from executing workloads on the MarenostrumV HPC cluster. This section is divided into three subsections, each corresponding to a different workload type:
Static_Workload: Data for the static workload, which does not use malleability.
Sync_Workload: Data for the synchronous dynamic workload, including results for both baseline and merge configurations (5 executions each).
Async_Workload: Data for the asynchronous dynamic workload, including results for both baseline and merge configurations (5 executions each).
2. preprocessed_data
This section contains the collected raw data in .pkl files, following the same structure as the raw_data folder. For each workload execution, four .pkl files are generated. In the file names below, "name" is one of baseline, merge, or static, and X is the workload execution number:
If X = J, the file contains a compilation of all workloads with the same configuration.
If A appears before X, it refers to an asynchronous execution.
The four types of .pkl files are:
nameX_data.pkl: Contains application runtime data. A description is available in nameX_data_description.txt.
nameX_data_resize.pkl: Contains application resize data. A description is available in nameX_data_resize_description.txt.
nameAX_iter_data.pkl: Contains iteration time data for asynchronous (A) workloads. A description is available in nameAX_iter_data_description.txt.
nameX_workload.pkl: Contains Slurm workload metrics. A description is available in nameX_workload_description.txt.
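The naming scheme above can be sketched in Python as follows. This is an illustration under stated assumptions, not the dataset creators' code: it assumes the file names follow the listed patterns literally (lowercase name, an optional "A" for asynchronous runs, X as a number or "J"), and the helper names parse_pkl_name and load_pkl are hypothetical. The contents of the .pkl files are documented only in the accompanying *_description.txt files, so the loader simply returns whatever object was pickled.

```python
import pickle
import re
from pathlib import Path

# Assumption: names follow the nameX_* patterns listed above literally, with
# name in {baseline, merge, static}, an optional "A" marking asynchronous
# runs, and X either an execution number or "J" (compilation of all runs).
PKL_NAME = re.compile(
    r"^(?P<name>baseline|merge|static)"
    r"(?P<is_async>A?)"
    r"(?P<run>\d+|J)"
    r"_(?P<kind>data_resize|iter_data|data|workload)\.pkl$"
)


def parse_pkl_name(filename):
    """Split a preprocessed file name into its parts, or return None."""
    match = PKL_NAME.match(Path(filename).name)
    if match is None:
        return None
    return {
        "name": match.group("name"),
        "asynchronous": match.group("is_async") == "A",
        "run": match.group("run"),  # "J" marks the compilation of all runs
        "kind": match.group("kind"),
    }


def load_pkl(path):
    """Unpickle one file. Only unpickle files from a source you trust."""
    with open(path, "rb") as handle:
        return pickle.load(handle)
```

For example, parse_pkl_name("mergeA3_iter_data.pkl") reports the merge configuration, an asynchronous run, execution 3, and the iter_data kind. If the pickled objects depend on third-party libraries, those libraries need to be installed before load_pkl can restore them.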
3. processed_data
Includes the analyzed results from the preprocessed_data folder. This section contains .xlsx files and images used in the Experimental Setup section of the paper. The Excel files are categorized as follows:
Exec_dataX.xlsx: Contains application execution results.
Mall_dataX.xlsx: Contains resize time results for dynamic executions.
4. Codes
This folder contains the scripts used to convert raw_data into preprocessed_data, along with a Jupyter Notebook used for data analysis and visualization. For help understanding or using this code, please contact the dataset creators.
Created:
2025-02-06



