Dataset used to reproduce graphs in the SC21 submission "Understanding why machine learning models of I/O fail: A taxonomy of I/O throughput modelling errors"

NIAID Data Ecosystem, indexed 2026-03-12

Download link:

https://zenodo.org/record/4902919

Description:

Three datasets needed to reproduce the figures in our SC21 submission titled "Understanding why machine learning models of I/O fail: A taxonomy of I/O throughput modelling errors". The darshan_theta_2017_2020.csv file is constructed from Darshan logs: every row represents an HPC job run on ALCF Theta, and each column is a different feature of the job. This data is post-processed to simplify reproduction of the paper. It is also anonymized; the apps_short column holds the anonymized name of the application. The cobalt_theta_2017_2020.csv file contains Cobalt scheduler logs, where the UIDs of allocations correspond to Darshan job UIDs. This data is also public and is not preprocessed, only aggregated over 4 years. The gauge_data.csv file contains data from a single cluster of HPC jobs, collected using the Gauge tool (https://gauge.ascslab-tools.org). This data is a strict subset of the Darshan CSV listed above.
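As a quick sanity check after downloading, the one-row-per-job layout described above can be explored with a few lines of Python. This is a minimal sketch, not part of the dataset or the paper's code: the helper name count_jobs_per_app is hypothetical, and the only column it relies on is apps_short, which the description names. It uses only the standard library and assumes the CSV has a header row.

```python
import csv
from collections import Counter


def count_jobs_per_app(csv_path, app_column="apps_short"):
    """Count rows (one row per HPC job) for each anonymized application name.

    Hypothetical helper for illustration; assumes the CSV has a header row
    that includes `app_column`.
    """
    counts = Counter()
    with open(csv_path, newline="") as handle:
        reader = csv.DictReader(handle)
        # Fail early with a clear message if the expected column is absent.
        if reader.fieldnames is None or app_column not in reader.fieldnames:
            raise KeyError(f"column {app_column!r} not found in {csv_path}")
        for row in reader:
            counts[row[app_column]] += 1
    return counts
```

Calling count_jobs_per_app("darshan_theta_2017_2020.csv").most_common(10) would then list the ten applications with the most jobs, assuming the file sits in the working directory.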

Created:

2021-06-05