
Dataset used to reproduce graphs in the SC21 submission "Understanding why machine learning models of I/O fail: A taxonomy of I/O throughput modelling errors"

Source: NIAID Data Ecosystem (indexed 2026-03-12)
Download link:
https://zenodo.org/record/4902919
Description:
Three datasets needed to reproduce the figures in our SC21 submission titled "Understanding why machine learning models of I/O fail: A taxonomy of I/O throughput modelling errors". The darshan_theta_2017_2020.csv file is constructed from Darshan logs: every row represents an HPC job run on ALCF Theta, and each column is a different feature of the job. This data is post-processed to simplify reproduction of the paper. It is also anonymized; the apps_short column holds the anonymized name of the application. The cobalt_theta_2017_2020.csv file contains Cobalt scheduler logs, where UIDs of allocations correspond to Darshan job UIDs. This data is also public and is not preprocessed, only aggregated over 4 years. The gauge_data.csv file contains data from a single cluster of HPC jobs, collected using the Gauge tool (https://gauge.ascslab-tools.org). This data is a strict subset of the Darshan CSV listed above.
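Because allocation UIDs in the Cobalt logs correspond to Darshan job UIDs, the two CSV files can be joined on that identifier. The following is a minimal sketch of such a join using only the Python standard library. The column names job_uid and alloc_uid and the demo rows are hypothetical placeholders, not taken from the dataset; check the real CSV headers before use. Only apps_short is a column name mentioned in the description.

```python
import csv
import io


def read_rows(handle):
    """Read a CSV file object into a list of dicts, one dict per row."""
    return list(csv.DictReader(handle))


def join_on_uid(darshan_rows, cobalt_rows, darshan_key, cobalt_key):
    """Attach the matching scheduler record to each Darshan job row.

    Jobs with no matching scheduler record are skipped (inner join).
    """
    cobalt_by_uid = {row[cobalt_key]: row for row in cobalt_rows}
    joined = []
    for job in darshan_rows:
        sched = cobalt_by_uid.get(job[darshan_key])
        if sched is not None:
            merged = dict(job)
            # Prefix scheduler columns so they cannot clash with job columns.
            merged.update({f"cobalt_{k}": v for k, v in sched.items()})
            joined.append(merged)
    return joined


# Made-up inputs that only illustrate the shape of the join.
# "job_uid" and "alloc_uid" are hypothetical column names.
darshan_demo = io.StringIO("job_uid,apps_short\n1,app_a\n2,app_b\n")
cobalt_demo = io.StringIO("alloc_uid,nodes\n2,128\n3,64\n")

joined = join_on_uid(
    read_rows(darshan_demo), read_rows(cobalt_demo), "job_uid", "alloc_uid"
)
print(joined)
```

For the real files, the io.StringIO objects would be replaced by file handles, for example open("darshan_theta_2017_2020.csv", newline=""), with the key arguments set to whatever the UID columns are actually called.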
Created:
2021-06-05