F-DATA: A Fugaku Workload Dataset for Job-centric Predictive Modelling in HPC Systems
收藏NIAID Data Ecosystem2026-05-02 收录
下载链接:
https://zenodo.org/record/11467482
下载链接
链接失效反馈官方服务:
资源简介:
F-DATA is a novel workload dataset containing the data of around 24 million jobs executed on Supercomputer Fugaku, over the three years of public system usage (March 2021-April 2024). Each job data contains an extensive set of features, such as exit code, duration, power consumption and performance metrics (e.g. #flops, memory bandwidth, operational intensity and memory/compute bound label), which allows for a multitude of job characteristics prediction. The full list of features can be found in the file feature_list.csv.
The sensitive data appears both in anonymized and encoded versions. The encoding is based on a Natural Language Processing model and retains sensitive but useful job information for prediction purposes, without violating data privacy. The scripts used to generate the dataset are available in the F-DATA GitHub repository, along with a series of plots and instruction on how to load the data.
F-DATA is composed of 38 files, with each YY_MM.parquet file containing the data of the jobs submitted in the month MM of the year YY.
The files of F-DATA are saved as .parquet files. It is possible to load such files as dataframes by leveraging the pandas APIs, after installing pyarrow (pip install pyarrow). A single file can be read with the following Python instrcutions:
# Importing pandas library
import pandas as pd
# Read the 21_01.parquet file in a dataframe format
df = pd.read_parquet("21_01.parquet")
df.head()
创建时间:
2024-06-10



