F-DATA: A Fugaku Workload Dataset for Job-centric Predictive Modelling in HPC Systems

Indexed from NIAID Data Ecosystem on 2026-05-02

Download link:

https://zenodo.org/record/11467482


Resource description:

F-DATA is a novel workload dataset containing data on around 24 million jobs executed on Supercomputer Fugaku over its three years of public system usage (March 2021-April 2024). Each job record contains an extensive set of features, such as exit code, duration, power consumption and performance metrics (e.g. #flops, memory bandwidth, operational intensity and a memory/compute-bound label), which supports the prediction of many different job characteristics. The full list of features can be found in the file feature_list.csv.

Sensitive data appears in both anonymized and encoded versions. The encoding is based on a Natural Language Processing model and retains sensitive but useful job information for prediction purposes, without violating data privacy. The scripts used to generate the dataset are available in the F-DATA GitHub repository, along with a series of plots and instructions on how to load the data.

F-DATA is composed of 38 files, all saved in .parquet format. Each YY_MM.parquet file contains the data of the jobs submitted in month MM of year YY. The files can be loaded as dataframes through the pandas APIs, after installing pyarrow (pip install pyarrow). A single file can be read with the following Python instructions:

    # Importing pandas library
    import pandas as pd

    # Read the 21_01.parquet file in a dataframe format
    df = pd.read_parquet("21_01.parquet")
    df.head()
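To work with more than one month at a time, the single-file read above can be extended to a list of files. The following is a minimal sketch, not part of the F-DATA repository: the helper names month_file_names and load_months are hypothetical, and the default range assumes the files cover exactly the months from March 2021 to April 2024 given in the description. Files that are not present in the directory are skipped.

```python
from pathlib import Path


def month_file_names(start=(21, 3), end=(24, 4)):
    """Return YY_MM.parquet names for every month from start to end, inclusive.

    The default range is an assumption taken from the stated period
    (March 2021-April 2024); adjust it to the files actually downloaded.
    """
    names = []
    year, month = start
    while (year, month) <= end:
        names.append(f"{year:02d}_{month:02d}.parquet")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return names


def load_months(data_dir, names):
    """Concatenate the monthly files found in data_dir into one dataframe."""
    import pandas as pd  # reading parquet also requires pyarrow

    paths = [Path(data_dir) / name for name in names]
    frames = [pd.read_parquet(path) for path in paths if path.exists()]
    if not frames:
        raise FileNotFoundError(f"no matching .parquet files in {data_dir}")
    return pd.concat(frames, ignore_index=True)
```

For example, load_months(".", month_file_names((22, 1), (22, 3))) would read the first three months of 2022 from the current directory. With around 24 million jobs in the full dataset, loading a few months at a time is likely to be more practical than loading every file at once.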

Created:

2024-06-10
