Examon data from Marconi HPC system (snapshot)
Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://zenodo.org/record/4537849
Description:
This data was gathered and analyzed by Massimo Schembri during his internship at the University of Bologna, under the supervision of Andrea Borghesi (assistant professor at the same university).
The data was collected from "Marconi", a monitored supercomputer hosted at CINECA, using Examon, a holistic data monitoring infrastructure developed by researchers from the University of Bologna in collaboration with CINECA system administrators.
The data set covers two monitored periods, January and May 2020; for each period, it contains data for a subset of the nodes in the Marconi supercomputer.
The information monitored on Marconi's nodes is varied: the load of the individual cores, the temperature of the room where the nodes are located, the speed of the fans, details on memory read and write accesses, and so on. The sampling interval of the data at the source varies between 5 and 10 seconds. In this data set, however, the data are aggregated into 5-minute intervals; in particular, the mean value ("avg: ") and the variance ("var: ") are computed over each 5-minute interval.
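The aggregation described above can be sketched as follows. This is a minimal illustration, not the code used to build the data set: the function names are hypothetical, and two details are assumptions that the description does not state, namely that a sample falling exactly on a window boundary belongs to the window ending there, and that the variance is the population variance rather than the sample variance.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import fmean, pvariance

WINDOW = timedelta(minutes=5)
EPOCH = datetime(1970, 1, 1)


def window_end(ts):
    """Return the end of the 5-minute window (start, end] containing ts.

    Assumption: the right edge is inclusive, so a sample taken exactly
    at 02:10:00 is counted in the row stamped 02:10:00.
    """
    whole_windows, remainder = divmod(ts - EPOCH, WINDOW)
    if remainder == timedelta(0):
        return ts
    return EPOCH + (whole_windows + 1) * WINDOW


def aggregate(samples):
    """Group (timestamp, value) samples into 5-minute windows.

    Returns a dict mapping each window end to (mean, variance).
    Assumption: population variance; the source does not say which
    estimator was used.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[window_end(ts)].append(value)
    return {
        end: (fmean(values), pvariance(values))
        for end, values in sorted(buckets.items())
    }


# Made-up raw samples for a single metric on a single node.
raw = [
    (datetime(2020, 1, 1, 2, 5, 10), 1.0),
    (datetime(2020, 1, 1, 2, 7, 0), 3.0),
    (datetime(2020, 1, 1, 2, 10, 0), 5.0),
    (datetime(2020, 1, 1, 2, 10, 10), 7.0),
]
result = aggregate(raw)
```

Here the first three samples end up in the row stamped 2020-01-01 02:10:00 and the last one in the row stamped 2020-01-01 02:15:00.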
In the CSVs, each row corresponds to a different timestamp (first column on the left), so consecutive rows are separated by intervals of 5 minutes. For example, a timestamp equal to "2020-01-01 02:10:00" indicates that the mean and variance values were calculated over the previous 5 minutes (2020-01-01 02:05:00 to 2020-01-01 02:10:00). The remaining columns (apart from the last two) contain the aggregate metrics (mean and variance). The last column, "Jobs", indicates the number of applications (called HPC jobs) that finished on the node in the last half hour.
The penultimate column, "Label", indicates the presence or absence of a failure on the node (as recorded by the Nagios service).
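The column layout described above can be read with a short sketch like the one below. It relies only on column positions (timestamp first, "Label" penultimate, "Jobs" last) and on the timestamp format shown in the example. The sample header names such as `timestamp` and `avg:load` are hypothetical, as is the helper name `read_node_csv`; since the encoding of "Label" and "Jobs" is not stated, both are kept as raw strings.

```python
import csv
import io
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)


def read_node_csv(handle):
    """Parse one per-node CSV into a list of row dictionaries."""
    reader = csv.reader(handle)
    header = next(reader)
    # Everything between the timestamp and the last two columns is a metric.
    metric_names = header[1:-2]
    rows = []
    for record in reader:
        end = datetime.strptime(record[0], "%Y-%m-%d %H:%M:%S")
        rows.append({
            # The timestamp marks the end of the aggregated interval.
            "window_start": end - WINDOW,
            "window_end": end,
            "metrics": dict(zip(metric_names, map(float, record[1:-2]))),
            "label": record[-2],  # failure indicator, left unparsed
            "jobs": record[-1],   # finished-jobs count, left unparsed
        })
    return rows


# Made-up file contents; real files have many more metric columns.
SAMPLE = """timestamp,avg:load,var:load,Label,Jobs
2020-01-01 02:10:00,0.5,0.01,0,3
"""
rows = read_node_csv(io.StringIO(SAMPLE))
```

For a real file, the same function can be called on a handle opened with `open(path, newline="")`.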
Created:
2023-04-19



