
NLR HPC Eagle GPU Node Metrics

Source: DataCite Commons. Updated: 2026-01-29. Indexed: 2026-04-25.
Download link:
https://www.osti.gov/servlets/purl/3015213
Description:
Ganglia node metrics and iLO (Integrated Lights Out) power data captured from six representative Eagle GPU nodes.

The Eagle HPC system operated at NLR from 2019 through 2024. Eagle was a 2,000-node, 8-petaflop system. This dataset is a representative sample of metrics for 6 of the GPU nodes. Each GPU node contained 2 CPUs and 2 GPUs. Data are provided in compressed CSV format.

Ganglia and iLO Power Time Series Fields

ts: Timestamp
dv: Device / Node, identified by rack and unit (r103u17 == r(ack)103u(nit)17)
mt: Metric (only present for Ganglia)
vl: Value, in watts for iLO power (instantaneous value at sampling time), or the value of the Ganglia metric named in mt (see the list below)

Ganglia Metrics

Metric name -- Metric description -- Unit
cpu_aidle -- Percent of time since boot idle CPU -- Percent
cpu_idle -- Percent CPU idle -- Percent
cpu_nice -- Percent CPU nice -- Percent
cpu_speed -- Speed in MHz of CPU -- MHz
cpu_user -- Percent CPU user -- Percent
cpu_wio -- The percentage of CPU Wait I/O -- Percent
gpu0_bar1_memory -- Used GPU bar1 memory -- MB
gpu0_decoder_util -- GPU decoder utilization -- Percent
gpu0_ecc_db_error -- Total ECC error counts for the GPU -- Number
gpu0_encoder_util -- GPU encoder utilization -- Percent
gpu0_fan -- Fan speed -- RPM
gpu0_fb_memory -- Used GPU framebuffer memory -- MB
gpu0_graphics_clock_report -- Current clock speeds for the GPU -- MHz
gpu0_mem_total -- Memory total -- MB
gpu0_mem_util -- Memory utilization -- Percent
gpu0_power_usage_report -- Power usage report -- Watts
gpu0_temp -- GPU 0 temperature -- Celsius
gpu1_bar1_memory -- Used GPU bar1 memory -- MB
gpu1_decoder_util -- GPU decoder utilization -- Percent
gpu1_ecc_db_error -- Total ECC error counts for the GPU -- Number
gpu1_encoder_util -- GPU encoder utilization -- Percent
gpu1_fan -- Fan speed -- RPM
gpu1_fb_memory -- Used GPU framebuffer memory -- MB
gpu1_graphics_clock_report -- Current clock speeds for the GPU -- MHz
gpu1_mem_total -- Memory total -- MB
gpu1_mem_util -- Memory utilization -- Percent
gpu1_power_usage_report -- Power usage report -- Watts
gpu1_temp -- GPU 1 temperature -- Celsius
ipmi_cpu1_temp -- CPU 1 temperature -- Celsius
ipmi_cpu2_temp -- CPU 2 temperature -- Celsius
ipmi_inlet_ambient_temp -- Temperature measured at intake -- Celsius
ipmi_vr_p1_temp -- CPU 1 voltage regulator temperature -- Celsius
ipmi_vr_p2_temp -- CPU 2 voltage regulator temperature -- Celsius
mem_buffers -- Amount of buffered memory -- Bytes
mem_cached -- Amount of cached memory -- Bytes
mem_free -- Amount of available memory -- Bytes
mem_shared -- Amount of shared memory -- Bytes
mem_total -- Total amount of memory -- Bytes
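The field layout above (one row per timestamp, device, and metric) is a long format, so a common first step is to split the device ID into rack and unit and to pivot the metrics into one record per timestamp and node. The following is a minimal sketch using only the Python standard library. It assumes the CSV has a header row naming the fields ts, dv, mt, and vl; the sample rows, the timestamp format, and the helper names parse_device and pivot_ganglia are made up for illustration and are not taken from the dataset.

```python
import csv
import io
import re
from collections import defaultdict

# Device IDs look like "r103u17": r(ack)103, u(nit)17.
_DEVICE_RE = re.compile(r"r(\d+)u(\d+)")


def parse_device(dv):
    """Split a device ID such as 'r103u17' into (rack, unit) integers."""
    match = _DEVICE_RE.fullmatch(dv.strip())
    if match is None:
        raise ValueError(f"unexpected device ID: {dv!r}")
    return int(match.group(1)), int(match.group(2))


def pivot_ganglia(rows):
    """Turn long-format rows (ts, dv, mt, vl) into {(ts, dv): {metric: value}}."""
    wide = defaultdict(dict)
    for row in rows:
        wide[(row["ts"], row["dv"])][row["mt"]] = float(row["vl"])
    return dict(wide)


# Made-up sample rows for illustration only; not real measurements.
# Assumption: the file has a header row with these four field names.
SAMPLE = """ts,dv,mt,vl
2020-01-01 00:00:00,r103u17,gpu0_temp,41
2020-01-01 00:00:00,r103u17,gpu0_power_usage_report,55.5
2020-01-01 00:00:00,r103u17,cpu_idle,97.2
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
wide = pivot_ganglia(rows)
rack, unit = parse_device(rows[0]["dv"])
print(rack, unit)
print(wide[("2020-01-01 00:00:00", "r103u17")]["gpu0_temp"])
```

The iLO power series has no mt field, so it would need its own reader that keeps only ts, dv, and vl. To read a real file, replace the io.StringIO object with a text-mode file handle opened with whichever decompressor matches the archive; the description does not state the compression format.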
Provider:
National Laboratory of the Rockies
Created:
2026-01-29