NLR HPC Eagle Jobs Data and Additional Energy Metrics

Name: NLR HPC Eagle Jobs Data and Additional Energy Metrics
Creator: National Laboratory of the Rockies
Published: 2026-04-22 20:52:08
License: No description provided

DataCite Commons · Updated 2026-04-22 · Indexed 2026-04-25

Download link:

https://www.osti.gov/servlets/purl/3023273

Description:

Overview: Anonymized job-level records from the Eagle high-performance computing (HPC) system at the National Laboratory of the Rockies (NLR). Each record represents one Slurm batch job and includes scheduling metadata, resource requests, resource utilization, CPU/GPU energy consumption, and efficiency metrics. Sensitive fields (user, account, job name) are replaced with cryptographic hashes.

System & Timeframe: Eagle was a 2,000-node, 8-petaflop system operated at NLR from 2019 to 2024, and the data cover its full operational lifetime. Slurm data were processed nightly; timestamps are in Mountain Time. Funding was provided by the U.S. Department of Energy, EERE.

Files:
<ul>
<li>esif.hpc.eagle.job-anon.zip: core anonymized job records (Hive-partitioned Parquet)</li>
<li>esif.hpc.eagle.job-anon-energy-metrics.zip: the same records with additional iLO and Ganglia energy metrics</li>
<li>datacard.md: full dataset documentation</li>
</ul>

The dataset contains ~13.8 million rows and 62 variables. It can be read with PyArrow, pandas, DuckDB, Apache Spark, or any other Parquet-compatible tool.

Data Collection: Jobs were collected via sacct through the following pipeline: Eagle Jobs API → Redpanda → StreamSets → HPCMON API → PostgreSQL. Node-level power comes from iLO (HP Integrated Lights-Out), and GPU power comes from Ganglia monitoring. Both are joined to jobs by matching node lists and time ranges.
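The join of power samples to jobs by node list and time range can be sketched as follows. This is a minimal illustration under stated assumptions, not the HPCMON pipeline's actual code:

- Power samples arrive as hypothetical (hostname, timestamp, watts) tuples with timezone-aware timestamps.
- The job's Slurm node list has already been expanded into individual hostnames (for example with scontrol show hostnames).
- Each node's in-window samples are integrated with the trapezoidal rule to give watt-hours.

The function name job_energy_wh and the hostnames are illustrative only.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional, Set, Tuple

# Hypothetical sample layout: (hostname, timezone-aware timestamp, watts).
Sample = Tuple[str, datetime, float]


def job_energy_wh(
    job_nodes: Set[str],
    job_start: datetime,
    job_end: datetime,
    samples: Iterable[Sample],
) -> Optional[float]:
    """Estimate a job's energy in watt-hours from per-node power samples.

    Keeps only samples from the job's nodes that fall inside
    [job_start, job_end], then integrates each node's power curve with
    the trapezoidal rule. Time before the first and after the last
    in-window sample is not extrapolated. Returns None when no sample
    matches, mirroring the null energy fields for unmonitored jobs.
    """
    per_node = defaultdict(list)
    for node, ts, watts in samples:
        if node in job_nodes and job_start <= ts <= job_end:
            per_node[node].append((ts, watts))

    if not per_node:
        return None

    total_wh = 0.0
    for points in per_node.values():
        points.sort()  # order each node's samples by timestamp
        for (t0, w0), (t1, w1) in zip(points, points[1:]):
            hours = (t1 - t0).total_seconds() / 3600.0
            total_wh += 0.5 * (w0 + w1) * hours  # trapezoid area in Wh
    return total_wh


if __name__ == "__main__":
    t0 = datetime(2023, 6, 1, 12, 0, tzinfo=timezone.utc)
    samples = [
        ("r1i0n1", t0, 300.0),
        ("r1i0n1", t0 + timedelta(minutes=30), 300.0),
        ("r1i0n1", t0 + timedelta(minutes=60), 300.0),
        ("r1i0n2", t0, 200.0),
        ("r1i0n2", t0 + timedelta(minutes=60), 400.0),
        ("r9i9n9", t0, 999.0),                      # node not in the job: ignored
        ("r1i0n1", t0 + timedelta(hours=5), 999.0),  # outside the window: ignored
    ]
    print(job_energy_wh({"r1i0n1", "r1i0n2"}, t0, t0 + timedelta(hours=1), samples))
    # prints 600.0
```

Sampling gaps at the edges of a job are left unfilled here by design. A production join might instead interpolate to the exact start and end times.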
Preprocessing:
<ul>
<li>Anonymization of name, user, and account fields via cryptographic hashing</li>
<li>Derived columns: queue_wait, cpu_eff, max_mem_eff</li>
<li>Simplified job-state mapping (e.g., "CANCELLED BY 12345" → "CANCELLED")</li>
<li>QoS assignment from accounting rules (buy-in, standby, or the Slurm QoS value)</li>
<li>CPU energy estimated from the 200 W TDP of the Intel Xeon Gold 6154 (18 cores)</li>
<li>Timezone-aware columns (_tz) sourced from the LEX accounting database to handle DST transitions correctly</li>
</ul>

Key Variables:
<ul>
<li>Scheduling: job_id, partition, state_simple, submit_time_tz, start_time_tz, end_time_tz, queue_wait</li>
<li>Resources: nodes_req/used, processors_req/used, memory_req, wallclock_req/used, gpus_requested</li>
<li>Efficiency: cpu_eff, max_mem_eff</li>
<li>Energy: cpu_energy_tdp_estimated_max/used_watt_hours, node_energy_total_watt_hours (iLO), gpu0/1_energy_total_watt_hours (Ganglia)</li>
</ul>

Partitions: bigmem, bigmem-8600, bigscratch, csc, dav, ddn, debug, gpu, haswell, long, mono, short, standard

Job States: CANCELLED, COMPLETED, FAILED, NODE_FAIL, OUT_OF_MEMORY, PENDING, RUNNING, TIMEOUT

QoS Levels: Unknown, normal, buy-in, debug, penalty, high, standby

Important Notes:
<ul>
<li>Non-_tz timestamp columns may be off by one hour across DST boundaries; use the _tz columns for time-difference calculations</li>
<li>Energy fields are null for jobs without monitoring coverage</li>
<li>Job-step records and raw Slurm JSONB fields are excluded from this extract</li>
<li>Do not attempt to re-identify individuals from hashed fields</li>
</ul>
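The derived columns and the TDP-based energy estimate listed under Preprocessing can be sketched as below. The exact formulas are not given in this description, so every definition here is an assumption:

- queue_wait is start time minus submit time.
- cpu_eff follows the seff-style ratio: CPU time divided by (cores × elapsed time).
- max_mem_eff is peak resident memory divided by requested memory.
- The CPU energy estimate treats every 18 cores as one socket drawing its full 200 W TDP.

How that estimate maps onto the cpu_energy_tdp_estimated_max and _used columns is not stated. The function names are illustrative. The example also shows why the _tz columns matter: a naive wall-clock difference across the spring-forward change comes out one hour too long.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Per-socket figures quoted in the dataset notes (Intel Xeon Gold 6154).
TDP_WATTS = 200.0
CORES_PER_SOCKET = 18


def simplify_state(raw_state: str) -> str:
    """Collapse a raw Slurm state such as 'CANCELLED BY 12345' to 'CANCELLED'."""
    parts = raw_state.split()
    return parts[0] if parts else raw_state


def queue_wait(submit_tz: datetime, start_tz: datetime) -> timedelta:
    """Assumed definition: start time minus submit time.

    Both arguments must be timezone-aware (_tz columns). They are converted
    to UTC first because Python subtracts two datetimes that share the same
    tzinfo object as plain wall-clock times, which reintroduces the DST error.
    """
    if submit_tz.tzinfo is None or start_tz.tzinfo is None:
        raise ValueError("use timezone-aware (_tz) timestamps")
    return start_tz.astimezone(timezone.utc) - submit_tz.astimezone(timezone.utc)


def cpu_eff(total_cpu_seconds: float, processors: int, elapsed_seconds: float) -> Optional[float]:
    """Assumed seff-style definition: CPU time / (cores x elapsed time)."""
    if processors <= 0 or elapsed_seconds <= 0:
        return None
    return total_cpu_seconds / (processors * elapsed_seconds)


def max_mem_eff(max_rss_bytes: float, memory_req_bytes: float) -> Optional[float]:
    """Assumed definition: peak resident memory / requested memory."""
    if memory_req_bytes <= 0:
        return None
    return max_rss_bytes / memory_req_bytes


def tdp_energy_wh(processors: int, hours: float) -> float:
    """Assumed estimate: cores converted to socket-equivalents, each at full TDP."""
    return processors / CORES_PER_SOCKET * TDP_WATTS * hours


if __name__ == "__main__":
    # Naive wall-clock values across the 2023-03-12 spring-forward change (Mountain Time).
    naive = datetime(2023, 3, 12, 3, 30) - datetime(2023, 3, 12, 1, 30)
    aware = queue_wait(
        datetime.fromisoformat("2023-03-12T01:30:00-07:00"),  # MST
        datetime.fromisoformat("2023-03-12T03:30:00-06:00"),  # MDT
    )
    print(naive, aware)                          # 2:00:00 1:00:00
    print(simplify_state("CANCELLED BY 12345"))  # CANCELLED
    print(cpu_eff(18 * 3600, 36, 3600))          # 0.5
    print(tdp_energy_wh(36, 2.0))                # 800.0
```

Converting to UTC before subtracting is deliberate. Being timezone-aware is not enough on its own, because a single shared tzinfo such as one ZoneInfo("America/Denver") instance makes Python fall back to wall-clock arithmetic.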

Provider:

National Laboratory of the Rockies

Created:

2026-04-01
