NLR HPC Eagle Jobs Data and Additional Energy Metrics

Source: DataCite Commons | Updated: 2026-04-22 | Indexed: 2026-04-25
Download link:
https://www.osti.gov/servlets/purl/3023273
Resource description:
<p><strong>Overview: </strong>Anonymized job-level records from the Eagle high-performance computing (HPC) system at the National Laboratory of the Rockies (NLR). Each record represents a Slurm batch job with scheduling metadata, resource requests, resource utilization, CPU/GPU energy consumption, and efficiency metrics. Sensitive fields (user, account, job name) are replaced with cryptographic hashes.</p> <p><strong>System &amp; Timeframe: </strong>Eagle was a 2,000-node, 8-petaflop system operated at NLR from 2019–2024. Data covers the full operational lifetime of the system. Slurm data was processed nightly; timestamps are in Mountain Time. Funding provided by the U.S. Department of Energy, EERE.</p> <p><strong>Files:</strong></p> <ul> <li>esif.hpc.eagle.job-anon.zip — Core anonymized job records (Hive-partitioned Parquet)</li> <li>esif.hpc.eagle.job-anon-energy-metrics.zip — Same records with additional iLO and Ganglia energy metrics</li> <li>datacard.md — Full dataset documentation</li> </ul> <p>~13.8 million rows, 62 variables. Readable with PyArrow, pandas, DuckDB, Apache Spark, or any Parquet-compatible tool.</p> <p><strong>Data Collection: </strong>Jobs collected via sacct through a pipeline: Eagle Jobs API → Redpanda → StreamSets → HPCMON API → PostgreSQL. 
Node-level power from iLO (HP Integrated Lights-Out); GPU power from Ganglia monitoring, joined to jobs via node lists and time ranges.</p> <p><strong>Preprocessing:</strong></p> <ul> <li>Anonymization of name, user, and account fields via cryptographic hashing</li> <li>Derived columns: queue_wait, cpu_eff, max_mem_eff</li> <li>Simplified job state mapping (e.g., "CANCELLED BY 12345" → "CANCELLED")</li> <li>QoS accounting rules (buy-in, standby, or Slurm QoS value)</li> <li>CPU energy estimated from TDP (200W, Intel Xeon Gold 6154, 18 cores)</li> <li>Timezone-aware columns (_tz) sourced from LEX accounting database to correctly handle DST transitions</li> </ul> <p><strong>Key Variables: </strong></p> <p>Scheduling: job_id, partition, state_simple, submit_time_tz, start_time_tz, end_time_tz, queue_wait</p> <p>Resources: nodes_req/used, processors_req/used, memory_req, wallclock_req/used, gpus_requested</p> <p>Efficiency: cpu_eff, max_mem_eff</p> <p>Energy: cpu_energy_tdp_estimated_max/used_watt_hours, node_energy_total_watt_hours (iLO), gpu0/1_energy_total_watt_hours (Ganglia)</p> <p><strong>Partitions:</strong> bigmem, bigmem-8600, bigscratch, csc, dav, ddn, debug, gpu, haswell, long, mono, short, standard</p> <p><strong>Job States:</strong> CANCELLED, COMPLETED, FAILED, NODE_FAIL, OUT_OF_MEMORY, PENDING, RUNNING, TIMEOUT</p> <p><strong>QoS Levels:</strong> Unknown, normal, buy-in, debug, penalty, high, standby</p> <p><strong>Important Notes:</strong></p> <ul> <li>Non-_tz timestamp columns may be off by one hour across DST boundaries; use _tz columns for time difference calculations</li> <li>Energy fields are null for jobs without monitoring coverage</li> <li>Job step records and raw Slurm JSONB fields are excluded from this extract</li> <li>Do not attempt to re-identify individuals from hashed fields</li> </ul>
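<p><strong>Illustrative sketch: </strong>The preprocessing steps and the DST note above can be illustrated with a small, standard-library-only Python example. It is a sketch under stated assumptions, not the pipeline actually used to build the dataset: the function names are hypothetical, the fixed UTC offsets stand in for a real Mountain Time zone database, and the TDP energy formula (a per-core share of 200 W multiplied by processors and hours) is a guess at how such an estimate could be formed, since the description does not give the exact calculation.</p>

```python
import re
from datetime import datetime, timedelta, timezone

# Assumption: fixed offsets stand in for Mountain Time. A real workflow would
# more likely rely on the dataset's _tz columns or zoneinfo("America/Denver").
MST = timezone(timedelta(hours=-7))  # Mountain Standard Time
MDT = timezone(timedelta(hours=-6))  # Mountain Daylight Time


def simplify_state(state: str) -> str:
    """Collapse states such as 'CANCELLED BY 12345' to 'CANCELLED'."""
    return re.sub(r"\s+BY\s+\d+$", "", state.strip())


def queue_wait_seconds(submit: datetime, start: datetime) -> float:
    """Seconds elapsed between job submission and job start."""
    return (start - submit).total_seconds()


def cpu_energy_tdp_wh(processors: int, wallclock_seconds: float,
                      tdp_watts: float = 200.0, cores_per_cpu: int = 18) -> float:
    """Hypothetical TDP-based estimate in watt-hours.

    Assumed formula (not confirmed by the dataset documentation):
    each core is charged tdp_watts / cores_per_cpu for the job's duration.
    """
    hours = wallclock_seconds / 3600.0
    return tdp_watts * processors / cores_per_cpu * hours


if __name__ == "__main__":
    print(simplify_state("CANCELLED BY 12345"))  # CANCELLED

    # A job submitted just before the 2021-03-14 spring-forward transition
    # and started just after it.
    aware_wait = queue_wait_seconds(
        datetime(2021, 3, 14, 1, 30, tzinfo=MST),
        datetime(2021, 3, 14, 3, 30, tzinfo=MDT),
    )
    naive_wait = queue_wait_seconds(
        datetime(2021, 3, 14, 1, 30),
        datetime(2021, 3, 14, 3, 30),
    )
    print(aware_wait)  # 3600.0 (true elapsed time)
    print(naive_wait)  # 7200.0 (one hour too long without timezone info)

    # 36 processors for one hour under the assumed per-core TDP share.
    print(cpu_energy_tdp_wh(36, 3600))
```

<p>The naive subtraction overstates the wait by one hour because wall-clock time skipped from 02:00 to 03:00, which is the kind of error the _tz columns are meant to avoid.</p>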
Provider:
National Laboratory of the Rockies
Created:
2026-04-01