NLR HPC Kestrel Jobs Data

Name: NLR HPC Kestrel Jobs Data
Creator: National Laboratory of the Rockies
Published: 2026-04-22 20:49:06
License: No description provided

Source: DataCite Commons. Updated 2026-04-22; indexed 2026-04-25.

Download link:

https://www.osti.gov/servlets/purl/3023270


Resource description:

Overview: Anonymized job-level records from the Kestrel HPC system at the National Laboratory of the Rockies (NLR). Each record represents a Slurm batch job with scheduling metadata, resource requests, utilization, energy estimates, and efficiency metrics. Sensitive fields (user, account, job name, submit line, working directory, submit script, and job type) are replaced with 7-character cryptographic hashes.

System & Timeframe: Kestrel is located at the NLR campus. Standard compute nodes have 104 cores and 256 GB RAM; bigmem nodes have 2,000 GB. GPU nodes (gpu-h100 partition) use NVIDIA H100 GPUs. Data covers jobs submitted August 2023 through December 2025. Funding provided by the U.S. Department of Energy, EERE.

Files:
- esif.hpc.kestrel.job-anon.zip: Anonymized job records (Hive-partitioned Parquet)
- datacard.md: Full dataset documentation

Approximately 11 million rows, 50 variables. Readable with PyArrow, pandas, DuckDB, Apache Spark, or any Parquet-compatible tool.

Data Collection: Jobs collected via sacct with timezone-aware export (SLURM_TIME_FORMAT="%Y-%m-%dT%H:%M:%S%z"), loaded into PostgreSQL. Calculated columns updated via database triggers and batch functions. All timestamps use timestamptz and correctly handle DST transitions.
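
The timestamp format quoted under Data Collection can be parsed directly with Python's standard library. The following is a minimal sketch, not the dataset's own tooling: the helper names `parse_slurm_time` and `elapsed` are hypothetical, and the two sample timestamps (with Mountain Time offsets) are invented for illustration rather than taken from the data. It shows why offset-bearing timestamps give the correct elapsed time across a DST change, where naive wall-clock subtraction would not.

```python
from datetime import datetime, timedelta

# Format string quoted in the dataset description (SLURM_TIME_FORMAT).
SLURM_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S%z"


def parse_slurm_time(text: str) -> datetime:
    """Parse one offset-bearing timestamp into a timezone-aware datetime."""
    return datetime.strptime(text, SLURM_TIME_FORMAT)


def elapsed(start_text: str, end_text: str) -> timedelta:
    """Real elapsed time between two timestamps, honoring their UTC offsets."""
    return parse_slurm_time(end_text) - parse_slurm_time(start_text)


# Hypothetical values: submitted just before a spring-forward transition
# (offset -0700) and started just after it (offset -0600).
submit = "2024-03-10T01:30:00-0700"
start = "2024-03-10T03:30:00-0600"

# The wall clocks differ by two hours, but only one hour actually passed.
print(elapsed(submit, start))  # 1:00:00
```

Subtracting two timezone-aware datetimes compares them in UTC, so the result is unaffected by the offset change between the two values.
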
Preprocessing:
- Anonymization of name, user, account, submit_line, work_dir, submit_script, and job_type via 7-char hex hashes
- Derived columns: queue_wait, cpu_eff, max/min/avg_mem_eff, energy estimates
- Simplified job state mapping (e.g., "CANCELLED by 132357" → "CANCELLED")
- Boolean flags: python_job, reframe_job
- Temporal decomposition: year, month, day, day_of_week, hour, minute from submit_time
- Shared node tracking: shared_job_count, nodes_shared, jobs_shared

Key Variables:
- Scheduling: job_id, partition, state_simple, submit_time, start_time, end_time, queue_wait
- Resources: nodes_req/used, processors_req/used, memory_req, wallclock_req/used, gpus_requested
- Efficiency: cpu_eff, max/min/avg_mem_eff
- Energy: cpu_energy_tdp_estimated_max/used_watt_hours, consumed_energy_raw_joules, consumed_energy_raw_watt_hours
- Sharing: shared_job_count, nodes_shared, jobs_shared

Partitions: short, standard, debug, gpu-h100

Job States: CANCELLED, COMPLETED, FAILED, PENDING, RUNNING

QoS Levels: normal, high

Important Notes:
- Timestamps include timezone offsets; DST transitions are handled correctly, though adding intervals across DST boundaries requires offset adjustment
- shared_job_count reflects physical node co-residency, not use of the shared partition
- Job step records and raw Slurm JSONB fields are excluded
- Do not attempt to re-identify individuals from hashed fields
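
Two of the preprocessing steps listed above, the simplified job state mapping and the temporal decomposition of submit_time, can be sketched in a few lines. This is an illustration under stated assumptions, not the pipeline that produced the dataset: the function names are hypothetical, the state rule simply keeps the first whitespace-delimited token, and the day_of_week numbering (Monday = 0, Python's convention) is a guess, since the description does not say which convention the dataset uses. The data card remains the authority on both.

```python
from datetime import datetime


def simplify_state(raw_state: str) -> str:
    """Keep only the leading state keyword, e.g. drop a ' by <uid>' suffix."""
    parts = raw_state.split()
    return parts[0] if parts else raw_state


def decompose_submit_time(submit_time: datetime) -> dict:
    """Split a submit timestamp into the calendar fields named above."""
    return {
        "year": submit_time.year,
        "month": submit_time.month,
        "day": submit_time.day,
        # Assumption: Monday = 0. The dataset may number days differently.
        "day_of_week": submit_time.weekday(),
        "hour": submit_time.hour,
        "minute": submit_time.minute,
    }


# The state example comes from the description; the timestamp is invented.
print(simplify_state("CANCELLED by 132357"))  # CANCELLED

sample = datetime.strptime("2024-03-10T01:30:00-0700", "%Y-%m-%dT%H:%M:%S%z")
print(decompose_submit_time(sample))
```

The decomposition reads the fields from the timestamp's own local offset, so the hour and day reflect the wall-clock time recorded at submission rather than UTC.
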

Provider:

National Laboratory of the Rockies

Created:

2026-04-01
