NLR HPC Kestrel Jobs Data
Source: DataCite Commons. Updated 2026-04-22; indexed 2026-04-25.
Download link:
https://www.osti.gov/servlets/purl/3023270
Resource description:
<b>Overview:</b> Anonymized job-level records from the Kestrel HPC system at the National Laboratory of the Rockies (NLR). Each record represents a Slurm batch job with scheduling metadata, resource requests, utilization, energy estimates, and efficiency metrics. Sensitive fields (user, account, job name, submit line, working directory, submit script, and job type) are replaced with 7-character cryptographic hashes.
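<p>The listing does not say which hash function or salt produces the 7-character values. As a minimal sketch only, assuming a salted SHA-256 digest truncated to 7 hex characters (the function name <code>short_hash</code> and the salt are hypothetical), the replacement could look like this:</p>

```python
import hashlib


def short_hash(value: str, salt: str = "hypothetical-secret-salt", length: int = 7) -> str:
    """Return the first `length` hex characters of a salted SHA-256 digest.

    SHA-256 and the salt are assumptions; the dataset documentation only
    states that sensitive fields become 7-character cryptographic hashes.
    """
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:length]


# The same input always maps to the same token, so jobs from one user
# can still be grouped without exposing the user name.
token = short_hash("example_user")
print(token)
```

<p>Because the mapping is deterministic, grouping by a hashed field still works; only the original value is hidden.</p>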
<p><b>System & Timeframe:</b> Kestrel is located at the NLR campus. Standard compute nodes have 104 cores and 256 GB RAM; bigmem nodes have 2,000 GB. GPU nodes (gpu-h100 partition) use NVIDIA H100 GPUs. Data covers jobs submitted August 2023 through December 2025. Funding provided by the U.S. Department of Energy, EERE.</p>
<p><b>Files:</b></p>
<ul>
<li>esif.hpc.kestrel.job-anon.zip — Anonymized job records (Hive-partitioned Parquet)</li>
<li>datacard.md — Full dataset documentation</li>
</ul>
~11 million rows, 50 variables. Readable with PyArrow, pandas, DuckDB, Apache Spark, or any Parquet-compatible tool.
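<p>In a Hive-partitioned layout, some column values are encoded in directory names of the form <code>key=value</code>. The sketch below shows how those components can be read from a path, plus one possible way to load the extracted archive with PyArrow. The directory name <code>job-anon</code> and the partition keys <code>year</code> and <code>month</code> are assumptions for illustration; check datacard.md for the actual layout.</p>

```python
from pathlib import PurePosixPath


def partition_values(path: str) -> dict:
    """Extract Hive-style key=value directory components from a file path."""
    values = {}
    for part in PurePosixPath(path).parts[:-1]:  # skip the file name itself
        key, sep, value = part.partition("=")
        if sep and key:
            values[key] = value
    return values


def load_jobs(root: str, columns=None):
    """Load the extracted archive as a pandas DataFrame.

    Requires pyarrow and pandas; imported lazily so partition_values
    stays usable without them.
    """
    import pyarrow.dataset as ds

    dataset = ds.dataset(root, format="parquet", partitioning="hive")
    return dataset.to_table(columns=columns).to_pandas()


# Hypothetical path inside the extracted archive.
example = partition_values("job-anon/year=2024/month=3/part-0.parquet")
print(example)
```

<p>Passing a short <code>columns</code> list to <code>load_jobs</code> keeps memory use down, which matters at roughly 11 million rows.</p>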
<p><b>Data Collection:</b> Job records were collected via sacct with a timezone-aware export format (SLURM_TIME_FORMAT="%Y-%m-%dT%H:%M:%S%z") and loaded into PostgreSQL. Calculated columns are updated via database triggers and batch functions. All timestamps are stored as timestamptz and handle DST transitions correctly.</p>
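<p>The export format above can be parsed directly with Python's <code>datetime.strptime</code>. The following sketch assumes that queue_wait is the difference between start_time and submit_time, which the listing implies but does not define; the sample timestamps and the Mountain Time offsets are made up for illustration.</p>

```python
from datetime import datetime, timedelta

# Same format string as the SLURM_TIME_FORMAT used for the sacct export.
SLURM_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S%z"


def parse_slurm_time(text: str) -> datetime:
    """Parse a timezone-aware sacct timestamp such as 2024-03-10T01:30:00-0700."""
    return datetime.strptime(text, SLURM_TIME_FORMAT)


def queue_wait(submit: str, start: str) -> timedelta:
    """Elapsed time between submission and start (assumed definition)."""
    return parse_slurm_time(start) - parse_slurm_time(submit)


# Hypothetical job submitted shortly before a spring-forward transition.
# The wall clocks differ by two hours, but only one hour actually elapsed.
wait = queue_wait("2024-03-10T01:30:00-0700", "2024-03-10T03:30:00-0600")
print(wait)
```

<p>Because each timestamp carries its own offset, subtraction yields true elapsed time even when the two values fall on opposite sides of a DST change.</p>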
<p><b>Preprocessing:</b></p>
<ul>
<li>Anonymization of name, user, account, submit_line, work_dir, submit_script, and job_type via 7-char hex hashes</li>
<li>Derived columns: queue_wait, cpu_eff, max/min/avg_mem_eff, energy estimates</li>
<li>Simplified job state mapping (e.g., "CANCELLED by 132357" → "CANCELLED")</li>
<li>Boolean flags: python_job, reframe_job</li>
<li>Temporal decomposition: year, month, day, day_of_week, hour, minute from submit_time</li>
<li>Shared node tracking: shared_job_count, nodes_shared, jobs_shared</li>
</ul>
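<p>Two of the preprocessing steps listed above, state simplification and temporal decomposition, can be sketched as follows. This is an illustration, not the dataset's actual pipeline: the function names are hypothetical, and the day_of_week numbering (Monday = 0, as in Python's <code>weekday()</code>) is an assumption that may differ from the published column.</p>

```python
from datetime import datetime


def simplify_state(state: str) -> str:
    """Keep only the leading state keyword, e.g. 'CANCELLED by 132357' -> 'CANCELLED'."""
    parts = state.split()
    return parts[0] if parts else state


def decompose(submit_time: datetime) -> dict:
    """Split a submit time into the calendar fields listed in the data card."""
    return {
        "year": submit_time.year,
        "month": submit_time.month,
        "day": submit_time.day,
        "day_of_week": submit_time.weekday(),  # assumption: Monday = 0 ... Sunday = 6
        "hour": submit_time.hour,
        "minute": submit_time.minute,
    }


state = simplify_state("CANCELLED by 132357")
fields = decompose(datetime(2024, 3, 10, 1, 30))
print(state, fields)
```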
<p><b>Key Variables: </b>
<br>
Scheduling: job_id, partition, state_simple, submit_time, start_time, end_time, queue_wait
<br>
Resources: nodes_req/used, processors_req/used, memory_req, wallclock_req/used, gpus_requested
<br>
Efficiency: cpu_eff, max/min/avg_mem_eff
<br>
Energy: cpu_energy_tdp_estimated_max/used_watt_hours, consumed_energy_raw_joules, consumed_energy_raw_watt_hours
<br>
Sharing: shared_job_count, nodes_shared, jobs_shared</p>
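<p>The two consumed_energy_raw columns presumably differ only by unit, since one watt-hour equals 3,600 joules. The sketch below shows that conversion, together with one plausible form of a TDP-based upper bound. The TDP formula, its parameter names, and the sample numbers are assumptions for illustration; the data card (datacard.md) is the authority on how the published estimates are computed.</p>

```python
JOULES_PER_WATT_HOUR = 3600.0  # 1 Wh = 1 W sustained for 3600 s


def joules_to_watt_hours(joules: float) -> float:
    """Convert an energy reading from joules to watt-hours."""
    return joules / JOULES_PER_WATT_HOUR


def tdp_energy_watt_hours(tdp_watts_per_node: float, nodes: int, wallclock_seconds: float) -> float:
    """Hypothetical upper bound: every node draws its full TDP for the whole run."""
    return tdp_watts_per_node * nodes * wallclock_seconds / 3600.0


# Made-up values, not taken from the dataset.
measured_wh = joules_to_watt_hours(7200.0)
bound_wh = tdp_energy_watt_hours(500.0, nodes=2, wallclock_seconds=1800.0)
print(measured_wh, bound_wh)
```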
<p><b>Partitions:</b> short, standard, debug, gpu-h100</p>
<p><b>Job States:</b> CANCELLED, COMPLETED, FAILED, PENDING, RUNNING</p>
<p><b>QoS Levels:</b> normal, high</p>
<p><b>Important Notes:</b></p>
<ul>
<li>Timestamps include timezone offsets; DST transitions are handled correctly, though adding intervals across DST boundaries requires offset adjustment</li>
<li>shared_job_count reflects physical node co-residency, not use of the shared partition</li>
<li>Job step records and raw Slurm JSONB fields are excluded</li>
<li>Do not attempt to re-identify individuals from hashed fields</li>
</ul>
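<p>The first note above, on adding intervals across a DST boundary, can be illustrated with fixed offsets. The sketch assumes US Mountain Time offsets (-07:00 standard, -06:00 daylight) and a made-up start time; the listing does not state the system's time zone. Adding a duration to an offset-aware timestamp gives the correct instant, but the result keeps the old offset until it is converted.</p>

```python
from datetime import datetime, timedelta, timezone

# Assumed offsets for illustration only.
MST = timezone(timedelta(hours=-7), "MST")
MDT = timezone(timedelta(hours=-6), "MDT")

# Hypothetical start time, 30 minutes before the 2024 spring-forward change.
start = datetime(2024, 3, 10, 1, 30, tzinfo=MST)

# Adding one hour keeps the -07:00 offset, giving a local wall-clock time
# (02:30) that was skipped on that day.
unadjusted_end = start + timedelta(hours=1)

# Converting to the post-transition offset yields the local time actually observed.
adjusted_end = unadjusted_end.astimezone(MDT)

print(unadjusted_end.isoformat())
print(adjusted_end.isoformat())
```

<p>Both values denote the same instant, so durations computed from them agree; only the displayed local time changes.</p>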
Provided by:
National Laboratory of the Rockies
Created:
2026-04-01



