
Long Term Per-Component Power and Thermal Measurements of the OLCF Summit System

Source: DataCite Commons · Updated: 2022-04-12 · Indexed: 2025-04-09
Download link:
https://www.osti.gov/servlets/purl/1861393/
Description:
As we move into the exascale era, the power and energy footprints of high-performance computing (HPC) systems have grown significantly larger. Because of the harsh power and thermal conditions within such systems, their components are exposed to extreme operating conditions. Operating a modern HPC system therefore requires deep insight into its long-term behavior in order to maintain both its efficiency and its longevity. To help the HPC community gain such insights, we provide a dataset that records the long-term power and thermal behavior of Summit, the 200 PF pre-exascale supercomputer at the Oak Ridge Leadership Computing Facility (OLCF). Summit is an IBM AC922-based system with 9,252 IBM POWER9 CPUs and 27,756 NVIDIA V100 GPUs, and it can consume up to 13 MW at peak. Heat is removed using medium-temperature direct liquid cooling and a secondary cooling loop based on rear-door heat exchangers. Derived from high-resolution (1 Hz) per-component (GPU, CPU) measurements of the system, the primary dataset contains 10-second and 1-minute mean power and thermal measurements from five month-long segments: January and August 2020, February and August 2021, and January 2022. For convenience, we also provide various sub-datasets randomly sampled across time and space (hosts) of the cluster. Further details and example analysis code can be found in the following GitHub repository: https://github.com/at-aaims/summit_power_and_thermal_data
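The 10-second and 1-minute means described above are averages of the 1 Hz readings over fixed, non-overlapping time windows. As a rough illustration only, the sketch below shows one way such per-host window means could be computed. The `(timestamp_s, host, value)` tuple layout, the `window_means` helper, and host names such as `"h1"` are hypothetical; they do not reflect the dataset's actual schema or the authors' processing pipeline, which are documented in the GitHub repository.

```python
from collections import defaultdict
from statistics import fmean


def window_means(samples, window_s=10):
    """Average 1 Hz samples into fixed, non-overlapping windows per host.

    samples: iterable of (timestamp_s, host, value) tuples
             (hypothetical layout, not the dataset's real schema).
    window_s: window length in seconds (10 for 10-second means, 60 for 1-minute means).
    Returns a dict mapping (host, window_start_s) -> mean value.
    """
    buckets = defaultdict(list)
    for ts, host, value in samples:
        # Align each sample to the start of its window, e.g. t=17 -> 10 for 10 s windows.
        start = int(ts // window_s) * window_s
        buckets[(host, start)].append(value)
    # Mean of each window, returned in (host, window_start) order.
    return {key: fmean(vals) for key, vals in sorted(buckets.items())}


# Usage: 20 seconds of synthetic 1 Hz power readings (watts) from one hypothetical host.
samples = [(t, "h1", 100.0 + t) for t in range(20)]
print(window_means(samples, window_s=10))
```

Passing `window_s=60` produces 1-minute means from the same input. This sketch keeps every sample in memory for simplicity; a real month-long, 1 Hz, multi-host stream would more likely be aggregated incrementally or with a dataframe library.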
Provider:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF); Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Created:
2022-04-07