HFlow: efficiently manage high-throughput applications on HPC systems

China Scientific Data, updated 2026-02-05, indexed 2026-04-25
Download link:
https://www.sciengine.com/AA/doi/10.1360/SSI-2025-0222
Resource description:
High-throughput computing (HTC) typically executes a vast number of small-scale, short-duration, and mutually independent computational tasks. Although high-performance computing (HPC) systems possess abundant computational resources, mainstream resource management systems and existing HTC-oriented solutions exhibit significant deficiencies in throughput, application compatibility, and fault tolerance, resulting in inefficient resource management for HTC applications on HPC systems. To address this challenge, this paper proposes HFlow—a resource management solution integrating centralized and distributed resource management architectures. HFlow achieves high application compatibility through a hybrid job management mechanism and concurrently enhances throughput and fault tolerance via a fine-grained task partitioning algorithm coupled with a multi-level fault tolerance framework. Experimental evaluations on the Tianhe-2A supercomputer demonstrate that HFlow maintains HPC application management efficiency while successfully supporting HTC resource management requirements. Specifically, HFlow delivers task throughput 2.1× to 108.3× higher than mainstream resource management systems and dedicated HTC solutions, alongside robust multi-level fault tolerance capabilities.
Created:
2025-11-18