HPC fault and performance analysis in field of petroleum geophysical exploration
Source: 中国科学数据 (China Scientific Data) | Updated: 2026-01-15 | Indexed: 2026-04-25
Download link:
https://www.sciengine.com/AA/doi/10.13700/j.bh.1001-5965.2023.0768
Resource description:
Ongoing advances in seismic data acquisition for petroleum geophysical prospecting now produce terabytes or even petabytes of data. As data volume, running time, and the number of nodes in high-performance computing (HPC) clusters increase, cluster faults become more likely and maintenance becomes harder. When a cluster or node fails, the computing program often has to be re-run, which wastes significant resources. To address the low observability of HPC programs in computing clusters and the difficulty of analyzing their faults and performance, this paper proposes a low-intrusion fault analysis method for HPC clusters and programs in production environments, based on the concepts of the Open Trace Format (OTF) and distributed link tracing. The method can monitor HPC programs in a production setting and can be applied to production programs with almost no code changes. The study then applies the method to the distributed "Trace Gather" sort program in a production environment for data collection and analysis, which confirms the effectiveness of the approach and identifies hidden software defects and performance problems in the program.
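As a rough illustration of what low-intrusion, link-style tracing can look like, the following Python sketch wraps existing functions with a decorator that records nested spans (trace ID, span ID, parent ID, timing, and any error). It is not the paper's implementation: the names `traced`, `EVENTS`, `sort_chunk`, and `sort_gather` are hypothetical, and the in-memory event list stands in for whatever trace storage a real cluster would use.

```python
import functools
import time
import uuid
from contextvars import ContextVar

# Hypothetical in-memory sink; a real deployment would write to a trace file or collector.
EVENTS = []
# Tracks the span that is active in the current execution context.
_current_span = ContextVar("current_span", default=None)


def traced(name):
    """Record a span around each call without editing the function body."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            parent = _current_span.get()
            span = {
                # Child spans inherit the trace ID of their parent.
                "trace_id": parent["trace_id"] if parent else uuid.uuid4().hex,
                "span_id": uuid.uuid4().hex[:16],
                "parent_id": parent["span_id"] if parent else None,
                "name": name,
                "start": time.perf_counter(),
                "error": None,
            }
            token = _current_span.set(span)
            try:
                return func(*args, **kwargs)
            except Exception as exc:
                # Keep the failure on the span, then let it propagate unchanged.
                span["error"] = repr(exc)
                raise
            finally:
                span["end"] = time.perf_counter()
                _current_span.reset(token)
                EVENTS.append(span)
        return wrapper
    return decorator


# Hypothetical stand-ins for stages of a distributed sort program.
@traced("sort_chunk")
def sort_chunk(chunk):
    return sorted(chunk)


@traced("sort_gather")
def sort_gather(chunks):
    merged = []
    for chunk in chunks:
        merged.extend(sort_chunk(chunk))
    return sorted(merged)


result = sort_gather([[3, 1], [2, 0]])
```

After the call, `EVENTS` holds one span per invocation, and the `parent_id` fields link each `sort_chunk` span to the enclosing `sort_gather` span, so slow or failing stages can be located from the recorded timings and errors. The only change to the traced functions is the added decorator line.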
Created:
2026-01-15



