Graphshot: A Large-Scale Program Representation Dataset for Machine Learning in Compilers
Source: Zenodo | Updated: 2026-05-12 | Indexed: 2026-05-26
Download link:
https://zenodo.org/doi/10.5281/zenodo.20130319
Description:
Various software analysis techniques and compiler optimizations rely heavily on understanding a program’s execution context to be effective. While dynamic execution profiles provide the ground truth for this behavior, collecting them is computationally expensive and introduces significant overhead to the compilation and analysis pipelines. As a result, there is growing interest in developing data-driven models capable of inferring program behavior statically. However, the software engineering and machine learning communities lack publicly available, large-scale datasets that reliably map static program structure to dynamic execution profiles.
To address this gap, we propose Graphshot, a comprehensive open-source dataset of execution profiles collected from over 400k benchmarks across two distinct benchmark suites. The dataset is intended to help researchers and compiler developers build analysis and inference models for downstream tasks such as program characterization and profile prediction. Graphshot combines static profiles collected at the level of the compiler's intermediate representation (IR) with dynamic profiles aggregated over multiple representative runs at different program granularities. Our open-source release also includes utilities that make the dataset compatible with graph-based ML models.
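The description does not document Graphshot's file layout or the API of its graph utilities. What follows is a purely illustrative sketch of how a static/dynamic pairing like this might be packed for a graph-based model. Static IR features for each basic block become node features. Control-flow edges become a COO edge list, the source/target layout that libraries such as PyTorch Geometric use for `edge_index`. Per-run execution counts are averaged into a per-node label. Several things here are hypothetical assumptions and not Graphshot's actual schema or method: `BlockProfile`, `ProgramGraph`, `to_graph`, the choice of basic-block granularity, the feature values, and the use of the mean as the aggregation.

```python
from dataclasses import dataclass


@dataclass
class BlockProfile:
    """Hypothetical per-basic-block record (NOT Graphshot's real schema)."""
    name: str
    static_features: list[float]  # assumed static IR features, e.g. instruction counts
    dynamic_counts: list[int]     # assumed execution count of the block in each profiling run


@dataclass
class ProgramGraph:
    """Hypothetical graph sample in a layout common to graph ML libraries."""
    node_features: list[list[float]]
    edge_index: tuple[list[int], list[int]]  # COO format: (source indices, target indices)
    labels: list[float]                      # dynamic profile aggregated per node


def to_graph(blocks: list[BlockProfile],
             cfg_edges: list[tuple[str, str]]) -> ProgramGraph:
    """Pack static features and aggregated dynamic counts into one graph sample."""
    index = {block.name: i for i, block in enumerate(blocks)}
    node_features = [list(block.static_features) for block in blocks]
    # Assumption: aggregate the runs with an arithmetic mean (requires >= 1 run per block).
    labels = [sum(block.dynamic_counts) / len(block.dynamic_counts) for block in blocks]
    sources = [index[src] for src, _ in cfg_edges]
    targets = [index[dst] for _, dst in cfg_edges]
    return ProgramGraph(node_features, (sources, targets), labels)


# Usage with made-up data: a three-block loop profiled over three runs.
blocks = [
    BlockProfile("entry", [3.0, 1.0], [1, 1, 1]),
    BlockProfile("loop", [5.0, 2.0], [10, 12, 11]),
    BlockProfile("exit", [2.0, 0.0], [1, 1, 1]),
]
edges = [("entry", "loop"), ("loop", "loop"), ("loop", "exit")]
graph = to_graph(blocks, edges)
print(graph.edge_index)  # ([0, 1, 1], [1, 1, 2])
print(graph.labels)      # [1.0, 11.0, 1.0]
```

Keeping the static features and the dynamic labels on the same node index is what allows a model to learn a mapping from static structure to observed behavior. A real pipeline would take its feature set, its granularity (block, function, or loop), and its aggregation from the dataset's own documentation.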
Provided by:
Zenodo
Created:
2026-05-12



