Graphshot: A Large-Scale Program Representation Dataset for Machine Learning in Compilers

Name: Graphshot: A Large-Scale Program Representation Dataset for Machine Learning in Compilers
Creator: Zenodo
Published: 2026-05-12 06:09:39
License: No description available

Zenodo, updated 2026-05-12, indexed 2026-05-26

Download link:

https://zenodo.org/doi/10.5281/zenodo.20130319



Description:

Many software analysis techniques and compiler optimizations depend on understanding a program's execution context. Dynamic execution profiles provide the ground truth for this behavior, but collecting them is computationally expensive and adds significant overhead to compilation and analysis pipelines. As a result, there is growing interest in data-driven models that infer program behavior statically. However, the software engineering and machine learning communities lack publicly available, large-scale datasets that reliably map static program structure to dynamic execution profiles. To address this gap, we present Graphshot, a comprehensive open-source dataset of execution profiles collected from over 400k benchmarks across two distinct suites. The dataset is intended to help researchers and compiler developers build analysis and inference models for downstream tasks such as program characterization and profile prediction. Graphshot merges static profiles collected at the compiler's intermediate representation level with dynamic profiles aggregated over multiple representative runs at different program granularities. The open-source release also includes utilities that make the dataset compatible with graph-based ML models.

Provider:

Zenodo

Created:

2026-05-12
