Graphshot: A Large-Scale Program Representation Dataset for Machine Learning in Compilers

Source: Zenodo. Updated 2026-05-12; indexed 2026-05-26.
Download link:
https://zenodo.org/doi/10.5281/zenodo.20130319
Description:
Many software analysis techniques and compiler optimizations are effective only when they understand a program's execution context. Dynamic execution profiles provide the ground truth for this behavior, but collecting them is computationally expensive and adds significant overhead to compilation and analysis pipelines. As a result, there is growing interest in data-driven models that can infer program behavior statically. However, the software engineering and machine learning communities lack publicly available, large-scale datasets that reliably map static program structure to dynamic execution profiles. To address this gap, we present Graphshot, a comprehensive open-source dataset of execution profiles collected from over 400k benchmarks across two distinct suites. The dataset is intended to help researchers and compiler developers build analysis and inference models for downstream tasks such as program characterization and profile prediction. Graphshot merges static profiles collected at the level of the compiler's intermediate representation with dynamic profiles aggregated over multiple representative runs at different program granularities. The open-source release also includes utilities that make the dataset compatible with graph-based ML models.
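To make the idea of merging static and dynamic profiles into a graph more concrete, here is a minimal sketch. It does not reflect Graphshot's actual schema or utilities; the `BasicBlock` class, the `to_graph` function, the choice of features, and the toy control-flow graph are all hypothetical. It assumes that each node is a basic block carrying one static feature (an instruction count) and that dynamic profiles are execution counts averaged over several runs.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List, Tuple


@dataclass
class BasicBlock:
    """Hypothetical node: one basic block with static and dynamic information."""
    name: str
    num_instructions: int                                 # static feature (assumed)
    run_counts: List[int] = field(default_factory=list)   # executions per profiled run


def to_graph(blocks: List[BasicBlock], cfg_edges: List[Tuple[str, str]]):
    """Convert blocks and control-flow edges into plain lists.

    node_features[i] = [static instruction count, mean dynamic execution count]
    edge_index       = [[source ids...], [target ids...]]
    """
    index = {b.name: i for i, b in enumerate(blocks)}
    node_features = [
        [float(b.num_instructions),
         float(mean(b.run_counts)) if b.run_counts else 0.0]
        for b in blocks
    ]
    sources = [index[src] for src, _ in cfg_edges]
    targets = [index[dst] for _, dst in cfg_edges]
    return node_features, [sources, targets]


# Toy example (made-up numbers): a single loop profiled over three runs.
blocks = [
    BasicBlock("entry", 4, [1, 1, 1]),
    BasicBlock("loop", 7, [10, 20, 30]),
    BasicBlock("exit", 2, [1, 1, 1]),
]
cfg_edges = [("entry", "loop"), ("loop", "loop"), ("loop", "exit")]

node_features, edge_index = to_graph(blocks, cfg_edges)
```

The two-row `edge_index` layout is used here because several graph learning libraries accept edges in that form. For the real file formats and feature definitions, consult the documentation that ships with the dataset.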
Provider:
Zenodo
Created:
2026-05-12