Dockerfiles Dataset
Source: arXiv | Updated: 2020-03-29 | Indexed: 2024-06-21
Download link:
https://doi.org/10.5281/zenodo.3628771
Description:
The Dockerfiles Dataset was created by researchers at the University of Wisconsin-Madison and Microsoft Research and contains approximately 178,000 unique Dockerfiles collected from GitHub. Through multiple levels of parsing and abstraction, it provides five distinct representations of the Dockerfiles, intended to support mining and static checking of Dockerfiles. Alongside the Dockerfiles themselves, the dataset includes related metadata and tools for converting from one representation to another. These resources provide a basis for research on advanced tooling for Docker and DevOps, in particular by addressing the challenge of parsing nested languages, and support work on automated rule mining and static checking.
Provided by:
University of Wisconsin-Madison, Microsoft Research
Created:
2020-03-29



