TerraDS: A Dataset for Terraform HCL Programs

Indexed by NIAID Data Ecosystem: 2026-05-02
Download link:
https://zenodo.org/record/14217385
Resource description:
TerraDS

The TerraDS dataset provides a comprehensive collection of Terraform programs written in the HashiCorp Configuration Language (HCL). As Infrastructure as Code (IaC) gains popularity for managing cloud infrastructure, Terraform has become one of the leading tools thanks to its declarative nature and widespread adoption. However, the lack of large-scale, publicly available datasets has hindered systematic research on Terraform practices. TerraDS addresses this gap by compiling metadata and source code from 62,406 open-source repositories with valid licenses. The dataset aims to foster research on best practices, vulnerabilities, and improvements in IaC methodologies.

Structure of the Database

The TerraDS dataset is organized into two main components: a SQLite database containing metadata and an archive of source code (~335 MB). The metadata is captured in a structured format and covers repositories, modules, and resources:

1. Repository Data: 62,406 repositories with fields such as repository name, creation date, star count, and permissive license details. Each entry provides a cloneable URL for access and analysis. It also tracks additional metrics such as repository size and the latest commit details.
2. Module Data: 279,344 modules identified within the repositories. Each module includes its relative path, referenced providers, and external module calls stored as JSON objects.
3. Resource Data: 1,773,991 resources, split into managed (1,484,185) and data (289,806) resources. Each resource entry records its type, its provider, and whether it is managed or read-only (data).

Structure of the Archive

The archive contains the source code of all 62,406 repositories, so that analyses can be based on the actual source rather than on the metadata alone. Researchers can thus access the permissively licensed repositories and study the HCL code directly.
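The metadata layout described above can be explored with Python's built-in sqlite3 module. The following is a minimal sketch under a hypothetical schema. The table names (repository, module, resource) and the columns (providers, module_calls, is_managed) are illustrative assumptions, not taken from the TerraDS database. The rows are toy values, not dataset contents. The sketch counts managed versus data resources and decodes a module's JSON-encoded provider list.

```python
import json
import sqlite3

# HYPOTHETICAL schema: table and column names are illustrative assumptions,
# not the actual TerraDS schema. Inspect the real file first, e.g. with
# "SELECT name, sql FROM sqlite_master", and adapt the queries accordingly.
SCHEMA = """
CREATE TABLE repository (id INTEGER PRIMARY KEY, name TEXT, stars INTEGER, license TEXT);
CREATE TABLE module (id INTEGER PRIMARY KEY, repository_id INTEGER, path TEXT,
                     providers TEXT, module_calls TEXT);
CREATE TABLE resource (id INTEGER PRIMARY KEY, module_id INTEGER, type TEXT,
                       provider TEXT, is_managed INTEGER);
"""


def build_toy_db() -> sqlite3.Connection:
    """Create an in-memory database holding a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO repository VALUES (1, 'example/infra', 42, 'MIT')")
    # Providers and module calls are stored as JSON text, mirroring the description.
    conn.execute(
        "INSERT INTO module VALUES (1, 1, 'modules/network', ?, ?)",
        (json.dumps(["aws"]), json.dumps([{"source": "terraform-aws-modules/vpc/aws"}])),
    )
    # is_managed = 1 for managed resources, 0 for data (read-only) resources.
    conn.executemany(
        "INSERT INTO resource VALUES (?, 1, ?, 'aws', ?)",
        [(1, "aws_vpc", 1), (2, "aws_subnet", 1), (3, "aws_ami", 0)],
    )
    return conn


def resource_split(conn: sqlite3.Connection) -> tuple[int, int]:
    """Return (managed, data) resource counts."""
    rows = conn.execute(
        "SELECT is_managed, COUNT(*) FROM resource GROUP BY is_managed"
    ).fetchall()
    counts = {bool(flag): n for flag, n in rows}
    return counts.get(True, 0), counts.get(False, 0)


def module_providers(conn: sqlite3.Connection) -> dict[str, list[str]]:
    """Map each module path to its decoded provider list."""
    return {
        path: json.loads(providers)
        for path, providers in conn.execute("SELECT path, providers FROM module")
    }


if __name__ == "__main__":
    conn = build_toy_db()
    print(resource_split(conn))
    print(module_providers(conn))
    # The published totals are consistent: managed + data = all resources.
    assert 1_484_185 + 289_806 == 1_773_991
```

Storing list-valued fields as JSON text keeps the schema flat. The trade-off is that each value must be decoded in Python, or with SQLite's JSON functions where available, before it can be filtered on.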
Tools

The "HCL Dataset Tools" file contains a snapshot of the https://github.com/prg-grp/hcl-dataset-tools repository, included for long-term archival. The tools in this repository can be used to reproduce this dataset. One of them, "RepositorySearcher", can fetch metadata for other GitHub API queries as well, not only for Terraform code. The remaining tools focus on Terraform repositories.
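The following minimal Python sketch illustrates querying GitHub's public repository-search endpoint. It is independent of RepositorySearcher, whose interface is not described here. It is not that tool's code and makes no claim about how the tool works. The `language:HCL` query and the month-by-month `created:` split are illustrative assumptions; the split is a common way to work around GitHub's cap of 1,000 results per search query. The functions only build URLs and make no network requests. They also do not handle authentication or rate limits.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

# Public GitHub REST API endpoint for repository search.
SEARCH_URL = "https://api.github.com/search/repositories"


def build_search_url(query: str, page: int = 1, per_page: int = 100) -> str:
    """Return the URL for one page of repository-search results.

    GitHub returns at most 100 results per page and 1,000 results per query.
    """
    params = {"q": query, "per_page": per_page, "page": page}
    return f"{SEARCH_URL}?{urlencode(params)}"


def monthly_queries(base: str, year: int) -> list[str]:
    """Split one query into twelve narrower ones by repository creation month.

    Narrower queries help keep each result set under the 1,000-result cap.
    """
    queries = []
    for month in range(1, 13):
        start = date(year, month, 1)
        # First day of the following month minus one day = last day of this month.
        next_month = date(year + (month == 12), month % 12 + 1, 1)
        end = next_month - timedelta(days=1)
        queries.append(f"{base} created:{start.isoformat()}..{end.isoformat()}")
    return queries


if __name__ == "__main__":
    # Print the URLs for the first page of January and February 2020.
    for q in monthly_queries("language:HCL", 2020)[:2]:
        print(build_search_url(q))
```

A real harvester would also page through each query's results and respect the API's rate limits. Those concerns are deliberately left out of this sketch.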
Created: 2024-11-27