Dataset for AI-Driven Cloud Misconfiguration Risk Classification in SOC Environments
Source: DataCite Commons. Updated: 2026-05-06. Indexed: 2026-05-07.
Download link:
https://zenodo.org/doi/10.5281/zenodo.20059679
Description:
This dataset was generated as part of the research project titled “Enabling AI-Driven Cloud SOC Resilience”.
The dataset contains 30,000 synthetically generated cloud configuration samples designed to support supervised machine learning-based risk classification of cloud misconfigurations within Security Operations Center (SOC) environments.
Each sample represents a cloud resource configuration and includes features associated with common cloud security misconfigurations, including:
- Publicly accessible storage
- Over-permissive IAM policies
- Open network ports
- Disabled encryption
- Unsecured endpoints
- Disabled logging and monitoring
A numerical risk scoring mechanism ranging from 1 to 5 was used during dataset generation to represent the severity and combination of misconfigurations. These scores were subsequently mapped into three risk classes:
- Low Risk (0): score 1
- Medium Risk (1): scores 2–3
- High Risk (2): scores 4–5
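
The score-to-class mapping above can be sketched in a few lines of Python. This is an illustrative reading of the description, not the authors' generation code; the function name `score_to_risk_class` and the `RISK_CLASS_NAMES` lookup are hypothetical.

```python
# Hypothetical helper: maps the 1-5 risk score described above
# to the three class labels (0 = Low, 1 = Medium, 2 = High).
RISK_CLASS_NAMES = {0: "Low Risk", 1: "Medium Risk", 2: "High Risk"}


def score_to_risk_class(score: int) -> int:
    """Return the risk class label for an integer risk score from 1 to 5."""
    if score == 1:
        return 0  # Low Risk: score 1
    if score in (2, 3):
        return 1  # Medium Risk: scores 2-3
    if score in (4, 5):
        return 2  # High Risk: scores 4-5
    raise ValueError(f"risk score must be an integer from 1 to 5, got {score!r}")


if __name__ == "__main__":
    for s in range(1, 6):
        label = score_to_risk_class(s)
        print(s, label, RISK_CLASS_NAMES[label])
```

Scores outside 1 to 5 raise an error rather than being clamped, since the description defines no class for them.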
The dataset was structured as flat tabular data in CSV format to facilitate preprocessing and compatibility with machine learning workflows. A balanced class distribution was maintained across the three risk categories.
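
Because the data is a flat CSV with one class label per row, the stated class balance can be checked with the standard library alone. The sketch below is a minimal example under assumptions: the column names (`public_storage`, `encryption_disabled`, `risk_class`) and the three sample rows are made up for illustration and are not taken from the actual file, whose schema is not given here.

```python
import csv
import io
from collections import Counter


def class_counts(csv_file, label_column="risk_class"):
    """Count rows per class label; `label_column` is an assumed column name."""
    reader = csv.DictReader(csv_file)
    return Counter(int(row[label_column]) for row in reader)


# Made-up stand-in for the real CSV; replace with open("<path>.csv", newline="").
sample = io.StringIO(
    "public_storage,encryption_disabled,risk_class\n"
    "0,0,0\n"
    "1,0,1\n"
    "1,1,2\n"
)

counts = class_counts(sample)
# The classes are balanced when every label has the same row count.
is_balanced = len(set(counts.values())) == 1
print(dict(counts), "balanced:", is_balanced)
```

On the real file, a perfectly balanced split of 30,000 samples across three classes would give 10,000 rows per label. The description states only that the distribution is balanced, so that figure should be verified rather than assumed.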
This dataset was developed to address the limited availability of publicly accessible cloud misconfiguration datasets and to support reproducibility of the experiments presented in the associated research work. It is intended for academic and research purposes related to cloud security, AI-driven SOC operations, risk scoring, and cybersecurity analytics.
Provider:
Zenodo
Created:
2026-05-06



