"Pod-Level Labelled Resource Utilization Dataset for Intrusion Detection in Kubernetes"

Name: "Pod-Level Labelled Resource Utilization Dataset for Intrusion Detection in Kubernetes"
Creator: IEEE DataPort
Published: 2025-12-16 09:19:10
License: No description available

Source: DataCite Commons | Updated: 2025-12-16 | Indexed: 2026-05-03

Download link:

https://ieee-dataport.org/documents/pod-level-labelled-resource-utilization-dataset-intrusion-detection-kubernetes

Resource description:

Abstract

This work presents a labeled resource-usage dataset for post-compromise intrusion detection in containerized Kubernetes environments. Two distinct operating modes, Normal Mode and Attack Mode, were emulated inside isolated Linux containers to capture realistic benign and malicious runtime behaviors. Resource metrics were collected externally using Prometheus exporters to ensure tamper-resilience. The resulting dataset provides high-resolution CPU, memory, and disk I/O traces suitable for training classical and temporal machine learning models for anomaly detection.

I. SYSTEM AND PLATFORM CONFIGURATION

A. Hardware Configuration

The experimental testbed was deployed on a semi-server-grade physical machine with the following specifications:

CPU: 32 physical cores
System Memory: 96 GB RAM

This configuration provides sufficient computational headroom to execute multiple high-load containerized workloads simultaneously without host-level contention, ensuring realistic multi-tenant behavior.

Docker was explicitly configured to allocate 64 GB of RAM for container operations. This allocation guarantees stable execution of concurrent Kubernetes pods under varying resource demands and prevents artificial throttling effects that could bias resource-usage measurements.

B. Platform Configuration

The software stack used in the experimental setup is summarized below:

Host Operating System: Ubuntu Linux
Container Runtime: Docker
Orchestration Platform: Kubernetes (single-node cluster)

Kubernetes was deployed on top of Docker and used to orchestrate all workloads within isolated namespaces. Multiple Linux-based workload pods were scheduled on the same node to simulate realistic multi-tenant resource contention, a key characteristic of production Kubernetes environments.

II. NORMAL MODE DATA GENERATION

Normal Mode aims to capture realistic, benign workload fluctuations typical of containerized applications. To generate this dataset, we executed a controlled load-simulation script inside a Kubernetes pod. The script introduces randomized CPU, memory, and I/O activity while remaining within safe operational limits.

A. CPU and Memory Behavior

The script employs stress-ng to generate light-to-moderate CPU activity (10–50%) at randomized intervals. Occasional brief spikes up to 100% emulate legitimate transient events such as software updates or computational bursts. Memory usage is similarly varied between low and moderate allocations (10–200 MB), with infrequent larger spikes. These fluctuations approximate natural application behavior and avoid the stationary patterns typical of malware.

B. Disk I/O Characteristics

Disk activity is generated by writing and reading small temporary files under a safe working directory. The volume ranges from 10 to 50 MB per cycle, reflecting typical benign I/O operations without risking container storage exhaustion. Files are cleaned up after each iteration to prevent accumulation.

C. Temporal Variability

The script introduces variability in the duration, amplitude, and frequency of resource utilization. This temporal heterogeneity ensures that the Normal Mode dataset captures the realistic noise and irregularities inherent to legitimate workloads.

III. ATTACK MODE DATA GENERATION

Attack Mode emulates post-compromise resource-abusive behavior such as cryptomining, data exfiltration, and memory flooding. The workload is generated by the stress-kali-final.sh script, executed within a compromised Kali Linux pod.

A. CPU Saturation

The script determines the number of available CPU cores and spawns enough busy workers to saturate approximately 85–90% of total CPU capacity. This deterministic high CPU load reflects the sustained operation of malicious mining or hashing processes.

B. Memory Flooding

Container memory limits are detected using cgroup introspection. Approximately 80–90% of memory is allocated and continuously touched to maintain residency. This behavior models memory-intensive attacks and post-compromise staging tools. A small headroom margin is preserved to prevent out-of-memory termination.

C. High-Volume Disk I/O (Exfiltration/Ransomware Emulation)

A repeating I/O loop writes large blocks of pseudorandom data until reaching the container's storage cap, performs sequential reads, truncates the file, and resumes growth. This cyclical pattern simulates bulk data extraction and ransomware-like file encryption activities.

D. Long-Duration Stationary Patterns

Unlike the variability of Normal Mode, Attack Mode exhibits stable, sustained, and repetitive load characteristics. This distinction is crucial for training models to detect long-term malicious persistence rather than short bursts.

IV. DATA SCRAPING AND LABELING

All resource metrics (CPU utilization, memory consumption, and disk throughput) were collected externally using Prometheus pod-level exporters at fixed intervals.

V. DATASET STRUCTURE AND FEATURE DESCRIPTION

The Normal Mode and Attack Mode datasets share an identical schema, enabling direct comparison and supervised learning. Each row represents a single timestamped observation. The columns are described below.

A. CPU Metrics

CPU-etcd-minikube: CPU utilization of the Kubernetes control-plane service (etcd) running on the Minikube node. This feature captures baseline system activity.
CPU-kalic: CPU utilization of the Kali Linux workload pod. This metric is the primary indicator of benign versus malicious compute behavior.

B. Memory Metrics

Memory_Cache-etcd-minikube: Cached memory usage of the etcd component. Reflects system-level memory behavior and background Kubernetes activity.
Memory_Cache-kalic: Cached memory usage of the Kali Linux pod, indicating memory reuse and buffering patterns.
Memory-etcd-minikube: Total memory consumption of the etcd service, representing baseline memory overhead.
Memory-kalic: Total memory consumption of the Kali Linux pod. This feature is critical for identifying memory-flooding attacks.

C. Disk I/O Metrics

write-etcd-minikube: Disk write throughput generated by the Kubernetes control-plane service.
write-kalic: Disk write throughput of the Kali Linux pod, capturing benign file operations in Normal Mode and high-volume I/O during attack scenarios.
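Because both modes share one schema, the two files can be labelled and turned into window-level features with very little code. The following is a minimal sketch, not the dataset authors' pipeline. It assumes each mode is distributed as a CSV file whose header uses the column names listed above and whose values parse as plain numbers; the label encoding (0 for Normal Mode, 1 for Attack Mode), the helper names, and the window length are all hypothetical. The trailing-window standard deviation is one possible way to expose the contrast the description draws between variable benign load and stationary attack load.

```python
import csv
import statistics

# Column names as listed in the dataset description.
FEATURES = [
    "CPU-etcd-minikube", "CPU-kalic",
    "Memory_Cache-etcd-minikube", "Memory_Cache-kalic",
    "Memory-etcd-minikube", "Memory-kalic",
    "write-etcd-minikube", "write-kalic",
]


def load_labelled(csv_file, label):
    """Read one mode's CSV and attach a class label to every row.

    `csv_file` is any open text file object. The label encoding
    (0 = Normal Mode, 1 = Attack Mode) is an assumption of this sketch.
    Extra columns, such as a timestamp, are ignored.
    """
    rows = []
    for record in csv.DictReader(csv_file):
        row = {name: float(record[name]) for name in FEATURES}
        row["label"] = label
        rows.append(row)
    return rows


def window_features(rows, column, window):
    """Mean and population standard deviation over each full trailing window.

    A low standard deviation over a long window is consistent with the
    sustained, repetitive load described for Attack Mode.
    """
    values = [row[column] for row in rows]
    features = []
    for end in range(window - 1, len(values)):
        chunk = values[end - window + 1 : end + 1]
        features.append({
            "mean": statistics.fmean(chunk),
            "std": statistics.pstdev(chunk),
            "label": rows[end]["label"],
        })
    return features
```

With hypothetical file names, usage might look like `with open("normal_mode.csv", newline="") as f: normal = load_labelled(f, 0)`, followed by `window_features(normal, "CPU-kalic", window=30)`. The window length should be chosen relative to the actual scrape interval, which the description does not state.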

Provider:

IEEE DataPort

Created:

2025-12-16
