Simulated application load on a Kubernetes system based on Human traffic pattern
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/11210558
Resource description:
This record contains the dataset generated with the tool described in the report 'A Dynamic Kubernetes Load Generation Solution Mimicking Human Traffic Pattern'. The tool was created to fill a gap: the need to generate artificial load within Kubernetes infrastructure. It can serve as a stable testing framework for determining whether the autoscaling solutions in place can handle an expected load pattern. It can also serve as a standard benchmarking tool for algorithms that aim to improve existing autoscaling solutions. The tool is intended primarily for IaaS (Infrastructure as a Service) providers, who must treat applications as black boxes and do not know their scaling needs. Some of the data in this dataset were generated with reference to an existing Google dataset covering different containers. The reference dataset will also be uploaded in the near future.
The dataset consists of CSV files, each containing the pod name, average CPU usage, average memory usage, and the timestamp at which the data was collected. These measurements were taken directly from the Prometheus service, which uses the Kubernetes metrics server to scrape data from the pods. `avg_cpu_usage` is the average CPU utilization of the application, expressed as a percentage of the total CPU available, over a 1-minute window. `avg_memory_usage` is the average memory used, in bytes, over a 1-minute window. The CSV file names also indicate the time period over which the measurements were taken. All dates and times in the dataset are in the CEST timezone. All of the CSV files share the same layout.
The datasets were collected from a Kubernetes node running in a VM with the following specifications:
- CPU: Intel Core (Broadwell, no TSX, IBRS), 2 cores @ 2.594 GHz
- Memory: 4 GiB
- OS: Ubuntu 22.04.4 LTS x86_64
- Kernel: 5.15.0-105-generic
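Given the node specification above, the two metrics can be turned into absolute quantities. The sketch below assumes that `avg_cpu_usage` is a percentage of the node's combined capacity (so 100 % would mean both cores fully busy) and that the full 4 GiB is the relevant memory total; both are assumptions, since the description does not say whether the percentage is per core or per node, nor how much memory is allocatable.

```python
# Node specification from the dataset description: 2 cores, 4 GiB of memory.
NODE_CORES = 2
NODE_MEMORY_BYTES = 4 * 2**30


def cpu_pct_to_cores(avg_cpu_usage, cores=NODE_CORES):
    """Convert a percentage of the node's total CPU into a number of cores.

    Assumes the percentage is relative to all cores combined (100 % = 2 cores).
    """
    return avg_cpu_usage / 100 * cores


def mem_bytes_to_fraction(avg_memory_usage, total=NODE_MEMORY_BYTES):
    """Fraction of the node's 4 GiB of memory used by one sample."""
    return avg_memory_usage / total
```

Under these assumptions, a reading of 50 % corresponds to one full core, and 1 GiB of memory to a quarter of the node's RAM.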
The file name itself specifies the type of application load generated. File names can be divided into segments separated by `__`. The last two segments give the start time and the end time of the measurement data, respectively. The optional segment before these two identifies the container from the original input dataset that was used as a reference to produce the data.
Similarly, a file name may begin with an optional tag, `CPU`, `mem`, or `test`, which indicates the type of application load used to generate the data. `CPU` means the application was created with tool options that apply CPU-stressing algorithms from `stress-ng`, to better mimic a CPU-intensive application. `mem` primarily uses a test algorithm that stresses memory. `test` is a special option that combines several tests to produce a balanced load.
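The naming convention above can be parsed mechanically. The sketch below splits a name on `__`, takes the last two segments as the start and end times (kept as raw strings, since their format is not documented), and detects the optional load-type tag. It assumes the tag is followed by `_` or `__`; the example file name is hypothetical. Because the description does not say how to tell the optional reference-container segment apart from other leading segments, everything before the two time segments is returned unchanged in `other_segments`.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Load-type tags documented for the beginning of a file name.
LOAD_TYPE_TAGS = ("CPU", "mem", "test")


@dataclass
class FileNameInfo:
    load_type: Optional[str]  # "CPU", "mem", "test", or None when untagged
    start: str                # raw start-time segment (format not assumed)
    end: str                  # raw end-time segment (format not assumed)
    other_segments: List[str] = field(default_factory=list)  # may hold the reference container


def parse_file_name(name: str) -> FileNameInfo:
    """Split a dataset file name on "__" following the documented layout."""
    stem = name[:-4] if name.endswith(".csv") else name
    segments = stem.split("__")
    if len(segments) < 2:
        raise ValueError(f"expected start and end segments in {name!r}")
    *head, start, end = segments
    # Assumption: a tag is followed by "_" or "__", so "mem_..." matches
    # but a name such as "memcached__..." does not.
    load_type = next(
        (tag for tag in LOAD_TYPE_TAGS if stem.startswith(tag + "_")),
        None,
    )
    return FileNameInfo(load_type, start, end, head)
```

For the hypothetical name `CPU__container-a__2024-03-25T10-00__2024-03-25T10-34.csv`, this yields `load_type="CPU"`, the two time segments as `start` and `end`, and `other_segments=["CPU", "container-a"]`.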
In the real world, applications are never static: between runs with the same parameters, an application will always show some variation. To account for this, we added inherent randomness to the number of concurrent connections and to the time a virtual user waits before making a request. In some measurements, one or both of these parameters were frozen; these measurements are tagged with one of the following classes:
- fixed-load_fixed-sleep
- random-load_fixed-sleep
- fixed-load_random-sleep
Every other measurement, not marked with one of these tags in its name, belongs to the default class (random-load_random-sleep).
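A file's class can be recovered from its name as described above. The sketch below searches the name for the three frozen-parameter tags and falls back to the default class; since the position of the class tag within the name is not documented, a plain substring search is used as an assumption. A second helper reports which of the two parameters (concurrent connections, sleep time) was randomized in a given class.

```python
# The three frozen-parameter classes that can appear in file names; any other
# file belongs to the default random-load_random-sleep class.
TAGGED_CLASSES = (
    "fixed-load_fixed-sleep",
    "random-load_fixed-sleep",
    "fixed-load_random-sleep",
)
DEFAULT_CLASS = "random-load_random-sleep"


def load_class(file_name: str) -> str:
    """Return the load/sleep class encoded in a dataset file name."""
    for tag in TAGGED_CLASSES:
        if tag in file_name:  # assumption: the tag may appear anywhere in the name
            return tag
    return DEFAULT_CLASS


def randomized(class_name: str) -> dict:
    """Report which of the two parameters were randomized for a class."""
    load_part, sleep_part = class_name.split("_")
    return {
        "concurrent_connections": load_part.startswith("random"),
        "sleep_time": sleep_part.startswith("random"),
    }
```

None of the three tags is a substring of another, so the order of the search does not affect the result.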
The dataset was collected over 3 days, from 25-03-2024 to 28-03-2024. Each file generally contains 34 minutes of simulation data: during the first and last 2 minutes no traffic is directed at the application, so that the system can stabilize. The remaining 30 minutes of data correspond to the daily behavioral pattern observed over a 30-day period.
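When analyzing a file, the idle periods at both ends can be dropped to isolate the 30 minutes of active load. The sketch below assumes the first and last samples mark the start and end of the run and keeps only samples at least 2 minutes away from both, with inclusive boundaries. Because each sample is a 1-minute average, the exact boundary handling is a judgment call; the `IDLE` constant and the inclusive comparison are assumptions, not part of the dataset's definition.

```python
from datetime import datetime, timedelta

IDLE = timedelta(minutes=2)  # idle period at each end of a run, per the description


def active_window(samples):
    """Keep samples from the loaded section of a 34-minute run.

    `samples` is a list of (datetime, value) pairs. The run is assumed to
    span from the earliest to the latest timestamp; samples within 2 minutes
    of either end are dropped.
    """
    if not samples:
        return []
    ordered = sorted(samples, key=lambda s: s[0])
    start, end = ordered[0][0], ordered[-1][0]
    return [s for s in ordered if start + IDLE <= s[0] <= end - IDLE]
```

With one sample per minute from minute 0 to minute 34, this keeps minutes 2 through 32.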
Created: 2024-05-17



