
DataCenter-Traces-Datasets

Indexed by NIAID Data Ecosystem on 2026-05-02
Download link:
https://zenodo.org/record/14564846
Resource description:
Public datasets organized for machine learning or artificial intelligence usage. The following datasets can be used:

Alibaba 2018 machine usage

Processed from the original files found at: https://github.com/alibaba/clusterdata/tree/master/cluster-trace-v2018

This repository dataset of machine usage includes the following columns:

+------------------+--------+-------+-----------------------------------------------+
| Field            | Type   | Label | Comment                                       |
+------------------+--------+-------+-----------------------------------------------+
| cpu_util_percent | bigint |       | [0, 100]                                      |
| mem_util_percent | bigint |       | [0, 100]                                      |
| net_in           | double |       | normalized incoming network traffic, [0, 100] |
| net_out          | double |       | normalized outgoing network traffic, [0, 100] |
| disk_io_percent  | double |       | [0, 100], abnormal values are -1 or 101       |
+------------------+--------+-------+-----------------------------------------------+

Three sampled datasets are provided: the average value of each column grouped every 10 seconds (the original resolution), plus versions downsampled to 30 seconds and 300 seconds. Every column holds the average utilization of the whole data center.

Google 2019 instance usage

Processed from the original dataset and queried using BigQuery. More information available at: https://research.google/tools/datasets/google-cluster-workload-traces-2019/

This repository dataset of instance usage includes the following columns:

+----------------------------+--------+-------+---------+
| Field                      | Type   | Label | Comment |
+----------------------------+--------+-------+---------+
| avg_cpu                    | double |       | [0, 1]  |
| avg_mem                    | double |       | [0, 1]  |
| avg_assigned_mem           | double |       | [0, 1]  |
| avg_cycles_per_instruction | double |       | [0, _]  |
+----------------------------+--------+-------+---------+

One sampled dataset is provided: the average value of each column grouped every 300 seconds (the original resolution). Every column holds the average utilization of the whole data center.

Azure v2 virtual machine workload

Processed from the original dataset. More information available at: https://github.com/Azure/AzurePublicDataset/blob/master/AzurePublicDatasetV2.md

This repository dataset of instance usage includes the following columns:

+--------------+--------+-------+---------+
| Field        | Type   | Label | Comment |
+--------------+--------+-------+---------+
| cpu_usage    | double |       | [0, _]  |
| assigned_mem | double |       | [0, _]  |
+--------------+--------+-------+---------+

One sampled dataset is provided: the sum of each column grouped every 300 seconds (the original resolution). To compute cpu_usage, we used the core_count usage of each virtual machine. Every column holds the total consumption of all virtual machines in the data center.

Each file comes in two versions: one that includes timestamps (from 0 to 2591700, in 300-second timesteps) and one without timestamps.

Access Level

The dataset is freely accessible under an Open Access model. There are no restrictions for reuse, and it is licensed under [Creative Commons Attribution 4.0 (CC-BY 4.0)](https://creativecommons.org/licenses/by/4.0/).
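
As a rough illustration of the windowed averaging and the timestamp range described above, the following Python sketch averages consecutive samples into coarser windows and rebuilds a 0 to 2591700 timestamp axis in 300-second steps. The helper names `downsample_mean` and `timestamps` and the sample readings are hypothetical, and this is not necessarily how the dataset authors produced their files; averaging already-averaged 10-second values only matches a direct 30-second average when every window holds the same number of raw samples.

```python
from statistics import fmean


def downsample_mean(values, factor):
    """Average consecutive, non-overlapping blocks of `factor` samples.

    A trailing partial block is dropped so each output covers a full window.
    (Hypothetical helper, not part of the dataset's tooling.)
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    usable = len(values) - len(values) % factor
    return [fmean(values[i:i + factor]) for i in range(0, usable, factor)]


def timestamps(last, step):
    """Timestamps from 0 to `last` inclusive, spaced `step` seconds apart."""
    return list(range(0, last + step, step))


# Made-up cpu_util_percent readings at 10-second resolution.
cpu_10s = [10, 20, 30, 40, 50, 60, 70]

cpu_30s = downsample_mean(cpu_10s, 3)  # 3 x 10 s = one 30 s window
ts_300s = timestamps(2591700, 300)     # range stated in the description

print(cpu_30s)       # [20.0, 50.0]
print(len(ts_300s))  # 8640
```

The 8640 timesteps of 300 seconds each add up to 2,592,000 seconds, which is 30 days of trace.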
Created:
2024-12-28