
cloudops_tsf

ModelScope Community | Updated 2025-12-05 | Indexed 2025-09-13
Download link:
https://modelscope.cn/datasets/Salesforce/cloudops_tsf
Description:
# Pushing the Limits of Pre-training for Time Series Forecasting in the CloudOps Domain

[Paper](https://arxiv.org/abs/2310.05063) | [Code](https://github.com/SalesforceAIResearch/pretrain-time-series-cloudops)

Datasets accompanying the paper "Pushing the Limits of Pre-training for Time Series Forecasting in the CloudOps Domain".

## Quick Start

```bash
pip install datasets==2.12.0 fsspec==2023.5.0
```

### azure_vm_traces_2017

```python
from datasets import load_dataset

dataset = load_dataset('Salesforce/cloudops_tsf', 'azure_vm_traces_2017')
print(dataset)
# Output:
# DatasetDict({
#     train_test: Dataset({
#         features: ['start', 'target', 'item_id', 'feat_static_cat', 'feat_static_real', 'past_feat_dynamic_real'],
#         num_rows: 17568
#     })
#     pretrain: Dataset({
#         features: ['start', 'target', 'item_id', 'feat_static_cat', 'feat_static_real', 'past_feat_dynamic_real'],
#         num_rows: 159472
#     })
# })
```

### borg_cluster_data_2011

```python
from datasets import load_dataset

dataset = load_dataset('Salesforce/cloudops_tsf', 'borg_cluster_data_2011')
print(dataset)
# Output:
# DatasetDict({
#     train_test: Dataset({
#         features: ['start', 'target', 'item_id', 'feat_static_cat', 'past_feat_dynamic_real'],
#         num_rows: 11117
#     })
#     pretrain: Dataset({
#         features: ['start', 'target', 'item_id', 'feat_static_cat', 'past_feat_dynamic_real'],
#         num_rows: 143386
#     })
# })
```

### alibaba_cluster_trace_2018

```python
from datasets import load_dataset

dataset = load_dataset('Salesforce/cloudops_tsf', 'alibaba_cluster_trace_2018')
print(dataset)
# Output:
# DatasetDict({
#     train_test: Dataset({
#         features: ['start', 'target', 'item_id', 'feat_static_cat', 'past_feat_dynamic_real'],
#         num_rows: 6048
#     })
#     pretrain: Dataset({
#         features: ['start', 'target', 'item_id', 'feat_static_cat', 'past_feat_dynamic_real'],
#         num_rows: 58409
#     })
# })
```

## Dataset Config

```python
from datasets import load_dataset_builder

config = load_dataset_builder('Salesforce/cloudops_tsf', 'azure_vm_traces_2017').config
print(config)
# Output:
# CloudOpsTSFConfig(
#     name='azure_vm_traces_2017',
#     version=1.0.0,
#     data_dir=None,
#     data_files=None,
#     description='',
#     prediction_length=48,
#     freq='5T',
#     stride=48,
#     univariate=True,
#     multivariate=False,
#     optional_fields=(
#         'feat_static_cat',
#         'feat_static_real',
#         'past_feat_dynamic_real'
#     ),
#     rolling_evaluations=12,
#     test_split_date=Period('2016-12-13 15:55', '5T'),
#     _feat_static_cat_cardinalities={
#         'pretrain': (
#             ('vm_id', 177040),
#             ('subscription_id', 5514),
#             ('deployment_id', 15208),
#             ('vm_category', 3)
#         ),
#         'train_test': (
#             ('vm_id', 17568),
#             ('subscription_id', 2713),
#             ('deployment_id', 3255),
#             ('vm_category', 3)
#         )
#     },
#     target_dim=1,
#     feat_static_real_dim=3,
#     past_feat_dynamic_real_dim=2
# )
```

`test_split_date` is provided so that the train-test split used in the paper can be reproduced. It is essentially the date/time that lies `rolling_evaluations * prediction_length` time steps before the last time step in the dataset. Note that the pre-training dataset includes the test region and should therefore also be filtered before use.

## Acknowledgements

The datasets were processed from the following original sources. Please cite the original sources if you use the datasets.

* Azure VM Traces 2017
    * Bianchini. Resource central: Understanding and predicting workloads for improved resource management in large cloud platforms. In Proceedings of the 26th Symposium on Operating Systems Principles, pp. 153–167, 2017.
    * https://github.com/Azure/AzurePublicDataset
* Borg Cluster Data 2011
    * John Wilkes. More Google cluster data. Google research blog, November 2011. Posted at http://googleresearch.blogspot.com/2011/11/more-google-cluster-data.html.
    * https://github.com/google/cluster-data
* Alibaba Cluster Trace 2018
    * Jing Guo, Zihao Chang, Sa Wang, Haiyang Ding, Yihui Feng, Liang Mao, and Yungang Bao. Who limits the resource efficiency of my datacenter: An analysis of alibaba datacenter traces. In Proceedings of the International Symposium on Quality of Service, pp. 1–10, 2019.
    * https://github.com/alibaba/clusterdata

## Citation

<pre>
@article{woo2023pushing,
  title={Pushing the Limits of Pre-training for Time Series Forecasting in the CloudOps Domain},
  author={Woo, Gerald and Liu, Chenghao and Kumar, Akshat and Sahoo, Doyen},
  journal={arXiv preprint arXiv:2310.05063},
  year={2023}
}
</pre>

## Ethical Considerations

This release is for research purposes only in support of an academic paper. Our models, datasets, and code are not specifically designed or evaluated for all downstream purposes. We strongly recommend users evaluate and address potential concerns related to accuracy, safety, and fairness before deploying this model. We encourage users to consider the common limitations of AI, comply with applicable laws, and leverage best practices when selecting use cases, particularly for high-risk scenarios where errors or misuse could significantly impact people’s lives, rights, or safety. For further guidance on use cases, refer to our AUP and AI AUP.
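
As a worked example of the `test_split_date` rule stated in the Dataset Config section, the arithmetic for `azure_vm_traces_2017` can be sketched with the standard library alone. The constants are copied from the printed config; the "implied last time step" is only what follows if the description is taken literally, and it has not been checked against the data itself.

```python
from datetime import datetime, timedelta

# Values copied from the CloudOpsTSFConfig printed for azure_vm_traces_2017.
PREDICTION_LENGTH = 48
ROLLING_EVALUATIONS = 12
FREQ = timedelta(minutes=5)  # freq='5T' denotes 5-minute steps
TEST_SPLIT_DATE = datetime(2016, 12, 13, 15, 55)

# Number of time steps held out for testing.
test_steps = ROLLING_EVALUATIONS * PREDICTION_LENGTH

# Wall-clock length of the held-out region.
test_span = test_steps * FREQ

# ASSUMPTION: last time step implied by the description
# (split date plus the held-out steps); not verified against the dataset.
implied_last_step = TEST_SPLIT_DATE + test_span

print(test_steps)         # 576
print(test_span)          # 2 days, 0:00:00
print(implied_last_step)  # 2016-12-15 15:55:00
```

In other words, the 12 rolling evaluations of 48 steps each cover 576 five-minute steps, or two days of data after the split point.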

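
The Dataset Config section notes that the pre-training split contains the test region and should be filtered before use. The following is a minimal sketch of one way to do that for a single univariate series, not the authors' procedure. The helper name `truncate_at_split` is hypothetical, and it assumes that `start` is available as a `datetime`, that `target` is a flat list of values at a fixed step, and that the observation at the split date itself belongs to the training side.

```python
from datetime import datetime, timedelta
from typing import List


def truncate_at_split(
    start: datetime,
    target: List[float],
    split: datetime,
    step: timedelta,
) -> List[float]:
    """Keep only observations whose timestamp is <= split.

    ASSUMPTION: the split boundary is inclusive; the source does not say so.
    """
    if split < start:
        # The whole series lies in the test region, so nothing is kept.
        return []
    # Number of steps from start up to and including the split timestamp.
    n_keep = (split - start) // step + 1
    return target[:n_keep]


# Toy series with observations at 15:40, 15:45, ..., 16:05.
series_start = datetime(2016, 12, 13, 15, 40)
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
split_date = datetime(2016, 12, 13, 15, 55)

kept = truncate_at_split(series_start, values, split_date, timedelta(minutes=5))
print(kept)  # [1.0, 2.0, 3.0, 4.0]
```

Series that start after the split date come back empty and can be dropped; series that end before it are returned unchanged.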
Provider:
maas
Created:
2025-08-17