
hq-bench/quitobench

Hugging Face · Updated 2026-03-30 · Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/hq-bench/quitobench
Description:
---
license: cc-by-4.0
task_categories:
- time-series-forecasting
language:
- en
tags:
- time-series
- forecasting
- application-traffic
- cloud-computing
- benchmark
- TSF-regime
- regime-balanced
- single-provenance
pretty_name: "QuitoBench: A High-Quality Open Time Series Forecasting Benchmark"
size_categories:
- 10M<n<100M
configs:
- config_name: hour
  data_files:
  - split: test
    path: v20260315/test_hour-00001-of-00001.parquet
  description: >
    Hourly evaluation split (1-hour granularity). 517 test series, each with
    15,356 time steps spanning 2021-11-18 to 2023-08-19. Test-set length per
    series: 552 steps.
- config_name: min
  data_files:
  - split: test
    path: v20260315/test_min-00001-of-00001.parquet
  description: >
    10-minute evaluation split (10-min granularity). 773 test series, each
    with 5,904 time steps spanning 2023-07-10 to 2023-08-19. Test-set length
    per series: 3,312 steps.
---

# QuitoBench

**QuitoBench** is a regime-balanced evaluation benchmark curated from **Quito**, a billion-scale, single-provenance time series dataset of application-traffic workloads from Alipay's production platform.

> 🌐 **Project Page:** [hq-bench.github.io/quito](https://hq-bench.github.io/quito/)
> 📄 **Paper:** [arXiv:2603.26017](https://arxiv.org/abs/2603.26017)
> 💻 **Code:** [github.com/alipay/quito](https://github.com/alipay/quito)
> 📦 **Training Corpus:** [hq-bench/quito-corpus](https://huggingface.co/datasets/hq-bench/quito-corpus)

---

## Dataset Overview

| | `hour` config | `min` config |
|---|---|---|
| Granularity | 1 hour | 10 minutes |
| # test series | 517 | 773 |
| Series length | 15,356 steps | 5,904 steps |
| Test-set length / series | 552 steps | 3,312 steps |
| Date range | 2021-11-18 → 2023-08-19 | 2023-07-10 → 2023-08-19 |
| # variates / series | 5 | 5 |

The 1,290 test series are **stratified across all eight trend × seasonality × forecastability (TSF) regime cells** (~160 series/cell), ensuring balanced evaluation.
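The step counts in the overview table follow from the date ranges and the card's global cutoff (2023-07-28 00:00 UTC). Below is a minimal sketch that re-derives them, assuming each range runs through the end of 2023-08-19 (last hourly step at 23:00, last 10-minute step at 23:50). The names `n_steps`, `CUTOFF`, `END_EXCLUSIVE`, and `MIN_START` are illustrative only and are not part of the dataset or its tooling.

```python
from datetime import datetime, timedelta

# Assumption: date ranges include the whole final day (2023-08-19),
# so the exclusive upper bound is midnight on 2023-08-20.
CUTOFF = datetime(2023, 7, 28)          # card's global train/test cutoff (UTC)
END_EXCLUSIVE = datetime(2023, 8, 20)   # hypothetical: one day past the last date
MIN_START = datetime(2023, 7, 10)       # start of the `min` config's range

HOUR = timedelta(hours=1)
TEN_MIN = timedelta(minutes=10)


def n_steps(start: datetime, end_exclusive: datetime, step: timedelta) -> int:
    """Count regularly spaced timestamps in [start, end_exclusive)."""
    # timedelta // timedelta returns an int in Python 3
    return (end_exclusive - start) // step


test_hour = n_steps(CUTOFF, END_EXCLUSIVE, HOUR)         # hourly test window
test_min = n_steps(CUTOFF, END_EXCLUSIVE, TEN_MIN)       # 10-minute test window
series_min = n_steps(MIN_START, END_EXCLUSIVE, TEN_MIN)  # full 10-minute series

# Eight cells from three factors means 2 x 2 x 2 levels
per_cell = (517 + 773) / 8  # matches the card's "~160 series/cell"

print(test_hour, test_min, series_min)  # 552 3312 5904
```

Both test windows cover the same 23 days (552 = 23 × 24 and 3,312 = 23 × 144). The hourly series length is not checked: 640 whole days from 2021-11-18 00:00 would give 15,360 steps, four more than the card's 15,356, so the hourly series apparently does not start at midnight.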
**Train/test split:** Global temporal cutoff at **2023-07-28 00:00:00 UTC**. Data before the cutoff is train (70%) / validation (20%); data from the cutoff onward is the test set.

---

## Schema

Each row represents one timestamp of one series (long/tidy format).

| Column | Type | Description |
|---|---|---|
| `item_id` | int64 | Unique series identifier |
| `date_time` | datetime64[ns] | UTC timestamp |
| `ind_1` … `ind_5` | float64 | Five anonymised traffic variates (NaN for missing) |

To reconstruct a single multivariate series: filter by `item_id`, sort by `date_time`, then apply the 2023-07-28 cutoff for train/test splits.

---

## Quick Start

```python
from datasets import load_dataset

# Load hourly test split
ds_hour = load_dataset("hq-bench/quitobench", "hour")
df_hour = ds_hour["test"].to_pandas()

# Load 10-minute test split
ds_min = load_dataset("hq-bench/quitobench", "min")
df_min = ds_min["test"].to_pandas()
```

### Reconstruct train/test splits

```python
import pandas as pd
from datasets import load_dataset

# Global cutoff, kept timezone-naive to match the plain datetime64[ns] `date_time` column
CUTOFF = pd.Timestamp("2023-07-28 00:00:00")
VARIATES = ["ind_1", "ind_2", "ind_3", "ind_4", "ind_5"]

df = load_dataset("hq-bench/quitobench", "hour")["test"].to_pandas()

# Pick one series and put it in time order
first_id = df["item_id"].iloc[0]
series = df[df["item_id"] == first_id].sort_values("date_time")

# Before the cutoff: history (train/validation); from the cutoff on: test window
train = series[series["date_time"] < CUTOFF]
test = series[series["date_time"] >= CUTOFF]

X_train = train[VARIATES].to_numpy()  # shape (n_history_steps, 5)
X_test = test[VARIATES].to_numpy()    # per the card, 552 rows for the hourly config
```

---

## License

[CC BY 4.0](https://creativecommons.org/licenses/by/4.0/)

## Citation

```bibtex
@article{xue2026quitobench,
  title   = {{QuitoBench}: A High-Quality Open Time Series Forecasting Benchmark},
  author  = {Xue, Siqiao and Zhu, Zhaoyang and Zhang, Wei and Cai, Rongyao and Wang, Rui and Mu, Yixiang and Zhou, Fan and Li, Jianguo and Di, Peng and Yu, Hang},
  journal = {arXiv preprint arXiv:2603.26017},
  year    = {2026},
  url     = {https://arxiv.org/abs/2603.26017}
}
```
Provided by:
hq-bench