hq-bench/quitobench
Hugging Face · Updated 2026-03-30 · Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/hq-bench/quitobench
Resource description:
---
license: cc-by-4.0
task_categories:
- time-series-forecasting
language:
- en
tags:
- time-series
- forecasting
- application-traffic
- cloud-computing
- benchmark
- TSF-regime
- regime-balanced
- single-provenance
pretty_name: "QuitoBench: A High-Quality Open Time Series Forecasting Benchmark"
size_categories:
- 10M<n<100M
configs:
- config_name: hour
  data_files:
  - split: test
    path: v20260315/test_hour-00001-of-00001.parquet
  description: >
    Hourly evaluation split (1-hour granularity). 517 test series, each with 15,356 time steps
    spanning 2021-11-18 to 2023-08-19. Test-set length per series: 552 steps.
- config_name: min
  data_files:
  - split: test
    path: v20260315/test_min-00001-of-00001.parquet
  description: >
    10-minute evaluation split (10-min granularity). 773 test series, each with 5,904 time steps
    spanning 2023-07-10 to 2023-08-19. Test-set length per series: 3,312 steps.
---
# QuitoBench
**QuitoBench** is a regime-balanced evaluation benchmark curated from **Quito**, a billion-scale,
single-provenance time series dataset of application-traffic workloads from Alipay's production
platform.
> 🌐 **Project Page:** [hq-bench.github.io/quito](https://hq-bench.github.io/quito/)
> 📄 **Paper:** [arXiv:2603.26017](https://arxiv.org/abs/2603.26017)
> 💻 **Code:** [github.com/alipay/quito](https://github.com/alipay/quito)
> 📦 **Training Corpus:** [hq-bench/quito-corpus](https://huggingface.co/datasets/hq-bench/quito-corpus)
---
## Dataset Overview
| | `hour` config | `min` config |
|---|---|---|
| Granularity | 1 hour | 10 minutes |
| # test series | 517 | 773 |
| Series length | 15,356 steps | 5,904 steps |
| Test-set length / series | 552 steps | 3,312 steps |
| Date range | 2021-11-18 → 2023-08-19 | 2023-07-10 → 2023-08-19 |
| # variates / series | 5 | 5 |

The 1,290 test series are **stratified across all eight trend × seasonality × forecastability
(TSF) regime cells** (~160 series/cell), ensuring balanced evaluation.

**Train/test split:** Global temporal cutoff at **2023-07-28 00:00:00 UTC**. Data before the
cutoff is train (70%) / validation (20%); data from the cutoff onward is the test set.

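
The test-set lengths quoted above follow from the cutoff and the sampling granularity. Below is a quick arithmetic check, assuming each series runs through the end of 2023-08-19 (an assumption inferred from the stated date range, not spelled out on this card); `expected_test_steps` is a hypothetical helper name.

```python
import pandas as pd

CUTOFF = pd.Timestamp("2023-07-28 00:00:00")
# Assumption: the last observation falls on 2023-08-19, so the exclusive
# end of the test window is midnight on 2023-08-20.
END_EXCLUSIVE = pd.Timestamp("2023-08-20 00:00:00")

def expected_test_steps(step: pd.Timedelta) -> int:
    """Number of whole sampling steps between the cutoff and the window end."""
    return (END_EXCLUSIVE - CUTOFF) // step

hour_steps = expected_test_steps(pd.Timedelta(hours=1))     # 23 days x 24
min_steps = expected_test_steps(pd.Timedelta(minutes=10))   # 23 days x 144
print(hour_steps, min_steps)
```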
---
## Schema
Each row represents one timestamp of one series (long/tidy format).
| Column | Type | Description |
|---|---|---|
| `item_id` | int64 | Unique series identifier |
| `date_time` | datetime64[ns] | UTC timestamp |
| `ind_1` … `ind_5` | float64 | Five anonymised traffic variates (NaN for missing) |

To reconstruct a single multivariate series: filter by `item_id`, sort by `date_time`, then
apply the 2023-07-28 cutoff for train/test splits.

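
The same filter-and-sort recipe can be applied to every series at once with a `groupby`. The following is a minimal sketch on a small synthetic frame that mimics the schema; the helper name `to_series_arrays` and the toy values are illustrative assumptions, not part of the dataset tooling.

```python
import numpy as np
import pandas as pd

VARIATES = ["ind_1", "ind_2", "ind_3", "ind_4", "ind_5"]

def to_series_arrays(df: pd.DataFrame) -> dict:
    """Map each item_id to a (time, 5) float array ordered by date_time."""
    arrays = {}
    for item_id, group in df.groupby("item_id", sort=True):
        ordered = group.sort_values("date_time")
        arrays[item_id] = ordered[VARIATES].to_numpy(dtype="float64")
    return arrays

# Toy long-format frame: two series, three hourly steps each.
times = pd.date_range("2023-07-27 22:00:00", periods=3, freq=pd.Timedelta(hours=1))
toy = pd.DataFrame({
    "item_id": np.repeat([1, 2], 3),
    "date_time": np.tile(times, 2),
})
for k, name in enumerate(VARIATES):
    toy[name] = np.arange(6, dtype="float64") + 10 * k

# Shuffle the rows to show that the helper restores chronological order.
toy = toy.sample(frac=1.0, random_state=0).reset_index(drop=True)

series_arrays = to_series_arrays(toy)
print({item_id: arr.shape for item_id, arr in series_arrays.items()})
```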
---
## Quick Start
```python
from datasets import load_dataset
# Load hourly test split
ds_hour = load_dataset("hq-bench/quitobench", "hour")
df_hour = ds_hour["test"].to_pandas()
# Load 10-minute test split
ds_min = load_dataset("hq-bench/quitobench", "min")
df_min = ds_min["test"].to_pandas()
```
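
After loading, it can be worth confirming that a frame matches the figures in the overview table. This is a minimal sketch, assuming the frame follows the schema on this card; `summarize_config` is a hypothetical helper, and the toy frame stands in for `df_hour` so the snippet runs without a download.

```python
import pandas as pd

def summarize_config(df: pd.DataFrame) -> dict:
    """Return the number of series and the distinct per-series row counts."""
    lengths = df.groupby("item_id")["date_time"].size()
    return {
        "n_series": int(lengths.shape[0]),
        "series_lengths": sorted(int(n) for n in lengths.unique()),
    }

# Toy stand-in for df_hour: 3 series with 4 hourly timestamps each.
times = pd.date_range("2023-07-27 22:00:00", periods=4, freq=pd.Timedelta(hours=1))
toy = pd.DataFrame({
    "item_id": [i for i in range(3) for _ in times],
    "date_time": list(times) * 3,
})

summary = summarize_config(toy)
print(summary)
```

If the figures on this card hold, calling the helper on `df_hour` should report 517 series of length 15,356, and on `df_min` 773 series of length 5,904.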
### Reconstruct train/test splits
```python
import pandas as pd
from datasets import load_dataset

# Global temporal cutoff (timestamps in `date_time` are UTC)
CUTOFF = pd.Timestamp("2023-07-28 00:00:00")
df = load_dataset("hq-bench/quitobench", "hour")["test"].to_pandas()

# Pick one series and sort it chronologically
series = df[df["item_id"] == df["item_id"].iloc[0]].sort_values("date_time")

# Rows before the cutoff are train/validation; rows from the cutoff onward are test
train = series[series["date_time"] < CUTOFF]
test = series[series["date_time"] >= CUTOFF]
X_train = train[["ind_1", "ind_2", "ind_3", "ind_4", "ind_5"]].values
X_test = test[["ind_1", "ind_2", "ind_3", "ind_4", "ind_5"]].values
```
---
## License
[CC BY 4.0](https://creativecommons.org/licenses/by/4.0/)
## Citation
```bibtex
@article{xue2026quitobench,
title = {{QuitoBench}: A High-Quality Open Time Series Forecasting Benchmark},
author = {Xue, Siqiao and Zhu, Zhaoyang and Zhang, Wei and
Cai, Rongyao and Wang, Rui and
Mu, Yixiang and Zhou, Fan and Li, Jianguo and Di, Peng and Yu, Hang},
journal = {arXiv preprint arXiv:2603.26017},
year = {2026},
url = {https://arxiv.org/abs/2603.26017}
}
```
Provider:
hq-bench



