abhishek8aiml/METR-LA

Hugging Face, updated 2026-01-02, indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/abhishek8aiml/METR-LA
Description:
---
license: mit
task_categories:
- time-series-forecasting
- tabular-regression
tags:
- traffic-prediction
- time-series
- graph-neural-networks
- transportation
size_categories:
- 1M<n<10M
---

# METR-LA Traffic Dataset

## Dataset Description

This dataset contains traffic flow data for time series forecasting tasks, commonly used with Graph Neural Networks and specifically the Diffusion Convolutional Recurrent Neural Network (DCRNN) model.

## Dataset Structure

### Data Format

- **Format**: Parquet files for efficient loading and analysis
- **Splits**: train (70%), validation (10%), test (20%); **temporal splits** preserving chronological order
- **Features**: Time series traffic flow data with temporal and spatial dimensions

### Split Strategy

- **Temporal splitting**: Data is split chronologically to prevent data leakage
- **All sensors included**: Each split contains data for all sensors at each time step
- **Training period**: Earliest 70% of time samples across all sensors
- **Validation period**: Next 10% of time samples across all sensors
- **Test period**: Latest 20% of time samples across all sensors
- **Graph structure preserved**: Spatial relationships maintained in all splits

### Data Schema

- `node_id`: Sensor/node identifier (0-206 for METR-LA, 0-324 for PEMS-BAY)
- `t0_timestamp`: ISO 8601 timestamp of the reference time point (t+0) for each sequence
- `x_t*_d*`: Input features at different time offsets and dimensions
  - `x_t-11_d0` to `x_t+0_d0`: Traffic flow values at 12 historical time steps
  - `x_t-11_d1` to `x_t+0_d1`: Time-of-day features (normalized 0-1)
- `y_t*_d*`: Target values at future time steps and dimensions
  - `y_t+1_d0` to `y_t+12_d0`: Traffic flow predictions for next 12 time steps
  - `y_t+1_d1` to `y_t+12_d1`: Time-of-day features for prediction horizon

### Dataset Statistics

- **Total time series samples**: ~34K (METR-LA) / ~52K (PEMS-BAY)
- **Total records**: ~7M (METR-LA) / ~17M (PEMS-BAY)
- **Records per sample**: 207 (METR-LA) / 325 (PEMS-BAY) sensors
- **Temporal resolution**: 5-minute intervals
- **Prediction horizon**: 1 hour (12 time steps)

## Usage

```python
from datasets import Dataset, DatasetDict
import pandas as pd

# Load from local parquet files
train_df = pd.read_parquet("METR-LA/train.parquet")
val_df = pd.read_parquet("METR-LA/val.parquet")
test_df = pd.read_parquet("METR-LA/test.parquet")

ds = DatasetDict({
    "train": Dataset.from_pandas(train_df, preserve_index=False),
    "val": Dataset.from_pandas(val_df, preserve_index=False),
    "test": Dataset.from_pandas(test_df, preserve_index=False),
})

print(f"Train records: {len(ds['train']):,}")
print(f"Val records: {len(ds['val']):,}")
print(f"Test records: {len(ds['test']):,}")
```

## Citation

If you use this dataset, please cite the original DCRNN paper:

```bibtex
@inproceedings{li2018dcrnn_traffic,
  title={Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting},
  author={Li, Yaguang and Yu, Rose and Shahabi, Cyrus and Liu, Yan},
  booktitle={International Conference on Learning Representations},
  year={2018}
}
```

## Dataset Generation

The code used to generate this Hugging Face-compatible dataset can be found at [witgaw/DCRNN](https://github.com/witgaw/DCRNN), a fork of the original DCRNN repository with enhanced data processing capabilities.

## Original Data Source

This dataset is derived from the original METR-LA dataset used in the DCRNN paper.

## License

MIT License - See the [original repository LICENSE](https://github.com/liyaguang/DCRNN/blob/master/LICENSE) for details.
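As a rough illustration of the column layout listed under Data Schema, the sketch below rebuilds the `x_t*_d*` and `y_t*_d*` column names and reshapes one table of records into arrays of shape `(records, 12, 2)`. The helpers `column_names` and `to_arrays` and the constants are hypothetical names introduced here, not part of the dataset or of any library, and a small synthetic frame stands in for the real Parquet files. It assumes the column names match the schema exactly as written in the card.

```python
import numpy as np
import pandas as pd

# Offsets taken from the schema in the card (assumed to match the files exactly).
X_OFFSETS = list(range(-11, 1))  # 12 historical steps: t-11 .. t+0
Y_OFFSETS = list(range(1, 13))   # 12 future steps: t+1 .. t+12
DIMS = [0, 1]                    # d0 = traffic value, d1 = time of day


def column_names(prefix, offsets, dims=DIMS):
    """Build names such as 'x_t-11_d0', ordered by time step, then dimension."""
    return [f"{prefix}_t{o:+d}_d{d}" for o in offsets for d in dims]


def to_arrays(df):
    """Return (x, y), each shaped (records, 12, 2), from the wide layout."""
    n = len(df)
    x = df[column_names("x", X_OFFSETS)].to_numpy(dtype=np.float32)
    y = df[column_names("y", Y_OFFSETS)].to_numpy(dtype=np.float32)
    # Columns are step-major, so a row-major reshape yields [record, step, dim].
    return (x.reshape(n, len(X_OFFSETS), len(DIMS)),
            y.reshape(n, len(Y_OFFSETS), len(DIMS)))


# Synthetic stand-in for a few rows of train.parquet (three sensors, one time point).
rng = np.random.default_rng(0)
cols = column_names("x", X_OFFSETS) + column_names("y", Y_OFFSETS)
demo = pd.DataFrame(rng.random((3, len(cols))), columns=cols)
demo.insert(0, "node_id", [0, 1, 2])

x, y = to_arrays(demo)
print(x.shape, y.shape)  # (3, 12, 2) (3, 12, 2)
```

With the real data, the same `to_arrays` call could be applied to a frame loaded with `pd.read_parquet`; grouping by `t0_timestamp` and ordering by `node_id` beforehand would give one block of sensors per time sample.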
Provider:
abhishek8aiml