Mobility Networked Time Series Benchmark Datasets
Indexed by NIAID Data Ecosystem on 2026-05-02
Download link:
https://zenodo.org/record/14590708
Resource description:
Overview
Human mobility is crucial for urban planning (e.g., public transportation) and epidemic response strategies. However, existing research often fails to integrate spatial dynamics, temporal trends, and other contextual views because of the limitations of existing mobility datasets. To bridge this gap, we introduce MOBINS (MOBIlity Networked time Series), a novel dataset collection designed for networked time-series forecasting of dynamic human movements. MOBINS features diverse and explainable datasets that capture various mobility patterns across different transportation modes in four cities and two countries, covering both the transportation and epidemic domains at the administrative area level. Our experiments with nine baseline methods reveal that the choice of model backbone significantly affects performance on the six proposed datasets. MOBINS provides a valuable resource for advancing urban mobility research, and the dataset collection is available at DOI 10.5281/zenodo.14590709.
Benchmark Code
GitHub repository: https://github.com/kaist-dmlab/MOBINS
Benchmark Baseline List
Linear-based: DLinear, NLinear
RNN-based: SegRNN
Transformer-based: Informer, Reformer, PatchTST
CNN-based: TimesNet
GNN-based: STGCN, MPNNLSTM
Detailed Benchmark Results
Detailed benchmark results for MOBINS, reported as MAE and MSE with standard deviations, are provided in MOBINS_Results.pdf in the GitHub repository.
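MAE and MSE follow their standard definitions: the average absolute error and the average squared error over all forecast entries. The minimal Python sketch below shows how such scores, and a mean ± standard deviation across repeated runs, could be computed. It assumes one score per run (e.g., per random seed) and uses the sample standard deviation, which may differ from the exact aggregation behind MOBINS_Results.pdf; the function names and example scores are hypothetical.

```python
import statistics
from typing import Sequence, Tuple


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error: average of |y - y_hat| over all forecast entries."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error: average of (y - y_hat)^2 over all forecast entries."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def summarize_runs(scores: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of one metric across repeated runs.

    Assumption: one score per run (e.g., per random seed); needs at least 2 scores.
    """
    return statistics.mean(scores), statistics.stdev(scores)


# Hypothetical MAE values from three runs of one model on one dataset.
mean_mae, std_mae = summarize_runs([0.52, 0.50, 0.54])
print(f"MAE {mean_mae:.3f} ± {std_mae:.3f}")
```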
Code Licence
Our code implementation is released under the MIT License.
Code Reference
DLinear: https://github.com/cure-lab/LTSF-Linear
NLinear: https://github.com/cure-lab/LTSF-Linear
SegRNN: https://github.com/lss-1138/SegRNN
Informer: https://github.com/zhouhaoyi/Informer2020
Reformer: https://github.com/lucidrains/reformer-pytorch
PatchTST: https://github.com/yuqinie98/PatchTST
TimesNet: https://github.com/thuml/TimesNet
STGCN: https://github.com/hazdzz/STGCN
MPNNLSTM: https://github.com/geopanag/pandemic_tgnn
Benchmark Datasets
Dataset Descriptions
Domain | Location | # Nodes | # Edges | Spatial node unit | Daily movements | Daily amounts | Time interval | Time range | # Frames | Target dimension
Transportation | Seoul | 128 | 290 | Station-based administrative area | SmartCard: 2.68M | In/Out-flow: 4.02M | 1 hour | 01/01/2022-12/31/2023 | 17520 | 16640
Transportation | Busan | 60 | 121 | Station-based administrative area | SmartCard: 0.63M | In/Out-flow: 0.75M | 1 hour | 01/01/2021-12/31/2023 | 26280 | 3720
Transportation | Daegu | 61 | 123 | Station-based administrative area | SmartCard: 0.10M | In/Out-flow: 0.34M | 1 hour | 01/01/2021-12/31/2023 | 26280 | 3843
Transportation | NYC | 5 | 12 | Borough | Taxi: 0.10M | Ridership: 3.03M | 1 hour | 02/01/2022-03/31/2024 | 17280 | 30
Epidemic | Korea | 16 | 45 | City & Province | SmartCard: 13.41M | Infection: 25834 | 1 day | 01/20/2020-08/31/2023 | 1320 | 272
Epidemic | NYC | 5 | 12 | Borough | Taxi: 2418 | Infection: 2038 | 1 day | 03/01/2020-12/31/2023 | 1401 | 30
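Several columns of the dataset table can be cross-checked with simple arithmetic. The Python sketch below assumes that the frame count is every time step in the inclusive date range, and that the target dimension is the flattened OD matrix (n × n) plus the node time series (n × d, with d = 2 for INFLOW/OUTFLOW and d = 1 for RIDERSHIP or INFECTION). Under these assumptions it reproduces, for example, the Transportation-Seoul and Epidemic-Korea rows. The function names are hypothetical.

```python
from datetime import date


def num_frames(start: date, end: date, steps_per_day: int) -> int:
    """Time steps in the inclusive range [start, end] at a fixed sampling rate."""
    days = (end - start).days + 1  # +1 because both endpoints are included
    return days * steps_per_day


def target_dimension(num_nodes: int, num_vars: int) -> int:
    """Flattened OD matrix (n * n) plus node time series (n * d) per time step."""
    return num_nodes * num_nodes + num_nodes * num_vars


# Transportation-Seoul: hourly data (24 steps/day), 128 nodes, INFLOW and OUTFLOW (d = 2).
print(num_frames(date(2022, 1, 1), date(2023, 12, 31), 24), target_dimension(128, 2))  # prints: 17520 16640
# Epidemic-Korea: daily data, 16 nodes, INFECTION only (d = 1).
print(num_frames(date(2020, 1, 20), date(2023, 8, 31), 1), target_dimension(16, 1))  # prints: 1320 272
```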
Formats of datasets (MOBINS.zip)
All datasets are provided in CSV format, and each dataset has three components:
SPATIAL_NETWORK.csv: n × n, where n = number of nodes
Column names: INDEX, N0, N1, …, Nn
INDEX values: N0, N1, …, Nn
NODE_TIME_SERIES_FEATURES.csv: (t × p) × (n × d), where t = number of timestamps per day, p = total period in days, and d = number of time-series variables
Column names: datetime, N0_{VARIABLE_NAME}, N1_{VARIABLE_NAME}, …, Nn_{VARIABLE_NAME}
VARIABLE_NAME values: INFLOW and OUTFLOW for Transportation-[Seoul, Busan, Daegu], RIDERSHIP for Transportation-NYC, and INFECTION for Epidemic-[Korea, NYC]
OD_MOVEMENTS.csv: (t × p) × (n × n)
Column names: N0_N0, N0_N1, N0_N2, …, Nn_Nn−1, Nn_Nn
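As a minimal sketch, the three CSV files described above could be read with Python's standard csv module. The sketch assumes that column names contain no spaces (e.g., N3_INFLOW, N0_N1), that node indices start at 0, that column N{i}_N{j} holds movements from origin i to destination j, and that any non-matching column such as datetime can be ignored. The function names are hypothetical; each function accepts an open file or any iterable of CSV lines.

```python
import csv
import re
from typing import Iterable, List, Sequence

# Assumed column-name patterns (no spaces), e.g. "N0_N1" and "N3_INFLOW".
OD_COL = re.compile(r"^N(\d+)_N(\d+)$")
FEATURE_COL = re.compile(r"^N(\d+)_([A-Z]+)$")


def read_spatial_network(lines: Iterable[str]) -> List[List[float]]:
    """Read SPATIAL_NETWORK.csv into an n x n nested list, ordered by the INDEX column."""
    rows = list(csv.DictReader(lines))
    names = [row["INDEX"] for row in rows]
    return [[float(row[name]) for name in names] for row in rows]


def read_od_movements(lines: Iterable[str], n: int) -> List[List[List[float]]]:
    """Read OD_MOVEMENTS.csv into a T x n x n nested list (row = origin, column = destination)."""
    frames = []
    for row in csv.DictReader(lines):
        od = [[0.0] * n for _ in range(n)]
        for col, value in row.items():
            match = OD_COL.match(col)
            if match:  # non-OD columns (e.g., a timestamp) are skipped
                i, j = int(match.group(1)), int(match.group(2))
                od[i][j] = float(value)
        frames.append(od)
    return frames


def read_node_features(
    lines: Iterable[str], n: int, variables: Sequence[str]
) -> List[List[List[float]]]:
    """Read NODE_TIME_SERIES_FEATURES.csv into a T x n x d nested list."""
    var_index = {name: k for k, name in enumerate(variables)}
    frames = []
    for row in csv.DictReader(lines):
        x = [[0.0] * len(variables) for _ in range(n)]
        for col, value in row.items():
            match = FEATURE_COL.match(col)
            # 'datetime' and variables not listed in `variables` are skipped
            if match and match.group(2) in var_index:
                x[int(match.group(1))][var_index[match.group(2)]] = float(value)
        frames.append(x)
    return frames


# Usage with a real file (the path is hypothetical):
# with open("OD_MOVEMENTS.csv", newline="") as f:
#     od = read_od_movements(f, n=128)
```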
Meta datasets
The metadata is provided in MOBINS_Meta.pdf in the GitHub repository.
Metadata for Transportation Datasets
Each file contains information about individual nodes or node pairs; for simplicity, the metadata describes only the i-th node (or node pair). We omit detailed metadata for Transportation-[Busan, Daegu] because their CSV file structures are identical to those of Transportation-Seoul and differ only in the number of nodes. Transportation-NYC follows a similar structure, except that its node time-series variable is ridership.
Metadata for Epidemic Datasets
Each file contains information about individual nodes or node pairs; for simplicity, the metadata describes only the i-th node (or node pair). The two epidemic datasets share the same structure for node time-series features, OD movements, and spatial networks.
Data Licence
The Transportation-[Seoul, Busan, Daegu, NYC] and Epidemic-NYC datasets are released under a CC BY-NC 4.0 International License.
The Epidemic-Korea dataset is released under a CC BY-NC-ND 4.0 International License.
How to Curate MOBINS
Composition
The MOBINS dataset collection consists of mobility networked time-series data for forecasting tasks in two domains: Transportation-[Seoul, Busan, Daegu, NYC] and Epidemic-[Korea, NYC]. Each dataset comprises three key components: (1) origin-destination (OD) movements, (2) a spatial network, and (3) time series. These datasets capture the temporal evolution of OD movements and time series over a fixed spatial network. OD movements represent the volume of movement between pairs of nodes, while the time series represent time-varying features of each node. Because OD movements and node time series are highly correlated and complement each other, the datasets offer a comprehensive view of mobility patterns.
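One common way to pose a forecasting task on these components, assumed here rather than taken from the MOBINS benchmark code, is to flatten the OD matrix and the node time series at each time step into a single vector (whose length, n × n + n × d, is consistent with the Target dimension values in the dataset table) and to cut the resulting sequence into look-back/horizon windows. The function names and window lengths below are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def joint_vector(od: Matrix, features: Matrix) -> List[float]:
    """Flatten one time step: n*n OD entries, then n*d node features."""
    return [v for row in od for v in row] + [v for row in features for v in row]


def sliding_windows(
    series: Sequence[List[float]], lookback: int, horizon: int
) -> List[Tuple[List[List[float]], List[List[float]]]]:
    """Pair every length-`lookback` input window with the `horizon` steps that follow it."""
    pairs = []
    for start in range(len(series) - lookback - horizon + 1):
        inputs = list(series[start : start + lookback])
        targets = list(series[start + lookback : start + lookback + horizon])
        pairs.append((inputs, targets))
    return pairs


# Toy example: 2 nodes, d = 1, so each joint vector has 2*2 + 2*1 = 6 entries.
step = joint_vector([[1.0, 2.0], [3.0, 4.0]], [[5.0], [6.0]])
windows = sliding_windows([step] * 5, lookback=3, horizon=1)  # hypothetical window sizes
print(len(step), len(windows))  # prints: 6 2
```

If the series is shorter than lookback + horizon, the range is empty and no windows are produced.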
Collection Process
All datasets in MOBINS are collected from reliable sources, including government agencies, local governments, public transportation operators, and smart card companies. These sources provide publicly accessible data downloads based on their administrative systems. Source data from smart transit card information systems is accessed through API calls at the administrative area level (e.g., neighborhoods or provinces) to match the spatial resolution of the time series. Data from the Korea Public Data Portal is either free of usage restrictions or covered by the CC BY license. For sources without an explicit license, we obtained answers regarding research use through inquiries by phone or email. In addition, data from the Korea Disease Control and Prevention Agency was used with permission and without modifying any numerical values.
Preprocessing/Cleaning/Labeling
Each dataset in the MOBINS collection draws its OD movements and its time series from different sources. To ensure consistent spatial and temporal resolution, we align these two sources using Python. In the Transportation-[Seoul, Busan, Daegu] datasets, we use 'station-based administrative areas' as spatial node units, treating all stations within the same administrative area as a single node. For the Transportation-NYC dataset, we use boroughs as spatial node units to align the spatial resolution of taxi zones and stations. In the Epidemic-Korea dataset, the source infection case data is collected at the city and province levels, so we use OD movements at the same city and province levels to match the spatial resolution. Similarly, for the Epidemic-NYC dataset, we use OD movements at the borough level to keep the spatial node units consistent. Once the spatial resolution is determined, we generate the spatial network at that resolution.
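Treating all stations in one administrative area as a single node amounts to summing counts over a station-to-area mapping. The Python sketch below assumes such a mapping is available as a dictionary; the station and area names and the function names are hypothetical. In this sketch, trips between two stations in the same area become the diagonal (area, area) entry of the OD matrix.

```python
from collections import defaultdict
from typing import Dict, Tuple


def aggregate_node_counts(
    station_counts: Dict[str, float], station_to_area: Dict[str, str]
) -> Dict[str, float]:
    """Sum per-station counts (e.g., in-flow) into the area that contains each station."""
    area_counts: Dict[str, float] = defaultdict(float)
    for station, count in station_counts.items():
        area_counts[station_to_area[station]] += count
    return dict(area_counts)


def aggregate_od(
    station_od: Dict[Tuple[str, str], float], station_to_area: Dict[str, str]
) -> Dict[Tuple[str, str], float]:
    """Sum station-to-station trips into area-to-area OD movements.

    Trips between two stations in the same area land on the diagonal (area, area).
    """
    area_od: Dict[Tuple[str, str], float] = defaultdict(float)
    for (origin, destination), count in station_od.items():
        area_od[(station_to_area[origin], station_to_area[destination])] += count
    return dict(area_od)


# Hypothetical mapping: stations S1 and S2 lie in area A, S3 in area B.
mapping = {"S1": "A", "S2": "A", "S3": "B"}
print(aggregate_node_counts({"S1": 10, "S2": 5, "S3": 7}, mapping))
print(aggregate_od({("S1", "S3"): 4, ("S2", "S3"): 1, ("S1", "S2"): 2}, mapping))
```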
Regarding the temporal aspect, although the source OD movements for Transportation-[Busan, Daegu, NYC] are recorded at intervals of less than 15 minutes, we set the frequency to 1 hour in MOBINS to match the frequency of the time-series data. Integrating these two sources, which can be positively or negatively correlated, allows the data to be interpreted and forecast from various contextual perspectives.
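Converting sub-hourly OD records to a 1-hour frequency can be sketched as flooring each timestamp to its clock hour and summing the counts. Summation is an assumption here (the text states only the target frequency), and the function name and example records are hypothetical.

```python
from datetime import datetime
from typing import Dict, Iterable, Tuple


def to_hourly(records: Iterable[Tuple[datetime, float]]) -> Dict[datetime, float]:
    """Sum sub-hourly counts into the clock hour containing each timestamp."""
    hourly: Dict[datetime, float] = {}
    for timestamp, count in records:
        hour = timestamp.replace(minute=0, second=0, microsecond=0)  # floor to the hour
        hourly[hour] = hourly.get(hour, 0.0) + count
    # Hours without any record are absent; fill them with 0 if a complete hourly grid is needed.
    return dict(sorted(hourly.items()))


# Hypothetical 15-minute counts for one OD pair.
records = [
    (datetime(2022, 1, 1, 0, 0), 3),
    (datetime(2022, 1, 1, 0, 15), 2),
    (datetime(2022, 1, 1, 0, 45), 1),
    (datetime(2022, 1, 1, 1, 0), 4),
]
print(to_hourly(records))
```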
In the Korea Public Data Portal, the source OD movements for the Transportation-Seoul dataset are missing 14 days (07/01/2022 to 07/06/2022, 07/13/2022, 07/20/2022, 08/06/2022, 08/07/2022, 09/13/2022, 10/31/2022, 11/01/2022, and 12/04/2022). We fill these days with OD movement data from the Smart Transit Card Information System. The source OD movements from the NYC taxi dataset also contain abnormal taxi records; to produce clean NYC OD movements, we remove any record whose duration (drop-off timestamp minus pick-up timestamp) is less than 0 seconds or more than 6 hours. To facilitate future data updates, we keep backups of the raw source data.
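The taxi cleaning rule above translates directly into a duration check, and a simple calendar scan can list the days with no source records. A minimal sketch, assuming each trip is a (pickup, dropoff) pair of naive datetimes; the function names and example trips are hypothetical, and the boundaries follow the stated rule literally (durations of exactly 0 seconds or exactly 6 hours are kept).

```python
from datetime import date, datetime, timedelta
from typing import Iterable, List, Set, Tuple

MAX_TRIP_DURATION = timedelta(hours=6)


def is_valid_trip(pickup: datetime, dropoff: datetime) -> bool:
    """Reject trips whose duration is negative or longer than 6 hours."""
    duration = dropoff - pickup
    return timedelta(0) <= duration <= MAX_TRIP_DURATION


def filter_trips(trips: Iterable[Tuple[datetime, datetime]]) -> List[Tuple[datetime, datetime]]:
    """Keep only (pickup, dropoff) pairs that pass is_valid_trip."""
    return [(p, d) for p, d in trips if is_valid_trip(p, d)]


def missing_days(observed: Set[date], start: date, end: date) -> List[date]:
    """List days in [start, end] that have no observations and need backfilling."""
    gaps = []
    day = start
    while day <= end:
        if day not in observed:
            gaps.append(day)
        day += timedelta(days=1)
    return gaps


trips = [
    (datetime(2022, 3, 1, 8, 0), datetime(2022, 3, 1, 8, 20)),   # 20 minutes: kept
    (datetime(2022, 3, 1, 9, 0), datetime(2022, 3, 1, 8, 55)),   # negative duration: removed
    (datetime(2022, 3, 1, 10, 0), datetime(2022, 3, 1, 17, 0)),  # 7 hours: removed
]
print(len(filter_trips(trips)))  # prints: 1
```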
Data Reference
References of Origin-Destination Movements
Transportation-Seoul: Korea Public Data Portal and Smart Transit Card Information System
Transportation-[Busan, Daegu]: Smart Transit Card Information System
Transportation-NYC: NYC Taxi and Limousine Commission (TLC)
Epidemic-Korea: Smart Transit Card Information System
Epidemic-NYC: NYC Taxi and Limousine Commission (TLC)
References of Time Series
Transportation-Seoul: Korea Public Data Portal (Seoul subway line 1-8 and line 9)
Transportation-[Busan, Daegu]: Korea Public Data Portal (Busan and Daegu)
Transportation-NYC: NYC Data Portal
Epidemic-Korea: Korea Disease Control and Prevention Agency
Epidemic-NYC: NYC Health
[Note] All source websites provide an official English version except the Smart Transit Card Information System and the Korea Disease Control and Prevention Agency. For these two sources, we describe how to access or contact them.
Uses of Smart Transit Card Information System: Please contact this email (stcis@kotsa.or.kr).
Time Series of Epidemic-Korea: direct download link. If you want to contact the reference, please use this official English link.
Benchmark Implementation
We implemented our benchmark code based on the Time Series Library (TSLib).
Citation
@inproceedings{na2025mobility,
title={Mobility Networked Time Series Benchmark Datasets},
author={Na, Jihye and Nam, Youngeun and Yoon, Susik and Song, Hwanjun and Lee, Byung Suk and Lee, Jae-Gil},
booktitle={ICWSM},
year={2025},
}
Acknowledgement
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (Ministry of Science and ICT) (No. 2023R1A2C2003690).
Created:
2025-03-21



