
pppdlhh/LaDe

Source: Hugging Face, updated 2026-01-27, indexed 2026-03-29
Download link: https://hf-mirror.com/datasets/pppdlhh/LaDe
Resource description:
---
license: apache-2.0
tags:
- Logistics
- Last-mile Delivery
- Spatial-Temporal
- Graph
size_categories:
- 10M<n<100M
---

Dataset Download: https://huggingface.co/datasets/Cainiao-AI/LaDe/tree/main

Dataset Website: https://cainiaotechai.github.io/LaDe-website/

Code Link: https://github.com/wenhaomin/LaDe

Paper Link: https://arxiv.org/abs/2306.10675

# 1. About Dataset

**LaDe** is a publicly available last-mile delivery dataset with millions of packages from industry. It has three unique characteristics:

(1) Large-scale: it involves 10,677k packages of 21k couriers over 6 months of real-world operation.

(2) Comprehensive information: it offers original package information, such as location and time requirements, as well as task-event information, which records when and where the courier is when events such as task-accept and task-finish happen.

(3) Diversity: the dataset includes data from various scenarios, such as package pick-up and delivery, and from multiple cities, each with its own spatio-temporal patterns due to distinct characteristics such as population.

![LaDe.png](./img/LaDe.png)

# 2. Download

LaDe is composed of two sub-datasets: i) [LaDe-D](https://huggingface.co/datasets/Cainiao-AI/LaDe-D), which comes from the package delivery scenario, and ii) [LaDe-P](https://huggingface.co/datasets/Cainiao-AI/LaDe-P), which comes from the package pickup scenario. To make the dataset easier to use, each sub-dataset is provided in CSV format.

LaDe can be used for research purposes. Please read the terms before downloading the dataset. The accompanying code is available at the [Code link](https://github.com/wenhaomin/LaDe). After downloading, put the data into "./data/raw/". The structure of "./data/raw/" should look like this:

```
* ./data/raw/
    * delivery
        * delivery_sh.csv
        * ...
    * pickup
        * pickup_sh.csv
        * ...
    * road-network
        * roads.csv
    * data_with_trajectory_20s
        * courier_detailed_trajectory_20s.pkl.xz
```

road-network/roads.csv records the road network of the five cities.
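Before running any downstream code, it can help to confirm that the files named in the listing above are where they are expected. The following is a minimal sketch, not part of the LaDe code base: the helper name `missing_raw_files` and the constant `EXPECTED_FILES` are hypothetical, and only the files explicitly named in the listing are checked, since the other per-city files are elided there.

```python
from pathlib import Path

# Relative paths named explicitly in the directory listing. The listing
# elides the remaining per-city CSV files with "...", so they are not
# checked here.
EXPECTED_FILES = (
    "delivery/delivery_sh.csv",
    "pickup/pickup_sh.csv",
    "road-network/roads.csv",
    "data_with_trajectory_20s/courier_detailed_trajectory_20s.pkl.xz",
)


def missing_raw_files(raw_dir="./data/raw"):
    """Return the expected relative paths that are absent under raw_dir."""
    root = Path(raw_dir)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_raw_files()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All listed files are in place.")
```

An empty return value means every listed file was found; anything else names the paths that still need to be downloaded or moved.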
data_with_trajectory_20s/* records the trajectories of couriers.

```python
>>> import pandas as pd
>>> df = pd.read_pickle("courier_detailed_trajectory_20s.pkl.xz")
>>> df.head(3)
    ds                        postman_id        gps_time           lat           lng
0  321  106f5ac22cfd1574b196d16fed62f90d  03-21 07:31:58  3.953700e+06  3.053400e+06
1  321  106f5ac22cfd1574b196d16fed62f90d  03-21 07:32:18  3.953700e+06  3.053398e+06
2  321  106f5ac22cfd1574b196d16fed62f90d  03-21 07:32:41  3.953700e+06  3.053398e+06
```

Each sub-dataset (delivery, pickup) contains 5 CSV files, each holding the data from one city. The details of each city can be found in the following table.

| City       | Description                                                                              |
|------------|------------------------------------------------------------------------------------------|
| Shanghai   | One of the most prosperous cities in China, with a large number of orders per day.       |
| Hangzhou   | A big city with well-developed online e-commerce and a large number of orders per day.   |
| Chongqing  | A big city with complicated road conditions in China, with a large number of orders.     |
| Jilin      | A middle-size city in China, with a small number of orders each day.                     |
| Yantai     | A small city in China, with a small number of orders every day.                          |

# 3. Description

Below are the detailed fields of each sub-dataset.
## 3.1 LaDe-P

| Data field                 | Description                                        | Unit/format  |
|----------------------------|----------------------------------------------------|--------------|
| **Package information**    |                                                    |              |
| package_id                 | Unique identifier of each package                  | Id           |
| time_window_start          | Start of the required time window                  | Time         |
| time_window_end            | End of the required time window                    | Time         |
| **Stop information**       |                                                    |              |
| lng/lat                    | Coordinates of each stop                           | Float        |
| city                       | City                                               | String       |
| region_id                  | Id of the region                                   | String       |
| aoi_id                     | Id of the AOI (Area of Interest)                   | Id           |
| aoi_type                   | Type of the AOI                                    | Categorical  |
| **Courier information**    |                                                    |              |
| courier_id                 | Id of the courier                                  | Id           |
| **Task-event information** |                                                    |              |
| accept_time                | The time when the courier accepts the task         | Time         |
| accept_gps_time            | The time of the GPS point closest to accept_time   | Time         |
| accept_gps_lng/lat         | Coordinates when the courier accepts the task      | Float        |
| pickup_time                | The time when the courier picks up the task        | Time         |
| pickup_gps_time            | The time of the GPS point closest to pickup_time   | Time         |
| pickup_gps_lng/lat         | Coordinates when the courier picks up the task     | Float        |
| **Context information**    |                                                    |              |
| ds                         | The date of the package pickup                     | Date         |

## 3.2 LaDe-D

| Data field                        | Description                                              | Unit/format  |
|-----------------------------------|----------------------------------------------------------|--------------|
| **Package information**           |                                                          |              |
| package_id                        | Unique identifier of each package                        | Id           |
| **Stop information**              |                                                          |              |
| lng/lat                           | Coordinates of each stop                                 | Float        |
| city                              | City                                                     | String       |
| region_id                         | Id of the region                                         | Id           |
| aoi_id                            | Id of the AOI                                            | Id           |
| aoi_type                          | Type of the AOI                                          | Categorical  |
| **Courier information**           |                                                          |              |
| courier_id                        | Id of the courier                                        | Id           |
| **Task-event information**        |                                                          |              |
| accept_time                       | The time when the courier accepts the task               | Time         |
| accept_gps_time                   | The time of the GPS point closest to accept_time         | Time         |
| accept_gps_lng/accept_gps_lat     | Coordinates when the courier accepts the task            | Float        |
| delivery_time                     | The time when the courier finishes delivering the task   | Time         |
| delivery_gps_time                 | The time of the GPS point closest to delivery_time       | Time         |
| delivery_gps_lng/delivery_gps_lat | Coordinates when the courier finishes the task           | Float        |
| **Context information**           |                                                          |              |
| ds                                | The date of the package delivery                         | Date         |

# 4. Leaderboard

Below is the performance of different methods in Shanghai.

## 4.1 Route Prediction

Experimental results of route prediction. We use bold and underlined fonts to denote the best and runner-up model, respectively.

| Method         | HR@3         | KRC          | LSD         | ED          |
|----------------|--------------|--------------|-------------|-------------|
| TimeGreedy     | 57.65        | 31.81        | 5.54        | 2.15        |
| DistanceGreedy | 60.77        | 39.81        | 5.54        | 2.15        |
| OR-Tools       | 66.21        | 47.60        | 4.40        | 1.81        |
| LightGBM       | 73.76        | 55.71        | 3.01        | 1.84        |
| FDNET          | 73.27 ± 0.47 | 53.80 ± 0.58 | 3.30 ± 0.04 | 1.84 ± 0.01 |
| DeepRoute      | 74.68 ± 0.07 | 56.60 ± 0.16 | 2.98 ± 0.01 | 1.79 ± 0.01 |
| Graph2Route    | 74.84 ± 0.15 | 56.99 ± 0.52 | 2.86 ± 0.02 | 1.77 ± 0.01 |

## 4.2 Estimated Time of Arrival Prediction

| Method   | MAE          | RMSE         | ACC@30      |
|----------|--------------|--------------|-------------|
| LightGBM | 30.99        | 35.04        | 0.59        |
| SPEED    | 23.75        | 27.86        | 0.73        |
| KNN      | 36.00        | 31.89        | 0.58        |
| MLP      | 21.54 ± 2.20 | 25.05 ± 2.46 | 0.79 ± 0.04 |
| FDNET    | 18.47 ± 0.25 | 21.44 ± 0.28 | 0.84 ± 0.01 |

## 4.3 Spatio-temporal Graph Forecasting

| Method  | MAE         | RMSE        |
|---------|-------------|-------------|
| HA      | 4.63        | 9.91        |
| DCRNN   | 3.69 ± 0.09 | 7.08 ± 0.12 |
| STGCN   | 3.04 ± 0.02 | 6.42 ± 0.05 |
| GWNET   | 3.16 ± 0.06 | 6.56 ± 0.11 |
| ASTGCN  | 3.12 ± 0.06 | 6.48 ± 0.14 |
| MTGNN   | 3.13 ± 0.04 | 6.51 ± 0.13 |
| AGCRN   | 3.93 ± 0.03 | 7.99 ± 0.08 |
| STGNCDE | 3.74 ± 0.15 | 7.27 ± 0.16 |

# 5. Citation

If you find this helpful, please cite our paper:

```bibtex
@misc{wu2023lade,
      title={LaDe: The First Comprehensive Last-mile Delivery Dataset from Industry},
      author={Lixia Wu and Haomin Wen and Haoyuan Hu and Xiaowei Mao and Yutong Xia and Ergang Shan and Jianbin Zhen and Junhong Lou and Yuxuan Liang and Liuqing Yang and Roger Zimmermann and Youfang Lin and Huaiyu Wan},
      year={2023},
      eprint={2306.10675},
      archivePrefix={arXiv},
      primaryClass={cs.DB}
}
```
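As an illustration of how the task-event fields in Section 3.1 relate to each other, the sketch below computes the minutes elapsed between `accept_time` and `pickup_time` for one LaDe-P record. It rests on assumptions that should be checked against the actual CSV files: time values are assumed to be strings shaped like `03-21 07:31:58` (the form visible in the `gps_time` column of the trajectory sample), and because that form carries no year, a placeholder year is supplied. The function names and the default year are hypothetical.

```python
from datetime import datetime

# Assumption: times look like "03-21 07:31:58", as in the gps_time column of
# the trajectory sample. No year is stored, so the caller supplies one.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_time(value, year=2023):
    """Parse a month-day time string, attaching a placeholder year."""
    return datetime.strptime(f"{year}-{value}", TIME_FORMAT)


def minutes_between(start, end, year=2023):
    """Minutes elapsed from start to end, both taken to lie in the same year."""
    delta = parse_time(end, year) - parse_time(start, year)
    return delta.total_seconds() / 60.0


def pickup_minutes(row, year=2023):
    """Minutes from accept_time to pickup_time for one LaDe-P record (a dict)."""
    return minutes_between(row["accept_time"], row["pickup_time"], year)


if __name__ == "__main__":
    sample = {"accept_time": "03-21 07:31:58", "pickup_time": "03-21 08:01:58"}
    print(pickup_minutes(sample))
```

Rows read with `csv.DictReader` from a pickup CSV can be passed to `pickup_minutes` directly. The same `minutes_between` helper applies to `accept_time` and `delivery_time` in LaDe-D. A record dated `02-29` would need a leap year as the placeholder.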

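The ETA leaderboard in Section 4.2 reports MAE, RMSE, and ACC@30. The sketch below shows one way such metrics could be computed from paired true and predicted arrival times. It is not the benchmark's evaluation code: it assumes times are expressed in minutes and that ACC@30 denotes the share of predictions whose absolute error is at most 30 minutes. The function name `eta_metrics` is hypothetical.

```python
import math


def eta_metrics(y_true, y_pred, threshold=30.0):
    """Compute MAE, RMSE and ACC@threshold for paired arrival times.

    Assumption: ACC@threshold is the fraction of predictions whose absolute
    error does not exceed the threshold (in the same unit as the inputs).
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    errors = [abs(p - t) for t, p in zip(y_true, y_pred)]
    n = len(errors)
    mae = sum(errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    acc = sum(1 for e in errors if e <= threshold) / n
    return {"MAE": mae, "RMSE": rmse, f"ACC@{threshold:g}": acc}


if __name__ == "__main__":
    true_minutes = [10, 50, 100, 40]
    predicted_minutes = [20, 50, 60, 40]
    print(eta_metrics(true_minutes, predicted_minutes))
```

In the example the absolute errors are 10, 0, 40, and 0 minutes, so three of the four predictions fall within the 30-minute threshold.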
Provider: pppdlhh