Johnson567/FreshRetailNet-50K
Source: Hugging Face. Updated 2026-01-29; indexed 2026-03-29.
Download link: https://hf-mirror.com/datasets/Johnson567/FreshRetailNet-50K
Resource description:
---
language:
- en
license: cc-by-4.0
task_categories:
- time-series-forecasting
tags:
- fresh-retail
- censored-demand
- hourly-stock-status
size_categories:
- 1M<n<10M
pretty_name: FreshRetailNet-50K
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train.parquet
  - split: eval
    path: data/eval.parquet
---
# FreshRetailNet-50K
## Dataset Overview
FreshRetailNet-50K is the first large-scale benchmark for censored demand estimation in the fresh retail domain, **incorporating approximately 20% organically occurring stockout data**. It comprises 50,000 store-product 90-day time series of detailed hourly sales data from 898 stores in 18 major cities, encompassing 865 perishable SKUs with meticulous stockout event annotations. The hourly stock status records unique to this dataset, combined with rich contextual covariates including promotional discounts, precipitation, and other temporal features, enable innovative research beyond existing solutions.
- [Technical Report](https://arxiv.org/abs/2505.16319) - Discover the methodology and technical details behind FreshRetailNet-50K.
- [GitHub Repo](https://github.com/Dingdong-Inc/frn-50k-baseline) - Access the complete training and evaluation pipeline.
This dataset is ready for commercial/non-commercial use.
## Data Fields
|Field|Type|Description|
|:---|:---|:---|
|city_id|int64|The encoded city id|
|store_id|int64|The encoded store id|
|management_group_id|int64|The encoded management group id|
|first_category_id|int64|The encoded first category id|
|second_category_id|int64|The encoded second category id|
|third_category_id|int64|The encoded third category id|
|product_id|int64|The encoded product id|
|dt|string|The date|
|sale_amount|float64|The daily sales amount after global normalization (multiplied by a specific coefficient)|
|hours_sale|Sequence(float64)|The hourly sales amounts after global normalization (multiplied by a specific coefficient)|
|stock_hour6_22_cnt|int32|The number of out-of-stock hours between 6:00 and 22:00|
|hours_stock_status|Sequence(int32)|The hourly out-of-stock status|
|discount|float64|The discount rate (1.0 means no discount, 0.9 means 10% off)|
|holiday_flag|int32|Holiday indicator|
|activity_flag|int32|Activity indicator|
|precpt|float64|The total precipitation|
|avg_temperature|float64|The average temperature|
|avg_humidity|float64|The average humidity|
|avg_wind_level|float64|The average wind force|
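
The relationship between the hourly and daily fields can be sketched on a synthetic row. This is a minimal illustration, not taken from the dataset card: it assumes that `hours_sale` and `hours_stock_status` each hold 24 values indexed by hour of day, that a status of 1 marks an out-of-stock hour, and that the 6:00 to 22:00 window covers hour indices 6 through 21. The helper names `stockout_hours` and `observed_sales` are hypothetical. Check these assumptions against the technical report before relying on them.

```python
# Synthetic example row; the values are made up for illustration.
# Assumption: index h covers the hour starting at h:00, and status 1 = out of stock.
row = {
    "hours_sale": [0.0] * 6 + [0.5, 1.0, 1.5, 1.0, 0.0, 0.0] + [0.5] * 12,
    "hours_stock_status": [0] * 10 + [1, 1] + [0] * 12,
}


def stockout_hours(status, start=6, end=22):
    """Count out-of-stock hours in the half-open window [start, end)."""
    return sum(status[start:end])


def observed_sales(sales, status):
    """Keep sales for in-stock hours; mark censored (out-of-stock) hours as None."""
    return [s if flag == 0 else None for s, flag in zip(sales, status)]


daily_total = sum(row["hours_sale"])
n_stockout = stockout_hours(row["hours_stock_status"])
masked = observed_sales(row["hours_sale"], row["hours_stock_status"])
print(daily_total, n_stockout, masked[9:13])
```

Masking censored hours instead of treating them as zero demand is the usual first step in latent demand recovery, since a zero recorded during a stockout says nothing about how much would have sold.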
### Hierarchical structure
- **warehouse**: city_id > store_id
- **product category**: management_group_id > first_category_id > second_category_id > third_category_id > product_id
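
The two hierarchies above lend themselves to roll-ups with a plain `groupby`. The sketch below uses a toy pandas DataFrame; only the column names come from the field table, while the ids, date, and sales values are made up for illustration.

```python
import pandas as pd

# Toy rows with made-up ids and sales; only the column names come from the card.
df = pd.DataFrame({
    "city_id": [0, 0, 0, 1],
    "store_id": [10, 10, 11, 20],
    "first_category_id": [5, 5, 5, 6],
    "product_id": [100, 101, 100, 200],
    "dt": ["2024-06-01"] * 4,
    "sale_amount": [1.0, 2.0, 0.5, 3.0],
})

# Roll product-level sales up the location hierarchy: store first, then city.
store_sales = df.groupby(["city_id", "store_id", "dt"], as_index=False)["sale_amount"].sum()
city_sales = store_sales.groupby(["city_id", "dt"], as_index=False)["sale_amount"].sum()

# Roll up along the product hierarchy instead.
category_sales = df.groupby(["first_category_id", "dt"], as_index=False)["sale_amount"].sum()
print(city_sales)
```

Because `sale_amount` is globally normalized, the aggregated values are comparable across groups but are not in currency or unit terms.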
## How to use it
You can load the dataset with the following lines of code.
```python
from datasets import load_dataset
dataset = load_dataset("Dingdong-Inc/FreshRetailNet-50K")
print(dataset)
```
```text
DatasetDict({
    train: Dataset({
        features: ['city_id', 'store_id', 'management_group_id', 'first_category_id', 'second_category_id', 'third_category_id', 'product_id', 'dt', 'sale_amount', 'hours_sale', 'stock_hour6_22_cnt', 'hours_stock_status', 'discount', 'holiday_flag', 'activity_flag', 'precpt', 'avg_temperature', 'avg_humidity', 'avg_wind_level'],
        num_rows: 4500000
    })
    eval: Dataset({
        features: ['city_id', 'store_id', 'management_group_id', 'first_category_id', 'second_category_id', 'third_category_id', 'product_id', 'dt', 'sale_amount', 'hours_sale', 'stock_hour6_22_cnt', 'hours_stock_status', 'discount', 'holiday_flag', 'activity_flag', 'precpt', 'avg_temperature', 'avg_humidity', 'avg_wind_level'],
        num_rows: 350000
    })
})
```
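
Each row holds one day for one store-product pair, so an hourly time series has to be stitched together from several rows. The sketch below shows one way to do that on toy records. It assumes that `dt` is an ISO-style `YYYY-MM-DD` string (so string sorting matches date order) and shortens `hours_sale` to three values per day to keep the example small. The function name `hourly_series` and all ids, dates, and values are hypothetical.

```python
# Toy records in the shape of dataset rows; ids, dates, and values are made up.
# Assumption: real rows carry one hours_sale entry per hour; 3 are used here for brevity.
records = [
    {"store_id": 1, "product_id": 7, "dt": "2024-06-02", "hours_sale": [0.2, 0.0, 0.4]},
    {"store_id": 1, "product_id": 7, "dt": "2024-06-01", "hours_sale": [0.1, 0.3, 0.0]},
    {"store_id": 2, "product_id": 7, "dt": "2024-06-01", "hours_sale": [0.9, 0.9, 0.9]},
]


def hourly_series(rows, store_id, product_id):
    """Concatenate one store-product pair's hourly sales in date order."""
    selected = [
        r for r in rows
        if r["store_id"] == store_id and r["product_id"] == product_id
    ]
    selected.sort(key=lambda r: r["dt"])  # ISO dates sort correctly as strings
    series = []
    for r in selected:
        series.extend(r["hours_sale"])
    return series


series = hourly_series(records, store_id=1, product_id=7)
print(series)
```

Iterating over a split yields dict rows, so the same function would accept `dataset["train"]`, although scanning 4.5 million rows in pure Python is slow; filtering with a DataFrame or the library's own filtering is likely a better fit at full scale.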
## License/Terms of Use
This dataset is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0) available at https://creativecommons.org/licenses/by/4.0/legalcode.
**Data Developer:** Dingdong-Inc
### Use Case
Developers researching latent demand recovery and demand forecasting techniques.
### Release Date
05/08/2025
## Data Version
1.0 (05/08/2025)
## Intended use
The FreshRetailNet-50K Dataset is intended to be freely used by the community to continue to improve latent demand recovery and demand forecasting techniques.
**However, for each dataset a user elects to use, the user is responsible for checking whether the dataset license is fit for the intended purpose.**
## Citation
If you find the data useful, please cite:
```
@article{2025freshretailnet-50k,
  title={FreshRetailNet-50K: A Stockout-Annotated Censored Demand Dataset for Latent Demand Recovery and Forecasting in Fresh Retail},
  author={Yangyang Wang and Jiawei Gu and Li Long and Xin Li and Li Shen and Zhouyu Fu and Xiangjun Zhou and Xu Jiang},
  year={2025},
  eprint={2505.16319},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2505.16319},
}
```
Provider: Johnson567



