PI2I/PI2I
Source: Hugging Face. Updated 2026-01-30; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/PI2I/PI2I
Resource description:
---
license: apache-2.0
tags:
- recommender-system
size_categories:
- 1B<n<10B
---
# Dataset Overview
The dataset presented in our paper *"PI2I: A Personalized Item-Based Collaborative Filtering Retrieval Framework"*, which has been accepted by the **Industry Track of TheWebConf 2026**, comprises **130 million real-world user-item interactions** collected from Taobao.
Each row is a `(time, userid, itemid)` triple. Key statistics are summarized below:
| Description | Value |
|---------------------------------------------|---------------|
| Total number of interactions (rows) | 130,828,023 |
| Number of distinct users (`userid`) | 705,647 |
| Number of distinct items (`itemid`) | 20,351,625 |
| Time span | 23 days |
| Average user interaction count | 185 |
| Maximum user interaction count | 20,894 |
| Minimum user interaction count | 1 |
| Sparsity | 99.9% |

*Note:* The distinct user and item counts may differ slightly from the values reported in the paper due to hash collisions.

Sparsity is calculated as $1 - \frac{130{,}828{,}023}{20{,}351{,}625 \times 705{,}647}$.
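The derived figures in the table above (average interactions per user and sparsity) can be checked with a few lines of arithmetic. The sketch below is illustrative only: the helper `interaction_stats` and the toy rows are hypothetical and not part of the dataset or the paper's code. It assumes interactions are available as `(time, userid, itemid)` tuples and follows the card's formula, which uses the raw row count as the numerator. With the published totals, the average comes out at roughly 185.4 and the sparsity formula at roughly 99.9991%.

```python
from collections import Counter
from typing import Iterable, Tuple


def interaction_stats(rows: Iterable[Tuple[int, int, int]]) -> dict:
    """Summarize (time, userid, itemid) triples (hypothetical helper)."""
    per_user = Counter()  # interactions per userid
    items = set()         # distinct itemids seen
    total = 0
    for _time, userid, itemid in rows:
        per_user[userid] += 1
        items.add(itemid)
        total += 1
    n_users, n_items = len(per_user), len(items)
    return {
        "interactions": total,
        "users": n_users,
        "items": n_items,
        "avg_per_user": total / n_users,
        "max_per_user": max(per_user.values()),
        "min_per_user": min(per_user.values()),
        # Same formula as the card: rows / (users * items).
        "sparsity": 1 - total / (n_users * n_items),
    }


# Published totals copied from the statistics table.
INTERACTIONS = 130_828_023
USERS = 705_647
ITEMS = 20_351_625

avg_per_user = INTERACTIONS / USERS
sparsity = 1 - INTERACTIONS / (ITEMS * USERS)

# Toy rows, made up purely to exercise the helper.
toy = [(1, 10, 100), (2, 10, 101), (3, 11, 100)]
toy_stats = interaction_stats(toy)

print(f"average interactions per user: {avg_per_user:.1f}")
print(f"sparsity: {sparsity:.4%}")
print(toy_stats)
```

Streaming the rows through a `Counter` and a `set` keeps memory proportional to the number of distinct users and items rather than to the number of rows, which matters at this scale.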
Please cite the following paper if you find our dataset or code helpful:

    @article{wang2026pi2i,
      title={PI2I: A Personalized Item-Based Collaborative Filtering Retrieval Framework},
      author={Wang, Shaoqing and Ma, Yingcai and Fu, Kairui and Wang, Ziyang and Huang, Dunxian and Yan, Yuliang and Wu, Jian},
      journal={arXiv preprint arXiv:2601.16815},
      year={2026}
    }
Provided by:
PI2I



