PI2I/PI2I

Name: PI2I/PI2I
Creator: Alibaba Group; Zhejiang University
Published: 2026-01-23 23:10:39
License: No description available

arXiv · Updated 2026-01-23 · Indexed 2026-01-27

Download link:

https://huggingface.co/datasets/PI2I/PI2I

Resource overview:

PI2I is a large-scale benchmark dataset for recommender systems, built and open-sourced by Alibaba Group. It contains 130 million real user interaction records from the Taobao platform. The dataset is constructed from user behavior logs such as clickstreams, uses the Swing algorithm to compute item-to-item similarity, and introduces trigger-target relationships for negative sampling. It is intended mainly for research on personalized recommendation: it targets the limitations of traditional collaborative filtering in truncation strategy and user-item interaction modeling, and it serves as an evaluation benchmark for optimizing recommendation algorithms at the retrieval (recall) stage.
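For readers unfamiliar with Swing, the item-similarity step mentioned above can be illustrated with a minimal sketch. It assumes the commonly cited basic form of Swing, in which each pair of users who both interacted with items i and j contributes 1 / (alpha + number of items the two users share). The function name `swing_similarity`, the smoothing parameter `alpha`, and the toy `logs` are illustrative assumptions; this is not the pipeline used to build PI2I, and production variants often add user-activity weights.

```python
from collections import defaultdict
from itertools import combinations


def swing_similarity(interactions, alpha=1.0):
    """Basic Swing item-item scores from (user, item) pairs.

    Hypothetical helper for illustration only; `alpha` is an assumed
    smoothing constant, not a value taken from the PI2I dataset.
    """
    user_items = defaultdict(set)  # user -> items the user interacted with
    item_users = defaultdict(set)  # item -> users who interacted with it
    for user, item in interactions:
        user_items[user].add(item)
        item_users[item].add(user)

    scores = {}
    for i, j in combinations(sorted(item_users), 2):
        # Users who interacted with both items.
        common_users = sorted(item_users[i] & item_users[j])
        score = 0.0
        for u, v in combinations(common_users, 2):
            # A user pair that shares few items overall is stronger
            # evidence that i and j are related, so it gets more weight.
            overlap = len(user_items[u] & user_items[v])
            score += 1.0 / (alpha + overlap)
        if score > 0:
            scores[(i, j)] = score
            scores[(j, i)] = score  # the basic form is symmetric
    return scores


# Toy click log (made-up data, not drawn from PI2I).
logs = [
    ("u1", "a"), ("u1", "b"),
    ("u2", "a"), ("u2", "b"),
    ("u3", "a"), ("u3", "b"), ("u3", "c"),
]
for pair, score in sorted(swing_similarity(logs).items()):
    print(pair, round(score, 4))
```

The brute-force loop over all item pairs is only suitable for small examples; at the scale described here, an implementation would typically enumerate user pairs per item and truncate the resulting neighbor lists.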

Provided by:

Alibaba Group; Zhejiang University

Created:

2026-01-23

Summary of original information

Dataset overview

Basic information

Dataset name: PI2I
Source paper: "PI2I: A Personalized Item-Based Collaborative Filtering Retrieval Framework"
Paper status: Accepted to the TheWebConf 2026 industry track
Data source: Taobao
Data content: Real user-item interaction data
License: Apache-2.0