Recall Data for E-commerce Promotion Scenarios
Zhejiang Provincial Data Intellectual Property Registration Platform; updated 2024-03-14; indexed 2024-05-08
Download link:
https://www.zjip.org.cn/home/announce/trends/30899
Resource description:
This dataset is applicable to platforms that possess large-scale commodity and user datasets, and are capable of tracking and recording users’ detailed historical interaction records. The scope of the dataset mainly covers commodity databases, user personal information, and detailed user interaction behavior data. The primary target users are e-commerce platform users and platform operators.
Analyzing user interaction behavior with algorithmic models makes it possible to identify target users accurately, improve recall efficiency, and reduce customer churn, thereby maximizing commercial value.
For commodity-side features: itemId is used as an ID feature for embedding operations; item_name and item_desc are used for text embedding representation; price is discretized via bucketing to obtain discrete features, which are then subjected to embedding operations; category, as a categorical feature, is processed with embedding.
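The bucketing and embedding-lookup steps for price and category can be illustrated with a minimal sketch. The bucket boundaries, the category vocabulary, the embedding dimension, and the randomly initialised tables are all assumptions made for illustration; the listing does not specify any of them, and in a real system the tables would be learned during training.

```python
import bisect
import random

# Hypothetical price boundaries and category vocabulary (not from the dataset).
PRICE_BOUNDARIES = [10.0, 50.0, 100.0, 500.0, 1000.0]
CATEGORY_VOCAB = {"<unk>": 0, "electronics": 1, "apparel": 2, "books": 3}
EMBEDDING_DIM = 4  # assumed size


def make_embedding_table(num_rows, dim, seed):
    """Randomly initialised table standing in for learned embedding weights."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(num_rows)]


# One row per price bucket: len(boundaries) + 1 buckets in total.
price_table = make_embedding_table(len(PRICE_BOUNDARIES) + 1, EMBEDDING_DIM, seed=0)
category_table = make_embedding_table(len(CATEGORY_VOCAB), EMBEDDING_DIM, seed=1)


def price_to_bucket(price):
    """Discretise a price into a bucket index in [0, len(PRICE_BOUNDARIES)]."""
    return bisect.bisect_right(PRICE_BOUNDARIES, price)


def category_to_index(category):
    """Map a category string to its row, falling back to the <unk> row."""
    return CATEGORY_VOCAB.get(category, CATEGORY_VOCAB["<unk>"])


def embed_item(price, category):
    """Concatenate the price-bucket embedding and the category embedding."""
    return price_table[price_to_bucket(price)] + category_table[category_to_index(category)]
```

Prices that fall into the same bucket share one embedding row, which is the point of discretising before the lookup.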
For user-side features: userId is used as an ID feature for embedding operations; the user’s gender and province are categorical features processed with embedding; the user’s age needs to be discretized via bucketing, with the binned results used as discrete features followed by embedding. In addition, various user behavior sequences including view_history, click_history, collect_history, and buy_history will all undergo embedding operations.
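The listing does not say how the behavior sequences are turned into vectors. One common option, shown here purely as an assumption, is to look up an embedding for each item in a history and mean-pool the results. The table size, the embedding dimension, the `max_len` truncation, and the premise that item IDs are already mapped to integer row indices are all hypothetical.

```python
import random

EMBEDDING_DIM = 4   # assumed size
NUM_ITEMS = 1000    # hypothetical number of item rows
HISTORY_FIELDS = ("view_history", "click_history", "collect_history", "buy_history")

_rng = random.Random(0)
# Randomly initialised table standing in for learned item embeddings.
item_table = [[_rng.uniform(-0.1, 0.1) for _ in range(EMBEDDING_DIM)]
              for _ in range(NUM_ITEMS)]


def pool_history(item_ids, table, max_len=50):
    """Mean-pool the embeddings of the most recent max_len items.

    Returns a zero vector when the history is empty.
    """
    dim = len(table[0])
    recent = item_ids[-max_len:]
    if not recent:
        return [0.0] * dim
    pooled = [0.0] * dim
    for item_id in recent:
        for k, value in enumerate(table[item_id]):
            pooled[k] += value
    return [value / len(recent) for value in pooled]


def embed_user_behavior(histories):
    """Concatenate the pooled vectors of the four history fields in a fixed order."""
    features = []
    for field in HISTORY_FIELDS:
        features.extend(pool_history(histories.get(field, []), item_table))
    return features
```

Mean pooling ignores item order; attention-based or recurrent encoders are alternatives when order matters.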
An MMoE (Multi-gate Mixture-of-Experts) model is trained on these processed features. The trained model is then used for inference, and its outputs are combined by weighting to obtain the online ranking score.
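To make the MMoE structure and the weighted output concrete, here is a minimal inference-only sketch. It assumes single-layer ReLU experts, one softmax gate per task, and a linear tower with a sigmoid output. The task names (`click`, `collect`, `buy`), the layer sizes, and the score weights are hypothetical, and the weights are random rather than trained.

```python
import math
import random


def make_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vec):
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def relu(vec):
    return [max(0.0, v) for v in vec]


def softmax(vec):
    peak = max(vec)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in vec]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyMMoE:
    """Shared experts, one softmax gate per task, one linear tower per task."""

    def __init__(self, input_dim, expert_dim, num_experts, task_names, seed=0):
        rng = random.Random(seed)
        self.experts = [make_matrix(expert_dim, input_dim, rng) for _ in range(num_experts)]
        self.gates = {t: make_matrix(num_experts, input_dim, rng) for t in task_names}
        self.towers = {t: make_matrix(1, expert_dim, rng)[0] for t in task_names}

    def forward(self, x):
        expert_outs = [relu(matvec(w, x)) for w in self.experts]
        preds = {}
        for task, gate_w in self.gates.items():
            gate = softmax(matvec(gate_w, x))  # per-task mixture weights over experts
            mixed = [sum(g * out[k] for g, out in zip(gate, expert_outs))
                     for k in range(len(expert_outs[0]))]
            logit = sum(w * h for w, h in zip(self.towers[task], mixed))
            preds[task] = sigmoid(logit)
        return preds


def ranking_score(preds, weights):
    """Weighted sum of per-task predictions (weights are hypothetical)."""
    return sum(weights[task] * preds[task] for task in weights)
```

A call such as `ranking_score(model.forward(features), {"click": 0.2, "collect": 0.3, "buy": 0.5})` would produce a single score per candidate; how the actual weights are chosen is not stated in the listing.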
Beyond the base ranking score, diversity re-ranking and traffic control are also required. For diversity re-ranking, the ranking is readjusted using the commodity vector representations and a DPP (Determinantal Point Process) diversity model. For traffic control, the collected commodity exposure, click, and related data are adjusted against traffic targets, using Q-Learning (a reinforcement learning method) and PID control, so that the exposure volume of each commodity type conforms to the overall platform strategy.
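The two post-ranking steps can be sketched as follows, under stated assumptions. For diversity, the sketch uses greedy MAP inference on a DPP kernel built as score_i * cosine(v_i, v_j) * score_j, which is one common construction and not necessarily the one used here. For traffic control, it shows only a textbook PID controller; the gains, the exposure-share targets, and the way the output would be applied to scores are hypothetical, and the Q-Learning component is not covered.

```python
import math


def build_kernel(scores, vectors):
    """DPP kernel L[i][j] = score_i * cosine(v_i, v_j) * score_j (assumed form)."""
    units = []
    for vec in vectors:
        norm = math.sqrt(sum(x * x for x in vec))
        units.append([x / norm for x in vec])
    n = len(scores)
    return [[scores[i] * scores[j] * sum(a * b for a, b in zip(units[i], units[j]))
             for j in range(n)] for i in range(n)]


def greedy_dpp_rerank(kernel, k, eps=1e-10):
    """Greedily pick up to k items, each time maximising the log-det gain."""
    n = len(kernel)
    gain = [kernel[i][i] for i in range(n)]   # marginal gain of adding item i
    chol = [[] for _ in range(n)]             # incremental Cholesky rows
    selected = []
    remaining = list(range(n))
    while remaining and len(selected) < k:
        j = max(remaining, key=lambda i: gain[i])
        if gain[j] < eps:                     # nothing left adds volume
            break
        selected.append(j)
        remaining.remove(j)
        root = math.sqrt(gain[j])
        for i in remaining:
            e = (kernel[j][i] - sum(a * b for a, b in zip(chol[j], chol[i]))) / root
            chol[i].append(e)
            gain[i] -= e * e
    return selected


class PIDController:
    """Textbook PID controller; gains are hypothetical tuning parameters."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def update(self, target, observed, dt=1.0):
        """Return a control signal from the gap between target and observed exposure."""
        error = target - observed
        self.integral += error * dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```

With scores `[1.0, 0.9, 0.8]` and vectors `[[1, 0], [0.9, 0.1], [0, 1]]`, `greedy_dpp_rerank(build_kernel(scores, vectors), 2)` selects items 0 and 2: the second item scores higher than the third but is nearly a duplicate of the first, so it is passed over. A positive PID output indicates that a commodity type is under-exposed relative to its target and could be boosted.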
Provider:
Hangzhou NetEase Zaigu Technology Co., Ltd. (杭州网易再顾科技有限公司)
Created:
2023-12-04
Collected Summary
Dataset Introduction

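Features
The dataset contains an e-commerce platform's commodity information, user information, and users' historical interaction data: 101 records in total, updated on demand. It is intended for e-commerce platform users and operators, and uses algorithmic models to analyze user behavior and improve recall efficiency. The algorithms involve feature embedding, diversity re-ranking, and traffic control.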
The content above was collected and summarized by 遇见数据集.



