E-commerce Recommendation "Hug the Thigh" (抱大腿) Attack Recognition Dataset
Alibaba Cloud Tianchi | Updated 2026-06-09 | Indexed 2024-03-07
Download link:
https://tianchi.aliyun.com/dataset/123862
Resource overview:
As the Internet has grown, online shopping has become the choice of more and more consumers. According to Alibaba's financial reports, gross merchandise volume (GMV) on Alibaba's platforms exceeded one trillion US dollars in fiscal year 2020, with 960 million annual active consumers worldwide.
To meet users' individual needs, e-commerce platforms recommend products that match each user's interests, so that every user sees a personalized product ranking. Common candidate retrieval (recall) paths in recommendation systems include U2I (User-Item) and I2I (Item-Item). To improve the timeliness and accuracy of recommendations, platforms update the U2I and I2I relations in real time from user behavior across the whole platform, and recommend related items based on each user's most recent behavior.
To gain more traffic and put their products in front of more consumers, some merchants hack the platform's recommendation mechanism to increase their products' exposure. One typical tactic is the "hug the thigh" attack: the merchant hires a group of malicious users to click on both a target product and a best-selling product, which builds an association between the two and raises their I2I correlation score. Consumers are then led to buy the target product with the expectations they have of the best seller, and receive a product that does not live up to them. This harms consumers' interests and degrades their shopping experience, damages the reputation of the platform and of other merchants, and seriously disrupts the platform's fairness. Intercepting this behavior in real time protects a real-time recommendation system from malicious attacks while preserving the timeliness of its recommendations.
Identifying this kind of malicious traffic accurately and efficiently, and filtering malicious click data in real time, is therefore an urgent problem for recommendation systems.
This dataset comes from the 3rd Apache Flink Geek Challenge and AAIG CUP, the E-commerce Recommendation "Hug the Thigh" Attack Recognition Competition: https://tianchi.aliyun.com/competition/entrance/531925/introduction
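The attack mechanism described above can be illustrated with a small sketch. It assumes a cosine-style co-click score as the I2I measure; the description does not say which formula the platform uses, so this formula, the function name `i2i_scores`, and all user and item identifiers are hypothetical stand-ins rather than the dataset's actual schema. The sketch shows how coordinated clicks could raise the score between a target item and a best seller, and how dropping clicks from flagged accounts restores it.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def i2i_scores(clicks):
    """Cosine-style co-click score for every co-clicked item pair.

    clicks: iterable of (user_id, item_id) pairs.
    Returns {(item_a, item_b): score} with item_a < item_b.
    This is an assumed scoring rule, not the platform's real one.
    """
    items_by_user = defaultdict(set)
    users_by_item = defaultdict(set)
    for user, item in clicks:
        items_by_user[user].add(item)
        users_by_item[item].add(user)

    # Count, for each item pair, how many users clicked both items.
    co_clicks = defaultdict(int)
    for items in items_by_user.values():
        for a, b in combinations(sorted(items), 2):
            co_clicks[(a, b)] += 1

    # Normalize by the geometric mean of each item's click audience.
    return {
        (a, b): n / sqrt(len(users_by_item[a]) * len(users_by_item[b]))
        for (a, b), n in co_clicks.items()
    }


# Organic traffic: 100 users click the best seller, 2 of them also click the target.
organic = [(f"u{i}", "hot") for i in range(100)]
organic += [("u0", "target"), ("u1", "target")]

# Attack traffic: 30 hired accounts each click both items.
attack = [(f"bot{i}", item) for i in range(30) for item in ("hot", "target")]

before = i2i_scores(organic)[("hot", "target")]
after = i2i_scores(organic + attack)[("hot", "target")]

# Dropping clicks from flagged accounts before scoring undoes the inflation.
flagged = {f"bot{i}" for i in range(30)}
filtered = [(u, i) for u, i in organic + attack if u not in flagged]
restored = i2i_scores(filtered)[("hot", "target")]

print(f"before={before:.3f} after={after:.3f} restored={restored:.3f}")
```

In this toy setting the hired accounts are known in advance. In the competition setting, deciding which clicks to flag is the actual problem, and it has to be solved on a stream rather than on a finished batch.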
Provider:
Alibaba Cloud Tianchi
Created:
2022-03-23
Collected summary
Dataset introduction

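Background and challenges
Background overview
This dataset focuses on identifying "hug the thigh" attacks in e-commerce recommendation systems. It comes from the Apache Flink Geek Challenge and aims to protect recommendation fairness by detecting coordinated clicks from malicious users. It contains online and offline data from the preliminary and second rounds of the competition, and provides user behavior, item features, and user features for model training and prediction, targeting the semi-supervised learning problem that arises in real-time risk control. The data format is clear and includes labels marking behavior as normal or malicious, which makes the dataset suitable for research on recommendation system security.
The content above was collected and summarized by 遇见数据集.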



