BuzzCity mobile advertisement dataset
Source: Mendeley Data. Updated: 2024-01-31. Indexed: 2024-06-28.
Download link:
https://researchdata.smu.edu.sg/articles/dataset/BuzzCity_mobile_advertisement_dataset/12062703/1
Resource description:
This competition involves advertisement data provided by BuzzCity Pte. Ltd. BuzzCity is a global mobile advertising network that reaches millions of consumers around the world on mobile phones and devices. In Q1 2012, over 45 billion ad banners were delivered across the BuzzCity network, which consists of more than 10,000 publisher sites reaching an average of over 300 million unique users per month. The number of smartphones active on the network has also grown significantly: smartphones now account for more than 32% of the phones that are served advertisements across the BuzzCity network.
The "raw" data used in this competition comes in two types, a publisher database and a click database, both provided in CSV format.
The publisher database records the publisher's (aka partner's) profile and comprises several fields:
- publisherid: Unique identifier of a publisher
- Bankaccount: Bank account associated with a publisher (may be empty)
- address: Mailing address of a publisher (obfuscated; may be empty)
- status: Label of a publisher, which can be one of the following:
  1. "OK": Publishers whom BuzzCity deems as having healthy traffic (or those who slipped past its detection mechanisms)
  2. "Observation": Publishers who may have just started their traffic, or whose traffic statistics deviate from the system-wide average. BuzzCity has not yet taken a conclusive stand on these publishers
  3. "Fraud": Publishers who are deemed fraudulent with clear proof. BuzzCity suspends their accounts and their earnings will not be paid
The click database, on the other hand, records the click traffic and has several fields:
- id: Unique identifier of a particular click
- numericip: Public IP address of a clicker/visitor
- deviceua: Phone model used by a clicker/visitor
- publisherid: Unique identifier of a publisher
- adscampaignid: Unique identifier of a given advertisement campaign
- usercountry: Country the surfer is from
- clicktime: Timestamp of a given click (in YYYY-MM-DD format)
- publisherchannel: Publisher's channel type, which can be one of the following:
  - ad: Adult sites
  - co: Community
  - es: Entertainment and lifestyle
  - gd: Glamour and dating
  - in: Information
  - mc: Mobile content
  - pp: Premium portal
  - se: Search, portal, services
- referredurl: URL where the ad banners were clicked (obfuscated; may be empty). More details about the HTTP Referer protocol can be found in this article.
Related Publication: R. J. Oentaryo, E.-P. Lim, M. Finegold, D. Lo, F.-D. Zhu, C. Phua, E.-Y. Cheu, G.-E. Yap, K. Sim, M. N. Nguyen, K. Perera, B. Neupane, M. Faisal, Z.-Y. Aung, W. L. Woon, W. Chen, D. Patel, and D. Berrar. (2014). Detecting click fraud in online advertising: A data mining approach, Journal of Machine Learning Research, 15, 99-140.
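To make the two-table layout described above concrete, here is a minimal Python sketch that reads a publisher table and a click table and counts clicks per publisher status by joining on publisherid. It is only an illustration under stated assumptions: the column headers simply reuse the field names from the description (the headers in the distributed CSV files may differ), all sample rows are made up, and the helper names `read_rows` and `clicks_by_status` are hypothetical.

```python
import csv
import io
from collections import Counter

# Channel codes exactly as listed in the dataset description.
CHANNELS = {
    "ad": "Adult sites",
    "co": "Community",
    "es": "Entertainment and lifestyle",
    "gd": "Glamour and dating",
    "in": "Information",
    "mc": "Mobile content",
    "pp": "Premium portal",
    "se": "Search, portal, services",
}

# Made-up sample rows; headers are ASSUMED to match the described field names.
PUBLISHERS_CSV = """publisherid,Bankaccount,address,status
p1,,addr_x,OK
p2,acct_9,,Fraud
p3,,,Observation
"""

CLICKS_CSV = """id,numericip,deviceua,publisherid,adscampaignid,usercountry,clicktime,publisherchannel,referredurl
1,100,Nokia_X,p1,c1,sg,2012-02-09,es,
2,100,Nokia_X,p2,c1,id,2012-02-09,ad,ref_a
3,101,Generic,p2,c2,id,2012-02-10,ad,ref_a
4,102,Generic,p3,c2,in,2012-02-10,mc,
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def clicks_by_status(publishers, clicks):
    """Count clicks per publisher status by joining on publisherid."""
    status_of = {p["publisherid"]: p["status"] for p in publishers}
    counts = Counter()
    for click in clicks:
        # Clicks whose publisher is missing from the profile table go to "unknown".
        counts[status_of.get(click["publisherid"], "unknown")] += 1
    return counts


publishers = read_rows(PUBLISHERS_CSV)
clicks = read_rows(CLICKS_CSV)
status_counts = clicks_by_status(publishers, clicks)
# Expand the two-letter channel code of each click into its description.
channel_names = [CHANNELS[c["publisherchannel"]] for c in clicks]
print(dict(status_counts))
```

For real files, the in-memory strings would be replaced by `open(path, newline="")` handles passed to `csv.DictReader`.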
Created:
2024-01-31
Collected summary
Dataset introduction

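Background and challenges
Background overview
This dataset contains publisher information and click traffic data from the BuzzCity mobile advertising network, intended for analyzing and detecting click fraud. The publisher database contains the publisher ID, bank account, address, and status; the click database contains details such as the click ID, device, country, and time. The dataset is suited to data mining and pattern recognition research.
The content above was collected and summarized by 遇见数据集.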
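As a rough illustration of the kind of data mining this dataset supports, the Python sketch below aggregates click records into simple per-publisher features (click count, distinct IPs, clicks per IP). These features are an assumption chosen for illustration, not the method of the related publication; the sample records are made up, and `publisher_features` is a hypothetical helper name.

```python
from collections import defaultdict

# Hypothetical click records (only the fields used here); values are made up.
clicks = [
    {"publisherid": "p1", "numericip": "100"},
    {"publisherid": "p1", "numericip": "101"},
    {"publisherid": "p2", "numericip": "200"},
    {"publisherid": "p2", "numericip": "200"},
    {"publisherid": "p2", "numericip": "200"},
    {"publisherid": "p2", "numericip": "201"},
]


def publisher_features(click_rows):
    """Aggregate click rows into simple per-publisher summary features."""
    by_publisher = defaultdict(list)
    for row in click_rows:
        by_publisher[row["publisherid"]].append(row)

    features = {}
    for publisher, rows in by_publisher.items():
        distinct_ips = {r["numericip"] for r in rows}
        features[publisher] = {
            "clicks": len(rows),
            "distinct_ips": len(distinct_ips),
            # Many clicks from few IPs may merit a closer look; it is not proof of fraud.
            "clicks_per_ip": len(rows) / len(distinct_ips),
        }
    return features


features = publisher_features(clicks)
for publisher, values in sorted(features.items()):
    print(publisher, values)
```

Features like these could then be paired with the status label from the publisher database to train or evaluate a classifier; which features are actually informative is an empirical question.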



