BuzzCity mobile advertisement dataset
Source: researchdata.smu.edu.sg | Updated: 2023-05-30 | Indexed: 2025-01-15
Download link:
https://researchdata.smu.edu.sg/articles/dataset/BuzzCity_mobile_advertisement_dataset/12062703/1
Description:
This competition involves advertisement data provided by BuzzCity Pte. Ltd. BuzzCity is a global mobile advertising network that reaches millions of consumers around the world on mobile phones and devices. In Q1 2012, over 45 billion ad banners were delivered across the BuzzCity network, which consists of more than 10,000 publisher sites that reach an average of over 300 million unique users per month. The number of smartphones active on the network has also grown significantly. Smartphones now account for more than 32% of the phones that are served advertisements across the BuzzCity network.
The "raw" data used in this competition comes in two types, a publisher database and a click database, both provided in CSV format. The publisher database records each publisher's (aka partner's) profile and comprises the following fields:
publisherid - Unique identifier of a publisher.
Bankaccount - Bank account associated with a publisher (may be empty)
address - Mailing address of a publisher (obfuscated; may be empty)
status - Label of a publisher, which can be the following:
"OK" - Publishers whom BuzzCity deems as having healthy traffic (or those who slipped past its detection mechanisms)
"Observation" - Publishers who may have just started their traffic or whose traffic statistics deviate from the system-wide average. BuzzCity has not yet taken a conclusive stand on these publishers
"Fraud" - Publishers who are deemed fraudulent with clear proof. BuzzCity suspends their accounts and their earnings will not be paid
The click database, on the other hand, records the click traffic and has the following fields:
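The publisher fields listed above could be read along the following lines. This is only a minimal sketch: it assumes the CSV header names match the field names exactly as written here (including the capitalised `Bankaccount`), the `load_publishers` helper is hypothetical, and the sample rows are made up for illustration rather than taken from the dataset.

```python
import csv
import io
from collections import Counter

# Status labels as listed in the description above.
VALID_STATUSES = {"OK", "Observation", "Fraud"}


def load_publishers(handle):
    """Read publisher rows, turning empty optional fields into None."""
    publishers = []
    for row in csv.DictReader(handle):
        if row["status"] not in VALID_STATUSES:
            raise ValueError(f"unexpected status: {row['status']!r}")
        publishers.append({
            "publisherid": row["publisherid"],
            # Bank account and address may be empty according to the description.
            "bankaccount": row["Bankaccount"] or None,
            "address": row["address"] or None,
            "status": row["status"],
        })
    return publishers


# Made-up sample rows for illustration only; not real dataset content.
SAMPLE = """publisherid,Bankaccount,address,status
p001,acc123,addr_x,OK
p002,,,Observation
p003,acc456,,Fraud
"""

publishers = load_publishers(io.StringIO(SAMPLE))
status_counts = Counter(p["status"] for p in publishers)
print(status_counts)
```

With the real file, `io.StringIO(SAMPLE)` would be replaced by an open file handle; the actual file name is not given in this description.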
id - Unique identifier of a particular click
numericip - Public IP address of a clicker/visitor
deviceua - Phone model used by a clicker/visitor
publisherid - Unique identifier of a publisher
adscampaignid - Unique identifier of a given advertisement campaign
usercountry - Country of the clicker/visitor
clicktime - Timestamp of a given click (in YYYY-MM-DD format)
publisherchannel - Publisher's channel type, which can be the following:
ad - Adult sites
co - Community
es - Entertainment and lifestyle
gd - Glamour and dating
in - Information
mc - Mobile content
pp - Premium portal
se - Search, portal, services
referredurl - URL where the ad banners were clicked (obfuscated; may be empty). More details about the HTTP Referer protocol can be found in this article.
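As an illustration of how the click fields might be used, the sketch below counts clicks and distinct IP addresses per publisher. It assumes the CSV header names match the field names above; the `summarize_clicks` helper, the sample values, and the choice of summary statistics are all hypothetical, and this is not the method of the related publication.

```python
import csv
import io
from collections import defaultdict

# Channel codes as listed in the description above.
CHANNELS = {
    "ad": "Adult sites",
    "co": "Community",
    "es": "Entertainment and lifestyle",
    "gd": "Glamour and dating",
    "in": "Information",
    "mc": "Mobile content",
    "pp": "Premium portal",
    "se": "Search, portal, services",
}


def summarize_clicks(handle):
    """Return, per publisher, the click count and number of distinct IPs."""
    totals = defaultdict(int)
    ips = defaultdict(set)
    for row in csv.DictReader(handle):
        if row["publisherchannel"] not in CHANNELS:
            raise ValueError(f"unknown channel: {row['publisherchannel']!r}")
        pid = row["publisherid"]
        totals[pid] += 1
        ips[pid].add(row["numericip"])
    return {
        pid: {"clicks": totals[pid], "distinct_ips": len(ips[pid])}
        for pid in totals
    }


# Made-up sample rows for illustration only; the value formats are assumptions.
SAMPLE = """id,numericip,deviceua,publisherid,adscampaignid,usercountry,clicktime,publisherchannel,referredurl
1,1001,DeviceA,p001,c1,sg,2012-01-01,es,
2,1001,DeviceA,p001,c1,sg,2012-01-01,es,
3,1002,DeviceB,p001,c2,id,2012-01-01,es,url_x
4,2001,DeviceC,p002,c1,in,2012-01-02,mc,
"""

summary = summarize_clicks(io.StringIO(SAMPLE))
print(summary)
```

Because `publisherid` appears in both databases, a summary like this can be matched against the publisher `status` label by looking up each key in the publisher table.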
Related Publication: R. J. Oentaryo, E.-P. Lim, M. Finegold, D. Lo, F.-D. Zhu, C. Phua, E.-Y. Cheu, G.-E. Yap, K. Sim, M. N. Nguyen, K. Perera, B. Neupane, M. Faisal, Z.-Y. Aung, W. L. Woon, W. Chen, D. Patel, and D. Berrar. (2014). Detecting click fraud in online advertising: A data mining approach, Journal of Machine Learning Research, 15, 99-140.
Provided by:
SMU Research Data Repository (RDR)



