
Adform click prediction dataset

Source: DataONE
Updated: 2017-02-22
Indexed: 2024-06-26
Download link:
https://search.dataone.org/view/sha256:cdf287988eec0683812cdfe50296b91235a094c73b2ba4415a5bd72a82da4015
Description:
This data is a sample of Adform's ad traffic. Each record corresponds to an ad impression served by Adform, and consists of a single binary label (clicked/not-clicked) and a selected subset of features (c0-c9). The positives and negatives are downsampled at different rates. The data is chronologically ordered.

The file is gzipped and each line corresponds to a single record, serialized as JSON. The JSON has the following fields:

"l": The binary label indicating whether the ad was clicked (1) or not (0).
"c0" - "c9": Categorical features which were hashed into a 32-bit integer. The semantics of the features are not disclosed. The values are stored in an array, because some of the features have multiple values per record. When a key is missing, the field is empty.

The files are named "adform.click.2017.xx.json.gz", where "xx" is the index (01-05). The files are indexed chronologically, and the records (lines) within each file are ordered chronologically.
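The record format described above can be read with a few lines of Python. This is a minimal sketch, not an official loader: the helper names `parse_record` and `read_records` are hypothetical, and it assumes that each feature value is a JSON array of integers and that a missing feature simply has no key (or a null value) in the record.

```python
import gzip
import json
from typing import Iterator, List, Tuple

# Feature keys c0..c9, as named in the dataset description.
FEATURE_KEYS = [f"c{i}" for i in range(10)]


def parse_record(line: str) -> Tuple[int, List[List[int]]]:
    """Parse one JSON line into (label, features).

    Assumption: a missing or null feature key is treated as an empty list.
    """
    record = json.loads(line)
    label = int(record["l"])  # 1 = clicked, 0 = not clicked
    features = [list(record.get(key) or []) for key in FEATURE_KEYS]
    return label, features


def read_records(path: str) -> Iterator[Tuple[int, List[List[int]]]]:
    """Stream records from one gzipped file, preserving line order."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield parse_record(line)
```

A call such as `read_records("adform.click.2017.01.json.gz")` would stream the first file, assuming it has been downloaded to the working directory. Because both the files and the lines within them are chronological, reading files 01 through 05 in order keeps the records in time order, which matters when splitting into training and evaluation sets.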
Created:
2023-11-21