Taobao Display Advertising Click Clustering Dataset
Source: National Basic Science Data Center (国家基础学科公共科学数据中心), indexed 2026-01-30
Download link:
https://nbsdc.cn/general/dataDetail?id=697b8411195d2616afaeb9ae&type=1
Resource description:
With the rapid growth of smart devices and user behavior data, efficient clustering of massive heterogeneous data has become an important topic in big data analytics. This e-commerce display advertising click dataset is designed for hierarchical clustering at big data scale. It was built through category aggregation, feature cleaning, numerical normalization, and vectorization, yielding a structured resource suited to clustering analysis of very large, imbalanced data. The dataset covers ad display and click logs from 1,023,154 users and integrates multi-dimensional attributes such as product prices, ad categories, and user behaviors, with a total size of 23.1 GB. During preprocessing, the raw data was anonymized and checked for compliance; categories were standardized and aggregated, fields were cleaned, and missing values were imputed; numerical features were winsorized and normalized to reduce the influence of long-tailed distributions. The dataset is intended as a data foundation and experimental benchmark for research on big data clustering algorithms, user profiling, and commercial data mining.
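The description names missing value imputation, winsorization, and normalization but does not give the exact procedure. The following is a minimal sketch of what such a step might look like for one numerical feature, under stated assumptions: median imputation, percentile-based clipping, and min-max scaling are illustrative choices, and the function name, the percentile bounds, and the sample price values are all hypothetical rather than taken from the dataset.

```python
import numpy as np


def impute_winsorize_normalize(values, lower_pct=1.0, upper_pct=99.0):
    """Illustrative preprocessing for one numerical feature.

    Assumed steps (not confirmed by the dataset description):
    1. fill missing values with the median of the observed values,
    2. clip to the [lower_pct, upper_pct] percentile range (winsorization),
    3. min-max scale the clipped values to [0, 1].
    """
    x = np.asarray(values, dtype=float)

    # Step 1: median imputation of NaN entries.
    median = np.nanmedian(x)
    x = np.where(np.isnan(x), median, x)

    # Step 2: winsorize by clipping to percentile bounds.
    lo, hi = np.percentile(x, [lower_pct, upper_pct])
    clipped = np.clip(x, lo, hi)

    # Step 3: min-max normalization; a constant feature maps to zeros.
    if hi == lo:
        return np.zeros_like(clipped)
    return (clipped - lo) / (hi - lo)


# Hypothetical price column with one missing value and one extreme outlier.
prices = np.array([9.9, 12.0, np.nan, 18.0, 22.0, 25.0, 30.0, 45.0, 60.0, 99999.0])
scaled = impute_winsorize_normalize(prices, lower_pct=5.0, upper_pct=90.0)
print(scaled)
```

On a sample this small, the upper percentile is still pulled toward the outlier by interpolation; with logs from over a million users, percentile bounds would be far more stable, and the bounds themselves would need tuning per feature.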
Provider:
Xi'an Jiaotong University



