
Synthetic Data Set for Uplift Modeling

Source: NIAID Data Ecosystem (indexed 2026-03-11)
Download link:
https://zenodo.org/record/3653140
Description:
This dataset is designed and simulated for evaluating uplift modeling and feature selection methods. Its main characteristic is that the features are generated with various patterns of association with the outcome variable and with the causal effect (or treatment effect), which makes it suitable for evaluating feature importance and model interpretation in uplift modeling.

The dataset consists of 100 trials (replicates with different random seeds), each with 10,000 samples and 36 features. The outcome variable is binary, which makes this a classification problem. The samples are split equally between the control and treatment groups (5,000 samples per group in each trial).

The generated data has three types of features: (1) uplift features, which influence the treatment effect on the conversion probability; (2) classification features, which affect the conversion probability but are independent of the treatment effect; and (3) irrelevant features, which are independent of both the conversion probability and the treatment effect. To model how uplift features relate to the treatment effect and how classification features relate to the outcome probability, the data generation process implements six types of association patterns: linear, quadratic, cubic, ReLU (Rectified Linear Unit), sine, and cosine. Of the 36 features in total, 10 are classification features, 6 are uplift features, and 20 are irrelevant features.
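The six association patterns named above can be sketched as simple element-wise transforms. This is only an illustration: the description does not give the exact parameterization used by the generator, so the `PATTERNS` mapping and the `apply_pattern` helper are hypothetical names, and the unscaled forms shown here are an assumption.

```python
import numpy as np

# Hypothetical, unscaled versions of the six pattern types listed in the
# description. The real generator may scale or shift them differently.
PATTERNS = {
    "linear": lambda x: x,
    "quadratic": lambda x: x ** 2,
    "cubic": lambda x: x ** 3,
    "relu": lambda x: np.maximum(x, 0.0),
    "sin": np.sin,
    "cos": np.cos,
}


def apply_pattern(name, x):
    """Apply one of the six association patterns element-wise to x."""
    return PATTERNS[name](np.asarray(x, dtype=float))


if __name__ == "__main__":
    x = np.linspace(-2.0, 2.0, 5)
    for name in PATTERNS:
        print(name, apply_pattern(name, x))
```

In a generator of this kind, the transformed value of a classification feature would feed into the conversion probability, while the transformed value of an uplift feature would feed into the treatment effect; how the terms are combined is not specified in the description.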
Column names:
Trial ID: 'trial_id'
Experiment group label: 'treatment_group_key'
Outcome variable (classification label): 'conversion'
Feature names: ['x1_informative', 'x2_informative', 'x3_informative', 'x4_informative', 'x5_informative', 'x6_informative', 'x7_informative', 'x8_informative', 'x9_informative', 'x10_informative',
'x11_irrelevant', 'x12_irrelevant', 'x13_irrelevant', 'x14_irrelevant', 'x15_irrelevant', 'x16_irrelevant', 'x17_irrelevant', 'x18_irrelevant', 'x19_irrelevant', 'x20_irrelevant', 'x21_irrelevant', 'x22_irrelevant', 'x23_irrelevant', 'x24_irrelevant', 'x25_irrelevant', 'x26_irrelevant', 'x27_irrelevant', 'x28_irrelevant', 'x29_irrelevant', 'x30_irrelevant',
'x31_uplift_increase', 'x32_uplift_increase', 'x33_uplift_increase', 'x34_uplift_increase', 'x35_uplift_increase', 'x36_uplift_increase']
True underlying control conversion probability: 'control_conversion_prob'
True underlying treatment conversion probability: 'treatment1_conversion_prob'
True treatment effect: 'treatment1_true_effect'
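As a rough sketch of how these columns might be used, the helpers below group the feature columns by their name suffix and compute the observed uplift per trial (mean conversion in the treatment group minus mean conversion in the control group). The function names are hypothetical, and the group labels 'treatment1' and 'control' are an assumption inferred from the column names; check the actual values of 'treatment_group_key' before relying on them.

```python
import pandas as pd


def feature_groups(columns):
    """Group feature columns by the suffix used in the column names."""
    groups = {"informative": [], "irrelevant": [], "uplift_increase": []}
    for col in columns:
        for suffix in groups:
            if col.endswith("_" + suffix):
                groups[suffix].append(col)
    return groups


def observed_uplift(df, treatment_label="treatment1", control_label="control"):
    """Per-trial difference in mean conversion: treatment minus control.

    The default labels are assumptions, not confirmed by the description.
    """
    rates = (
        df.groupby(["trial_id", "treatment_group_key"])["conversion"]
        .mean()
        .unstack("treatment_group_key")
    )
    return rates[treatment_label] - rates[control_label]
```

Because the file also provides 'treatment1_true_effect', the observed uplift for a trial can be compared with the mean of that column over the same trial as a sanity check on any estimator.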
Created: 2020-02-07