Synthetic Data Set for Uplift Modeling (One Trial)
Source: NIAID Data Ecosystem (indexed 2026-03-11)
Download link:
https://zenodo.org/record/3653052
Description:
This dataset is designed and simulated for evaluating uplift modeling and feature selection methods.
This dataset contains 10,000 samples and 36 features (one trial).
The samples are split equally between the control group and the treatment group.
The generated data has three types of features: (1) uplift features, which influence the treatment effect on the conversion probability; (2) classification features, which affect the conversion probability but are independent of the treatment effect; and (3) irrelevant features, which are independent of both the conversion probability and the treatment effect. To model how uplift features relate to the treatment effect, and how classification features relate to the outcome probability, the data generation process implements six types of association patterns: linear, quadratic, cubic, ReLU (Rectified Linear Unit), sine, and cosine.
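As a rough illustration of the six association patterns named above, the following minimal sketch applies each one to a single feature value. The dictionary `PATTERNS` and the function `apply_pattern` are hypothetical names introduced here; the description does not give the coefficients, scaling, or noise used by the actual generator, so this shows only the shape of each pattern.

```python
import math

# Hypothetical helper: the dataset description names the six pattern types
# but not the exact coefficients or scaling, so each is shown in its
# simplest unscaled form.
PATTERNS = {
    "linear": lambda x: x,
    "quadratic": lambda x: x ** 2,
    "cubic": lambda x: x ** 3,
    "relu": lambda x: max(0.0, x),  # Rectified Linear Unit
    "sin": math.sin,
    "cos": math.cos,
}


def apply_pattern(name, x):
    """Return the value of the named association pattern at x."""
    return PATTERNS[name](x)


# Evaluate every pattern at one example input.
example_values = {name: apply_pattern(name, 2.0) for name in PATTERNS}
```

In a generator of this kind, the transformed value would typically feed into the conversion probability (for classification features) or into the treatment effect (for uplift features); how the real generator combines them is not stated here.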
Of the 36 features, 10 are classification features, 6 are uplift features, and 20 are irrelevant features.
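The feature counts can be cross-checked against the column names listed in this description. The sketch below assumes that the '_informative' suffix marks classification features, which is inferred from the matching count of 10 rather than stated explicitly; `SUFFIX_TO_TYPE` and `feature_type` are hypothetical names.

```python
from collections import Counter

# Assumed mapping from column-name suffix to the feature types in the
# description; the "informative" -> "classification" link is an inference.
SUFFIX_TO_TYPE = {
    "informative": "classification",
    "irrelevant": "irrelevant",
    "uplift_increase": "uplift",
}


def feature_type(column):
    """Map a name such as 'x31_uplift_increase' to its feature type."""
    suffix = column.split("_", 1)[1]  # drop the leading 'x<number>' part
    return SUFFIX_TO_TYPE[suffix]


# Rebuild the 36 feature names following the pattern of the column list.
feature_names = (
    [f"x{i}_informative" for i in range(1, 11)]
    + [f"x{i}_irrelevant" for i in range(11, 31)]
    + [f"x{i}_uplift_increase" for i in range(31, 37)]
)

type_counts = Counter(feature_type(name) for name in feature_names)
```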
Column names:
Experiment group label: 'treatment_group_key'
Feature names: ['x1_informative',
'x2_informative',
'x3_informative',
'x4_informative',
'x5_informative',
'x6_informative',
'x7_informative',
'x8_informative',
'x9_informative',
'x10_informative',
'x11_irrelevant',
'x12_irrelevant',
'x13_irrelevant',
'x14_irrelevant',
'x15_irrelevant',
'x16_irrelevant',
'x17_irrelevant',
'x18_irrelevant',
'x19_irrelevant',
'x20_irrelevant',
'x21_irrelevant',
'x22_irrelevant',
'x23_irrelevant',
'x24_irrelevant',
'x25_irrelevant',
'x26_irrelevant',
'x27_irrelevant',
'x28_irrelevant',
'x29_irrelevant',
'x30_irrelevant',
'x31_uplift_increase',
'x32_uplift_increase',
'x33_uplift_increase',
'x34_uplift_increase',
'x35_uplift_increase',
'x36_uplift_increase']
Outcome variable: 'conversion'
True underlying control conversion probability: 'control_conversion_prob'
True underlying treatment conversion probability: 'treatment1_conversion_prob'
True treatment effect: 'treatment1_true_effect'
Note: columns whose names end with the '_transformed' suffix are intermediate variables from the data generation process and should be excluded from model training.
Created:
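One way to apply this note is to filter the column names before training. The following is a minimal sketch, not the dataset authors' code: `select_feature_columns` is a hypothetical helper, and the '_transformed' column name in the example is illustrative only. It also drops the group label, the outcome, and the ground-truth columns, on the assumption that none of these should be model inputs.

```python
# Columns named in the description that are not model features.
NON_FEATURE_COLUMNS = {
    "treatment_group_key",         # experiment group label
    "conversion",                  # outcome variable
    "control_conversion_prob",     # ground truth, not observable in practice
    "treatment1_conversion_prob",  # ground truth
    "treatment1_true_effect",      # ground truth
}


def select_feature_columns(columns):
    """Keep only model features: drop labels, ground truth, and '_transformed'."""
    return [
        c for c in columns
        if c not in NON_FEATURE_COLUMNS and not c.endswith("_transformed")
    ]


# The 36 feature names, rebuilt from the pattern of the column list.
FEATURES = (
    [f"x{i}_informative" for i in range(1, 11)]
    + [f"x{i}_irrelevant" for i in range(11, 31)]
    + [f"x{i}_uplift_increase" for i in range(31, 37)]
)

# Illustrative header; 'x1_informative_transformed' is a made-up example
# of an intermediate column name.
example_columns = (
    ["treatment_group_key"]
    + FEATURES
    + ["x1_informative_transformed", "conversion",
       "control_conversion_prob", "treatment1_conversion_prob",
       "treatment1_true_effect"]
)

selected = select_feature_columns(example_columns)
```

If the file is loaded into a pandas DataFrame `df`, the same helper could be used as `df[select_feature_columns(df.columns)]`.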
2020-02-07



