UCI Adult (Version 2) - Artificial Prevalence Protocol (APP) Test Indices
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/14283869
Description:
Dataset: Adult
# data points: 45,222
# non-sensitive features (preprocessed): 95
Sensitive attribute: sex
S = 1: Male
Pr(S=1): 0.675
Target variable: income/class
Y = ⊕: >$50,000
Pr(Y=⊕): 0.248
Version 2 of the UCI Adult dataset is loaded with fetch_openml from scikit-learn. To ensure reproducibility, the dataset is split into three equal-sized subsets by stratified sampling on the joint distribution of (S, Y), with a random seed of 0.
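The stratified three-way split can be illustrated with a minimal, dependency-free sketch. It is not the authors' code: the function name stratified_three_way_split, the round-robin assignment, and the toy data are assumptions made for illustration, and the published indices were presumably produced with scikit-learn utilities, so this sketch will not reproduce them.

```python
import random
from collections import defaultdict


def stratified_three_way_split(s, y, seed=0, n_parts=3):
    """Split row positions into n_parts disjoint subsets, stratified on (s, y).

    Hypothetical helper: each (s, y) stratum is shuffled with a seeded
    generator and dealt round-robin, so every subset receives (almost)
    the same number of rows from every stratum.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for position, key in enumerate(zip(s, y)):
        strata[key].append(position)

    parts = [[] for _ in range(n_parts)]
    offset = 0  # running counter keeps the overall subset sizes balanced
    for key in sorted(strata):
        positions = strata[key]
        rng.shuffle(positions)
        for j, position in enumerate(positions):
            parts[(offset + j) % n_parts].append(position)
        offset += len(positions)
    return [sorted(part) for part in parts]


# Toy stand-in for the sensitive attribute S and the target Y (assumed data).
s = [1, 1, 1, 0, 0, 0] * 10
y = [1, 0, 0, 1, 0, 0] * 10
d1, d2, d3 = stratified_three_way_split(s, y, seed=0)
```

Because the generator is seeded, calling the function again with the same inputs returns the same three subsets.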
The indices stored in adult_D1.indices, adult_D2.indices, and adult_D3.indices refer to the original dataset's instances after any instances containing missing values have been removed. The indices in adult_D3_protocol.indices correspond to the instances in the LabelledCollection constructed using the indices from adult_D3.indices.
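The point about missing values matters when applying the indices: they are positions in the cleaned data, not in the raw download. The sketch below shows the difference on a toy table; the helper drop_missing, the rows, and the contents of stored_indices are all hypothetical, and the on-disk format of the .indices files is not assumed here.

```python
def drop_missing(rows):
    """Keep only rows with no missing (None) values, preserving order."""
    return [row for row in rows if all(value is not None for value in row)]


# Toy rows (age, workclass); the second row has a missing value.
raw = [
    (39, "State-gov"),
    (50, None),
    (38, "Private"),
    (53, "Private"),
]

clean = drop_missing(raw)

# Hypothetical stored indices: they address `clean`, not `raw`.
stored_indices = [0, 2]
subset = [clean[i] for i in stored_indices]
```

Here position 2 selects (53, "Private") from the cleaned rows, whereas the same position in the raw rows would select (38, "Private").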
Categorical features are converted to dummy variables, and numerical features are standardised using a standard scaler.
Feature type    Features
Categorical     workclass, education, marital-status, occupation, race, native-country
Numerical       age, capital-gain, capital-loss, hours-per-week
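The two preprocessing steps described above can be sketched without external libraries. This is an illustration only: the helpers one_hot and standardise and the toy columns are assumptions, and the dataset itself was presumably prepared with pandas or scikit-learn transformers. The standardisation uses the population standard deviation, which matches what scikit-learn's StandardScaler computes.

```python
from statistics import fmean, pstdev


def one_hot(values):
    """Turn a categorical column into dummy columns, one per category."""
    categories = sorted(set(values))
    rows = [[1.0 if value == c else 0.0 for c in categories] for value in values]
    return categories, rows


def standardise(values):
    """Centre a numerical column to mean 0 and scale it to unit variance."""
    mean = fmean(values)
    std = pstdev(values) or 1.0  # guard against a constant column
    return [(value - mean) / std for value in values]


# Toy columns (assumed data, not taken from the dataset).
workclass = ["Private", "State-gov", "Private"]
age = [20.0, 30.0, 40.0]

categories, dummies = one_hot(workclass)
age_z = standardise(age)
```

Each categorical feature expands into as many columns as it has distinct values, which is how the handful of features listed above grows into the larger preprocessed feature count.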
The Artificial Prevalence Protocol (APP) is applied to vary the distribution of the sensitive attribute across 11 prevalence values: Pr(s|⊖) ∈ {0.0, 0.1, ..., 0.9, 1.0}. To reduce variance in the evaluation results, 500 samples are drawn repeatedly under the protocol. This entire process is repeated 10 times, with the random seed fixed at 0 for reproducibility.
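The core step of the protocol, drawing a sample whose sensitive-attribute prevalence is fixed in advance, can be sketched as follows. This is a simplified illustration rather than the implementation used to generate the indices: the function sample_at_prevalence, the sample size of 100, sampling without replacement, and the toy pool are all assumptions, and the pool stands in for whichever group of instances the prevalence is conditioned on.

```python
import random


def sample_at_prevalence(s, prevalence, sample_size, rng):
    """Return row positions of a sample in which the requested fraction has s == 1.

    Hypothetical helper: draws without replacement, so the pool must
    contain enough rows of each group.
    """
    positives = [i for i, value in enumerate(s) if value == 1]
    negatives = [i for i, value in enumerate(s) if value != 1]
    n_pos = round(prevalence * sample_size)
    n_neg = sample_size - n_pos
    return rng.sample(positives, n_pos) + rng.sample(negatives, n_neg)


# The 11 prevalence values 0.0, 0.1, ..., 1.0 from the description.
prevalence_grid = [i / 10 for i in range(11)]

rng = random.Random(0)           # seed 0, as in the description
pool = [1] * 200 + [0] * 200     # toy sensitive-attribute values (assumed)
samples = {
    p: sample_at_prevalence(pool, p, sample_size=100, rng=rng)
    for p in prevalence_grid
}
```

Repeated draws per prevalence value and repeated runs of the whole procedure would wrap this call in outer loops.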
This work has been funded by the QuaDaSh project “Finanziato dall’Unione europea- Next Generation EU, Missione 4 Componente 2 CUP B53D23026250001”
Created:
2024-12-18



