UCI Adult (Version 2) - Artificial Prevalence Protocol (APP) Test Indices

NIAID Data Ecosystem, indexed 2026-05-02

Download link:

https://zenodo.org/record/14283869

Description:

Dataset summary:

  Dataset: Adult
  Number of data points: 45,222
  Number of non-sensitive features (after preprocessing): 95
  Sensitive attribute: sex (S = 1: Male), Pr(S=1) = 0.675
  Target variable: income/class (Y = ⊕: >$50,000), Pr(Y=⊕) = 0.248

The UCI Adult dataset (version 2) is loaded using fetch_openml from scikit-learn. To ensure reproducibility, the dataset is split into three equal subsets using stratified sampling based on the joint distribution of (S, Y) and a random seed of 0.

The indices stored in adult_D1.indices, adult_D2.indices, and adult_D3.indices refer to the original dataset's instances after any instances containing missing values have been removed. The indices in adult_D3_protocol.indices correspond to the instances in the LabelledCollection constructed using the indices from adult_D3.indices.

Categorical features are converted to dummy variables, and numerical features are standardised using a standard scaler.

  Categorical features: workclass, education, marital-status, occupation, race, native-country
  Numerical features: age, capital-gain, capital-loss, hours-per-week

The Artificial Prevalence Protocol (APP) is applied to vary the distribution of the sensitive attribute across 11 prevalence values: Pr(s|⊖) ∈ {0.0, 0.1, ..., 0.9, 1.0}. To minimise variance in the evaluation results, 500 samples are drawn repeatedly under the protocol. This entire process is repeated 10 times, consistently using a random seed of 0 for reproducibility.

This work has been funded by the QuaDaSh project “Finanziato dall’Unione europea- Next Generation EU, Missione 4 Componente 2 CUP B53D23026250001”.
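The split and the sampling protocol described above can be illustrated with a minimal sketch. It is not the authors' code: it uses only the Python standard library rather than scikit-learn, runs on synthetic stand-in labels instead of the Adult data, and will not reproduce the published index files. The function names (stratified_three_way_split, sample_at_prevalence, app_samples), the sample size of 100, and the reading of "500 samples" as 500 per prevalence value are all assumptions made for illustration. The sketch also varies the plain prevalence of S within a sample and ignores the conditioning in Pr(s|⊖).

```python
import random
from collections import defaultdict


def stratified_three_way_split(s, y, seed=0):
    """Split indices into three near-equal parts, stratified on the joint (s, y)."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for i, key in enumerate(zip(s, y)):
        groups[key].append(i)
    parts = ([], [], [])
    for key in sorted(groups):
        idx = groups[key]
        rng.shuffle(idx)
        # Deal each (s, y) group round-robin so every part keeps the joint distribution.
        for k in range(3):
            parts[k].extend(idx[k::3])
    return tuple(sorted(p) for p in parts)


def sample_at_prevalence(s, pool, prevalence, size, rng):
    """Draw `size` indices from `pool` so that a fraction `prevalence` has s == 1."""
    pos = [i for i in pool if s[i] == 1]
    neg = [i for i in pool if s[i] != 1]
    n_pos = round(prevalence * size)
    n_neg = size - n_pos
    if (n_pos and not pos) or (n_neg and not neg):
        raise ValueError("pool lacks instances needed for this prevalence")
    # Sampling with replacement (an assumption) so small groups cannot run out.
    return rng.choices(pos, k=n_pos) + rng.choices(neg, k=n_neg)


def app_samples(s, pool, size, n_samples_per_prevalence=500, seed=0):
    """Yield (prevalence, sample) pairs over the grid 0.0, 0.1, ..., 1.0."""
    rng = random.Random(seed)
    for step in range(11):
        prevalence = step / 10
        for _ in range(n_samples_per_prevalence):
            yield prevalence, sample_at_prevalence(s, pool, prevalence, size, rng)


# Synthetic stand-in labels (NOT the Adult data): s = sensitive attribute, y = target.
s = [1 if i % 3 else 0 for i in range(600)]
y = [1 if i % 4 == 0 else 0 for i in range(600)]

d1, d2, d3 = stratified_three_way_split(s, y, seed=0)
samples = list(app_samples(s, d3, size=100, n_samples_per_prevalence=5, seed=0))
print(len(d1), len(d2), len(d3), len(samples))
```

Fixing the seed makes both the split and the drawn samples repeatable within this sketch, which mirrors the role of the random seed of 0 in the description. Loading the published .indices files directly remains the reliable way to obtain the exact subsets.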

Created:

2024-12-18
