
UCI Adult (Version 2) - Artificial Prevalence Protocol (APP) Test Indices

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/14283869
Description:
Dataset                                   Adult
# data points                             45,222
# non-sensitive features (preprocessed)   95
Sensitive attribute                       sex
S = 1                                     Male
Pr(S=1)                                   0.675
Target variable                           income/class
Y = ⊕                                     >$50,000
Pr(Y=⊕)                                   0.248

The UCI Adult dataset (version 2) is loaded using fetch_openml from scikit-learn. To ensure reproducibility, the dataset is split into three equal subsets using stratified sampling on the joint distribution of (S, Y), with a random seed of 0.

The indices stored in adult_D1.indices, adult_D2.indices, and adult_D3.indices refer to the instances of the original dataset after all instances containing missing values have been removed. The indices in adult_D3_protocol.indices refer to the instances of the LabelledCollection constructed from the indices in adult_D3.indices.

Categorical features are converted to dummy variables, and numerical features are standardised using a standard scaler.

Feature type   Features
Categorical    workclass, education, marital-status, occupation, race, native-country
Numerical      age, capital-gain, capital-loss, hours-per-week

The Artificial Prevalence Protocol (APP) is applied to vary the distribution of the sensitive attribute across 11 prevalence values: Pr(s|⊖) ∈ {0.0, 0.1, ..., 0.9, 1.0}. To minimise variance in the evaluation results, 500 samples are drawn repeatedly under the protocol. This entire process is repeated 10 times, consistently using a random seed of 0 for reproducibility.

This work has been funded by the QuaDaSh project “Finanziato dall’Unione europea - Next Generation EU, Missione 4 Componente 2 CUP B53D23026250001”.
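The splitting and sampling steps described above can be illustrated with a minimal sketch. It uses only the Python standard library and synthetic data, so it will not reproduce the published index files; those remain the authoritative reference. The function names stratified_three_way_split and app_sample, the sample size of 100, and the choice to sample with replacement are assumptions made for illustration and are not taken from the dataset description.

```python
import random
from collections import defaultdict


def stratified_three_way_split(s, y, seed=0):
    """Split indices 0..n-1 into three near-equal parts, stratified on (s, y).

    Hypothetical helper: the record only states that the split is
    stratified on the joint distribution of (S, Y) with seed 0.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for i, key in enumerate(zip(s, y)):
        strata[key].append(i)
    parts = ([], [], [])
    for key in sorted(strata):
        idx = strata[key]
        rng.shuffle(idx)
        # Deal each shuffled stratum round-robin into the three parts.
        for k in range(3):
            parts[k].extend(idx[k::3])
    return tuple(sorted(p) for p in parts)


def app_sample(pool, s, prevalence, sample_size, rng):
    """Draw one sample from `pool` in which a fraction `prevalence` has s == 1.

    Assumption: sampling is done with replacement so that extreme
    prevalence values stay feasible for small pools.
    """
    pos = [i for i in pool if s[i] == 1]
    neg = [i for i in pool if s[i] != 1]
    n_pos = round(prevalence * sample_size)
    n_neg = sample_size - n_pos
    chosen = rng.choices(pos, k=n_pos) if n_pos else []
    chosen += rng.choices(neg, k=n_neg) if n_neg else []
    return chosen


# Synthetic stand-in for the sensitive attribute S and target Y,
# drawn independently with the marginal rates quoted in the table.
data_rng = random.Random(42)
S = [1 if data_rng.random() < 0.675 else 0 for _ in range(3000)]
Y = [1 if data_rng.random() < 0.248 else 0 for _ in range(3000)]

D1, D2, D3 = stratified_three_way_split(S, Y, seed=0)

# Following the notation Pr(s|⊖), restrict the pool to negative instances
# of D3 and sweep the 11 prevalence values 0.0, 0.1, ..., 1.0.
negatives = [i for i in D3 if Y[i] == 0]
sample_rng = random.Random(0)
for k in range(11):
    sample = app_sample(negatives, S, k / 10, 100, sample_rng)
    observed = sum(S[i] for i in sample) / len(sample)
    print(f"target prevalence {k / 10:.1f}, observed {observed:.2f}")
```

Dealing each shuffled stratum round-robin keeps the three parts within one instance of each other per (S, Y) group, which is one simple way to obtain "three equal subsets" under stratification.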
Created:
2024-12-18