
b-fatma/adult-income-census-federated

Source: Hugging Face. Updated 2026-04-07. Indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/b-fatma/adult-income-census-federated
Description:
---
license: cc0-1.0
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: test
    path: data/test-*
dataset_info:
  features:
  - name: age
    dtype: float64
  - name: education.num
    dtype: float64
  - name: sex
    dtype: int64
  - name: capital.gain
    dtype: float64
  - name: capital.loss
    dtype: float64
  - name: hours.per.week
    dtype: float64
  - name: workclass_Other
    dtype: int64
  - name: workclass_Private
    dtype: int64
  - name: workclass_Self-employed
    dtype: int64
  - name: occupation_Armed-Forces
    dtype: int64
  - name: occupation_Craft-repair
    dtype: int64
  - name: occupation_Exec-managerial
    dtype: int64
  - name: occupation_Farming-fishing
    dtype: int64
  - name: occupation_Handlers-cleaners
    dtype: int64
  - name: occupation_Machine-op-inspct
    dtype: int64
  - name: occupation_Other-service
    dtype: int64
  - name: occupation_Priv-house-serv
    dtype: int64
  - name: occupation_Prof-specialty
    dtype: int64
  - name: occupation_Protective-serv
    dtype: int64
  - name: occupation_Sales
    dtype: int64
  - name: occupation_Tech-support
    dtype: int64
  - name: occupation_Transport-moving
    dtype: int64
  - name: relationship_Not-in-family
    dtype: int64
  - name: relationship_Other-relative
    dtype: int64
  - name: relationship_Own-child
    dtype: int64
  - name: relationship_Unmarried
    dtype: int64
  - name: relationship_Wife
    dtype: int64
  - name: race_Other
    dtype: int64
  - name: race_White
    dtype: int64
  - name: native.country_United-States
    dtype: int64
  - name: income
    dtype: int64
  splits:
  - name: train
    num_bytes: 6459904
    num_examples: 26048
  - name: test
    num_bytes: 1615224
    num_examples: 6513
  download_size: 219795
  dataset_size: 8075128
---

## Adult Census Income Dataset

The following was retrieved from the [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets/adult).

This data was extracted from the 1994 Census Bureau database by Ronny Kohavi and Barry Becker (Data Mining and Visualization, Silicon Graphics).
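
The split sizes declared in the metadata can be cross-checked with a little arithmetic. The sketch below copies the numbers from the card and assumes nothing beyond them, except that the column count of 31 comes from counting the listed features (including `income`). The helper name `summarize` is made up for this illustration.

```python
# Values copied from the dataset card's metadata.
SPLITS = {
    "train": {"num_bytes": 6459904, "num_examples": 26048},
    "test": {"num_bytes": 1615224, "num_examples": 6513},
}
DATASET_SIZE = 8075128
NUM_COLUMNS = 31  # features listed in the card, counting the `income` label


def summarize(splits, dataset_size, num_columns):
    """Derive totals and per-row sizes from the declared split metadata."""
    total_examples = sum(s["num_examples"] for s in splits.values())
    total_bytes = sum(s["num_bytes"] for s in splits.values())
    summary = {
        "total_examples": total_examples,
        # Do the per-split byte counts add up to the declared dataset size?
        "bytes_match": total_bytes == dataset_size,
        "train_fraction": splits["train"]["num_examples"] / total_examples,
    }
    for name, s in splits.items():
        summary[f"{name}_bytes_per_row"] = s["num_bytes"] / s["num_examples"]
    # Every column is float64 or int64, so each value should take 8 bytes.
    summary["bytes_per_value"] = summary["train_bytes_per_row"] / num_columns
    return summary


summary = summarize(SPLITS, DATASET_SIZE, NUM_COLUMNS)
for key, value in summary.items():
    print(f"{key}: {value}")
```

The two splits add up to 32,561 rows at roughly an 80/20 ratio, and both splits work out to 248 bytes per row, which matches 31 columns of 8-byte values.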
A set of reasonably clean records was extracted using the following conditions: `((AAGE>16) && (AGI>100) && (AFNLWGT>1) && (HRSWK>0))`. The prediction task is to determine whether a person makes over $50K a year.

**Description of fnlwgt (final weight)**

The weights on the Current Population Survey (CPS) files are controlled to independent estimates of the civilian noninstitutional population of the US. These are prepared monthly for us by Population Division here at the Census Bureau. We use 3 sets of controls. These are:

- A single cell estimate of the population 16+ for each state.
- Controls for Hispanic Origin by age and sex.
- Controls by Race, age and sex.

We use all three sets of controls in our weighting program and "rake" through them 6 times so that by the end we come back to all the controls we used. The term estimate refers to population totals derived from CPS by creating "weighted tallies" of any specified socio-economic characteristics of the population.

People with similar demographic characteristics should have similar weights. There is one important caveat to remember about this statement. That is that since the CPS sample is actually a collection of 51 state samples, each with its own probability of selection, the statement only applies within state.
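
The "raking" described above is commonly implemented as iterative proportional fitting: weights are rescaled against one set of control totals at a time, and the cycle is repeated. The following is a minimal sketch of that general idea, not the Census Bureau's actual weighting program. The function names, the toy records, the two fields (`sex`, `group`) and the control totals are all hypothetical; the real procedure uses the three sets of controls listed above.

```python
def rake(records, weights, controls, passes=6):
    """Rescale weights so weighted tallies approach each control total.

    records:  list of dicts, one per respondent (hypothetical toy data)
    weights:  starting weights, in the same order as records
    controls: {field: {category: target population total}}
    """
    weights = list(weights)
    for _ in range(passes):
        for field, targets in controls.items():
            # Weighted tally of each category under the current weights.
            tallies = {category: 0.0 for category in targets}
            for record, weight in zip(records, weights):
                tallies[record[field]] += weight
            # Scale each record so its category's tally hits the target.
            for i, record in enumerate(records):
                category = record[field]
                if tallies[category] > 0:
                    weights[i] *= targets[category] / tallies[category]
    return weights


def weighted_tally(records, weights, field):
    """Sum the weights of the records falling in each category of `field`."""
    tally = {}
    for record, weight in zip(records, weights):
        tally[record[field]] = tally.get(record[field], 0.0) + weight
    return tally


# Hypothetical sample and control totals, for illustration only.
records = [
    {"sex": "F", "group": "A"},
    {"sex": "F", "group": "B"},
    {"sex": "M", "group": "A"},
    {"sex": "M", "group": "B"},
]
controls = {
    "sex": {"F": 60.0, "M": 40.0},
    "group": {"A": 70.0, "B": 30.0},
}

raked = rake(records, [1.0] * len(records), controls, passes=6)
print(weighted_tally(records, raked, "sex"))
print(weighted_tally(records, raked, "group"))
```

After the passes, the weighted tallies for both fields should sit close to their control totals, which is what "coming back to all the controls" refers to. Each pass matches the most recently applied control exactly and the earlier ones only approximately, which is why the cycle is repeated.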

Provider:
b-fatma