
Adult Data Set

Source: BigML. Updated: 2026-05-08. Indexed: 2025-01-04.
Download link:
https://bigml.com/user/czuriaga/gallery/dataset/5a61dce12a83477e0d000112
Resource description:
**Abstract**: Predict whether income exceeds $50K/yr based on census data. Also known as "Census Income" dataset.

**Donor**: Ronny Kohavi and Barry Becker, Data Mining and Visualization, Silicon Graphics. E-mail: ronnyk@live.com for questions.

**Data Set Information**: Extraction was done by Barry Becker from the 1994 Census database. A set of reasonably clean records was extracted using the following conditions: ((AAGE>16) && (AGI>100) && (AFNLWGT>1) && (HRSWK>0)). The prediction task is to determine whether a person makes over 50K a year.

**Listing of attributes**:

  • **age**: continuous.
  • **workclass**: Private, Self-emp-not-inc, Self-emp-inc, Federal-gov, Local-gov, State-gov, Without-pay, Never-worked.
  • **fnlwgt**: continuous.
  • **education**: Bachelors, Some-college, 11th, HS-grad, Prof-school, Assoc-acdm, Assoc-voc, 9th, 7th-8th, 12th, Masters, 1st-4th, 10th, Doctorate, 5th-6th, Preschool.
  • **education-num**: continuous.
  • **marital-status**: Married-civ-spouse, Divorced, Never-married, Separated, Widowed, Married-spouse-absent, Married-AF-spouse.
  • **occupation**: Tech-support, Craft-repair, Other-service, Sales, Exec-managerial, Prof-specialty, Handlers-cleaners, Machine-op-inspct, Adm-clerical, Farming-fishing, Transport-moving, Priv-house-serv, Protective-serv, Armed-Forces.
  • **relationship**: Wife, Own-child, Husband, Not-in-family, Other-relative, Unmarried.
  • **race**: White, Asian-Pac-Islander, Amer-Indian-Eskimo, Other, Black.
  • **sex**: Female, Male.
  • **capital-gain**: continuous.
  • **capital-loss**: continuous.
  • **hours-per-week**: continuous.
  • **native-country**: United-States, Cambodia, England, Puerto-Rico, Canada, Germany, Outlying-US(Guam-USVI-etc), India, Japan, Greece, South, China, Cuba, Iran, Honduras, Philippines, Italy, Poland, Jamaica, Vietnam, Mexico, Portugal, Ireland, France, Dominican-Republic, Laos, Ecuador, Taiwan, Haiti, Columbia, Hungary, Guatemala, Nicaragua, Scotland, Thailand, Yugoslavia, El-Salvador, Trinadad&Tobago, Peru, Hong, Holand-Netherlands.

[Adult dataset from Delve datasets](http://www.cs.toronto.edu/~delve/data/datasets.html) and [UCI](https://archive.ics.uci.edu/ml/datasets/adult)
Created: 2018-01-19
Summary of original information

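Adult Data Set overview

Basic information

Description

  • Abstract: Predict whether income exceeds $50K per year based on census data. Also known as the "Census Income" dataset.
  • Donors: Ronny Kohavi and Barry Becker (Data Mining and Visualization, Silicon Graphics)
  • Data source: Extracted from the 1994 Census database using the conditions: ((AAGE>16) && (AGI>100) && (AFNLWGT>1) && (HRSWK>0))
  • Prediction task: Determine whether a person's annual income exceeds $50K.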
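
The extraction condition listed above can be read as a simple row filter. The following is a minimal sketch, not the donors' actual extraction code: the `CensusRecord` type and its field names are hypothetical, and reading `AAGE`, `AGI`, `AFNLWGT`, and `HRSWK` as age, adjusted gross income, final sampling weight, and hours worked per week is an assumption based on the variable names.

```python
from dataclasses import dataclass


@dataclass
class CensusRecord:
    """Hypothetical container for the four raw census variables in the filter."""
    aage: int       # assumed: age in years
    agi: float      # assumed: adjusted gross income
    afnlwgt: float  # assumed: final sampling weight
    hrswk: int      # assumed: hours worked per week


def is_reasonably_clean(record: CensusRecord) -> bool:
    """Apply ((AAGE>16) && (AGI>100) && (AFNLWGT>1) && (HRSWK>0))."""
    return (
        record.aage > 16
        and record.agi > 100
        and record.afnlwgt > 1
        and record.hrswk > 0
    )


# Made-up records, for illustration only.
kept = is_reasonably_clean(CensusRecord(aage=17, agi=101, afnlwgt=2, hrswk=1))
dropped = is_reasonably_clean(CensusRecord(aage=16, agi=50000, afnlwgt=2, hrswk=40))
```

All four comparisons are strict, so a record with an age of exactly 16 or with zero weekly hours is excluded.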

Attribute list

  1. age: continuous.
  2. workclass: categorical (Private, Self-emp-not-inc, Self-emp-inc, Federal-gov, Local-gov, State-gov, Without-pay, Never-worked).
  3. fnlwgt: continuous.
  4. education: categorical (Bachelors, Some-college, 11th, HS-grad, Prof-school, Assoc-acdm, Assoc-voc, 9th, 7th-8th, 12th, Masters, 1st-4th, 10th, Doctorate, 5th-6th, Preschool).
  5. education-num: continuous.
  6. marital-status: categorical (Married-civ-spouse, Divorced, Never-married, Separated, Widowed, Married-spouse-absent, Married-AF-spouse).
  7. occupation: categorical (Tech-support, Craft-repair, Other-service, Sales, Exec-managerial, Prof-specialty, Handlers-cleaners, Machine-op-inspct, Adm-clerical, Farming-fishing, Transport-moving, Priv-house-serv, Protective-serv, Armed-Forces).
  8. relationship: categorical (Wife, Own-child, Husband, Not-in-family, Other-relative, Unmarried).
  9. race: categorical (White, Asian-Pac-Islander, Amer-Indian-Eskimo, Other, Black).
  10. sex: categorical (Female, Male).
  11. capital-gain: continuous.
  12. capital-loss: continuous.
  13. hours-per-week: continuous.
  14. native-country: categorical (United-States, Cambodia, England, Puerto-Rico, Canada, Germany, Outlying-US(Guam-USVI-etc), India, Japan, Greece, South, China, Cuba, Iran, Honduras, Philippines, Italy, Poland, Jamaica, Vietnam, Mexico, Portugal, Ireland, France, Dominican-Republic, Laos, Ecuador, Taiwan, Haiti, Columbia, Hungary, Guatemala, Nicaragua, Scotland, Thailand, Yugoslavia, El-Salvador, Trinadad&Tobago, Peru, Hong, Holand-Netherlands).
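
As a rough illustration of how the attribute list maps onto a parsed record, the sketch below reads comma-separated rows into dictionaries and converts the continuous attributes to integers. It rests on several assumptions that the listing does not confirm: that the file has no header row, that fields are separated by a comma followed by a space (as in the UCI copy), and that the label column is called `income`, a name chosen here for illustration. The sample row is illustrative only.

```python
import csv
import io

# The 14 attributes in the order listed above, plus a label column.
# "income" is a hypothetical name for the label; the listing does not name it.
COLUMNS = [
    "age", "workclass", "fnlwgt", "education", "education-num",
    "marital-status", "occupation", "relationship", "race", "sex",
    "capital-gain", "capital-loss", "hours-per-week", "native-country",
    "income",
]

# Attributes the listing marks as continuous.
CONTINUOUS = {
    "age", "fnlwgt", "education-num",
    "capital-gain", "capital-loss", "hours-per-week",
}


def parse_rows(text):
    """Parse headerless comma-separated text into a list of dicts."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    rows = []
    for fields in reader:
        if len(fields) != len(COLUMNS):
            continue  # skip blank or malformed lines
        row = dict(zip(COLUMNS, fields))
        for name in CONTINUOUS:
            row[name] = int(row[name])
        rows.append(row)
    return rows


# Illustrative row in the assumed layout.
SAMPLE = (
    "39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, "
    "Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K\n"
)
rows = parse_rows(SAMPLE)
```

Categorical values are kept as strings exactly as they appear in the data, including spellings such as `Holand-Netherlands`, so that they match the listed category names.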

Data quality

  • Missing values:
    • workclass: 2,799
    • occupation: 2,809
  • Erroneous values: none
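
Per-column missing-value counts like those above could be reproduced along the following lines. This is a sketch under one assumption: that unknown values are marked with a literal `?`, as in the UCI copy of the data; the BigML copy may encode them differently. The three rows are made up for illustration and are not taken from the dataset.

```python
from collections import Counter

MISSING_MARKER = "?"  # assumption: marker used for unknown values


def count_missing(rows, marker=MISSING_MARKER):
    """Count, per column, how many rows carry the missing-value marker."""
    counts = Counter()
    for row in rows:
        for column, value in row.items():
            if isinstance(value, str) and value.strip() == marker:
                counts[column] += 1
    return counts


# Made-up rows, for illustration only.
example_rows = [
    {"workclass": "Private", "occupation": "Sales"},
    {"workclass": "?", "occupation": "?"},
    {"workclass": "Never-worked", "occupation": "?"},
]
missing = count_missing(example_rows)
```

Columns with no missing entries do not appear in the result, and looking one up in the returned `Counter` gives 0.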

Tags

  • Census
  • Demographic
  • Incomes
Collected summary
Dataset introduction
Background and challenges
Background overview
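This dataset is based on 1994 US Census data and contains 48,842 records and 15 fields. It is mainly used to predict whether a person's annual income exceeds $50K. The data were cleaned and filtered, and they cover demographic, educational, occupational, and other dimensions.
The above content was collected and summarized by 遇见数据集.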
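
For the prediction task described in the overview, the income label is typically turned into a binary target. The sketch below assumes the label strings are `>50K` and `<=50K`, optionally followed by a period as in the UCI test split; the exact strings in the BigML copy are not stated in this listing, so treat them as assumptions. The function name `encode_income` is hypothetical.

```python
def encode_income(label):
    """Map an income label to 1 (over 50K) or 0 (50K or less)."""
    cleaned = label.strip().rstrip(".")  # tolerate padding and a trailing period
    if cleaned == ">50K":
        return 1
    if cleaned == "<=50K":
        return 0
    raise ValueError(f"unexpected income label: {label!r}")


targets = [encode_income(value) for value in [" >50K", "<=50K", "<=50K."]]
```

Raising on unknown labels, rather than defaulting to 0, makes an unexpected encoding visible early instead of silently skewing the class balance.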