AiresPucrs/adult-census-income

Source: Hugging Face. Updated 2024-10-13; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/AiresPucrs/adult-census-income
Summary:
The adult-census-income dataset is primarily used for predictive tasks, i.e., to determine whether an individual's annual income exceeds $50,000. It is derived from the 1994 US Census Database, and was extracted and curated by Barry Becker and Ronny Kohavi. The dataset contains 32,561 records, with features including age, work class, education level, marital status, occupation, race, gender, capital gain, capital loss, weekly working hours, country of origin, and more. It is in English, with a total size of 5,316,802 bytes and a download size of 553,790 bytes. The dataset can be loaded using the `load_dataset` function from HuggingFace.
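
The loading step mentioned above could look roughly like the following. This is a minimal sketch, assuming the `datasets` package is installed and the Hugging Face Hub (or a mirror) is reachable. The helper name `load_adult_census` is hypothetical, and the default `train` split is taken from the split listing on this page.

```python
DATASET_ID = "AiresPucrs/adult-census-income"


def load_adult_census(split="train"):
    """Load one split of the dataset from the Hugging Face Hub.

    Assumes the `datasets` package is installed and the Hub is reachable.
    """
    # Imported lazily so that defining this helper needs no extra packages.
    from datasets import load_dataset

    return load_dataset(DATASET_ID, split=split)
```

Calling `load_adult_census()` should return a `Dataset` object whose `num_rows` and `column_names` attributes can be compared against the record count and feature list given on this page.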
Provider:
AiresPucrs
Original information summary

Dataset overview

Dataset details

  • Dataset name: adult-census-income
  • Language: English
  • Number of records: 32,561
  • Download size: 553,790 bytes
  • Dataset size: 5,316,802 bytes
  • License: Creative Commons CC0 1.0

Dataset features

Feature list

  • age: integer
  • workclass: string
  • fnlwgt: integer
  • education: string
  • education.num: integer
  • marital.status: string
  • occupation: string
  • relationship: string
  • race: string
  • sex: string
  • capital.gain: integer
  • capital.loss: integer
  • hours.per.week: integer
  • native.country: string
  • income: string
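
The feature list above can be written down as a small schema and used to sanity-check individual records. The following is only a sketch: the names `SCHEMA` and `schema_errors` are hypothetical, and the example record uses made-up values rather than a row copied from the dataset.

```python
# Column names and types as given in the feature list.
SCHEMA = {
    "age": int,
    "workclass": str,
    "fnlwgt": int,
    "education": str,
    "education.num": int,
    "marital.status": str,
    "occupation": str,
    "relationship": str,
    "race": str,
    "sex": str,
    "capital.gain": int,
    "capital.loss": int,
    "hours.per.week": int,
    "native.country": str,
    "income": str,
}


def schema_errors(record):
    """Return a list of human-readable problems found in one record (a dict)."""
    errors = []
    for name, expected in SCHEMA.items():
        if name not in record:
            errors.append(f"missing column: {name}")
        elif type(record[name]) is not expected:
            errors.append(
                f"{name}: expected {expected.__name__}, "
                f"got {type(record[name]).__name__}"
            )
    return errors


# Illustrative record with made-up values (not a row from the dataset).
example = {
    "age": 40, "workclass": "Private", "fnlwgt": 100000,
    "education": "Bachelors", "education.num": 13,
    "marital.status": "Never-married", "occupation": "Sales",
    "relationship": "Not-in-family", "race": "White", "sex": "Female",
    "capital.gain": 0, "capital.loss": 0, "hours.per.week": 40,
    "native.country": "Canada", "income": "<=50K",
}
print(schema_errors(example))  # prints []
```

Several column names contain dots (for example `education.num`), so they are easiest to access by key, as in `record["education.num"]`, rather than by attribute.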

Data splits

  • train: 32,561 records, 5,316,802 bytes

Dataset content

Feature details

  • Income: >50K (24.1%), <=50K (75.9%)
  • Age: continuous
  • Workclass: Private, Self-emp-not-inc, Self-emp-inc, Federal-gov, Local-gov, State-gov, Without-pay, Never-worked
  • fnlwgt: continuous
  • Education: Bachelors, Some-college, 11th, HS-grad, Prof-school, Assoc-acdm, Assoc-voc, 9th, 7th-8th, 12th, Masters, 1st-4th, 10th, Doctorate, 5th-6th, Preschool
  • Education.num: continuous
  • Marital.status: Married-civ-spouse, Divorced, Never-married, Separated, Widowed, Married-spouse-absent, Married-AF-spouse
  • Occupation: Tech-support, Craft-repair, Other-service, Sales, Exec-managerial, Prof-specialty, Handlers-cleaners, Machine-op-inspect, Adm-clerical, Farming-fishing, Transport-moving, Priv-house-serv, Protective-serv, Armed-Forces
  • Relationship: Wife, Own-child, Husband, Not-in-family, Other-relative, Unmarried
  • Race: White, Asian-Pac-Islander, Amer-Indian-Eskimo, Other, Black
  • Sex: Female, Male
  • Capital.gain: continuous
  • Capital.loss: continuous
  • Hours.per.week: continuous
  • Native.country: United States, Cambodia, England, Puerto Rico, Canada, Germany, Outlying-US(Guam-USVI-etc), India, Japan, Greece, South, China, Cuba, Iran, Honduras, Philippines, Italy, Poland, Jamaica, Vietnam, Mexico, Portugal, Ireland, France, Dominican Republic, Laos, Ecuador, Taiwan, Haiti, Columbia, Hungary, Guatemala, Nicaragua, Scotland, Thailand, Yugoslavia, El-Salvador, Trinadad&Tobago, Peru, Hong, Holand-Netherlands
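
Since the prediction target is the two-valued `income` string, a typical first step is to turn it into a 0/1 label and check the class balance. The sketch below assumes the label strings are exactly `>50K` and `<=50K` as listed above. The helper names are hypothetical and the toy labels are made up for illustration.

```python
from collections import Counter

POSITIVE_LABEL = ">50K"
NEGATIVE_LABEL = "<=50K"


def encode_income(label):
    """Map the income string to 1 (>50K) or 0 (<=50K)."""
    label = label.strip()
    if label == POSITIVE_LABEL:
        return 1
    if label == NEGATIVE_LABEL:
        return 0
    raise ValueError(f"unexpected income label: {label!r}")


def class_shares(labels):
    """Return the fraction of records carrying each income label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


# Toy labels (made up) just to exercise the helpers.
toy = ["<=50K", "<=50K", ">50K", "<=50K"]
print([encode_income(x) for x in toy])  # prints [0, 0, 1, 0]
print(class_shares(toy))                # prints {'<=50K': 0.75, '>50K': 0.25}

# With the shares listed above (75.9% vs 24.1%), always predicting "<=50K"
# would already be right about 75.9% of the time.
majority_baseline = max(0.759, 0.241)
```

The imbalance means plain accuracy should be read against that 75.9% majority baseline; a model scoring below it is doing no better than always guessing `<=50K`.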