Occupational Population Physical Examination Dataset (职业人群体检数据集)

Alibaba Cloud Tianchi (阿里云天池). Updated 2026-03-28. Indexed 2024-05-21.
Download link:
https://tianchi.aliyun.com/dataset/178198
Resource description:
This dataset collects physical examination records for part of an occupational population. It contains the following fields: Serial Number (序号), Gender (性别), ID Card Number (身份证号), Smoking Status (是否吸烟), Drinking Status (是否饮酒), Year of Starting a Certain Job (开始从事某工作年份), Physical Examination Year (体检年份), Lymphocyte Count (淋巴细胞计数), White Blood Cell Count (白细胞计数), Other Cell Values (细胞其它值), Platelet Count (血小板计数).

You can use this dataset for the following exercises:
(1) Inspect the data types and table structure, and count the missing values in each field.
(2) Delete columns that are entirely empty, and delete rows where "ID Card Number" (身份证号) is empty.
(3) Normalize "Year of Starting a Certain Job" (开始从事某工作年份) and "Physical Examination Year" (体检年份) to 4-digit years such as "2018", and rename the "Year of Starting a Certain Job" column to "Year of Starting Work" (参加工作时间).
(4) "ID Card Number", "Year of Starting Work", and "Physical Examination Year" all have the object data type; convert them to int64 and delete rows with missing values.
(5) Add two columns, "Length of Service" (工龄, Physical Examination Year minus Year of Starting Work) and "Age" (年龄, Physical Examination Year minus birth year), then delete unreasonable rows (negative values, ages of several hundred years, and so on).
(6) Compute the mean White Blood Cell Count for each gender and plot a bar chart.
(7) Compute the mean White Blood Cell Count for each age group and plot a bar chart. The age groups are: 30 or younger, 31 to 40, 41 to 50, and older than 50.
(8) Count the number of people in each age group and plot a pie chart, using the same four age groups.
(9) Analyze the relationship between age and White Blood Cell Count, and draw a scatter plot.
(10) Compute the mean White Blood Cell Count for smokers and non-smokers, and plot a bar chart.
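Steps (1) to (5) above could be sketched with pandas roughly as follows. This is a minimal illustration, not the dataset author's solution. The sample rows, including the ID numbers and the all-empty "备注" column, are made up; the real data ships as an Excel file. The sketch assumes 18-digit ID numbers with the birth year in digits 7 to 10, and the age bounds of 16 to 100 are an arbitrary choice for what counts as "reasonable".

```python
import pandas as pd

# Made-up rows for illustration only (not taken from the real file).
raw = pd.DataFrame({
    "序号": [1, 2, 3, 4, 5, 6],
    "性别": ["男", "女", "男", "女", "男", "女"],
    "身份证号": ["110101198503071234", "310101199012125678", None,
             "440101197001013456", "110101201501011234", "110101198001011234"],
    "开始从事某工作年份": ["2008年", "2015", "2010", "1992.0", "2010", "2020"],
    "体检年份": ["2018", "2018年", "2018", 2018, "2018", "2018"],
    "白细胞计数": [6.1, 5.4, 7.0, 6.8, 5.9, 6.3],
    "备注": [None] * 6,  # hypothetical all-empty column
})

# (1) Inspect column types and count missing values per column.
print(raw.dtypes)
print(raw.isna().sum())

# (2) Drop all-empty columns, then rows without an ID number.
df = raw.dropna(axis=1, how="all").dropna(subset=["身份证号"]).copy()

# (3) Keep the first 4-digit run of each year value, then rename the column.
for col in ["开始从事某工作年份", "体检年份"]:
    df[col] = df[col].astype(str).str.extract(r"(\d{4})", expand=False)
df = df.rename(columns={"开始从事某工作年份": "参加工作时间"})

# (4) Convert to int64. IDs that are not exactly 18 digits are dropped first
#     (assumption: this also drops IDs ending in the check character "X").
df = df[df["身份证号"].astype(str).str.fullmatch(r"\d{18}")].copy()
df["身份证号"] = df["身份证号"].map(int).astype("int64")
year_cols = ["参加工作时间", "体检年份"]
for col in year_cols:
    df[col] = pd.to_numeric(df[col], errors="coerce")
df = df.dropna(subset=year_cols).astype({c: "int64" for c in year_cols})

# (5) Derive length of service and age, then filter implausible rows.
birth_year = df["身份证号"] // 10**8 % 10**4  # digits 7-10 of the ID
df["工龄"] = df["体检年份"] - df["参加工作时间"]
df["年龄"] = df["体检年份"] - birth_year
df = df[(df["工龄"] >= 0) & df["年龄"].between(16, 100)]  # assumed bounds

print(df[["序号", "参加工作时间", "体检年份", "工龄", "年龄"]])
```

When loading the real file, something like `pd.read_excel(path, dtype={"身份证号": str})` (with `path` standing in for wherever the download was saved) would keep the ID numbers as text, so they are not rounded as floating-point values before the digit check runs.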
Provider:
Alibaba Cloud Tianchi (阿里云天池)
Created:
2024-05-18
AI-compiled summary
Dataset introduction
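Background and challenges
Background overview
The Occupational Population Physical Examination Dataset is a small dataset of health indicators for working populations. It covers gender, smoking and drinking habits, the year of starting work, the examination year, and blood cell counts. It suits basic data-processing practice such as data cleaning, type conversion, and feature engineering, and it comes with concrete statistical tasks such as computing length of service, age, and the mean white blood cell count of different groups, which makes it a good fit for learning health data analysis and visualization. The dataset is provided as an Excel file of 221.00 KB and is openly accessible under the GPL 2.0 license.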
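The group statistics mentioned above could be computed along these lines. This is a hedged sketch on a tiny made-up table that stands in for the cleaned data; the values, the English age-group labels, and the helper name `plot_summaries` are all invented for illustration. `pd.cut` uses right-closed bins by default, so an age of exactly 30 lands in the "30 or younger" group.

```python
import pandas as pd

# Made-up stand-in for the cleaned data (not real measurements).
clean = pd.DataFrame({
    "性别": ["男", "女", "男", "女", "男", "女"],
    "是否吸烟": ["是", "否", "否", "否", "是", "否"],
    "年龄": [25, 30, 35, 45, 52, 41],
    "白细胞计数": [6.0, 5.0, 7.0, 6.5, 8.0, 5.5],
})

# Age groups: (0, 30], (30, 40], (40, 50], (50, 200]; 200 is an assumed cap.
bins = [0, 30, 40, 50, 200]
labels = ["<=30", "31-40", "41-50", ">50"]
clean["age_group"] = pd.cut(clean["年龄"], bins=bins, labels=labels)

# Mean white blood cell count per gender, age group, and smoking status.
wbc_by_gender = clean.groupby("性别")["白细胞计数"].mean()
wbc_by_age = clean.groupby("age_group", observed=False)["白细胞计数"].mean()
wbc_by_smoking = clean.groupby("是否吸烟")["白细胞计数"].mean()

# Head count per age group, in category order.
count_by_age = clean.groupby("age_group", observed=False).size()


def plot_summaries():
    """Draw the bar, pie, and scatter charts (hypothetical helper)."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plots

    fig, axes = plt.subplots(2, 3, figsize=(15, 8))
    wbc_by_gender.plot.bar(ax=axes[0, 0], title="Mean WBC by gender")
    wbc_by_age.plot.bar(ax=axes[0, 1], title="Mean WBC by age group")
    wbc_by_smoking.plot.bar(ax=axes[0, 2], title="Mean WBC by smoking status")
    count_by_age.plot.pie(ax=axes[1, 0], autopct="%1.0f%%",
                          title="People per age group")
    clean.plot.scatter(x="年龄", y="白细胞计数", ax=axes[1, 1],
                       title="Age vs. WBC")
    axes[1, 2].axis("off")  # unused panel
    fig.tight_layout()
    return fig


print(wbc_by_gender, wbc_by_age, wbc_by_smoking, count_by_age, sep="\n\n")
```

Calling `plot_summaries()` draws all five charts on one figure. The Chinese tick labels only render correctly if matplotlib is configured with a font that includes CJK glyphs.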
The content above was collected and summarized by AI.