Occupational Population Physical Examination Dataset
Source: Alibaba Cloud Tianchi. Updated 2026-06-03; indexed 2024-05-21.
Download link:
https://tianchi.aliyun.com/dataset/178198
Resource description:
This dataset collects physical examination data for part of an occupational population. It contains the following fields (original column names in parentheses): Serial Number (序号), Gender (性别), ID Card Number (身份证号), Smoking Status (是否吸烟), Drinking Status (是否饮酒), Year of Starting Current Job (开始从事某工作年份), Physical Examination Year (体检年份), Lymphocyte Count (淋巴细胞计数), White Blood Cell Count (白细胞计数), Other Cell Values (细胞其它值), and Platelet Count (血小板计数).
You can use this dataset for the following exercises:
1. Inspect the data types and table structure, and count the missing values in each field.
2. Delete columns that are entirely empty, and delete rows where "ID Card Number" is empty.
3. Normalize "Year of Starting Current Job" and "Physical Examination Year" to 4-digit numeric years (e.g., "2018"), and rename the "Year of Starting Current Job" column to "Working Start Time" (参加工作时间).
4. "ID Card Number", "Working Start Time", and "Physical Examination Year" are all of type object; convert them to int64 and delete rows that contain missing values.
5. Add two columns, "Length of Service" (工龄, Physical Examination Year minus Working Start Time) and "Age" (年龄, Physical Examination Year minus year of birth), then delete implausible rows (e.g., negative values or ages of several hundred years).
6. Compute the mean white blood cell count for each gender and draw a bar chart.
7. Compute the mean white blood cell count for each age group and draw a bar chart. The age groups are: ≤30 years, 31–40 years, 41–50 years, and >50 years.
8. Count the number of people in each age group (same groups as above) and draw a pie chart.
9. Analyze the relationship between age and white blood cell count, and draw a scatter plot.
10. Compute the mean white blood cell count for smokers and non-smokers and draw a bar chart.
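The cleaning steps (1 to 5) could be sketched with pandas roughly as follows. This is a minimal sketch under stated assumptions, not an official solution: the four sample rows are made up for illustration, the helper names `to_year` and `clean` are hypothetical, ID card numbers are assumed to be 18-digit strings whose characters 7 to 10 hold the birth year, and the 0 to 120 age range is an arbitrary plausibility bound.

```python
import pandas as pd

def to_year(series: pd.Series) -> pd.Series:
    """Pull the first 4-digit run out of values such as '2018年' or '1995.0'."""
    years = series.astype(str).str.extract(r"(\d{4})", expand=False)
    return pd.to_numeric(years, errors="coerce")

def clean(df: pd.DataFrame) -> pd.DataFrame:
    # Step 2: drop all-empty columns and rows without an ID card number.
    df = df.dropna(axis=1, how="all").dropna(subset=["身份证号"])
    # Step 3: rename the start-year column, then normalize both year columns.
    df = df.rename(columns={"开始从事某工作年份": "参加工作时间"})
    # Assumption: keep only IDs made of exactly 18 digits, so they fit int64.
    ids = df["身份证号"].astype(str).str.strip()
    valid = ids.str.fullmatch(r"\d{18}")
    df = df[valid].copy()
    df["身份证号"] = ids[valid]
    for col in ["参加工作时间", "体检年份"]:
        df[col] = to_year(df[col])
    # Step 4: drop rows with missing values, then cast the three columns.
    df = df.dropna()
    df = df.astype({c: "int64" for c in ["身份证号", "参加工作时间", "体检年份"]})
    # Step 5: derive length of service and age, then filter implausible rows.
    birth_year = df["身份证号"] // 10**8 % 10**4  # digits 7-10 of the ID
    df["工龄"] = df["体检年份"] - df["参加工作时间"]
    df["年龄"] = df["体检年份"] - birth_year
    plausible = (df["工龄"] >= 0) & df["年龄"].between(0, 120)
    return df[plausible].reset_index(drop=True)

# Made-up rows for illustration only; not taken from the real file.
raw = pd.DataFrame({
    "序号": [1, 2, 3, 4],
    "性别": ["男", "女", "男", "女"],
    "身份证号": ["110101199001011234", None,
             "110101198505052345", "110101197003033456"],
    "开始从事某工作年份": ["2012年", "2015", "2020", "1995.0"],
    "体检年份": ["2018", "2018", "2018", "2019年"],
    "白细胞计数": [6.1, 5.4, 7.2, 6.8],
    "细胞其它值": [None, None, None, None],
})

missing_per_column = raw.isna().sum()  # step 1: missing values per field
cleaned = clean(raw)
print(cleaned[["身份证号", "参加工作时间", "体检年份", "工龄", "年龄"]])
```

For the real file, `raw` would instead come from `pd.read_excel(...)` with whatever file name the download provides, and `raw.info()` gives the dtype and structure overview asked for in step 1. Extracting the year with a regular expression is only one way to normalize mixed year formats; it silently turns values without a 4-digit run into missing values, which step 4 then drops.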
Provider:
Alibaba Cloud Tianchi
Created:
2024-05-18
Collected summary
Dataset introduction

Background and challenges
Background overview
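The Occupational Population Physical Examination Dataset is a small dataset of health indicators for an occupational population. Its fields cover gender, smoking and drinking habits, the year the person started the job, the examination year, and blood cell counts. It is suited to basic data-processing practice such as data cleaning, type conversion, and feature engineering, and it comes with concrete statistical tasks, such as computing length of service, age, and the mean white blood cell count of different groups, which makes it a good fit for learning health data analysis and visualization. The dataset is provided as an Excel file of 221.00 KB and is openly accessible under the GPL 2.0 license.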
The summary above was collected and generated by 遇见数据集.
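The group statistics mentioned in the overview (mean white blood cell count by gender, age group, and smoking status, plus head counts per age group) could be sketched with pandas as below. This is an illustrative sketch only: the five rows are made up, the age-group labels are hypothetical, the "是"/"否" coding of the smoking column is an assumption about the real file, and the Pearson correlation is just one possible numeric summary to accompany the scatter plot.

```python
import pandas as pd

# Made-up, already-cleaned rows for illustration; not taken from the real file.
df = pd.DataFrame({
    "性别": ["男", "女", "男", "女", "男"],
    "是否吸烟": ["是", "是", "否", "否", "是"],  # assumed yes/no coding
    "年龄": [25, 30, 35, 45, 60],
    "白细胞计数": [6.0, 5.0, 7.0, 6.5, 8.0],
})

# Right-closed bins: (-inf, 30], (30, 40], (40, 50], (50, inf).
labels = ["<=30", "31-40", "41-50", ">50"]
df["年龄段"] = pd.cut(
    df["年龄"],
    bins=[-float("inf"), 30, 40, 50, float("inf")],
    labels=labels,
)

# Mean white blood cell count per group.
wbc_by_gender = df.groupby("性别")["白细胞计数"].mean()
wbc_by_smoking = df.groupby("是否吸烟")["白细胞计数"].mean()
wbc_by_age = df.groupby("年龄段", observed=False)["白细胞计数"].mean()

# Number of people per age group, in bin order.
count_by_age = df.groupby("年龄段", observed=False).size()

# One possible numeric summary of the age / white blood cell relationship.
age_wbc_corr = df["年龄"].corr(df["白细胞计数"])

print(wbc_by_age)
print(count_by_age)
```

With matplotlib installed, each resulting Series can be drawn with `Series.plot.bar()` or `Series.plot.pie()`, and the scatter plot with `df.plot.scatter(x="年龄", y="白细胞计数")`. Because `pd.cut` uses right-closed intervals by default, a person aged exactly 30 lands in the "<=30" group, which matches the grouping described in the tasks.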



