Chinese Resume Screening Dataset (Part of the IT Industry)
Alibaba Cloud Tianchi | Updated 2026-06-09 | Indexed 2025-04-19
Download link:
https://tianchi.aliyun.com/dataset/201566
Resource description:
This dataset consists of 5000 simulated, structured resume records intended to support resume evaluation and screening tasks. Each resume is labeled "Passed" or "Failed" according to specific evaluation metrics.
The dataset contains 34 fields covering several aspects of a resume. The basic information fields are resume ID (a unique identifier for each resume), full name, gender, age, phone number, and email address. The job-intention field records the intended position, which is one of ten roles such as algorithm engineer and test engineer. The educational background fields are education level (e.g., bachelor's degree), institution type (e.g., regular university), major category (e.g., computer science-related), and English proficiency (e.g., CET-4).
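After downloading, the documented shape (5000 records, 34 fields, a two-valued label) can be checked before any modeling. The sketch below is a minimal example using only the Python standard library. The header name `筛选结果` and the label strings `通过` / `不通过` are assumptions inferred from the description, not confirmed column names, and the inline sample stands in for the real file so the snippet runs on its own.

```python
import csv
import io
from collections import Counter

# Hypothetical header and label names; check them against the downloaded file.
LABEL_FIELD = "筛选结果"
EXPECTED_LABELS = {"通过", "不通过"}


def summarize(csv_text, label_field=LABEL_FIELD):
    """Return (row count, column count, label counts) for CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    n_cols = len(reader.fieldnames or [])
    labels = Counter(row[label_field] for row in rows)
    return len(rows), n_cols, labels


def check_shape(csv_text, n_rows=5000, n_cols=34):
    """List mismatches against the documented shape (empty list means OK)."""
    rows, cols, labels = summarize(csv_text)
    problems = []
    if rows != n_rows:
        problems.append(f"expected {n_rows} rows, found {rows}")
    if cols != n_cols:
        problems.append(f"expected {n_cols} columns, found {cols}")
    unexpected = set(labels) - EXPECTED_LABELS
    if unexpected:
        problems.append(f"unexpected labels: {sorted(unexpected)}")
    return problems


# Tiny inline sample standing in for the real file (3 columns, 3 rows).
SAMPLE = (
    "简历编号,意向岗位,筛选结果\n"
    "1,算法工程师,通过\n"
    "2,测试工程师,不通过\n"
    "3,测试工程师,通过\n"
)
```

With the real file, the text would be read from disk (for example with `open(path, encoding="utf-8").read()`, where the path and encoding are again assumptions) and passed to `check_shape` with its default arguments.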
There are numerous skill-related fields, such as programming languages (e.g., Java, Python) and their proficiency levels. Additionally, skills in different technical domains are further categorized, including front-end technologies, back-end technologies, databases, etc., along with their corresponding proficiency levels.
In terms of work experience, the dataset differentiates work experience in small, medium, and large enterprises, represented by different time periods. Project experience is reflected by the number of small-scale, medium-scale, and large-scale projects.
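The description does not say how the time periods are encoded. Assuming, purely for illustration, values such as `1-3年`, `5年以上`, or `无`, one possible way to turn the experience and project fields into numeric features is sketched below. The string formats and all six column names are hypothetical and should be checked against the downloaded file.

```python
import re

# Hypothetical column names, guessed from the field descriptions.
EXPERIENCE_FIELDS = ["小型企业工作经验", "中型企业工作经验", "大型企业工作经验"]
PROJECT_FIELDS = ["小规模项目", "中规模项目", "大规模项目"]


def years_midpoint(value):
    """Map a period string to an approximate number of years.

    Assumed formats (hypothetical): "无" or empty -> 0.0,
    "1-3年" -> 2.0 (midpoint), "5年以上" -> 5.0 (lower bound).
    """
    text = (value or "").strip()
    numbers = [float(n) for n in re.findall(r"\d+(?:\.\d+)?", text)]
    if not numbers:
        return 0.0
    if len(numbers) >= 2:
        return (numbers[0] + numbers[1]) / 2
    return numbers[0]


def numeric_features(record):
    """Return [years per company size..., project count per scale...]."""
    years = [years_midpoint(record.get(f)) for f in EXPERIENCE_FIELDS]
    projects = [int(record.get(f) or 0) for f in PROJECT_FIELDS]
    return years + projects
```

Using the midpoint of a range is only one choice; an ordinal code per period would work just as well if the periods form a small fixed set.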
The final screening result field clearly indicates whether each resume has passed the evaluation and screening, which is of great value for tasks such as building resume screening models and analyzing key factors affecting resume passing outcomes. Meanwhile, to ensure privacy and security, all data in this dataset is simulated and does not involve any real personal information.
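A simple first step toward analyzing which factors relate to the outcome is to compare pass rates across the values of a single field. The following sketch assumes each record is a dict, that the label column is named `筛选结果` with the positive value `通过`, and that an education column named `学历层次` exists; all of these names are assumptions, and the three records are made up for illustration.

```python
from collections import defaultdict


def pass_rate_by(records, field, label_field="筛选结果", positive="通过"):
    """Return {field value: share of records labeled `positive`}."""
    totals = defaultdict(int)
    passed = defaultdict(int)
    for rec in records:
        key = rec[field]
        totals[key] += 1
        if rec[label_field] == positive:
            passed[key] += 1
    return {key: passed[key] / totals[key] for key in totals}


# Made-up records, not taken from the dataset.
records = [
    {"学历层次": "本科", "筛选结果": "通过"},
    {"学历层次": "本科", "筛选结果": "不通过"},
    {"学历层次": "硕士", "筛选结果": "通过"},
]
rates = pass_rate_by(records, "学历层次")
```

Because the records are simulated and labeled by fixed evaluation metrics, differences in these rates describe the labeling rule rather than real hiring behavior.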
Provider:
Alibaba Cloud Tianchi
Created:
2025-04-15
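Collected summary
Dataset introduction

Background and challenges
Background overview
The Chinese Resume Screening Dataset (Part of the IT Industry) contains 5000 simulated, structured resume records with 34 fields covering basic information, job intention, educational background, skills, work experience, and more, and each resume is labeled with its screening result. The dataset is suited to building and analyzing resume screening models.
The content above was collected and summarized by 遇见数据集.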



