saptarshideveloper/resume-score-details

Name: saptarshideveloper/resume-score-details
Creator: saptarshideveloper
Published: 2026-04-02 15:26:14
License: no description provided

Hugging Face, updated 2026-04-02, indexed 2026-04-12

Download link:

https://hf-mirror.com/datasets/saptarshideveloper/resume-score-details


Resource description:

---
license: cc
task_categories:
- text-classification
- feature-extraction
language:
- en
tags:
- hr
size_categories:
- 1K<n<10K
---

# Resume and Job Description Matching Dataset

### Overview

This dataset contains **1,031 samples** of resumes and job descriptions (JDs) generated and assessed using **GPT-4o**. The primary goal of this dataset is to evaluate the alignment between resumes and job descriptions, aiding in the study of resume relevance, skill alignment, and job fit scoring based on predefined criteria.

### Dataset Composition

The dataset includes resumes matched with job descriptions, with the assessment and scoring details based on various matching criteria:

- **201 Mismatched JSONs**: Resumes that are not relevant to the provided JD.
- **648 Matched JSONs**: Resumes that are relevant and aligned with the JD.
- **142 Invalid JSONs**: Cases where either the resume or JD is incomplete or invalid.
- **40 JSONs Missing Additional Info**: Instances where additional input information was omitted.

### Dataset Structure

Each sample JSON file in the dataset includes the following keys:

- **`input`**:
  - **`job_description`**: Contains the full job description text.
  - **`macro_dict`**: A dictionary with macro-level criteria and their respective weighting.
  - **`micro_dict`**: A dictionary with micro-level criteria and their respective weighting.
  - **`additional_info`**: Extra requirements or preferences related to the JD.
  - **`minimum_requirements`**: List of fundamental qualifications for the role.
  - **`resume`**: Text of the resume as provided.
- **`output`**:
  - **`justification`**: Reasons for the scores assigned, based on specific criteria.
  - **`scores`**:
    - **`macro_scores`**: Scores for broader criteria (e.g., experience, strategic thinking).
    - **`micro_scores`**: Scores for detailed criteria (e.g., market research expertise).
    - **`requirements`**: Boolean indicators showing if key requirements are met.
    - **`aggregated_scores`**: Overall scores for macro and micro criteria.
  - **`personal_info`**: Extracted personal details (e.g., name, contact details, current position).
  - **`valid_resume_and_jd`**: Boolean indicating if both resume and JD are valid for evaluation.
- **`details`**:
  - **Resume Analysis**: Detailed breakdown of education, certifications, skills, project history, and professional experience.

### Dataset Preparation Methodology

1. **JD Generation**: Resumes were randomly sampled, and GPT-4o generated job descriptions tailored to these resumes.
2. **JD Comparison**: Individual resumes were then compared to a randomly generated JD using GPT-4o to produce relevance scores and justifications.

### Example Entry

A sample JSON object in this dataset resembles the following structure:

```json
{
  "input": {
    "job_description": "Full job description text...",
    "macro_dict": {"experience": 89, "strategic thinking": 11},
    "micro_dict": {"market research": 7, "it and manufacturing sector knowledge": 93},
    "additional_info": "Preferred candidates are from top-tier institutes...",
    "minimum_requirements": ["5+ years of experience...", "Strong understanding of IT..."],
    "resume": "Resume text with skills, experience, etc."
  },
  "output": {
    "justification": ["Candidate has only 1.5 years of experience, below the required 5+ years..."],
    "scores": {
      "macro_scores": [{"criteria": "experience", "score": 3}, {"criteria": "strategic thinking", "score": 2}],
      "micro_scores": [{"criteria": "market research", "score": 4}, {"criteria": "it and manufacturing sector knowledge", "score": 3}],
      "requirements": [{"criteria": "5+ years of experience...", "meets": false}, ...],
      "aggregated_scores": {"macro_scores": 2.89, "micro_scores": 3.07}
    },
    "personal_info": {"name": "Muhammad Talha Riaz", "email": "talhariaz9969@gmail.com", ...},
    "valid_resume_and_jd": true
  },
  "details": {
    "name": "Talha Riaz",
    "skills": ["HTML", "CSS", "JavaScript", ...],
    "education": [{"university": "University of the Punjab", "degree_title": "BS Management", "end_date": "06-2021"}],
    ...
  }
}
```

### Use Cases

This dataset is designed to support research in:

- **AI-driven recruitment**: Assessing resume-JD alignment and scoring accuracy.
- **Job Matching Algorithms**: Testing algorithms that rank or filter candidates based on job fit.
- **Natural Language Processing (NLP)**: Analyzing how NLP can evaluate resume relevance based on custom criteria.

### Licensing and Citation

Please cite this dataset as follows:

```plaintext
Dataset generated using GPT-4o by [rohan/netsol].
```
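The numbers in the example entry are consistent with `aggregated_scores` being a weight-normalised average of the per-criterion scores, with `macro_dict` and `micro_dict` supplying the weights. The card does not state this formula, so the sketch below is an assumption checked only against that one example. `weighted_aggregate` is a hypothetical helper, not something shipped with the dataset, and the `sample` dictionary copies only the fields needed from the example entry.

```python
from typing import Dict, List


def weighted_aggregate(weights: Dict[str, float], scores: List[dict]) -> float:
    """Weight-normalised average of per-criterion scores (assumed formula)."""
    # Look up the weight of every scored criterion by its name.
    total_weight = sum(weights[item["criteria"]] for item in scores)
    if total_weight == 0:
        raise ValueError("criterion weights sum to zero")
    weighted_sum = sum(weights[item["criteria"]] * item["score"] for item in scores)
    return round(weighted_sum / total_weight, 2)


# Fields copied from the example entry in the dataset card.
sample = {
    "input": {
        "macro_dict": {"experience": 89, "strategic thinking": 11},
        "micro_dict": {"market research": 7, "it and manufacturing sector knowledge": 93},
    },
    "output": {
        "scores": {
            "macro_scores": [
                {"criteria": "experience", "score": 3},
                {"criteria": "strategic thinking", "score": 2},
            ],
            "micro_scores": [
                {"criteria": "market research", "score": 4},
                {"criteria": "it and manufacturing sector knowledge", "score": 3},
            ],
            "aggregated_scores": {"macro_scores": 2.89, "micro_scores": 3.07},
        }
    },
}

scores = sample["output"]["scores"]
macro = weighted_aggregate(sample["input"]["macro_dict"], scores["macro_scores"])
micro = weighted_aggregate(sample["input"]["micro_dict"], scores["micro_scores"])

print(macro)  # 2.89, i.e. (89*3 + 11*2) / 100
print(micro)  # 3.07, i.e. (7*4 + 93*3) / 100
```

Both recomputed values match the `aggregated_scores` shown in the example. Running the same check over the full dataset would show whether the relationship holds beyond this one sample.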


Provider:

saptarshideveloper
