CMap: a database for mapping job titles, sector specialization, and promotions across 24 sectors (Table 4, Figure 4)
DataCite Commons | Updated 2025-06-09 | Indexed 2025-09-08
Download link:
https://figshare.com/articles/dataset/CMap_a_database_for_mapping_job_titles_sector_specialization_and_promotions_across_24_sectorsTable_4Figure_4/28229633/1
Description:
Understanding job titles, career trajectories, and promotions provides valuable insight into labor market dynamics and professional mobility. We present Career Map (CMap), a novel dataset spanning 24 industry sectors, systematically structured to study job specialization, sector concentration, and career advancement. Using advanced natural language processing techniques and large language models, we standardize 6.2 million job titles into 109 thousand unique titles and introduce a Specialization Index to quantify how specialized a title is within its sector. The dataset includes both a structured job titles dataset and a set of identified promotions: 30 thousand validated promotions from the United States and the United Kingdom, and 72 thousand inferred promotions from a global context. It enables research on job hierarchies, workforce mobility, and systemic inequalities in professional advancement. By providing insights into career progression patterns, labor market structures, and the impact of education and experience, this dataset serves as a valuable resource for economists, sociologists, and computational researchers studying employment trends across industries and regions. This repository contains the code necessary to recreate Figure 4 and Table 4 from the original manuscript.
Provider:
figshare
Created:
2025-06-09



