Replication Package for "From Online Job Postings to Economic Insights: A Machine Learning Approach to Structuring Naturally Occurring Data"
Source: DataONE · Updated: 2025-05-01 · Indexed: 2025-11-01
Download link:
https://search.dataone.org/view/sha256:c6074b5a9e0d7f6acd841fb337c5307e5b5345325bb7b596075cb975fbf5f4a0
Description:
This replication package provides the code used to generate the figures and results in the paper, which links Canadian online job postings from Indeed to firm-level data from Advan Research using natural language processing (NLP) techniques. The code is organized in two parts:

1. **Data Construction Scripts** (these require access to confidential data and cannot be executed without the necessary data agreements; they are included for transparency and documentation)
   - **Company name matching** using tf-idf and cosine similarity to match inconsistently declared company names in the online job postings to names in the Advan Research Points-of-Interest (POI) dataset.
   - **Occupational classification** of job titles into the Canadian National Occupational Classification (NOC) using a pre-trained classifier.
   - **Aggregation** of the data to construct the figures in the paper.
2. **Public Replication Scripts** (fully runnable with the included grouped data)
   - **Nowcasting of official vacancies** using pseudo real-time information from online job postings and the Job Vacancy and Wage Survey (JVWS).
   - **Analysis of the dynamics of digital vs. non-digital jobs** in tech vs. non-tech firms during and after the COVID-19 pandemic.

Due to licensing restrictions, raw data from Indeed and Advan are not included in this archive. However, we provide code to replicate the data processing pipeline (when access is granted) and make available aggregated outputs sufficient to reproduce all figures and tables in the paper.
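To illustrate the company name matching step (tf-idf plus cosine similarity), here is a minimal, self-contained sketch. It is not the package's own code. The cleaning rule, the use of character 3-grams, the smoothed idf formula (the same form as scikit-learn's default), the 0.5 acceptance threshold, and every function name (`normalize`, `char_ngrams`, `match_names`, etc.) are hypothetical choices made for illustration. The company names in the usage example are toy inputs, not records from the Indeed or Advan data.

```python
import math
import re
from collections import Counter


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace (hypothetical cleaning rule)."""
    name = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    return " ".join(name.split())


def char_ngrams(text: str, n: int = 3) -> list[str]:
    """Character n-grams of the space-padded string, so word boundaries are captured."""
    padded = f" {text} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def fit_idf(docs: list[list[str]]) -> dict[str, float]:
    """Smoothed idf, ln((1 + N) / (1 + df)) + 1, the same form as scikit-learn's default."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # document frequency: count each n-gram once per name
    n_docs = len(docs)
    return {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf_vector(tokens: list[str], idf: dict[str, float]) -> dict[str, float]:
    """Sparse, L2-normalised tf-idf vector stored as {n-gram: weight}."""
    weights = {t: count * idf.get(t, 0.0) for t, count in Counter(tokens).items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm > 0 else {}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Both vectors are unit length, so their dot product is the cosine similarity."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def match_names(posting_names, poi_names, threshold=0.5):
    """Map each posting name to its most similar POI name, or None if below threshold."""
    posting_tokens = [char_ngrams(normalize(n)) for n in posting_names]
    poi_tokens = [char_ngrams(normalize(n)) for n in poi_names]
    idf = fit_idf(posting_tokens + poi_tokens)  # idf fitted on both name lists
    poi_vecs = [tfidf_vector(t, idf) for t in poi_tokens]
    matches = {}
    for name, tokens in zip(posting_names, posting_tokens):
        query = tfidf_vector(tokens, idf)
        scores = [cosine(query, v) for v in poi_vecs]
        best = max(range(len(scores)), key=scores.__getitem__)
        matched = poi_names[best] if scores[best] >= threshold else None
        matches[name] = (matched, round(scores[best], 3))
    return matches


if __name__ == "__main__":
    postings = ["Tim Hortons Inc.", "WALMART CANADA CORP", "Acme Widgets"]
    poi = ["Tim Hortons", "Walmart Canada", "Canadian Tire"]
    for posting, (poi_name, score) in match_names(postings, poi).items():
        print(f"{posting!r:24} -> {poi_name!r} (cosine = {score})")
```

Character n-grams are one common way to make tf-idf robust to suffixes such as "Inc." or "Corp" and to casing or punctuation differences. Whether the package works on words or characters, and which threshold it uses, is not stated here.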
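The description does not give the nowcasting specification, so the following is only a minimal sketch of the general idea of a pseudo real-time nowcast, under stated assumptions. It assumes a single regressor (an online-postings index aggregated to the JVWS quarterly frequency), a simple OLS bridge regression, and an expanding estimation window in which quarter t is predicted using only quarters before t. It ignores publication lags and data revisions. The function names, `min_train`, and the toy series are hypothetical.

```python
from statistics import mean


def ols_fit(x: list[float], y: list[float]) -> tuple[float, float]:
    """Closed-form simple OLS: returns (intercept, slope) for y = a + b * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("regressor has no variation in the training window")
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - slope * x_bar, slope


def pseudo_real_time_nowcasts(postings, vacancies, min_train=4):
    """Nowcast vacancies[t] for t >= min_train using only quarters before t.

    Assumes postings[t] is available during quarter t, whereas the official
    figure vacancies[t] only enters the training sample for later quarters.
    """
    nowcasts = []
    for t in range(min_train, len(vacancies)):
        intercept, slope = ols_fit(postings[:t], vacancies[:t])  # expanding window
        nowcasts.append(intercept + slope * postings[t])
    return nowcasts


if __name__ == "__main__":
    # Toy series (hypothetical units): a postings index and official vacancies.
    postings = [100, 104, 98, 110, 115, 112, 120]
    vacancies = [520, 535, 505, 560, 580, 570, 600]
    for t, nowcast in enumerate(pseudo_real_time_nowcasts(postings, vacancies), start=4):
        print(f"quarter {t}: nowcast {nowcast:.1f}, official {vacancies[t]}")
```

The expanding window keeps each nowcast free of information that would not have been available at the time. Comparing these nowcasts with the later official figures is the usual way to judge whether postings add real-time signal.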
Created:
2025-10-29



