Replication Package for "From Online Job Postings to Economic Insights: A Machine Learning Approach to Structuring Naturally Occurring Data"
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://doi.org/10.7910/DVN/W5AI5A
Description:
This replication package provides the code used to generate the figures and results in the paper, which links Canadian online job postings from Indeed to firm-level data from Advan Research using natural language processing (NLP) techniques. The code is organized in two parts:

1. **Data Construction Scripts** (included for transparency and documentation; they require access to confidential data and cannot be executed without the necessary data agreements)
   - **Company name matching** using tf-idf and cosine similarity to match inconsistently declared company names in the online job postings to names in the Advan Research Points-of-Interest (POI) dataset.
   - **Occupational classification** of job titles into the Canadian National Occupational Classification (NOC) using a pre-trained classifier.
   - **Aggregation** of the data to construct the figures in the paper.
2. **Public Replication Scripts** (fully runnable with the included grouped data)
   - **Nowcasting of official vacancies** using pseudo real-time information from online job postings and the Job Vacancies and Wage Survey (JVWS).
   - **Analysis of digital vs. non-digital job dynamics** in tech vs. non-tech firms during and after the COVID-19 pandemic.

Due to licensing restrictions, raw data from Indeed and Advan are not included in this archive. However, we provide code to replicate the data processing pipeline (when access is granted) and make available aggregated outputs sufficient to reproduce all figures and tables in the paper.
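The company name matching step described above (tf-idf plus cosine similarity) can be illustrated with a minimal, standard-library-only sketch. This is not the package's own code: the use of character trigrams, the smoothed idf formula, the 0.5 acceptance threshold, all function names, and the example company names are assumptions chosen for illustration.

```python
import math
from collections import Counter

def char_ngrams(name, n=3):
    """Lowercase, collapse whitespace, pad with spaces, and split into character n-grams."""
    s = " " + " ".join(name.lower().split()) + " "
    return [s[i:i + n] for i in range(len(s) - n + 1)]

def fit_idf(names, n=3):
    """Smoothed inverse document frequency of each n-gram over a list of names."""
    df = Counter()
    for name in names:
        df.update(set(char_ngrams(name, n)))
    total = len(names)
    return {g: math.log((1 + total) / (1 + c)) + 1.0 for g, c in df.items()}

def tfidf_vector(name, idf, n=3):
    """L2-normalised sparse tf-idf vector (dict); n-grams missing from idf are ignored."""
    tf = Counter(char_ngrams(name, n))
    vec = {g: c * idf[g] for g, c in tf.items() if g in idf}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {g: v / norm for g, v in vec.items()} if norm else {}

def cosine(a, b):
    """Cosine similarity of two L2-normalised sparse vectors (a plain dot product)."""
    return sum(v * b.get(g, 0.0) for g, v in a.items())

def best_match(posting_name, reference_names, idf, threshold=0.5):
    """Return (reference name, score), or (None, score) if the best score is below threshold."""
    query = tfidf_vector(posting_name, idf)
    scored = [(cosine(query, tfidf_vector(r, idf)), r) for r in reference_names]
    score, name = max(scored)
    return (name, score) if score >= threshold else (None, score)

# Illustrative names only; they are not taken from the Indeed or Advan data.
reference = ["Tim Hortons", "Shoppers Drug Mart", "Canadian Tire"]
postings = ["TIM HORTONS INC.", "Royal Bank of Canada"]

# Assumption: idf weights are fitted on the union of both name lists.
idf = fit_idf(reference + postings)

for p in postings:
    match, score = best_match(p, reference, idf)
    print(f"{p!r} -> {match!r} (cosine similarity {score:.2f})")
```

Character n-grams tend to be more forgiving than word tokens for suffixes such as "Inc." or small spelling differences, and the threshold trades off false matches against missed matches; both would need tuning against the actual data.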
Created:
2025-05-02



