Replication Package for "From Online Job Postings to Economic Insights: A Machine Learning Approach to Structuring Naturally Occurring Data"

Indexed from NIAID Data Ecosystem on 2026-05-02
Download link:
https://doi.org/10.7910/DVN/W5AI5A
Description:
This replication package provides the code used to generate the figures and results in the paper, which links Canadian online job postings from Indeed to firm-level data from Advan Research using natural language processing (NLP) techniques. The code is organized in two parts:

1. **Data construction scripts** (these require access to confidential data and cannot be executed without the necessary data agreements; they are included for transparency and documentation)
   - **Company name matching** using tf-idf and cosine similarity to match the inconsistently declared company names in the online job postings to names in the Advan Research Points-of-Interest (POI) dataset.
   - **Occupational classification** of job titles into the Canadian National Occupational Classification (NOC) using a pre-trained classifier.
   - **Aggregation** of the data used to construct the figures in the paper.
2. **Public replication scripts** (fully runnable with the included grouped data)
   - **Nowcasting of official vacancies** using pseudo real-time information from online job postings and the Job Vacancies and Wage Survey (JVWS).
   - **Analysis of digital vs. non-digital job dynamics** in tech vs. non-tech firms during and after the COVID-19 pandemic.

Due to licensing restrictions, raw data from Indeed and Advan are not included in this archive. However, we provide code to replicate the data processing pipeline (when access is granted) and make available aggregated outputs sufficient to reproduce all figures and tables in the paper.
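The company name matching step relies on tf-idf weighting and cosine similarity. The block below is a minimal, standard-library Python sketch of that general technique, not the package's own code. It assumes character trigrams as tokens, a smoothed idf fitted on both name lists together, and a placeholder acceptance threshold of 0.5. The functions `char_ngrams`, `tfidf_vectors`, `cosine`, and `match_names`, as well as the sample company names, are hypothetical and for illustration only.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def char_ngrams(name: str, n: int = 3) -> list[str]:
    """Lowercase a company name, replace punctuation with spaces, and split it into character n-grams."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    cleaned = " ".join(cleaned.split())  # collapse repeated whitespace
    padded = f" {cleaned} "  # pad so word boundaries form their own n-grams
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def tfidf_vectors(docs: list[list[str]]) -> list[dict[str, float]]:
    """Build L2-normalized sparse tf-idf vectors using a smoothed idf."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = []
    for doc in docs:
        weights = {t: count * idf[t] for t, count in Counter(doc).items()}
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({t: w / norm for t, w in weights.items()})
    return vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two L2-normalized sparse vectors, i.e. their dot product."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def match_names(
    posting_names: list[str], poi_names: list[str], threshold: float = 0.5
) -> dict[str, tuple[str, float] | None]:
    """Map each posting name to its most similar POI name, or None if the best score is below threshold."""
    # Fit idf on both name lists together so shared n-grams are weighted consistently.
    vectors = tfidf_vectors([char_ngrams(n) for n in posting_names + poi_names])
    posting_vecs = vectors[: len(posting_names)]
    poi_vecs = vectors[len(posting_names):]
    matches: dict[str, tuple[str, float] | None] = {}
    for name, pv in zip(posting_names, posting_vecs):
        scores = [cosine(pv, qv) for qv in poi_vecs]
        best = max(range(len(poi_vecs)), key=scores.__getitem__)
        matches[name] = (poi_names[best], scores[best]) if scores[best] >= threshold else None
    return matches


if __name__ == "__main__":
    # Hypothetical example names, not drawn from the Indeed or Advan data.
    postings = ["Tim Hortons Inc.", "TIM HORTONS #1234", "Shoppers Drug Mart", "Acme Widgets"]
    pois = ["Tim Hortons", "Shoppers Drug Mart", "Canadian Tire"]
    for posting, match in match_names(postings, pois).items():
        print(f"{posting!r} -> {match}")
```

Character n-grams make the comparison tolerant of punctuation, store numbers, and minor spelling differences, while idf weighting down-weights fragments that occur in many names. The 0.5 threshold is only a placeholder; a real pipeline would typically choose it by inspecting candidate matches.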
Created:
2025-05-02