
vacancy_skills_data

Figshare · Updated 2021-11-24 · Indexed 2026-04-08
Download link:
https://figshare.com/articles/dataset/vacancy_skills_data/17075717/1
Description:
Three datasets of processed skill sets extracted from job advertisements on the HeadHunter online hiring platform (collected with the open API, https://dev.hh.ru/) for Information Technology specialists (per the classifier at https://github.com/hhru/api/blob/master/docs_eng/specializations.md). The main vacancy fields are described at https://github.com/hhru/api/blob/master/docs_eng/vacancies.md.

Datasets:
1. "vacancy_skill.csv": a two-column dataset giving the vacancy ID ("vacid") and the processed name of the in-demand skill ("lv").
2. "sh_soft_clusters.csv": a three-column dataset giving the initial (translated) formulation of each "soft" skill ("V1"), its frequency of occurrence in the sample ("V2"), and the reference ("etalon") name, i.e. the generalized category of "soft" skills ("ETALON").
3. "jaccard_matrix.csv": a square dissimilarity matrix (1,730 × 1,730) of Jaccard distances between processed skill names, computed by comparing the sets of vacancy IDs for each pair of skills. The first row and the first column contain the skill names.
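The Jaccard distance behind "jaccard_matrix.csv" can be illustrated with a small example. This is a minimal sketch under stated assumptions, not the author's processing code: the function names and the toy rows are hypothetical, and only the column meanings ("vacid", "lv") are taken from the description above. For two skills with vacancy ID sets A and B, the distance is 1 - |A ∩ B| / |A ∪ B|.

```python
from collections import defaultdict
from itertools import combinations


def skill_to_vacancies(rows):
    """Map each skill name ("lv") to the set of vacancy IDs ("vacid") listing it."""
    index = defaultdict(set)
    for vacid, skill in rows:
        index[skill].add(vacid)
    return dict(index)


def jaccard_distance(a, b):
    """Return 1 - |A & B| / |A | B|; 0.0 by convention when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def jaccard_matrix(index):
    """Build a symmetric skill-by-skill distance matrix as nested dicts."""
    skills = sorted(index)
    matrix = {s: {t: 0.0 for t in skills} for s in skills}
    for s, t in combinations(skills, 2):
        d = jaccard_distance(index[s], index[t])
        matrix[s][t] = d
        matrix[t][s] = d
    return skills, matrix


# Toy (vacid, lv) rows invented for illustration; they are NOT from the dataset.
rows = [
    (1, "python"), (1, "sql"),
    (2, "python"),
    (3, "sql"), (3, "git"),
    (4, "python"), (4, "sql"),
]

index = skill_to_vacancies(rows)
skills, matrix = jaccard_matrix(index)

for s in skills:
    print(s, [round(matrix[s][t], 3) for t in skills])
```

In this toy data "python" and "sql" share two of four vacancies, so their distance is 0.5, while "python" and "git" never co-occur and sit at the maximum distance of 1.0. On the real file the same nested loop would cover 1,730 skills, where a vectorized approach would likely be preferable.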

Provider:
Ternikov, Andrei
Created:
2021-11-24