five

Jobs12K

arXiv, indexed 2025-09-30
Download link:
https://kb.lightcast.io/en/articles/7153977-global-data-101
Description:
This dataset is built from anonymized US-based resume data and provides a test set of job title and company pairs for multi-class occupation classification. Records were filtered to exclude job titles that are non-alphanumeric or longer than seven words. Job titles average 5.9 words (standard deviation 2.3) in the test set and 5.8 words (standard deviation 2.3) in the validation set. The test set contains 11,920 job records and the validation set contains 100.
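The filtering rule in the description can be illustrated with a minimal sketch. This is not the provider's actual pipeline: the description does not define "non-alphanumeric" precisely, so the sketch assumes a title is kept only if it consists of ASCII letters and digits separated by single spaces after whitespace normalization. The names `MAX_WORDS`, `keep_title`, and `filter_titles` are hypothetical.

```python
import re

# Assumption: the seven-word limit from the description is inclusive.
MAX_WORDS = 7

# Assumption: "alphanumeric" means ASCII letters and digits only,
# with words separated by single spaces.
_ALNUM_WORDS = re.compile(r"[A-Za-z0-9]+(?: [A-Za-z0-9]+)*")


def keep_title(title: str) -> bool:
    """Return True if a job title passes both filters from the description."""
    # Collapse runs of whitespace and trim the ends before checking.
    normalized = " ".join(title.split())
    if not normalized:
        return False
    # Reject titles containing any character outside letters, digits, spaces.
    if _ALNUM_WORDS.fullmatch(normalized) is None:
        return False
    # Reject titles longer than the word limit.
    return len(normalized.split(" ")) <= MAX_WORDS


def filter_titles(titles):
    """Keep only the titles that pass keep_title, preserving order."""
    return [t for t in titles if keep_title(t)]
```

Under these assumptions, a title such as "Sr. Engineer" would be dropped because of the period, so a real pipeline may well use a looser definition.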
Provider:
Lightcast