
Dictionary of Titles

Indexed from NIAID Data Ecosystem on 2026-03-13
Download link:
https://doi.org/10.7910/DVN/DQW8IP
Description:
Hand-transcribed content from the United States Bureau of Labour Statistics Dictionary of Titles (DoT). The DoT is a record of occupations with a description of the tasks performed in each. Five editions exist, published in 1939, 1949, 1965, 1977 and 1991. The DoT was later replaced by O*NET, which provides structured data on jobs, workers and their characteristics. Apart from the 1991 edition, however, the DoT data is not easily ingestible: it exists only as scanned PDF documents, and attempts at Optical Character Recognition (OCR) yielded low accuracy. For that reason we present here hand-transcribed textual data from these documents.

Various data are available for each occupation, e.g. numerical codes, references to other occupations, and the free-text description. The data for each edition is therefore presented in 'long' format: each occupation occupies a variable number of lines, and a blank line separates consecutive occupations. Consult the transcription instructions for more details.

Structured metadata on occupations is also available for the 1965, 1977 and 1991 editions. For these editions, the metadata can be extracted from the numerical codes in the occupational entries. The key for these codes is found in separate tables in the 1965 edition, which were also transcribed. The instructions provided to transcribers for this edition are included in the repository. The original documents are freely available in PDF format. This data accompanies the paper 'Longitudinal Complex Dynamics of Labour Markets Reveal Increasing Polarisation' by Althobaiti et al.
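
The 'long' format described above (a variable number of lines per occupation, with a blank line between occupations) could be read along the following lines. This is only a minimal sketch: the function name `iter_records` is hypothetical, the sample text is an invented placeholder rather than real DoT content, and the description does not say which line holds which field, so each record is returned simply as a list of its lines. The transcription instructions in the repository remain the authority on field order.

```python
from typing import Iterable, Iterator, List


def iter_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Group lines into records separated by one or more blank lines."""
    record: List[str] = []
    for raw in lines:
        line = raw.strip()
        if line:
            # Non-blank line: belongs to the current occupation entry.
            record.append(line)
        elif record:
            # Blank line after some content: the entry is complete.
            yield record
            record = []
    if record:
        # Last entry when the input does not end with a blank line.
        yield record


# Invented placeholder text (NOT real DoT content), used only to
# illustrate the blank-line-separated layout.
sample = """\
PLACEHOLDER TITLE A
CODE-PLACEHOLDER
Free-text description of the tasks performed.

PLACEHOLDER TITLE B
Free-text description, first line.
Free-text description, second line.
See PLACEHOLDER TITLE A.
"""

records = list(iter_records(sample.splitlines()))
for rec in records:
    print(len(rec), "lines:", rec[0])
```

To apply the same function to one of the transcribed files, an open file handle can be passed in place of `sample.splitlines()`; the file names and encodings are whatever the repository provides and are not assumed here.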

Created:
2022-04-06