Dictionary of Titles
Source: DataONE. Updated: 2022-04-06. Indexed: 2024-06-08.
Download link:
https://search.dataone.org/view/sha256:12adfbf84a741941f75c9f1549cc3aa8e7997e4265d013982022493f82576f70
Description:
Hand-transcribed content from the United States Bureau of Labour Statistics Dictionary of Titles (DoT). The DoT is a record of occupations with a description of the tasks performed in each. Five editions exist: 1939, 1949, 1965, 1977 and 1991. The DoT was later replaced by O*NET, which provides structured data on jobs, workers and their characteristics. Apart from the 1991 edition, however, the DoT data is not easily ingestible because it exists only as image-based PDF documents, and attempts at Optical Character Recognition (OCR) gave low accuracy. For that reason, we present hand-transcribed text from these documents.
Several kinds of data are available for each occupation, e.g. numerical codes, references to other occupations and the free-text description. Because the amount of data varies between occupations, each edition is presented in 'long' format: an occupation spans a variable number of lines, and a blank line separates consecutive occupations. Consult the transcription instructions for more details.
Structured metadata (see here) on occupations is also available for the 1965, 1977 and 1991 editions. For these editions, the metadata can be extracted from the numerical codes in the occupational entries; the key for these codes is found in separate tables in the 1965 edition, which were also transcribed. The instructions provided to transcribers for this edition are also included in the repository. The original documents are freely available in PDF format (e.g. here). This data accompanies the paper 'Longitudinal Complex Dynamics of Labour Markets Reveal Increasing Polarisation' by Althobaiti et al.
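The 'long' layout described above (a variable number of lines per occupation, with a blank line between occupations) can be read with a small generator. The following is a minimal sketch, not the authors' tooling: the function name `iter_records` and the sample text are invented for illustration, and the meaning and order of the lines inside a record are not assumed here, since they are defined by the transcription instructions.

```python
from typing import Iterable, Iterator, List


def iter_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield one list of non-blank lines per blank-line-delimited record."""
    record: List[str] = []
    for raw in lines:
        line = raw.strip()
        if line:
            # Still inside the current occupation entry.
            record.append(line)
        elif record:
            # A blank line closes the current entry; repeated blanks are ignored.
            yield record
            record = []
    if record:
        # The last entry may not be followed by a blank line.
        yield record


# Hypothetical sample text, NOT taken from the dataset.
sample = """\
TITLE ONE
code 000
Free text for the first entry.

TITLE TWO
Free text for the second entry.
"""

records = list(iter_records(sample.splitlines()))
for rec in records:
    print(len(rec), rec[0])
```

To apply the same function to a real file, an open file handle can be passed instead of `sample.splitlines()`; the file name and encoding depend on the edition being read.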
Created:
2023-11-08



