
3-digit occupation code images from the Norwegian census of 1950 - Manual review dataset

Source: DataONE. Updated: 2023-09-28. Indexed: 2024-10-12
Download link:
https://search.dataone.org/view/sha256:05a3d4f54e7a963e92a1fd86f0e88f3539845b9b7fb805ea66dc89d6c402e29e
Description:
This dataset is made up of images containing handwritten 3-digit occupation codes from the Norwegian population census of 1950. Statistics Norway added the occupation codes to the census sheets after the census was concluded, in order to create aggregated occupational statistics for the entire population. According to Statistics Norway’s official publications (https://www.ssb.no/historisk-statistikk/folketellinger/folketellingen-1950, booklet 4, page 81), the coding standard used in the 1950 census is very similar to the standard used in the 1920 census. Cf. the 13th booklet published for the 1920 census (https://www.ssb.no/historisk-statistikk/folketellinger/folketellingen-1920; note that this booklet is only available in Norwegian).

In short, an occupation code is a 3-digit number that corresponds to a given occupation or type of occupation. According to the official list of occupation codes provided by Statistics Norway, there are 339 unique codes. The codes are not in general sequential or hierarchical, although some subgroupings are. This list can be found under Files.

The images were extracted algorithmically from the original census sheet images. This process was not flawless and led to additional images being extracted; these may contain written occupation titles or be entirely blank.

The dataset consists of 90,000 unique images, plus 9,000 images that were randomly selected and copied from the unique images. All of these were used in a research project (preprint: https://doi.org/10.48550/arXiv.2306.16126; the author list can be found in the preprint) in which we tried to find a more efficient way of reviewing and correcting classification results from a Machine Learning model where the results did not pass a pre-set confidence threshold.

This was a follow-up to our previous article, which describes the initial project and the creation of our model in more detail (“Lessons Learned Developing and Using a Machine Learning Model to Automatically Transcribe 2.3 Million Handwritten Occupation Codes”, https://doi.org/10.51964/hlcs11331).
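
The description mentions that classification results falling below a pre-set confidence threshold were sent for manual review. The following is a minimal sketch of that kind of routing, not the authors' implementation: the `Prediction` class, the `split_for_review` function, the image identifiers, and the threshold value of 0.9 are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    image_id: str      # hypothetical identifier of one code image
    code: str          # predicted 3-digit occupation code
    confidence: float  # model confidence in the range [0, 1]


def split_for_review(predictions, threshold=0.9):
    """Split predictions into accepted ones and ones needing manual review.

    The threshold of 0.9 is an assumed value for illustration only.
    """
    accepted, needs_review = [], []
    for p in predictions:
        if p.confidence >= threshold:
            accepted.append(p)
        else:
            needs_review.append(p)
    return accepted, needs_review


# Made-up example predictions
example = [
    Prediction("img_0001", "123", 0.98),
    Prediction("img_0002", "456", 0.42),
    Prediction("img_0003", "789", 0.90),
]
accepted, needs_review = split_for_review(example)
```

Here the second prediction would be placed in `needs_review`, while the other two meet the assumed threshold and are accepted.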
Created:
2024-09-25