Data and Code for: From Unstructured Injury Narratives to Structured Risk Pathways
Source: DataCite Commons. Updated: 2026-05-01. Indexed: 2026-05-04.
Download link:
https://data.mendeley.com/datasets/t2sks7fw5j
Description:
This dataset provides supporting data and code for the manuscript “From Unstructured Injury Narratives to Structured Risk Pathways: Revealing Recurrent Work–Hazard–Accident Mechanisms.”
The study uses publicly available OSHA Severe Injury Reports from the U.S. Department of Labor as the original data source. The original OSHA data are not redistributed in full in this repository. Instead, this dataset provides materials that support the understanding and partial reproduction of the data-processing, annotation, structuring, and evaluation workflow reported in the manuscript.
The repository includes Jupyter notebooks for converting Doccano annotations into NER training data, training the NER model, profiling the graph-based accident representation, comparing text-based and graph-based scenario similarity, and generating representative subgraph visualizations. It also includes the label schema, example normalization rules, graph schema description, sample processed data, and summary evaluation results.
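As a rough illustration of the first notebook step (turning Doccano span annotations into NER training data), the sketch below converts one annotated record into token-level BIO tags. It is not the authors' code. It assumes a Doccano-style JSONL record with a "text" field and a "label" list of [start, end, label] character spans, plain whitespace tokenization, and spans that align with token boundaries. The function name doccano_to_bio, the sample sentence, and the label names ACCIDENT and HAZARD are hypothetical and are not taken from the repository's label schema.

```python
import json


def doccano_to_bio(record):
    """Convert one Doccano-style record into (token, BIO tag) pairs.

    Assumes record["label"] holds [start, end, label] character spans
    and that spans begin and end on whitespace token boundaries.
    """
    text = record["text"]
    spans = sorted(record.get("label", []))
    pairs = []
    pos = 0
    for token in text.split():
        # Locate the token's character offsets in the original text.
        start = text.index(token, pos)
        end = start + len(token)
        pos = end
        tag = "O"
        for span_start, span_end, label in spans:
            if start >= span_start and end <= span_end:
                # First token of a span gets B-, later tokens get I-.
                prefix = "B-" if start == span_start else "I-"
                tag = prefix + label
                break
        pairs.append((token, tag))
    return pairs


# Hypothetical record; labels are placeholders, not the study's schema.
line = (
    '{"text": "Employee fell from extension ladder", '
    '"label": [[9, 13, "ACCIDENT"], [19, 35, "HAZARD"]]}'
)
example = doccano_to_bio(json.loads(line))
for token, tag in example:
    print(token, tag)
```

A real pipeline would usually replace the whitespace split with the tokenizer of the NER model being trained, and would need to handle spans that do not align with token boundaries.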
The full derived dataset generated in this study, including complete NER outputs, normalized graph-ready tables, and scenario-similarity results, is not publicly deposited because it contains author-generated annotations, intermediate modeling outputs, and research-specific normalization rules that are part of an ongoing research program. The full processed dataset may be made available from the corresponding author upon reasonable request, subject to institutional approval and research-use conditions.
Original data source: OSHA Severe Injury Reports, U.S. Department of Labor.
Provider:
Mendeley Data
Created:
2026-05-01



