Annotated dataset of clinical notes for predicting social determinants of mental health in opioid use disorder using a Human-in-the-Loop Large Language Model Interaction for Annotation (HLLIA) framework
Source: NIAID Data Ecosystem. Indexed: 2026-05-10
Download link:
http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.d51c5b0h7
Resource description:
This dataset comprises 2,636 deidentified discharge summaries from the MIMIC-IV-Note database, annotated for 13 Social Determinants of Mental Health (SDOMH) relevant to Opioid Use Disorder (OUD). It was created to support natural language processing (NLP) and machine learning research aimed at identifying social factors that influence OUD outcomes. Under the Human-in-the-Loop Large Language Model Interaction for Annotation (HLLIA) framework, initial SDOMH labels were generated by GPT-3.5/4 and then refined through expert review, partial-correlation-based validation, and iterative consensus refinement to ensure label consistency and reliability. Each record includes: (1) a subject ID; (2) binary indicators for OUD presence (Hierarchy 1) and SDOMH presence (Hierarchy 2); and (3) thirteen binary columns for the specific determinants (Hierarchy 3), such as Social Detachment, Financial Uncertainty, Housing Instability, Substance Misuse, Violence, and Suicide Mortality.
The dataset enables hierarchical, multi-label classification of SDOMHs and serves as training data for transformer-based models such as the Multilevel Hierarchical Clinical-Longformer Embeddings (MHCLE) algorithm. Potential reuse includes applications in social and behavioral health informatics, causal inference, clinical decision support, and bias-aware LLM annotation studies.
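The three-level record layout described above can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the column names, the toy rows, and the consistency rule (that the Hierarchy 2 flag should agree with whether any Hierarchy 3 determinant is set) are not taken from the dataset documentation, and only three of the thirteen determinants are shown.

```python
import csv
import io

# Hypothetical column names; the real file headers may differ.
ID_COL = "subject_id"
OUD_COL = "oud"      # Hierarchy 1: OUD presence
SDOMH_COL = "sdomh"  # Hierarchy 2: any SDOMH present
# Hierarchy 3: illustrative subset of the thirteen determinant columns.
DETERMINANT_COLS = [
    "social_detachment",
    "financial_uncertainty",
    "housing_instability",
]

# Synthetic toy rows, not real data from the dataset.
SAMPLE_CSV = """subject_id,oud,sdomh,social_detachment,financial_uncertainty,housing_instability
1,1,1,1,0,1
2,1,0,0,0,0
3,0,0,0,0,0
4,1,0,0,1,0
"""


def load_records(text):
    """Parse CSV text into a list of dicts with integer values."""
    reader = csv.DictReader(io.StringIO(text))
    return [{key: int(value) for key, value in row.items()} for row in reader]


def inconsistent_ids(records):
    """Return IDs whose Hierarchy 2 flag disagrees with the Hierarchy 3 flags.

    Assumed rule: sdomh should be 1 exactly when at least one
    determinant column is 1.
    """
    bad = []
    for record in records:
        any_determinant = any(record[col] for col in DETERMINANT_COLS)
        if bool(record[SDOMH_COL]) != any_determinant:
            bad.append(record[ID_COL])
    return bad


records = load_records(SAMPLE_CSV)
print(inconsistent_ids(records))
```

A check of this kind may be useful before training a hierarchical multi-label model, since contradictory parent and child labels would otherwise be passed to the classifier unnoticed. Whether the published labels are meant to satisfy this rule should be confirmed against the dataset's own documentation.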
Created:
2026-03-06



