Annotated dataset of clinical notes for predicting social determinants of mental health in opioid use disorder using a Human-in-the-Loop Large Language Model Interaction for Annotation (HLLIA) framework
Source: DataCite Commons | Updated: 2026-03-06 | Indexed: 2026-04-25
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.d51c5b0h7
Description:
This dataset comprises 2,636 deidentified discharge summaries from the
MIMIC-IV-Note database, annotated for 13 Social Determinants of Mental
Health (SDOMH) relevant to Opioid Use Disorder (OUD). The dataset was
created to support natural language processing (NLP) and machine learning
research aimed at identifying social factors influencing OUD outcomes.
Using a Human-in-the-Loop Large Language Model Interaction for Annotation
(HLLIA) framework, initial SDOMH labels were generated by GPT-3.5/4 and
subsequently refined through expert review, partial-correlation–based
validation, and iterative consensus refinement to ensure label consistency
and reliability. Each record includes: (1) a subject ID, (2) binary
indicators for OUD presence (Hierarchy 1) and SDOMH presence (Hierarchy 2),
and (3) thirteen binary columns representing specific determinants such as
Social Detachment, Financial Uncertainty, Housing Instability, Substance
Misuse, Violence, and Suicide Mortality (Hierarchy 3). The dataset enables
hierarchical, multi-label classification of SDOMHs and serves as training
data for transformer-based models such as the Multilevel Hierarchical
Clinical-Longformer Embeddings (MHCLE) algorithm. Potential reuse includes
applications in social and behavioral health informatics, causal
inference, clinical decision support, and bias-aware LLM annotation
studies.
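
The three-level record layout described above lends itself to a quick sanity check before training. The following is a minimal sketch in Python, not the authors' tooling. All column names are hypothetical (the description does not give the actual file headers), only the six determinants named above are listed rather than all thirteen, and the sample rows are synthetic. The check also assumes that a Hierarchy 3 determinant should never be set when the Hierarchy 2 SDOMH flag is 0, which the description implies but does not state.

```python
import csv
import io

# Hypothetical column names; the actual headers in the Dryad files may differ.
ID_COL = "subject_id"
OUD_COL = "oud"        # Hierarchy 1: OUD presence
SDOMH_COL = "sdomh"    # Hierarchy 2: any SDOMH present
# Hierarchy 3: only the six determinants named in the description
# (the real dataset has thirteen such columns).
DETERMINANT_COLS = [
    "social_detachment",
    "financial_uncertainty",
    "housing_instability",
    "substance_misuse",
    "violence",
    "suicide_mortality",
]


def load_records(handle):
    """Read CSV rows and convert every label column to an int in {0, 1}."""
    records = []
    for row in csv.DictReader(handle):
        record = {ID_COL: row[ID_COL]}
        for col in [OUD_COL, SDOMH_COL] + DETERMINANT_COLS:
            value = int(row[col])
            if value not in (0, 1):
                raise ValueError(f"{col} must be 0 or 1, got {value}")
            record[col] = value
        records.append(record)
    return records


def inconsistent_ids(records):
    """Return IDs where a determinant is set although the SDOMH flag is 0.

    Assumption: Hierarchy 2 should be 1 whenever any Hierarchy 3 column is 1.
    """
    return [
        r[ID_COL]
        for r in records
        if r[SDOMH_COL] == 0 and any(r[c] for c in DETERMINANT_COLS)
    ]


# Synthetic rows for illustration only; these are not taken from the dataset.
SAMPLE = """\
subject_id,oud,sdomh,social_detachment,financial_uncertainty,housing_instability,substance_misuse,violence,suicide_mortality
A001,1,1,0,1,1,0,0,0
A002,1,0,0,0,0,0,0,0
A003,0,0,0,0,0,1,0,0
"""

if __name__ == "__main__":
    records = load_records(io.StringIO(SAMPLE))
    print(inconsistent_ids(records))
```

To run the same check on the downloaded file, pass an open file handle to `load_records` in place of the `io.StringIO` wrapper, after adjusting the column constants to the real headers.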
Provider:
Dryad
Created:
2026-03-06



