LLM-Annotated Emotional Tone Labels and Structured Variables for Cancer Peer-Support Posts
Source: DataCite Commons. Updated: 2026-03-23. Indexed: 2026-05-04.
Download link:
https://data.mendeley.com/datasets/zftkdw3z7g/1
Description:
This dataset extends the Mental Health Insights: Vulnerable Cancer Survivors & Caregivers dataset (Orchi et al., 2023; DOI: 10.17632/69dcnv2gzd.1) with LLM-derived annotations produced using gpt-4o-mini. The original corpus contains 10,392 de-identified posts from online cancer peer-support communities (Reddit, DailyStrength, HealthBoard) covering five cancer types (brain, colon, liver, leukemia, lung), with author-provided emotional tone labels on a four-point scale (-2 to 1).
For each post, the LLM annotator produced: (1) a discrete emotional tone label (0 = very negative, 1 = negative, 2 = neutral, 3 = positive), (2) reporter role (PATIENT, CAREGIVER, or UNCLEAR), and (3) cancer type with confidence score. A collapsed three-class label (negative, neutral, positive) and an augmented text field prepending ROLE and CANCER tokens to the post text are also provided. Train/validation/test split assignments (60/20/20, stratified by three-class AI label, random seed 42) are included for reproducibility.
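The 60/20/20 stratified partition described above can be sketched as follows. This is a minimal illustration using only the Python standard library; the record does not say which tool produced the published split, so the function `stratified_split` and its rounding behavior are assumptions, and it will not reproduce the released `split` column. Use the shipped `split` values for any comparison with the manuscript.

```python
import random
from collections import defaultdict

def stratified_split(labels, fractions=(0.6, 0.2, 0.2), seed=42):
    """Assign 'train', 'val', or 'test' to each position, stratified by label.

    Hypothetical helper: shuffles the indices of each class with a seeded RNG
    and slices them according to `fractions` (train, val, test).
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    assignment = [None] * len(labels)
    for label in sorted(by_label):  # fixed class order keeps the result deterministic
        indices = by_label[label]
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        for pos, idx in enumerate(indices):
            if pos < n_train:
                assignment[idx] = "train"
            elif pos < n_train + n_val:
                assignment[idx] = "val"
            else:
                assignment[idx] = "test"
    return assignment

# Toy three-class labels standing in for the ai_label_3class column.
labels = ["negative"] * 50 + ["neutral"] * 30 + ["positive"] * 20
split = stratified_split(labels)
```

Stratifying by class keeps the negative/neutral/positive proportions the same in each partition, which matters when the classes are imbalanced.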
This dataset accompanies the manuscript: "LLM-Based Annotation and Token-Augmented Modeling for Emotional Tone Classification in Online Cancer Peer-Support Posts" (Xu, Wang, Wang, Ding, Zou, & Cao, submitted to PLOS Digital Health, 2026).
Columns:
row_id: integer index (0-based, matches original dataset row order)
posts: original post text
intensity: original human label (-2, -1, 0, 1)
human_label_4class: human label as string (very_negative, negative, neutral, positive)
human_label_3class: collapsed human label (negative, neutral, positive)
ai_speaker_role: LLM-extracted reporter role (PATIENT, CAREGIVER, UNCLEAR)
ai_cancer_type: LLM-extracted cancer type (BRAIN, COLON, LIVER, LEUKEMIA, LUNG, OTHER, UNKNOWN)
ai_cancer_type_confidence: LLM confidence for cancer type (0 to 1)
ai_label_4class: LLM-produced emotional tone label as string (very_negative, negative, neutral, positive)
ai_label_3class: collapsed LLM label (negative, neutral, positive)
split: data partition (train, val, test)
posts_aug: augmented text (ROLE_X CANCER_Y prepended to post text)
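The relationships among the label and text columns above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code: the intensity-to-string mapping follows the order in which the values are listed, the three-class collapse is assumed to merge very_negative into negative, and the single-space separator in `augment` is a guess at the exact `posts_aug` format. The example row is invented.

```python
# Assumed mapping from the original intensity score to the four-class string.
INTENSITY_TO_4CLASS = {
    -2: "very_negative",
    -1: "negative",
    0: "neutral",
    1: "positive",
}

# Assumed collapse: both negative classes merge into "negative".
FOUR_TO_THREE = {
    "very_negative": "negative",
    "negative": "negative",
    "neutral": "neutral",
    "positive": "positive",
}

def augment(post, role, cancer_type):
    """Prepend ROLE and CANCER tokens to the post text (separator is assumed)."""
    return f"ROLE_{role} CANCER_{cancer_type} {post}"

# Invented row for illustration only; not taken from the dataset.
row = {
    "posts": "Scan results came back clear today.",
    "intensity": 1,
    "ai_speaker_role": "PATIENT",
    "ai_cancer_type": "LUNG",
}

human_label_4class = INTENSITY_TO_4CLASS[row["intensity"]]
human_label_3class = FOUR_TO_THREE[human_label_4class]
posts_aug = augment(row["posts"], row["ai_speaker_role"], row["ai_cancer_type"])
```

Checking the shipped columns against mappings like these is a quick way to confirm the assumptions before training on `posts_aug`.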
Provider:
Mendeley Data
Created:
2026-03-23



