
LLM-Annotated Emotional Tone Labels and Structured Variables for Cancer Peer-Support Posts

DataCite Commons | Updated 2026-03-23 | Indexed 2026-05-04
Download link:
https://data.mendeley.com/datasets/zftkdw3z7g/1
Description:
This dataset extends the Mental Health Insights: Vulnerable Cancer Survivors & Caregivers dataset (Orchi et al., 2023; DOI: 10.17632/69dcnv2gzd.1) with LLM-derived annotations produced using gpt-4o-mini. The original corpus contains 10,392 de-identified posts from online cancer peer-support communities (Reddit, DailyStrength, HealthBoard) covering five cancer types (brain, colon, liver, leukemia, lung), with author-provided emotional tone labels on a four-point scale (-2 to 1). For each post, the LLM annotator produced: (1) a discrete emotional tone label (0 = very negative, 1 = negative, 2 = neutral, 3 = positive), (2) reporter role (PATIENT, CAREGIVER, or UNCLEAR), and (3) cancer type with confidence score. A collapsed three-class label (negative, neutral, positive) and an augmented text field prepending ROLE and CANCER tokens to the post text are also provided. Train/validation/test split assignments (60/20/20, stratified by three-class AI label, random seed 42) are included for reproducibility. This dataset accompanies the manuscript: "LLM-Based Annotation and Token-Augmented Modeling for Emotional Tone Classification in Online Cancer Peer-Support Posts" (Xu, Wang, Wang, Ding, Zou, & Cao, submitted to PLOS Digital Health, 2026). 
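The 60/20/20 stratified split described above can be illustrated with a minimal sketch. The description does not say which library or procedure the authors used, so the function below (`stratified_split`, a hypothetical name) is only one plausible way to do it and will not reproduce the published assignments. To reproduce those, use the `split` column shipped with the dataset. The toy labels are placeholders, not real data.

```python
import random
from collections import defaultdict

def stratified_split(labels, fractions=(0.6, 0.2, 0.2), seed=42):
    """Assign 'train', 'val', or 'test' to each position, stratified by label.

    Illustrative only: the authors' actual splitting code is not given
    in the dataset description.
    """
    rng = random.Random(seed)

    # Group row positions by class label.
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    assignment = [None] * len(labels)
    for label in sorted(by_label):  # sorted so the result is deterministic
        indices = by_label[label]
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        for pos, idx in enumerate(indices):
            if pos < n_train:
                assignment[idx] = "train"
            elif pos < n_train + n_val:
                assignment[idx] = "val"
            else:
                assignment[idx] = "test"
    return assignment

# Toy labels standing in for the three-class AI label (not real data).
toy_labels = ["negative"] * 50 + ["neutral"] * 30 + ["positive"] * 20
splits = stratified_split(toy_labels)
print({name: splits.count(name) for name in ("train", "val", "test")})
```

Splitting within each class keeps the class proportions roughly equal across the three partitions, which is the point of stratifying by the three-class AI label.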
Columns:
row_id: integer index (0-based, matches original dataset row order)
posts: original post text
intensity: original human label (-2, -1, 0, 1)
human_label_4class: human label as string (very_negative, negative, neutral, positive)
human_label_3class: collapsed human label (negative, neutral, positive)
ai_speaker_role: LLM-extracted reporter role (PATIENT, CAREGIVER, UNCLEAR)
ai_cancer_type: LLM-extracted cancer type (BRAIN, COLON, LIVER, LEUKEMIA, LUNG, OTHER, UNKNOWN)
ai_cancer_type_confidence: LLM confidence for cancer type (0 to 1)
ai_label_4class: LLM-produced emotional tone label as string (very_negative, negative, neutral, positive)
ai_label_3class: collapsed LLM label (negative, neutral, positive)
split: data partition (train, val, test)
posts_aug: augmented text (ROLE_X CANCER_Y prepended to post text)
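The relationship between the label columns and the augmented text field can be sketched as follows. Several details are assumptions, not confirmed by the description: that the intensity values map to the four string labels in the order listed, that the three-class collapse merges very_negative into negative, and that the ROLE and CANCER tokens are separated from the post by single spaces. The example row is invented for illustration.

```python
# Assumed mapping from the original intensity scale to the string labels,
# inferred from the order in which both are listed in the column descriptions.
INTENSITY_TO_4CLASS = {
    -2: "very_negative",
    -1: "negative",
    0: "neutral",
    1: "positive",
}

# Assumption: the two negative classes merge in the three-class scheme.
FOUR_TO_THREE = {
    "very_negative": "negative",
    "negative": "negative",
    "neutral": "neutral",
    "positive": "positive",
}

def augment(post, role, cancer_type):
    """Prepend ROLE_X and CANCER_Y tokens to the post text.

    The single-space separator is an assumption.
    """
    return f"ROLE_{role} CANCER_{cancer_type} {post}"

# Hypothetical row; the post text is invented, not taken from the dataset.
row = {
    "row_id": 0,
    "posts": "Waiting on my scan results this week.",
    "intensity": -1,
    "ai_speaker_role": "PATIENT",
    "ai_cancer_type": "LUNG",
}

row["human_label_4class"] = INTENSITY_TO_4CLASS[row["intensity"]]
row["human_label_3class"] = FOUR_TO_THREE[row["human_label_4class"]]
row["posts_aug"] = augment(row["posts"], row["ai_speaker_role"], row["ai_cancer_type"])

print(row["human_label_3class"])
print(row["posts_aug"])
```

When working with the published file, the shipped `human_label_*`, `ai_label_*`, and `posts_aug` columns should be preferred over values recomputed this way.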
Provider:
Mendeley Data
Created:
2026-03-23