LLM-Annotated Emotional Tone Labels and Structured Variables for Cancer Peer-Support Posts

Name: LLM-Annotated Emotional Tone Labels and Structured Variables for Cancer Peer-Support Posts
Creator: Mendeley Data
Published: 2026-03-23 15:44:35
License: No description available

DataCite Commons; updated 2026-03-23; indexed 2026-05-04

Download link:

https://data.mendeley.com/datasets/zftkdw3z7g/1

Description:

This dataset extends the Mental Health Insights: Vulnerable Cancer Survivors & Caregivers dataset (Orchi et al., 2023; DOI: 10.17632/69dcnv2gzd.1) with LLM-derived annotations produced using gpt-4o-mini. The original corpus contains 10,392 de-identified posts from online cancer peer-support communities (Reddit, DailyStrength, HealthBoard) covering five cancer types (brain, colon, liver, leukemia, lung), with author-provided emotional tone labels on a four-point scale (-2 to 1). For each post, the LLM annotator produced: (1) a discrete emotional tone label (0 = very negative, 1 = negative, 2 = neutral, 3 = positive), (2) reporter role (PATIENT, CAREGIVER, or UNCLEAR), and (3) cancer type with confidence score. A collapsed three-class label (negative, neutral, positive) and an augmented text field prepending ROLE and CANCER tokens to the post text are also provided. Train/validation/test split assignments (60/20/20, stratified by three-class AI label, random seed 42) are included for reproducibility. This dataset accompanies the manuscript: "LLM-Based Annotation and Token-Augmented Modeling for Emotional Tone Classification in Online Cancer Peer-Support Posts" (Xu, Wang, Wang, Ding, Zou, & Cao, submitted to PLOS Digital Health, 2026). 
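
The 60/20/20 stratified split described above can be illustrated with a short standard-library sketch. This is not the authors' code: the function name `stratified_split`, the rounding rule, and the toy labels are assumptions made for illustration. Because the authors' exact procedure and random number generator are not stated here, this sketch will not reproduce the `split` column shipped with the dataset, even with seed 42.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.6, 0.2, 0.2), seed=42):
    """Assign 'train', 'val', or 'test' to each position, stratified by label.

    Hypothetical helper: shuffles the indices of each class separately,
    then cuts every class at the same train/val/test proportions.
    """
    rng = random.Random(seed)

    # Group row indices by class label.
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    assignment = [None] * len(labels)
    for label in sorted(by_label):  # sorted for a deterministic class order
        indices = by_label[label]
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        for pos, idx in enumerate(indices):
            if pos < n_train:
                assignment[idx] = "train"
            elif pos < n_train + n_val:
                assignment[idx] = "val"
            else:
                assignment[idx] = "test"  # remainder goes to test
    return assignment


# Toy three-class labels (made up; not taken from the dataset).
labels = ["negative"] * 50 + ["neutral"] * 30 + ["positive"] * 20
split = stratified_split(labels)
```

When working with the published file, the provided `split` column should be used directly rather than regenerated, since that is what makes results comparable with the accompanying manuscript.
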
Columns:
row_id: integer index (0-based, matches original dataset row order)
posts: original post text
intensity: original human label (-2, -1, 0, 1)
human_label_4class: human label as string (very_negative, negative, neutral, positive)
human_label_3class: collapsed human label (negative, neutral, positive)
ai_speaker_role: LLM-extracted reporter role (PATIENT, CAREGIVER, UNCLEAR)
ai_cancer_type: LLM-extracted cancer type (BRAIN, COLON, LIVER, LEUKEMIA, LUNG, OTHER, UNKNOWN)
ai_cancer_type_confidence: LLM confidence for cancer type (0 to 1)
ai_label_4class: LLM-produced emotional tone label as string (very_negative, negative, neutral, positive)
ai_label_3class: collapsed LLM label (negative, neutral, positive)
split: data partition (train, val, test)
posts_aug: augmented text (ROLE_X CANCER_Y prepended to post text)
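
The relationship between the raw and derived columns can be sketched as follows. Several details are assumptions rather than facts stated in the description: that `intensity` values map to the four-class strings in the order listed, that the three-class collapse merges `very_negative` into `negative`, and that the ROLE and CANCER tokens are separated from the post by single spaces. The helper name `derive_fields` and the sample row are hypothetical.

```python
# Assumed order-preserving mapping from the numeric human label to its string form.
INTENSITY_TO_4CLASS = {
    -2: "very_negative",
    -1: "negative",
    0: "neutral",
    1: "positive",
}

# Assumed collapse: both negative grades become "negative".
FOUR_TO_THREE = {
    "very_negative": "negative",
    "negative": "negative",
    "neutral": "neutral",
    "positive": "positive",
}


def derive_fields(row):
    """Recompute the derived columns for one row (hypothetical helper).

    `row` is a dict keyed by column name, as csv.DictReader would yield,
    so `intensity` may arrive as a string and is cast to int.
    """
    human_4 = INTENSITY_TO_4CLASS[int(row["intensity"])]
    return {
        "human_label_4class": human_4,
        "human_label_3class": FOUR_TO_THREE[human_4],
        "ai_label_3class": FOUR_TO_THREE[row["ai_label_4class"]],
        # Assumed token layout: ROLE_<role> CANCER_<type> <post text>
        "posts_aug": "ROLE_{} CANCER_{} {}".format(
            row["ai_speaker_role"], row["ai_cancer_type"], row["posts"]
        ),
    }


# Made-up example row; the post text is invented, not taken from the corpus.
example = {
    "intensity": "-2",
    "posts": "The scan results came back today.",
    "ai_speaker_role": "CAREGIVER",
    "ai_cancer_type": "LUNG",
    "ai_label_4class": "negative",
}
derived = derive_fields(example)
```

Comparing these recomputed values against the shipped columns is a quick way to check whether the assumed mappings actually match the file.
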

Provider:

Mendeley Data

Created:

2026-03-23