lung_data.xlsx
NIAID Data Ecosystem, indexed 2026-05-10
Download link:
https://figshare.com/articles/dataset/coloncancer_top_1000_updated_04052024/30509012
Description:
This dataset contains a curated collection of publicly available Reddit posts and comment threads about lung cancer, collected from several cancer-focused subreddits. The discussions were written by patients, caregivers, medical professionals, and other community members in subreddits such as r/lungcancer, r/nsclc, r/cancer, r/cancercaregivers, r/cancerfamilysupport, and related communities. The posts date from 2019 to 2024.
Each entry captures both the original post and its associated top-level comments, along with structured metadata describing user role, cancer stage, treatment type, treatment intent, symptom discussions, emotional support themes, and other communication characteristics. Variables include subreddit, post date, engagement metrics, full post text, up to three comments, and an aggregated text field combining post and comments.
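The per-entry structure described above can be sketched as a small Python record type. This is a minimal illustration under stated assumptions, not the dataset's actual schema. The following are all hypothetical, because the description documents neither column names nor the aggregation rule:

- the class name `RedditEntry` and its field names,
- the use of `score` and `num_comments` as stand-ins for "engagement metrics",
- the way the aggregated text is built (post first, then non-empty comments, joined by a blank line).

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List

MAX_COMMENTS = 3  # the description says each entry keeps up to three comments


@dataclass
class RedditEntry:
    """Hypothetical in-memory form of one row of lung_data.xlsx."""
    subreddit: str
    post_date: date
    score: int          # assumed stand-in for "engagement metrics"
    num_comments: int   # assumed stand-in for "engagement metrics"
    post_text: str
    comments: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Enforce the documented limit of up to three comments per entry.
        if len(self.comments) > MAX_COMMENTS:
            raise ValueError(
                f"expected at most {MAX_COMMENTS} comments, got {len(self.comments)}"
            )

    def aggregated_text(self, sep: str = "\n\n") -> str:
        # Assumption: post text first, then each non-empty comment, joined by `sep`.
        # The ordering and separator actually used in the dataset are not documented.
        parts = [self.post_text] + [c for c in self.comments if c.strip()]
        return sep.join(parts)


# Placeholder values, not real dataset content.
entry = RedditEntry("r/lungcancer", date(2023, 5, 1), 12, 2, "post", ["c1", "c2"])
print(repr(entry.aggregated_text()))  # prints 'post\n\nc1\n\nc2'
```

Rejecting more than three comments in `__post_init__` keeps the record consistent with the documented limit, so malformed rows fail when they are loaded instead of later in the analysis.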
A set of manually coded variables provides detailed annotation of clinical and psychosocial attributes. These include:
- Author type (patient, caregiver, medical professional, other)
- Cancer stage (I–IV, recurrence, NED, or unknown)
- Treatment perceptions (curative vs. palliative intent)
- Treatment modality (chemotherapy, radiation, surgery, immunotherapy, targeted therapy, or unknown)
- Treatment completion status
- Question-asking behavior (e.g., side effects, recurrence, clinical trial eligibility, emotional support)
- Experience sharing
- Presence of external links
- Image inclusion
- Symptom validation seeking
- Post-treatment concerns
- Thematic categorization (clinical, non-clinical, or other)
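One way to work with the manually coded variables is to treat each one as a closed set of labels and reject values outside it. The sketch below encodes several of the categories listed above as Python enums and counts labels with `collections.Counter`.

The enum names, the helper `tally`, and the exact label strings are assumptions paraphrased from the description. The spreadsheet may store these codes differently, for example as integers or with different spelling. Adjust the values to match the actual file before using this.

```python
from collections import Counter
from enum import Enum
from typing import Iterable, Type


# Label sets paraphrased from the dataset description; the stored codes are assumed.
class AuthorType(Enum):
    PATIENT = "patient"
    CAREGIVER = "caregiver"
    MEDICAL_PROFESSIONAL = "medical professional"
    OTHER = "other"


class CancerStage(Enum):
    I = "I"
    II = "II"
    III = "III"
    IV = "IV"
    RECURRENCE = "recurrence"
    NED = "NED"  # no evidence of disease
    UNKNOWN = "unknown"


class TreatmentIntent(Enum):
    CURATIVE = "curative"
    PALLIATIVE = "palliative"


class TreatmentModality(Enum):
    CHEMOTHERAPY = "chemotherapy"
    RADIATION = "radiation"
    SURGERY = "surgery"
    IMMUNOTHERAPY = "immunotherapy"
    TARGETED_THERAPY = "targeted therapy"
    UNKNOWN = "unknown"


class Theme(Enum):
    CLINICAL = "clinical"
    NON_CLINICAL = "non-clinical"
    OTHER = "other"


def tally(raw_labels: Iterable[str], scheme: Type[Enum]) -> Counter:
    """Count coded labels; any value outside `scheme` raises ValueError."""
    counts: Counter = Counter()
    for raw in raw_labels:
        # Enum lookup by value strips stray whitespace first, then fails loudly on unknown codes.
        counts[scheme(raw.strip())] += 1
    return counts


print(tally(["patient", "caregiver", " patient "], AuthorType))
```

Using `Enum` value lookup means an unexpected code such as `"nurse"` raises `ValueError` immediately, rather than silently creating a new category in the counts.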
This dataset enables research on online health communication, patient information needs, treatment decision-making, caregiver burden, emotional coping, and digital support networks in the context of lung cancer. It is suitable for qualitative, quantitative, computational linguistic, and machine-learning analyses. All content is sourced from publicly accessible Reddit pages and has been fully anonymized to remove usernames and personally identifiable information.
Created:
2025-11-02



