theaadityapaul/Mental-Health_Text-Classification_Dataset
Hugging Face | Updated: 2026-04-26 | Indexed: 2026-05-03
Download link:
https://hf-mirror.com/datasets/theaadityapaul/Mental-Health_Text-Classification_Dataset
Description:
This dataset contains short, user-generated texts labeled for 4-class mental health classification: Suicidal, Depression, Anxiety, and Normal. It is a derived dataset created by combining and cleaning three public mental-health corpora (including Suicide and Depression Detection and Sentiment Analysis for Mental Health from Kaggle, and Reddit Mental Health Classification by Murarka et al.), then re-labeling them into a unified 4-class scheme and exporting CSV files suitable for both classical ML and modern NLP models. The dataset includes an unbalanced main training corpus (reflecting realistic class distribution), a strictly balanced test split (248 samples per class, 992 total), and a feature-engineered file (with basic text statistics such as text length, word count, number of URLs, emojis, etc.). It is intended for research and education only, not for clinical use, diagnosis, triage, or crisis intervention.
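As a rough illustration of the kind of basic text statistics the feature-engineered file is described as containing, the sketch below computes text length, word count, URL count, and emoji count for one string. The function name, the output keys, the URL pattern, and the emoji ranges are all assumptions made for this example; the dataset's actual column names and counting rules are not given in the description. The last lines only check the stated arithmetic of the balanced test split.

```python
import re

# Hypothetical patterns: the dataset's own definitions of "URL" and "emoji"
# are not documented in the description, so these are rough approximations.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def basic_text_features(text: str) -> dict:
    """Return simple per-text statistics (key names are illustrative)."""
    return {
        "text_length": len(text),            # number of characters
        "word_count": len(text.split()),     # whitespace-separated tokens
        "url_count": len(URL_RE.findall(text)),
        "emoji_count": len(EMOJI_RE.findall(text)),
    }


# The four classes named in the description.
LABELS = ("Suicidal", "Depression", "Anxiety", "Normal")
PER_CLASS_TEST_SAMPLES = 248

# Balanced test split: 248 samples per class across 4 classes gives 992.
TEST_SPLIT_SIZE = PER_CLASS_TEST_SAMPLES * len(LABELS)

if __name__ == "__main__":
    print(basic_text_features("I feel ok 🙂 see https://example.com"))
    print(TEST_SPLIT_SIZE)
```

Features like these are cheap to compute and may serve as a baseline input for classical ML models, while the raw text column would be the input for NLP models.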
Provider:
theaadityapaul



