mrjunos/depression-reddit-cleaned

Name: mrjunos/depression-reddit-cleaned
Creator: mrjunos
Published: 2023-06-17 02:03:22
License: No description provided

Hugging Face | Updated 2023-06-17 | Indexed 2024-03-04

Download link:

https://hf-mirror.com/datasets/mrjunos/depression-reddit-cleaned

Resource description:

---
license: cc-by-4.0
task_categories:
- text-classification
language:
- en
tags:
- reddit
- 'Sentiment '
- depression
pretty_name: Depression Reddit Cleaned
size_categories:
- 1K<n<10K
---

# Depression: Reddit Dataset (Cleaned)

**~7000 Cleaned Reddit Labelled Dataset on Depression**

### Summary

- This is the Depression: Reddit Dataset (Cleaned), containing approximately 7,000 labeled instances. It has two features: 'text' and 'label'. The 'text' feature holds the text of Reddit posts related to depression, and the 'label' feature indicates whether a post is classified as depression or not.
- The raw data was collected by scraping subreddits. To ensure the data's quality and usefulness, multiple natural language processing (NLP) techniques were applied to clean it. The dataset consists exclusively of English-language posts, and its primary purpose is to facilitate mental health classification tasks.
- The dataset can be used in natural language processing tasks related to depression, such as sentiment analysis, topic modeling, text classification, or any other NLP task that requires labeled Reddit data pertaining to depression.
- Extracted from Kaggle: https://www.kaggle.com/datasets/infamouscoder/depression-reddit-cleaned
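The card describes two features, 'text' and 'label', so a natural first step before any classification work is to check how the labels are distributed. The following is a minimal sketch of that check, not an official loader. The helper names `label_distribution` and `load_records` are hypothetical, the split name `"train"` is an assumption the card does not confirm, and the integer label values in the toy sample are made up for illustration (the card does not state how labels are encoded).

```python
from collections import Counter


def label_distribution(records):
    """Return a Counter mapping each label value to its number of records.

    `records` is any iterable of dict-like rows with a "label" key, which
    matches the 'text' / 'label' features described in the dataset card.
    """
    return Counter(record["label"] for record in records)


def load_records(repo_id="mrjunos/depression-reddit-cleaned", split="train"):
    """Load the dataset from the Hugging Face Hub.

    Assumptions: the third-party `datasets` package is installed, network
    access is available, and the repository exposes a split named "train".
    """
    from datasets import load_dataset  # imported lazily: optional dependency

    return load_dataset(repo_id, split=split)


# Toy rows for illustration only; these are NOT taken from the dataset,
# and the 0/1 label encoding is an assumption.
sample = [
    {"text": "placeholder post one", "label": 1},
    {"text": "placeholder post two", "label": 0},
    {"text": "placeholder post three", "label": 1},
]
print(label_distribution(sample))
```

With network access, the same check on the real data would be `label_distribution(load_records())`. A heavily skewed result would suggest using a stratified split or class weighting before training a classifier.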


Provider:

mrjunos

Summary of original information