jp-prakash05/dreaddit

Name: jp-prakash05/dreaddit
Creator: jp-prakash05
Published: 2026-04-15 18:31:21
License: No description available

Source: Hugging Face. Updated: 2026-04-15. Indexed: 2026-04-26.

Download link:

https://hf-mirror.com/datasets/jp-prakash05/dreaddit

Resource description:

---
dataset_info:
  features:
  - name: subreddit
    dtype: string
  - name: post_id
    dtype: string
  - name: sentence_range
    dtype: string
  - name: text
    dtype: string
  - name: id
    dtype: int64
  - name: label
    dtype: int64
  - name: confidence
    dtype: float64
  - name: social_timestamp
    dtype: int64
  - name: social_karma
    dtype: int64
  - name: syntax_ari
    dtype: float64
  - name: lex_liwc_WC
    dtype: int64
  - name: lex_liwc_Analytic
    dtype: float64
  - name: lex_liwc_Clout
    dtype: float64
  - name: lex_liwc_Authentic
    dtype: float64
  - name: lex_liwc_Tone
    dtype: float64
  - name: lex_liwc_WPS
    dtype: float64
  - name: lex_liwc_Sixltr
    dtype: float64
  - name: lex_liwc_Dic
    dtype: float64
  - name: lex_liwc_function
    dtype: float64
  - name: lex_liwc_pronoun
    dtype: float64
  - name: lex_liwc_ppron
    dtype: float64
  - name: lex_liwc_i
    dtype: float64
  - name: lex_liwc_we
    dtype: float64
  - name: lex_liwc_you
    dtype: float64
  - name: lex_liwc_shehe
    dtype: float64
  - name: lex_liwc_they
    dtype: float64
  - name: lex_liwc_ipron
    dtype: float64
  - name: lex_liwc_article
    dtype: float64
  - name: lex_liwc_prep
    dtype: float64
  - name: lex_liwc_auxverb
    dtype: float64
  - name: lex_liwc_adverb
    dtype: float64
  - name: lex_liwc_conj
    dtype: float64
  - name: lex_liwc_negate
    dtype: float64
  - name: lex_liwc_verb
    dtype: float64
  - name: lex_liwc_adj
    dtype: float64
  - name: lex_liwc_compare
    dtype: float64
  - name: lex_liwc_interrog
    dtype: float64
  - name: lex_liwc_number
    dtype: float64
  - name: lex_liwc_quant
    dtype: float64
  - name: lex_liwc_affect
    dtype: float64
  - name: lex_liwc_posemo
    dtype: float64
  - name: lex_liwc_negemo
    dtype: float64
  - name: lex_liwc_anx
    dtype: float64
  - name: lex_liwc_anger
    dtype: float64
  - name: lex_liwc_sad
    dtype: float64
  - name: lex_liwc_social
    dtype: float64
  - name: lex_liwc_family
    dtype: float64
  - name: lex_liwc_friend
    dtype: float64
  - name: lex_liwc_female
    dtype: float64
  - name: lex_liwc_male
    dtype: float64
  - name: lex_liwc_cogproc
    dtype: float64
  - name: lex_liwc_insight
    dtype: float64
  - name: lex_liwc_cause
    dtype: float64
  - name: lex_liwc_discrep
    dtype: float64
  - name: lex_liwc_tentat
    dtype: float64
  - name: lex_liwc_certain
    dtype: float64
  - name: lex_liwc_differ
    dtype: float64
  - name: lex_liwc_percept
    dtype: float64
  - name: lex_liwc_see
    dtype: float64
  - name: lex_liwc_hear
    dtype: float64
  - name: lex_liwc_feel
    dtype: float64
  - name: lex_liwc_bio
    dtype: float64
  - name: lex_liwc_body
    dtype: float64
  - name: lex_liwc_health
    dtype: float64
  - name: lex_liwc_sexual
    dtype: float64
  - name: lex_liwc_ingest
    dtype: float64
  - name: lex_liwc_drives
    dtype: float64
  - name: lex_liwc_affiliation
    dtype: float64
  - name: lex_liwc_achieve
    dtype: float64
  - name: lex_liwc_power
    dtype: float64
  - name: lex_liwc_reward
    dtype: float64
  - name: lex_liwc_risk
    dtype: float64
  - name: lex_liwc_focuspast
    dtype: float64
  - name: lex_liwc_focuspresent
    dtype: float64
  - name: lex_liwc_focusfuture
    dtype: float64
  - name: lex_liwc_relativ
    dtype: float64
  - name: lex_liwc_motion
    dtype: float64
  - name: lex_liwc_space
    dtype: float64
  - name: lex_liwc_time
    dtype: float64
  - name: lex_liwc_work
    dtype: float64
  - name: lex_liwc_leisure
    dtype: float64
  - name: lex_liwc_home
    dtype: float64
  - name: lex_liwc_money
    dtype: float64
  - name: lex_liwc_relig
    dtype: float64
  - name: lex_liwc_death
    dtype: float64
  - name: lex_liwc_informal
    dtype: float64
  - name: lex_liwc_swear
    dtype: float64
  - name: lex_liwc_netspeak
    dtype: float64
  - name: lex_liwc_assent
    dtype: float64
  - name: lex_liwc_nonflu
    dtype: float64
  - name: lex_liwc_filler
    dtype: float64
  - name: lex_liwc_AllPunc
    dtype: float64
  - name: lex_liwc_Period
    dtype: float64
  - name: lex_liwc_Comma
    dtype: float64
  - name: lex_liwc_Colon
    dtype: float64
  - name: lex_liwc_SemiC
    dtype: float64
  - name: lex_liwc_QMark
    dtype: float64
  - name: lex_liwc_Exclam
    dtype: float64
  - name: lex_liwc_Dash
    dtype: float64
  - name: lex_liwc_Quote
    dtype: float64
  - name: lex_liwc_Apostro
    dtype: float64
  - name: lex_liwc_Parenth
    dtype: float64
  - name: lex_liwc_OtherP
    dtype: float64
  - name: lex_dal_max_pleasantness
    dtype: float64
  - name: lex_dal_max_activation
    dtype: float64
  - name: lex_dal_max_imagery
    dtype: float64
  - name: lex_dal_min_pleasantness
    dtype: float64
  - name: lex_dal_min_activation
    dtype: float64
  - name: lex_dal_min_imagery
    dtype: float64
  - name: lex_dal_avg_activation
    dtype: float64
  - name: lex_dal_avg_imagery
    dtype: float64
  - name: lex_dal_avg_pleasantness
    dtype: float64
  - name: social_upvote_ratio
    dtype: float64
  - name: social_num_comments
    dtype: int64
  - name: syntax_fk_grade
    dtype: float64
  - name: sentiment
    dtype: float64
  splits:
  - name: train
    num_bytes: 3929762
    num_examples: 2838
  - name: test
    num_bytes: 988933
    num_examples: 715
  download_size: 2297873
  dataset_size: 4918695
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: test
    path: data/test-*
tags:
- stress
- social-media
- reddit
pretty_name: 'Dreaddit: A Reddit Dataset for Stress Analysis in Social Media'
size_categories:
- 1K<n<10K
language:
- en
---

# Dreaddit: A Reddit Dataset for Stress Analysis in Social Media

Consists of 3.5k labeled texts from five different categories of Reddit communities.

## Citation

```
@inproceedings{turcan-mckeown-2019-dreaddit,
    title = "{D}readdit: A {R}eddit Dataset for Stress Analysis in Social Media",
    author = "Turcan, Elsbeth and McKeown, Kathy",
    editor = "Holderness, Eben and Jimeno Yepes, Antonio and Lavelli, Alberto and Minard, Anne-Lyse and Pustejovsky, James and Rinaldi, Fabio",
    booktitle = "Proceedings of the Tenth International Workshop on Health Text Mining and Information Analysis (LOUHI 2019)",
    month = nov,
    year = "2019",
    address = "Hong Kong",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/D19-6213/",
    doi = "10.18653/v1/D19-6213",
    pages = "97--107",
    abstract = "Stress is a nigh-universal human experience, particularly in the online world. While stress can be a motivator, too much stress is associated with many negative health outcomes, making its identification useful across a range of domains. However, existing computational research typically only studies stress in domains such as speech, or in short genres such as Twitter. We present Dreaddit, a new text corpus of lengthy multi-domain social media data for the identification of stress. Our dataset consists of 190K posts from five different categories of Reddit communities; we additionally label 3.5K total segments taken from 3K posts using Amazon Mechanical Turk. We present preliminary supervised learning methods for identifying stress, both neural and traditional, and analyze the complexity and diversity of the data and characteristics of each category."
}
```
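The schema above mixes a handful of core fields with several families of precomputed features. A minimal sketch of how the columns might be grouped by name prefix after loading is shown below. It assumes the Hugging Face `datasets` library is installed and that the repository id `jp-prakash05/dreaddit` resolves as listed on this page. The helper names `group_columns_by_prefix` and `load_dreaddit` are hypothetical and are not part of the dataset or any library.

```python
from collections import defaultdict

# Feature-family prefixes taken from the column names in the dataset card.
FEATURE_PREFIXES = ("lex_liwc_", "lex_dal_", "syntax_", "social_")


def group_columns_by_prefix(columns):
    """Group column names by feature family; unprefixed names go to 'core'."""
    groups = defaultdict(list)
    for name in columns:
        family = next((p for p in FEATURE_PREFIXES if name.startswith(p)), "core")
        groups[family].append(name)
    return dict(groups)


def load_dreaddit(repo_id="jp-prakash05/dreaddit"):
    """Hypothetical wrapper: download the dataset from the Hugging Face Hub.

    The import is done lazily so the grouping helper above can be used
    without the `datasets` library installed.
    """
    from datasets import load_dataset

    return load_dataset(repo_id)


# Example usage (requires network access, so it is left commented out):
# ds = load_dreaddit()
# groups = group_columns_by_prefix(ds["train"].column_names)
# for family, names in groups.items():
#     print(family, len(names))
```

Grouping by prefix makes it straightforward to train on one feature family at a time, for example only the `lex_liwc_` columns, without hard-coding the full column list.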

Dataset information

Feature fields:

- Core fields: `subreddit` (subreddit, string), `post_id` (post ID, string), `sentence_range` (sentence range, string), `text` (text content, string), `id` (sample ID, int64), `label` (label, int64), `confidence` (confidence, float64), `sentiment` (sentiment score, float64).
- Social fields: `social_timestamp` (timestamp, int64), `social_karma` (karma, int64), `social_upvote_ratio` (upvote ratio, float64), `social_num_comments` (number of comments, int64).
- Syntax fields (float64): `syntax_ari` (Automated Readability Index), `syntax_fk_grade` (Flesch-Kincaid grade level).
- LIWC (Linguistic Inquiry and Word Count) summary fields: `lex_liwc_WC` (word count, int64), and as float64: `lex_liwc_Analytic` (analytic score), `lex_liwc_Clout` (clout score), `lex_liwc_Authentic` (authenticity score), `lex_liwc_Tone` (tone score), `lex_liwc_WPS` (words per sentence), `lex_liwc_Sixltr` (rate of words longer than six letters), `lex_liwc_Dic` (rate of dictionary words).
- LIWC function-word rates (float64): `lex_liwc_function` (function words), `lex_liwc_pronoun` (pronouns), `lex_liwc_ppron` (personal pronouns), `lex_liwc_i` (first person singular), `lex_liwc_we` (first person plural), `lex_liwc_you` (second person), `lex_liwc_shehe` (third person singular), `lex_liwc_they` (third person plural), `lex_liwc_ipron` (impersonal pronouns), `lex_liwc_article` (articles), `lex_liwc_prep` (prepositions), `lex_liwc_auxverb` (auxiliary verbs), `lex_liwc_adverb` (adverbs), `lex_liwc_conj` (conjunctions), `lex_liwc_negate` (negations).
- LIWC other grammar rates (float64): `lex_liwc_verb` (verbs), `lex_liwc_adj` (adjectives), `lex_liwc_compare` (comparisons), `lex_liwc_interrog` (interrogatives), `lex_liwc_number` (numbers), `lex_liwc_quant` (quantifiers).
- LIWC affect rates (float64): `lex_liwc_affect` (affect words), `lex_liwc_posemo` (positive emotion), `lex_liwc_negemo` (negative emotion), `lex_liwc_anx` (anxiety), `lex_liwc_anger` (anger), `lex_liwc_sad` (sadness).
- LIWC social rates (float64): `lex_liwc_social` (social words), `lex_liwc_family` (family), `lex_liwc_friend` (friends), `lex_liwc_female` (female references), `lex_liwc_male` (male references).
- LIWC cognitive-process rates (float64): `lex_liwc_cogproc` (cognitive processes), `lex_liwc_insight` (insight), `lex_liwc_cause` (causation), `lex_liwc_discrep` (discrepancy), `lex_liwc_tentat` (tentative), `lex_liwc_certain` (certainty), `lex_liwc_differ` (differentiation).
- LIWC perceptual rates (float64): `lex_liwc_percept` (perceptual processes), `lex_liwc_see` (seeing), `lex_liwc_hear` (hearing), `lex_liwc_feel` (feeling).
- LIWC biological rates (float64): `lex_liwc_bio` (biological processes), `lex_liwc_body` (body), `lex_liwc_health` (health), `lex_liwc_sexual` (sexual), `lex_liwc_ingest` (ingestion).
- LIWC drive rates (float64): `lex_liwc_drives` (drives), `lex_liwc_affiliation` (affiliation), `lex_liwc_achieve` (achievement), `lex_liwc_power` (power), `lex_liwc_reward` (reward), `lex_liwc_risk` (risk).
- LIWC time-orientation rates (float64): `lex_liwc_focuspast` (past focus), `lex_liwc_focuspresent` (present focus), `lex_liwc_focusfuture` (future focus).
- LIWC relativity rates (float64): `lex_liwc_relativ` (relativity), `lex_liwc_motion` (motion), `lex_liwc_space` (space), `lex_liwc_time` (time).
- LIWC personal-concern rates (float64): `lex_liwc_work` (work), `lex_liwc_leisure` (leisure), `lex_liwc_home` (home), `lex_liwc_money` (money), `lex_liwc_relig` (religion), `lex_liwc_death` (death).
- LIWC informal-language rates (float64): `lex_liwc_informal` (informal language), `lex_liwc_swear` (swear words), `lex_liwc_netspeak` (netspeak), `lex_liwc_assent` (assent), `lex_liwc_nonflu` (nonfluencies), `lex_liwc_filler` (fillers).
- LIWC punctuation rates (float64): `lex_liwc_AllPunc` (all punctuation), `lex_liwc_Period` (periods), `lex_liwc_Comma` (commas), `lex_liwc_Colon` (colons), `lex_liwc_SemiC` (semicolons), `lex_liwc_QMark` (question marks), `lex_liwc_Exclam` (exclamation marks), `lex_liwc_Dash` (dashes), `lex_liwc_Quote` (quotation marks), `lex_liwc_Apostro` (apostrophes), `lex_liwc_Parenth` (parentheses), `lex_liwc_OtherP` (other punctuation).
- DAL (Dictionary of Affect in Language) scores (float64): `lex_dal_max_pleasantness`, `lex_dal_max_activation`, `lex_dal_max_imagery` (maximum pleasantness, activation, and imagery), `lex_dal_min_pleasantness`, `lex_dal_min_activation`, `lex_dal_min_imagery` (minimum pleasantness, activation, and imagery), `lex_dal_avg_activation`, `lex_dal_avg_imagery`, `lex_dal_avg_pleasantness` (average activation, imagery, and pleasantness).

Dataset splits:

- Training set (`train`): 3929762 bytes, 2838 examples
- Test set (`test`): 988933 bytes, 715 examples

Download size: 2297873 bytes. Total dataset size: 4918695 bytes.

Configuration: `default`, with data files `data/train-*` (training set) and `data/test-*` (test set).

Tags: stress, social-media, reddit. Pretty name: Dreaddit: A Reddit Dataset for Stress Analysis in Social Media. Size category: 1K<n<10K. Language: English.
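The split figures can be cross-checked with a little arithmetic. The sketch below uses only the numbers reported in the dataset metadata; the function name `split_summary` is a hypothetical helper, not something shipped with the dataset.

```python
# Split metadata copied from the dataset card.
SPLITS = {
    "train": {"num_bytes": 3929762, "num_examples": 2838},
    "test": {"num_bytes": 988933, "num_examples": 715},
}
REPORTED_DATASET_SIZE = 4918695  # bytes, as stated in the card


def split_summary(splits):
    """Return total examples, total bytes, and each split's share of examples."""
    total_examples = sum(s["num_examples"] for s in splits.values())
    total_bytes = sum(s["num_bytes"] for s in splits.values())
    fractions = {
        name: s["num_examples"] / total_examples for name, s in splits.items()
    }
    return total_examples, total_bytes, fractions


total_examples, total_bytes, fractions = split_summary(SPLITS)
print(f"examples: {total_examples}")
print(f"bytes match reported size: {total_bytes == REPORTED_DATASET_SIZE}")
for name, share in fractions.items():
    print(f"{name}: {share:.1%}")
```

The two splits sum to 3553 examples, which matches the "3.5k labeled texts" description, and the train split holds roughly four fifths of them.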

Provider:

jp-prakash05
