BurnoutText - Frequent Words in Texts about Burnout, Depression and a Control Group

Name: BurnoutText - Frequent Words in Texts about Burnout, Depression and a Control Group
Creator: OLOS, OLOS
Published: 2026-05-05 03:30:06
License: No description available

Source: DataCite Commons. Updated 2026-05-05; indexed 2024-07-13.

Download link:

https://olos.swiss/portal//archives/4fe0f340-757b-450f-8373-7f2a57a3b7ad

Resource description:

This dataset was generated in the context of a research project funded by the Swiss National Science Foundation (grant no. 196483, see https://data.snf.ch/grants/grant/196483). The project applies natural language processing to develop new methods for burnout detection in clinical psychology and psychiatry. For details, refer to: https://www.bfh.ch/en/research/research-projects/2021-288-996-826/

The source data for this derived dataset was collected from Reddit and consists of a "Burnout" dataset with 352 samples, a "No burnout" dataset with 13,216 samples and a "Depression" dataset with 979 samples. More details about the original dataset can be found in the following publication: https://doi.org/10.3389/fdata.2022.863100

All contractions were expanded (e.g., "I'm" to "I am") using the contractions Python library. The spaCy en_core_web_sm pre-trained English language pipeline was used to tokenize each text sample, remove stopwords and punctuation, and lemmatize the remaining tokens. For example, the text "I feel like I have been working too much. Everything is exhausting." would be converted to "feel like work exhausting".

The dataset presented here was then compiled by taking the 20 most frequent lemmatized tokens in each of the classes (Burnout, No burnout and Depression). The words are ordered from most frequent to least frequent.
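The preprocessing and counting steps described above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the function names `preprocess` and `top_tokens`, the extra whitespace-token filter, and the toy samples are all hypothetical. `preprocess` expects a loaded spaCy pipeline (for example the result of `spacy.load("en_core_web_sm")`) and imports the third-party `contractions` package lazily, so the counting step runs with the standard library alone.

```python
from collections import Counter


def preprocess(text, nlp):
    """Expand contractions, then return lemmas of the remaining tokens.

    `nlp` is assumed to be a loaded spaCy pipeline such as
    spacy.load("en_core_web_sm"). Dropping whitespace tokens is an
    assumption added here; the description only mentions stopwords
    and punctuation.
    """
    import contractions  # third-party; imported lazily on purpose

    doc = nlp(contractions.fix(text))
    return [
        tok.lemma_
        for tok in doc
        if not (tok.is_stop or tok.is_punct or tok.is_space)
    ]


def top_tokens(token_lists, n=20):
    """Return the n most frequent tokens across all samples of one class,
    ordered from most to least frequent."""
    counts = Counter()
    for tokens in token_lists:
        counts.update(tokens)
    return [tok for tok, _ in counts.most_common(n)]


# Hypothetical, already-preprocessed samples (illustration only, not real data).
samples = {
    "Burnout": [["feel", "like", "work", "exhausting"], ["work", "tired", "work"]],
    "No burnout": [["like", "movie"], ["movie", "night"]],
}

# One ranked token list per class, mirroring the layout of the dataset.
top_by_class = {label: top_tokens(docs) for label, docs in samples.items()}
```

With real data, each class would first be mapped through `preprocess` to obtain the token lists, and `top_tokens` would then be applied once per class with the default `n=20`.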

Provider:

OLOS, OLOS

Created:

2022-07-04
