
BurnoutText - Frequent Words in Texts about Burnout, Depression and a Control Group

Source: DataCite Commons. Updated: 2026-05-05. Indexed: 2024-07-13.
Download link:
https://olos.swiss/portal//archives/4fe0f340-757b-450f-8373-7f2a57a3b7ad
Description:
This dataset was generated as part of a research project funded by the Swiss National Science Foundation (grant no. 196483, see https://data.snf.ch/grants/grant/196483). The project applies new natural language processing methods to burnout detection in clinical psychology and psychiatry (project details: https://www.bfh.ch/en/research/research-projects/2021-288-996-826/). The source data for this derived dataset was collected from Reddit and consists of a "Burnout" dataset with 352 samples, a "No burnout" dataset with 13,216 samples, and a "Depression" dataset with 979 samples. The original dataset is described in the following publication: https://doi.org/10.3389/fdata.2022.863100 All contractions were expanded (e.g. "I'm" to "I am") using the contractions Python library. We used the spaCy en_core_web_sm pre-trained English language pipeline to tokenize each text sample, remove stopwords and punctuation, and lemmatize the remaining tokens. For example, the text "I feel like I have been working too much. Everything is exhausting." would be converted to "feel like work exhausting". The dataset presented here was then compiled by taking the 20 most frequent lemmatized tokens in each class (Burnout, No burnout, and Depression). Within each class, the words are ordered from most to least frequent.
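The preprocessing and counting steps described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names `preprocess` and `top_tokens` are hypothetical, the extra whitespace filter is an added assumption, and it assumes "most frequent" means raw token occurrences summed over all samples of a class (the description does not say whether occurrences or samples were counted). The exact lemmas produced depend on the installed spaCy model version.

```python
from __future__ import annotations

from collections import Counter


def preprocess(text: str, nlp) -> list[str]:
    """Expand contractions, then tokenize, filter and lemmatize with spaCy.

    `nlp` is a loaded spaCy pipeline, e.g. spacy.load("en_core_web_sm").
    """
    # Imported lazily so that top_tokens() works without this package installed.
    import contractions

    doc = nlp(contractions.fix(text))  # e.g. "I'm" becomes "I am"
    return [
        tok.lemma_
        for tok in doc
        # Drop stopwords and punctuation; skipping whitespace tokens is an assumption.
        if not (tok.is_stop or tok.is_punct or tok.is_space)
    ]


def top_tokens(
    samples_by_class: dict[str, list[list[str]]], k: int = 20
) -> dict[str, list[tuple[str, int]]]:
    """Return the k most frequent tokens per class, most frequent first."""
    result = {}
    for label, samples in samples_by_class.items():
        # Assumption: count every occurrence of a token across all samples.
        counts = Counter(tok for sample in samples for tok in sample)
        result[label] = counts.most_common(k)
    return result


# Toy, made-up token lists (not taken from the dataset) to show the counting step.
toy = {
    "Burnout": [["feel", "like", "work", "exhausting"], ["work", "tired", "work"]],
    "Depression": [["feel", "sad"], ["feel", "alone"]],
}
toy_top = top_tokens(toy, k=2)
```

To run the full pipeline on raw text, the spaCy model has to be installed first (`python -m spacy download en_core_web_sm`), after which `preprocess(text, spacy.load("en_core_web_sm"))` yields the token list for one sample.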
Provider:
OLOS
Created:
2022-07-04