Depression Indicators in Twitter
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://data.mendeley.com/datasets/s25h5tzgyf
Description:
The dataset was created to identify relevant features for detecting individuals with depression based on their Twitter posts. It consists of 3,758 tweets and 5,902 unique words, structured in a binary matrix format where each row represents a tweet and each column represents a word. The values indicate the presence (1) or absence (0) of a word in a given tweet.
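The binary matrix layout described above can be illustrated with a minimal sketch. The tokenization here (lowercasing and whitespace splitting), the function name `build_presence_matrix`, and the toy tweets are assumptions made for illustration; the dataset description does not state how the original vocabulary was extracted.

```python
def build_presence_matrix(tweets):
    """Return (vocabulary, matrix) where matrix[i][j] is 1 if word j occurs in tweet i."""
    # Assumed tokenization: lowercase + whitespace split (not confirmed by the source).
    tokenized = [set(tweet.lower().split()) for tweet in tweets]

    # One column per unique word, in sorted order for reproducibility.
    vocabulary = sorted(set().union(*tokenized)) if tokenized else []
    column = {word: j for j, word in enumerate(vocabulary)}

    matrix = []
    for words in tokenized:
        row = [0] * len(vocabulary)
        for word in words:
            row[column[word]] = 1  # presence only; repeated words still yield 1
        matrix.append(row)
    return vocabulary, matrix


# Hypothetical toy input, not taken from the dataset.
toy_tweets = ["feeling tired again", "tired of feeling tired"]
vocabulary, matrix = build_presence_matrix(toy_tweets)
```

In this sketch the second tweet repeats "tired", yet its cell is still 1, which matches the presence/absence encoding rather than a word count.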
In addition to textual data, the dataset incorporates nontextual features, stored in a separate table. Each row represents a tweet, and each column corresponds to a specific attribute: the number of likes, retweets, mentions, and the time of publication, as well as the device used for posting.
The posting time was transformed into a numerical format ranging from 0 to 47, where each value represents a 30-minute interval throughout the day. In contrast, the device type is stored as raw text containing the name of the device used to post each tweet. The numerical values (likes, retweets, and mentions) were also kept as raw counts, preserving their original scale for further analysis.
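As a rough illustration of the 0 to 47 time encoding, the sketch below maps a clock time to a 30-minute slot index. It assumes that slot 0 covers 00:00 to 00:29 and slot 47 covers 23:30 to 23:59; the description gives the range but not the exact alignment, and the function names `half_hour_slot` and `slot_start` are hypothetical.

```python
from datetime import time


def half_hour_slot(t: time) -> int:
    """Map a time of day to a slot index in 0..47 (assumed: slot 0 starts at 00:00)."""
    return t.hour * 2 + t.minute // 30


def slot_start(slot: int) -> time:
    """Return the assumed starting time of a given 30-minute slot."""
    if not 0 <= slot <= 47:
        raise ValueError("slot must be between 0 and 47")
    return time(hour=slot // 2, minute=(slot % 2) * 30)


# Hypothetical posting time, not taken from the dataset.
example_slot = half_hour_slot(time(13, 45))
```

Under this assumption, a tweet posted at 13:45 falls in the slot that starts at 13:30.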
This dataset was used in the study "Characteristics for depression detection using Twitter data" (DOI: 10.59681/2175-4411.v16.iEspecial.2024.1319).
Created:
2025-03-05



