ADAM-SDMH: A DAtaset from Manipal for Severity Detection in Tweets related to Mental Health
Source: DataCite Commons | Updated: 2025-06-01 | Indexed: 2024-07-29
Download link:
https://figshare.com/articles/dataset/ADAM-SDMH_A_DAtaset_from_Manipal_for_Severity_Detection_in_Tweets_related_to_Mental_Health/19029656/2
Resource description:
Readme file for ADAM-SDMH: A DAtaset from Manipal for Severity Detection in Tweets related to Mental Health
Generated on 2021-02-15

Recommended citation for the dataset:
P. Surana, M. Yusuf and S. Singh, "Severity Classification of Mental Health-Related Tweets," 2021 IEEE International Conference on Distributed Computing, VLSI, Electrical Circuits and Robotics (DISCOVER), 2021, pp. 336-341, DOI: 10.1109/DISCOVER52564.2021.9663651.

******************************
PROJECT INFORMATION
******************************
1. Title of dataset: Mental Health Dataset
2. Author information:
Praatibh Surana, Manipal Institute of Technology
Mirza Yusuf, Manipal Institute of Technology
Sanjay Singh, Manipal Institute of Technology

Principal Investigators
Name: Praatibh Surana
Address: Manipal Institute of Technology
Email: praatibhsurana@gmail.com

Name: Mirza Yusuf
Address: Manipal Institute of Technology
Email: baig.yusuf.cr7@gmail.com

Co-Investigator
Name: Sanjay Singh
Address: Manipal Institute of Technology
Email: sanjay.singh@manipal.edu

3. Date of data collection: Jan 2021 - Feb 2021

************************************
DATA ACCESS INFORMATION
************************************

1. Licences/restrictions placed on access to the dataset: CC BY 4.0

2. Links to publications that use the data:
URL: https://ieeexplore.ieee.org/document/9663651
DOI: 10.1109/DISCOVER52564.2021.9663651

3. Links to third-party or secondary data used in the project (for example, existing datasets, third-party datasets):
Pennington, Jeffrey et al. “GloVe: Global Vectors for Word Representation.” EMNLP (2014).
DOI: https://doi.org/10.3115/v1/d14-1162

*****************************************
METHODS OF DATA COLLECTION
*****************************************

1. Describe the methods for data collection and/or provide links to papers describing data collection methods
Paper DOI:
Our research revolved around correctly classifying tweets by their severity in a mental health context. An effort was also made to make the models detect sarcasm better, as many earlier models failed to do so.
Our dataset consists of tweets labeled ‘0’, ‘1’, or ‘2’ depending on their class. The labeling rules are given in Table 1.


TABLE 1 - SEVERITY CLASSIFICATION CLASSES AND EXAMPLES
--------------------------------------------------------------------------------------------------------------------------------------------------------------
Class | Rules                                            | Example
      |                                                  |
0     | Helping / suggestion for mental health awareness | Are you suffering from anxiety? Check out this page for therapy through Skype!
      | / positivity / informative                       |
      | / motivational                                   |
      | / questions about mental health                  |
      |                                                  |
1     | Sarcasm / rant / expression of annoyance         | Today was so annoying. If my teacher would have called my name, I swear to God I would have killed myself
      |                                                  |
2     | Case of slight disturbance                       | All I am is a burden. I don’t want to live anymore.
      | / strong indication of disturbance               |
      | / user outright mentions depression              |
      | / anxiety / suicide / self-harm                  |
--------------------------------------------------------------------------------------------------------------------------------------------------------------


The following steps were performed for data collection:
1) Tweets were extracted from users with Twitter’s official API, using hashtags such as #depression, #mentalhealth, #anxiety, #selfharm, #killmyself, and #kms.
2) Around 40,000 tweets were extracted from Twitter between January and February 2021. The final dataset comprises 2460 of these tweets, distributed equally among the three classes (820 tweets per class).
3) Two authors manually annotated the dataset and cross-verified it to ensure accurate labeling.

2. Data processing methods:
A. Preprocessing
1) Removal of numbers, URLs, usernames, and special characters: The first step after extracting the tweets was to ensure that they were suitable for further use. The Python library “preprocessor” was used to eliminate numbers, retweets, URLs, emojis, emoticons, and usernames, after which duplicate tweets were removed from the dataset.
2) Stopword removal and expansion of standard abbreviations: We used Python’s “nltk” library to remove common stopwords such as “for,” “the,” and “a.” As our data is sourced from Twitter, it contained many common internet abbreviations such as “lol,” “kms,” and “gn.” These were converted to their corresponding full forms. Informal short forms such as “wanna” for “want to” and “gonna” for “going to” were also common, and we expanded these to the best of our abilities.
3) Removal of names, so that anonymity is maintained.
Names of people, places, and Twitter handles, along with anything else that could compromise anonymity, have been removed and replaced with the token ‘[redacted]’.

*******************************
SUMMARY OF DATA FILE
*******************************

Filename: MentalHealthTweets.csv
Short description: This CSV file contains 2460 tweets annotated ‘0’, ‘1’, or ‘2’ according to the class they belong to.

*******************************************************************
DATA-SPECIFIC INFORMATION FOR: MentalHealthTweets.csv
*******************************************************************
1. Number of variables: 2
2. Number of cases/rows: 2461 (presumably the 2460 tweets plus one header row)
3. Missing data codes: NA
4. Variable list
The variables and their properties are given in Table 2.


TABLE 2 - VARIABLE DESCRIPTION TABLE
----------------------------------------------------------------------
Variable Name | Variable Description | Variable Type
              |                      |
tweets        | Cleaned up tweet     | String
              |                      |
label         | Annotation for tweet | Integer
----------------------------------------------------------------------
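
The preprocessing steps described under "Data processing methods" can be approximated as follows. This is a minimal sketch under stated assumptions, not the authors' pipeline: plain regular expressions stand in for the “preprocessor” library, and a three-word stopword set (the examples quoted in the README) stands in for nltk's list. The abbreviation table, the lowercasing step, and the names `clean_tweet` and `preprocess` are hypothetical. Retweet filtering and name redaction are not covered.

```python
import re

# Hypothetical lookup table for illustration only. The README names these
# abbreviations, but the expansions for "lol", "kms" and "gn" are assumed.
ABBREVIATIONS = {
    "lol": "laughing out loud",
    "kms": "kill myself",
    "gn": "good night",
    "wanna": "want to",
    "gonna": "going to",
}

# Only the stopwords quoted in the README; nltk's real list is much longer.
STOPWORDS = {"for", "the", "a"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NUMBER_RE = re.compile(r"\d+")
SPECIAL_RE = re.compile(r"[^a-z'\s]")  # applied after lowercasing


def clean_tweet(text):
    """Strip URLs, usernames, numbers and special characters, expand
    abbreviations, then drop stopwords."""
    text = text.lower()  # assumption: the README does not mention casing
    for pattern in (URL_RE, MENTION_RE, NUMBER_RE, SPECIAL_RE):
        text = pattern.sub(" ", text)
    tokens = []
    for token in text.split():
        # An abbreviation may expand to several words.
        tokens.extend(ABBREVIATIONS.get(token, token).split())
    return " ".join(t for t in tokens if t not in STOPWORDS)


def preprocess(tweets):
    """Clean every tweet and drop empty results and duplicates,
    keeping first occurrences in their original order."""
    seen = set()
    cleaned = []
    for tweet in tweets:
        result = clean_tweet(tweet)
        if result and result not in seen:
            seen.add(result)
            cleaned.append(result)
    return cleaned
```

For instance, `preprocess(["gn!", "GN", ""])` returns `["good night"]`: both non-empty inputs clean to the same string and the empty one is dropped. Abbreviations are expanded before stopwords are removed; the README groups both under one step without stating an order, so this ordering is a design choice of the sketch.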
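
The file summary and Table 2 can be turned into a quick consistency check on a downloaded copy. The sketch below assumes that the CSV has a header row whose column names match Table 2 exactly and in that order, which the README does not confirm. The function name `label_counts` and the in-memory sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter

EXPECTED_COLUMNS = ["tweets", "label"]  # variable names from Table 2
EXPECTED_LABELS = {0, 1, 2}             # classes from Table 1


def label_counts(csv_file):
    """Count rows per label, raising ValueError if the header or a
    label value does not match what the README describes."""
    reader = csv.DictReader(csv_file)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    counts = Counter()
    for row in reader:
        label = int(row["label"])
        if label not in EXPECTED_LABELS:
            raise ValueError(f"unexpected label: {label}")
        counts[label] += 1
    return counts


# Tiny in-memory stand-in with placeholder text, not real dataset rows.
sample = io.StringIO(
    "tweets,label\n"
    "placeholder text a,0\n"
    "placeholder text b,1\n"
    "placeholder text c,2\n"
)
demo_counts = label_counts(sample)
```

To run it on the real file, open it with `open("MentalHealthTweets.csv", newline="", encoding="utf-8")` and pass the handle to `label_counts`; the UTF-8 encoding is also an assumption. If the README's figures hold, each of the three labels should be counted 820 times.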
Provider:
figshare
Created:
2022-01-25



