Datasets for Explainable Depression Detection on Twitter Aided by Metaphor Concept Mappings

Indexed from NIAID Data Ecosystem on 2026-03-14
Download link:
https://zenodo.org/record/7095099
Resource description:
Introduction

These datasets were used to train and evaluate our model for explainable depression detection on Twitter aided by metaphor concept mappings, proposed in the following paper:

Sooji Han, Rui Mao, and Erik Cambria. "Hierarchical Attention Network for Explainable Depression Detection on Twitter Aided by Metaphor Concept Mappings." In Proceedings of the 29th International Conference on Computational Linguistics (COLING), 2022. In press.

Source code for our model is available at github.com/soojihan/HAN. These datasets were generated from the dataset proposed in Shen et al., 2017. The original dataset is available at github.com/sunlightsgy/MDDL.

Description

There are three datasets.

1. mdl_HAN: This dataset contains tweets and metaphor concept mappings (MCMs) for 5,899 positive (i.e., depressed) and 4,469 negative users. Tweets in this dataset were extracted from the original MDDL dataset (Shen et al., 2017). MCMs were extracted using MetaPro (Mao et al., 2022). Please refer to our paper for more details. Each subfolder under the 'positive' and 'negative' folders is named after a Twitter user id. Each user's folder contains one or two json files:
[userid].json: This json file contains tweet text objects, each of which is represented by [timestamp, tweet text].
[userid]_cm.json: This json file contains MCMs, each of which is represented by [timestamp, MCM].
Note that some users do not have MCMs. The folders of such users contain no [userid]_cm.json file.

2. imdl_HAN: This dataset has the same contents and structure as mdl_HAN, except that explicit linguistic cues for depression (i.e., “I’m/I was/I am/I’ve been diagnosed depression” and words containing “depress”, “diagnos”, “anxiety”, “bipolar” and “disorder”) were removed from all tweets.

3. sampled_training_eval_data: This dataset contains 5 randomly sampled cross-validation sets. Each of the five sets contains train.csv, test.csv and dev.csv. Each csv file contains user ids.
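The folder layout described above can be read with a short script. The following is a minimal sketch, not the authors' loading code. It assumes that each json file parses as a single JSON array of [timestamp, value] pairs and that user ids sit in the first column of each csv file; neither detail is stated in the description, and if the csv files have a header row it will be returned along with the ids. The function names load_user, iter_users and read_user_ids are hypothetical.

```python
import csv
import json
from pathlib import Path


def load_user(user_dir):
    """Load the tweets and, if present, the MCMs stored in one user's folder."""
    user_dir = Path(user_dir)
    user_id = user_dir.name  # the folder name is the Twitter user id

    # Assumption: the file is one JSON array of [timestamp, tweet text] pairs.
    with open(user_dir / f"{user_id}.json", encoding="utf-8") as f:
        tweets = json.load(f)

    # Some users have no MCMs, in which case [userid]_cm.json does not exist.
    mcms = []
    cm_path = user_dir / f"{user_id}_cm.json"
    if cm_path.exists():
        with open(cm_path, encoding="utf-8") as f:
            mcms = json.load(f)

    return {"user_id": user_id, "tweets": tweets, "mcms": mcms}


def iter_users(dataset_root):
    """Yield one record per user under 'positive' (label 1) and 'negative' (label 0)."""
    root = Path(dataset_root)
    for folder_name, label in (("positive", 1), ("negative", 0)):
        label_dir = root / folder_name
        if not label_dir.is_dir():
            continue
        for user_dir in sorted(p for p in label_dir.iterdir() if p.is_dir()):
            record = load_user(user_dir)
            record["label"] = label
            yield record


def read_user_ids(csv_path):
    """Return the non-empty values of the first column of a split file.

    Assumption: user ids are in the first column. A header row, if any,
    is not detected and would be included in the result.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        return [row[0].strip() for row in csv.reader(f) if row and row[0].strip()]
```

To build one cross-validation fold under these assumptions, read the ids from that fold's train.csv, dev.csv and test.csv with read_user_ids, then keep the records from iter_users whose user_id appears in each list. The same code applies to mdl_HAN and imdl_HAN, since the two share one structure.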
Remarks

If you use the datasets, please cite our paper:

Sooji Han, Rui Mao, and Erik Cambria. "Hierarchical Attention Network for Explainable Depression Detection on Twitter Aided by Metaphor Concept Mappings." In Proceedings of the 29th International Conference on Computational Linguistics (COLING), 2022. In press.

Contact

If you have any questions about the datasets or source code for our model, please contact Sooji Han.

References

Guangyao Shen, Jia Jia, Liqiang Nie, Fuli Feng, Cunjun Zhang, Tianrui Hu, Tat-Seng Chua, and Wenwu Zhu. "Depression detection via harvesting social media: A multimodal dictionary learning solution." In IJCAI, pp. 3838-3844. 2017.

Rui Mao, Xiao Li, Mengshi Ge, and Erik Cambria. "MetaPro: A computational metaphor processing model for text pre-processing." Information Fusion, 86-87:30–43. 2022.
Created:
2022-09-20