Robin
Source: arXiv. Updated: 2022-09-13. Indexed: 2024-07-24.
Download link:
https://drive.google.com/file/d/1MRGis3s4RQ2MMSQu2pxqBqOzStRCeN86/view
Resource description:
The Robin dataset is a large-scale corpus of suicide-related text created by the Department of Computer Science at Dartmouth College. It contains more than 1.1 million posts from Reddit and is intended to support machine learning models that detect suicidal intent early enough to help save lives. The corpus covers several categories of suicide-related text, such as suicide bereavement and flippant references, which helps models learn the subtle differences in how suicidal ideation is expressed. The posts were screened and collected from multiple subreddits through the PushShift API, with the aim of keeping the data high in quality and diverse. The dataset is mainly used in research on suicide-related sentiment, where natural language processing techniques are applied to suicide prevention.
Provider:
Department of Computer Science, Dartmouth College
Created:
2022-09-13



