
Expert-Annotated Reddit Posts on Six Classes of Psychological Abuse

Source: DataCite Commons. Updated 2026-03-10; indexed 2026-05-07.
Download link:
https://rdr.ucl.ac.uk/articles/dataset/Expert-Annotated_Reddit_Posts_on_Six_Classes_of_Psychological_Abuse/31587925/1
Description:
This dataset accompanies the papers "Decoding Psychological Abuse: A Comparative Study of Natural Language Processing (NLP) Classifiers Using Reddit Data" and "The Use of Computational Text Mining Methods to Detect and Understand Domestic Abuse", conducted by the Gender and Tech Research Lab at UCL Computer Science. The studies apply NLP modeling techniques to classify different forms of psychological abuse in Reddit posts.

The text data consists of posts from 2021, collected from three publicly available subreddits (r/abusiverelationships, r/domesticviolence, and r/emotionalabuse) using the Pushshift Reddit API. Each post has been labeled by four expert annotators across six non-mutually exclusive categories of psychological abuse. The dataset includes four separate files, one per expert, containing that expert's individual annotations. In addition, an aggregate dataset combines the annotations using two labeling strategies: (1) OR labels, where a label is assigned if at least one annotator selected the category, and (2) AND labels, where a label is assigned if at least two annotators selected the category. The data is anonymised and excludes duplicate, empty, and deleted posts.
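
The two aggregation strategies described above can be illustrated with a minimal sketch. It assumes each annotator's labels for a post are available as a mapping from category name to 0 or 1. The category names (`cat_a` through `cat_f`), the function `aggregate_labels`, and the example votes are hypothetical placeholders, not the dataset's actual column names, code, or data.

```python
from typing import Dict, List

# Hypothetical placeholder names for the six categories;
# the real dataset's category/column names may differ.
CATEGORIES = ["cat_a", "cat_b", "cat_c", "cat_d", "cat_e", "cat_f"]


def aggregate_labels(annotations: List[Dict[str, int]], min_votes: int) -> Dict[str, int]:
    """Assign a category when at least `min_votes` annotators selected it.

    min_votes=1 reproduces the OR rule (at least one annotator);
    min_votes=2 reproduces the AND rule as the description defines it
    (at least two annotators).
    """
    return {
        cat: int(sum(a.get(cat, 0) for a in annotations) >= min_votes)
        for cat in CATEGORIES
    }


# Made-up votes from four annotators for a single post (illustration only).
post_annotations = [
    {"cat_a": 1, "cat_b": 1},  # annotator 1
    {"cat_a": 1},              # annotator 2
    {"cat_c": 1},              # annotator 3
    {},                        # annotator 4 selected no category
]

or_labels = aggregate_labels(post_annotations, min_votes=1)
and_labels = aggregate_labels(post_annotations, min_votes=2)
```

Because the categories are not mutually exclusive, each one is aggregated independently, so a post can end up with several labels or none under either rule.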

Provider:
University College London
Created:
2026-03-09