A Comprehensive Dataset for Automated Cyberbullying Detection
Source: Mendeley Data (indexed 2026-04-18)
Download link:
https://data.mendeley.com/datasets/wmx9jj2htd
Description:
Cyberbullying is characterized by aggressive, repetitive, and intentional communication among peers. However, most existing datasets for cyberbullying detection focus only on aggressive texts, labeling messages as aggressive or non-aggressive. They disregard the other three aspects of cyberbullying: repetition, peerness, and intent to harm. This dataset incorporates all four aspects. It is an updated version of the dataset presented in our paper [1] and was developed using the same methodology. In this updated version, we provide complete and enhanced data, along with the code used to generate it. We also share the aggressive and non-aggressive messages compiled from different sources [2, 3]. If you use this dataset, please cite our paper [1].
[1] Ejaz, Naveed, Fakhra Razi, and Salimur Choudhury. "Towards comprehensive cyberbullying detection: A dataset incorporating aggressive texts, repetition, peerness, and intent to harm." Computers in Human Behavior (2023): 108123.
Text messages sourced from:
[2] Elsafoury, "Cyberbullying datasets," Mendeley.com, 2020. [Online]. Available: https://data.mendeley.com/datasets/jf4pzyvnpj/1.
[3] R. Kumar, A. N. Reganti, A. Bhatia, and T. Maheshwari, "Aggression-annotated Corpus of Hindi-English Code-mixed Data," in Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan, May 7-12, 2018.
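The following minimal sketch shows why the four aspects matter. It contrasts an aggression-only label with one that requires all four aspects of the definition above at once. It is a hypothetical illustration of that definition only. It is not the authors' labeling procedure or the dataset's actual schema. The class and field names (`MessageAnnotation`, `aggressive`, `repeated`, `peers`, `intent_to_harm`) are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MessageAnnotation:
    """Hypothetical per-message record; field names are illustrative, not the dataset's schema."""
    aggressive: bool      # the text itself is aggressive
    repeated: bool        # part of a repeated pattern against the same target
    peers: bool           # sender and target are peers
    intent_to_harm: bool  # the sender intends to harm


def aggression_only_label(m: MessageAnnotation) -> bool:
    # What a binary aggressive/non-aggressive dataset captures.
    return m.aggressive


def four_aspect_label(m: MessageAnnotation) -> bool:
    # Definition-based label (assumed): all four aspects must hold together.
    return m.aggressive and m.repeated and m.peers and m.intent_to_harm


if __name__ == "__main__":
    # A single aggressive message between peers that is not repeated.
    one_off_insult = MessageAnnotation(
        aggressive=True, repeated=False, peers=True, intent_to_harm=True
    )
    print(aggression_only_label(one_off_insult))  # True
    print(four_aspect_label(one_off_insult))      # False
```

In this example, a one-off insult counts as positive under aggression-only labeling. It does not count as cyberbullying once repetition is also required.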
Created:
2024-01-22



