Fine-Grained Balanced Cyberbullying Dataset

Name: Fine-Grained Balanced Cyberbullying Dataset
Creator: ieee-dataport.org
License: No description available

Indexed from ieee-dataport.org on 2025-03-24

Download link:

https://ieee-dataport.org/open-access/fine-grained-balanced-cyberbullying-dataset

Description:

Amidst the COVID-19 pandemic, cyberbullying has become an even more serious threat. Our work aims to investigate the viability of an automatic multiclass cyberbullying detection model that is able to classify whether a cyberbully is targeting a victim’s age, ethnicity, gender, religion, or other quality. Previous literature has not yet explored making fine-grained cyberbullying classifications of such magnitude, and existing cyberbullying datasets suffer from quite severe class imbalances. To combat these challenges, we establish a framework for the automatic generation of balanced data by using a semi-supervised online Dynamic Query Expansion (DQE) process to extract more natural data points of a specific class from Twitter. We also propose a Graph Convolutional Network (GCN) classifier, using a graph constructed from the thresholded cosine similarities between tweet embeddings. With our DQE-augmented dataset, which we have made publicly available, we compare our GCN model using eight different tweet embedding methods and six other classification models over two sizes of datasets. Our results show that our proposed GCN model matches or exceeds the performance of the baseline models, as indicated by McNemar statistical tests.
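The graph construction mentioned in the description (edges from thresholded cosine similarities between tweet embeddings) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the toy embeddings, and the threshold value are hypothetical, and the symmetric normalization shown is the one commonly used in standard GCN layers, which the description does not confirm for this model.

```python
import numpy as np

def build_similarity_graph(embeddings, threshold=0.5):
    """Link tweets whose embedding cosine similarity is >= threshold.

    `threshold` is a hypothetical value chosen for illustration only.
    Returns a symmetric 0/1 adjacency matrix without self-loops.
    """
    x = np.asarray(embeddings, dtype=float)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    x_unit = x / np.clip(norms, 1e-12, None)   # guard against zero vectors
    sim = x_unit @ x_unit.T                    # pairwise cosine similarities
    adj = (sim >= threshold).astype(float)
    np.fill_diagonal(adj, 0.0)                 # self-loops are added later
    return adj

def normalize_adjacency(adj):
    """Compute D^{-1/2} (A + I) D^{-1/2}, the usual GCN propagation matrix."""
    a_hat = adj + np.eye(adj.shape[0])         # add self-loops
    deg = a_hat.sum(axis=1)                    # always >= 1 because of self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

# Toy 2-dimensional "embeddings" for three tweets (made-up values).
toy = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
adj = build_similarity_graph(toy, threshold=0.8)
a_norm = normalize_adjacency(adj)
print(adj)
print(a_norm)
```

In this toy case only the first two tweets are similar enough to be connected, so the third forms an isolated node that propagates information only to itself.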

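The description reports that model comparisons were assessed with McNemar tests. As a rough sketch of what such a comparison involves, the exact (binomial) form of the test can be computed from the samples on which two classifiers disagree. The function name, the toy labels, and the predictions below are made up for illustration, and the description does not say which variant of the test the authors used.

```python
from math import comb

def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact two-sided McNemar test for two classifiers on the same samples.

    b counts samples only classifier A gets right, c counts samples only
    classifier B gets right. Under the null hypothesis the discordant
    samples split 50/50, so the p-value comes from a Binomial(b + c, 0.5).
    """
    b = sum(1 for t, a, o in zip(y_true, pred_a, pred_b) if a == t and o != t)
    c = sum(1 for t, a, o in zip(y_true, pred_a, pred_b) if a != t and o == t)
    n = b + c
    if n == 0:
        return b, c, 1.0                       # the classifiers never disagree
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2.0 * tail)

# Toy labels drawn from the classes named in the description (made-up data).
y_true = ["age", "gender", "religion", "ethnicity", "age", "other"]
pred_a = ["age", "gender", "religion", "ethnicity", "age", "age"]
pred_b = ["other", "age", "gender", "ethnicity", "age", "other"]
b, c, p_value = mcnemar_exact(y_true, pred_a, pred_b)
print(b, c, p_value)
```

A large p-value, as in this tiny example, means the disagreement pattern gives no evidence that one classifier is better than the other.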

Provider:

ieee-dataport.org
