Graph Representation Learning Techniques for the Combat against Online Abusive Activity

Figshare · Updated 2024-05-13 · Indexed 2026-04-28

Download link:

https://figshare.com/articles/dataset/Graph_Representation_Learning_Techniques_for_the_Combat_against_Online_Abusive_Activity/25611231

Description:

As the Internet has become more prevalent, abusive behavior on online platforms has surged in recent decades. Abusers exploit the popularity and convenience of these platforms to target other users, which has drawn researchers' attention to effective methods for combating abusive activity. However, traditional machine learning methods, which focus on modeling abusive behavior patterns, struggle to detect and mitigate such behavior because they cannot account for the intricate relationships among abusers. In response, this dissertation uses graphs to represent the complex relationships among abusive users and employs Graph Representation Learning (GRL) methods to detect two kinds of online abusive activity: drug trafficking and malicious repositories. Although existing GRL methods have achieved notable success on benchmark graph datasets such as social, academic, and molecular graphs, they still face two challenges for abuse detection in real-world scenarios: (i) many real-world abuse detection tasks lack sufficient labeled data for model training, because obtaining a large amount of labeled data is time-consuming and resource-intensive; (ii) online abuse graph datasets frequently exhibit class imbalance and topology imbalance. To address the first challenge, this dissertation presents techniques based on graph meta-learning and graph self-supervised learning for drug trafficker identification and malicious repository detection. Specifically, to make the best use of limited labeled data, it first designs two novel models that combine GRL with meta-learning, Meta-AHIN and Meta-HG, which detect malicious repositories on social coding platforms and drug sellers on social media, respectively.
In the second part, this dissertation exploits readily available unlabeled data. It proposes two graph self-supervised learning methods, Rep2Vec and HyGCL-DC, to detect malicious repositories and drug trafficking communities, respectively. To address the second challenge, class and topology imbalance in graphs, it designs two novel models: CM-GCL, which alleviates imbalance in the self-supervised learning setting, and AD-GSMOTE, which does so in the supervised learning setting, across various detection tasks. All models designed in this dissertation are evaluated on benchmark or real-world datasets, demonstrating the effectiveness of GRL-based methods in real-world online abuse detection. Finally, this dissertation publicly releases two newly collected datasets, Twitter-Drug and Twitter-HyDrug, as resources for researchers working on online abuse detection and GRL.
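
For readers unfamiliar with the class imbalance issue mentioned in the description, the following is a minimal sketch of how imbalance can be measured and how generic SMOTE-style oversampling creates synthetic minority samples by interpolation. It is not the dissertation's AD-GSMOTE or CM-GCL; the function names `imbalance_ratio` and `smote_like`, the toy feature vectors, and the choice of a single nearest neighbor are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def smote_like(features, labels, minority, n_new, seed=0):
    """Create n_new synthetic minority samples (hypothetical helper).

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and its nearest other minority sample (k = 1).
    """
    rng = random.Random(seed)
    pool = [x for x, y in zip(features, labels) if y == minority]
    if len(pool) < 2:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(pool))
        base = pool[i]
        # Nearest other minority sample by Euclidean distance.
        neighbor = min(
            (p for j, p in enumerate(pool) if j != i),
            key=lambda p: math.dist(base, p),
        )
        t = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + t * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Toy data (assumed): four benign accounts (label 0), two abusive ones (label 1).
features = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [1.0, 1.0], [1.2, 1.0]]
labels = [0, 0, 0, 0, 1, 1]
ratio = imbalance_ratio(labels)
new_points = smote_like(features, labels, minority=1, n_new=2)
```

This sketch works on plain feature vectors only. Graph-aware variants also have to decide how synthetic nodes connect to existing ones, which is where topology imbalance comes in; the sketch does not attempt that.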

Created:

2024-05-13
