Perverted Justice Dataset

Name: Perverted Justice Dataset
Creator: Faraz, Anum
License: No description available

Source: IEEE (indexed 2026-04-17)

Download link:

https://ieee-dataport.org/documents/perverted-justice-dataset

Description:

The risk that online predators pose to children in real-time gaming environments is an area of growing concern. Most studies published in this area have focused on developing near-real-time capabilities. In this paper, we present Protectbot, a comprehensive safety framework that interacts with users in online gaming chat rooms. Protectbot employs DialoGPT, a variant of the GPT-2 generative pre-trained transformer designed specifically for conversation. Because DialoGPT generates content that closely resembles human dialogue, Protectbot can engage users in interactive chat sessions. At the end of each chat, Protectbot analyzes the user's messages for indications of potentially predatory behavior, strengthening the platform's ability to safeguard its users. The Protectbot architecture includes a text classifier for identifying sexual predators that was trained and tested on the PAN12 dataset. fastText word embeddings are generated from the chat text and aggregated into sentence vectors, which serve as input features for training an SVM classifier. The proposed model achieved a recall, accuracy, F1-score, and F_0.5-score of 0.99, a significant improvement over previous methods. To evaluate the classifier's performance, a new dataset was prepared from 71 predatory chats obtained from Perverted Justice (PJ). The proposed approach demonstrates a high true positive rate in classifying predatory behavior when the SVM is replaced with a KNN classifier.
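
The description outlines a feature pipeline. Word embeddings are aggregated into one vector per chat, and that vector is fed to a classifier: an SVM, later swapped for KNN. The reported F_0.5-score is the beta = 0.5 case of F_beta = (1 + beta^2) * P * R / (beta^2 * P + R), where P is precision and R is recall. With beta < 1, the score weights precision more heavily than recall.

The following is a minimal standard-library sketch of the KNN variant under stated assumptions, not the Protectbot implementation:

- Mean pooling is assumed as the aggregation step, because the description only says "aggregated".
- Tiny hypothetical 2-D vectors stand in for real fastText embeddings.
- Euclidean distance with a majority vote over k = 3 neighbours is an assumed setting.
- The names `TOY_EMBEDDINGS`, `sentence_vector`, `knn_predict`, `true_positive_rate`, and `f_beta`, and the placeholder tokens `w1` to `w6`, are all illustrative.

```python
import math
from collections import Counter

# Hypothetical 2-D word vectors standing in for fastText embeddings
# (real fastText vectors are much larger and come from a trained model).
TOY_EMBEDDINGS = {
    "w1": [1.0, 0.0],
    "w2": [0.8, 0.2],
    "w3": [0.9, 0.1],
    "w4": [0.0, 1.0],
    "w5": [0.2, 0.8],
    "w6": [0.1, 0.9],
}
DIM = 2


def sentence_vector(text, embeddings, dim):
    """Mean-pool the vectors of known tokens (assumed aggregation); zero vector if none are known."""
    vectors = [embeddings[token] for token in text.lower().split() if token in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def knn_predict(train_x, train_y, query, k=3):
    """Return the majority label among the k training vectors closest to query (Euclidean)."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives and false negatives for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp, fp, fn


def true_positive_rate(y_true, y_pred, positive=1):
    """Recall of the positive class: TP / (TP + FN)."""
    tp, _, fn = confusion_counts(y_true, y_pred, positive)
    return tp / (tp + fn) if tp + fn else 0.0


def f_beta(y_true, y_pred, beta=0.5, positive=1):
    """F-beta score; beta < 1 weights precision more heavily than recall."""
    tp, fp, fn = confusion_counts(y_true, y_pred, positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical labelled chats: 1 = positive (predatory) class, 0 = negative class.
TRAIN_CHATS = [
    ("w1 w2", 0), ("w2 w3", 0), ("w1 w3", 0),
    ("w4 w5", 1), ("w5 w6", 1), ("w4 w6", 1),
]
train_x = [sentence_vector(text, TOY_EMBEDDINGS, DIM) for text, _ in TRAIN_CHATS]
train_y = [label for _, label in TRAIN_CHATS]

if __name__ == "__main__":
    query = sentence_vector("w5 w4", TOY_EMBEDDINGS, DIM)
    print(knn_predict(train_x, train_y, query))  # prints 1
```

Replacing `knn_predict` with a linear SVM, as in the SVM variant the description mentions, would change only the classification step. The sentence-vector features stay the same.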

Provider:

Faraz, Anum