ScamGen
Source: DataCite Commons | Updated: 2025-05-01 | Indexed: 2025-05-17
Download link:
https://data.mendeley.com/datasets/dkypjhkmgb
Description:
ScamGen: A Comprehensive Dataset of Chinese Telephone Scams
This dataset, created using the ScamGen technique, focuses on capturing the psychological dynamics between scammers and victims in Chinese telephone scams. It is derived from a multi-source data collection framework and is expanded through a template-based data augmentation method, generating diverse and realistic scam scenarios. The dataset emphasizes the interactions between scammers and victims, using sentence- and word-level perturbations to ensure a wide variety of scam types and techniques.
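The description names template-based augmentation with sentence- and word-level perturbations but gives no implementation details. The following is a minimal sketch of that general idea, not the ScamGen method itself. Every template, slot value, synonym, and filler sentence is invented for illustration (and written in English, whereas the dataset is Chinese), and all function names are hypothetical.

```python
import random

# Hypothetical templates and slot values, invented for illustration only.
TEMPLATES = [
    "Hello, this is {org}. Your {item} has a problem, please {action} right away.",
    "This is {org} calling. We need you to {action} regarding your {item}.",
]
SLOTS = {
    "org": ["the bank", "the courier company"],
    "item": ["account", "parcel"],
    "action": ["verify your identity", "transfer a deposit"],
}
# Hypothetical substitution table for word-level perturbation.
SYNONYMS = {"problem": ["issue"], "right away": ["immediately"]}
# Hypothetical filler sentences for sentence-level perturbation.
FILLERS = ["Please stay on the line.", "This is very urgent."]


def fill_template(template, rng):
    """Fill each slot in the template with a randomly chosen value."""
    return template.format(**{k: rng.choice(v) for k, v in SLOTS.items()})


def word_perturb(sentence, rng, p=0.5):
    """Word-level perturbation: replace known phrases with probability p."""
    for phrase, alternatives in SYNONYMS.items():
        if phrase in sentence and rng.random() < p:
            sentence = sentence.replace(phrase, rng.choice(alternatives))
    return sentence


def sentence_perturb(sentences, rng):
    """Sentence-level perturbation: insert one filler after the first sentence or later."""
    out = list(sentences)
    out.insert(rng.randrange(1, len(out) + 1), rng.choice(FILLERS))
    return out


def generate(n, seed=0):
    """Generate n augmented samples; each sample is a list of sentences."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        base = [fill_template(t, rng) for t in TEMPLATES]
        varied = [word_perturb(s, rng) for s in base]
        samples.append(sentence_perturb(varied, rng))
    return samples


if __name__ == "__main__":
    for line in generate(1)[0]:
        print(line)
```

Seeding the generator keeps the output reproducible, which makes it easier to compare augmented sets across runs.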
The dataset covers scam strategies such as urgency, impersonation, and emotional manipulation, reflecting the psychological tactics scammers use in real calls. In the authors' evaluation, the ScamGen technique outperformed large language models at generating diverse, high-quality scam-related data.
Alongside this dataset, five deep learning models for intent detection were developed, with BERT achieving a precision of 86.68%. This dataset is a valuable resource for researchers and practitioners in the fields of cybersecurity and fraud detection, enabling a deeper understanding of telephone scammer tactics and aiding in the development of more effective detection systems.
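The description reports a precision figure without saying how it was averaged across intent classes. As a reminder of what the metric measures, here is a small sketch that computes per-class precision (true positives divided by all predictions of that class) and a macro average. The label names and the toy predictions are hypothetical and are not taken from the dataset.

```python
from collections import Counter


def per_class_precision(y_true, y_pred):
    """Return {label: TP / (TP + FP)} for every label that was predicted at least once."""
    true_positives = Counter()
    predicted = Counter()
    for truth, pred in zip(y_true, y_pred):
        predicted[pred] += 1
        if truth == pred:
            true_positives[pred] += 1
    return {label: true_positives[label] / count for label, count in predicted.items()}


def macro_precision(y_true, y_pred):
    """Unweighted mean of per-class precision over the predicted labels."""
    scores = per_class_precision(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical intent labels and predictions, for illustration only.
y_true = ["urgency", "impersonation", "urgency", "emotional", "impersonation"]
y_pred = ["urgency", "impersonation", "impersonation", "emotional", "impersonation"]

if __name__ == "__main__":
    print(per_class_precision(y_true, y_pred))
    print(macro_precision(y_true, y_pred))
```

Note that this sketch skips classes that never appear among the predictions; other conventions assign such classes a precision of zero, which would lower the macro average.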
Provider:
Mendeley Data
Created:
2025-01-16
Collected summary
Dataset introduction

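Background and challenges
Background overview
ScamGen is a comprehensive dataset of Chinese telephone scams. It combines multi-source data collection with template-based augmentation to generate diverse scam scenarios covering multiple scam strategies and psychological dynamics. The dataset has been used to develop intent detection models (for example, BERT reached a precision of 86.68%) and serves as a resource for cybersecurity and fraud detection research.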
The summary above was collected and generated by 遇见数据集.



