Nachammai41/underserved-persona_conditioned-fraud-v3

Name: Nachammai41/underserved-persona_conditioned-fraud-v3
Creator: Nachammai41
Published: 2026-04-23 18:31:56
License: CC-BY-4.0

Source: Hugging Face. Updated 2026-04-23; indexed 2026-04-26.

Download link:

https://hf-mirror.com/datasets/Nachammai41/underserved-persona_conditioned-fraud-v3

Description:

Persona-Conditioned Fraud Detection Dataset (v3, Citation-Grounded) is a synthetic, multilingual dataset for detecting fraud in underserved communities, grounded in real-world sources. It covers four archetypes: remittance, gig worker, unbanked, and ITIN holder. Version 3 adds three universal columns: persona_source_ids, fraud_vector_typology_ref, and behavioral_evidence_grade.

The dataset contains 20,000 transaction records, 5,000 per archetype, with a fraud rate of approximately 10%. Reference data is also provided: persona profiles, a citation registry, and a typology registry.

The data was generated in four steps: structured extraction from 13 curated sources, creation of 46 source-grounded persona profiles, transaction generation with the TabDDPM v3 generator, and narrative fill by Adaption Labs. The dataset is intended for research and educational use and is released under the CC-BY-4.0 license.
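The stated shape (equal rows per archetype, roughly 10% fraud, three v3 columns) can be sanity-checked once the rows are loaded. The following is a minimal sketch, not the author's tooling: the key names `archetype` and `is_fraud`, the archetype label strings, and the toy rows are all assumptions made for illustration. Only the three v3 column names come from the description.

```python
from collections import Counter

# The three universal columns that the description says v3 adds.
V3_COLUMNS = (
    "persona_source_ids",
    "fraud_vector_typology_ref",
    "behavioral_evidence_grade",
)


def summarize(records, archetype_key="archetype", label_key="is_fraud"):
    """Return {archetype: (row_count, fraud_rate)} for an iterable of dict rows.

    archetype_key and label_key are hypothetical names; adjust them to the
    actual schema of the downloaded files.
    """
    totals = Counter()
    frauds = Counter()
    for row in records:
        # Fail early if a row lacks any of the v3 columns.
        missing = [col for col in V3_COLUMNS if col not in row]
        if missing:
            raise KeyError(f"row is missing v3 columns: {missing}")
        totals[row[archetype_key]] += 1
        frauds[row[archetype_key]] += int(bool(row[label_key]))
    return {name: (n, frauds[name] / n) for name, n in totals.items()}


# Toy stand-in data: 10 rows per archetype, one of them fraudulent.
# Label strings and empty placeholder values are assumptions, not real data.
ARCHETYPES = ("remittance", "gig_worker", "unbanked", "itin")
toy = [
    {
        "archetype": name,
        "is_fraud": i == 0,
        "persona_source_ids": "",
        "fraud_vector_typology_ref": "",
        "behavioral_evidence_grade": "",
    }
    for name in ARCHETYPES
    for i in range(10)
]

summary = summarize(toy)
for name, (count, rate) in summary.items():
    print(f"{name}: {count} rows, fraud rate {rate:.2f}")
```

Run against the real files, a result that matches the description would show 5,000 rows for each of the four archetypes and an overall fraud rate near 10%.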

Provider:

Nachammai41
