techaudit/Dante-Synthetic-AML-Finance-V1
Source: Hugging Face; updated 2026-03-19; indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/techaudit/Dante-Synthetic-AML-Finance-V1
Resource description:
---
license: mit
task_categories:
- tabular-classification
tags:
- finance
- aml
- synthetic
- fraud-detection
pretty_name: Dante Synthetic AML Finance
size_categories:
- 1K<n<10K
---
# 🛡️ Dante Synthetic Finance: AML Discovery Edition (V1)
### 🚀 The Mission
Privacy laws such as GDPR and CCPA make real banking data very hard to obtain, which in turn makes it hard to train AI to detect money laundering. This dataset provides a **high-fidelity, 100% synthetic** alternative that mimics real-world banking ecosystems without exposing any real user data.
### 🧠 The Dante Engine Advantage
Unlike standard random generators, this dataset was created using the **Dante Synthetic Engine**, which focuses on:
* **Behavioral Patterns:** Uses Gamma & Exponential distributions to simulate realistic human spending habits.
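* **Embedded Anomalies:** Includes "Smurfing" (splitting a large sum into many smaller transactions) and "Layering" (moving funds through chains of transfers to obscure their origin) patterns, specifically designed to test the limits of fraud detection algorithms.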
* **Global Scope:** Includes ISO-standard country codes and Merchant Category Codes (MCC).
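
The Dante Engine's internals are not published in this card. As a rough illustration of the behavioral-pattern idea above, the sketch below draws transaction amounts from a Gamma distribution and the gaps between transactions from an Exponential distribution, using only Python's standard library. The function name `simulate_user`, the shape, scale, and rate values, and the ID and timestamp formats are all assumptions made for illustration, not the engine's actual settings.

```python
import random
from datetime import datetime, timedelta


def simulate_user(user_id, n, seed=0, start=datetime(2026, 1, 1)):
    """Generate n synthetic, non-fraudulent transactions for one user.

    Illustrative only: every parameter value below is an assumption.
    The location_iso column is omitted to keep the sketch short.
    """
    rng = random.Random(seed)  # local RNG so runs are reproducible
    rows = []
    t = start
    for i in range(n):
        # Gap between transactions: Exponential with a mean of 12 hours.
        t += timedelta(hours=rng.expovariate(1 / 12))
        # Amount: Gamma with shape 2 and scale 40, so the mean is 80.
        amount = round(rng.gammavariate(2.0, 40.0), 2)
        rows.append(
            {
                "transaction_id": f"{user_id}-{i:05d}",
                "timestamp": t.isoformat(),
                "user_id": user_id,
                "amount": amount,
                "is_fraud": 0,
            }
        )
    return rows


rows = simulate_user("u1", 50, seed=1)
print(rows[0])
```

A Gamma distribution is right-skewed, so most amounts are small while a few are much larger, which is one plausible way to obtain the "realistic outliers" the card mentions.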
### 📊 Dataset Structure
* `transaction_id`: Unique identifier for each event.
* `timestamp`: High-precision temporal data.
* `user_id`: Synthetic user mapping.
* `amount`: Transaction amount, including realistic outliers.
* `location_iso`: ISO-standard country code of the transaction's origin.
* `is_fraud`: Ground truth label for model training (0 = Normal, 1 = Anomaly).
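
To show how the columns above might be used, here is a minimal sketch of a rule-based baseline that flags possible smurfing: users with several transactions just below a reporting threshold inside a short time window. It assumes `timestamp` is an ISO 8601 string, which the card does not confirm. The function name `flag_structuring`, the threshold of 10,000, the 80% band, the 24-hour window, and the sample rows are all hypothetical and are not taken from the dataset.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def flag_structuring(rows, threshold=10_000.0, band=0.8,
                     window=timedelta(hours=24), min_count=3):
    """Return user_ids with >= min_count near-threshold transactions in one window."""
    near = defaultdict(list)
    for r in rows:
        # Keep only amounts in [band * threshold, threshold).
        if band * threshold <= r["amount"] < threshold:
            near[r["user_id"]].append(datetime.fromisoformat(r["timestamp"]))
    flagged = set()
    for user, times in near.items():
        times.sort()
        # Slide over each run of min_count consecutive near-threshold events.
        for i in range(len(times) - min_count + 1):
            if times[i + min_count - 1] - times[i] <= window:
                flagged.add(user)
                break
    return flagged


# Hypothetical rows in the documented schema (not real dataset records).
sample = [
    {"transaction_id": "t1", "timestamp": "2026-01-01T09:00:00", "user_id": "u1", "amount": 9500.0, "location_iso": "US", "is_fraud": 1},
    {"transaction_id": "t2", "timestamp": "2026-01-01T11:00:00", "user_id": "u1", "amount": 9200.0, "location_iso": "US", "is_fraud": 1},
    {"transaction_id": "t3", "timestamp": "2026-01-01T15:00:00", "user_id": "u1", "amount": 9800.0, "location_iso": "US", "is_fraud": 1},
    {"transaction_id": "t4", "timestamp": "2026-01-01T10:00:00", "user_id": "u2", "amount": 9900.0, "location_iso": "DE", "is_fraud": 0},
    {"transaction_id": "t5", "timestamp": "2026-01-01T12:00:00", "user_id": "u2", "amount": 120.0, "location_iso": "DE", "is_fraud": 0},
    {"transaction_id": "t6", "timestamp": "2026-01-01T10:00:00", "user_id": "u3", "amount": 9000.0, "location_iso": "FR", "is_fraud": 0},
    {"transaction_id": "t7", "timestamp": "2026-01-03T10:00:00", "user_id": "u3", "amount": 9100.0, "location_iso": "FR", "is_fraud": 0},
    {"transaction_id": "t8", "timestamp": "2026-01-05T10:00:00", "user_id": "u3", "amount": 9300.0, "location_iso": "FR", "is_fraud": 0},
]

print(flag_structuring(sample))  # {'u1'}
```

Comparing the flagged users against `is_fraud` gives a simple baseline to beat before training a tabular classifier on the full dataset.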
---
**Looking for larger volumes?** The Dante Engine can generate 10M+ rows of custom-tailored financial data. Contact via profile for specialized integration.
Provider:
techaudit



