nithi060488/Cifer-Fraud-Detection-Dataset-AF
Source: Hugging Face. Updated: 2026-02-24. Indexed: 2026-03-29.
Download link: https://hf-mirror.com/datasets/nithi060488/Cifer-Fraud-Detection-Dataset-AF
Dataset description:
---
license: apache-2.0
task_categories:
- tabular-classification
- feature-extraction
language:
- en
tags:
- fraud-detection
- finance
- federated-learning
- cifer
pretty_name: Cifer-Fraud-Detection-Dataset-AF
size_categories:
- 10M<n<100M
---
# 📊 Cifer Fraud Detection Dataset
## 🧠 Overview
The **Cifer-Fraud-Detection-Dataset-AF** is a high-fidelity, fully synthetic dataset created to support the development and benchmarking of privacy-preserving, federated, and decentralized machine learning systems in financial fraud detection.
This dataset draws structural inspiration from the **PaySim simulator**, which was built using aggregated mobile money transaction data from a real financial provider operating in 14+ countries. Cifer extends this format by scaling it to **21 million samples**, optimizing it for **federated learning environments**, and validating performance against real-world datasets.
> ### Accuracy Benchmark:
> Models trained by Cifer on this dataset reach **99.93% accuracy**, against a **99.98% baseline accuracy** on real-world fraud datasets. The close match indicates high-fidelity behavior suitable for secure, distributed ML research.
---
## ⚙️ Generation Method
This dataset is **entirely synthetic** and was generated using **Cifer’s internal simulation engine**, which was trained to mimic the patterns of financial behavior, agent dynamics, and fraud strategies typically observed in mobile money ecosystems.
- Based on the structure and simulation dynamics of PaySim
- Enhanced for multi-agent testing, federated partitioning, and async model training
- Includes realistic fraud flagging mechanisms and unbalanced label distributions
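With unbalanced labels, plain accuracy can look strong even for a model that never flags fraud, so precision and recall are worth reporting alongside it. The sketch below illustrates this with a trivial "always non-fraud" baseline. The label vector (1 fraud in 1,000) is invented for illustration and is not the dataset's actual fraud rate, and the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 means fraud."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, and recall; empty denominators yield 0.0."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Invented labels: 1 fraudulent transaction among 1,000 (an assumption).
y_true = [1] + [0] * 999
# Baseline that never predicts fraud.
always_negative = [0] * 1000

baseline = binary_metrics(y_true, always_negative)
print(baseline)
```

Here the baseline scores high accuracy while its recall on the fraud class is zero, which is why a single accuracy figure says little about fraud detection quality on its own.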
---
## 🧩 Data Structure
| Column Name | Description |
|------------------|-----------------------------------------------------------------------------|
| `step` | Unit of time (1 step = 1 hour); 744 steps total, which is 31 days of hourly steps (described as a 30-day simulation) |
| `type` | Transaction type: CASH-IN, CASH-OUT, DEBIT, PAYMENT, TRANSFER |
| `amount` | Transaction value in simulated currency |
| `nameOrig` | Anonymized ID of sender |
| `oldbalanceOrg` | Sender’s balance before transaction |
| `newbalanceOrig` | Sender’s balance after transaction |
| `nameDest` | Anonymized ID of recipient |
| `oldbalanceDest` | Recipient’s balance before transaction (if applicable) |
| `newbalanceDest` | Recipient’s balance after transaction (if applicable) |
| `isFraud` | Binary flag: 1 if transaction is fraudulent |
| `isFlaggedFraud` | 1 if transaction exceeds a flagged threshold (e.g. >200,000) |
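As a minimal sketch of how the columns above might be read, the following parses rows with the standard library, converts the numeric columns, and derives a day and hour from `step`. The sample rows, the account IDs, the assumption that `step` starts at 1, and the 200,000 threshold used as the flagging rule are all illustrative; only the column names come from the table.

```python
import csv
import io

# Hypothetical sample rows using the documented column names.
# The values are invented and are not taken from the dataset.
SAMPLE_CSV = """\
step,type,amount,nameOrig,oldbalanceOrg,newbalanceOrig,nameDest,oldbalanceDest,newbalanceDest,isFraud,isFlaggedFraud
1,PAYMENT,9839.64,C0001,170136.0,160296.36,M0001,0.0,0.0,0,0
26,TRANSFER,250000.0,C0002,250000.0,0.0,C0003,0.0,250000.0,1,1
"""

FLOAT_COLUMNS = ("amount", "oldbalanceOrg", "newbalanceOrig",
                 "oldbalanceDest", "newbalanceDest")
INT_COLUMNS = ("step", "isFraud", "isFlaggedFraud")


def parse_row(raw):
    """Convert the string fields of one CSV row to numeric types."""
    row = dict(raw)
    for name in FLOAT_COLUMNS:
        row[name] = float(row[name])
    for name in INT_COLUMNS:
        row[name] = int(row[name])
    return row


def step_to_day_hour(step):
    """Map an hourly step to (day, hour), assuming step 1 is hour 0 of day 1."""
    return (step - 1) // 24 + 1, (step - 1) % 24


def exceeds_flag_threshold(amount, threshold=200_000):
    """Illustrative flagging rule based on the example threshold in the table."""
    return amount > threshold


rows = [parse_row(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
for row in rows:
    print(row["step"], step_to_day_hour(row["step"]), row["type"], row["amount"])
```

To read a real part file, the `io.StringIO(SAMPLE_CSV)` source can be swapped for an opened file handle, provided the header matches the table.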
---
## 📁 File Organization
Total rows: **21,000,000**, split into 14 files for large-scale and federated learning scenarios:
- `Cifer-Fraud-Detection-Dataset-AF-part-1-14.csv` → 1.5M rows
- `Cifer-Fraud-Detection-Dataset-AF-part-2-14.csv` → 1.5M rows
- `Cifer-Fraud-Detection-Dataset-AF-part-3-14.csv` → 1.5M rows
- `Cifer-Fraud-Detection-Dataset-AF-part-4-14.csv` → 1.5M rows
- `Cifer-Fraud-Detection-Dataset-AF-part-5-14.csv` → 1.5M rows
- `Cifer-Fraud-Detection-Dataset-AF-part-6-14.csv` → 1.5M rows
- `Cifer-Fraud-Detection-Dataset-AF-part-7-14.csv` → 1.5M rows
- `Cifer-Fraud-Detection-Dataset-AF-part-8-14.csv` → 1.5M rows
- `Cifer-Fraud-Detection-Dataset-AF-part-9-14.csv` → 1.5M rows
- `Cifer-Fraud-Detection-Dataset-AF-part-10-14.csv` → 1.5M rows
- `Cifer-Fraud-Detection-Dataset-AF-part-11-14.csv` → 1.5M rows
- `Cifer-Fraud-Detection-Dataset-AF-part-12-14.csv` → 1.5M rows
- `Cifer-Fraud-Detection-Dataset-AF-part-13-14.csv` → 1.5M rows
- `Cifer-Fraud-Detection-Dataset-AF-part-14-14.csv` → 1.5M rows
Format: `.csv` (optionally `.parquet` or `.json` upon request)
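The file naming pattern above lends itself to simple federated partitioning. The sketch below rebuilds the 14 file names, checks the row total, and assigns files to clients round-robin. The number of clients (4) and the function names are assumptions made for illustration, not part of the dataset.

```python
NUM_PARTS = 14
ROWS_PER_PART = 1_500_000


def part_filenames(num_parts=NUM_PARTS):
    """Build the part file names following the documented naming pattern."""
    return [
        f"Cifer-Fraud-Detection-Dataset-AF-part-{i}-{num_parts}.csv"
        for i in range(1, num_parts + 1)
    ]


def assign_to_clients(filenames, num_clients):
    """Assign files to client IDs 0..num_clients-1 in round-robin order."""
    shards = {client: [] for client in range(num_clients)}
    for index, name in enumerate(filenames):
        shards[index % num_clients].append(name)
    return shards


files = part_filenames()
total_rows = NUM_PARTS * ROWS_PER_PART

# Hypothetical setup: 4 simulated clients sharing the 14 files.
shards = assign_to_clients(files, num_clients=4)
for client, names in shards.items():
    print(client, len(names), len(names) * ROWS_PER_PART)
```

With 14 files and 4 clients, the shards are uneven (two clients get 4 files, two get 3), which is a simple way to simulate clients of different sizes.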
---
## ✅ Key Features
- Fully synthetic and safe for public release
- Compatible with federated learning (cross-silo, async, or multi-agent)
- Ideal for privacy-preserving machine learning and robustness testing
- Benchmarkable against real-world fraud datasets
- Supports fairness evaluation via distribution-aware modeling
---
## 🔬 Use Cases
- Fraud detection benchmarking in decentralized AI systems
- Federated learning simulation (training, evaluation, aggregation)
- Model bias mitigation and fairness testing
- Multi-agent coordination and adversarial fraud modeling
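For the aggregation step in a federated learning simulation, one common choice is federated averaging, where each client's parameters are weighted by its number of training rows. The sketch below is a plain-Python illustration of that idea; it is not Cifer's aggregation method, and the client weights and sizes are invented.

```python
def federated_average(client_weights, client_sizes):
    """Average parameter vectors, weighting each client by its row count."""
    if len(client_weights) != len(client_sizes) or not client_weights:
        raise ValueError("need one size per client and at least one client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of rows must be positive")
    dim = len(client_weights[0])
    averaged = [0.0] * dim
    for weights, size in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all clients must share the same parameter shape")
        for j, value in enumerate(weights):
            averaged[j] += value * size / total
    return averaged


# Hypothetical example: two clients holding 1 and 3 part files respectively.
client_weights = [[0.0, 2.0], [1.0, 4.0]]
client_sizes = [1_500_000, 4_500_000]

global_weights = federated_average(client_weights, client_sizes)
print(global_weights)
```

The larger client contributes three quarters of the result here, so the averaged parameters sit closer to its values.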
---
## 📜 License
**Apache 2.0**: freely usable with attribution
---
## 🧾 Attribution & Citation
This dataset was generated and extended by Cifer AI, building on structural principles introduced by:
**E. A. Lopez-Rojas, A. Elmir, and S. Axelsson** <br>
*PaySim: A financial mobile money simulator for fraud detection.* <br>
28th European Modeling and Simulation Symposium – EMSS 2016
---
Provider: nithi060488



