nithi060488/Cifer-Fraud-Detection-Dataset-AF

Name: nithi060488/Cifer-Fraud-Detection-Dataset-AF
Creator: nithi060488
Published: 2026-02-24 02:11:54
License: apache-2.0

Source: Hugging Face; updated 2026-02-24; indexed 2026-03-29

Download link:

https://hf-mirror.com/datasets/nithi060488/Cifer-Fraud-Detection-Dataset-AF


Resource description:

---
license: apache-2.0
task_categories:
- tabular-classification
- feature-extraction
language:
- en
tags:
- fraud-detection
- finance
- federated-learning
- cifer
pretty_name: Cifer-Fraud-Detection-Dataset-AF
size_categories:
- 10M<n<100M
---

# 📊 Cifer Fraud Detection Dataset

## 🧠 Overview

The **Cifer-Fraud-Detection-Dataset-AF** is a high-fidelity, fully synthetic dataset created to support the development and benchmarking of privacy-preserving, federated, and decentralized machine learning systems in financial fraud detection.

This dataset draws structural inspiration from the **PaySim simulator**, which was built using aggregated mobile money transaction data from a real financial provider operating in 14+ countries. Cifer extends this format by scaling it to **21 million samples**, optimizing for **federated learning environments**, and validating performance against real-world datasets.

> ### Accuracy Benchmark:
> Cifer-trained models on this dataset reach **99.93% accuracy**, benchmarked against real-world fraud datasets with **99.98% baseline accuracy**, providing high-fidelity behavior for secure, distributed ML research.

---

## ⚙️ Generation Method

This dataset is **entirely synthetic** and was generated using **Cifer’s internal simulation engine**, trained to mimic patterns of financial behavior, agent dynamics, and fraud strategies typically observed in mobile money ecosystems.

- Based on the structure and simulation dynamics of PaySim
- Enhanced for multi-agent testing, federated partitioning, and async model training
- Includes realistic fraud flagging mechanisms and unbalanced label distributions

---

# 🧩 Data Structure

| Column Name      | Description                                                                 |
|------------------|-----------------------------------------------------------------------------|
| `step`           | Unit of time (1 step = 1 hour); simulation spans 30 days (744 steps total)  |
| `type`           | Transaction type: CASH-IN, CASH-OUT, DEBIT, PAYMENT, TRANSFER               |
| `amount`         | Transaction value in simulated currency                                     |
| `nameOrig`       | Anonymized ID of sender                                                     |
| `oldbalanceOrg`  | Sender’s balance before transaction                                         |
| `newbalanceOrig` | Sender’s balance after transaction                                          |
| `nameDest`       | Anonymized ID of recipient                                                  |
| `oldbalanceDest` | Recipient’s balance before transaction (if applicable)                      |
| `newbalanceDest` | Recipient’s balance after transaction (if applicable)                       |
| `isFraud`        | Binary flag: 1 if transaction is fraudulent                                 |
| `isFlaggedFraud` | 1 if transaction exceeds a flagged threshold (e.g. >200,000)                |

---

# 📁 File Organization

Total Rows: **21,000,000**

Split into 14 files for large-scale and federated learning scenarios:

- `Cifer-Fraud-Detection-Dataset-AF-part-1-14.csv` → 1.5M rows
- `Cifer-Fraud-Detection-Dataset-AF-part-2-14.csv` → 1.5M rows
- `Cifer-Fraud-Detection-Dataset-AF-part-3-14.csv` → 1.5M rows
- `Cifer-Fraud-Detection-Dataset-AF-part-4-14.csv` → 1.5M rows
- `Cifer-Fraud-Detection-Dataset-AF-part-5-14.csv` → 1.5M rows
- `Cifer-Fraud-Detection-Dataset-AF-part-6-14.csv` → 1.5M rows
- `Cifer-Fraud-Detection-Dataset-AF-part-7-14.csv` → 1.5M rows
- `Cifer-Fraud-Detection-Dataset-AF-part-8-14.csv` → 1.5M rows
- `Cifer-Fraud-Detection-Dataset-AF-part-9-14.csv` → 1.5M rows
- `Cifer-Fraud-Detection-Dataset-AF-part-10-14.csv` → 1.5M rows
- `Cifer-Fraud-Detection-Dataset-AF-part-11-14.csv` → 1.5M rows
- `Cifer-Fraud-Detection-Dataset-AF-part-12-14.csv` → 1.5M rows
- `Cifer-Fraud-Detection-Dataset-AF-part-13-14.csv` → 1.5M rows
- `Cifer-Fraud-Detection-Dataset-AF-part-14-14.csv` → 1.5M rows

Format: `.csv` (optionally `.parquet` or `.json` upon request)

---

# ✅ Key Features

- Fully synthetic and safe for public release
- Compatible with federated learning (cross-silo, async, or multi-agent)
- Ideal for privacy-preserving machine learning and robustness testing
- Benchmarkable against real-world fraud datasets
- Supports fairness evaluation via distribution-aware modeling

---

# 🔬 Use Cases

- Fraud detection benchmarking in decentralized AI systems
- Federated learning simulation (training, evaluation, aggregation)
- Model bias mitigation and fairness testing
- Multi-agent coordination and adversarial fraud modeling

---

# 📜 License

**Apache 2.0**: freely usable with attribution

---

# 🧾 Attribution & Citation

This dataset was generated and extended by Cifer AI, building on structural principles introduced by:

**E. A. Lopez-Rojas, A. Elmir, and S. Axelsson** <br>
*PaySim: A financial mobile money simulator for fraud detection.* <br>
28th European Modeling and Simulation Symposium, EMSS 2016
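The column table and file list above can be turned into a small sanity check before training. The following is a minimal sketch using only the Python standard library: it rebuilds the 14 shard file names from the documented pattern, confirms that 14 × 1.5M rows gives the stated 21,000,000 total, and streams a CSV to verify the header and compute the share of rows with `isFraud` set. The helper names (`shard_filenames`, `missing_columns`, `fraud_rate`) and the two sample rows are hypothetical and made up for illustration; they are not part of the dataset or of any Cifer tooling.

```python
import csv
import io

# Column names copied from the Data Structure table of the card.
EXPECTED_COLUMNS = [
    "step", "type", "amount", "nameOrig", "oldbalanceOrg", "newbalanceOrig",
    "nameDest", "oldbalanceDest", "newbalanceDest", "isFraud", "isFlaggedFraud",
]

NUM_SHARDS = 14             # per the File Organization list
ROWS_PER_SHARD = 1_500_000  # per the File Organization list


def shard_filenames(num_shards: int = NUM_SHARDS) -> list[str]:
    """Build the shard file names following the pattern listed in the card."""
    return [
        f"Cifer-Fraud-Detection-Dataset-AF-part-{i}-{num_shards}.csv"
        for i in range(1, num_shards + 1)
    ]


def missing_columns(header: list[str]) -> list[str]:
    """Return the documented columns that are absent from a CSV header."""
    present = set(header)
    return [name for name in EXPECTED_COLUMNS if name not in present]


def fraud_rate(csv_file) -> float:
    """Stream an open CSV file object and return the share of rows with isFraud == 1."""
    reader = csv.DictReader(csv_file)
    absent = missing_columns(list(reader.fieldnames or []))
    if absent:
        raise ValueError(f"missing columns: {absent}")
    total = 0
    fraud = 0
    for row in reader:
        total += 1
        # Accept both "1" and "1.0" style encodings of the flag.
        fraud += int(float(row["isFraud"]))
    return fraud / total if total else 0.0


# Two made-up rows standing in for a real shard (values are illustrative only).
SAMPLE_TEXT = (
    ",".join(EXPECTED_COLUMNS) + "\n"
    "1,PAYMENT,100.0,C1,500.0,400.0,M1,0.0,0.0,0,0\n"
    "1,TRANSFER,250000.0,C2,250000.0,0.0,C3,0.0,0.0,1,1\n"
)

if __name__ == "__main__":
    print(shard_filenames()[0])
    print(NUM_SHARDS * ROWS_PER_SHARD)
    print(fraud_rate(io.StringIO(SAMPLE_TEXT)))
```

For a downloaded shard, `fraud_rate(open(path, newline=""))` would apply the same check row by row without loading 1.5M rows into memory at once.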

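The 14-file split described in the card lends itself to a simple cross-silo simulation. The sketch below assumes, purely for illustration, that whole shards are handed out round-robin to a chosen number of clients and that each client is weighted by its share of the total rows, as in FedAvg-style aggregation. The card does not say how Cifer partitions the data, so `assign_shards` and `aggregation_weights` are hypothetical helpers, not the authors' method.

```python
from collections import defaultdict

NUM_SHARDS = 14             # per the File Organization list
ROWS_PER_SHARD = 1_500_000  # per the File Organization list


def assign_shards(num_clients: int, num_shards: int = NUM_SHARDS) -> dict[int, list[int]]:
    """Distribute 1-based shard numbers round-robin across hypothetical clients."""
    if not 1 <= num_clients <= num_shards:
        raise ValueError("num_clients must be between 1 and num_shards")
    assignment: dict[int, list[int]] = defaultdict(list)
    for shard in range(1, num_shards + 1):
        assignment[(shard - 1) % num_clients].append(shard)
    return dict(assignment)


def aggregation_weights(assignment: dict[int, list[int]]) -> dict[int, float]:
    """Weight each client by its fraction of the total rows (FedAvg-style)."""
    rows = {client: len(shards) * ROWS_PER_SHARD for client, shards in assignment.items()}
    total = sum(rows.values())
    return {client: count / total for client, count in rows.items()}


if __name__ == "__main__":
    clients = assign_shards(4)
    for client, shards in clients.items():
        print(client, shards)
    print(aggregation_weights(clients))
```

With four clients, two of them hold four shards and two hold three, so the weights are unequal even though every shard has the same size; choosing 2, 7, or 14 clients gives an even split.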

Provider:

nithi060488
