Durgesh111/Cifer-Fraud-Detection-Dataset-AF

Name: Durgesh111/Cifer-Fraud-Detection-Dataset-AF
Creator: Durgesh111
Published: 2025-12-05 07:48:33
License: apache-2.0

Hugging Face | Updated 2025-12-05 | Indexed 2025-12-20

Download link:

https://hf-mirror.com/datasets/Durgesh111/Cifer-Fraud-Detection-Dataset-AF

Description:

---
license: apache-2.0
task_categories:
- tabular-classification
- feature-extraction
language:
- en
tags:
- fraud-detection
- finance
- federated-learning
- cifer
pretty_name: Cifer-Fraud-Detection-Dataset-AF
size_categories:
- 10M<n<100M
---

# 📊 Cifer Fraud Detection Dataset

## 🧠 Overview

The **Cifer-Fraud-Detection-Dataset-AF** is a high-fidelity, fully synthetic dataset created to support the development and benchmarking of privacy-preserving, federated, and decentralized machine learning systems in financial fraud detection.

This dataset draws structural inspiration from the **PaySim simulator**, which was built using aggregated mobile money transaction data from a real financial provider operating in 14+ countries. Cifer extends this format by scaling it to **21 million samples**, optimizing for **federated learning environments**, and validating performance against real-world datasets.

> ### Accuracy Benchmark
> Cifer-trained models on this dataset reach **99.93% accuracy**, benchmarked against real-world fraud datasets with **99.98% baseline accuracy**, providing high-fidelity behavior for secure, distributed ML research.

---

## ⚙️ Generation Method

This dataset is **entirely synthetic** and was generated using **Cifer’s internal simulation engine**, trained to mimic patterns of financial behavior, agent dynamics, and fraud strategies typically observed in mobile money ecosystems.

- Based on the structure and simulation dynamics of PaySim
- Enhanced for multi-agent testing, federated partitioning, and async model training
- Includes realistic fraud flagging mechanisms and unbalanced label distributions

---

# 🧩 Data Structure

| Column Name      | Description                                                                 |
|------------------|-----------------------------------------------------------------------------|
| `step`           | Unit of time (1 step = 1 hour); simulation spans 30 days (744 steps total)  |
| `type`           | Transaction type: CASH-IN, CASH-OUT, DEBIT, PAYMENT, TRANSFER               |
| `amount`         | Transaction value in simulated currency                                     |
| `nameOrig`       | Anonymized ID of sender                                                     |
| `oldbalanceOrg`  | Sender’s balance before transaction                                         |
| `newbalanceOrig` | Sender’s balance after transaction                                          |
| `nameDest`       | Anonymized ID of recipient                                                  |
| `oldbalanceDest` | Recipient’s balance before transaction (if applicable)                      |
| `newbalanceDest` | Recipient’s balance after transaction (if applicable)                       |
| `isFraud`        | Binary flag: 1 if transaction is fraudulent                                 |
| `isFlaggedFraud` | 1 if transaction exceeds a flagged threshold (e.g. >200,000)                |

---

# 📁 File Organization

Total Rows: **21,000,000**

Split into 14 files for large-scale and federated learning scenarios:

- `Cifer-Fraud-Detection-Dataset-AF-part-1-14.csv` → 1.5M rows
- `Cifer-Fraud-Detection-Dataset-AF-part-2-14.csv` → 1.5M rows
- `Cifer-Fraud-Detection-Dataset-AF-part-3-14.csv` → 1.5M rows
- `Cifer-Fraud-Detection-Dataset-AF-part-4-14.csv` → 1.5M rows
- `Cifer-Fraud-Detection-Dataset-AF-part-5-14.csv` → 1.5M rows
- `Cifer-Fraud-Detection-Dataset-AF-part-6-14.csv` → 1.5M rows
- `Cifer-Fraud-Detection-Dataset-AF-part-7-14.csv` → 1.5M rows
- `Cifer-Fraud-Detection-Dataset-AF-part-8-14.csv` → 1.5M rows
- `Cifer-Fraud-Detection-Dataset-AF-part-9-14.csv` → 1.5M rows
- `Cifer-Fraud-Detection-Dataset-AF-part-10-14.csv` → 1.5M rows
- `Cifer-Fraud-Detection-Dataset-AF-part-11-14.csv` → 1.5M rows
- `Cifer-Fraud-Detection-Dataset-AF-part-12-14.csv` → 1.5M rows
- `Cifer-Fraud-Detection-Dataset-AF-part-13-14.csv` → 1.5M rows
- `Cifer-Fraud-Detection-Dataset-AF-part-14-14.csv` → 1.5M rows

Format: `.csv` (optionally `.parquet` or `.json` upon request)

---

# ✅ Key Features

- Fully synthetic and safe for public release
- Compatible with federated learning (cross-silo, async, or multi-agent)
- Ideal for privacy-preserving machine learning and robustness testing
- Benchmarkable against real-world fraud datasets
- Supports fairness evaluation via distribution-aware modeling

---

# 🔬 Use Cases

- Fraud detection benchmarking in decentralized AI systems
- Federated learning simulation (training, evaluation, aggregation)
- Model bias mitigation and fairness testing
- Multi-agent coordination and adversarial fraud modeling

---

# 📜 License

**Apache 2.0**: freely usable with attribution

---

# 🧾 Attribution & Citation

This dataset was generated and extended by Cifer AI, building on structural principles introduced by:

**E. A. Lopez-Rojas, A. Elmir, and S. Axelsson** <br>
*PaySim: A financial mobile money simulator for fraud detection.* <br>
28th European Modeling and Simulation Symposium (EMSS 2016)

---
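The 14-file layout described above lends itself to a simple federated setup in which each simulated client reads its own subset of shards. The following is a minimal sketch using only the Python standard library, under several assumptions that are not stated in the dataset card: the shards have already been downloaded locally, each CSV starts with a header row matching the column table, and the round-robin assignment (`assign_shards`) is a hypothetical illustration, not Cifer's own partitioning scheme. The rows in `TOY_CSV` are invented purely to exercise the code and are not taken from the dataset.

```python
import csv
import io

NUM_PARTS = 14  # number of shard files listed in the dataset card

# Column names as listed in the Data Structure table.
COLUMNS = [
    "step", "type", "amount", "nameOrig", "oldbalanceOrg", "newbalanceOrig",
    "nameDest", "oldbalanceDest", "newbalanceDest", "isFraud", "isFlaggedFraud",
]


def shard_names(num_parts=NUM_PARTS):
    """Build the shard file names following the pattern in the file list."""
    return [
        f"Cifer-Fraud-Detection-Dataset-AF-part-{i}-{num_parts}.csv"
        for i in range(1, num_parts + 1)
    ]


def assign_shards(names, num_clients):
    """Hypothetical helper: round-robin shard files across simulated clients."""
    clients = {client_id: [] for client_id in range(num_clients)}
    for index, name in enumerate(names):
        clients[index % num_clients].append(name)
    return clients


def fraud_rate(csv_file):
    """Stream a CSV file object and return the fraction of rows with isFraud == 1."""
    total = 0
    fraud = 0
    for row in csv.DictReader(csv_file):
        total += 1
        fraud += int(row["isFraud"])
    return fraud / total if total else 0.0


# Invented toy rows (NOT real dataset content), assuming a header row is present.
TOY_CSV = ",".join(COLUMNS) + "\n" + (
    "1,PAYMENT,100.0,C001,500.0,400.0,M001,0.0,0.0,0,0\n"
    "1,TRANSFER,250.0,C002,250.0,0.0,C003,0.0,0.0,1,0\n"
    "2,CASH-OUT,250.0,C003,250.0,0.0,C004,0.0,250.0,1,0\n"
    "2,CASH-IN,75.0,C005,10.0,85.0,C006,90.0,15.0,0,0\n"
)

if __name__ == "__main__":
    plan = assign_shards(shard_names(), num_clients=4)
    for client_id, files in plan.items():
        print(client_id, len(files), "shards")
    print("toy fraud rate:", fraud_rate(io.StringIO(TOY_CSV)))
```

With real shards, `fraud_rate` could be called on `open(path, newline="")` for each file in a client's list. Streaming row by row avoids loading a 1.5M-row file into memory, which may matter when many simulated clients run on one machine.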

Provider:

Durgesh111
