
nithi060488/Cifer-Fraud-Detection-Dataset-AF

Source: Hugging Face | Updated: 2026-02-24 | Indexed: 2026-03-29
Download link:
https://hf-mirror.com/datasets/nithi060488/Cifer-Fraud-Detection-Dataset-AF
Resource description:

---
license: apache-2.0
task_categories:
- tabular-classification
- feature-extraction
language:
- en
tags:
- fraud-detection
- finance
- federated-learning
- cifer
pretty_name: Cifer-Fraud-Detection-Dataset-AF
size_categories:
- 10M<n<100M
---

# 📊 Cifer Fraud Detection Dataset

## 🧠 Overview

The **Cifer-Fraud-Detection-Dataset-AF** is a high-fidelity, fully synthetic dataset created to support the development and benchmarking of privacy-preserving, federated, and decentralized machine learning systems in financial fraud detection.

This dataset draws structural inspiration from the **PaySim simulator**, which was built using aggregated mobile money transaction data from a real financial provider operating in 14+ countries. Cifer extends this format by scaling it to **21 million samples**, optimizing for **federated learning environments**, and validating performance against real-world datasets.

> ### Accuracy Benchmark:
> Cifer-trained models on this dataset reach **99.93% accuracy**, benchmarked against real-world fraud datasets with **99.98% baseline accuracy**, providing high-fidelity behavior for secure, distributed ML research.

---

## ⚙️ Generation Method

This dataset is **entirely synthetic** and was generated using **Cifer’s internal simulation engine**, trained to mimic patterns of financial behavior, agent dynamics, and fraud strategies typically observed in mobile money ecosystems.

- Based on the structure and simulation dynamics of PaySim
- Enhanced for multi-agent testing, federated partitioning, and async model training
- Includes realistic fraud flagging mechanisms and unbalanced label distributions

---

# 🧩 Data Structure

| Column Name      | Description                                                                 |
|------------------|-----------------------------------------------------------------------------|
| `step`           | Unit of time (1 step = 1 hour); simulation spans 30 days (744 steps total)  |
| `type`           | Transaction type: CASH-IN, CASH-OUT, DEBIT, PAYMENT, TRANSFER               |
| `amount`         | Transaction value in simulated currency                                     |
| `nameOrig`       | Anonymized ID of sender                                                     |
| `oldbalanceOrg`  | Sender’s balance before transaction                                         |
| `newbalanceOrig` | Sender’s balance after transaction                                          |
| `nameDest`       | Anonymized ID of recipient                                                  |
| `oldbalanceDest` | Recipient’s balance before transaction (if applicable)                      |
| `newbalanceDest` | Recipient’s balance after transaction (if applicable)                       |
| `isFraud`        | Binary flag: 1 if transaction is fraudulent                                 |
| `isFlaggedFraud` | 1 if transaction exceeds a flagged threshold (e.g. >200,000)                |

---

# 📁 File Organization

Total Rows: **21,000,000**

Split into 14 files for large-scale and federated learning scenarios:

- `Cifer-Fraud-Detection-Dataset-AF-part-1-14.csv` → 1.5M rows
- `Cifer-Fraud-Detection-Dataset-AF-part-2-14.csv` → 1.5M rows
- `Cifer-Fraud-Detection-Dataset-AF-part-3-14.csv` → 1.5M rows
- `Cifer-Fraud-Detection-Dataset-AF-part-4-14.csv` → 1.5M rows
- `Cifer-Fraud-Detection-Dataset-AF-part-5-14.csv` → 1.5M rows
- `Cifer-Fraud-Detection-Dataset-AF-part-6-14.csv` → 1.5M rows
- `Cifer-Fraud-Detection-Dataset-AF-part-7-14.csv` → 1.5M rows
- `Cifer-Fraud-Detection-Dataset-AF-part-8-14.csv` → 1.5M rows
- `Cifer-Fraud-Detection-Dataset-AF-part-9-14.csv` → 1.5M rows
- `Cifer-Fraud-Detection-Dataset-AF-part-10-14.csv` → 1.5M rows
- `Cifer-Fraud-Detection-Dataset-AF-part-11-14.csv` → 1.5M rows
- `Cifer-Fraud-Detection-Dataset-AF-part-12-14.csv` → 1.5M rows
- `Cifer-Fraud-Detection-Dataset-AF-part-13-14.csv` → 1.5M rows
- `Cifer-Fraud-Detection-Dataset-AF-part-14-14.csv` → 1.5M rows

Format: `.csv` (optionally `.parquet` or `.json` upon request)

---

# ✅ Key Features

- Fully synthetic and safe for public release
- Compatible with federated learning (cross-silo, async, or multi-agent)
- Ideal for privacy-preserving machine learning and robustness testing
- Benchmarkable against real-world fraud datasets
- Supports fairness evaluation via distribution-aware modeling

---

# 🔬 Use Cases

- Fraud detection benchmarking in decentralized AI systems
- Federated learning simulation (training, evaluation, aggregation)
- Model bias mitigation and fairness testing
- Multi-agent coordination and adversarial fraud modeling

---

# 📜 License

**Apache 2.0**: freely usable with attribution

---

# 🧾 Attribution & Citation

This dataset was generated and extended by Cifer AI, building on structural principles introduced by:

**E. A. Lopez-Rojas, A. Elmir, and S. Axelsson** <br>
*PaySim: A financial mobile money simulator for fraud detection.* <br>
28th European Modeling and Simulation Symposium (EMSS 2016)

---
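The column table and file list above can be turned into a quick sanity check before training. The following is a minimal sketch, not part of the original card: it assumes the CSV headers match the column names in the table exactly, that `step` starts at 1, and that `isFraud` is stored as 0/1. The helper names (`shard_names`, `step_to_day_hour`, `summarize`) and the two sample rows are made up for illustration. Note that 744 hourly steps correspond to 31 × 24 hours, so the last step falls on day 31 under this mapping.

```python
import csv
import io

# Column names as listed in the Data Structure table (assumed to match the CSV header).
EXPECTED_COLUMNS = [
    "step", "type", "amount", "nameOrig", "oldbalanceOrg", "newbalanceOrig",
    "nameDest", "oldbalanceDest", "newbalanceDest", "isFraud", "isFlaggedFraud",
]

NUM_PARTS = 14
ROWS_PER_PART = 1_500_000  # 14 * 1.5M = 21M rows, as stated in the card


def shard_names(num_parts=NUM_PARTS):
    """Build shard file names following the naming pattern in the file list."""
    return [
        f"Cifer-Fraud-Detection-Dataset-AF-part-{i}-{num_parts}.csv"
        for i in range(1, num_parts + 1)
    ]


def step_to_day_hour(step):
    """Map an hourly step to (day, hour of day); assumes steps are 1-based."""
    return (step - 1) // 24 + 1, (step - 1) % 24


def summarize(csv_file):
    """Check the header, then count rows and fraud labels in a single pass."""
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []
    missing = [c for c in EXPECTED_COLUMNS if c not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = fraud = 0
    for record in reader:
        rows += 1
        fraud += int(float(record["isFraud"]))
    rate = fraud / rows if rows else 0.0
    return {"rows": rows, "fraud": fraud, "fraud_rate": rate}


# Two invented rows, only to show the call shape; these are not real dataset values.
SAMPLE_TEXT = "\n".join([
    ",".join(EXPECTED_COLUMNS),
    "1,PAYMENT,100.0,C1,500.0,400.0,M1,0.0,0.0,0,0",
    "1,TRANSFER,250000.0,C2,250000.0,0.0,C3,0.0,250000.0,1,1",
]) + "\n"

if __name__ == "__main__":
    print(shard_names()[0])
    print(summarize(io.StringIO(SAMPLE_TEXT)))
```

For a real shard, pass an open file handle instead of the in-memory sample, for example `with open(shard_names()[0], newline="") as f: summarize(f)`. Streaming row by row keeps memory use flat on a 1.5M-row file.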
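The card lists federated learning simulation (training, evaluation, aggregation) as a use case but does not prescribe how the 14 files map to clients. A minimal sketch under stated assumptions: shards are dealt round-robin to a chosen number of client silos, and client model parameters are combined with a sample-size-weighted average in the style of FedAvg. `assign_shards` and `fedavg` are hypothetical helper names, not part of any Cifer tooling, and the parameter vectors below are invented.

```python
NUM_PARTS = 14
ROWS_PER_PART = 1_500_000  # row count per file, as stated in the card


def assign_shards(shards, num_clients):
    """Deal shards to clients round-robin so each client acts as one data silo."""
    clients = [[] for _ in range(num_clients)]
    for i, shard in enumerate(shards):
        clients[i % num_clients].append(shard)
    return clients


def fedavg(client_weights, client_sizes):
    """Average client parameter vectors, weighting each by its sample count."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
        for j in range(dim)
    ]


if __name__ == "__main__":
    shards = [
        f"Cifer-Fraud-Detection-Dataset-AF-part-{i}-{NUM_PARTS}.csv"
        for i in range(1, NUM_PARTS + 1)
    ]
    silos = assign_shards(shards, num_clients=4)
    sizes = [len(s) * ROWS_PER_PART for s in silos]
    # Invented two-parameter models, one per client, standing in for trained weights.
    local_models = [[0.1, 0.2], [0.3, 0.1], [0.2, 0.2], [0.4, 0.0]]
    print(sizes)
    print(fedavg(local_models, sizes))
```

Round-robin assignment gives near-equal silos; a skewed split (for example, grouping by `type` or by `step` range) may be a better stand-in for non-IID clients, depending on what the experiment is meant to stress.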
Provided by:
nithi060488