Synthetic Banking Transaction Dataset with Multi-Pattern Fraud Labels for Machine Learning Research

Indexed from NIAID Data Ecosystem on 2026-05-10
Download link:
https://data.mendeley.com/datasets/ktbthg777x
Description:
This dataset contains synthetically generated banking transaction records. It is designed for fraud detection research, algorithm development, and education. The records model realistic banking operations and embed several types of fraud pattern that reflect real-world financial fraud scenarios.

Key Features:
1. 1,000,000 transaction records with ground-truth fraud labels.
2. 100,000 unique synthetic customer profiles and 20,000 merchant entities.
3. Four embedded fraud pattern types: card testing, account takeover, money laundering rings, and geographic anomalies.
4. Fraud rate: ~0.5% (configurable).
5. Rich feature set: temporal, geographical, behavioral, and network attributes.
6. Data format: CSV and Parquet, UTF-8 encoded.
7. Timestamps: ISO 8601 format in UTC.
8. Fully synthetic data: no real personal information, so no privacy concerns.

Dataset Characteristics:
1. Temporal range: 1 year (configurable).
2. Scalable generation methodology: larger or smaller datasets can be generated.

Use Cases:
1. Development and benchmarking of fraud detection algorithms.
2. Research on imbalanced classification and graph-based fraud detection.
3. Temporal and sequential pattern analysis.
4. Geographic anomaly detection.
5. Teaching in machine learning, data science, and fintech courses.
6. Testing ML pipelines and feature engineering strategies.

Advantages:
1. Privacy-compliant: contains no real customer data, which avoids GDPR and PCI-DSS concerns.
2. Ground-truth labels: every transaction has a known label, enabling exact evaluation of model predictions.
3. Reproducible: a fixed, consistent dataset for comparative studies.
4. Multi-pattern fraud: algorithms can be tested against several fraud types at once.

Limitations:
1. Synthetic data only approximates real-world patterns; actual fraud scenarios may be more complex.
2. Geographic coordinates are randomized within realistic ranges and do not represent actual merchant locations.
3. All transactions are in USD.
4. Intended for research and educational purposes; production systems require validation on real-world data.

Potential Applications:
1. Fraud detection software testing for banks and fintech companies.
2. Algorithm benchmarking for machine learning competitions or research.
3. Simulation of financial networks to study systemic risk and money laundering.
4. Behavioral analytics to understand customer transaction patterns.
5. Training data for anomaly detection models in payment systems.

Acknowledgments:
Generated using the Python scientific stack (NumPy, Pandas) and the Faker library.
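The description specifies CSV files in UTF-8 with ISO 8601 UTC timestamps. Below is a minimal loading sketch that uses only the Python standard library. The listing does not document the schema, so the column names `transaction_id`, `timestamp`, `amount`, and `is_fraud` are hypothetical placeholders; replace them with the real header names. For the Parquet copy, `pandas.read_parquet` (with `pyarrow` or `fastparquet` installed) is the usual route.

```python
import csv
import io
from datetime import datetime, timezone


def parse_utc(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp into a timezone-aware UTC datetime."""
    # datetime.fromisoformat accepts a trailing "Z" only from Python 3.11,
    # so normalise it to "+00:00" for older interpreters.
    if ts.endswith("Z"):
        ts = ts[:-1] + "+00:00"
    dt = datetime.fromisoformat(ts)
    if dt.tzinfo is None:
        # Assumption: timestamps without an offset are UTC, as the listing states.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)


def load_transactions(f):
    """Read transaction rows from a CSV file object.

    The column names used here are hypothetical; replace them with the real header.
    """
    rows = []
    for rec in csv.DictReader(f):
        rows.append({
            "transaction_id": rec["transaction_id"],
            "timestamp": parse_utc(rec["timestamp"]),
            "amount": float(rec["amount"]),
            "is_fraud": rec["is_fraud"].strip().lower() in ("1", "true"),
        })
    return rows


# Tiny in-memory sample standing in for the real CSV file.
sample = io.StringIO(
    "transaction_id,timestamp,amount,is_fraud\n"
    "t1,2025-01-15T08:30:00Z,12.50,0\n"
    "t2,2025-01-15T09:00:00+00:00,1.00,1\n"
)
rows = load_transactions(sample)
```

For the real file, open it with `open(path, encoding="utf-8", newline="")` and pass the file object to `load_transactions`. The `newline=""` argument is what the `csv` module documentation recommends.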
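At a fraud rate of about 0.5%, roughly 1,000,000 × 0.005 = 5,000 records are fraudulent. A model that labels every transaction as legitimate therefore scores 99.5% accuracy while catching no fraud at all. The plain-Python sketch below makes this concrete with precision, recall, and F1. It also computes "balanced" class weights using the formula n_samples / (n_classes × class_count), the same one scikit-learn uses for `class_weight="balanced"`. It assumes the nominal 0.5% rate rather than the exact label counts in the files.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for boolean labels, where True means fraud."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t and p:
            tp += 1
        elif p:          # predicted fraud, actually legitimate
            fp += 1
        elif t:          # missed fraud
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1; undefined ratios are reported as 0.0."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def balanced_class_weights(y):
    """Weights n_samples / (n_classes * class_count), as in scikit-learn's "balanced" mode."""
    n = len(y)
    n_pos = sum(y)
    n_neg = n - n_pos
    return {False: n / (2 * n_neg), True: n / (2 * n_pos)}


n_total = 1_000_000
n_fraud = n_total * 5 // 1000  # 5,000 at the nominal ~0.5% rate
y_true = [True] * n_fraud + [False] * (n_total - n_fraud)
always_legit = [False] * n_total  # trivial baseline: never predict fraud

baseline = metrics(y_true, always_legit)
weights = balanced_class_weights(y_true)
```

Under these weights each fraudulent example counts 100 times, and each legitimate one counts about 0.5 times. Precision, recall, and F1 (or precision-recall curves) are generally more informative than accuracy at this level of imbalance.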
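Card testing typically shows up as a burst of small charges on one card within minutes, as a fraudster checks which stolen card numbers still work. One common velocity feature for this pattern is sketched below: for each transaction, count the small-amount transactions on the same card within a trailing time window. Several details are assumptions for illustration, not parameters of the dataset's generator:

- the field names `card_id`, `timestamp`, and `amount`;
- the 10-minute window;
- the $5.00 "small amount" cutoff.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def small_txn_velocity(txns, window=timedelta(minutes=10), small_amount=5.0):
    """For each transaction, count small-amount transactions on the same card
    in the trailing window that ends at that transaction.

    The current transaction is counted only if it is itself small.
    Field names and thresholds are hypothetical.
    """
    order = sorted(range(len(txns)), key=lambda i: txns[i]["timestamp"])
    recent = defaultdict(deque)  # card_id -> timestamps of recent small transactions
    counts = [0] * len(txns)
    for i in order:
        t = txns[i]
        q = recent[t["card_id"]]
        # Discard small transactions that are now older than the window.
        while q and t["timestamp"] - q[0] > window:
            q.popleft()
        if t["amount"] <= small_amount:
            q.append(t["timestamp"])
        counts[i] = len(q)
    return counts


base = datetime(2025, 3, 1, 12, 0)
txns = [
    # Five $1 charges on card c1, one minute apart: a card-testing-like burst.
    {"card_id": "c1", "timestamp": base + timedelta(minutes=m), "amount": 1.0}
    for m in range(5)
] + [
    {"card_id": "c2", "timestamp": base, "amount": 250.0},
    {"card_id": "c1", "timestamp": base + timedelta(hours=2), "amount": 1.0},
]
counts = small_txn_velocity(txns)
```

The counts are returned in the input order, so they can be attached back to the rows as a feature column.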
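For the geographic-anomaly pattern, a standard feature is "impossible travel": the speed a customer would need to move between the locations of two consecutive transactions. The sketch computes great-circle distance with the haversine formula (mean Earth radius 6,371 km) and flags speeds above a threshold. The `lat`, `lon`, and `timestamp` field names and the 900 km/h threshold (roughly airliner cruising speed) are assumptions. The dataset's coordinates are randomized within realistic ranges, so distances are meaningful only within the synthetic data, not for real merchant locations.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def impossible_travel(prev, curr, max_speed_kmh=900.0):
    """True if moving from prev to curr would need a speed above max_speed_kmh.

    Field names and the default threshold are hypothetical.
    """
    dist = haversine_km(prev["lat"], prev["lon"], curr["lat"], curr["lon"])
    hours = (curr["timestamp"] - prev["timestamp"]).total_seconds() / 3600
    if hours <= 0:
        # Simultaneous (or out-of-order) transactions: any distance is suspicious.
        return dist > 0
    return dist / hours > max_speed_kmh


nyc = {"lat": 40.7128, "lon": -74.0060, "timestamp": datetime(2025, 6, 1, 12, 0)}
london_1h = {"lat": 51.5074, "lon": -0.1278, "timestamp": datetime(2025, 6, 1, 13, 0)}
london_10h = {"lat": 51.5074, "lon": -0.1278, "timestamp": datetime(2025, 6, 1, 22, 0)}
```

New York to London is about 5,570 km. That distance is "impossible" within one hour, but at roughly 557 km/h it is plausible over ten hours.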
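Money laundering rings are commonly modeled as funds that circulate through a group of accounts and return to their origin (A → B → C → A). This sketch assumes the transfers can be expressed as directed (sender, receiver) edges, which the listing does not confirm. Under that assumption, every account in such a loop falls into one strongly connected component. The sketch finds these components with Tarjan's algorithm in plain Python. Because it is recursive, an iterative version or `networkx.strongly_connected_components` is safer on a graph the size of the full dataset. `candidate_rings` and its `min_size` parameter are hypothetical names.

```python
from collections import defaultdict


def strongly_connected_components(edges):
    """Tarjan's algorithm over directed (src, dst) edges; returns a list of components."""
    graph = defaultdict(list)
    nodes = set()
    for src, dst in edges:
        graph[src].append(dst)
        nodes.update((src, dst))

    index, lowlink = {}, {}
    stack, on_stack = [], set()
    components = []
    counter = [0]

    def visit(v):
        index[v] = lowlink[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph[v]:
            if w not in index:
                visit(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif w in on_stack:
                lowlink[v] = min(lowlink[v], index[w])
        if lowlink[v] == index[v]:
            # v is the root of a component: pop it and everything above it.
            comp = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.append(w)
                if w == v:
                    break
            components.append(comp)

    for v in sorted(nodes):
        if v not in index:
            visit(v)
    return components


def candidate_rings(edges, min_size=3):
    """Components with at least min_size accounts; each member can route money back to itself."""
    return [sorted(c) for c in strongly_connected_components(edges) if len(c) >= min_size]


transfers = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D"), ("E", "F")]
rings = candidate_rings(transfers)
```

A strongly connected component only shows that money *can* cycle among its accounts. Real ring detection would also look at amounts, timing, and how quickly funds move around the loop.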
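The listing describes the generator as scalable and configurable in fraud rate and time span, and the dataset as reproducible, but it does not publish the generator itself. The sketch below shows how a seeded `random.Random` instance can provide those properties. It is not the authors' generator, which used NumPy, Pandas, and Faker. It embeds no fraud patterns: labels are drawn independently at the requested rate. The start date, the lognormal amount distribution, and all function and field names are hypothetical.

```python
import random
from datetime import datetime, timedelta, timezone


def generate_transactions(n, fraud_rate=0.005, n_customers=100, days=365, seed=42):
    """Generate n toy transactions; the same arguments always give the same rows."""
    rng = random.Random(seed)  # private, seeded RNG for reproducibility
    start = datetime(2025, 1, 1, tzinfo=timezone.utc)  # hypothetical start date
    rows = []
    for _ in range(n):
        rows.append({
            "customer_id": f"C{rng.randrange(n_customers):06d}",
            "timestamp": start + timedelta(seconds=rng.randrange(days * 86_400)),
            "amount": round(rng.lognormvariate(3.5, 1.0), 2),  # assumed skewed amounts
            "is_fraud": rng.random() < fraud_rate,  # independent label, no pattern
        })
    rows.sort(key=lambda r: r["timestamp"])
    for i, row in enumerate(rows):
        row["transaction_id"] = f"T{i:08d}"  # IDs assigned in time order
    return rows


small = generate_transactions(10_000, seed=7)
```

Scaling up only changes `n`, and `fraud_rate` and `days` cover the other two configurable settings. `row["timestamp"].isoformat()` yields the ISO 8601 UTC form (`...+00:00`) that the listing uses.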
Created:
2025-11-03