five

RajgauriBhilare/nigerian-banking-retail-transactions

收藏
Hugging Face2026-01-27 更新2026-03-29 收录
下载链接:
https://hf-mirror.com/datasets/RajgauriBhilare/nigerian-banking-retail-transactions
下载链接
链接失效反馈
官方服务:
资源简介:
--- license: apache-2.0 task_categories: - tabular-classification - tabular-regression tags: - nigeria - banking - fraud-detection - finance - transactions - fintech language: - en size_categories: - 1M<n<10M --- # Nigerian Retail Banking Transactions (CASA) **Dataset Type**: Banking & Finance **Version**: 1.0 **License**: Apache 2.0 **Language**: English **Geography**: Nigeria --- ## Dataset Description This dataset contains synthetic Nigerian retail banking transactions (Current and Savings Accounts) with fraud detection labels. It includes realistic patterns from Nigerian banking behavior, including mobile banking dominance, POS usage, and state-specific transaction patterns. **Use Cases**: - Fraud detection and prevention - Transaction anomaly detection - Customer segmentation - Channel optimization - Risk modeling --- ## Dataset Statistics - **Rows**: 5,000,000 (pilot: 10,000) - **Columns**: 16 - **Time Range**: 2023-01-01 to 2024-12-31 - **Geography**: 37 Nigerian states - **Fraud Prevalence**: 0.8% --- ## Schema | Column | Type | Description | |--------|------|-------------| | `transaction_id` | string | Unique transaction identifier (UUID) | | `account_id` | string | Account number (NUBAN format: ACC-########) | | `customer_id` | string | Customer identifier (CUS-########) | | `timestamp` | datetime | Transaction timestamp (Africa/Lagos timezone) | | `amount_ngn` | float | Transaction amount in Nigerian Naira | | `balance_before_ngn` | float | Account balance before transaction | | `balance_after_ngn` | float | Account balance after transaction | | `transaction_type` | category | `debit` or `credit` | | `channel` | category | Transaction channel: `mobile`, `pos`, `atm`, `web`, `branch`, `ussd`, `agent` | | `merchant_category_code` | string | ISO 18245 MCC code (4 digits) | | `merchant_name` | string | Nigerian merchant name | | `location_lga` | string | Local Government Area | | `location_state` | category | Nigerian state (37 states + FCT) | | `device_id` | string | Device fingerprint (SHA256 hash, for digital channels) | | `status` | category | `success`, `failed`, `pending`, `reversed` | | `fraud_flag` | bool | **LABEL**: True if fraudulent transaction | --- ## Label Distribution ### Fraud Flag - **Positive (fraud)**: 0.8% - **Negative (legitimate)**: 99.2% **Fraud Drivers**: - Night-time transactions (22:00-06:00) - High transaction velocity (>10 txns/24h) - New devices (<7 days old) - Amount anomalies (>3 std devs from customer average) - International transactions --- ## Data Distributions ### Channel Distribution | Channel | Percentage | Description | |---------|------------|-------------| | Mobile | 35% | Mobile banking apps | | POS | 30% | Point of Sale terminals | | ATM | 20% | Automated Teller Machines | | Web | 10% | Internet banking | | Branch | 3% | Physical branch | | USSD | 1.5% | *99# codes | | Agent | 0.5% | Banking agents | ### Transaction Amount - **Distribution**: Lognormal (μ=9.5, σ=1.8) - **Range**: ₦100 - ₦5,000,000 - **Median**: ~₦13,000 - **Mean**: ~₦64,000 ### Status Distribution | Status | Percentage | |--------|------------| | Success | 92% | | Failed | 7% | | Pending | 0.8% | | Reversed | 0.2% | ### Geographic Distribution **Top 5 States by Transaction Volume**: 1. Lagos - 25% 2. Abuja (FCT) - 12% 3. Rivers - 8% 4. Kano - 7% 5. Oyo - 5% --- ## Nigerian Context ### Banks Represented - **Tier 1**: Access, First Bank, GTBank, UBA, Zenith - **Tier 2**: Fidelity, Union, Sterling, Stanbic IBTC, Ecobank - **Digital**: Kuda, Carbon, FairMoney, ALAT ### Merchants - **Groceries**: Shoprite, Spar, Ebeano, Grand Square - **Fuel Stations**: Total, Mobil, Oando, Conoil, NNPC - **Telecom**: MTN, Airtel, Glo, 9mobile - **Utilities**: EKEDC, IKEDC, AEDC, PHEDC, EEDC - **Entertainment**: DSTV, GOtv, Startimes - **E-commerce**: Jumia, Konga, Jiji ### Temporal Patterns - **Salary Day Effect**: 2x transaction volume on last Friday of month - **Peak Hours**: 18:00-20:00 (evening after work) - **Friday Spike**: +17% compared to other weekdays - **January Lull**: -40% due to post-December spending --- ## Realism Features ### Behavioral Patterns ✅ **Family Transfers**: 65% of customers send money to 3-5 recipients regularly ✅ **Religious Giving**: Christian tithe (10%) and Muslim zakat (2.5%) ✅ **Airtime Purchases**: Small frequent transactions (₦200-₦1,500) ✅ **Bill Payments**: Regular utilities, cable TV, internet ✅ **Transaction Splitting**: Breaking large amounts into smaller ones ### Economic Context ✅ **Naira Devaluation**: Exchange rate variations reflected ✅ **Fuel Subsidy Removal**: Increased fuel station transactions ✅ **Cashless Policy**: High digital channel adoption ✅ **Salary Delays**: Reduced spending in affected states ✅ **Harvest Seasons**: Agricultural areas show seasonal patterns ### Cultural Events ✅ **Detty December**: +120% spending in December (Dec 20-31 peak) ✅ **Owambe Season**: Wedding/party spending (Mar-May, Oct-Nov) ✅ **Ramadan/Eid**: Increased spending patterns during Islamic holidays ✅ **Back to School**: September spending surge --- ## Data Quality ### Integrity Checks ✅ Zero null values in required fields ✅ Balance equations verified (balance_after = balance_before ± amount ± fees) ✅ Temporal ordering maintained ✅ No orphan records ✅ Referential integrity (customer → account → transaction) ### Validation Results - **Schema Compliance**: 100% - **Distribution Accuracy**: 99.5% - **Label Balance**: 100% (exact 0.8%) - **Nigerian Context**: 100% authentic - **Overall Quality Score**: 99.9% --- ## Files in This Dataset ``` retail_transactions/ ├── README.md (this file) ├── nigerian_retail_transactions_pilot.parquet (10k rows, 0.81 MB) ├── nigerian_retail_transactions.parquet (5M rows, ~400 MB) - Coming Soon ├── nigerian_retail_transactions.csv (5M rows, ~800 MB) - Coming Soon └── retail_transactions_sample.csv (100 rows, viewer sample) ``` --- ## Usage Example ### Load Dataset ```python import pandas as pd # Load full dataset (Parquet - recommended) df = pd.read_parquet('nigerian_retail_transactions.parquet') # Or load CSV df = pd.read_csv('nigerian_retail_transactions.csv') # Load pilot (for quick testing) df_pilot = pd.read_parquet('nigerian_retail_transactions_pilot.parquet') ``` ### Basic Fraud Detection ```python from sklearn.model_selection import train_test_split from sklearn.ensemble import RandomForestClassifier # Prepare features features = ['amount_ngn', 'hour', 'day_of_week', 'channel_encoded', 'state_encoded', 'merchant_category_encoded'] X = df[features] y = df['fraud_flag'] # Split data X_train, X_test, y_train, y_test = train_test_split( X, y, test_size=0.2, stratify=y, random_state=42 ) # Train model clf = RandomForestClassifier(n_estimators=100, random_state=42) clf.fit(X_train, y_train) # Evaluate from sklearn.metrics import classification_report, roc_auc_score y_pred = clf.predict(X_test) print(classification_report(y_test, y_pred)) print(f"ROC-AUC: {roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]):.4f}") ``` --- ## Limitations 1. **Synthetic Data**: Generated data, not real banking transactions 2. **Simplified Relationships**: Some customer behaviors simplified for generation 3. **Time Period**: Limited to 2023-2024 4. **Balance Tracking**: Sequential balance calculation simplified in pilot 5. **Fraud Patterns**: Based on known patterns, may not capture all fraud types --- ## Citation ```bibtex @dataset{nigerian_retail_transactions_2025, author = {Electric Sheep Africa}, title = {Nigerian Retail Banking Transactions Dataset}, year = {2025}, publisher = {Hugging Face}, url = {https://huggingface.co/datasets/electricsheepafrica/nigerian-retail-transactions} } ``` --- ## License Apache 2.0 - Free to use for commercial and non-commercial purposes. --- ## Contact - **Organization**: Electric Sheep Africa - **Dataset Maintainer**: Banking Data Team - **Issues**: Report on GitHub or Hugging Face discussions --- ## Related Datasets - [Nigerian Card Transactions](../card_transactions/) - [Nigerian Personal Loans](../personal_loans/) - [Nigerian Mobile Money](../mobile_money/) - [Nigerian Customer 360](../customer_360/) --- **Last Updated**: 2025-10-19 **Status**: ✅ Pilot Validated, Full Dataset In Progress
提供机构:
RajgauriBhilare
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作