
electricsheepafrica/electric-sheep-credit

Source: Hugging Face. Updated 2026-03-21. Indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/electricsheepafrica/electric-sheep-credit
Resource description:
---
annotations_creators:
- machine-generated
language:
- en
license:
- mit
multilinguality:
- monolingual
pretty_name: Electric Sheep Alternative Credit Data for Thin-File Users
size_categories:
- 100<n<1000
source_datasets:
- original
tags:
- credit-scoring
- financial-inclusion
- thin-file
- synthetic-data
- behavioral-finance
- alternative-data
- tabular
- fintech
- machine-learning
task_categories:
- tabular-classification
- tabular-regression
configs:
- config_name: US
  data_files: "US/*.parquet"
- config_name: NG
  data_files: "NG/*.parquet"
- config_name: IN
  data_files: "IN/*.parquet"
- config_name: BR
  data_files: "BR/*.parquet"
---

# Electric Sheep: Alternative Credit Data for Thin-File Users

## Dataset Summary

The **Electric Sheep Alternative Credit Dataset** is a synthetic dataset for modeling creditworthiness using behavioral financial data rather than traditional credit history. It targets **thin-file users**: individuals excluded from conventional credit scoring due to insufficient loan or credit card history.

**Four culturally authentic regions:**

| Region | Currency | Dominant Channel | Median Income |
|--------|----------|------------------|---------------|
| 🇺🇸 **US** (United States) | USD | POS / Bank | ~$17,400 |
| 🇳🇬 **NG** (Nigeria) | NGN | Mobile Money | ~₦2,400 |
| 🇮🇳 **IN** (India) | INR | UPI | ~₹1,900 |
| 🇧🇷 **BR** (Brazil) | BRL | PIX / Bank | ~R$3,500 |

## Files Per Region

Each region contains **2 files** in `{region}/`:

| File | Description |
|------|-------------|
| `credit_profiles.parquet` | User-level profiles with demographics, 19 behavioral features, and credit labels (1 row per user, 30 columns) |
| `metadata.json` | Generation statistics |

**Raw transactions** are in `transactions/{region}.parquet` (separate download, ~900 transactions per user).
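
The file layout described above can be sketched as a small path helper. This is only an illustration under assumptions: `region_files` and the local directory name `electric-sheep-credit` are hypothetical and are not part of the dataset or of any library; the paths themselves follow the layout stated in this section.

```python
from pathlib import Path

# Region codes listed in the dataset card.
REGIONS = ("US", "NG", "IN", "BR")


def region_files(root, region):
    """Return the expected file paths for one region.

    `root` is a hypothetical local copy of the dataset repository.
    """
    if region not in REGIONS:
        raise ValueError(f"unknown region: {region!r}")
    root = Path(root)
    return {
        "profiles": root / region / "credit_profiles.parquet",
        "metadata": root / region / "metadata.json",
        # Raw transactions live outside the region folder.
        "transactions": root / "transactions" / f"{region}.parquet",
    }


# Assumed local directory name, for illustration only.
files = region_files("electric-sheep-credit", "NG")
for name, path in files.items():
    print(name, path.as_posix())
```

Keeping the transactions path separate from the region folder mirrors the note that raw transactions are a separate download.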

## Loading the Dataset

```python
from datasets import load_dataset

# Load the US region (one config per region: US, NG, IN, BR)
ds = load_dataset("electricsheepafrica/electric-sheep-credit", name="US")
profiles = ds["train"]  # credit_profiles.parquet
print(profiles[0])
```

Or with pandas:

```python
import pandas as pd

# Paths are relative to a local copy of the repository.
profiles = pd.read_parquet("US/credit_profiles.parquet")
# Raw transactions are a separate download under transactions/{region}.parquet
transactions = pd.read_parquet("transactions/US.parquet")
```

## Credit Profiles Schema (30 columns)

**Demographics:**

- `user_id`: Unique identifier (`US_000000`)
- `age_range`: `18-25`, `26-35`, `36-45`, `46-55`, `55+`
- `income_type`: `salary`, `gig`, `business`, `unemployed`
- `region`: `US`, `NG`, `IN`, `BR`
- `currency`: `USD`, `NGN`, `INR`, `BRL`
- `account_tenure_days`: Length of financial activity
- `primary_archetype`: Behavioral pattern (stable_salaried, gig_worker, etc.)
- `secondary_archetype`: Secondary pattern

**Behavioral Features (19):**

- `avg_monthly_income`: Mean monthly inflow
- `income_volatility`: Coefficient of variation of income
- `income_frequency`: Income events per month
- `income_trend`: Linear trend (positive = growing)
- `income_gap_months`: Months with zero income
- `avg_monthly_spend`: Mean monthly outflow
- `spending_volatility`: Variability of spending
- `essential_spend_ratio`: Fraction on food, transport, bills, healthcare
- `discretionary_spend_ratio`: Fraction on non-essentials
- `betting_ratio`: Fraction on betting/gambling
- `avg_balance`: Mean balance
- `min_balance`: Lowest balance reached
- `max_balance`: Highest balance reached
- `overdraft_frequency`: Fraction of transactions with negative balance
- `bill_payment_consistency`: Regularity of bills (0–1)
- `recurring_expense_ratio`: Fraction of spending that recurs
- `transaction_frequency`: Transactions per day
- `cashflow_stability_score`: Composite stability score (0–1)
- `risk_behavior_score`: Composite risk indicator (0–1)

**Credit Labels:**

- `credit_outcome`: `good`, `bad`, `indeterminate`
- `default_probability`: Estimated default risk (0–1)
- `risk_bucket`: `low`, `medium`, `high`

## Transactions Schema

| Column | Type | Description |
|--------|------|-------------|
| transaction_id | string | UUID |
| user_id | string | Foreign key |
| timestamp | datetime | Transaction time |
| amount | float | Signed: `+income`, `-expense` |
| transaction_type | string | `credit` or `debit` |
| category | string | `food`, `transport`, `bills`, `entertainment`, `betting`, `transfer`, `savings`, `healthcare`, `other` |
| channel | string | `bank`, `cash`, `POS`, `mobile_money` |
| merchant_type | string | Region-specific merchant |
| balance_estimate | float | Running balance |
| is_recurring | bool | Recurring payment flag |
| counterparty | string | Transfer partner (if applicable) |
| corridor | string | Remittance path (e.g., `US→NG`) |

## Feature Correlations with Creditworthiness

| Feature | US | NG | IN | BR | Interpretation |
|---------|----|----|----|----|----------------|
| `overdraft_frequency` | −0.73*** | −0.84*** | −0.80*** | −0.74*** | Strongest negative |
| `cashflow_stability_score` | +0.58*** | +0.70*** | +0.71*** | +0.66*** | Strongest positive |
| `betting_ratio` | −0.36* | −0.60*** | −0.65*** | −0.60*** | Significant negative |
| `avg_balance` | +0.49** | +0.56** | +0.40* | +0.48** | Liquidity buffer |
| `bill_payment_consistency` | +0.50** | +0.50** | +0.38* | +0.37* | Reliability signal |

## Label Distribution (V2, Simulated Loans)

| Region | Good | Bad | Indeterminate | Default Rate |
|--------|------|-----|---------------|--------------|
| US | 63% | 20% | 17% | 0.22 |
| NG | 47% | 37% | 16% | 0.32 |
| IN | 60% | 30% | 10% | 0.25 |
| BR | 47% | 33% | 20% | 0.30 |

## Use Cases

- Credit scoring model development for thin-file populations
- Benchmarking alternative data approaches
- Financial inclusion research
- Fairness and bias testing
- Cross-regional behavioral finance studies

## Limitations

- Synthetic data (not from real financial institutions)
- Simulated credit outcomes
- Simplified temporal patterns
- Approximate regional income levels

## Citation

```bibtex
@dataset{electric_sheep_2026,
  title={Electric Sheep: Alternative Credit Data for Thin-File Users},
  author={ElectricSheepAfrica},
  year={2026},
  license={MIT},
  url={https://huggingface.co/datasets/electricsheepafrica/electric-sheep-credit}
}
```

## License

MIT: free for research and commercial use.
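
## Example: Deriving Behavioral Features from Transactions

The schema describes several behavioral features in words (for example, `income_volatility` as a coefficient of variation and `overdraft_frequency` as the fraction of transactions with a negative balance). The sketch below shows one way those descriptions could be turned into code on a toy list of transactions. It is not the dataset's generation code: the exact formulas used by the authors are not given, so the use of population standard deviation, calendar-month grouping, ISO date strings for `timestamp`, and the helper names `monthly_income` and `behavioral_features` are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def monthly_income(transactions):
    """Sum positive amounts per calendar month, keyed by 'YYYY-MM'."""
    totals = {}
    for tx in transactions:
        if tx["amount"] > 0:
            month = tx["timestamp"][:7]  # assumes ISO date strings
            totals[month] = totals.get(month, 0.0) + tx["amount"]
    return totals


def behavioral_features(transactions):
    """Compute a few profile features following the schema descriptions."""
    income = list(monthly_income(transactions).values())
    avg_income = mean(income) if income else 0.0

    # Expenses are stored as negative amounts; flip the sign to get spend.
    total_spend = sum(-tx["amount"] for tx in transactions if tx["amount"] < 0)
    betting_spend = sum(
        -tx["amount"]
        for tx in transactions
        if tx["amount"] < 0 and tx["category"] == "betting"
    )
    negative = sum(1 for tx in transactions if tx["balance_estimate"] < 0)

    return {
        "avg_monthly_income": avg_income,
        # Coefficient of variation; population std is an assumption.
        "income_volatility": pstdev(income) / avg_income if avg_income else 0.0,
        "betting_ratio": betting_spend / total_spend if total_spend else 0.0,
        "overdraft_frequency": negative / len(transactions) if transactions else 0.0,
    }


# Toy data using column names from the transactions schema (values invented).
toy = [
    {"timestamp": "2026-01-05", "amount": 1000.0, "category": "transfer", "balance_estimate": 1000.0},
    {"timestamp": "2026-01-10", "amount": -200.0, "category": "food", "balance_estimate": 800.0},
    {"timestamp": "2026-01-20", "amount": -900.0, "category": "bills", "balance_estimate": -100.0},
    {"timestamp": "2026-02-05", "amount": 3000.0, "category": "transfer", "balance_estimate": 2900.0},
    {"timestamp": "2026-02-12", "amount": -100.0, "category": "betting", "balance_estimate": 2800.0},
]

features = behavioral_features(toy)
print(features)
```

On real data the same logic would typically be applied per `user_id`; values computed this way may differ from the shipped `credit_profiles.parquet` columns if the authors used different definitions.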
Provider:
electricsheepafrica