electricsheepafrica/electric-sheep-credit
Source: Hugging Face. Updated 2026-03-21; indexed 2026-03-29.
Download link: https://hf-mirror.com/datasets/electricsheepafrica/electric-sheep-credit
---
annotations_creators:
- machine-generated
language:
- en
license:
- mit
multilinguality:
- monolingual
pretty_name: Electric Sheep Alternative Credit Data for Thin-File Users
size_categories:
- 100<n<1000
source_datasets:
- original
tags:
- credit-scoring
- financial-inclusion
- thin-file
- synthetic-data
- behavioral-finance
- alternative-data
- tabular
- fintech
- machine-learning
task_categories:
- tabular-classification
- tabular-regression
configs:
- config_name: US
  data_files: "US/*.parquet"
- config_name: NG
  data_files: "NG/*.parquet"
- config_name: IN
  data_files: "IN/*.parquet"
- config_name: BR
  data_files: "BR/*.parquet"
---
# Electric Sheep — Alternative Credit Data for Thin-File Users
## Dataset Summary
The **Electric Sheep Alternative Credit Dataset** is a synthetic dataset for modeling creditworthiness using behavioral financial data rather than traditional credit history. It targets **thin-file users** — individuals excluded from conventional credit scoring due to insufficient loan or credit card history.
**Four regions, each with its own currency, dominant payment channel, and income level:**
| Region | Currency | Dominant Channel | Median Income |
|--------|----------|------------------|---------------|
| 🇺🇸 **US** (United States) | USD | POS / Bank | ~$17,400 |
| 🇳🇬 **NG** (Nigeria) | NGN | Mobile Money | ~₦2,400 |
| 🇮🇳 **IN** (India) | INR | UPI | ~₹1,900 |
| 🇧🇷 **BR** (Brazil) | BRL | PIX / Bank | ~R$3,500 |
## Files Per Region
Each region contains **2 files** in `{region}/`:
| File | Description |
|------|-------------|
| `credit_profiles.parquet` | User-level profiles with demographics, 19 behavioral features, and credit labels (1 row per user, 30 columns) |
| `metadata.json` | Generation statistics |
**Raw transactions** are in `transactions/{region}.parquet` (separate download, ~900 transactions per user).
## Loading the Dataset
```python
from datasets import load_dataset
# Load US region
ds = load_dataset("electricsheepafrica/electric-sheep-credit", name="US")
profiles = ds["train"] # credit_profiles.parquet
print(profiles[0])
```
Or with pandas:
```python
import pandas as pd
profiles = pd.read_parquet("US/credit_profiles.parquet")
transactions = pd.read_parquet("transactions/US.parquet")  # raw transactions are stored outside the region folder
```
## Credit Profiles Schema (30 columns)
**Demographics:**
- `user_id` — Unique identifier (e.g., `US_000000`)
- `age_range` — `18-25`, `26-35`, `36-45`, `46-55`, `55+`
- `income_type` — `salary`, `gig`, `business`, `unemployed`
- `region` — `US`, `NG`, `IN`, `BR`
- `currency` — `USD`, `NGN`, `INR`, `BRL`
- `account_tenure_days` — Length of financial activity
- `primary_archetype` — Behavioral pattern (stable_salaried, gig_worker, etc.)
- `secondary_archetype` — Secondary pattern
**Behavioral Features (19):**
- `avg_monthly_income` — Mean monthly inflow
- `income_volatility` — Coefficient of variation of income
- `income_frequency` — Income events per month
- `income_trend` — Linear trend (positive = growing)
- `income_gap_months` — Months with zero income
- `avg_monthly_spend` — Mean monthly outflow
- `spending_volatility` — Variability of spending
- `essential_spend_ratio` — Fraction on food, transport, bills, healthcare
- `discretionary_spend_ratio` — Fraction on non-essentials
- `betting_ratio` — Fraction on betting/gambling
- `avg_balance` — Mean balance
- `min_balance` — Lowest balance reached
- `max_balance` — Highest balance reached
- `overdraft_frequency` — Fraction of transactions with negative balance
- `bill_payment_consistency` — Regularity of bill payments (0–1)
- `recurring_expense_ratio` — Fraction of spending that recurs
- `transaction_frequency` — Transactions per day
- `cashflow_stability_score` — Composite stability score (0–1)
- `risk_behavior_score` — Composite risk indicator (0–1)
**Credit Labels:**
- `credit_outcome` — `good`, `bad`, `indeterminate`
- `default_probability` — Estimated default risk (0–1)
- `risk_bucket` — `low`, `medium`, `high`
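The label columns usually need light preparation before they can be used for `tabular-classification`. The sketch below shows one possible approach, not a prescribed one: it assumes `indeterminate` rows are treated as unlabeled and dropped, encodes `bad` as the positive class, and makes `risk_bucket` an ordered categorical. The toy rows and the `target` column name are made up for illustration.

```python
import pandas as pd

# Toy profiles with the three label columns (values are illustrative only).
profiles = pd.DataFrame({
    "user_id": ["US_000000", "US_000001", "US_000002", "US_000003"],
    "credit_outcome": ["good", "bad", "indeterminate", "good"],
    "default_probability": [0.05, 0.80, 0.45, 0.15],
    "risk_bucket": ["low", "high", "medium", "low"],
})

# Assumption: treat `indeterminate` as unlabeled and exclude it from training.
labeled = profiles[profiles["credit_outcome"] != "indeterminate"].copy()

# Binary target (hypothetical column name): 1 = bad, 0 = good.
labeled["target"] = (labeled["credit_outcome"] == "bad").astype(int)

# Ordered categorical so that low < medium < high is preserved.
labeled["risk_bucket"] = pd.Categorical(
    labeled["risk_bucket"], categories=["low", "medium", "high"], ordered=True
)

print(labeled[["user_id", "target", "risk_bucket"]])
```

For `tabular-regression`, `default_probability` can be used directly as the target without dropping any rows.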
## Transactions Schema
| Column | Type | Description |
|--------|------|-------------|
| transaction_id | string | UUID |
| user_id | string | Foreign key |
| timestamp | datetime | Transaction time |
| amount | float | Signed amount: positive for income, negative for expenses |
| transaction_type | string | `credit` or `debit` |
| category | string | `food`, `transport`, `bills`, `entertainment`, `betting`, `transfer`, `savings`, `healthcare`, `other` |
| channel | string | `bank`, `cash`, `POS`, `mobile_money` |
| merchant_type | string | Region-specific merchant |
| balance_estimate | float | Running balance |
| is_recurring | bool | Recurring payment flag |
| counterparty | string | Transfer partner (if applicable) |
| corridor | string | Remittance path (e.g., `US→NG`) |
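The behavioral features in the profile schema are aggregates over transactions of this shape. The card does not publish the exact formulas, so the following is only a minimal sketch under assumed definitions: `betting_ratio` as betting spend over total spend, `overdraft_frequency` as the share of transactions with a negative `balance_estimate`, and `income_volatility` as the coefficient of variation of monthly inflows. The function name `sketch_features` and the toy rows are hypothetical.

```python
import pandas as pd

# Toy transactions following the schema above (values are illustrative only).
tx = pd.DataFrame({
    "user_id": ["US_000000"] * 6,
    "timestamp": pd.to_datetime([
        "2025-01-05", "2025-01-10", "2025-01-20",
        "2025-02-05", "2025-02-12", "2025-02-25",
    ]),
    "amount": [1000.0, -300.0, -100.0, 1500.0, -200.0, -2000.0],
    "category": ["other", "food", "betting", "other", "bills", "food"],
    "balance_estimate": [1000.0, 700.0, 600.0, 2100.0, 1900.0, -100.0],
})

def sketch_features(user_tx: pd.DataFrame) -> pd.Series:
    """Assumed feature definitions; the dataset's own formulas may differ."""
    income = user_tx[user_tx["amount"] > 0]
    spend = user_tx[user_tx["amount"] < 0]
    total_spend = spend["amount"].abs().sum()

    # Monthly inflow totals, used for the mean and coefficient of variation.
    months = income["timestamp"].dt.to_period("M")
    monthly_income = income.groupby(months)["amount"].sum()

    betting_spend = spend.loc[spend["category"] == "betting", "amount"].abs().sum()
    return pd.Series({
        "avg_monthly_income": monthly_income.mean(),
        "income_volatility": monthly_income.std(ddof=0) / monthly_income.mean(),
        "betting_ratio": betting_spend / total_spend if total_spend else 0.0,
        "overdraft_frequency": (user_tx["balance_estimate"] < 0).mean(),
    })

# One row of features per user.
features = pd.DataFrame(
    {uid: sketch_features(group) for uid, group in tx.groupby("user_id")}
).T
print(features.round(3))
```

Values computed this way are not guaranteed to match the published `credit_profiles.parquet` columns, since the definitions here are assumptions.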
## Feature Correlations with Creditworthiness
| Feature | US | NG | IN | BR | Interpretation |
|---------|----|----|----|----|----------------|
| `overdraft_frequency` | −0.73*** | −0.84*** | −0.80*** | −0.74*** | Strongest negative |
| `cashflow_stability_score` | +0.58*** | +0.70*** | +0.71*** | +0.66*** | Strongest positive |
| `betting_ratio` | −0.36* | −0.60*** | −0.65*** | −0.60*** | Significant negative |
| `avg_balance` | +0.49** | +0.56** | +0.40* | +0.48** | Liquidity buffer |
| `bill_payment_consistency` | +0.50** | +0.50** | +0.38* | +0.37* | Reliability signal |
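The card does not state how "creditworthiness" is measured or which correlation coefficient produced the table. As a rough sketch, the following assumes creditworthiness is `1 - default_probability` and uses Pearson correlation via pandas. The toy values are invented and will not reproduce the figures above.

```python
import pandas as pd

# Toy profile rows (values are illustrative only).
profiles = pd.DataFrame({
    "overdraft_frequency": [0.00, 0.05, 0.10, 0.20, 0.30],
    "cashflow_stability_score": [0.9, 0.8, 0.6, 0.4, 0.3],
    "default_probability": [0.05, 0.10, 0.30, 0.55, 0.70],
})

# Assumption: higher creditworthiness means lower default probability.
creditworthiness = 1.0 - profiles["default_probability"]

feature_cols = ["overdraft_frequency", "cashflow_stability_score"]

# Pearson correlation of each feature column with the assumed target.
correlations = profiles[feature_cols].corrwith(creditworthiness)
print(correlations.round(2))
```

Running the same calculation per region (one `name=` config at a time) would give a table with the same layout as the one above.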
## Label Distribution (V2 — Simulated Loans)
| Region | Good | Bad | Indeterminate | Default Rate |
|--------|------|-----|---------------|--------------|
| US | 63% | 20% | 17% | 0.22 |
| NG | 47% | 37% | 16% | 0.32 |
| IN | 60% | 30% | 10% | 0.25 |
| BR | 47% | 33% | 20% | 0.30 |
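Class shares like these can be recomputed from the `region` and `credit_outcome` columns of the profiles. A minimal sketch using `pd.crosstab` on invented rows follows; it covers only the Good/Bad/Indeterminate shares, since the card does not say how the Default Rate column is derived.

```python
import pandas as pd

# Toy profile rows (values are illustrative only).
profiles = pd.DataFrame({
    "region": ["US", "US", "US", "US", "NG", "NG", "NG", "NG"],
    "credit_outcome": [
        "good", "good", "bad", "indeterminate",
        "good", "bad", "bad", "indeterminate",
    ],
})

# Share of each outcome within each region (each row sums to 1).
distribution = pd.crosstab(
    profiles["region"], profiles["credit_outcome"], normalize="index"
)
print(distribution)
```

Checking these shares before training helps decide whether class weighting or resampling is needed, particularly for regions with a higher share of `bad` outcomes.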
## Use Cases
- Credit scoring model development for thin-file populations
- Benchmarking alternative data approaches
- Financial inclusion research
- Fairness and bias testing
- Cross-regional behavioral finance studies
## Limitations
- Synthetic data (not from real financial institutions)
- Simulated credit outcomes
- Simplified temporal patterns
- Approximate regional income levels
## Citation
```bibtex
@dataset{electric_sheep_2026,
title={Electric Sheep: Alternative Credit Data for Thin-File Users},
author={ElectricSheepAfrica},
year={2026},
license={MIT},
url={https://huggingface.co/datasets/electricsheepafrica/electric-sheep-credit}
}
```
## License
MIT — free for research and commercial use.
Provider: electricsheepafrica



