credit_scoring_datatset
ModelScope community · Updated 2025-12-05 · Indexed 2025-09-20
Download link:
https://modelscope.cn/datasets/syncora/credit_scoring_datatset
Description:
# 🏦 Synthetic Credit Scoring Dataset — Powered by Syncora
🌐 Official Website: [Syncora.ai](https://syncora.ai)
High-fidelity synthetic financial behavior dataset for **AI, ML modeling & LLM training**.
---
## Dataset Summary
This dataset contains **synthetic financial records** simulating customer behavior in a credit scoring context.
Generated with **Syncora.ai**, it provides **privacy-safe, realistic data** while preserving statistical fidelity.
Key applications:
- Credit risk modeling
- Machine learning classification
- Feature engineering for financial AI
- **Dataset for LLM training** (tabular-to-text, reasoning with structured finance data)
- Educational use in data science courses
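
As a rough illustration of the tabular-to-text application listed above, the sketch below renders one record as a prompt/completion pair. The function name `record_to_example`, the sentence template, and the sample record are all hypothetical, and the code assumes `DEFAULT` is a binary label (1 = defaulted), which the card does not state explicitly.

```python
def record_to_example(record: dict) -> dict:
    """Turn one tabular record into a prompt/completion pair for text training."""
    prompt = (
        f"Customer {record['CUST_ID']} has income {record['INCOME']}, "
        f"savings {record['SAVINGS']}, debt {record['DEBT']}, "
        f"and a credit score of {record['CREDIT_SCORE']}. "
        "Is this customer likely to default?"
    )
    # Assumption: DEFAULT == 1 means the customer defaulted.
    completion = "Default: yes" if record["DEFAULT"] == 1 else "Default: no"
    return {"prompt": prompt, "completion": completion}


# Invented record for illustration only; not taken from the dataset.
sample = {
    "CUST_ID": "C0002",
    "INCOME": 31000,
    "SAVINGS": 500,
    "DEBT": 27900,
    "CREDIT_SCORE": 480,
    "DEFAULT": 1,
}

example = record_to_example(sample)
print(example["prompt"])
print(example["completion"])
```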
---
## 📊 Dataset Info
| Field | Details |
|-------------------|-------------------------------------------------------------------------|
| **Features** | - `CUST_ID` (string) <br> - `INCOME` (int32) <br> - `SAVINGS` (int32) <br> - `DEBT` (int32) <br> - `CREDIT_SCORE` (int32) <br> - `DEFAULT` (int32) |
| **Task Categories** | - Tabular Classification <br> - Financial Risk Modeling |
| **License** | Apache-2.0 |
| **Size Category** | 10K < n < 100K |
Format: CSV, ~20K synthetic records.
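
The schema above can be mirrored in code when reading the CSV. The following is a minimal sketch using only the Python standard library. It assumes the CSV header row uses exactly the feature names from the table; the `CreditRecord` class, the `parse_records` helper, and the sample rows are hypothetical and not part of the dataset.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class CreditRecord:
    cust_id: str
    income: int
    savings: int
    debt: int
    credit_score: int
    default: int  # assumed binary label: 1 = defaulted, 0 = did not


def parse_records(csv_text: str) -> list[CreditRecord]:
    """Parse CSV text whose header matches the feature names in the table."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        CreditRecord(
            cust_id=row["CUST_ID"],
            income=int(row["INCOME"]),
            savings=int(row["SAVINGS"]),
            debt=int(row["DEBT"]),
            credit_score=int(row["CREDIT_SCORE"]),
            default=int(row["DEFAULT"]),
        )
        for row in reader
    ]


# Invented sample rows for illustration only; not taken from the dataset.
SAMPLE_CSV = """CUST_ID,INCOME,SAVINGS,DEBT,CREDIT_SCORE,DEFAULT
C0001,52000,8000,13000,640,0
C0002,31000,500,27900,480,1
"""

records = parse_records(SAMPLE_CSV)
print(records[0])
```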
---
## 📦 What This Repo Contains
- **Synthetic Credit Scoring Dataset** – CSV format, ready for ML modeling.
[⬇️ Download Dataset](https://huggingface.co/datasets/syncora/credit_scoring_datatset/blob/main/synthetic_e2dabba50a1a4fbcabd601f7883eef1e.csv)
- **Jupyter Notebook** – Exploration and usage guide for the dataset.
[📓 Open Notebook](https://huggingface.co/datasets/syncora/credit_scoring_datatset/blob/main/credit-scoring%20(1).ipynb)
- **Syncora Platform** – Generate your own high-fidelity synthetic datasets.
[⚡ Generate Your Own Synthetic Data](https://huggingface.co/spaces/syncora/synthetic-generation)
## 🤖 Machine Learning & AI Use Cases
- **💳 Credit Risk Modeling**: Train classification models to predict default risk.
- **⚙️ Feature Engineering**: Derive behavioral features such as debt-to-income and savings-to-income ratios from the `INCOME`, `SAVINGS`, and `DEBT` columns.
- **🧠 LLM Alignment**: Use as a structured dataset for LLM training (e.g., converting tabular inputs into human-readable risk assessments).
- **📊 Benchmarking**: Compare model accuracy, precision, and recall across logistic regression, random forest, XGBoost, and deep learning.
- **🔍 Explainability**: Apply SHAP, LIME, or ELI5 to interpret model predictions.
- **⚖️ Bias & Fairness Studies**: Explore whether synthetic datasets can reduce bias compared to real-world financial data.
- **✅ Synthetic Data Validation**: Test how well synthetic datasets maintain model performance relative to real datasets.
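
To make the feature engineering and benchmarking items above concrete, here is a small sketch that derives a debt-to-income ratio, applies a naive threshold rule as a stand-in for a trained model, and computes accuracy, precision, and recall by hand. The rows, the 0.5 threshold, and the helper names are invented for illustration, and `DEFAULT` is assumed to be a binary label. A real benchmark would train the models named above on the full dataset.

```python
def debt_to_income(debt: int, income: int) -> float:
    """Debt-to-income ratio, with a guard for non-positive income."""
    if income <= 0:
        return float("inf") if debt > 0 else 0.0
    return debt / income


def classification_metrics(y_true: list[int], y_pred: list[int]) -> dict:
    """Accuracy, precision, and recall for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Invented (INCOME, DEBT, DEFAULT) rows for illustration only.
rows = [
    (52000, 13000, 0),
    (31000, 27900, 1),
    (78000, 11700, 0),
    (40000, 30000, 0),
    (45000, 18000, 1),
]

DTI_THRESHOLD = 0.5  # arbitrary cut-off, not a recommended value

y_true = [default for _, _, default in rows]
y_pred = [
    1 if debt_to_income(debt, income) > DTI_THRESHOLD else 0
    for income, debt, _ in rows
]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

The same `classification_metrics` helper can be reused to compare predictions from different models on a held-out split.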
## Usage
Load directly with the Hugging Face `datasets` library:
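```python
from datasets import load_dataset

# Repository ID as given in this card; note that the download links above
# point to "syncora/credit_scoring_datatset" instead.
dataset = load_dataset("syncora-ai/synthetic-credit-scoring")
print(dataset["train"][0])
```
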
Provider: maas
Created: 2025-09-13



