credit_scoring_datatset

Name: credit_scoring_datatset
Creator: maas
Published: 2025-12-05 16:50:27
License: No description provided

ModelScope community; updated 2025-12-05, indexed 2025-09-20

Download link:

https://modelscope.cn/datasets/syncora/credit_scoring_datatset

Resource description:

# 🏦 Synthetic Credit Scoring Dataset: Powered by Syncora

🌐 Official Website: [Syncora.ai](https://syncora.ai)

High-fidelity synthetic financial behavior dataset for **AI, ML modeling & LLM training**.

---

## Dataset Summary

This dataset contains **synthetic financial records** simulating customer behavior in a credit scoring context. Generated with **Syncora.ai**, it provides **privacy-safe, realistic data** while preserving statistical fidelity.

Key applications:

- Credit risk modeling
- Machine learning classification
- Feature engineering for financial AI
- **Dataset for LLM training** (tabular-to-text, reasoning with structured finance data)
- Educational use in data science courses

---

## 📊 Dataset Info

| Field | Details |
|---|---|
| **Features** | `CUST_ID` (string), `INCOME` (int32), `SAVINGS` (int32), `DEBT` (int32), `CREDIT_SCORE` (int32), `DEFAULT` (int32) |
| **Task Categories** | Tabular Classification; Financial Risk Modeling |
| **License** | Apache-2.0 |
| **Size Category** | 10K < n < 100K |

Format: CSV, ~20K synthetic records.

---

## 📦 What This Repo Contains

- **Synthetic Credit Scoring Dataset**: CSV format, ready for ML modeling. [⬇️ Download Dataset](https://huggingface.co/datasets/syncora/credit_scoring_datatset/blob/main/synthetic_e2dabba50a1a4fbcabd601f7883eef1e.csv)
- **Jupyter Notebook**: Exploration and usage guide for the dataset. [📓 Open Notebook](https://huggingface.co/datasets/syncora/credit_scoring_datatset/blob/main/credit-scoring%20(1).ipynb)
- **Syncora Platform**: Generate your own high-fidelity synthetic datasets. [⚡ Generate Your Own Synthetic Data](https://huggingface.co/spaces/syncora/synthetic-generation)

## 🤖 Machine Learning & AI Use Cases

- **💳 Credit Risk Modeling**: Train classification models to predict default risk.
- **⚙️ Feature Engineering**: Extract behavioral features like debt-to-income and repayment consistency.
- **🧠 LLM Alignment**: Use as a structured dataset for LLM training (e.g., converting tabular inputs into human-readable risk assessments).
- **📊 Benchmarking**: Compare model accuracy, precision, and recall across logistic regression, random forest, XGBoost, and deep learning.
- **🔍 Explainability**: Apply SHAP, LIME, or ELI5 to interpret model predictions.
- **⚖️ Bias & Fairness Studies**: Explore whether synthetic datasets can reduce bias compared with real-world financial data.
- **✅ Synthetic Data Validation**: Test how well models trained on synthetic data maintain their performance relative to models trained on real data.

## Usage

Load directly with the Hugging Face `datasets` library:

```python
from datasets import load_dataset

# Repository ID taken from the Hugging Face download links above
dataset = load_dataset("syncora/credit_scoring_datatset")
print(dataset["train"][0])  # first record, as a dict of column -> value
```
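The feature-engineering and benchmarking use cases above can be illustrated with a small, dependency-free sketch. It assumes the CSV has exactly the six columns listed under Dataset Info and that `DEFAULT` is a 0/1 label where 1 means the customer defaulted; the five rows are made-up placeholders, not records from the dataset, and the `CREDIT_SCORE < 600` rule is a hypothetical baseline, not something the dataset authors prescribe. It derives a debt-to-income ratio and reports accuracy, precision, and recall for that rule.

```python
import csv
import io

# Made-up placeholder rows using the dataset's column names (NOT real records).
SAMPLE_CSV = """CUST_ID,INCOME,SAVINGS,DEBT,CREDIT_SCORE,DEFAULT
C001,60000,12000,18000,720,0
C002,30000,1000,27000,540,1
C003,45000,5000,9000,650,0
C004,25000,500,20000,580,0
C005,80000,30000,64000,590,1
"""

# Columns declared as int32 in the Dataset Info table.
INT_COLUMNS = ("INCOME", "SAVINGS", "DEBT", "CREDIT_SCORE", "DEFAULT")


def load_rows(text):
    """Parse CSV text into dicts, casting the integer columns to int."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        for col in INT_COLUMNS:
            record[col] = int(record[col])
        rows.append(record)
    return rows


def add_debt_to_income(rows):
    """Add a DEBT_TO_INCOME feature; None when income is zero."""
    for r in rows:
        r["DEBT_TO_INCOME"] = r["DEBT"] / r["INCOME"] if r["INCOME"] else None
    return rows


def evaluate(rows, threshold=600):
    """Score a hypothetical rule: predict default when CREDIT_SCORE < threshold."""
    tp = fp = fn = tn = 0
    for r in rows:
        pred = 1 if r["CREDIT_SCORE"] < threshold else 0
        actual = r["DEFAULT"]  # assumed: 1 = defaulted, 0 = did not default
        if pred and actual:
            tp += 1
        elif pred:
            fp += 1
        elif actual:
            fn += 1
        else:
            tn += 1
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


rows = add_debt_to_income(load_rows(SAMPLE_CSV))
print([round(r["DEBT_TO_INCOME"], 2) for r in rows])  # -> [0.3, 0.9, 0.2, 0.8, 0.8]
print(evaluate(rows))  # accuracy 0.8, precision ~0.667, recall 1.0
```

To try it on the real file, replace `SAMPLE_CSV` with the text of the downloaded CSV. For actual benchmarking across logistic regression, random forest, or XGBoost, a library such as pandas or scikit-learn would be the more usual choice; the hand-rolled loop here only makes the metric definitions explicit.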
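For the LLM Alignment use case (converting tabular inputs into human-readable risk assessments), one possible approach is to render each record as a prompt/response pair and serialize it as a JSONL line. This is only a sketch: the template wording, the hypothetical `to_training_example` helper, and the prompt/response layout are illustrative choices, not a format defined by the dataset, and the example record is made up. It again assumes `DEFAULT` is a 0/1 label with 1 meaning default.

```python
import json


def to_training_example(record):
    """Render one credit record as a prompt/response pair (hypothetical template)."""
    income = int(record["INCOME"])
    debt = int(record["DEBT"])
    # Debt-to-income ratio as a whole-number percentage; guard against zero income.
    dti = f"{100 * debt / income:.0f}%" if income else "undefined"
    prompt = (
        f"Customer {record['CUST_ID']} has an income of {income}, "
        f"savings of {int(record['SAVINGS'])}, debt of {debt} "
        f"(debt-to-income {dti}) and a credit score of {int(record['CREDIT_SCORE'])}. "
        "Assess the default risk."
    )
    # Assumption: DEFAULT is 1 when the customer defaulted, 0 otherwise.
    outcome = "defaulted" if int(record["DEFAULT"]) == 1 else "did not default"
    response = f"Recorded outcome: the customer {outcome}."
    return {"prompt": prompt, "response": response}


# Made-up record using the dataset's column names (not a real row).
example = {"CUST_ID": "C002", "INCOME": 30000, "SAVINGS": 1000,
           "DEBT": 27000, "CREDIT_SCORE": 540, "DEFAULT": 1}
line = json.dumps(to_training_example(example))  # one JSONL line
print(line)
```

Keeping the label-derived text in the response, and only observable features in the prompt, avoids leaking the `DEFAULT` target into the model input.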

Provided by:

maas

Created:

2025-09-13
