
credit_scoring_datatset

ModelScope community · updated 2025-12-05 · indexed 2025-09-20
Download link:
https://modelscope.cn/datasets/syncora/credit_scoring_datatset
Resource description:
# 🏦 Synthetic Credit Scoring Dataset: Powered by Syncora

🌐 Official Website: [Syncora.ai](https://syncora.ai)

High-fidelity synthetic financial behavior dataset for **AI, ML modeling & LLM training**.

---

## Dataset Summary

This dataset contains **synthetic financial records** simulating customer behavior in a credit scoring context. Generated with **Syncora.ai**, it provides **privacy-safe, realistic data** while preserving statistical fidelity.

Key applications:

- Credit risk modeling
- Machine learning classification
- Feature engineering for financial AI
- **Dataset for LLM training** (tabular-to-text, reasoning with structured finance data)
- Educational use in data science courses

---

## 📊 Dataset Info

| Field | Details |
|-------------------|-------------------------------------------------------------------------|
| **Features** | - `CUST_ID` (string) <br> - `INCOME` (int32) <br> - `SAVINGS` (int32) <br> - `DEBT` (int32) <br> - `CREDIT_SCORE` (int32) <br> - `DEFAULT` (int32) |
| **Task Categories** | - Tabular Classification <br> - Financial Risk Modeling |
| **License** | Apache-2.0 |
| **Size Category** | 10K < n < 100K |

Format: CSV, ~20K synthetic records.

---

## 📦 What This Repo Contains

- **Synthetic Credit Scoring Dataset** – CSV format, ready for ML modeling.
  [⬇️ Download Dataset](https://huggingface.co/datasets/syncora/credit_scoring_datatset/blob/main/synthetic_e2dabba50a1a4fbcabd601f7883eef1e.csv)
- **Jupyter Notebook** – Exploration and usage guide for the dataset.
  [📓 Open Notebook](https://huggingface.co/datasets/syncora/credit_scoring_datatset/blob/main/credit-scoring%20(1).ipynb)
- **Syncora Platform** – Generate your own high-fidelity synthetic datasets.
  [⚡ Generate Your Own Synthetic Data](https://huggingface.co/spaces/syncora/synthetic-generation)

## 🤖 Machine Learning & AI Use Cases

- **💳 Credit Risk Modeling**: Train classification models to predict default risk.
- **⚙️ Feature Engineering**: Extract behavioral features like debt-to-income and repayment consistency.
- **🧠 LLM Alignment**: Use as a structured dataset for LLM training (e.g., converting tabular inputs into human-readable risk assessments).
- **📊 Benchmarking**: Compare model accuracy, precision, and recall across logistic regression, random forest, XGBoost, and deep learning.
- **🔍 Explainability**: Apply SHAP, LIME, or ELI5 to interpret model predictions.
- **⚖️ Bias & Fairness Studies**: Explore whether synthetic datasets can reduce bias compared to real-world financial data.
- **✅ Synthetic Data Validation**: Test how well synthetic datasets maintain model performance relative to real datasets.

## Usage

Load directly with the Hugging Face `datasets` library:

```python
from datasets import load_dataset

dataset = load_dataset("syncora-ai/synthetic-credit-scoring")
print(dataset["train"][0])
```
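
The feature-engineering and tabular-to-text use cases listed above can be sketched with the standard library alone. This is a minimal illustration, not the approach used in the repository's notebook. The column names come from the Dataset Info table; the sample rows, the helper names (`load_rows`, `debt_to_income`, `describe`), and the reading of `DEFAULT == 1` as "defaulted" are assumptions made for this example.

```python
import csv
import io

# Hypothetical sample rows using the column names from the dataset card;
# the values are made up for illustration and are not taken from the dataset.
SAMPLE_CSV = """CUST_ID,INCOME,SAVINGS,DEBT,CREDIT_SCORE,DEFAULT
C001,50000,10000,20000,620,0
C002,30000,0,45000,480,1
C003,0,500,1000,550,0
"""

INT_COLUMNS = ("INCOME", "SAVINGS", "DEBT", "CREDIT_SCORE", "DEFAULT")


def load_rows(text):
    """Parse CSV text into dicts, casting the integer columns to int."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        row = {"CUST_ID": raw["CUST_ID"]}
        for name in INT_COLUMNS:
            row[name] = int(raw[name])
        rows.append(row)
    return rows


def debt_to_income(row):
    """Return DEBT / INCOME, or None when income is zero (ratio undefined)."""
    if row["INCOME"] == 0:
        return None
    return row["DEBT"] / row["INCOME"]


def describe(row):
    """Render one record as a sentence (a simple tabular-to-text template)."""
    dti = debt_to_income(row)
    dti_text = "undefined" if dti is None else f"{dti:.2f}"
    # Assumption: DEFAULT == 1 marks a customer who defaulted.
    status = "defaulted" if row["DEFAULT"] == 1 else "did not default"
    return (
        f"Customer {row['CUST_ID']} has a credit score of {row['CREDIT_SCORE']}, "
        f"a debt-to-income ratio of {dti_text}, and {status}."
    )


rows = load_rows(SAMPLE_CSV)
for row in rows:
    print(describe(row))
```

To run this against the real data, pass the contents of the downloaded CSV file to `load_rows` in place of `SAMPLE_CSV`. Zero-income records return `None` here rather than raising an error, so downstream code has to decide explicitly how to handle them.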

Provider: maas
Created: 2025-09-13