AUC and p-values of the models–Home Credit data.

NIAID Data Ecosystem, indexed 2026-05-02

Download link:

https://figshare.com/articles/dataset/AUC_and_p-values_of_the_models_Home_Credit_data_/26542045


Description:

Credit scorecards are essential tools for banks to assess the creditworthiness of loan applicants. While advanced machine learning models like XGBoost and random forest often outperform traditional logistic regression in predictive accuracy, their lack of interpretability hinders their adoption in practice. This study bridges the gap between research and practice by developing a novel framework for constructing interpretable credit scorecards using Shapley values. We apply this framework to two credit datasets, discretizing numerical variables and utilizing one-hot encoding to facilitate model development. Shapley values are then employed to derive credit scores for each predictor variable group in XGBoost, random forest, LightGBM, and CatBoost models. Our results demonstrate that this approach yields credit scorecards with interpretability comparable to logistic regression while maintaining superior predictive accuracy. This framework offers a practical and effective solution for credit practitioners seeking to leverage the power of advanced models without sacrificing transparency and regulatory compliance.
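The description above gives no implementation details, so the following is only a minimal sketch of the general idea: compute one Shapley value per predictor variable group (each group being one binned variable) and rescale it into scorecard points. Everything in it is an assumption made for illustration. `toy_model` is a hypothetical stand-in for a fitted tree ensemble, the bins and background rows are invented, and the points-to-double-the-odds value of 20 is a common scorecard convention rather than something stated in the source. The Shapley values are computed exactly by enumerating coalitions, which is only feasible for a handful of groups.

```python
from itertools import combinations
from math import factorial, log

def toy_model(row):
    """Hypothetical stand-in for a fitted model: returns log-odds of default."""
    log_odds = -2.0
    log_odds += {"<25": 0.8, "25-40": 0.2, ">40": -0.3}[row["age_bin"]]
    log_odds += {"low": 0.9, "mid": 0.1, "high": -0.6}[row["income_bin"]]
    # Interaction term, so the model is not purely additive.
    if row["age_bin"] == "<25" and row["income_bin"] == "low":
        log_odds += 0.5
    return log_odds

def coalition_value(model, row, background, coalition):
    """Mean prediction when groups in `coalition` take the applicant's
    values and all other groups take values from the background rows."""
    total = 0.0
    for ref in background:
        mixed = dict(ref)
        for group in coalition:
            mixed[group] = row[group]
        total += model(mixed)
    return total / len(background)

def group_shapley(model, row, background):
    """Exact Shapley value of each variable group for one applicant."""
    groups = list(row)
    n = len(groups)
    phi = {}
    for g in groups:
        others = [h for h in groups if h != g]
        contribution = 0.0
        for k in range(len(others) + 1):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                with_g = coalition_value(model, row, background, subset + (g,))
                without_g = coalition_value(model, row, background, subset)
                contribution += weight * (with_g - without_g)
        phi[g] = contribution
    return phi

# Invented background sample and applicant, already discretized into bins.
background = [
    {"age_bin": "<25", "income_bin": "low"},
    {"age_bin": "25-40", "income_bin": "mid"},
    {"age_bin": ">40", "income_bin": "high"},
    {"age_bin": "25-40", "income_bin": "low"},
]
applicant = {"age_bin": "<25", "income_bin": "low"}

phi = group_shapley(toy_model, applicant, background)
base_value = coalition_value(toy_model, applicant, background, ())

# Assumed scaling: 20 points double the odds; higher risk means fewer points.
PDO = 20
factor = PDO / log(2)
points = {group: -factor * value for group, value in phi.items()}

for group, value in points.items():
    print(f"{group}={applicant[group]}: {value:+.1f} points")
```

Because the Shapley values sum to the gap between the applicant's prediction and the average background prediction, the per-group points add up to a total score offset, which is the property that lets each bin be read like a row of a traditional scorecard.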

Created:

2024-08-12
