Alternative predictor variables.

Figshare · Updated 2024-05-21 · Indexed 2026-04-28

Download link:

https://figshare.com/articles/dataset/Alternative_predictor_variables_/25870735


Description:

This study explores whether alternative data sources can improve the accuracy of credit scoring models relative to models that rely solely on traditional sources such as credit bureau data. It analyses a comprehensive dataset from the Home Credit Group’s home loan portfolio and examines the impact of incorporating alternative predictors that are typically overlooked, such as an applicant’s social network default status, regional economic ratings, and local population characteristics. Variables are selected systematically using the model-X knockoffs framework. With these alternative data sources included, the credit scoring models achieve an area under the curve (AUC) of 0.79360 on the Kaggle Home Credit default risk competition dataset, outperforming models that rely solely on traditional data sources. The findings highlight the value of diverse, non-traditional data sources for credit risk assessment and overall model accuracy.

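The description reports performance as an area under the curve (AUC). As a rough illustration of what that number measures, the sketch below computes the area under the ROC curve as the fraction of (defaulter, non-defaulter) pairs in which the defaulter receives the higher score, with ties counted as half. The function name `roc_auc` and the toy labels and scores are assumptions made for this example; they are not taken from the dataset or from the study's code.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count half).

    labels: iterable of 0/1 values (1 = positive class, e.g. default).
    scores: iterable of model scores, higher meaning more likely positive.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half a win
    return wins / (len(pos) * len(neg))


# Toy example with made-up values (not from the dataset).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)
```

The pairwise form is quadratic in the number of rows, so it suits small illustrations only; for a dataset of competition size a rank-based or library implementation would be the more practical choice.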

Created:

2024-05-21
