
Performance Evaluation of Hybrid Machine Learning Algorithms for Online Lending Credit Risk Prediction

Source: DataCite Commons. Updated: 2024-12-16. Indexed: 2024-08-19.
Download link:
https://tandf.figshare.com/articles/dataset/Performance_Evaluation_of_Hybrid_Machine_Learning_Algorithms_for_Online_Lending_Credit_Risk_Prediction/25902891
Resource description:
Peer-to-peer (P2P) lending systems are still at an early stage of development in credit processing and in the appraisal of the associated risk. In this study, we used a hybrid convolutional neural network (CNN) combined with logistic regression (LR), a gradient-boosting decision tree (GBDT), and k-nearest neighbor (k-NN) to predict credit risk in a P2P lending club. The lending club's publicly available P2P loan data was used to train the model. To address the imbalance between the non-defaulter and defaulter classes in the dataset, the synthetic minority oversampling technique (SMOTE) was applied. We built the hybrid architecture by removing the final fully connected layer with softmax from the CNN model and replacing it with the LR, GBDT, and k-NN algorithms. The experimental results show that the hybrid CNN-kNN model outperforms the CNN-GBDT and CNN-LR models on accuracy, recall, F1-score, and area under the curve (AUC), both with all input features and with only the important features. This indicates that hybrid machine learning models can effectively identify and categorize credit risk in peer-to-peer lending clubs, which helps prevent financial loss.
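The two mechanisms named in the description, SMOTE oversampling and swapping the CNN's final softmax layer for a classical classifier, can be illustrated with a minimal pure-Python sketch. It is not the study's implementation. The function names (`smote`, `conv_features`, `knn_predict`), the toy rows, and the fixed `filters` are all assumptions made for illustration. In particular, the filters here are hand-picked rather than learned, whereas a real CNN would train them.

```python
import math
import random
from collections import Counter


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority rows by interpolating between a row and
    one of its k nearest minority-class neighbours (needs at least 2 rows)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [row for row in minority if row is not base]
        neighbours = sorted(others, key=lambda row: math.dist(base, row))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def conv_features(row, filters):
    """Stand-in for the convolutional stack: 1-D valid convolution, ReLU,
    then global max pooling, giving one feature per filter."""
    feats = []
    for f in filters:
        width = len(f)
        responses = [
            max(0.0, sum(w * v for w, v in zip(f, row[i:i + width])))
            for i in range(len(row) - width + 1)
        ]
        feats.append(max(responses))
    return feats


def knn_predict(train_feats, train_labels, query_feat, k=3):
    """k-NN head that takes the place of the dense + softmax layer."""
    ranked = sorted(zip(train_feats, train_labels),
                    key=lambda pair: math.dist(pair[0], query_feat))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy, made-up rows (not the study's loan data): 1 = defaulter, 0 = non-defaulter.
non_default = [[0.1, 0.2, 0.1, 0.3], [0.2, 0.1, 0.2, 0.2], [0.3, 0.2, 0.1, 0.1],
               [0.1, 0.3, 0.2, 0.2], [0.2, 0.2, 0.3, 0.1], [0.1, 0.1, 0.2, 0.3]]
default = [[0.9, 0.8, 0.9, 0.7], [0.8, 0.9, 0.7, 0.9]]

# Oversample defaulters until both classes have the same number of rows.
default_balanced = default + smote(default, len(non_default) - len(default), k=1)

filters = [[1.0, -1.0], [0.5, 0.5]]  # hypothetical, fixed rather than learned
rows = non_default + default_balanced
labels = [0] * len(non_default) + [1] * len(default_balanced)
feats = [conv_features(r, filters) for r in rows]

# Classify a new toy applicant from its extracted features.
prediction = knn_predict(feats, labels, conv_features([0.85, 0.9, 0.8, 0.8], filters))
```

The same extracted features could be passed to an LR or GBDT head instead of `knn_predict`, which is the comparison the description reports. In practice, oversampling is usually applied to the training split only, so that synthetic rows do not leak into evaluation.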

Provider:
Taylor & Francis
Created:
2024-05-25