Han, Seongil (2021) Explainable credit scoring through generative adversarial networks. PhD thesis, Birkbeck, University of London.
Abstract
Credit scoring plays a vital role in mitigating financial risk that could affect the sustainability of financial institutions. Accurate and automated credit scoring makes it possible to control financial risk using state-of-the-art, data-driven analytics. The primary aim of this thesis is to understand and improve financial credit scoring models. The key issues that arise when developing credit scoring models with state-of-the-art machine learning (ML) techniques are identified and investigated. The challenges in credit scoring can be resolved through the ML-based models proposed in this thesis. Existing credit scoring models can therefore be improved by novel computer science techniques in the following realistic problem areas. First, the interpretability of credit scoring is examined as an eXplainable Artificial Intelligence (XAI) problem, using non-parametric tree-based ML models combined with SHapley Additive exPlanations (SHAP). In this experiment, the suitability of tree-based ensemble models for imbalanced credit scoring datasets is also assessed by comparing performance across different degrees of class imbalance. To achieve explainability as well as high predictive performance in credit scoring, we propose a model named NATE, a Non-pArameTric approach for Explainable credit scoring. NATE is explainable and comprehensible: it allows us to analyse the key factors of credit scoring through SHAP values, both locally and globally, while retaining robust predictive power for creditworthiness. Second, the issue of class imbalance is investigated. Class imbalance occurs when the number of observations differs greatly between the classes in a dataset. Class imbalance in real-world credit scoring datasets results in biased classification of creditworthiness.
As an approach to overcome the limitations of traditional resampling methods for class imbalance, we propose a model named NOTE, Non-parametric Oversampling Techniques for Explainable credit scoring. By using a conditional Wasserstein Generative Adversarial Networks (cWGAN)-based oversampling technique paired with a Non-parametric Stacked Autoencoder (NSA), NOTE, as a generative model, oversamples the minority class while reflecting the complex and non-linear patterns in the dataset. NOTE therefore performs classification and explains the credit scoring model with unbiased performance on a balanced credit scoring dataset. Third, incomplete data is also a common issue in credit scoring datasets. Missingness typically distorts analysis and prediction in credit scoring and results in misclassification of creditworthiness. To address missing values in the dataset and overcome the limitations of conventional imputation methods, we propose a model named DITE, Denoising Imputation TEchniques for missingness in credit scoring. By using the extended Generative Adversarial Imputation Networks (GAIN) paired with randomised Singular Value Decomposition (rSVD), DITE can replace missing values with plausible estimates by reducing noise and capturing complex missingness patterns in the dataset. To evaluate the robustness and effectiveness of the proposed models on the key issues of model explainability, class imbalance, and missingness, their performance is compared against benchmarks from the literature on publicly available real-world financial credit scoring datasets. The experimental results demonstrate the robustness and effectiveness of the novel concepts used in the models, which outperform the benchmarks.
Furthermore, the proposed NATE, NOTE and DITE also lead to better model explainability, suitability, stability, and superiority on complex and non-linear credit scoring datasets. Finally, this thesis demonstrates that existing credit scoring models can be improved by novel computer science techniques applied to real-world problems in the credit scoring domain.
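
The abstract refers to explaining credit scoring models locally through SHAP values. As a minimal illustration of the underlying idea only, and not the thesis's implementation, the sketch below computes exact Shapley values for a single observation by enumerating feature subsets. The scoring function `toy_score`, the observation `x`, and the all-zero `baseline` are hypothetical, and replacing "absent" features with baseline values is an assumed simplification; practical SHAP tooling for tree models uses far more efficient algorithms.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values for a set function over n_features features.

    value_fn takes a frozenset of feature indices (the "present" features)
    and returns the model output for that coalition.
    """
    players = range(n_features)
    phi = [0.0] * n_features
    for i in players:
        others = [j for j in players if j != i]
        for size in range(len(others) + 1):
            # Weight of a coalition of this size in the Shapley formula.
            weight = (
                factorial(size)
                * factorial(n_features - size - 1)
                / factorial(n_features)
            )
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i to coalition s.
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


def toy_score(z):
    """Hypothetical scoring function with one interaction term."""
    return 2.0 * z[0] + 1.0 * z[1] + z[0] * z[2]


x = [1.0, 2.0, 3.0]         # hypothetical observation to explain
baseline = [0.0, 0.0, 0.0]  # assumed reference values for absent features


def value_fn(present):
    # Features outside the coalition are set to their baseline value.
    z = [x[j] if j in present else baseline[j] for j in range(len(x))]
    return toy_score(z)


phi = shapley_values(value_fn, len(x))
print(phi)
```

The attributions sum to the difference between the score for `x` and the score for `baseline`, which is the additivity property that gives SHAP its name. Averaging the absolute attributions over many observations is one common way to obtain a global ranking of features.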
Metadata
Item Type: | Thesis |
---|---|
Copyright Holders: | The copyright of this thesis rests with the author, who asserts his/her right to be known as such according to the Copyright Designs and Patents Act 1988. No dealing with the thesis contrary to the copyright or moral rights of the author is permitted. |
Depositing User: | Acquisitions And Metadata |
Date Deposited: | 27 Jan 2022 18:00 |
Last Modified: | 01 Nov 2023 15:17 |
URI: | https://eprints.bbk.ac.uk/id/eprint/47370 |
DOI: | https://doi.org/10.18743/PUB.00047370 |