Research on Interpretable Loan Approval Identification Using Multi-Dimensional Features
Abstract
Loan approval is a pivotal component of financial risk control. Existing studies often rely on black-box prediction models and rarely examine multidimensional features or interpretability in depth, which limits both the transparency and the robustness of the resulting models. To address this, this paper proposes an Interpretable Loan Approval Identification Model based on Multidimensional Features (ILA-MDF). ILA-MDF is built on the CatBoost algorithm and compared experimentally with benchmark models such as Random Forest. The proposed model achieves the best results on all six evaluation metrics (Accuracy, Precision, Recall, F1-score, AUC, and MCC), with values of 93.33%, 87.19%, 82.03%, 84.53%, 97.77%, and 80.34%, respectively. The SHAP framework is then used to analyze the key factors influencing model decisions. The interpretability analysis shows that the predicted probability of loan approval increases significantly when the loan interest rate exceeds 14% or when the previous loan default indicator is 0. ILA-MDF thus combines strong predictive performance with interpretable decisions, providing a reference for formulating loan approval and risk prevention strategies.
Introduction
In recent years, as demand for social financing has continued to grow, loans have become an indispensable financial tool for both individuals and businesses. For financial institutions, the quality of loan approval decisions directly affects asset security and operational stability. Against this backdrop, credit scoring models serve as the foundation of automated approval systems by quantifying borrowers' credit risk and providing an objective basis for approval decisions [1]. A robust and accurate model not only improves approval efficiency and reduces default losses but also plays a key role in advancing the intelligent transformation of financial services.
However, traditional loan approval has long relied on manual judgment, which is inefficient and applies standards inconsistently. Machine learning techniques have been widely adopted to build automated approval models, but existing research still has two gaps. First, most models focus narrowly on default prediction and do not adequately support comprehensive approval decisions that involve multi-dimensional rules. Second, many high-performance models are "black boxes" with opaque decision logic [2], which makes it difficult to meet strict regulatory requirements for fairness and interpretability and leaves applicants unable to understand the rationale behind approval outcomes. Loan approval models that combine high predictive performance with strong interpretability have therefore become an important focus for both academia and industry.
In response to these challenges, extensive research has been conducted in academia on loan approval modeling, forming three main directions.
In terms of constructing ensemble learning and complex models, Kokate and Chetty employed a combination of machine learning methods such as gradient boosting, random forests, and decision trees to build a credit scoring model for automated approval, using feature selection techniques to improve model efficiency. Their model demonstrated superior risk discrimination ability and stability on real-world banking data, validating the effectiveness of combining ensemble learning with feature optimization [3]. Further, Uddin et al. proposed a hybrid approach integrating deep learning with ExtraTrees and adopted an ensemble voting mechanism to combine the three best-performing base models, achieving an accuracy of 87.26% in bank loan default prediction, highlighting the significant improvement of ensemble strategies on classification performance [4]. In line with this, Perera and Premaratne developed a stacking ensemble model based on a voting mechanism, further confirming the practicality and stability of ensemble learning in credit risk assessment [5]. Lakshmi and Rao constructed a loan default prediction model by integrating algorithms such as naive Bayes, decision trees, and multi-layer perceptrons, training on Kaggle historical data and ultimately achieving 90% accuracy, further demonstrating that ensemble learning methods can effectively support banks in making loan approval decisions [6].
In terms of benchmarking and optimization of traditional machine learning models, a substantial body of research has focused on providing benchmarks for model selection through systematic comparisons. Singh employed logistic regression, random forests, and support vector machines to model loan approval, achieving an accuracy of 78.785% on historical data [7]. Nureni and Adekola systematically compared eight machine learning algorithms and found that logistic regression performed particularly well in terms of accuracy (up to 83.24%) and sensitivity (up to 97.76%) [8]. Tumuluru et al. pointed out that among models for default prediction based on historical customer data, the random forest algorithm achieved the best accuracy [9]. Viswanatha conducted a comprehensive comparison of algorithms including random forests, naive Bayes, decision trees, and KNN, with naive Bayes achieving the highest accuracy at 83.73% [10]. A systematic comparison by Sarkar et al. also indicated that random forest and logistic regression models slightly outperformed the other algorithms compared in terms of overall performance [11]. These extensive comparative studies provide empirical support for model selection across different scenarios.
Conclusion
This paper focuses on the problem of loan approval identification and proposes a model named ILA-MDF (Interpretable Loan Approval Identification Model based on Multidimensional Features).
First, outlier detection is performed on the acquired dataset, followed by standardization and One-Sided Selection undersampling, which improves data quality and class balance. One-Sided Selection is chosen because it removes noisy majority-class samples while preserving informative samples near the decision boundary.
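The undersampling step can be illustrated with a minimal sketch of One-Sided Selection in plain Python. It is not the paper's implementation: the function names, the toy coordinates, and the choice of Euclidean distance are assumptions made for illustration. The sketch follows the usual two-step description of the method: a 1-nearest-neighbour condensing pass that discards redundant majority samples, then removal of majority samples that form Tomek links with minority samples.

```python
import math
import random


def nearest(point, pool):
    """Index (into pool) of the pool point closest to `point`."""
    return min(range(len(pool)), key=lambda j: math.dist(point, pool[j]))


def one_sided_selection(X, y, majority=0, seed=0):
    """Return the sorted indices of the samples kept by this OSS sketch."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label != majority]
    majority_idx = [i for i, label in enumerate(y) if label == majority]

    # Step 1 (condensing): start from every minority sample plus one random
    # majority sample, then add each majority sample that this fixed subset
    # misclassifies under 1-nearest-neighbour.
    subset = minority_idx + [rng.choice(majority_idx)]
    subset_points = [X[i] for i in subset]
    kept = list(subset)
    for i in majority_idx:
        if i in subset:
            continue
        neighbour = subset[nearest(X[i], subset_points)]
        if y[neighbour] != majority:
            kept.append(i)

    # Step 2 (cleaning): drop majority samples that form a Tomek link, i.e.
    # whose nearest neighbour is a minority sample and vice versa.
    points = [X[i] for i in kept]

    def nearest_other(pos):
        rest = [q for q in range(len(points)) if q != pos]
        return min(rest, key=lambda q: math.dist(points[pos], points[q]))

    dropped = set()
    for pos, i in enumerate(kept):
        if y[i] != majority:
            continue
        q = nearest_other(pos)
        if y[kept[q]] != majority and nearest_other(q) == pos:
            dropped.add(i)
    return sorted(i for i in kept if i not in dropped)


# Hypothetical toy data: label 1 is the minority class, label 0 the majority.
X = [(0.0, 0.0), (1.0, 0.0),          # minority samples
     (1.2, 0.0),                      # majority sample inside the minority region
     (10.0, 10.0), (10.0, 11.0),      # redundant majority cluster
     (11.0, 10.0), (11.0, 11.0)]
y = [1, 1, 0, 0, 0, 0, 0]
kept = one_sided_selection(X, y)
print(kept)
```

On this toy set every minority sample survives, while the majority sample at (1.2, 0.0) is removed as a Tomek link and most of the distant majority cluster is discarded as redundant. Which majority samples remain depends on the random seed sample.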
Second, through a comprehensive comparison of multiple mainstream machine learning algorithms, CatBoost is selected as the base architecture, and grid search combined with five-fold cross-validation is used for hyperparameter tuning, significantly enhancing the model's generalization ability and robustness.
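The tuning procedure, grid search scored by five-fold cross-validation, can be sketched in plain Python as follows. The sketch is illustrative only: a small k-nearest-neighbour classifier stands in for CatBoost so that the example needs no external library, and the hyperparameter name `n_neighbors`, the helper names, and the toy data are all assumptions rather than details from the paper.

```python
import itertools
import math
import random
from collections import Counter


def k_fold_indices(n, k=5, seed=0):
    """Shuffle range(n) and split it into k folds of nearly equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def knn_predict(train_X, train_y, x, n_neighbors):
    """Stand-in model: majority vote among the n_neighbors closest points."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    votes = Counter(train_y[i] for i in order[:n_neighbors])
    return votes.most_common(1)[0][0]


def cv_accuracy(X, y, params, folds):
    """Mean held-out accuracy over the folds for one hyperparameter setting."""
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        hits = sum(knn_predict(train_X, train_y, X[i], **params) == y[i]
                   for i in fold)
        scores.append(hits / len(fold))
    return sum(scores) / len(scores)


def grid_search(X, y, grid, k=5, seed=0):
    """Score every combination in the grid with k-fold CV; return the best."""
    folds = k_fold_indices(len(X), k, seed)
    names = sorted(grid)
    results = []
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append((cv_accuracy(X, y, params, folds), params))
    best_score, best_params = max(results, key=lambda r: r[0])
    return best_params, best_score, results


# Hypothetical toy data: two well-separated clusters with one noisy label.
cluster_a = [(float(i), float(j)) for i in range(3) for j in range(3)]
cluster_b = [(20.0 + i, 20.0 + j) for i in range(5) for j in range(2)]
X = cluster_a + [(1.1, 1.1)] + cluster_b
y = [0] * 9 + [1] + [1] * 10   # the point at (1.1, 1.1) carries a noisy label

best_params, best_score, results = grid_search(X, y, {"n_neighbors": [1, 3, 5]})
print(best_params, best_score)
```

All candidate settings are scored on the same folds, so differences in the cross-validated score reflect the hyperparameters rather than the split. With a real gradient boosting model, the stand-in classifier would be replaced by fitting and predicting with that model inside `cv_accuracy`.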
Third, multi-level analysis based on SHAP shows that features such as previous loan defaults, the ratio of loan amount to annual income, and loan interest rate have a significant impact on identification results. In particular, when the loan interest rate exceeds 14% or the indicator for previous loan defaults is 0, the probability of loan approval increases significantly. These findings provide a theoretical basis for risk control strategies in loan approval.
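The attribution idea behind SHAP can be illustrated by computing exact Shapley values for a small, made-up scoring function. This is a sketch under stated assumptions, not the analysis performed in the paper: `toy_score`, its coefficients, the feature names, and the single baseline applicant are hypothetical. The SHAP library estimates the same quantities efficiently for real models and averages over background data rather than using one baseline.

```python
import itertools
import math


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x, relative to a baseline input."""
    names = list(x)
    n = len(names)

    def value(coalition):
        # Features in the coalition take their value from x;
        # all other features stay at the baseline value.
        mixed = {name: (x[name] if name in coalition else baseline[name])
                 for name in names}
        return predict(mixed)

    phi = {}
    for name in names:
        others = [m for m in names if m != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size: |S|! (n-|S|-1)! / n!
            weight = (math.factorial(size) * math.factorial(n - size - 1)
                      / math.factorial(n))
            for subset in itertools.combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {name}) - value(s))
        phi[name] = total
    return phi


def toy_score(f):
    """Hypothetical approval score; the coefficients are arbitrary."""
    score = 0.2
    if f["prev_default"] == 0:
        score += 0.3
    if f["interest_rate"] > 14:
        score += 0.2
    if f["prev_default"] == 0 and f["loan_to_income"] > 0.3:
        score += 0.1   # interaction term, shared between the two features
    return score


x = {"prev_default": 0, "loan_to_income": 0.4, "interest_rate": 15}
baseline = {"prev_default": 1, "loan_to_income": 0.1, "interest_rate": 10}
phi = shapley_values(toy_score, x, baseline)
for name, contribution in phi.items():
    print(f"{name}: {contribution:+.3f}")
```

The contributions sum to the difference between the score for `x` and the score for the baseline, which is the additivity property that lets SHAP plots be read as a decomposition of a single prediction. Exact enumeration grows exponentially with the number of features, so it is practical only for small illustrations like this one.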
References
- Sarathamani, S., & Kumar, S. (2024). Application of machine learning in credit scoring for loan approval systems. In 2024 International Conference on Innovative Computing, Intelligent Communication and Smart Electrical Systems (ICSES) (pp. 1–7). IEEE.
- Zou, Y., Xia, M., & Lan, X. (2025). Interpretable credit scoring based on an additive extreme gradient boosting. Chaos, Solitons & Fractals, 194, Article 116216.
- Kokate, S., & Chetty, M. S. R. (2021). Credit risk assessment of loan defaulters in commercial banks using voting classifier ensemble learner machine learning model. International Journal of Safety and Security Engineering, 11(5), 565–572.
- Uddin, N., Ahamed, M. K. U., Uddin, M. A., Memon, M. H., & Uddin, M. A. (2023). An ensemble machine learning based bank loan approval predictions system with a smart application. International Journal of Cognitive Computing in Engineering, 4, 327–339.
- Perera, C. L., & Premaratne, S. C. (2024). An ensemble machine learning approach for forecasting credit risk of loan applications. WSEAS Transactions on Systems, 23, 31–46.
- Lakshmi, M. V. N., & Rao, P. S. (2024). A prediction of modernized loan approval system based on machine learning approach. International Journal for Modern Trends in Science and Technology, 10(6), 17–21.
- Singh, V. (2021). Prediction of modernized loan approval system based on machine learning approach. In 2021 International Conference on Intelligent Technologies (CONIT) (pp. 1–6). IEEE.
- Nureni, A. A., & Adekola, O. E. (2022). Loan approval prediction based on machine learning approach. FUDMA Journal of Sciences, 6(3), 41–50.
- Tumuluru, P., Burra, L. R., Loukya, M., Sai, N. S., & Bhavani, K. (2022). Comparative analysis of customer loan approval prediction using machine learning algorithms. In 2022 Second International Conference on Artificial Intelligence and Smart Energy (ICAIS) (pp. 349–353). IEEE.
- Viswanatha, V., Ramachandra, A. C., Vishwas, K. N., Kavyashree, B. S., & Nayana, G. (2023). Prediction of loan approval in banks using machine learning approach. International Journal of Engineering and Management Research, 13(4), 7–19.
- Sarkar, T., Rakhra, M., Sharma, V., Singh, S., & Sharma, P. (2024). An empirical comparison of machine learning techniques for bank loan approval prediction. In 2024 International Conference on Communication, Computer Sciences and Engineering (IC3SE) (pp. 137–143). IEEE.
- Gomathy, C. K., Charulatha, M., Aakash, M., & Mani, P. (2021). The loan prediction using machine learning. International Research Journal of Engineering and Technology, 8(10).
- Karthikeyan, S. M., & Ravikumar, P. (2021). A comparative analysis of feature selection for loan prediction model. International Journal of Computer Applications.
- Bhattad, S., Bawane, S., Agrawal, S., & Tiwari, A. (2021). Loan prediction using machine learning algorithms. International Journal of Computer Science Trends and Technology, 9(3), 143–146.
- Nalawade, S., Andhe, S., Parab, S., & Gurav, S. (2022). Loan approval prediction. International Research Journal of Engineering and Technology, 9(4), 669–673.
- Sharma, H., Tyagi, I., Agarwal, G., & Sharma, A. (2023). An exhaustive investigation on loan prediction in banks using LRD. [Journal name missing].
- Krishnaraj, P., Rita, S., & Jaiswal, J. (2023). Comparing machine learning techniques for loan approval prediction. In Proceedings of the 1st International Conference on Artificial Intelligence, Communication, IoT, Data Engineering and Security (IACIDS) (pp. 23–25).
- Badhan, A., Rana, A., Malhi, S. S., & Singh, J. (2024). A comparative analysis for loan approval prediction using machine learning. In 2024 International Conference on Electrical Electronics and Computing Technologies (ICEECT) (Vol. 1, pp. 1–5). IEEE.
- Raheem, M. (2024). Loan default prediction using machine learning: A review on the techniques. Journal of Applied Technology and Innovation, 8(2), 1.
- Prokhorenkova, L., Gusev, G., Vorobev, A., Dorogush, A. V., & Gulin, A. (2018). CatBoost: Unbiased boosting with categorical features. Advances in Neural Information Processing Systems, 31, 6638–6648.
- Lundberg, S. M., & Lee, S. I. (2017). A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems, 30, 4765–4774.
- Natasha, A., Prastyo, D. D., & Suhartono. (2019). Credit scoring to classify consumer loan using machine learning. In AIP Conference Proceedings (Vol. 2194, No. 1, Article 020070). AIP Publishing.
- Shinde, A., Patil, Y., Kotian, I., & Kharat, S. (2022). Loan prediction system using machine learning. In ITM Web of Conferences (Vol. 44, Article 03019). EDP Sciences.
- Ndayisenga, T. (2021). Bank loan approval prediction using machine learning techniques [Master's thesis, [University name]].
- Yang, C. (2024). Research on loan approval and credit risk based on the comparison of machine learning models. In SHS Web of Conferences (Vol. 181, Article 02003). EDP Sciences.