
For banks and other lending institutions, predicting loan defaults has always been crucial and extremely difficult. Financial analysts and specialists are therefore searching for the most effective methods to support their decision-making. Conventional methods have long been applied to credit risk assessment, which comes in two forms. In the first, applicants are classified as having "good" or "bad" credit risk on the basis of their financial information; this is known as application scoring. In the second, the applicant's payment history, payment patterns, and other behavioral information are taken into account; this is called behavioral scoring (e.g., Woo and Sohn (2022)). Application scoring is the main topic of this paper. However, these models have difficulty predicting loan default precisely (Lyn et al. (2002); Mestiri and Farhat (2021)).
In recent years, artificial intelligence and machine learning models have emerged as powerful forecasting tools, as they can handle large datasets and capture nonlinear relationships between the input variables and the output. We discuss machine learning methods, including deep neural networks, which are non-linear algorithms, compare their performance, and demonstrate the possibilities of sophisticated modelling in finance. One useful tool for many financial tasks is deep learning, a subfield of machine learning that specializes in handling intricate, non-linear patterns in data (see Tran et al. (2016) and Wang et al. (2018)). Deep learning has been used increasingly in the finance industry in recent years. For more details about the deep learning approach, we refer to the studies of Deng and Yu (2014) and Le Cun et al. (2015).
The research paper is organized as follows: Section 2 provides a pertinent literature review related to forecasting loan defaults. Section 3 presents the different statistical and artificial intelligence techniques used in this study. In Section 4, the data used are described. Section 5 is devoted to the empirical investigation to forecast the loan defaults of Tunisian customers. Finally, Section 6 concludes the paper.
Statistical methods have been used since the 1950s and are still popularly used today because they enable lenders to use concepts of sample estimators, confidence intervals, and statistical inference in credit scoring. This allows scorecard developers to evaluate the discriminatory power of models and determine which borrower characteristics are more important in explaining borrower behaviour. Linear discriminant analysis was one of the earliest approaches used in credit scoring. Even though the scorecards it produced were very robust, the assumptions needed to ensure satisfactory discriminatory power were restrictive.
Lyn et al. (2002) used logistic regression, a statistical technique for credit scoring that has proven successful and has replaced linear discriminant analysis. According to Mellisa (2020), credit scoring methods have grown in sophistication in recent years, evolving from traditional statistical techniques to innovative methods such as artificial intelligence, including machine-learning algorithms such as random forests, gradient boosting, and deep neural networks. Given the ongoing discussions in the banking industry, machine learning (ML) models are likely to become even more prevalent. Several advanced techniques have been used to predict loan default, such as decision trees and support vector machines. Indeed, with the spread of artificial intelligence modeling algorithms across diverse domains since the 1990s, the artificial neural network (ANN) became the most popular machine learning technique used in finance (Fuster et al. (2022)).
Liu (2018) conducted research on credit card clients and compared Support Vector Machine, k-Nearest Neighbours, Decision Tree, and Random Forest with a Feedforward Neural Network and Long Short-Term Memory, aiming to improve on the earlier findings of Lien and Yeh (2009). Liu (2018) proposed adding two elements, dropout and long short-term memory, to neural networks in order to examine their effect on accuracy and on the problem of overfitting. The same dataset as in Lien and Yeh (2009) was used: the 30,000 samples were randomly shuffled, after which the top 10,000 samples were chosen; the first 8,500 were used as a training set and the remaining 1,500 as a testing set. The data was normalized to a mean of zero and a variance of one.
Stefan et al. (2015) looked at some of the improvements that could be introduced to credit scoring, applying 41 classification methods across 8 credit scoring datasets. Their work focused on the data, the classification methods, and the indicators used to assess the performance of those methods. The Support Vector Machine (SVM) emerged as a competing approach to the Artificial Neural Network (ANN). Giudici et al. (2020) categorized SVM and ANN as non-parametric approaches. SVM is a method used to classify objects without considering multicollinearity among the predictors; it transforms the input data into a higher-dimensional feature space using a kernel function. Woo and Sohn (2022) created a weighted machine learning model using text-mining techniques and psychometric characteristics. In line with the discussion above, the goal of this work is to determine how to use machine learning for credit scoring. The aim is therefore to compare the effectiveness of various parametric statistical and non-parametric machine learning approaches for customer loan classification.
Fisher's 1933 study laid the foundation for combining multiple quantitative variables in a linear fashion to discriminate between groups or categories. This linear combination of descriptors is called the discriminant function. The output of LDA is a score used to classify an observation into the good or bad class.
$$ \text{Score} = \sum_{i=0}^{p} a_i X_i \tag{1} $$
where $a_i$ are the weights associated with the quantitative input variables $X_i$.
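As an illustration, a minimal R sketch of fitting such a linear discriminant scorecard is given below. It uses MASS::lda on a small synthetic data frame; the variable names and data are placeholders and do not come from the paper's Tunisian loan dataset.

```r
# Minimal LDA scorecard sketch on synthetic data (placeholder for the real loan data)
library(MASS)

set.seed(1)
n  <- 500
df <- data.frame(income = rlnorm(n), amount = rlnorm(n), age = rnorm(n, 35, 8))
df$y <- factor(ifelse(df$income - 0.5 * df$amount + rnorm(n) > 0, "good", "bad"))

fit   <- lda(y ~ income + amount + age, data = df)  # estimates the weights a_i of Equation (1)
preds <- predict(fit, df)        # $x holds the discriminant scores, $class the good/bad label
head(preds$x)                    # the linear score of Equation (1), up to centering/scaling
table(Predicted = preds$class, Actual = df$y)
```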
Logistic regression is a statistical method used for binary classification tasks (e.g., 0 or 1, bad or good, healthy or in default, etc.). The outcome of the LR model can be written as:
$$ P(y=1 \mid X) = \mathrm{sigmoid}(z) = \frac{1}{1+\exp(-z)} \tag{2} $$
where $P(y=1 \mid X)$ is the probability of $y$ being 1 given the input variables $X$, and $z$ is a linear combination of the inputs, $z = a_0 + a_1 X_1 + a_2 X_2 + \dots + a_p X_p$, where $a_0$ is the intercept term, $a_1, a_2, \dots, a_p$ are the weights, and $X_1, X_2, \dots, X_p$ are the input variables.
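A corresponding logistic-regression sketch in R is shown below: glm with a binomial family estimates the intercept $a_0$ and weights $a_1, \dots, a_p$, and predict(..., type = "response") returns the sigmoid output of Equation (2). The data frame and coefficients are synthetic and purely illustrative.

```r
# Logistic regression sketch: glm() estimates z = a0 + a1*X1 + ... and applies the sigmoid
set.seed(2)
n  <- 500
df <- data.frame(income = rlnorm(n), amount = rlnorm(n))
df$default <- rbinom(n, 1, plogis(-1 + 0.8 * df$amount - 0.6 * df$income))  # 1 = bad loan

lr_fit <- glm(default ~ income + amount, data = df, family = binomial)
coef(lr_fit)                                       # a0 (intercept), a1, a2
p_hat <- predict(lr_fit, type = "response")        # P(y = 1 | X) from Equation (2)
pred_class <- ifelse(p_hat > 0.5, "bad", "good")   # simple 0.5 cut-off
table(Predicted = pred_class, Actual = ifelse(df$default == 1, "bad", "good"))
```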
Decision trees recursively partition the data into subsets based on the values of the input variables, with each partition represented by a branch in the tree (Quinlan (1986)). The goal is to learn a sequence of binary decisions that can be used to predict the value of the output for a new observation. Each decision node in the tree corresponds to a test on one of the input variables, the branches correspond to the possible outcomes of the test, and the leaves correspond to the predicted values of the output variable for each combination of input values. At each step, the algorithm selects the input variable that provides the best split of the data into two subsets that are as homogeneous as possible with respect to the output variable. An information gain or Gini impurity criterion, which measures how much the split reduces uncertainty about the output variable, is commonly used to assess the quality of a split.
Decision trees are typically not formulated as mathematical equations, but as a sequence of logical rules that describe how the input variables are used to predict the output variable. However, the splitting criterion used to select the best split at each decision node can be expressed mathematically. Suppose we have a dataset with $n$ observations and $p$ input variables, denoted $X_1, X_2, \dots, X_p$, and a binary output variable $y$ that takes values in $\{0, 1\}$. Let $S$ be the subset of the data at a particular decision node, and let $p_i$ be the proportion of observations in $S$ that belong to class $i$. The Gini impurity of $S$ is defined as:
$$ G(S) = 1 - \sum_i p_i^2 \tag{3} $$
The Gini impurity measures the probability of misclassifying an observation in $S$ if we randomly assign it to a class according to the class proportions. A small value of $G(S)$ indicates that the observations in $S$ are well separated by the input variables.
To split the data at a decision node, we consider all possible splits of each input variable into two subsets and choose the split that minimizes the weighted sum of the Gini impurities of the resulting subsets, or equivalently maximizes the impurity decrease:
$$ \Delta G = G(S) - \frac{|S_1|}{|S|}\, G(S_1) - \frac{|S_2|}{|S|}\, G(S_2) \tag{4} $$
where $S_1$ and $S_2$ are the subsets of $S$ resulting from the split, and $|S_1|$ and $|S_2|$ are their respective sizes. The split with the largest value of $\Delta G$ (equivalently, the smallest weighted impurity of the resulting subsets) is chosen as the best split. The decision tree algorithm proceeds recursively, splitting the data at each decision node based on the best split until a stopping criterion is met, such as reaching a maximum depth or a minimum number of observations at a leaf node.
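The Gini computations of Equations (3)–(4) can be written in a few lines of R, and rpart then fits a classification tree with the maximum depth of 6 reported later in the hyperparameter list. The data and variable names below are synthetic placeholders, used only to make the sketch runnable.

```r
# Gini impurity (Eq. 3) and impurity decrease of a candidate split (Eq. 4)
gini <- function(y) 1 - sum(prop.table(table(y))^2)

gini_gain <- function(y, left_index) {        # left_index: logical vector defining S1
  S1 <- y[left_index]; S2 <- y[!left_index]
  gini(y) - length(S1) / length(y) * gini(S1) - length(S2) / length(y) * gini(S2)
}

# Fit a CART tree with the depth limit used in the paper (maxdepth = 6)
library(rpart)
set.seed(3)
n  <- 500
df <- data.frame(income = rlnorm(n), amount = rlnorm(n))
df$y <- factor(ifelse(df$amount > df$income + rnorm(n, sd = 0.5), "bad", "good"))

gini_gain(df$y, df$amount > median(df$amount))   # gain of one hand-picked split

tree_fit <- rpart(y ~ ., data = df, method = "class",
                  control = rpart.control(maxdepth = 6))
table(Predicted = predict(tree_fit, df, type = "class"), Actual = df$y)
```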
Support vector machine (SVM), developed by Vapnik (1998), is a supervised learning algorithm used for classification, regression, and outlier detection. The basic idea of this technique is to find the best separating hyperplane between the two classes in a given dataset. The mathematical formulation of SVM can be divided into two parts: The optimization problem and the decision function.
Given a training set $(x_i, y_i)$, where $x_i$ is the $i$-th input vector and $y_i \in \{-1, +1\}$ is the corresponding output label, SVM seeks the best separating hyperplane defined by:
$$ w \cdot x + b = 0 \tag{5} $$
where w is the weight vector, b is the bias term, and x is the input vector.
The SVM algorithm aims to find the optimal $w$ and $b$ that maximize the margin between the two classes, where the margin is the distance between the hyperplane and the closest data point from either class. The SVM optimization problem can then be formulated as:
$$ \min_{w,\, b,\, \xi} \ \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{n} \xi_i \quad \text{subject to} \quad y_i\left(w^{\mathsf T} x_i + b\right) \ge 1 - \xi_i, \ \ \xi_i \ge 0 $$
where $\lVert w \rVert^2$ is the squared L2-norm of the weight vector, $C$ is a hyperparameter that controls the trade-off between maximizing the margin and minimizing the classification error, $\xi_i$ is a slack variable that allows for some misclassification, and the two constraints enforce that each data point lies on the correct side of the hyperplane with a margin of at least $1 - \xi_i$.
The optimization problem can be solved using convex optimization techniques, such as quadratic programming. Once the optimization problem is solved, the decision function can be defined as:
$$ f(x) = \operatorname{sign}(w \cdot x + b) \tag{6} $$
where sign is the sign function that returns +1 or −1 depending on the sign of its argument. The decision function takes an input vector $x$ and returns its predicted class label according to whether the output of the hyperplane is positive or negative. In summary, SVM finds the best separating hyperplane by solving an optimization problem that maximizes the margin between the two classes, subject to constraints ensuring that all data points are correctly classified with a margin of at least $1 - \xi_i$; the decision function then predicts the class label of new data points based on the output of the hyperplane.
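A minimal e1071 sketch of the soft-margin RBF SVM described above follows. The cost and gamma values match the settings reported later in the empirical section (cost = 10, gamma = 0.076), while the data are synthetic placeholders with a deliberately non-linear class boundary.

```r
# Soft-margin SVM with an RBF kernel (C controls the margin/error trade-off in the primal problem)
library(e1071)

set.seed(4)
n  <- 500
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
df$y <- factor(ifelse(df$x1^2 + df$x2^2 > 1.5, "bad", "good"))   # non-linear boundary

svm_fit <- svm(y ~ x1 + x2, data = df,
               kernel = "radial", cost = 10, gamma = 0.076)      # settings reported in Section 5
table(Predicted = predict(svm_fit, df), Actual = df$y)           # sign of f(x) in Equation (6)
```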
Random Forest is an ensemble learning algorithm developed by Breiman in 2001. It combines multiple decision trees to make predictions and is called "random" because it uses random subsets of the features and random samples of the data to build the individual decision trees. The data is split into training and testing sets: the training set is used to build the model and the testing set to evaluate its performance. At each node of a decision tree, the algorithm selects a random subset of the features to consider when making a split, which helps to reduce overfitting and increases the diversity of the individual trees.
A decision tree is built using the selected features and a subset of the training data. The tree is grown until it reaches a pre-defined depth or until all the data in a node belongs to the same class. Suppose we have a dataset with n observations and p features. Let X be the matrix of predictor variables and Y be the vector of target variables.
To build a Random Forest model, we first create multiple decision trees using bootstrap samples of the original data: we randomly sample $n$ observations from the dataset with replacement to create a new dataset, and this process is repeated $k$ times to produce $k$ bootstrap samples. For each bootstrap sample, we then grow a decision tree using a random subset of the $p$ features; at each node of the tree, we select the best feature and threshold value to split the data based on a criterion such as information gain or Gini impurity. Repeating these steps $k$ times yields $k$ decision trees. To make a prediction for a new observation, we pass it through each of the $k$ trees, obtain $k$ predictions, and assign the final class by majority vote.
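The bootstrap-and-aggregate procedure described above is what the randomForest package implements; the sketch below uses the ntree = 1000 setting listed later in the paper. Note that randomForest does not expose a direct maximum-depth argument (the reported depth limit of 10 would need, e.g., ranger's max.depth), and the data here are synthetic placeholders.

```r
# Random Forest: k bootstrapped trees, each split on a random feature subset, majority vote
library(randomForest)

set.seed(5)
n  <- 500
df <- data.frame(income = rlnorm(n), amount = rlnorm(n), age = rnorm(n, 35, 8))
df$y <- factor(ifelse(df$amount > df$income + rnorm(n, sd = 0.5), "bad", "good"))

rf_fit <- randomForest(y ~ ., data = df,
                       ntree = 1000)   # number of bootstrap trees, as reported in the paper
rf_fit$confusion                       # out-of-bag confusion matrix
predict(rf_fit, head(df))              # majority vote across the 1000 trees
```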
A deep neural network (DNN) is an enhanced version of the conventional artificial neural network with at least three hidden layers (Schmidhuber (2015)). Figure 1 illustrates the standard architecture of a deep neural network.
A solid understanding of the fundamentals of artificial neural networks is required to fully comprehend how a DNN functions. The following formula determines the DNN output:
$$ y(t) = \sum_{k=1}^{L} f\big(w_k + x_k(t)\big) + \epsilon(t) \tag{7} $$
where $w_k$ are the weights of the layers, trained by backpropagation, $x_k$ ($k = 1, \dots, L$) are the sequences of real values (events) processed during an epoch, and $f$ is the activation function.
The aim of this research is to predict whether a loan outcome is good or bad, based on a set of features or attributes, by applying different machine learning classification algorithms to the same dataset and comparing their accuracy. This empirical study uses a Tunisian commercial bank's personal loan dataset (available from the author), which contains both continuous and categorical data. A total of 12 explanatory variables are used for this analysis; each instance is characterized by these 12 attributes, and a final attribute classifies the loan as good or bad. Table 1 presents the different attributes and their types, which are either numerical or categorical.
ID variables | Description | Type |
x1 | Age in years plus twelfths of a year | Numerical |
x2 | Yearly income (in Dinars) | Numerical |
x3 | Credit length (in months) | Numerical |
x4 | Amount of loans (in Dinars) | Numerical |
x5 | Length of stay (in years) | Numerical |
x6 | Purpose | Categorical |
x7 | Employment | Categorical |
x8 | Type of house | Categorical
x9 | Gender | Categorical
x10 | Marital Status | Categorical
x11 | Education | Categorical |
x12 | Number of dependents | Categorical
y | Default: Good-Bad Indicator | Categorical |
The data consists of 688 personal loans: 577 good loans and 111 bad loans, so the proportion of bad loans (defaults) relative to good loans (non-defaults) is 19.23%. Based on Table 2, the debtors' average age is 33.5 years and the average yearly income is 3.286 Dinars.
Variables | Mean | St.Dev | Min | Max |
x1 | 33.55 | 8.9 | 25.66 | 64.16 |
x2 | 3.286 | 1.498 | 2.237 | 12.000 |
x3 | 87.14 | 77.89 | 10.00 | 240.00 |
x4 | 106.6 | 145.812 | 5.0 | 600.0 |
x5 | 3.55 | 1.87 | 1 | 8 |
According to Table 3, the majority of the debtors are men, and 533 of them are married. The most common educational background is high school. The dataset is divided into training and testing sets in a 70:30 proportion before each method is examined, and the held-out portion is then used to validate and compare the models.
Variables | Category | Mode | Freq.mode |
x6 | 3 | House | 233 |
x7 | 2 | Private | 648 |
x8 | 2 | Own | 346 |
x9 | 2 | Male | 357 |
x10 | 3 | Married | 533 |
x11 | 4 | High school | 405 |
x12 | 4 | 0 | 350 |
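The 70:30 split described above can be sketched in R with caret::createDataPartition, which stratifies on the class label. The data frame below is a synthetic placeholder that only mimics the reported 577/111 class balance, since the actual Tunisian dataset is available only from the author.

```r
# Stratified 70:30 train/test split that preserves the ~19% bad-loan share
library(caret)

set.seed(123)
loans <- data.frame(                       # placeholder mimicking the 688-loan dataset
  income = rlnorm(688, meanlog = 1),
  amount = rlnorm(688, meanlog = 4),
  y      = factor(rep(c("good", "bad"), times = c(577, 111)))
)

train_idx <- createDataPartition(loans$y, p = 0.7, list = FALSE)
train <- loans[train_idx, ]
test  <- loans[-train_idx, ]
prop.table(table(train$y))                 # bad-loan proportion close to the full sample
```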
The predictive power of the methods used can be compared and assessed using a number of criteria, such as accuracy rate, F1 score, and AUC.
The accuracy rate is the most common performance metric; it is derived from the confusion matrix (see Table 4) and calculated with the following formula:
$$ \text{Accuracy rate} = \frac{T_0 + T_1}{(T_0 + F_1) + (F_0 + T_1)} \tag{8} $$
Predicted class "0" | Predicted class "1" | |
Actual class "0" | True positive (T0) | False positive (F1) |
Actual class "1" | False negative (F0) | True negative (T1) |
The F1 score is also computed from the confusion matrix. Its value ranges between 0 and 1, with 1 being the best possible score. A high F1 score indicates that the model achieves both high precision and high recall, meaning it can correctly identify positive and negative cases.
$$ \text{F1 score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{9} $$
where $\text{Recall} = \frac{T_0}{T_0 + F_1}$ and $\text{Precision} = \frac{T_0}{T_0 + F_0}$, consistent with the sensitivity and specificity definitions in Equations (10)–(11).
The Area Under the Curve (AUC) is a synthetic indicator derived from the ROC curve, a graphical tool used to assess the forecasting accuracy of the model. The ROC curve is based on two indicators, specificity and sensitivity (see Pepe (2000) for further details), and plots 1 − specificity on the x-axis against sensitivity on the y-axis,
where
$$ \text{Sensitivity} = \text{True positive rate} = \frac{T_0}{\text{Positives}} = \frac{T_0}{T_0 + F_1} \tag{10} $$
and
$$ \text{Specificity} = \text{True negative rate} = \frac{T_1}{\text{Negatives}} = \frac{T_1}{T_1 + F_0} \tag{11} $$
Moreover, the AUC reflects the quality of the model's classification between healthy and defaulting borrowers. In the ideal case, the AUC equals 1, i.e., the model completely separates all the positives from the negatives, with no false positives or false negatives.
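The three criteria can be computed directly from a model's predictions. The helper below follows the T0/F1/F0/T1 layout of Table 4 (class "0" treated as the positive class) and uses the pROC package for the AUC; the inputs are simulated vectors, purely for illustration.

```r
# Accuracy, precision/recall/F1 (class "0" as the positive class) and AUC
library(pROC)

eval_classifier <- function(actual, predicted, prob_of_1) {
  T0 <- sum(actual == 0 & predicted == 0)   # correctly classified class "0" (Table 4)
  F1 <- sum(actual == 0 & predicted == 1)   # actual 0 predicted 1
  F0 <- sum(actual == 1 & predicted == 0)   # actual 1 predicted 0
  T1 <- sum(actual == 1 & predicted == 1)   # correctly classified class "1"
  accuracy  <- (T0 + T1) / (T0 + F1 + F0 + T1)                 # Equation (8)
  recall    <- T0 / (T0 + F1)                                  # sensitivity, Equation (10)
  precision <- T0 / (T0 + F0)
  f1_score  <- 2 * precision * recall / (precision + recall)   # Equation (9)
  auc_val   <- auc(roc(actual, prob_of_1, quiet = TRUE))
  c(accuracy = accuracy, f1 = f1_score, auc = as.numeric(auc_val))
}

# Tiny illustrative example with fake labels and predicted probabilities
set.seed(6)
actual <- rbinom(200, 1, 0.2)                             # 1 = default
prob   <- plogis(qlogis(0.2) + 2 * actual + rnorm(200))   # fake predicted P(y = 1)
eval_classifier(actual, as.integer(prob > 0.5), prob)
```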
To assess credit risk efficiently with machine learning classification algorithms, the dataset is divided into training and testing data after feature extraction, and the different classification algorithms are fitted to the training data. These models are implemented in RStudio (see Mestiri (2024)). The following hyperparameter settings were used to train the models in this study (a code sketch of the deep network settings follows the list):
Decision Tree: max. depth = 6
Random Forest: max. depth = 10 and ntree = 1000
SVM model: Kernel function used is Gaussian Radial Basis Function (RBF), cost = 10 and gamma = 0.076.
Deep Neural Network: three hidden layers with 200, 100, and 40 nodes, followed by a single output unit (built with the Keras Sequential API). The hidden activation function is ReLU, the output unit is sigmoid, and the loss function is binary cross-entropy. The optimizer uses its default settings, with 100 epochs, a batch size of 100, and a validation split of 0.3. Early stopping is applied to mitigate overfitting.
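The listed DNN settings translate into a short sketch with the Keras R interface along the following lines. The optimizer name ("adam") and the early-stopping patience are assumptions, since the paper only states that default optimizer settings and early stopping were used, and the input matrix here is a synthetic placeholder for the 12 loan attributes.

```r
# Keras sequential DNN: 200-100-40-1 units, ReLU hidden layers, sigmoid output,
# binary cross-entropy loss, 100 epochs, batch size 100, 30% validation split, early stopping
library(keras)

set.seed(7)
n <- 500; p <- 12                                  # 12 predictors, as in Table 1
x_train <- matrix(rnorm(n * p), ncol = p)          # placeholder features
y_train <- rbinom(n, 1, 0.2)                       # placeholder 0/1 default labels

model <- keras_model_sequential() %>%
  layer_dense(units = 200, activation = "relu", input_shape = p) %>%
  layer_dense(units = 100, activation = "relu") %>%
  layer_dense(units = 40,  activation = "relu") %>%
  layer_dense(units = 1,   activation = "sigmoid")

model %>% compile(
  loss      = "binary_crossentropy",
  optimizer = "adam",                              # assumed; the paper only says "default settings"
  metrics   = "accuracy"
)

history <- model %>% fit(
  x_train, y_train,
  epochs = 100, batch_size = 100, validation_split = 0.3,
  callbacks = list(callback_early_stopping(monitor = "val_loss", patience = 10)),  # patience assumed
  verbose = 0
)
```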
Each predictive model is built on the training data and then evaluated on the test data. Table 5 presents the empirical results for the accuracy rate, F1 score, and AUC criteria used to evaluate the performance of the applied models.
Models | Accuracy rate | F1- score | AUC | Rank |
Linear Discriminant Analysis (LDA) | 70.9% | 0.790 | 0.474 | 5 |
Logistic Regression (LR) | 75.8% | 0.822 | 0.533 | 3 |
Decision Trees (DT) | 64.3% | 0.738 | 0.575 | 6 |
Random Forest (RF) | 78.2% | 0.833 | 0.715 | 2 |
Support Vector Machine (SVM) | 74.8% | 0.810 | 0.563 | 4 |
Deep Neural Network (DNN) | 83.6% | 0.864 | 0.788 | 1 |
According to Table 5, the deep neural network outperforms the other techniques on all forecasting performance metrics. DNN shows the highest accuracy rate at 83.6%, compared with 78.2% for RF and 75.8% for LR; the lowest prediction accuracy is obtained with DT (64.3%). Likewise, an F1 score of 0.864 demonstrates DNN's ability to distinguish good from bad customers with high precision. Since 1 is the most desirable F1 score, DNN reaches the highest value, while the F1 scores of RF, LR, SVM, LDA, and DT are 0.833, 0.822, 0.810, 0.790, and 0.738, respectively.
Other graphical indicators were also used to evaluate the classification quality of the models under study, namely the ROC curve (see Figure 2), from which the AUC measure is deduced; the closer the AUC is to unity, the better the model separates healthy borrowers from defaulters. Based on Table 5, the AUC of DNN reaches 0.788, and RF ranks second with an AUC of 0.715. The LR and LDA models show the worst classification results, with AUCs of 0.533 and 0.474, respectively, on the testing sample.
The results show that, compared to statistical and traditional machine learning techniques, the deep neural network model has better prediction performance. In short, DNN predicts loan default better than statistical and traditional machine learning models. In second place, RF achieves significantly higher prediction accuracy than the remaining techniques employed. Our empirical research suggests that the deep neural network is the most effective method for identifying loan defaults among customers, which can aid managerial decision-making.
Note that, in this empirical task, 20% of the sample was used to test the forecasting accuracy and classification quality of the models. For the training process, a deep feed-forward network with three hidden layers was adopted, with a sigmoid activation function for the hidden layers and a linear activation function for the output layer.
Financial credit institutions have always been very concerned with forecasting loan defaults in order to make the right lending decisions. The objective of this study was to create a useful model for categorizing credit applicants so as to accurately predict their financial difficulties. In this work we compared several machine learning methods for assessing credit risk on a Tunisian credit dataset: classification algorithms including LDA, LR, DT, SVM, RF, and DNN were implemented and evaluated. The analysis shows that the DNN methodology provides the highest accuracy in credit risk evaluation, while the RF model performs better than the other machine learning and statistical methods.
As suggested by Lyn et al. (2002), the robustness of classification methods in credit scoring can be examined better with more data and with additional datasets. We mainly focused on popular machine learning methods and explored techniques that build on these foundational algorithms, including recurrent neural networks and selective ensemble methods. The evaluation of classification algorithms is another area that has received much attention recently, and best practices suggest using different types of evaluation metrics. There are three broad types: threshold, ranking, and probabilistic metrics. Most studies, including our own, have focused on threshold metrics (e.g., accuracy and F-measure) and on ranking methods and metrics (e.g., ROC analysis and AUROC), leaving out other crucial measures of classifier performance.
The results of the empirical study demonstrate that DNN is an excellent tool for researching financial defaults in credit institutions. Compared to past work, this study incorporates machine learning models for predicting loan defaults. Unlike traditional methods, these models can learn complex, non-linear relationships between various data points and loan defaults, and they can continuously learn and improve as they are exposed to more data, potentially discovering relationships between factors that would not be evident through traditional statistical methods. Further technical aspects could be explored in future research, and the current study's results still require discussion and improvement, particularly in order to build a model that performs better than earlier ones, for example by identifying new or counterintuitive insights or by finding a significant and meaningful new variable.
The authors declare that they have not used Artificial Intelligence (AI) tools in the creation of this article.
The author declares no conflict of interest regarding the publication of this manuscript.