
In the financial market, risk management has become one of the key elements of banking. Credit card customer churn prediction plays an important role in the risk management of the banking industry, assisting banks in identifying potential risks and implementing suitable measures to mitigate them (Butaru et al., 2016). With the development of technology, customer churn prediction has extended into various fields, including the financial industry. In particular, the rapid progression of machine learning theories and algorithms has offered a fresh perspective and solution for customer churn prediction (Alfaiz and Fati, 2022). Firstly, machine learning technology is capable of automatically extracting useful information from large amounts of historical data, identifying distribution patterns and relationships to predict future trends and behaviors (Pudjihartono et al., 2022). In credit card customer churn prediction, machine learning methods can efficiently identify key variables and underlying relationships related to customer attrition. By analyzing massive multi-dimensional data on historical users, including consumer behavior, credit history, and demographic information, machine learning can construct an effective prediction model. Secondly, machine learning technology has strong generalization ability, automatically adapting to new and unseen data after training without frequent adjustments and optimizations (Janiesch et al., 2021). Bank customer churn prediction using machine learning methods presents both challenges and opportunities. Through comprehensive study and application of machine learning methods, prediction models can provide more precise and reliable results, thereby raising the standard of risk management in the credit card business and supporting the sustainable development of banking.
Despite the significant advantages of machine learning in customer churn prediction, its practical implementation still encounters several challenges. As datasets grow larger and models become more complex, ensuring model interpretability becomes a crucial concern (Chen and Meng, 2020). The SHAP values method is applied to interpret the outcomes of machine learning models. It quantifies the importance of each variable in the dataset, considering both the independent influence of each variable on prediction outcomes and the interactions among different variables. The SHAP values method thus provides a comprehensive explanation of machine learning model predictions (Gebreyesus et al., 2023). Moreover, the causal inference method has gained significant popularity in recent years for data analysis and machine learning-related research. By analyzing causal effects among variables, the causal inference method excludes variables that are related to prediction results but lack actual causal connections, and retains variables with a genuine causal impact on prediction results. This method provides a more reliable basis for variable selection in machine learning-based prediction (Li et al., 2023). The purpose of this study is to discuss the construction of credit card customer churn prediction models by applying machine learning techniques. Additionally, the SHAP values method is utilized to perform variable selection based on each variable's importance to the outcomes and to interpret the machine learning-based prediction results. The causal inference method explains the prediction results by analyzing the causal relationships between variables; these causal relationships are then compared with the results obtained by the SHAP values method.
The remainder of this study is organized as follows. Section 2 reviews research related to machine learning, interpretability analysis, and causal inference. Section 3 describes the concepts of four sampling techniques for data preprocessing, six machine learning methods for credit card customer churn prediction, and the causal inference method in detail. Section 4 summarizes the performance of the machine learning models, interprets the optimal model, and identifies the causal relationships by analyzing the experimental results. Finally, Section 5 presents the conclusion and describes limitations and further research.
With the development of computer hardware in recent years, technologies such as machine learning and deep learning are now widely used across various industries. Machine learning can be combined with stock price prediction to construct new quantitative investment strategies in financial investment (Yan et al., 2023). In the field of stock market risk, traditional machine learning models and neural networks can also predict Barrier Option prices with relative accuracy (Li and Yan, 2023). Even Bitcoin, a cryptocurrency that has been widely discussed in recent years, can have its price predicted by machine learning methods, with the Support Vector Regression model outperforming other models (Erfanian et al., 2022). In the area of credit card customer churn prediction, there is a growing number of reliable research papers aiming to predict churn rates by analyzing the personal and behavioral data of credit card customers, enabling banks to take effective measures to retain customers in advance. For instance, Zhang et al. used simple preprocessing of customer data and applied only the Random Forest (RF) model, a traditional machine learning method, to classify customers, achieving good classification results (Zhang et al., 2024). In the study by de Lima Lemos et al., several traditional machine learning models were used for prediction, including RF, Decision Tree, k-Nearest Neighbor, Elastic Net, Logistic Regression, and Support Vector Machines, with the RF model achieving the best classification accuracy at 82.8% (de Lima Lemos et al., 2022). Lalwani et al. introduced more advanced ideas into the data preprocessing problem, using the Gravitational Search Algorithm to select variables and improve the efficiency of the machine learning models (Lalwani et al., 2022). Siddiqui et al. used three variable selection methods to observe the performance of different machine learning models, comparing all variables, separated continuous and discrete variables, and variables selected by importance. They ultimately found that using all variables yielded the best classification results (Siddiqui et al., 2024).
In addition to focusing on constructing different models to improve classification efficacy, researchers have gradually paid more attention to the interpretability of the models and the relationships between variables. In the field of financial investment, some articles focus on the impact of each variable on volatility risk (Yan and Li, 2024). AL-Najjar et al. demonstrated the variable importance of a single traditional machine learning model, the C5 tree, under different methods of variable selection. The results showed that the three main variables affecting the model's effectiveness were the total number of transactions, the total credit card revolving balance, and the change in the number of transactions, providing useful references for credit card managers (AL-Najjar et al., 2022). Other studies discuss variables using more advanced and understandable variable visualization tools, such as the SHAP values method. In Peng et al.'s study, the best model for classification, which combined a genetic algorithm with XGBoost, was used to analyze interpretability. The results of the SHAP values method not only showed the order of importance of the different independent variables, but also indicated whether changes in the value of each independent variable had a positive or negative effect on the predicted variable (Peng et al., 2023). In addition to the SHAP values method, other interpretable visualization tools with similar effects, such as Local Interpretable Model-Agnostic Explanations (LIME), can also be used for analysis (Chang et al., 2024).
So far, machine learning algorithms have primarily captured correlations between variables, inferring variable importance from a correlation perspective while often ignoring the causal relationships among them. In other words, the model's determination of variable importance is frequently based on the strength of the correlation between the dependent and independent variables, which can introduce bias errors in actual predictions (Feuerriegel et al., 2024). To address this issue, this study innovatively uses the R-learner among meta-learners to analyze variable importance. This approach combines the fields of machine learning and causal inference (Künzel et al., 2019). The R-learner employs the control variable method to observe how independent variables affect the probability of credit card customer churn and calculates a more accurate conditional average treatment effect (CATE) by removing bias from the data. This helps quantify the effect of different variables on the probability of credit card customer churn.
The aim of this study is to predict whether bank credit card customers will churn using machine learning methods and to utilize the prediction results to construct models that explain the influencing factors of customer churn. This will provide an effective basis for rationalizing customer management before customers leave the bank and for effective risk management in the banking industry. Initially, the relevant dataset of bank customer churn is downloaded and preprocessed. Subsequently, the approach is divided into two parts. The first part involves balancing the class distribution of the training set using four sampling techniques: Random Oversampling, Synthetic Minority Oversampling Technique (SMOTE), Borderline-Synthetic Minority Oversampling Technique (Borderline-SMOTE), and Adaptive Synthetic Sampling (ADASYN). Then, six machine learning methods, including RF, Gradient Boosting Decision Tree (GBDT), Extra Tree, AdaBoost, XGBoost, and CatBoost, are used to predict whether the bank will lose credit card customers. The optimal model is selected by comparing the performance of each model, and the important variables influencing customer churn and their effects are analyzed using the SHAP values method. The second part involves using the causal inference method to investigate the causal impact of variables on customer churn based on the optimal prediction model mentioned above. In this part, the R-learner, one of the meta-learners in causal inference, is applied to the continuous variables. The framework of this study is outlined in Figure 1.
Table 1. Conversion of the categorical variables into numeric values.

| Categorical variable | Class | Number of samples | Numeric encoding |
|---|---|---|---|
| Attrition_Flag | Existing Customer | 8500 | 0 |
| Attrition_Flag | Attrited Customer | 1627 | 1 |
| Gender | M | 4769 | 0 |
| Gender | F | 5358 | 1 |
| Education_Level | Uneducated | 1487 | 6 |
| Education_Level | High School | 2013 | 15 |
| Education_Level | College | 1013 | 18 |
| Education_Level | Graduate | 3128 | 22 |
| Education_Level | Post-Graduate | 516 | 24 |
| Education_Level | Doctorate | 451 | 28 |
| Education_Level | Unknown | 1519 | 22 |
| Marital_Status | Single | 3943 | 0 |
| Marital_Status | Married | 4687 | 1 |
| Marital_Status | Divorced | 748 | 2 |
| Marital_Status | Unknown | 749 | 1 |
| Income_Category | Less than $40K | 3561 | 2 |
| Income_Category | $40K - $60K | 1790 | 5 |
| Income_Category | $60K - $80K | 1402 | 7 |
| Income_Category | $80K - $120K | 1535 | 10 |
| Income_Category | $120K + | 727 | 14 |
| Income_Category | Unknown | 1112 | 2 |
| Card_Category | Blue | 9436 | 0 |
| Card_Category | Silver | 555 | 1 |
| Card_Category | Gold | 116 | 2 |
| Card_Category | Platinum | 20 | 3 |
Historical data of bank customers downloaded from Kaggle is used as the dataset in this study (Kaggle, 2022). The dataset contains twenty-three variables, including customers' fundamental personal details, the credit card level they hold, and their credit card usage, totaling 10,127 samples. The variable labeled "CLIENTNUM" represents the unique identification number assigned by the bank to each customer. Since this variable has no effect on the model construction in this study, it is removed from the analysis. Additionally, the last two variables in the dataset represent outcomes provided by Kaggle for predicting bank customer churn with Naive Bayes classifiers. These variables have been excluded during data processing, as this study does not compare that probabilistic classification approach with machine learning methods. The remaining twenty variables, consisting of fourteen quantitative variables and six categorical variables, are all utilized in the experiments to examine their importance and impact. These variables represent customers' personal information and credit card usage. To build the models, the classes of the categorical variables must be converted into numerical values. The six categorical variables are: "Attrition_Flag", "Gender", "Education_Level", "Marital_Status", "Income_Category", and "Card_Category". For ordered categorical variables, the classes are sorted from low to high before conversion. The variables "Education_Level", "Marital_Status", and "Income_Category" each contain a class labeled "Unknown". There are 3,046 samples with at least one of these three variables labeled "Unknown", constituting 30% of the total sample size. The treatment for "Unknown" in any of these three variables is to replace it with the majority class of the respective variable. The details of these six categorical variables are outlined in Table 1.
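To make these preprocessing steps concrete, the sketch below illustrates them with pandas. It is a minimal illustration rather than the study's exact code: the file name "BankChurners.csv" and the assumption that the Naive Bayes outputs occupy the last two columns follow the common Kaggle distribution of this dataset, and only two of the ordinal mappings from Table 1 are shown for brevity.

```python
import pandas as pd

# Hypothetical file name; the Kaggle dataset is commonly distributed as "BankChurners.csv".
df = pd.read_csv("BankChurners.csv")

# Drop the customer identifier and the two Naive Bayes output columns
# (assumed to be the last two columns of the file), as described above.
df = df.drop(columns=["CLIENTNUM"]).iloc[:, :-2]

# Replace "Unknown" with the majority (modal) class of each affected variable.
for col in ["Education_Level", "Marital_Status", "Income_Category"]:
    majority = df.loc[df[col] != "Unknown", col].mode()[0]
    df[col] = df[col].replace("Unknown", majority)

# Ordinal encodings following Table 1 (two variables shown for brevity).
df["Attrition_Flag"] = df["Attrition_Flag"].map(
    {"Existing Customer": 0, "Attrited Customer": 1})
df["Income_Category"] = df["Income_Category"].map(
    {"Less than $40K": 2, "$40K - $60K": 5, "$60K - $80K": 7,
     "$80K - $120K": 10, "$120K +": 14})

# Marital_Status is unordered with more than two classes: encode it, then
# expand it into dummy variables (yielding Marital_Status_0, _1, _2).
df["Marital_Status"] = df["Marital_Status"].map(
    {"Single": 0, "Married": 1, "Divorced": 2})
df = pd.get_dummies(df, columns=["Marital_Status"], prefix="Marital_Status")
```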
Table 2. Performance of the complete customer churn prediction models on the test set.

| Sampling technique | Machine learning | Accuracy | Precision | Recall | F1 | AUC |
|---|---|---|---|---|---|---|
| Random Oversampling | RF | 0.9650 | 0.9648 | 0.9650 | 0.9649 | 0.9901 |
| Random Oversampling | GBDT | 0.9743 | 0.9746 | 0.9743 | 0.9745 | 0.9954 |
| Random Oversampling | Extra Tree | 0.9620 | 0.9613 | 0.9620 | 0.9613 | 0.9890 |
| Random Oversampling | AdaBoost | 0.9521 | 0.9568 | 0.9521 | 0.9535 | 0.9893 |
| Random Oversampling | XGBoost | 0.9748 | 0.9751 | 0.9748 | 0.9749 | 0.9939 |
| Random Oversampling | CatBoost | 0.9556 | 0.9606 | 0.9556 | 0.9570 | 0.9900 |
| SMOTE | RF | 0.9615 | 0.9627 | 0.9615 | 0.9620 | 0.9897 |
| SMOTE | GBDT | 0.9699 | 0.9701 | 0.9699 | 0.9700 | 0.9945 |
| SMOTE | Extra Tree | 0.9600 | 0.9596 | 0.9600 | 0.9597 | 0.9881 |
| SMOTE | AdaBoost | 0.9556 | 0.9565 | 0.9556 | 0.9560 | 0.9856 |
| SMOTE | XGBoost | 0.9704 | 0.9704 | 0.9704 | 0.9704 | 0.9919 |
| SMOTE | CatBoost | 0.9590 | 0.9607 | 0.9590 | 0.9596 | 0.9884 |
| Borderline-SMOTE | RF | 0.9585 | 0.9583 | 0.9585 | 0.9588 | 0.9891 |
| Borderline-SMOTE | GBDT | 0.9709 | 0.9711 | 0.9709 | 0.9710 | 0.9936 |
| Borderline-SMOTE | Extra Tree | 0.9556 | 0.9549 | 0.9556 | 0.9551 | 0.9886 |
| Borderline-SMOTE | AdaBoost | 0.9516 | 0.9538 | 0.9516 | 0.9524 | 0.9855 |
| Borderline-SMOTE | XGBoost | 0.9719 | 0.9719 | 0.9719 | 0.9719 | 0.9934 |
| Borderline-SMOTE | CatBoost | 0.9497 | 0.9532 | 0.9497 | 0.9508 | 0.9857 |
| ADASYN | RF | 0.9605 | 0.9615 | 0.9605 | 0.9609 | 0.9901 |
| ADASYN | GBDT | 0.9709 | 0.9710 | 0.9709 | 0.9709 | 0.9939 |
| ADASYN | Extra Tree | 0.9566 | 0.9561 | 0.9566 | 0.9563 | 0.9885 |
| ADASYN | AdaBoost | 0.9576 | 0.9589 | 0.9576 | 0.9581 | 0.9859 |
| ADASYN | XGBoost | 0.9724 | 0.9724 | 0.9724 | 0.9724 | 0.9931 |
| ADASYN | CatBoost | 0.9546 | 0.9575 | 0.9546 | 0.9556 | 0.9878 |
The variable "Marital_Status" is an unordered categorical variable with more than two distinct values, requiring conversion into dummy variables.
The variable "Attrition_Flag" indicates whether a customer has churned or not, serving as the target variable of this study. The dataset contains 1,627 samples in the "Attrited Customer" class. The ratio of these samples to those in the "Existing Customer" class is greater than 5:1, indicating a significant class imbalance, as shown in Figure 2.
If the training set is used by the models without preprocessing, the models' ability to recognize churned customers will be weakened, leading to overfitting and reduced prediction accuracy (Alam et al., 2020). To avoid these problems and improve model performance, oversampling is adopted to balance the class distribution by increasing the number of samples from the minority class. In this study, the following four sampling algorithms are used; a usage sketch follows their descriptions below.
Random oversampling is a straightforward method that randomly selects and replicates samples from the minority class until the classes are balanced.
SMOTE is an improved algorithm for handling imbalanced data based on random oversampling. The first step involves computing the distances between each minority class sample and the other samples of that class to identify its k nearest neighbors. Then, a specified number of samples are randomly selected from these nearest neighbors. A new synthetic sample is generated between the minority class sample and each selected nearest neighbor, lying on the line connecting the two, thereby increasing the number of minority class samples (Akın, 2023).
Borderline-SMOTE is an improved version of the SMOTE algorithm. The key difference is an additional preliminary step: a minority class sample is selected to apply the SMOTE algorithm if most of its k nearest neighbors are from the majority class (Gu et al., 2023).
ADASYN is another common oversampling method similar to SMOTE. The key to this algorithm is that, for each minority class sample, it calculates the local ratio of majority to minority class samples and uses this density distribution to determine the number of synthetic samples generated for that sample (Dube and Verster, 2023).
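A minimal sketch of these four samplers using the imblearn toolbox mentioned in Section 4 is given below; the synthetic stand-in data and the random seed are illustrative assumptions, not the study's configuration.

```python
from sklearn.datasets import make_classification
from imblearn.over_sampling import (RandomOverSampler, SMOTE,
                                    BorderlineSMOTE, ADASYN)

# Stand-in imbalanced data approximating the dataset's 8500:1627 class ratio;
# in the study, X_train and y_train come from the split described in Section 4.
X_train, y_train = make_classification(
    n_samples=10127, weights=[0.84], random_state=42)

samplers = {
    "Random Oversampling": RandomOverSampler(random_state=42),
    "SMOTE": SMOTE(random_state=42),
    "Borderline-SMOTE": BorderlineSMOTE(random_state=42),
    "ADASYN": ADASYN(random_state=42),
}

for name, sampler in samplers.items():
    X_res, y_res = sampler.fit_resample(X_train, y_train)
    # After resampling, the minority class share is balanced to roughly 50%.
    print(f"{name}: minority share = {y_res.mean():.2f}")
```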
This study employs six machine learning methods to predict credit card customer churn. In addition to RF and AdaBoost, which were used in previous research to predict barrier option prices (Li and Yan, 2023), this study utilizes the Extra Tree algorithm and three other prevalent boosting algorithms: GBDT, XGBoost, and CatBoost. To enhance the performance of these machine learning models, the grid search method is used to tune the hyperparameters of each model and find the optimal set. A complete customer churn prediction model can be constructed by combining one of the sampling techniques mentioned above with one of these six machine learning methods. The prediction will result in four possible cases:
● True positive (TP): The predicted result is positive, and the actual value is also positive.
● True negative (TN): The predicted result is negative, and the actual value is also negative.
● False positive (FP): The predicted result is positive, whereas the actual value is negative.
● False negative (FN): The predicted result is negative, whereas the actual value is positive.
TP and TN indicate the predicted results are consistent with the actual values, and FP and FN indicate the opposite. To evaluate the performance of models in binary classification problems, the following indicators are typically utilized: accuracy, precision, recall, F1 score, and AUC. The formulas for the first four indicators are defined as follows:
$$\mathrm{accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \tag{1}$$

$$\mathrm{precision} = \frac{TP}{TP + FP}, \tag{2}$$

$$\mathrm{recall} = \frac{TP}{TP + FN}, \tag{3}$$

$$F1 = \frac{2 \times \mathrm{precision} \times \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}. \tag{4}$$
The value of AUC is equal to the area under the receiver operating characteristic (ROC) curve, which plots recall (the true positive rate) against the false positive rate (FPR). The formula for FPR is as follows:
$$\mathrm{FPR} = \frac{FP}{FP + TN}. \tag{5}$$
The closer the values of these five indicators are to 1, the better the model performs.
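For reference, the sketch below computes the five indicators with scikit-learn for any fitted classifier. The weighted averaging is an assumption inferred from Table 2, where the recall column coincides with accuracy, rather than a setting reported by the paper.

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

def evaluate(model, X_test, y_test):
    """Compute the five indicators of Eqs. (1)-(5) for a fitted classifier."""
    y_pred = model.predict(X_test)
    y_prob = model.predict_proba(X_test)[:, 1]  # churn probability for AUC
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        # Weighted averaging is assumed; it makes recall equal accuracy, as in Table 2.
        "precision": precision_score(y_test, y_pred, average="weighted"),
        "recall": recall_score(y_test, y_pred, average="weighted"),
        "F1": f1_score(y_test, y_pred, average="weighted"),
        "AUC": roc_auc_score(y_test, y_prob),
    }
```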
Causal inference is a method that effectively explores and analyzes whether a variable is a main factor affecting the target variable (Jiang, 2022). In traditional causal inference, the randomized controlled trial is generally considered a reliable methodology for determining the influence of variables. In practice, however, due to the cost and ethical concerns of experiments, causal inference is often based on collected observational datasets. In Facure's book, which explores the combination of machine learning and causal inference, the author proposes a modeling approach called the R-learner for continuous variables in a dataset to assess the causal significance of an independent variable on a dependent variable (Facure, 2023). In current industrial applications, various code tools are becoming available for researchers to explore the feasibility of machine learning in causal inference problems (Molak, 2023). To determine whether a variable has a causal relationship with customer churn and to quantify its impact, this study adopts the causal inference method to understand the causal effects among the variables.
In causality, the cause variable is the treatment, denoted as $T$, while the dependent variable is the outcome, denoted as $Y$. Additionally, the other independent variables are referred to as characteristic variables, denoted as $X$. For an individual sample $i$, the prediction model predicts the corresponding value $Y_i$ as $Y_i(T=1 \mid X)$ when the treatment is applied (i.e., $T_i = 1$), and as $Y_i(T=0 \mid X)$ without the treatment (i.e., $T_i = 0$). The individual treatment effect (ITE) is the difference in outcomes for an individual between the scenario with treatment and the scenario without treatment:
$$\mathrm{ITE}_i = Y_i(T=1 \mid X) - Y_i(T=0 \mid X). \tag{6}$$
To consider the overall causal effect between the treatment and the outcome, the CATE must be measured, which is given by the following equation:
$$\mathrm{CATE} = E[Y \mid T=1, X] - E[Y \mid T=0, X]. \tag{7}$$
The value of CATE indicates whether a causal relationship exists between the treatment variable $T$ and the outcome $Y$, and quantifies the magnitude of its effect.
For binary classification problems, this study uses the R-learner to estimate, for each continuous treatment variable, its causal effect on the outcome $Y$, based on the optimal customer churn prediction model. The CATE value in the R-learner is obtained from the difference in the probability of the same outcome occurring with and without the treatment, which is given by the following equation:
$$\mathrm{CATE}^{*} = P(Y=1 \mid T=1, X) - P(Y=1 \mid T=0, X). \tag{8}$$
For each $T$, hypothesis testing is applied to $\mathrm{CATE}^{*}$ to determine the causal impact of that treatment variable on $Y$.
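To illustrate Eq. (8), the sketch below computes a plug-in estimate with a fitted classifier for a hypothetical binary treatment column. It is a conceptual illustration only; the study's treatments are continuous and are handled by the R-learner, as sketched in Section 4.

```python
import numpy as np

def plug_in_cate(model, X, treatment_col):
    """Plug-in estimate of Eq. (8) for a binary treatment: the average change in
    predicted churn probability when the treatment is switched on vs. off."""
    X1, X0 = X.copy(), X.copy()
    X1[treatment_col] = 1
    X0[treatment_col] = 0
    p1 = model.predict_proba(X1)[:, 1]  # P(Y=1 | T=1, X)
    p0 = model.predict_proba(X0)[:, 1]  # P(Y=1 | T=0, X)
    return np.mean(p1 - p0)
```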
The experimental environment for this study is Python, utilizing toolboxes such as imblearn, sklearn, xgboost, and catboost. Following the data preprocessing steps outlined in Section 3.1, the dataset is divided into a training set and a test set in an 8:2 ratio. The training set data is balanced using four sampling techniques, and the independent variables of all the data are normalized. Subsequently, six machine learning methods are trained on the training set data, resulting in twenty-four complete customer churn prediction models. These trained models are then applied to predict the "Attrition_Flag" values in the test set, and the performance of these models is evaluated using the indicators mentioned in Section 3.3.
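The sketch below outlines this pipeline for one sampling technique and one model (SMOTE with XGBoost). The stratified split, the random seed, and the hyperparameter grid are illustrative assumptions, as the paper does not report these settings.

```python
from imblearn.over_sampling import SMOTE
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from xgboost import XGBClassifier

# X, y: the preprocessed features and "Attrition_Flag" target from Section 3.1.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)  # the scaler is fitted on the training data only

# Balance the training set with one of the four samplers (SMOTE shown here).
X_res, y_res = SMOTE(random_state=42).fit_resample(X_train, y_train)

# Illustrative hyperparameter grid; the paper's exact grid is not reported.
grid = GridSearchCV(
    XGBClassifier(eval_metric="logloss"),
    param_grid={"n_estimators": [200, 400],
                "max_depth": [4, 6],
                "learning_rate": [0.05, 0.1]},
    scoring="f1_weighted",
    cv=5,
)
grid.fit(X_res, y_res)
best_model = grid.best_estimator_
```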
The results of the accuracy for the training set, evaluated using six machine learning methods, range from 96.06% to 99.92% under random oversampling, 97.33% to 99.88% under SMOTE, 96.95% to 99.94% under Borderline-SMOTE, and 96.88% to 99.91% under ADASYN. The test set results are as follows.
As shown in Table 2, the values of all indicators for the complete customer churn prediction models are above 0.94 after hyperparameter tuning and training, indicating good performance for each model. The difference in evaluation results between the training set and the test set is not significant, and no model shows severe overfitting. For the Random Oversampling-XGBoost, SMOTE-XGBoost, Borderline-SMOTE-XGBoost, and ADASYN-XGBoost models, all indicator values exceed 0.97, suggesting that these four models perform better than the others; they are considered the optimal prediction models. Furthermore, the XGBoost model performs better and more stably than the other machine learning models across different sampling techniques by optimizing both the loss function and a regularization term (Guo and Fan, 2024).
The SHAP values method is used to better understand the contribution of each variable in the optimal prediction models and its relationship with the target variable (Wu et al., 2024). When the SHAP value of a variable is positive, it indicates a positive relationship with customer churn, whereas a negative SHAP value indicates a negative relationship. Using the SHAP toolbox in Python, an interpretable explanation of the optimal prediction models is obtained, resulting in corresponding SHAP summary plots. The variables in the SHAP summary plots are sorted from highest to lowest according to their importance in each model, with the top 10 variables selected. The points corresponding to each variable represent their SHAP values.
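A minimal sketch of how such a summary plot can be produced with the SHAP toolbox is given below; best_model, X_test, and feature_names refer to objects from the earlier sketches and are assumptions about the surrounding code rather than the study's exact script.

```python
import shap

# TreeExplainer is suited to tree ensembles such as the tuned XGBoost model.
explainer = shap.TreeExplainer(best_model)
shap_values = explainer.shap_values(X_test)

# Summary plot of the ten most important variables, as in the plots discussed below;
# feature_names holds the column names of the preprocessed data.
shap.summary_plot(shap_values, X_test, feature_names=feature_names, max_display=10)
```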
Based on the four SHAP summary plots (one for each optimal prediction model), the variables "Total_Trans_Ct", "Total_Trans_Amt", and "Total_Revolving_Bal" consistently rank in the top three in terms of importance, and in the same order. Moreover, the positive or negative relationship between each of these variables and "Attrition_Flag" remains unchanged across the models. These three variables represent the customer's total number of transactions with the bank in the past year, the total amount of transactions, and the total revolving balance of the credit card, respectively. As the total number of transactions increases, the SHAP value decreases, indicating a negative relationship with customer churn: customers with a higher number of transactions are less likely to churn. For the second most important variable, the higher the total amount of transactions, the higher the SHAP value. When the total transaction amount is small, the SHAP value can be either positive or negative; there is a significant positive relationship with customer churn only when the total amount exceeds a certain threshold, indicating that customers are more likely to churn when their total transaction amount is above this threshold. For "Total_Revolving_Bal", the greater the total revolving balance of the credit card, the lower the SHAP value, indicating that customers with a larger total credit card revolving balance are less prone to churn than those with a smaller one.
Furthermore, for each optimal prediction model, the variables "Total_Relationship_Count", "Total_Amt_Chng_Q4_Q1", and "Total_Ct_Chng_Q4_Q1" also rank among the top 10 of the SHAP values, consistently appearing in the middle positions. These three variables represent the total number of bank products held by customers, the change in the amount from the fourth quarter to the first quarter, and the change in the number of transactions, respectively. For these three variables, as their values decrease, their SHAP values increase, leading to the predicted result tending closer to 1, which indicates a negative relationship with customer churn. This means that customers who hold a larger number of products or have significant changes in amount and transaction frequency across different quarters are less likely to churn.
For the remaining variables in the SHAP summary plots, although some are important in most optimal prediction models, they pertain to customers' personal information; therefore, variables such as "Customer_Age", "Marital_Status_0", "Marital_Status_1", and "Education_Level" are not discussed or classified further. Additionally, "Months_Inactive_12_mon", the total number of inactive months in the past year, "Contacts_Count_12_mon", the total number of contacts in the past year, and "Gender" are only among the top 10 in the Random Oversampling-XGBoost model. Among these, "Gender" is personal information, while the other two variables are positively related to customer churn. Furthermore, the variable "Avg_Utilization_Ratio" only appears in the ADASYN-XGBoost model, ranking 10th in importance. This variable represents the average utilization rate of bank credit cards, with a lower rate indicating a higher likelihood of customer churn.
The statsmodels toolbox in Python is used to analyze the causal effects on customer churn of the SHAP summary plot variables that do not involve customers' personal information. The XGBoost model is selected for estimating customer churn in the R-learner due to its superior predictive performance. The experimental results of the causal inference method are listed as follows.
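A sketch of this procedure, broadly following the residual-on-residual form of the R-learner described by Facure (2023), is given below. The cross-fitting setup and the choice of XGBoost for both nuisance models are assumptions, since the paper does not report its exact implementation; the OLS summary mirrors the columns of Table 3.

```python
import statsmodels.api as sm
from sklearn.model_selection import cross_val_predict
from xgboost import XGBClassifier, XGBRegressor

def r_learner_effect(X, y, treatment_col):
    """R-learner with a constant-effect final stage: residualize the outcome and
    the continuous treatment with cross-fitted XGBoost nuisance models, then
    regress the outcome residuals on the treatment residuals with OLS."""
    T = X[treatment_col]
    X_rest = X.drop(columns=[treatment_col])

    # Out-of-fold predictions reduce overfitting bias in the nuisance models.
    y_hat = cross_val_predict(XGBClassifier(eval_metric="logloss"),
                              X_rest, y, cv=5, method="predict_proba")[:, 1]
    t_hat = cross_val_predict(XGBRegressor(), X_rest, T, cv=5)

    y_res, t_res = y - y_hat, T - t_hat
    ols = sm.OLS(y_res, sm.add_constant(t_res)).fit()
    return ols  # ols.summary() reports coef, std err, t, P>|t|, 95% CI (cf. Table 3)
```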
According to Table 3, the p-values for "Total_Revolving_Bal" and "Avg_Utilization_Ratio" are greater than 0.05, indicating no significant difference in customer churn caused by changes in these two treatment variables. Combined with the SHAP values analysis, this suggests that there is only a correlation between each of these two variables and customer churn, not a causal relationship. For the remaining treatment variables, the p-values are all less than 0.05, indicating that changes in each of these variables have causal relationships with customer churn.
Table 3. R-learner estimates of the causal effect of each treatment variable on customer churn.

| Variable | coef | std err | t | P>\|t\| | 95% interval |
|---|---|---|---|---|---|
| Total_Trans_Ct | −0.0003 | 2.48e-05 | −13.109 | 0.000 | [−0.000, −0.000] |
| Total_Trans_Amt | 7.163e-06 | 4.14e-07 | 17.308 | 0.000 | [6.35e-06, 7.97e-06] |
| Total_Revolving_Bal | −4.396e-06 | 5.8e-06 | −0.758 | 0.449 | [−1.58e-05, 6.98e-06] |
| Total_Relationship_Count | −0.0008 | 0.000 | −6.324 | 0.000 | [−0.001, −0.001] |
| Total_Amt_Chng_Q4_Q1 | −0.0079 | 0.001 | −7.078 | 0.000 | [−0.010, −0.006] |
| Total_Ct_Chng_Q4_Q1 | −0.0051 | 0.001 | −4.819 | 0.000 | [−0.007, −0.003] |
| Months_Inactive_12_mon | 0.0007 | 0.000 | 3.968 | 0.000 | [0.000, 0.001] |
| Contacts_Count_12_mon | 0.0006 | 0.000 | 3.872 | 0.000 | [0.000, 0.001] |
| Avg_Utilization_Ratio | 0.0039 | 0.010 | 0.384 | 0.701 | [−0.016, 0.024] |
Considering the coefficient values in Table 3, the treatment variables "Total_Trans_Ct", "Total_Relationship_Count", "Total_Amt_Chng_Q4_Q1", and "Total_Ct_Chng_Q4_Q1" have negative coefficients, indicating negative causal relationships with customer churn. Among these variables, "Total_Amt_Chng_Q4_Q1" has the largest causal effect, while "Total_Trans_Ct" has the smallest. Conversely, the coefficients of "Total_Trans_Amt", "Months_Inactive_12_mon", and "Contacts_Count_12_mon" are positive, indicating positive causal relationships with customer churn. "Months_Inactive_12_mon" displays the largest causal effect, while "Total_Trans_Amt" displays the smallest. Compared with the results of the SHAP values method, the similarity is that these seven variables have the same direction of impact on customer churn, while the difference lies in the order of the quantified effect values.
In the current research, to enhance the accuracy of prediction results, a combination of sampling techniques and machine learning models was employed to forecast customer churn in banks. A comparative performance analysis indicates that the XGBoost model consistently outperforms other machine learning models, achieving an accuracy of at least 97%, regardless of the sampling techniques used. Furthermore, the SHAP values method was utilized to interpret the optimized prediction models, while R-learner was used to investigate the causal effects of these variables on customer churn. Based on these two methods, the main important variables affecting customer churn, which include the total number and amount of transactions with the bank in the past year, the total number of bank products held by the customer, and the changes in the amount and number of transactions from the fourth quarter to the first quarter, were identified. Additionally, the analysis found that the total credit card revolving balance does not have a significant causal relationship with customer churn, but there is a strong correlation. The research findings provide valuable recommendations for bank managers to improve customer management strategies.
Due to the limited number of minority class samples in the dataset, the experiments require sampling techniques to generate synthetic samples, which enables the prediction models to identify sample categories more accurately. Furthermore, excluding the variables belonging to customers' personal information, the other variables consist of cross-sectional data. Such data are typically utilized in analyses that emphasize differences among individual samples, rather than changes within a sample over time. Therefore, if additional samples with more extensive variables, such as time series data, become available, the feasibility of the model's predictions could be examined further, enabling a better comparison of the interaction between the SHAP values method and the causal inference method in future research.
The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this study.
The authors have received no financial assistance from any source in the preparation of this study.
All authors declare no conflicts of interest in this study.