
Growing energy demand has exacerbated the issue of energy security and made the use of renewable resources a necessity. Solar photovoltaic technology is the best alternative for promoting renewable-energy generation in Bangladesh. Grid-connected solar photovoltaic (PV) systems are becoming increasingly popular, given the solar potential and recent PV module costs. This study proposes a grid-connected solar PV system with a net metering strategy using the Hybrid Optimization of Multiple Electric Renewables (HOMER) model. HOMER is used to evaluate raw data, to create a demand cycle from load-survey data, and to find the most cost-effective configuration. A sensitivity analysis was also conducted to assess the impact of variations in solar radiation (4, 4.59, 4.65, 5 kWh/m²/day), PV capacity (0 kW, 100 kW, 200 kW, 300 kW, 350 kW, 400 kW, 420 kW), and grid price ($0.107, $0.118, $0.14 per kWh) on the optimum configuration. The results reveal that combining 420 kW of PV with a 405-kW converter connected to the utility grid is the least expensive and most environmentally friendly configuration. The electricity generation cost is estimated at $0.0725 per kWh, and the net present value is $1.83 million, with a payback period of 6.4 years over the system's 20-year lifespan. Compared with the existing grid and diesel-generator systems, the optimized system, with a renewable fraction of 31.10%, reduces carbon dioxide emissions by 191 tons and 1,028 tons per year, respectively.
Citation: Md. Mehadi Hasan Shamim, Sidratul Montaha Silmee, Md. Mamun Sikder. Optimization and cost-benefit analysis of a grid-connected solar photovoltaic system[J]. AIMS Energy, 2022, 10(3): 434-457. doi: 10.3934/energy.2022022
In the financial market, risk management has become one of the key elements of banking. Credit card customer churn prediction plays an important role in the risk management of the banking industry, helping banks identify potential risks and implement suitable measures to mitigate them (Butaru et al., 2016). With the development of technology, customer churn prediction has spread into various fields, including the financial industry. In particular, the rapid progress of machine learning theories and algorithms has offered a fresh perspective and new solutions for customer churn prediction (Alfaiz and Fati, 2022). First, machine learning can automatically extract useful information from large amounts of historical data, identifying distribution patterns and relationships in order to predict future trends and behaviors (Pudjihartono et al., 2022). In credit card customer churn prediction, machine learning methods can efficiently identify key variables and underlying relationships related to customer attrition. By analyzing large volumes of multidimensional historical data, including consumer behavior, credit history, and demographic information, machine learning can construct an effective prediction model. Second, machine learning has strong generalization ability: after training, it adapts to new, unseen data without frequent adjustment and re-optimization (Janiesch et al., 2021). Bank customer churn prediction using machine learning therefore presents both challenges and opportunities. Through comprehensive study and application of machine learning methods, prediction models can provide more precise and reliable results, improving the accuracy and reliability of risk management in the credit card business and supporting the sustainable development of banking.
Despite the significant advantages of machine learning in customer churn prediction, its practical implementation still faces several challenges. As datasets grow larger and models become more complex, ensuring model interpretability becomes a crucial concern (Chen and Meng, 2020). The SHAP values method is applied to interpret the outcomes of machine learning models. It quantifies the importance of each variable in the dataset, considering both the independent influence of each variable on the prediction outcomes and the interactions among different variables, and thus provides a comprehensive explanation of machine learning model predictions (Gebreyesus et al., 2023). Moreover, the causal inference method has gained significant popularity in recent years in data analysis and machine learning research. By analyzing causal effects among variables, the causal inference method excludes variables that are related to the prediction results but lack an actual causal connection, and retains variables that have a causal impact on the prediction results. This provides a more reliable basis for variable selection in machine learning-based prediction (Li et al., 2023). The purpose of this study is to discuss the construction of credit card customer churn prediction models using machine learning techniques. Additionally, the SHAP values method is used to select variables according to their influence on the outcomes and to interpret the machine learning-based prediction results. The causal inference method is used to explain the prediction results by analyzing the causal relationships between variables, and these causal relationships are compared with the results obtained by the SHAP values method.
The remainder of this study is organized as follows. Section 2 reviews research related to machine learning, interpretability analysis, and causal inference. Section 3 describes the concepts of four sampling techniques for data preprocessing, six machine learning methods for credit card customer churn prediction, and the causal inference method in detail. Section 4 summarizes the performance of the machine learning models, interprets the optimal model, and identifies the causal relationships by analyzing the experimental results. Finally, Section 5 presents the conclusion and describes limitations and further research.
With the development of computer hardware in recent years, technologies such as machine learning and deep learning are now widely used across various industries. In financial investment, machine learning can be combined with stock price prediction to construct new quantitative investment strategies (Yan et al., 2023). In the field of stock market risk, traditional machine learning models and neural networks can also predict barrier option prices with relative accuracy (Li and Yan, 2023). Even the price of Bitcoin, a widely discussed cryptocurrency, can be predicted by machine learning methods, with the Support Vector Regression model outperforming other models (Erfanian et al., 2022). In credit card customer churn prediction, a growing number of research papers aim to predict churn by analyzing the personal and behavioral data of credit card customers, enabling banks to take effective measures to retain customers in advance. For instance, Zhang et al. applied simple preprocessing to customer data and used only the Random Forest (RF) model, a traditional machine learning method, to classify customers, achieving good classification results (Zhang et al., 2024). de Lima Lemos et al. used several traditional machine learning models for prediction, including RF, Decision Tree, k-Nearest Neighbor, Elastic Net, Logistic Regression, and Support Vector Machines, with the RF model achieving the best classification accuracy at 82.8% (de Lima Lemos et al., 2022). Lalwani et al. introduced more advanced ideas into data preprocessing, using the Gravitational Search Algorithm to select variables and improve the efficiency of the machine learning models (Lalwani et al., 2022). Siddiqui et al. used three variable selection strategies to compare the performance of different machine learning models: using all variables, separating continuous and discrete variables, and selecting variables based on their importance. They found that using all variables yielded the best classification results (Siddiqui et al., 2024).
In addition to constructing different models to improve classification performance, researchers have gradually paid more attention to model interpretability and to the relationships between variables. In financial investment, some articles focus on the impact of each variable on volatility risk (Yan and Li, 2024). AL-Najjar et al. demonstrated the variable importance of the same traditional machine learning model, the C5 tree, under different variable selection methods. The results showed that the three main variables affecting the model's effectiveness were the total number of transactions, the total credit card revolving balance, and the change in the number of transactions, providing useful references for credit card managers (AL-Najjar et al., 2022). Other studies discuss variables using more advanced and understandable visualization tools, such as the SHAP values method. In Peng et al.'s study, the best classification model, which combined a genetic algorithm with XGBoost, was used to analyze interpretability. The SHAP values not only showed the order of importance of the independent variables but also indicated whether changes in the value of each independent variable had a positive or negative effect on the predicted variable (Peng et al., 2023). Besides the SHAP values method, other interpretable visualization tools with similar effects, such as Local Interpretable Model-Agnostic Explanations (LIME), can also be used for analysis (Chang et al., 2024).
So far, machine learning algorithms have primarily captured correlations between variables and inferred variable importance from a correlation perspective, often ignoring the causal relationships among them. In other words, a model's determination of variable importance is frequently based on the strength of the correlation between the dependent and independent variables, which can introduce bias into actual predictions (Feuerriegel et al., 2024). To address this issue, this study uses the R-learner, one of the meta-learners, to analyze variable importance, an approach that combines machine learning and causal inference (Künzel et al., 2019). The R-learner controls for the other variables to observe how each independent variable affects the probability of credit card customer churn, and it calculates a more accurate conditional average treatment effect (CATE) by removing bias from the data. This helps quantify the effect of different variables on the probability of credit card customer churn.
The aim of this study is to predict whether bank credit card customers will churn using machine learning methods and to use the prediction results to build models that explain the factors influencing customer churn. This provides an effective basis for managing customers before they leave the bank and for effective risk management in the banking industry. First, the relevant bank customer churn dataset is downloaded and preprocessed. The approach is then divided into two parts. The first part balances the class distribution of the training set using four sampling techniques: Random Oversampling, Synthetic Minority Oversampling Technique (SMOTE), Borderline-Synthetic Minority Oversampling Technique (Borderline-SMOTE), and Adaptive Synthetic Sampling (ADASYN). Six machine learning methods, namely RF, Gradient Boosting Decision Tree (GBDT), Extra Tree, AdaBoost, XGBoost, and CatBoost, are then used to predict whether the bank will lose credit card customers. The optimal model is selected by comparing the performance of each model, and the important variables influencing customer churn and their effects are analyzed using the SHAP values method. The second part uses the causal inference method to investigate the causal impact of variables on customer churn based on the optimal prediction model. In this part, the R-learner, one of the meta-learners of causal inference, is applied to the continuous variables. The framework of this study is outlined in Figure 1.
| Categorical variables | Class | Number of samples | Conversion to numbers |
|---|---|---|---|
| Attrition_Flag | Existing Customer | 8500 | 0 |
| | Attrited Customer | 1627 | 1 |
| Gender | M | 4769 | 0 |
| | F | 5358 | 1 |
| Education_Level | Uneducated | 1487 | 6 |
| | High School | 2013 | 15 |
| | College | 1013 | 18 |
| | Graduate | 3128 | 22 |
| | Post-Graduate | 516 | 24 |
| | Doctorate | 451 | 28 |
| | Unknown | 1519 | 22 |
| Marital_Status | Single | 3943 | 0 |
| | Married | 4687 | 1 |
| | Divorced | 748 | 2 |
| | Unknown | 749 | 1 |
| Income_Category | Less than $40K | 3561 | 2 |
| | $40K - $60K | 1790 | 5 |
| | $60K - $80K | 1402 | 7 |
| | $80K - $120K | 1535 | 10 |
| | $120K + | 727 | 14 |
| | Unknown | 1112 | 2 |
| Card_Category | Blue | 9436 | 0 |
| | Silver | 555 | 1 |
| | Gold | 116 | 2 |
| | Platinum | 20 | 3 |
Historical data of bank customers downloaded from Kaggle is used as the dataset in this study (Kaggle, 2022). The dataset contains twenty-three variables, including customers' basic personal details, the credit card level they hold, and their credit card usage, for a total of 10,127 samples. The variable "CLIENTNUM" is the unique identification number the bank assigns to each customer; since it has no effect on model construction, it is removed from the analysis. The last two variables in the dataset are the outputs of Naive Bayes classifiers provided by Kaggle for predicting bank customer churn; they are also excluded, because this study does not compare this probabilistic classification approach with the machine learning methods. The remaining twenty variables, fourteen quantitative and six categorical, describe customers' personal information and credit card usage and are all used in the experiments to discuss their importance and impact. To build the models, the classes of the categorical variables must be converted into numerical values. The six categorical variables are "Attrition_Flag", "Gender", "Education_Level", "Marital_Status", "Income_Category", and "Card_Category". For ordered categorical variables, the classes are sorted from low to high before conversion. The variables "Education_Level", "Marital_Status", and "Income_Category" contain an "Unknown" class. A total of 3,046 samples have at least one of these three variables labeled "Unknown", about 30% of the total sample. In each of these variables, "Unknown" is replaced with the majority class of that variable. The details of these six categorical variables are outlined in Table 2.
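The column removal, majority-class imputation, and ordinal encoding described above can be sketched in plain Python. This is a minimal illustration, not the study's code: the `Education_Level` codes are copied from the categorical-variable table, the toy column is made up, and recognising the two Kaggle Naive Bayes columns by the prefix `Naive_Bayes_Classifier` is an assumption about the file's headers. Imputing "Unknown" with the majority class "Graduate" reproduces the code 22 that the table lists for "Unknown".

```python
from collections import Counter

# Ordinal codes for Education_Level, copied from the categorical-variable table.
# "Unknown" is intentionally missing: it is imputed before encoding.
EDUCATION_CODES = {
    "Uneducated": 6, "High School": 15, "College": 18,
    "Graduate": 22, "Post-Graduate": 24, "Doctorate": 28,
}

def drop_columns(row):
    """Drop CLIENTNUM and the Kaggle Naive Bayes outputs.

    ASSUMPTION: the two Naive Bayes columns share the prefix
    "Naive_Bayes_Classifier"; adjust to the real column names if needed.
    """
    return {k: v for k, v in row.items()
            if k != "CLIENTNUM" and not k.startswith("Naive_Bayes_Classifier")}

def impute_unknown(values, unknown="Unknown"):
    """Replace the 'Unknown' label with the most frequent known class."""
    majority = Counter(v for v in values if v != unknown).most_common(1)[0][0]
    return [majority if v == unknown else v for v in values]

def encode(values, codes):
    """Map each (already imputed) class label to its numeric code."""
    return [codes[v] for v in values]

# Hypothetical toy data, not taken from the Kaggle file.
row = {"CLIENTNUM": 123, "Gender": "F", "Naive_Bayes_Classifier_1": 0.1}
print(drop_columns(row))  # {'Gender': 'F'}
education = ["Graduate", "Unknown", "High School", "Graduate", "Doctorate"]
imputed = impute_unknown(education)
codes = encode(imputed, EDUCATION_CODES)
print(codes)  # [22, 22, 15, 22, 28]
```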
| Sampling techniques | Machine learning | accuracy | precision | recall | F1 | AUC |
|---|---|---|---|---|---|---|
| Random Oversampling | RF | 0.9650 | 0.9648 | 0.9650 | 0.9449 | 0.9901 |
| | GBDT | 0.9743 | 0.9746 | 0.9743 | 0.9745 | 0.9954 |
| | Extra Tree | 0.9620 | 0.9613 | 0.9620 | 0.9613 | 0.9890 |
| | AdaBoost | 0.9521 | 0.9568 | 0.9521 | 0.9535 | 0.9893 |
| | XGBoost | 0.9748 | 0.9751 | 0.9748 | 0.9749 | 0.9939 |
| | CatBoost | 0.9556 | 0.9606 | 0.9556 | 0.9570 | 0.9900 |
| SMOTE | RF | 0.9615 | 0.9627 | 0.9615 | 0.9620 | 0.9897 |
| | GBDT | 0.9699 | 0.9701 | 0.9699 | 0.9700 | 0.9945 |
| | Extra Tree | 0.9600 | 0.9596 | 0.9600 | 0.9597 | 0.9881 |
| | AdaBoost | 0.9556 | 0.9565 | 0.9556 | 0.9560 | 0.9856 |
| | XGBoost | 0.9704 | 0.9704 | 0.9704 | 0.9704 | 0.9919 |
| | CatBoost | 0.9590 | 0.9607 | 0.9590 | 0.9596 | 0.9884 |
| Borderline-SMOTE | RF | 0.9585 | 0.9583 | 0.9585 | 0.9588 | 0.9891 |
| | GBDT | 0.9709 | 0.9711 | 0.9709 | 0.9710 | 0.9936 |
| | Extra Tree | 0.9556 | 0.9549 | 0.9556 | 0.9551 | 0.9886 |
| | AdaBoost | 0.9516 | 0.9538 | 0.9516 | 0.9524 | 0.9855 |
| | XGBoost | 0.9719 | 0.9719 | 0.9719 | 0.9719 | 0.9934 |
| | CatBoost | 0.9497 | 0.9532 | 0.9497 | 0.9508 | 0.9857 |
| ADASYN | RF | 0.9605 | 0.9615 | 0.9605 | 0.9609 | 0.9901 |
| | GBDT | 0.9709 | 0.9710 | 0.9709 | 0.9709 | 0.9939 |
| | Extra Tree | 0.9566 | 0.9561 | 0.9566 | 0.9563 | 0.9885 |
| | AdaBoost | 0.9576 | 0.9589 | 0.9876 | 0.9581 | 0.9859 |
| | XGBoost | 0.9724 | 0.9724 | 0.9724 | 0.9724 | 0.9931 |
| | CatBoost | 0.9546 | 0.9575 | 0.9546 | 0.9556 | 0.9878 |
The variable "Marital_Status" is an unordered categorical variable with more than two distinct values, requiring conversion into dummy variables.
The variable "Attrition_Flag" indicates whether a customer has churned and serves as the target variable of this study. The dataset contains 1,627 samples in the "Attrited Customer" class and 8,500 in the "Existing Customer" class. The ratio of "Existing Customer" to "Attrited Customer" samples is greater than 5:1, indicating a significant class imbalance, as shown in Figure 2.
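The size of the imbalance follows directly from the "Attrition_Flag" class counts in the categorical-variable table:

$$\frac{8500}{1627} \approx 5.22, \qquad \frac{1627}{10127} \approx 16.1\%,$$

so roughly one customer in six has churned.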
If the models are trained on the imbalanced training set without preprocessing, their ability to recognize churned customers decreases, the results overfit the majority class, and prediction accuracy falls (Alam et al., 2020). To avoid these problems and improve model performance, oversampling is adopted to balance the class distribution by increasing the number of minority-class samples. The following four sampling algorithms are used in this study.
Random oversampling is a straightforward method that randomly selects and replicates samples from the minority class until the classes are balanced.
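Random oversampling amounts to drawing minority rows with replacement until the classes are equal. A minimal pure-Python sketch, assuming a binary label list and feature rows stored as lists; the function name and toy data are illustrative (the study itself used the imblearn toolbox, whose `RandomOverSampler` performs this step).

```python
import random

def random_oversample(X, y, minority_label=1, seed=0):
    """Replicate randomly chosen minority rows until both classes are the same size.

    Assumes a binary problem: every label other than minority_label is majority.
    """
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    # Draw with replacement; copy each row so the output has no shared lists.
    extra = [list(rng.choice(minority)) for _ in range(n_majority - len(minority))]
    return X + extra, y + [minority_label] * len(extra)

# Toy data: five existing customers (0) and two churned customers (1).
X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.9], [1.0]]
y = [0, 0, 0, 0, 0, 1, 1]
X_res, y_res = random_oversample(X, y)
print(y_res.count(0), y_res.count(1))  # 5 5
```

Because the new rows are exact copies, random oversampling adds no new information, which is the limitation SMOTE-style methods address.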
SMOTE is an improved algorithm for handling imbalanced data based on random oversampling. First, the distances between minority-class samples are calculated, and the k nearest neighbors of the same class are identified for each minority sample. Then, a specified number of samples are randomly selected from these nearest neighbors. For each selected neighbor, a new synthetic sample is generated at a random point on the line segment connecting the minority sample and that neighbor, thereby increasing the number of minority-class samples (Akın, 2023).
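A minimal sketch of the SMOTE interpolation step, in plain Python. The helper names, `k`, and the toy data are illustrative assumptions; imblearn's `SMOTE` class is the library implementation that fits the toolboxes named in the experiments.

```python
import math
import random

def k_nearest(point, candidates, k):
    """The k candidates closest to point (Euclidean distance), excluding point itself."""
    others = [c for c in candidates if c is not point]
    return sorted(others, key=lambda c: math.dist(point, c))[:k]

def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by SMOTE-style interpolation."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)                           # a minority sample
        neighbour = rng.choice(k_nearest(base, minority, k))  # one of its k minority neighbours
        gap = rng.random()                                    # position on the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic

minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_samples = smote(minority, n_new=6, k=2)
print(len(new_samples))  # 6
```

Every synthetic point lies on a segment between two minority samples, so here all of them stay inside the unit square spanned by the toy minority class.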
Borderline-SMOTE is an improved version of the SMOTE algorithm. The key difference is an additional preliminary step: a minority class sample is selected to apply the SMOTE algorithm if most of its k nearest neighbors are from the majority class (Gu et al., 2023).
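The borderline selection can be sketched as follows. In the original Borderline-SMOTE formulation, a minority sample is a "danger" sample when at least half, but not all, of its m nearest neighbors (searched in the full data) belong to the majority class; if all of them do, the sample is treated as noise and skipped. This sketch follows that rule, using separate neighbor counts `m` (for the danger test) and `k` (for interpolation); the function names and the 1-D toy data are hypothetical.

```python
import math
import random

def nearest(point, pool, k, key=lambda p: p):
    """The k items of pool closest to point (Euclidean), skipping point itself."""
    others = [p for p in pool if key(p) is not point]
    return sorted(others, key=lambda p: math.dist(point, key(p)))[:k]

def danger_set(X, y, m=5, minority_label=1):
    """Minority samples with at least half, but not all, of their m nearest
    neighbours (searched in the full data) in the majority class."""
    data = list(zip(X, y))
    danger = []
    for x, label in data:
        if label != minority_label:
            continue
        neighbours = nearest(x, data, m, key=lambda p: p[0])
        n_major = sum(1 for _, lab in neighbours if lab != minority_label)
        if m / 2 <= n_major < m:  # n_major == m is treated as noise and skipped
            danger.append(x)
    return danger

def borderline_smote(X, y, n_new, m=5, k=5, minority_label=1, seed=0):
    """SMOTE interpolation that starts only from 'danger' (borderline) samples."""
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    danger = danger_set(X, y, m, minority_label)
    synthetic = []
    for _ in range(n_new if danger else 0):
        base = rng.choice(danger)
        neighbour = rng.choice(nearest(base, minority, k))
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic

# 1-D toy data: majority at 0..4, two minority points at the border, three far away.
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [4.5], [5.5], [10.0], [11.0], [12.0]]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print(danger_set(X, y, m=3))  # [[4.5], [5.5]]
new_samples = borderline_smote(X, y, n_new=4, m=3, k=2)
```

Only the two minority points next to the majority cluster seed new samples; the well-separated points at 10 to 12 are left alone.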
ADASYN is another common oversampling method similar to SMOTE. Its key step is to calculate, for each minority-class sample, the proportion of majority-class samples among its k nearest neighbors, and to use the normalized distribution of these proportions to determine how many synthetic samples are generated for each minority-class sample (Dube and Verster, 2023). Minority samples surrounded by more majority samples, which are harder to learn, therefore receive more synthetic samples.
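The ADASYN weighting can be sketched as follows, assuming the usual formulation: the total number of synthetic samples is G = (n_majority − n_minority) × β, each minority sample i gets a ratio r_i equal to the share of majority samples among its k nearest neighbors, and its quota is g_i = round(r_i / Σr × G). The function name, β, and the toy data are illustrative; because of rounding, the quotas need not sum exactly to G.

```python
import math
import random

def adasyn(X, y, k=5, beta=1.0, minority_label=1, seed=0):
    """ADASYN-style oversampling: more synthetic samples for minority points
    that are surrounded by majority points (harder to learn)."""
    rng = random.Random(seed)
    data = list(zip(X, y))
    minority = [x for x, lab in data if lab == minority_label]
    G = round((len(data) - 2 * len(minority)) * beta)  # (n_majority - n_minority) * beta
    ratios = []
    for x in minority:
        nn = sorted((p for p in data if p[0] is not x),
                    key=lambda p: math.dist(x, p[0]))[:k]
        ratios.append(sum(1 for _, lab in nn if lab != minority_label) / k)  # r_i
    total = sum(ratios)
    if total == 0:
        return [], [0] * len(minority)  # no minority sample borders the majority class
    counts = [round(r / total * G) for r in ratios]  # g_i from the normalised r_i
    synthetic = []
    for x, g in zip(minority, counts):
        neighbours = sorted((m for m in minority if m is not x),
                            key=lambda m: math.dist(x, m))[:k]
        for _ in range(g):
            n = rng.choice(neighbours)
            gap = rng.random()
            synthetic.append([a + gap * (b - a) for a, b in zip(x, n)])
    return synthetic, counts

# Toy data: six majority points, one minority point inside them, two far away.
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [4.5], [10.0], [11.0]]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1]
synthetic, counts = adasyn(X, y, k=3)
print(counts)  # [2, 1, 1]
```

The minority point at 4.5, whose three neighbors are all majority, receives twice the quota of the two isolated minority points.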
This study employs six machine learning methods to predict credit card customer churn. In addition to RF and AdaBoost, which were used in the previous research to predict barrier option prices (Li and Yan, 2023), this study utilizes the extra tree algorithm and three other prevalent boosting algorithms: GBDT, XGBoost, and CatBoost. To enhance the performance of these machine learning models, the grid search method is used to tune the hyperparameters of each model and find the optimal set. A complete customer churn prediction model can be constructed by combining one of the sampling techniques mentioned above with one of these six machine learning methods. The prediction will result in four possible cases:
● True positive (TP): The predicted result is positive, and the actual value is also positive.
● True negative (TN): The predicted result is negative, and the actual value is also negative.
● False positive (FP): The predicted result is positive, whereas the actual value is negative.
● False negative (FN): The predicted result is negative, whereas the actual value is positive.
TP and TN indicate the predicted results are consistent with the actual values, and FP and FN indicate the opposite. To evaluate the performance of models in binary classification problems, the following indicators are typically utilized: accuracy, precision, recall, F1 score, and AUC. The formulas for the first four indicators are defined as follows:
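$$\text{accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \tag{1}$$

$$\text{precision} = \frac{TP}{TP + FP}, \tag{2}$$

$$\text{recall} = \frac{TP}{TP + FN}, \tag{3}$$

$$F1 = \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}}. \tag{4}$$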
The AUC is the area under the receiver operating characteristic (ROC) curve, which plots recall (the true positive rate) against the false positive rate (FPR) as the classification threshold varies. The formula for FPR is as follows:
$$FPR = \frac{FP}{FP + TN}. \tag{5}$$
The closer the values of these five indicators are to 1, the better the model performs.
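Eqs. (1) to (5) and the AUC can be computed directly from labels and scores. A minimal sketch with hypothetical data, treating churn (label 1) as the positive class; the AUC uses its rank interpretation (the probability that a random positive is scored above a random negative), which equals the area under the ROC curve. One hedged observation: in the results table, recall equals accuracy in almost every row, which is what a support-weighted average of per-class recall produces, so the reported precision, recall, and F1 may be weighted over both classes rather than the churn-class values computed here.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP and FN, treating `positive` (churn) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, tn, fp, fn

def metrics(tp, tn, fp, fn):
    """Eqs. (1)-(5): accuracy, precision, recall, F1 and FPR."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    fpr = fp / (fp + tn)
    return accuracy, precision, recall, f1, fpr

def auc(y_true, scores, positive=1):
    """ROC AUC as the probability that a random positive outranks a random
    negative (ties count one half)."""
    pos = [s for s, t in zip(scores, y_true) if t == positive]
    neg = [s for s, t in zip(scores, y_true) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.1, 0.4, 0.8, 0.3, 0.6, 0.7, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # threshold of 0.5
counts = confusion_counts(y_true, y_pred)
print(counts)               # (3, 3, 1, 1)
print(metrics(*counts))     # (0.75, 0.75, 0.75, 0.75, 0.25)
print(auc(y_true, scores))  # 0.9375
```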
Causal inference is a method for exploring and analyzing whether a variable is a main factor affecting the target variable (Jiang, 2022). In traditional causal inference, the randomized controlled trial is generally considered a reliable methodology for determining the influence of variables. In practice, however, because of the cost and ethical concerns of experiments, causal inference is often based on collected observational data. Facure's book, which explores the combination of machine learning and causal inference, describes a modeling approach called the R-learner for continuous variables in a dataset to assess the causal significance of an independent variable on a dependent variable (Facure, 2023). In current industrial applications, various code tools are becoming available for researchers to explore the feasibility of machine learning in causal inference problems (Molak, 2023). To determine whether a variable has a causal relationship with customer churn and to quantify its impact, this study adopts the causal inference method.
In causality, the cause variable is the treatment, denoted $T$, while the dependent variable is the outcome, denoted $Y$. The other independent variables are referred to as characteristic variables, denoted $X$. For an individual sample $i$, the prediction model predicts the outcome $Y_i$ as $Y_i(T=1 \mid X)$ when the treatment is applied (i.e., $T_i = 1$) and as $Y_i(T=0 \mid X)$ without the treatment (i.e., $T_i = 0$). The individual treatment effect (ITE) is the difference in outcomes for an individual between the scenario with treatment and the scenario without treatment:
$$\mathrm{ITE}_i = Y_i(T=1 \mid X) - Y_i(T=0 \mid X). \tag{6}$$
To capture the overall causal effect of the treatment on the outcome, the CATE must be measured, which is given by the following equation:
$$\mathrm{CATE} = E[Y \mid T=1, X] - E[Y \mid T=0, X]. \tag{7}$$
The value of CATE indicates whether there is a causal relationship between the treatment variable T and the outcome Y, and quantifies its effect.
For this binary classification problem, the study uses the R-learner, based on the optimal customer churn prediction model, to estimate the causal effect of each continuous treatment variable on the outcome Y. In the R-learner, the CATE value is obtained from the probabilities of the same outcome occurring with and without the treatment, as given by the following equation:
$$\mathrm{CATE}^{*} = P(Y=1 \mid T=1, X) - P(Y=1 \mid T=0, X). \tag{8}$$
For each T, a hypothesis test is applied to CATE∗ to determine the causal impact of that treatment variable on Y.
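The R-learner's estimation step is not spelled out in the text. As a hedged sketch: with nuisance models m(X) ≈ E[Y | X] and e(X) ≈ E[T | X], the R-learner fits the effect by regressing the outcome residual Y − m(X) on the treatment residual T − e(X). If the effect is assumed constant, this reduces to a no-intercept least-squares slope with a coefficient, standard error, t statistic, and p-value, the same kind of quantities statsmodels reports. The study uses XGBoost for the churn model; here both nuisance models are swapped for one-covariate linear regressions, cross-fitting is omitted, the data are simulated, and the p-value uses a normal approximation, so this illustrates the idea only.

```python
import math
import random

def ols_fit(x, y):
    """Simple linear regression y ~ a + b*x, returning (a, b)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return my - b * mx, b

def residualize(target, covariate):
    """target minus its linear prediction from covariate (a stand-in nuisance model)."""
    a, b = ols_fit(covariate, target)
    return [t - (a + b * c) for t, c in zip(target, covariate)]

def r_learner_constant_effect(y, t, x):
    """Residual-on-residual estimate of a constant treatment effect with a t-test."""
    y_res = residualize(y, x)  # Y - m(X), with m(X) approximating E[Y | X]
    t_res = residualize(t, x)  # T - e(X), with e(X) approximating E[T | X]
    stt = sum(r * r for r in t_res)
    theta = sum(a * b for a, b in zip(t_res, y_res)) / stt
    errors = [a - theta * b for a, b in zip(y_res, t_res)]
    se = math.sqrt(sum(e * e for e in errors) / (len(y) - 1) / stt)
    t_stat = theta / se
    p_value = math.erfc(abs(t_stat) / math.sqrt(2))  # two-sided, normal approximation
    return theta, se, t_stat, p_value

# Simulated data with a true effect of 2.0; X confounds both T and Y.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(2000)]
t = [xi + rng.gauss(0, 1) for xi in x]
y = [2.0 * ti + 1.5 * xi + rng.gauss(0, 1) for ti, xi in zip(t, x)]

naive_slope = ols_fit(t, y)[1]  # ignores X, so it is biased upward (about 2.75 in expectation)
theta, se, t_stat, p_value = r_learner_constant_effect(y, t, x)
print(round(naive_slope, 2), round(theta, 2), p_value < 0.05)
```

The contrast between the naive slope and the residualized estimate shows the bias removal that motivates the R-learner: the confounder X inflates the raw association, while the residual-on-residual slope recovers an effect close to 2.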
The experimental environment for this study is Python, utilizing toolboxes such as imblearn, sklearn, xgboost, and catboost. Following the data preprocessing steps outlined in Section 3.1, the dataset is divided into a training set and a test set in an 8:2 ratio. The training set data is balanced using four sampling techniques, and the independent variables of all the data are normalized. Subsequently, six machine learning methods are trained on the training set data, resulting in twenty-four complete customer churn prediction models. These trained models are then applied to predict the "Attrition_Flag" values in the test set, and the performance of these models is evaluated using the indicators mentioned in Section 3.3.
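The split-then-balance-then-normalize order can be sketched in plain Python. The text does not say which normalization was used or where its statistics were computed; this sketch assumes z-score standardization fitted on the training set only and applied unchanged to the test set, which keeps test information out of training. Oversampling is marked by a comment, since the text applies it to the training set only. Function names and data are hypothetical.

```python
import random
import statistics

def split_80_20(rows, labels, seed=0):
    """Shuffle row indices and split them 8:2 into training and test sets."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = int(0.8 * len(idx))

    def take(ids, seq):
        return [seq[i] for i in ids]

    train, test = idx[:cut], idx[cut:]
    return take(train, rows), take(train, labels), take(test, rows), take(test, labels)

def fit_standardizer(rows):
    """Per-column mean and population standard deviation from training rows only."""
    columns = list(zip(*rows))
    means = [statistics.fmean(c) for c in columns]
    stds = [statistics.pstdev(c) or 1.0 for c in columns]  # guard against constant columns
    return means, stds

def standardize(rows, means, stds):
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]

rows = [[float(i), float(2 * i + 1)] for i in range(10)]
labels = [0] * 8 + [1] * 2
X_train, y_train, X_test, y_test = split_80_20(rows, labels)
# Oversampling (any of the four techniques) would be applied to X_train, y_train here.
means, stds = fit_standardizer(X_train)
X_train_std = standardize(X_train, means, stds)
X_test_std = standardize(X_test, means, stds)
print(len(X_train), len(X_test))  # 8 2
```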
On the training set, the accuracy of the six machine learning methods ranges from 96.06% to 99.92% under random oversampling, 97.33% to 99.88% under SMOTE, 96.95% to 99.94% under Borderline-SMOTE, and 96.88% to 99.91% under ADASYN. The test set results are as follows.
As shown in Table 2, all indicator values of the complete customer churn prediction models are above 0.94 after hyperparameter tuning and training, indicating good performance for every model. The evaluation results on the training set and the test set differ only slightly, and there is no clear sign of overfitting. For the random oversampling-XGBoost, SMOTE-XGBoost, Borderline-SMOTE-XGBoost, and ADASYN-XGBoost models, all indicator values exceed 0.97, and under every sampling technique XGBoost attains the highest accuracy, precision, and F1 score (GBDT has a marginally higher AUC). These four models are therefore considered the optimal prediction models. Furthermore, XGBoost performs better and more stably than the other machine learning models across the different sampling techniques, which can be attributed to its optimizing both the loss function and a regularization term (Guo and Fan, 2024).
The SHAP values method is used to better understand the contribution of each variable in the optimal prediction models and its relationship with the target variable (Wu et al., 2024). A positive SHAP value means that the variable pushes a sample's prediction toward churn, whereas a negative SHAP value pushes it away from churn; in this sense, positive and negative values indicate positive and negative relationships with customer churn. Using the SHAP toolbox in Python, an interpretable explanation of the optimal prediction models is obtained in the form of SHAP summary plots. The variables in each SHAP summary plot are sorted from highest to lowest importance in that model, and the top 10 variables are shown. Each point corresponds to the SHAP value of a variable for one sample.
Based on the four figures above, in any of the optimal prediction models, the variables "Total_Trans_Ct", "Total_Trans_Amt", and "Total_Revolving_Bal" consistently rank in the top three in terms of importance and in the same order. Moreover, the positive or negative relationship between each of these variables and "Attrition_Flag" remains unchanged. These three variables represent the customer's total number of transactions with the bank in the past year, the total amount of transactions, and the total revolving balance of the credit card, respectively. As the total number of transactions increases, the SHAP value decreases, indicating a negative relationship with customer churn. This suggests that customers with a higher number of transactions are less likely to churn. For the second most important variable, the higher the total amount of transactions, the higher the SHAP value. When the total transaction amount is small, the SHAP value can be either positive or negative. There is a significant positive relationship with customer churn only when the total amount exceeds a certain threshold, indicating that customers are more likely to churn when their total transaction amount is above this threshold. For "Total_Revolving_Bal", the greater the total revolving balance of the credit card, the lower the SHAP value, indicating that customers with a larger total credit card revolving balance are less prone to churn compared to those with smaller total credit card revolving balance.
Furthermore, for each optimal prediction model, the variables "Total_Relationship_Count", "Total_Amt_Chng_Q4_Q1", and "Total_Ct_Chng_Q4_Q1" also rank among the top 10 by SHAP value, consistently appearing in the middle positions. These three variables represent the total number of bank products held by a customer, the change in transaction amount from the fourth quarter to the first quarter, and the corresponding change in the number of transactions, respectively. For all three, SHAP values increase as the variable values decrease, pushing the prediction toward 1 (churn), which indicates a negative relationship with customer churn. Thus, customers who hold more products or show larger changes in transaction amount and frequency across quarters are less likely to churn.
The remaining variables in the SHAP summary plots include some that are important in most optimal prediction models but describe customers' personal information, such as "Customer_Age", "Marital_Status_0", "Marital_Status_1", and "Education_Level"; these are therefore not discussed or classified further. In addition, "Months_Inactive_12_mon" (the total number of months inactive in the past year), "Contacts_Count_12_mon" (the total number of contacts in the past year), and "Gender" rank among the top 10 only in the random oversampling-XGBoost model. Of these, "Gender" is personal information, while the other two variables are positively related to customer churn. Finally, "Avg_Utilization_Ratio", the average utilization rate of the customer's credit card, appears only in the ADASYN-XGBoost model, where it ranks 10th in importance; a lower utilization rate indicates a higher likelihood of churn.
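The top-10 rankings discussed above follow the ordering used by SHAP summary plots, which sort features by their mean absolute SHAP value. A minimal sketch of that ranking, plus a hypothetical helper that lists the features appearing in the top k of every model, assuming each model's SHAP values are available as an (n_samples, n_features) array:

```python
import numpy as np


def top_k_features(shap_matrix, feature_names, k=10):
    """Return the k features with the largest mean |SHAP| value, most important first."""
    importance = np.abs(np.asarray(shap_matrix, dtype=float)).mean(axis=0)
    # argsort sorts ascending, so reverse it to put the largest importance first.
    order = np.argsort(importance)[::-1][:k]
    return [(feature_names[i], float(importance[i])) for i in order]


def common_top_features(rankings):
    """Features that appear in the top-k list of every model."""
    name_sets = [{name for name, _ in ranking} for ranking in rankings]
    return set.intersection(*name_sets)
```

Applied to the four optimal models, a variable such as "Avg_Utilization_Ratio" would appear in only one top-10 list and therefore drop out of the intersection.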
The statsmodels package in Python is used to analyze the causal effects on customer churn of the variables in the SHAP summary plots that do not contain customers' personal information. Within R-learner, the XGBoost model is selected to estimate customer churn because of its superior predictive performance. The results of the causal inference method are listed in Table 3.
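R-learner estimates a treatment effect by residualizing both the outcome and the treatment on the covariates and then regressing one set of residuals on the other. The sketch below assumes a constant (non-heterogeneous) effect, for which the final stage reduces to an OLS regression through the origin that yields the same kind of coef / std err / t output as in Table 3. To keep it runnable with NumPy alone, the nuisance models are ordinary least squares rather than the XGBoost models used in this study, and `r_learner_constant_effect` and its arguments are hypothetical names.

```python
import numpy as np


def _fit_predict_linear(X_train, y_train, X_test):
    """Least-squares nuisance model with an intercept (stand-in for XGBoost)."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    A_test = np.column_stack([np.ones(len(X_test)), X_test])
    beta, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    return A_test @ beta


def r_learner_constant_effect(X, t, y, n_folds=5, seed=0):
    """Constant-effect R-learner: regress outcome residuals on treatment residuals.

    X: covariates (n, p); t: treatment variable (n,); y: outcome, e.g. churn flag (n,).
    Returns (coef, std_err, t_stat).
    """
    X, t, y = np.asarray(X, float), np.asarray(t, float), np.asarray(y, float)
    n = len(y)
    folds = np.random.default_rng(seed).permutation(n) % n_folds
    y_res = np.empty(n)
    t_res = np.empty(n)
    # Cross-fitting: each fold's nuisance predictions come from models fit on the other folds.
    for k in range(n_folds):
        in_fold, out_fold = folds == k, folds != k
        y_res[in_fold] = y[in_fold] - _fit_predict_linear(X[out_fold], y[out_fold], X[in_fold])
        t_res[in_fold] = t[in_fold] - _fit_predict_linear(X[out_fold], t[out_fold], X[in_fold])
    # Final stage: OLS through the origin of y_res on t_res (closed form).
    coef = float(t_res @ y_res / (t_res @ t_res))
    resid = y_res - coef * t_res
    sigma2 = resid @ resid / (n - 1)  # one estimated parameter
    std_err = float(np.sqrt(sigma2 / (t_res @ t_res)))
    return coef, std_err, coef / std_err
```

In practice `_fit_predict_linear` would be replaced by XGBoost models, and the final stage could be run with `statsmodels.api.OLS(y_res, t_res).fit()`, whose `summary()` also reports P>|t| and the 95% interval.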
According to Table 3, the p-values for "Total_Revolving_Bal" (0.449) and "Avg_Utilization_Ratio" (0.701) are greater than 0.05, indicating that changes in these two treatment variables produce no significant difference in customer churn. Combined with the SHAP value analysis, this suggests that each of these two variables is correlated with customer churn but has no detectable causal effect on it. For the remaining treatment variables, all p-values are below 0.05, indicating that changes in each of them have a causal effect on customer churn.
| Variable | coef | std err | t | P>\|t\| | 95% interval |
|---|---|---|---|---|---|
| Total_Trans_Ct | −0.0003 | 2.48e-05 | −13.109 | 0.000 | [−0.000, −0.000] |
| Total_Trans_Amt | 7.163e-06 | 4.14e-07 | 17.308 | 0.000 | [6.35e-06, 7.97e-06] |
| Total_Revolving_Bal | −4.396e-06 | 5.8e-06 | −0.758 | 0.449 | [−1.58e-05, 6.98e-06] |
| Total_Relationship_Count | −0.0008 | 0.000 | −6.324 | 0.000 | [−0.001, −0.001] |
| Total_Amt_Chng_Q4_Q1 | −0.0079 | 0.001 | −7.078 | 0.000 | [−0.010, −0.006] |
| Total_Ct_Chng_Q4_Q1 | −0.0051 | 0.001 | −4.819 | 0.000 | [−0.007, −0.003] |
| Months_Inactive_12_mon | 0.0007 | 0.000 | 3.968 | 0.000 | [0.000, 0.001] |
| Contacts_Count_12_mon | 0.0006 | 0.000 | 3.872 | 0.000 | [0.000, 0.001] |
| Avg_Utilization_Ratio | 0.0039 | 0.010 | 0.384 | 0.701 | [−0.016, 0.024] |
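The t, P>|t| and interval columns of Table 3 follow from coef and std err alone. As a check on the 0.05 decision rule used above, the sketch below recomputes them for three rows, using a normal approximation to the t distribution (an assumption that is reasonable with the 10,127 customers in the dataset). Small discrepancies reflect the rounding of the published values; `TABLE3` and `wald_summary` are hypothetical names.

```python
import math

# (coef, std err) pairs copied from Table 3.
TABLE3 = {
    "Total_Trans_Amt": (7.163e-06, 4.14e-07),
    "Total_Revolving_Bal": (-4.396e-06, 5.8e-06),
    "Avg_Utilization_Ratio": (0.0039, 0.010),
}


def wald_summary(coef, se, z_crit=1.96):
    """Return the t statistic, a two-sided p-value and a 95% interval.

    Uses the standard normal distribution in place of the t distribution,
    a close approximation when the residual degrees of freedom are large.
    """
    t = coef / se
    # P(|Z| > |t|) for a standard normal Z equals erfc(|t| / sqrt(2)).
    p = math.erfc(abs(t) / math.sqrt(2))
    return t, p, (coef - z_crit * se, coef + z_crit * se)


for name, (coef, se) in TABLE3.items():
    t, p, (lo, hi) = wald_summary(coef, se)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{name}: t={t:.3f}, p={p:.3f}, 95% CI=[{lo:.3g}, {hi:.3g}], {verdict} at 0.05")
```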
Considering the coefficient values in Table 3, the treatment variables "Total_Trans_Ct", "Total_Relationship_Count", "Total_Amt_Chng_Q4_Q1", and "Total_Ct_Chng_Q4_Q1" have negative coefficients, indicating negative causal relationships with customer churn. Among these, "Total_Amt_Chng_Q4_Q1" has the largest causal effect (−0.0079) and "Total_Trans_Ct" the smallest (−0.0003). Conversely, the coefficients of "Total_Trans_Amt", "Months_Inactive_12_mon", and "Contacts_Count_12_mon" are positive, indicating positive causal relationships with customer churn; "Months_Inactive_12_mon" has the largest effect (0.0007) and "Total_Trans_Amt" the smallest (7.163e-06). Because each coefficient is the effect of a one-unit change in its own variable, these magnitudes are compared on the variables' original scales. Compared with the SHAP value results, these seven variables have the same direction of impact on customer churn; the two methods differ in the ordering of the quantified effects.
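The ordering of effects stated above can be reproduced directly from the significant coefficients in Table 3. A short sketch, with the hypothetical helper `order_by_effect` splitting the coefficients by sign and sorting each group by magnitude:

```python
# Significant (p < 0.05) coefficients copied from Table 3; these are per-unit effects.
SIGNIFICANT_COEFS = {
    "Total_Trans_Ct": -0.0003,
    "Total_Trans_Amt": 7.163e-06,
    "Total_Relationship_Count": -0.0008,
    "Total_Amt_Chng_Q4_Q1": -0.0079,
    "Total_Ct_Chng_Q4_Q1": -0.0051,
    "Months_Inactive_12_mon": 0.0007,
    "Contacts_Count_12_mon": 0.0006,
}


def order_by_effect(coefs):
    """Split coefficients by sign and sort each group by absolute size, largest first."""
    negative = sorted((n for n, c in coefs.items() if c < 0), key=lambda n: -abs(coefs[n]))
    positive = sorted((n for n, c in coefs.items() if c > 0), key=lambda n: -abs(coefs[n]))
    return negative, positive


negative, positive = order_by_effect(SIGNIFICANT_COEFS)
print("negative effects:", negative)
print("positive effects:", positive)
```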
In the current research, to improve prediction accuracy, sampling techniques were combined with machine learning models to forecast customer churn in banks. A comparative performance analysis shows that the XGBoost model achieves the highest accuracy of the six models under every sampling technique, at least 97% in each case. Furthermore, the SHAP values method was used to interpret the optimal prediction models, while R-learner was used to investigate the causal effects of these variables on customer churn. Based on these two methods, the main variables affecting customer churn were identified: the total number and amount of transactions with the bank in the past year, the total number of bank products held by the customer, and the changes in the amount and number of transactions from the fourth quarter to the first quarter. In addition, the analysis found that the total credit card revolving balance, although strongly correlated with customer churn, does not have a significant causal relationship with it. These findings provide useful guidance for bank managers seeking to improve customer management strategies.
Because the dataset contains few minority-class samples, the experiment relies on sampling techniques to generate synthetic samples, which enables the prediction model to identify sample categories more accurately. Furthermore, apart from the variables containing customers' personal information, the variables are cross-sectional data. Such data are typically used in analyses that emphasize differences among individual samples rather than changes within a sample over time. Therefore, if additional samples with more extensive variables, such as time series data, become available, future research could further test the feasibility of the model's predictions and compare the SHAP values method and the causal inference method more thoroughly.
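For readers unfamiliar with how synthetic samples are generated, the core step of SMOTE, one of the sampling techniques compared in Table 2, interpolates between a minority-class sample and one of its k nearest minority-class neighbours. The sketch below is a simplified NumPy version for numeric features only; it is not the implementation used in this study (libraries such as imbalanced-learn provide `SMOTE`, `BorderlineSMOTE` and `ADASYN` classes with a `fit_resample` method), and the function name `smote` and its defaults are assumptions.

```python
import numpy as np


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic rows by SMOTE-style interpolation (numeric features only).

    Each synthetic row lies on the segment between a randomly chosen minority row
    and one of its k nearest minority-class neighbours (Euclidean distance).
    Requires k < number of minority rows.
    """
    X = np.asarray(minority, dtype=float)
    rng = np.random.default_rng(seed)
    # Squared Euclidean distances via |a|^2 + |b|^2 - 2 a.b, avoiding an (n, n, d) array.
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(d2, np.inf)  # a row is not its own neighbour
    neighbours = np.argsort(d2, axis=1)[:, :k]
    base = rng.integers(0, len(X), size=n_new)
    partner = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation weight in [0, 1)
    return X[base] + gap * (X[partner] - X[base])
```

For example, fully balancing the class counts in Table 1 would mean generating n_new = 8500 − 1627 = 6873 synthetic attrited customers. Borderline-SMOTE and ADASYN keep the same interpolation step but change which minority samples serve as bases: Borderline-SMOTE focuses on samples near the class boundary, and ADASYN generates more samples around those that are harder to learn.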
The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this study.
The authors have received no financial assistance from any source in the preparation of this study.
All authors declare no conflicts of interest in this study.
| Categorical variable | Class | Number of samples | Numeric code |
|---|---|---|---|
| Attrition_Flag | Existing Customer | 8500 | 0 |
| | Attrited Customer | 1627 | 1 |
| Gender | M | 4769 | 0 |
| | F | 5358 | 1 |
| Education_Level | Uneducated | 1487 | 6 |
| | High School | 2013 | 15 |
| | College | 1013 | 18 |
| | Graduate | 3128 | 22 |
| | Post-Graduate | 516 | 24 |
| | Doctorate | 451 | 28 |
| | Unknown | 1519 | 22 |
| Marital_Status | Single | 3943 | 0 |
| | Married | 4687 | 1 |
| | Divorced | 748 | 2 |
| | Unknown | 749 | 1 |
| Income_Category | Less than $40K | 3561 | 2 |
| | $40K - $60K | 1790 | 5 |
| | $60K - $80K | 1402 | 7 |
| | $80K - $120K | 1535 | 10 |
| | $120K + | 727 | 14 |
| | Unknown | 1112 | 2 |
| Card_Category | Blue | 9436 | 0 |
| | Silver | 555 | 1 |
| | Gold | 116 | 2 |
| | Platinum | 20 | 3 |
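The numeric codes above are consistent with a simple rule: each "Unknown" level receives the code of the most frequent known class of the same variable (Graduate, Married, and Less than $40K). That rule is inferred from the table rather than stated in the text, so the sketch below, with hypothetical helpers `build_encoding` and `encode`, is one reconstruction of the encoding under that assumption.

```python
# (count, code) for each known class, copied from the table above.
EDUCATION = {"Uneducated": (1487, 6), "High School": (2013, 15), "College": (1013, 18),
             "Graduate": (3128, 22), "Post-Graduate": (516, 24), "Doctorate": (451, 28)}
MARITAL = {"Single": (3943, 0), "Married": (4687, 1), "Divorced": (748, 2)}
INCOME = {"Less than $40K": (3561, 2), "$40K - $60K": (1790, 5), "$60K - $80K": (1402, 7),
          "$80K - $120K": (1535, 10), "$120K +": (727, 14)}


def build_encoding(levels):
    """Map each known class to its code and 'Unknown' to the most frequent class's code."""
    mapping = {name: code for name, (_, code) in levels.items()}
    mode = max(levels, key=lambda name: levels[name][0])  # most frequent known class
    mapping["Unknown"] = mapping[mode]
    return mapping


def encode(values, mapping):
    """Replace each category label with its numeric code."""
    return [mapping[v] for v in values]


print(encode(["Doctorate", "Unknown", "College"], build_encoding(EDUCATION)))
```

With pandas, the same mapping could be applied to a column via `Series.map(mapping)`.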
| Sampling technique | Model | Accuracy | Precision | Recall | F1 | AUC |
|---|---|---|---|---|---|---|
| Random Oversampling | RF | 0.9650 | 0.9648 | 0.9650 | 0.9449 | 0.9901 |
| | GBDT | 0.9743 | 0.9746 | 0.9743 | 0.9745 | 0.9954 |
| | Extra Tree | 0.9620 | 0.9613 | 0.9620 | 0.9613 | 0.9890 |
| | AdaBoost | 0.9521 | 0.9568 | 0.9521 | 0.9535 | 0.9893 |
| | XGBoost | 0.9748 | 0.9751 | 0.9748 | 0.9749 | 0.9939 |
| | CatBoost | 0.9556 | 0.9606 | 0.9556 | 0.9570 | 0.9900 |
| SMOTE | RF | 0.9615 | 0.9627 | 0.9615 | 0.9620 | 0.9897 |
| | GBDT | 0.9699 | 0.9701 | 0.9699 | 0.9700 | 0.9945 |
| | Extra Tree | 0.9600 | 0.9596 | 0.9600 | 0.9597 | 0.9881 |
| | AdaBoost | 0.9556 | 0.9565 | 0.9556 | 0.9560 | 0.9856 |
| | XGBoost | 0.9704 | 0.9704 | 0.9704 | 0.9704 | 0.9919 |
| | CatBoost | 0.9590 | 0.9607 | 0.9590 | 0.9596 | 0.9884 |
| Borderline-SMOTE | RF | 0.9585 | 0.9583 | 0.9585 | 0.9588 | 0.9891 |
| | GBDT | 0.9709 | 0.9711 | 0.9709 | 0.9710 | 0.9936 |
| | Extra Tree | 0.9556 | 0.9549 | 0.9556 | 0.9551 | 0.9886 |
| | AdaBoost | 0.9516 | 0.9538 | 0.9516 | 0.9524 | 0.9855 |
| | XGBoost | 0.9719 | 0.9719 | 0.9719 | 0.9719 | 0.9934 |
| | CatBoost | 0.9497 | 0.9532 | 0.9497 | 0.9508 | 0.9857 |
| ADASYN | RF | 0.9605 | 0.9615 | 0.9605 | 0.9609 | 0.9901 |
| | GBDT | 0.9709 | 0.9710 | 0.9709 | 0.9709 | 0.9939 |
| | Extra Tree | 0.9566 | 0.9561 | 0.9566 | 0.9563 | 0.9885 |
| | AdaBoost | 0.9576 | 0.9589 | 0.9876 | 0.9581 | 0.9859 |
| | XGBoost | 0.9724 | 0.9724 | 0.9724 | 0.9724 | 0.9931 |
| | CatBoost | 0.9546 | 0.9575 | 0.9546 | 0.9556 | 0.9878 |
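The accuracy and AUC columns of the performance table can be checked programmatically. The sketch below copies those two columns into a dictionary and, for each sampling technique, picks the model with the highest value of the chosen metric; `TABLE2` and `best_model` are hypothetical names.

```python
# (accuracy, AUC) per sampling technique and model, copied from the performance table.
TABLE2 = {
    "Random Oversampling": {"RF": (0.9650, 0.9901), "GBDT": (0.9743, 0.9954),
                            "Extra Tree": (0.9620, 0.9890), "AdaBoost": (0.9521, 0.9893),
                            "XGBoost": (0.9748, 0.9939), "CatBoost": (0.9556, 0.9900)},
    "SMOTE": {"RF": (0.9615, 0.9897), "GBDT": (0.9699, 0.9945),
              "Extra Tree": (0.9600, 0.9881), "AdaBoost": (0.9556, 0.9856),
              "XGBoost": (0.9704, 0.9919), "CatBoost": (0.9590, 0.9884)},
    "Borderline-SMOTE": {"RF": (0.9585, 0.9891), "GBDT": (0.9709, 0.9936),
                         "Extra Tree": (0.9556, 0.9886), "AdaBoost": (0.9516, 0.9855),
                         "XGBoost": (0.9719, 0.9934), "CatBoost": (0.9497, 0.9857)},
    "ADASYN": {"RF": (0.9605, 0.9901), "GBDT": (0.9709, 0.9939),
               "Extra Tree": (0.9566, 0.9885), "AdaBoost": (0.9576, 0.9859),
               "XGBoost": (0.9724, 0.9931), "CatBoost": (0.9546, 0.9878)},
}


def best_model(metric_index):
    """Best model per sampling technique for one metric (0 = accuracy, 1 = AUC)."""
    return {tech: max(models, key=lambda m: models[m][metric_index])
            for tech, models in TABLE2.items()}


print("by accuracy:", best_model(0))
print("by AUC:", best_model(1))
```

By accuracy, XGBoost comes first under all four sampling techniques; by AUC, GBDT is marginally ahead in every case (for example 0.9954 versus 0.9939 under random oversampling), so the ranking depends on the metric chosen.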