Research article Special Issues

Estimation of clear sky global solar radiation in Algeria

  • The paper evaluates the performance of three models at three sites for estimating instantaneous clear-sky global solar radiation on a horizontal surface in Algeria. The authors additionally recommend the models for estimating solar radiation at locations with similar climates where radiometric measurements are not available, which may help in selecting the most suitable locations for solar power installations. Overall, for global radiation, the daily correlation coefficient is higher than 0.99, the mean absolute percentage error is less than 5%, the daily mean bias error ranges between -3% and +3%, and the daily root mean square error is less than 7%. This level of precision indicates that the Atwater & Ball and Bird & Hulstrom models can be used successfully to predict solar radiation at the three studied stations.

    Citation: Djelloul Benatiallah, Ali Benatiallah, Kada Bouchouicha, Bahous Nasri. Estimation of clear sky global solar radiation in Algeria[J]. AIMS Energy, 2019, 7(6): 710-727. doi: 10.3934/energy.2019.6.710



    Bankruptcy prediction is the problem of detecting financial distress in businesses that will eventually lead to bankruptcy. It has been studied since at least the 1930s. The early models of bankruptcy prediction applied univariate statistical models to financial ratios. The univariate models were followed by multivariate statistical models such as the famous Altman Z-score model. Recent advances in machine learning have led to the adoption of machine learning algorithms for bankruptcy prediction from financial ratios. A study by Barboza, Kimura and Altman found that machine learning models can outperform classical statistical models such as multiple discriminant analysis (MDA) by a significant margin in bankruptcy prediction (Barboza et al., 2017).

    Bankruptcy prediction is important for modern economies because early warnings of bankruptcy help not only investors but also public policy makers take proactive steps to minimize the impact of bankruptcies. The following reasons add to the significance of bankruptcy prediction:

    (1). Better allocation of resources

    Institutional investors, banks, lenders and retail investors are always looking for information that predicts financial distress in publicly traded firms. Early prediction of bankruptcy helps not only investors and lenders but also a firm's managers take corrective action, thereby conserving scarce economic resources. Efficient allocation of capital is the cornerstone of growth in modern economies.

    (2). Input to policy makers

    Accurate prediction of the bankruptcies of businesses and individuals before they happen gives lawmakers and policy makers additional time to alleviate the systemic issues that might be causing them. Indeed, with bankruptcies taking center stage in the political discourse of many countries, accurate bankruptcy prediction is a key input for politicians, bureaucrats and, in general, anyone making public policy.

    (3). Corrective action for business managers

    Early prediction of bankruptcy is likely to highlight business issues, giving a company's managers additional time to make decisions that help avoid bankruptcy. This effect is likely to be more pronounced in public companies, where management has a fiduciary duty to shareholders.

    (4). Identification of sector wide problems

    When bankruptcy prediction models flag firms belonging to a certain sector, this is likely to be a leading indicator of an upcoming downturn in that sector of the economy. With robust models, business managers and government policy makers would become aware of the problem and could take corrective action to limit the magnitude and intensity of the downturn in that sector. Industry groupings, in turn, have been shown to significantly affect forecasting models (Chava and Jarrow, 2004).

    (5). Signal to Investors

    Investors can make better-informed decisions based on the predictions of bankruptcy models. This not only forces the management of firms to take corrective action but also helps soften the overall economic fallout from bankruptcies. Empirical studies have shown that investment opportunities are significantly related to the likelihood of bankruptcy (Lyandres & Zhdanov, 2007).

    (6). Relation to adjacent problems

    Bankruptcy prediction is often the first step used by ratings agencies to detect financial distress in firms. Based on the predictions of bankruptcy models, ratings agencies investigate and assess credit risk. Getting flagged by bankruptcy prediction models is often the first step that triggers the process of revising credit ratings. A literature survey covering 2000–2013 demonstrates the close relation between bankruptcy prediction and credit risk (García et al., 2015).

    Most past studies of bankruptcy prediction, including those using machine learning, have used a relatively small sample of firms and a small number of financial ratios. This study distinguishes itself by using a much larger dataset covering 21,114 US firms (samples) from 1970 to 2020 and 57 financial ratios (features). Bankruptcy prediction models have been researched and built since the 1960s. The models built from 1960 to 1990 were primarily statistical models such as univariate analysis, multiple discriminant analysis, and logit and probit models. Starting in the 1990s, machine learning models began outperforming statistical models. Since this study applies the most popular contemporary machine learning algorithms to a large dataset, we compare our models with the machine learning models built since the 1990s. Table 1 summarizes past machine learning studies and models for bankruptcy prediction.

    Table 1.  Past ML studies.

    | Author | Machine learning models used | Number of features (ratios) | Size of training set |
    |---|---|---|---|
    | Wilson and Sharda (1994) | Shallow neural network, multiple discriminant analysis | 5 ratios (same ratios used by Altman) | 65 bankrupt, 64 non-bankrupt; total 129 firms |
    | Min and Lee (2005) | Support vector machine (SVM), multiple discriminant analysis, logistic regression, shallow neural network | 38 ratios used to compute 2 principal components | 944 bankrupt, 944 non-bankrupt; total 1888 firms |
    | Fedorova, Gilenko and Dovzhenko (2013) | Adaptive Boost, artificial neural network, Adaptive Boost combined with neural networks | 75 ratios | 444 bankrupt, 444 non-bankrupt; total 888 firms |
    | Shi, Xi, Ma, Hu (2009) | Bagging ensemble of artificial neural networks (ANNs), artificial neural network (ANN), decision tree (C4.5), k-nearest neighbours, support vector machine (SVM) | 20 features (not ratios) | Total 1000 samples |
    | Heo and Yang (2014) | AdaBoost with decision tree, SVM, decision tree, artificial neural network | 12 ratios | 1381 bankrupt, 1381 non-bankrupt; total 2762 firms |
    | Du Jardin (2016) | Ensemble models, decision tree (DT), multivariate discriminant model (MDA), logistic regression (LR), shallow neural network | 35 ratios | 8010 failed, 8010 non-failed |
    | Chen (2011) | Decision tree, LDA, LR, self-organizing map (SOM), genetic algorithm, learning vector optimization, particle swarm optimization | 8 features created using PCA over 42 ratios (33 financial and 8 non-financial) | 50 bankrupt, 150 non-bankrupt; total 200 firms |
    | Wang, Ma and Yang (2014) | FS-Boosting, LR, NB, DT, ANN, SVM, ensemble models | 30 financial ratios | 112 bankrupt, 128 non-bankrupt; total 240 firms |
    | Barboza, Kimura and Altman (2017) | Neural network, SVM with linear kernel, SVM with radial kernel, boosting, bagging, random forest | 11 financial ratios | Total 898 firms |


    In this study, we use three popular machine learning techniques (Random Forest, Support Vector Machines and XGBoost) to construct forecasting models. We find that the machine learning models perform very well, with XGBoost being the most successful technique: its out-of-sample accuracy exceeds 99% for the 30- and 90-day horizons and is about 98.7% for the 180-day horizon.

    We also apply our XGBoost model to an important current issue: predicting bankruptcies during the second half of 2020. The depth of the recession caused by the lockdowns imposed to contain the COVID-19 pandemic has raised worries that corporate bankruptcies may rise substantially in the near future. According to a report in the New York Times (2020), Edward Altman, a pioneer of bankruptcy prediction research and the creator of the famous Z-score model, expects a "tsunami of bankruptcies" that will exceed the number of bankruptcies that followed the 2008 financial crisis. The results from our machine learning model support Prof. Altman's concern that corporate bankruptcies will rise substantially in late 2020 and equal the highs seen during the 2008-09 recession. However, this study finds that the elevated level of bankruptcies will not be significantly different from that of 2010.

    Previous studies of bankruptcy prediction have not taken a systematic view of the data used to build their models; they have focused more on the models than on the data. This study offers a more balanced view in which the data and the models are given equal importance. To begin with, we used Compustat as the source database to obtain an exhaustive list of financial ratios for US firms from 1970 to 2020. Compustat is a high-quality database used by several well-known finance papers, such as Fama and French (1993). Most previous studies have used relatively small datasets compared with ours, and this study systematically considers as many features as possible to train our machine learning models. This balanced approach is also consistent with the shift from a model-centric to a data-centric approach advocated by Andrew Ng (Gil Press, 2021).

    The rest of the paper is structured as follows:

    Section 2—Describes the existing literature for bankruptcy prediction.

    Section 3—Describes the data and the method used to clean, process, and fit the data into our machine learning models. This section also covers the process used to predict the number of bankruptcies using Q2-2020 ratios.

    Section 4—Describes the results observed from the experiments

    Section 5—Presents our final comments and discusses the implications of the results.

    Bankruptcy prediction models prior to the 1990s were primarily statistical models employing univariate, multivariate, and logit and probit techniques. In 1966, Beaver applied univariate analysis, testing the predictive ability of 30 financial ratios one at a time (Beaver, 1966). In 1968, Altman performed a multivariate discriminant analysis (MDA) to create a linear discriminant function of 5 financial ratios (Altman, 1968). Several variants of MDA were developed in the following years. Edmister used 19 financial ratios to build a linear model for bankruptcy prediction (Edmister, 1972). Deakin found that a linear combination of 14 ratios could predict bankruptcy five years prior to failure (Deakin, 1972). Ohlson studied the shortcomings of MDA models and built a conditional logit model using maximum likelihood estimation (Ohlson, 1980). The datasets used in all these studies were quite small by modern standards; Ohlson's study, for example, used a dataset of 2058 firms, of which 105 represented the bankrupt class.
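For readers unfamiliar with the Z-score, Altman's (1968) linear discriminant function is commonly quoted as Z = 1.2X1 + 1.4X2 + 3.3X3 + 0.6X4 + 1.0X5, with conventional cut-offs of 1.81 and 2.99. The Python sketch below simply evaluates that textbook form for a hypothetical firm; the function and argument names are illustrative, and the Z-score is not one of the models trained in this study.

```python
def altman_z_score(working_capital, retained_earnings, ebit, market_value_equity,
                   total_liabilities, sales, total_assets):
    """Textbook form of Altman's (1968) Z-score for publicly traded manufacturers."""
    x1 = working_capital / total_assets           # liquidity
    x2 = retained_earnings / total_assets         # cumulative profitability
    x3 = ebit / total_assets                      # operating efficiency
    x4 = market_value_equity / total_liabilities  # market leverage
    x5 = sales / total_assets                     # asset turnover
    return 1.2 * x1 + 1.4 * x2 + 3.3 * x3 + 0.6 * x4 + 1.0 * x5


def z_zone(z):
    """Classify a Z-score using the conventional cut-offs of 1.81 and 2.99."""
    if z < 1.81:
        return "distress"
    if z > 2.99:
        return "safe"
    return "grey"


# Hypothetical firm (all figures in the same currency unit).
z = altman_z_score(working_capital=100, retained_earnings=200, ebit=150,
                   market_value_equity=600, total_liabilities=400,
                   sales=1100, total_assets=1000)
print(round(z, 3), z_zone(z))  # 2.895 grey
```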

    The next phase in the evolution of bankruptcy models started in the 1990s, with several machine learning algorithms outperforming the older statistical models. Machine learning models such as Random Forests, Support Vector Machines (SVM) and Gradient Boosted Trees were found to be particularly effective for bankruptcy prediction. Barboza, Kimura and Altman compared statistical models with machine learning (ML) models and found that Random Forests outperformed Altman's Z-score model by a significant margin (Barboza et al., 2017). These results were corroborated by other studies (Joshi et al., 2018; Rustam and Saragih, 2018; Gnip and Drotár, 2019). SVM was also found to be a very effective machine learning algorithm in several studies: Hang et al. (2004) and Chen et al. (2008) achieved superior results on the credit rating classification problem using SVM, and Song et al. (2008) used SVM to predict financial distress. Some studies also found boosted-tree algorithms to outperform SVM. Wang, Ma and Yang proposed a new boosted-tree algorithm for bankruptcy prediction, which they found to be more effective than SVM (Wang et al., 2014). Heo and Yang (2014) used the AdaBoost algorithm to predict bankruptcy for Korean construction firms and found it to be more accurate than SVM. A more recent study used XGBoost and Random Forest algorithms to predict bankruptcies over 12 months, with a medium-sized training dataset containing data for 8959 firms registered in Italy (Perboli and Arabnezad, 2021). Another recent study used a database of Taiwanese firms, training machine learning models on a dataset containing 96 attributes for 6819 firms (Wang and Liu, 2021). One attribute shared by the aforementioned studies is the relatively small size of their training datasets compared with datasets used in the big data era; even the largest listed in Table 1 (16,020 firms in Du Jardin, 2016) is smaller than the 21,114-firm dataset used in this study.

    Based on the literature review, the following trends become apparent:

    ●  Machine Learning Models are now consistently outperforming statistical models

    ●  The training data sets used to train the existing machine learning models are relatively small as compared to the data sets used for training models in other application areas.

    ●  Ensemble methods such as Random Forest and Boosted trees have performed better than other models in bankruptcy prediction.

    This study differentiates itself from previous studies by using a substantially larger dataset. To retrieve the financial ratios, we use Compustat, a standard and well-documented financial database that has been used by well-known finance papers such as Fama and French (1993). We use the 57 financial ratios listed in Table A.1 (Appendix A); financial ratios are the inputs used to train bankruptcy prediction models. While most studies use fewer financial ratios, this study applies a large set of financial ratios of US firms from 1970–2020 (50 years) to train Random Forest, SVM and XGBoost models. This section discusses the overall methodology, which includes data cleaning, balancing, model fitting and analysis of results.

    Table 2.  Distribution of training data.

    | Class | Count |
    |---|---|
    | Bankrupt | 1,212 |
    | Non-Bankrupt | 19,902 |
    | Total | 21,114 |


    Previous studies have used small to medium-sized datasets for training machine learning models; this study sets itself apart by using a much larger training dataset. We used the financial ratios dataset from Compustat and joined it with a second dataset, the bankruptcy dataset. The bankruptcy dataset contains fields such as the date of bankruptcy, the bankruptcy reason and GVKEY (the primary key), while the financial ratios dataset contains all the financial ratios listed in Table A.1. The two datasets were programmatically joined on their common field, GVKEY, a unique identifier assigned to each firm. The relation between the two datasets used to create our labelled training dataset is shown in the ER schema diagram in Figure 1.

    Figure 1.  ER Diagram depicting relation between Financial Ratios Dataset and Bankruptcy Dataset.
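The GVKEY join can be sketched in plain Python as a left join that keeps every financial-ratio row and attaches bankruptcy information where a matching record exists. The field names (`gvkey`, `bankruptcy_date`, `reason`) are hypothetical, and the sketch assumes at most one bankruptcy record per GVKEY; in pandas, the equivalent step would be along the lines of `ratios.merge(bankruptcies, on="gvkey", how="left")`.

```python
def build_labelled_rows(ratio_rows, bankruptcy_rows):
    """Left-join financial-ratio rows to bankruptcy records on GVKEY.

    ratio_rows: list of dicts with a "gvkey" key plus ratio columns.
    bankruptcy_rows: list of dicts with "gvkey", "bankruptcy_date" and "reason".
    Firms without a bankruptcy record are labelled non-bankrupt (bankrupt=0).
    """
    by_gvkey = {rec["gvkey"]: rec for rec in bankruptcy_rows}  # assumes unique GVKEYs
    labelled = []
    for row in ratio_rows:
        match = by_gvkey.get(row["gvkey"])  # None when the firm never went bankrupt
        merged = dict(row)
        merged["bankruptcy_date"] = match["bankruptcy_date"] if match else None
        merged["bankruptcy_reason"] = match["reason"] if match else None
        merged["bankrupt"] = 1 if match else 0
        labelled.append(merged)
    return labelled
```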

    The financial ratios dataset we have used contains 57 financial ratios mentioned in Table A.1 in Appendix A. This is an exhaustive list of features used to train our models. We have included ratios which are often overlooked but are likely to help detect patterns related to edge cases.

    The first step in building a predictive model is data pre-processing and cleaning. The original data from Compustat had 75 financial ratios for 21,114 US firms from 1970 to 2020. The firms belong to two classes: bankrupt and non-bankrupt (continuing) enterprises. The dataset contains 1,212 bankrupt firms and 19,902 non-bankrupt firms; the class distribution is summarized in Table 2. The next step was to drop features that had null values for more than 6000 of the 21,114 firms, which ensures that no remaining feature has more than about 28% (6000/21,114) null values, i.e., under 30%. The goal is to ensure that the true distribution generating the data is preserved and learned by our machine learning models. Eighteen features (financial ratios) were dropped by this rule, leaving 75 − 18 = 57 features. Next, we scaled the data to have mean 0 and variance 1 using scikit-learn's StandardScaler class. Scaling puts all ratios on a comparable scale, which matters for distance-based steps such as the KNN imputation below and for the SVM, and helps gradient-based optimizers converge. The last step of data cleaning was to impute the missing values in the 57 financial ratios. For this we used KNN imputation with three nearest neighbours, where the weight assigned to each neighbour is a function of its Euclidean distance from the data point with the missing value. KNN with three neighbours has been found to be effective in preserving the true distribution of the data (Beretta and Santaniello, 2016).
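The three cleaning steps above (dropping sparse features, standardizing, and distance-weighted KNN imputation with k = 3) can be sketched in plain Python as follows, with missing values represented as `None`. This is an illustrative simplification, not the study's code: scikit-learn's `StandardScaler` and `KNNImputer(n_neighbors=3, weights="distance")` are the library equivalents, and `KNNImputer` uses a NaN-aware Euclidean distance that rescales for missing coordinates, whereas this sketch assumes plain Euclidean distance over the columns both rows observe and inverse-distance weights.

```python
import math


def drop_sparse_features(rows, max_nulls):
    """Drop columns with more than max_nulls missing (None) values.

    Returns the reduced rows and the indices of the columns that were kept.
    """
    n_cols = len(rows[0])
    keep = [j for j in range(n_cols)
            if sum(row[j] is None for row in rows) <= max_nulls]
    return [[row[j] for j in keep] for row in rows], keep


def standardize(rows):
    """Scale each column to mean 0 and variance 1, ignoring missing values."""
    out = [list(row) for row in rows]
    for j in range(len(rows[0])):
        vals = [row[j] for row in rows if row[j] is not None]
        if not vals:
            continue  # column is entirely missing; nothing to scale
        mean = sum(vals) / len(vals)
        std = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals)) or 1.0
        for row in out:
            if row[j] is not None:
                row[j] = (row[j] - mean) / std
    return out


def knn_impute(rows, k=3):
    """Fill each missing value with an inverse-distance-weighted mean of the
    values held by the k nearest rows that observe that column."""
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared))

    filled = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = sorted(
                (distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )[:k]
            if not donors:
                continue  # no row observes this column; leave it missing
            weights = [1.0 / max(d, 1e-12) for d, _ in donors]
            total = sum(weights)
            if total == 0:  # no donor shares an observed column with this row
                filled[i][j] = sum(v for _, v in donors) / len(donors)
            else:
                filled[i][j] = sum(w * v for w, (_, v) in zip(weights, donors)) / total
    return filled


# Usage mirroring the study's settings (hypothetical variable names):
# reduced, kept = drop_sparse_features(raw_rows, max_nulls=6000)
# clean = knn_impute(standardize(reduced), k=3)
```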

    The cleaned and scaled dataset, now without missing values, was imbalanced (see Table 2): the dominant class was the non-bankrupt class, to which approximately 94% of the samples belonged. Since the goal of this study is to train a classifier to identify bankrupt firms, we balanced the classes in our training data so that the model would learn about the minority (bankrupt) class. This is important in bankruptcy prediction because detecting samples belonging to the bankrupt class is the main objective. To balance the dataset, we used the Synthetic Minority Over-sampling Technique (SMOTE) proposed by Chawla et al. (2002). SMOTE generates synthetic samples in feature space: it takes a minority class sample, considers the line segments joining it to its k nearest minority class neighbours, and generates synthetic minority samples along those segments. Additionally, to ensure that the balanced dataset facilitated learning of the bankrupt class, we also used Borderline-SVM SMOTE, which uses samples close to the decision boundary (support vectors) to create synthetic samples (Nguyen et al., 2011). Finally, we used the Adaptive Synthetic Sampling (ADASYN) algorithm of He, Bai, Garcia and Li to generate samples in regions of feature space where the density of minority samples is low (He et al., 2008). The result was a balanced dataset containing 19,902 non-bankrupt samples and 20,517 bankrupt samples, each with 57 financial ratios (features).
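The core SMOTE interpolation step can be illustrated with a short Python sketch: pick a minority sample, pick one of its k nearest minority neighbours, and place a synthetic point at a random position on the segment between them. It covers only plain SMOTE, not the Borderline-SVM SMOTE or ADASYN variants; the imbalanced-learn library provides all three as `SMOTE`, `SVMSMOTE` and `ADASYN` in `imblearn.over_sampling`. The function name and the default `k=5` are assumptions for illustration.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Plain SMOTE (Chawla et al., 2002): interpolate between a minority sample
    and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base_idx = rng.randrange(len(minority))
        base = minority[base_idx]
        # k nearest minority neighbours of the base sample (excluding itself)
        neighbours = sorted(
            (p for idx, p in enumerate(minority) if idx != base_idx),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # random position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic
```

Because each synthetic point is a convex combination of two real minority samples, the new points stay within the region already occupied by the minority class.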

    The balanced dataset was then shuffled and split into training set containing 70% of the samples and test set containing 30% of the samples. The purpose of creating a test set is to test the accuracy of the models on data that the models have not been trained on. Collecting metrics based on the test set gives practitioners an idea of the generalization performance of machine learning models.
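A shuffled 70/30 split can be sketched as below; scikit-learn's `train_test_split(X, y, test_size=0.3, shuffle=True, random_state=...)` is the usual library equivalent. The seed value is arbitrary, and the sketch follows the order described in the text (balancing first, then splitting).

```python
import random


def train_test_split_shuffled(samples, labels, test_fraction=0.3, seed=42):
    """Shuffle (sample, label) pairs together and hold out test_fraction of them."""
    pairs = list(zip(samples, labels))
    random.Random(seed).shuffle(pairs)
    n_test = round(len(pairs) * test_fraction)
    test, train = pairs[:n_test], pairs[n_test:]
    X_train, y_train = [p[0] for p in train], [p[1] for p in train]
    X_test, y_test = [p[0] for p in test], [p[1] for p in test]
    return X_train, X_test, y_train, y_test
```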

    The training set was used to fit three machine learning models: Random Forest, Support Vector Machine (SVM) and XGBoost. After fitting, the models were used to make predictions for the samples in the test set to assess their relative performance.

    To compare the performance of the models, we used the accuracy score, the Receiver Operating Characteristic (ROC) curve and the Area Under the ROC Curve (AUC). The accuracy score is meaningful here because the models are trained on a balanced dataset. However, to get a better picture of the True Positive Rate (TPR) and False Positive Rate (FPR), we also employed ROC and AUC. Comparing TPR and FPR is important because avoiding false negatives (FN) matters more than avoiding false positives (FP). A false negative is a firm that goes bankrupt but is wrongly classified by the model as non-bankrupt, whereas a false positive is a firm that does not go bankrupt but is wrongly classified as bankrupt.
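The metrics above can be sketched in plain Python, treating bankrupt firms as the positive class (label 1). The AUC is computed through its rank interpretation, the probability that a randomly chosen bankrupt firm receives a higher score than a randomly chosen non-bankrupt firm; scikit-learn's `accuracy_score`, `roc_curve` and `roc_auc_score` are the usual library routines. The helper names are illustrative.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, TN, FN) with label 1 = bankrupt (positive class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def tpr_fpr(y_true, y_pred):
    """True positive rate (share of bankrupt firms caught) and false positive rate."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn), fp / (fp + tn)


def roc_auc(y_true, scores):
    """AUC as the probability that a random bankrupt firm is scored higher than
    a random non-bankrupt firm (ties count 1/2). O(P*N), fine for a sketch."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
y_pred = [1 if s > 0.5 else 0 for s in scores]
print(accuracy(y_true, y_pred), tpr_fpr(y_true, y_pred), roc_auc(y_true, scores))
# 0.75 (0.5, 0.0) 0.75
```

The example shows why TPR matters here: accuracy is 75%, yet half of the bankrupt firms are missed at the 0.5 threshold.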

    The goal of this study is to predict the number of bankruptcies within the next 30, 90 and 180 days. We therefore built a separate set of models for each of the three horizons, using the same approach for each; the only difference was that the training and test labels for each horizon were derived from the bankruptcy date. For example, to train the models predicting bankruptcy within 30 days, firm $i$ was labelled as

    $$y_i^{(30)} = \begin{cases} 1, & \text{if firm } i \text{ went bankrupt within 30 days,} \\ 0, & \text{otherwise,} \end{cases}$$

    and the labels $y_i^{(90)}$ and $y_i^{(180)}$ were defined analogously for the 90- and 180-day horizons.
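A minimal sketch of this labelling rule follows. It assumes that the horizon is counted in calendar days from the date of a firm's financial ratios to its bankruptcy date and that the boundary is inclusive; the text specifies neither detail, and `horizon_label` is a hypothetical helper name.

```python
from datetime import date


def horizon_label(ratio_date, bankruptcy_date, horizon_days):
    """Return 1 if the firm went bankrupt within horizon_days after ratio_date, else 0.

    Assumptions (not specified in the text): the horizon is counted in calendar
    days from the date of the financial ratios, and the boundary is inclusive.
    """
    if bankruptcy_date is None:  # firm never went bankrupt
        return 0
    days_until = (bankruptcy_date - ratio_date).days
    return 1 if 0 <= days_until <= horizon_days else 0


# Hypothetical firm: ratios dated 2019-12-31, bankruptcy on 2020-03-15 (75 days later).
print([horizon_label(date(2019, 12, 31), date(2020, 3, 15), h) for h in (30, 90, 180)])
# [0, 1, 1]
```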

    In total, we therefore trained 9 models: Random Forest, SVM and XGBoost for each of the 30-, 90- and 180-day horizons. After training, we picked the best model for each horizon based on the performance metrics described in the previous section, and then used it to predict the number of bankruptcies from the latest Q2 2020 financial ratios from Capital IQ. In this final prediction set, we kept data only for firms that did not have significant gaps in their data. Finally, we used this Q2 2020 prediction set to predict the number of bankruptcies expected over the next 30, 90 and 180 days.
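The model-selection step can be sketched with the test-set scores reported in Table 3. The text does not say how accuracy and AUC were combined, so the rule below (highest ROC AUC, with accuracy as a tie-breaker) is an assumption; for the scores in Table 3 both metrics pick the same model at every horizon.

```python
# Test-set metrics reported in Table 3: horizon (days) -> model -> (accuracy, ROC AUC).
TABLE_3 = {
    30: {"Random Forest": (0.99676, 0.99981),
         "SVM": (0.90933, 0.96673),
         "XGBoost": (0.99683, 0.99992)},
    90: {"Random Forest": (0.98654, 0.99917),
         "SVM": (0.83528, 0.90526),
         "XGBoost": (0.99047, 0.99933)},
    180: {"Random Forest": (0.98580, 0.99896),
          "SVM": (0.82202, 0.89590),
          "XGBoost": (0.98697, 0.99902)},
}


def best_model(metrics_by_model):
    """Return the model name with the highest ROC AUC, using accuracy as a tie-breaker."""
    return max(metrics_by_model,
               key=lambda name: (metrics_by_model[name][1], metrics_by_model[name][0]))


best_by_horizon = {h: best_model(scores) for h, scores in TABLE_3.items()}
print(best_by_horizon)  # {30: 'XGBoost', 90: 'XGBoost', 180: 'XGBoost'}
```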

    As mentioned in the previous section, we trained 9 models, using three different techniques, RF, SVM, XGBoost, for predicting bankruptcies over 30, 90 and 180 days. Next, we used the test set to make predictions and then assessed the relative performance. Based on the chosen metrics of accuracy score and Area under ROC curve (ROC AUC), XGBoost outperformed the other models for predicting bankruptcy within 30, 90 and 180 days. The actual scores for accuracy and AUC are presented in Table 3.

    Table 3.  Accuracy and ROC AUC metrics.

    | Horizon | Model algorithm | Accuracy score | ROC AUC |
    |---|---|---|---|
    | 30 days | Random Forest (RF) | 0.99676 | 0.99981 |
    | 30 days | Support Vector Machine | 0.90933 | 0.96673 |
    | 30 days | XGBoost | 0.99683 | 0.99992 |
    | 90 days | Random Forest (RF) | 0.98654 | 0.99917 |
    | 90 days | Support Vector Machine | 0.83528 | 0.90526 |
    | 90 days | XGBoost | 0.99047 | 0.99933 |
    | 180 days | Random Forest (RF) | 0.98580 | 0.99896 |
    | 180 days | Support Vector Machine | 0.82202 | 0.89590 |
    | 180 days | XGBoost | 0.98697 | 0.99902 |


    The accuracy score of XGBoost models is consistently better than SVM and Random Forest. This result is also consistent with the ROC curves which are shown in Figure 2 below.

    Figure 2.  ROC curves for all 9 models.

    As seen in Figure 2, the ROC curve for XGBoost is closest to the top-left corner and therefore encloses the largest area. XGBoost is thus the best-performing model, closely followed by Random Forest. Because these metrics are calculated on the test set (data the models were not trained on), they give us confidence in the models' ability to generalize.

    Table 4 presents the performance metrics of previous studies, which report one of two metrics: test accuracy or Area Under the ROC Curve (AUC). To keep the comparison consistent, we computed both test accuracy and AUC for our models (see Table 3). Our best model, built using XGBoost, significantly outperforms the models built in previous studies. The test accuracy of our XGBoost model for predicting bankruptcy within 180 days is 98.70%, which is lower than that of our XGBoost models for the 30- and 90-day horizons but still higher than the test accuracy of the models in previous studies. Similarly, this 180-day model has an AUC of 0.999, which is higher than the AUC reported by previous studies. Our accuracy and AUC scores are computed on held-out test samples, which further supports the robustness of our results.

    Table 4.  Performance metrics of previous studies.

    | Author | Size of training set | Metric reported | Performance |
    |---|---|---|---|
    | Wilson and Sharda (1994) | 65 bankrupt, 64 non-bankrupt; total 129 firms | Test set accuracy | 95.6% for neural network; 91.8% for MDA |
    | Min and Lee (2005) | 944 bankrupt, 944 non-bankrupt; total 1888 firms | Test set accuracy | 83.06% for SVM; 82.52% for shallow neural network; 79.13% for MDA; 78.30% for logit |
    | Fedorova, Gilenko and Dovzhenko (2013) | 444 bankrupt, 444 non-bankrupt; total 888 firms | Test set accuracy | 88.8% for Adaptive Boost combining several NNs; 87.8% for logistic regression |
    | Shi, Xi, Ma, Hu (2009) | Total 1000 samples | Test set accuracy | 75.6% for bagging ensemble of ANNs |
    | Heo and Yang (2014) | 1381 bankrupt, 1381 non-bankrupt; total 2762 firms | Test set accuracy | 78.5% for AdaBoost; 77.1% for ANN; 73.3% for SVM |
    | Du Jardin (2016) | 8010 bankrupt, 8010 non-bankrupt | AUC | 0.9049 for neural network with random subspace ensemble; 0.9003 for neural networks with boosting; 0.8952 for neural networks with bagging |
    | Chen (2011) | 50 bankrupt, 150 non-bankrupt; total 200 firms | Test set accuracy | 93.12% for PSO-SVM; 91.87% for GA-SVM; 84.37% for SVM |
    | Wang, Ma and Yang (2014) | 112 bankrupt, 128 non-bankrupt; total 240 firms | Test set accuracy | 81.50% for FS-Boosting; 72.21% for SVM; 73.38% for ANN |
    | Barboza, Kimura and Altman (2017) | 449 bankrupt, 449 non-bankrupt; total 898 firms | AUC | 92.92 (i.e., 0.9292) for Random Forest (highest AUC) |


    Next, we apply our best model, using XGBoost, to the data from Q2-2020 to evaluate the possibility of a substantial upsurge in business bankruptcies in the second half of 2020 because of the deep 2020 recession caused by the pandemic. We apply this best model to the latest available ratios, for Q2-2020, and classify a firm as going bankrupt during the next 30, 90, or 180 days if the predicted probability of bankruptcy is higher than 0.50.
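The counting step amounts to thresholding predicted probabilities at 0.50 and counting the firms strictly above it. In the sketch below the probabilities are a plain list of hypothetical values; with an XGBoost `XGBClassifier` they would typically come from `predict_proba(X)[:, 1]`, which is an assumption about the implementation rather than something stated in the text.

```python
def count_predicted_bankruptcies(probabilities, threshold=0.5):
    """Count firms whose predicted probability of bankruptcy is strictly above threshold."""
    return sum(1 for p in probabilities if p > threshold)


# Hypothetical predicted probabilities for four firms; 0.50 itself is not counted.
example = [0.12, 0.50, 0.73, 0.98]
print(count_predicted_bankruptcies(example))  # 2
```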

    Using this method, our best models predicted 74 bankruptcies within 30 days, 189 within 90 days and 354 within 180 days. These predictions cover all firms in the S&P Global database, both public and private, and are summarized in Table 5.

    Table 5.  Bankruptcy predictions using Q2 2020 data.
    Model Predicted Bankruptcies
    XGBoost for Predicting Bankruptcies within 30 days 74
    XGBoost for Predicting Bankruptcies within 90 days 189
    XGBoost for Predicting Bankruptcies within 180 days 354

    S & P Global reported a total of 336 actual bankruptcies through the end of June 2020. Adding our prediction of 354 bankruptcies for the following 180 days gives a predicted total of 336 + 354 = 690 bankruptcies in 2020. We summarize these figures in Table 6 below.

    Table 6.  Total number of bankruptcy predictions.
    Time Period Bankruptcies
    Actual bankruptcies reported until Jun-2020 336
    Predicted bankruptcies from our model, Jul-Dec 2020 354
    Total bankruptcies for the year 2020 690

    Since the number of firms in the database changes from year to year, we compare the 2020 prediction with past years using bankruptcy rates, i.e., the ratio of the number of bankruptcies to the total number of firms in the database. As shown in Table 7, our prediction of 690 bankruptcies in 2020 corresponds to a bankruptcy rate of 4.35% of the 15,850 US firms in the database. This is the highest rate in the 2010–2020 period. The second-highest rate, 4.20%, only slightly lower, was seen in 2010, in the immediate aftermath of the 2008–09 recession. The average rate during the expansion years of 2011–2019 was 3.2%, more than a full percentage point below the predicted 2020 rate. We conclude that 2020 will likely see a much higher bankruptcy rate than the expansion years, but one that is unlikely to be substantially higher than in 2010.
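    The rates in Table 7, the 2011–2019 average and the predicted 2020 total can be reproduced from the table's own counts. The following is a minimal Python sketch: all numbers are copied from Tables 6 and 7, and the variable names are illustrative.

```python
# Counts copied from Table 7: year -> (number of bankruptcies, number of firms).
# The 2020 bankruptcy count is the predicted total from Table 6.
TABLE_7 = {
    2010: (819, 19_523),
    2011: (629, 19_001),
    2012: (582, 18_653),
    2013: (551, 18_373),
    2014: (467, 18_091),
    2015: (520, 17_181),
    2016: (571, 17_004),
    2017: (513, 17_118),
    2018: (513, 16_542),
    2019: (582, 14_442),
    2020: (690, 15_850),
}

# Predicted 2020 total: actual bankruptcies to June plus the 180-day model prediction.
ACTUAL_TO_JUNE_2020 = 336
PREDICTED_JUL_DEC_2020 = 354
total_2020 = ACTUAL_TO_JUNE_2020 + PREDICTED_JUL_DEC_2020


def bankruptcy_rate(bankruptcies: int, firms: int) -> float:
    """Bankruptcies as a percentage of the firms in the database that year."""
    return 100.0 * bankruptcies / firms


rates = {year: bankruptcy_rate(b, n) for year, (b, n) in TABLE_7.items()}

# Average rate over the expansion years 2011-2019 (inclusive).
expansion_years = range(2011, 2020)
expansion_avg = sum(rates[y] for y in expansion_years) / len(expansion_years)

# Years ordered from highest to lowest rate.
ranked = sorted(rates, key=rates.get, reverse=True)

print(f"2020 total: {total_2020}, rate: {rates[2020]:.2f}%")
print(f"2011-2019 average rate: {expansion_avg:.2f}%")
print(f"Highest rates: {ranked[:3]}")
```

    The unrounded 2011–2019 average is about 3.17%, which the text rounds to 3.2%; the gap to the predicted 2020 rate of about 4.35% is roughly 1.2 percentage points.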

    Table 7.  Comparison of bankruptcy rate with past bankruptcy rates.
    Year # of bankruptcies # of firms Bankruptcy rate
    2010 819 19,523 4.20%
    2011 629 19,001 3.31%
    2012 582 18,653 3.12%
    2013 551 18,373 3.00%
    2014 467 18,091 2.58%
    2015 520 17,181 3.03%
    2016 571 17,004 3.36%
    2017 513 17,118 3.00%
    2018 513 16,542 3.10%
    2019 582 14,442 4.03%
    2020 690 15,850 4.35%

    We find that two different machine learning algorithms, Random Forest (RF) and Extreme Gradient Boosting (XGBoost), produce accurate predictions of whether a firm will go bankrupt within the next 30, 90, or 180 days, using financial ratios as input features. The XGBoost-based models perform exceptionally well, with 99% out-of-sample accuracy. Our training data come from a large database of public US firms covering 1970–2019, with 57 financial ratios as features; this training dataset is substantially larger than those used in previous studies.

    Applying our best-performing XGBoost model to Q2-2020 financial data for a sample of both private and public U.S. firms suggests that the 2020 bankruptcy rate will be substantially higher than in the expansion years of 2011–2019. However, our model suggests that the rate will be only marginally higher than in 2010.

    We identify the following areas for further research:

    ●  Adding macro-economic features: it would be interesting to train the bankruptcy prediction models on macro-economic features in addition to financial ratios.

    ●  Training deep neural networks with different topologies: another interesting area of research would be to apply other types of deep neural networks, such as TabNet and recurrent neural networks.

    All authors declare no conflicts of interest in this paper.



  • © 2019 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)