To deal with long-term uncertainties in the energy sector, decision makers and planners rely on accurate long-term peak electricity demand forecasts. Inaccurate forecasts can lead to poor decisions and badly designed policies. For instance, the South African government, electricity consumers, producers and industries need accurate forecasts to plan ahead and to support the country's long-term development plans while minimising risk. Long-term electricity demand forecasting methods depend on the choice of variables and, very importantly, on the assumptions made about the long run. Researchers have approached long-term electricity demand forecasting with different models. For example, Optimal Forecasting Quantile Regression (OFQR), Multiple Simple Linear Regression (MSLR), Artificial Neural Networks (ANNs), Quantile Regression (QR) and time series models have all been used ([1,2,3,4,5] and [6], among others).
Short-term electricity demand forecasting has attracted more attention than medium and long-term forecasting [3]. However, quantile regression averaging (QRA) is a relatively new approach that produces more accurate long-term point forecasts of peak electricity demand. It also performs well in practice for short and medium-term peak electricity demand, and it gives the smallest values of measures based on prediction intervals (PIs). This has been confirmed in the literature; see, for example, [5,7] and [8], among others.
The popularity of long-term peak electricity demand forecasting is mainly motivated by the need to understand the uncertainties associated with decision-making processes in South African industries. According to [9], the development of renewable energy policy by the South African government is greatly influenced by factors such as the periodic occurrence of policy crises resulting from economic instability. The key interest of this study is to identify the linear relationship between the response and predictor variables using GAMs, OLS and QR models. OLS estimates quantify the relationship around the mean of the distribution, while QR quantifies it across the quantiles. Moreover, QR is robust to extreme values of the response variable. In addition, electricity demand in South Africa is higher in winter than in summer and is influenced by irregular events such as extreme weather conditions. Electricity demand is also generally non-stationary and exhibits an upward trend, mainly due to economic and population growth. Long-term electricity demand forecasting is essential to avoid electricity shortages that may hamper South African economic growth. South Africa needs reliable forecasts of electricity demand to mitigate the risk of future shortages and so guarantee sustainability.
Although short-term electricity demand forecasting is the most widely applied, long-term peak electricity demand forecasting is increasing in popularity, with papers such as [4], who estimate long-term peak demand by comparing different techniques using a typical fast growing methodology with dynamic electricity demand characteristics and a typical developing methodology. Empirical results show that the model is superior, although it uses a single electricity demand observation, and that it is capable of giving several peak demand forecasts corresponding to different levels of maximum temperature and of social activity. However, it is not easy to rely on a single methodology for determining the electricity demand of such a fast growing and changing system.
The work in [3] describes, in a comparative manner, how electricity demand forecasting is divided into three groups: short, medium and long-term. Long-term electricity demand forecasting deals with long horizons, from one year to ten years and sometimes up to three decades ahead. Furthermore, it is often characterized by the large uncertainties that such distant horizons create.
For modelling and forecasting long-term daily peak electricity demand, a partially linear additive quantile regression (PLAQR) model is applied by [10] to South African data ranging from January 2007 to December 2013. Data from 28 South African weather stations, divided into coastal and inland regions, are used. Variable selection is carried out using Lasso. The empirical results are presented with a comparative analysis of two models, one with pairwise interactions and one without interactions. Based on the pinball loss function, the average loss suffered by PLAQR with pairwise interactions is less than that of PLAQR without interactions. Moreover, based on RMSE, MAE and MAPE, PLAQR with pairwise interactions is better than PLAQR without interactions. The empirical results also show the usefulness of PLAQR models.
In a recent paper [8], GAMs, additive quantile regression (AQR) with and without interactions, and QRA models are used for short-term probabilistic hourly load forecasting. The authors use the average pinball loss function for model comparison and also construct PI indices such as prediction interval with nominal confidence (PINC), prediction interval widths (PIWs), prediction interval normalised average widths (PINAWs) and prediction interval normalised average deviations (PINADs) for the comparative evaluation of the three models. They indicate that the QRA model is superior, as it produces the most accurate forecasts of the models compared. It also produces the smallest PINAW and PINAD values.
The study in [11] uses a QR model to produce estimates of daily peak load demand. The method is chosen for its ability to avoid underprediction of peak load demand and hence the risk of power blackouts. The empirical results show that quantiles from 0.97 to 0.99 are sufficient to estimate the upper limit of daily peak demand. The authors propose that the performance of the QR model also be compared on other empirical data (for example, economic and financial data). However, in contrast to our study, the authors did not use QRA and GAMs in their analysis. This paper considers the QRA approach as a comprehensive forecasting solution for long-term peak electricity demand using monthly and quarterly data.
Building on the overview of electricity demand forecasting in section 1.2, the major contributions of this study are as follows: (1) the study concentrates on QRA methodology for forecasting long-term electricity demand; (2) the developed method should be useful in producing improved forecasts; (3) the study could address many challenges and help to develop a strategy for generating capacity planning and system reliability assessment when dealing with long-term peak electricity demand in South Africa; (4) a long-term electricity demand forecasting solution is discussed and the best forecasting model for future peak electricity demand is suggested; and (5) the literature on long-term peak electricity demand in South Africa appears scarce, and this study closes a gap by introducing a new methodology for forecasting it.
The study considers monthly and quarterly data for peak electricity demand forecasting using quantile regression methods. Firstly, the paper evaluates accuracy measures of both OLS and QR models. A comparative analysis is then done using generalised additive models. The rest of the paper is organised as follows. In Section 2, we present the generalised additive models, followed by quantile regression models, and then explore additive quantile regression and quantile regression averaging. In Section 3, we present variable selection methods using cross validation (CV), the least absolute shrinkage and selection operator (Lasso) and elastic net. In Section 4, we summarise the empirical results and discuss them further. The results in this section are obtained using the statistical package R. Lastly, in Section 5, we conclude.
Focusing on the main interest of this paper, we lay out the methodology for fitting the monthly and quarterly peak electricity demand with OLS, GAMs, QR and QRA. We then combine the forecasts using forecast combination methods.
The generalized additive model (GAM) was developed by [12]. GAM is a simple yet powerful technique that is easy to interpret. It is flexible because the relationships between dependent and independent variables are not assumed to be linear, and the regularization of the predictor functions helps to avoid overfitting [13]. In generalized additive models the response variable is expressed as a sum of smooth functions. GAMs capture common non-linear patterns that linear regression fails to capture. Let $ y_{t} $ denote hourly peak demand and $ \mu_{t} = E(y_{t}) $; then the model can be written as:
$ g(\mu_{t}) = \pmb{x}^{\ast}_{t}\pmb{\Theta} + f_{1}(x_{1t}) + f_{2}(x_{2t}) + f_{3}(x_{3t}, x_{4t}) + \ldots, $ | (2.1) |
where $ \pmb{x}^{\ast}_{t} $ is a row of the model matrix for any strictly parametric model components, $ \pmb{\Theta} $ is the corresponding parameter vector, $ f_{i} $ are smooth functions of the covariates $ x_{it} $, and $ f_{1}, f_{2}, f_{3}, ... $ are estimated by maximizing a penalized log-likelihood function analogous to the penalized least squares criterion. GAMs are built around penalised regression smoothers, one of which is the cubic spline smoother [14]. Cubic splines are the smoothest interpolators that minimize the penalized least squares criterion. However, they have many free parameters to be smoothed, which seems wasteful. We usually estimate $ f(x_{t}) $ by minimizing
$ \sum_{t=1}^{n}\{y_{t}-f(x_{t})\}^{2} + \lambda\int f^{''}(x)^{2}dx, $ | (2.2) |
where $ \lambda $ is a non-negative smoothing parameter. If $ \lambda $ tends to infinity, the penalty term becomes more important and forces $ f^{''}(x) $ to zero. The bigger the value of $ \lambda $, the smoother the curve becomes. In this paper, cubic smoothing splines are used as the solution to this optimization problem.
QR was introduced by [15] and is described in detail in [16] as an extension of classical least squares estimation of conditional mean models to conditional quantile function. An updated QR theory captures this method as a particular center of the distribution, minimizing the weighted absolute sum of deviations [17]. The conditional quantile function is given by
$ Q_{\tau}(y_{t}|\pmb{x}_{t}) = \pmb{x}^{T}_{t}\pmb{\beta}_{\tau} + \varepsilon_{t}, $ | (2.3) |
where, $ 0 < \tau < 1 $ and $ Q_\tau (y_{t}|\pmb {x}_{t}) $ denotes the conditional function for the $ \tau^{th} $ quantile of the electricity demand. Given a set of training data $ G = (y_{t}, \pmb {x}^{T}_{t})^{T} $ for $ t = 1, ..., G $, the parameter $ \pmb{\beta}_{\tau} $ can be estimated as
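$ \pmb{\widehat{\beta}}_{\tau} = \arg\min \frac{1}{G}\left\{\tau\sum_{t: y_{t}\geq \pmb{x}^{T}_{t}\pmb{\beta}_{\tau}}|y_{t}-\pmb{x}^{T}_{t}\pmb{\beta}_{\tau}| + (1-\tau)\sum_{t: y_{t} < \pmb{x}^{T}_{t}\pmb{\beta}_{\tau}}|y_{t}-\pmb{x}^{T}_{t}\pmb{\beta}_{\tau}|\right\}. $ | (2.4) |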
The parameter $ \pmb{\widehat{\beta}}_{\tau} $ in Eq 2.4 is usually estimated using linear programming methods, as in least absolute deviation (LAD) regression. If $ \tau = 0.5 $, the constant weight can be dropped and Eq 2.4 becomes
$ \pmb{\widehat{\beta}}_{\tau} = \arg\min \frac{1}{G}\sum_{t=1}^{G}|y_{t}-\pmb{x}^{T}_{t}\pmb{\beta}_{\tau}|, $ | (2.5) |
giving the $ L_{1} $ (median regression) estimator. Eq 2.4 can also be written as
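$ \pmb{\widehat{\beta}}_{\tau} = \arg\min \frac{1}{G}\sum_{t=1}^{G}\rho_{\tau}(y_{t}-\pmb{x}^{T}_{t}\pmb{\beta}_{\tau}) = \arg\min \frac{1}{G}\sum_{t=1}^{G}\left(\tau - \pmb{1}_{y_{t}-\pmb{x}^{T}_{t}\pmb{\beta}_{\tau} < 0}\right)(y_{t}-\pmb{x}^{T}_{t}\pmb{\beta}_{\tau}), $ | (2.6) |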
where $ \rho_{\tau} $ is the check function such that $ \rho_{\tau}(b) = \tau b $ if $ b\geq0 $ and $ \rho_{\tau}(b) = (\tau-1)b $ if $ b < 0 $, and $ \pmb{1}_{B} $ is the indicator function of the event B.
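To make the role of the check function concrete, the following is a minimal sketch in Python (not the software used in this study) of the intercept-only case of Eq 2.6, where the fitted value is a single constant. The function names and the demand values are illustrative assumptions. Minimising the average check loss over the observed values recovers the sample median for $ \tau = 0.5 $ and an upper quantile for $ \tau = 0.9 $.

```python
def check_loss(b, tau):
    """Check function rho_tau(b): tau*b if b >= 0, (tau - 1)*b otherwise."""
    return tau * b if b >= 0 else (tau - 1.0) * b


def constant_quantile_fit(y, tau):
    """Intercept-only quantile regression.

    The average check loss is piecewise linear in the constant q, so a
    minimiser can always be found among the observed values themselves.
    """
    def average_loss(q):
        return sum(check_loss(v - q, tau) for v in y) / len(y)

    return min(sorted(y), key=average_loss)


# Illustrative demand values in MW (assumed for this example only).
demand = [24083, 29500, 31000, 31822, 33000, 35500, 37158]

median_fit = constant_quantile_fit(demand, 0.5)  # behaves like the sample median
upper_fit = constant_quantile_fit(demand, 0.9)   # sits in the upper tail
```

With covariates, the same loss is minimised over $ \pmb{\beta}_{\tau} $ rather than over a single constant, which is where linear programming becomes necessary.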
The study in [18] proposes a boosting AQR procedure for forecasting uncertainty using electricity smart meter data. The authors use a boosting procedure to estimate an AQR model for a set of quantiles of the future distribution. According to [18], the $ k $-step-ahead forecast of the conditional $ \tau^{th} $ quantile of electricity demand in Eq 2.3 can be estimated using the pinball loss function defined by
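$ L(y_{t}, Q_{\tau}) = \begin{cases} \tau(y_{t}-\widehat{Q}_{\tau}) & \text{if } y_{t}\geq\widehat{Q}_{\tau}; \\ (1-\tau)(\widehat{Q}_{\tau}-y_{t}) & \text{if } y_{t} < \widehat{Q}_{\tau}, \end{cases} $ | (2.7) |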
where $ \widehat{Q}_{\tau} $ denotes the quantile forecast and $ y_{t} $ represents the actual value of peak electricity demand.
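As an illustration of how Eq 2.7 can be used to score competing quantile forecasts, the sketch below averages the pinball loss over a short series. It is a Python example with made-up numbers, not the evaluation code of this study; the two forecast series are hypothetical.

```python
def pinball_loss(y, q_hat, tau):
    """Pinball loss of Eq 2.7 for one observation and one quantile forecast."""
    if y >= q_hat:
        return tau * (y - q_hat)
    return (1.0 - tau) * (q_hat - y)


def average_pinball_loss(actuals, forecasts, tau):
    """Mean pinball loss of a series of tau-quantile forecasts."""
    losses = [pinball_loss(y, q, tau) for y, q in zip(actuals, forecasts)]
    return sum(losses) / len(losses)


# Hypothetical actual peak demand and two competing 0.95-quantile forecasts (MW).
actuals = [32000.0, 33500.0, 31000.0, 34000.0]
forecast_a = [32500.0, 33800.0, 31600.0, 34100.0]  # slightly above the actuals
forecast_b = [31000.0, 32000.0, 30000.0, 33000.0]  # consistently too low

loss_a = average_pinball_loss(actuals, forecast_a, 0.95)
loss_b = average_pinball_loss(actuals, forecast_b, 0.95)
```

For a high quantile such as 0.95, under-forecasting is penalised far more heavily than over-forecasting, so `forecast_a` obtains the lower average loss.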
QRA is another recent forecast combination technique used to compute the prediction interval. Despite its popularity and simplicity, it is noted that combining forecasts has not been performed widely in the area of long-term peak electricity demand forecasting using quarterly or monthly data. However, it has been performing well in the practice of electricity price forecasting (see, [7,19], among others). According to [18] the QRA model is given by
$ y_{t} = g_{k}(\pmb{x}_{t}) + \varepsilon_{t}, $ | (2.8) |
where $ \pmb{x}_{t} = (\pmb{y}_{t}, \pmb{z}_{t}) $; $ \pmb{z}_{t} $ is a vector of exogenous variables known at time $ t $; $ \pmb{y}_{t} $ is a vector of past demand occurring prior to time $ t $; $ \varepsilon_{t} $ denotes the model error term with $ E[\varepsilon_{t}] = 0 $ and $ E[\pmb{x}_{t}\varepsilon_{t}] = 0 $.
The following section briefly discusses three variable selection methods for penalised regression: Lasso, CV and elastic net. Lasso performs variable selection and shrinkage simultaneously. Unlike other penalised regression techniques, Lasso may shrink some coefficients all the way to zero. The Lasso estimate for the OLS regression model is given by
$ \widehat{\pmb{\beta}} = \arg\min \sum_{j=1}^{p}(\pmb{y}-\pmb{x}\pmb{\beta}_{j})^{2} + \lambda_{1}\sum_{j=1}^{p}|\pmb{\beta}_{j}|, $ | (3.1) |
where $ \pmb{y} = (\pmb{y}_{1}, ..., \pmb{y}_{n})^{T} $ is the response vector, $ \pmb{x}_{j} = (\pmb{x}_{1j}, ..., \pmb{x}_{nj})^{T}, j = 1, ..., p $ are the linearly independent predictors, $ \lambda_{1} $ is the Lasso regularization parameter, $ \sum^{p}_{j = 1}|\pmb{\beta}_{j}|\leq t $ is called the Lasso penalty and $ t $ is the Lasso tuning parameter [20]. If $ p > n $, the Lasso selects at most $ n $ variables. There is also an adaptive Lasso method, which modifies the Lasso penalty by applying weights to each parameter in the Lasso constraint. Elastic net is a regularization and variable selection method suggested by [21]. It removes the limitation on the number of selected variables and also encourages a grouping effect, which Lasso fails to do. The elastic net estimator is given by
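$ \widehat{\pmb{\beta}} = \arg\min \sum_{j=1}^{p}(\pmb{y}-\pmb{x}\pmb{\beta}_{j})^{2} + \lambda_{1}\sum_{j=1}^{p}|\pmb{\beta}_{j}| + \lambda_{2}\sum_{j=1}^{p}\pmb{\beta}^{2}_{j}, $ | (3.2) |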
where $ \sum^{p}_{j = 1}|\pmb{\beta}_{j}|\leq t_{1} $ and $ \sum^{p}_{j = 1}\pmb{\beta}^{2}_{j} < t_{2} $ are the elastic net penalties, and $ t_{1} $ and $ t_{2} $ are the elastic net tuning parameters. Cross validation is another commonly used selection method: part of the training data set is used to fit the model and the remaining part is used to estimate the prediction error.

The data come from $ 28 $ South African weather stations divided into interior and coastal regions, and comprise monthly and quarterly peak electricity demand (MPED and QPED), temperature, lagged demand and calendar effects. MPED and QPED are used as the dependent variables, while temperature, lagged demand and calendar effects are used as predictor variables. For MPED, the coastal temperature effects are AMTC, maxTC and minTC, defined as the average monthly, average maximum and average minimum coastal temperatures respectively. The interior effects are AMTI, maxTI and minTI, defined as the average monthly, average maximum and average minimum interior temperatures respectively.
Likewise, for QPED the coastal temperature effects are AQTC, maxTC and minTC, defined as the average quarterly, average maximum and average minimum coastal temperatures respectively. The quarterly interior effects are AQTI, maxTI and minTI, defined as the average quarterly, average maximum and average minimum interior temperatures respectively. The combined coastal and interior temperature effects for both monthly and quarterly peak electricity demand are: the average minimum of coastal and interior temperatures (AminTCI), the average of average coastal and interior temperatures (AATCI), the average maximum of coastal and interior temperatures (AmaxTCI), the difference between average maximum of coastal and interior temperatures (DmaxTCI), the difference between average minimum of coastal and interior temperatures (DminTCI), the difference between average of average monthly coastal and interior temperatures (DAAMTCI) and the difference between average of average quarterly coastal and interior temperatures (DAAQTCI). Furthermore, day type (day of the week), day before holiday (DBH), day after holiday (DAH) and day holiday (DH) are defined as calendar effects.
In forecasting electricity demand for the inland and coastal regions, we use long-term peak electricity demand models. The QR models are used for both monthly and quarterly data. Monthly temperature covariates enter the peak electricity demand models through heating and cooling degree days, which are calculated as:
$ \mbox{HDD}_{t} = \max(T_{\mbox{ref}} - T_{t}, 0) $ |
and
$ \mbox{CDD}_{t} = \max(T_{t} - T_{\mbox{ref}}, 0). $ |
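The two daily definitions above can be sketched in a few lines of Python. This is an illustration only: the reference temperature of 18 degrees Celsius and the daily temperatures are assumed values, not figures taken from this study.

```python
def heating_degree_days(temp, t_ref):
    """HDD for one day: how far the temperature falls below the reference."""
    return max(t_ref - temp, 0.0)


def cooling_degree_days(temp, t_ref):
    """CDD for one day: how far the temperature rises above the reference."""
    return max(temp - t_ref, 0.0)


T_REF = 18.0  # assumed reference temperature (degrees Celsius)
daily_temps = [12.5, 16.0, 18.0, 21.5, 25.0]  # hypothetical daily averages

hdd = [heating_degree_days(t, T_REF) for t in daily_temps]
cdd = [cooling_degree_days(t, T_REF) for t in daily_temps]
```

On any given day at most one of the two quantities is positive, and both are zero when the temperature equals the reference.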
The total monthly heating degree-days ($ \mbox{MHDD}_t $) and the total monthly cooling degree-days ($ \mbox{MCDD}_t $) are calculated as:
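$ \mbox{MHDD}_{t} = \max\left[\sum_{j=1}^{m}\mbox{HDD}_{t,j} - \sum_{j=1}^{m}\mbox{CDD}_{t,j}, 0\right] $ |
and
$ \mbox{MCDD}_{t} = \max\left[\sum_{j=1}^{m}\mbox{CDD}_{t,j} - \sum_{j=1}^{m}\mbox{HDD}_{t,j}, 0\right] $ |
which reduces to
$ \mbox{MHDD}_{t} = \max\left[\sum_{j=1}^{m}\left(\max(T_{\mbox{ref}} - T_{t,j}, 0)\right) - \sum_{j=1}^{m}\left(\max(T_{t,j} - T_{\mbox{ref}}, 0)\right), 0\right] $ |
and
$ \mbox{MCDD}_{t} = \max\left[\sum_{j=1}^{m}\left(\max(T_{t,j} - T_{\mbox{ref}}, 0)\right) - \sum_{j=1}^{m}\left(\max(T_{\mbox{ref}} - T_{t,j}, 0)\right), 0\right], $ |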
respectively, where $ m $ is the number of days in month $ t $, $ T_t $ is average daily temperature on day $ t $ and $ T_{\mbox{ref}} $ is the reference temperature which separates cold temperatures from hot temperatures. Based on the description of variable selection given in section 3.1, the following MPED models for electricity demand are proposed:
Model 1: Linear quantile regression (LQR) without interactions model is given as
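$ Q_{\tau}(\mbox{MPED}|\pmb{x}_{t}) = \alpha_{0} + \alpha_{1}(\mbox{DH}) + \alpha_{2}(\mbox{DAH}) + \alpha_{3}(\mbox{DBH}) + \alpha_{4}(\mbox{Daytype}) + \alpha_{5}(\mbox{CDDAminTCI}) + \alpha_{6}(\mbox{CDDAmaxTCI}) + \alpha_{7}(\mbox{HDDAAMTCI}) + \alpha_{8}(\mbox{HDDAminTCI}) + \alpha_{9}(\mbox{noltrend}) + \varepsilon_{t,\tau}, $ | (3.3) |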
where $ \alpha_{0}, \alpha_{1}, ..., \alpha_{9} $ are constants, CDDAminTCI and CDDAmaxTCI are cooling degree days average minimum and maximum coastal and interior temperatures, HDDAAMTCI is monthly heating degree days average of average coastal and interior temperature, HDDAminTCI is heating degree days average minimum coastal and interior temperature, noltrend is a non-linear trend, $ \varepsilon_{t, \tau} $ is residual error.
Model 2: LQR with interactions model is given by
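$ Q_{\tau}(\mbox{MPED}|\pmb{x}_{t}) = \alpha_{0} + \alpha_{1}(\mbox{DAH}) + \alpha_{2}(\mbox{DBH}) + \alpha_{3}(\mbox{Daytype}) + \alpha_{4}(\mbox{CDDAminTCI} \ast \mbox{Daytype}) + \alpha_{5}(\mbox{CDDAmaxTCI}) + \alpha_{6}(\mbox{HDDAAMTCI}) + \alpha_{7}(\mbox{HDDAminTCI}) + \alpha_{8}(\mbox{noltrend}) + \varepsilon_{t,\tau}. $ | (3.4) |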
Model 3: QRA is given in Eq 3.5
$ \mbox{MPED} = \alpha_{0} + \alpha_{1}(\mbox{fOLS}) + \alpha_{2}(\mbox{fGAM}) + \alpha_{3}(\mbox{fQR}) + \varepsilon_{t,\tau}, $ | (3.5) |
where fOLS, fGAM and fQR are ordinary least squares, generalised additive model and quantile regression forecasts respectively.
Similarly, we can derive the formulae for the total quarterly heating and cooling degree-days, $ QHDD_{t} $ and $ QCDD_{t} $ respectively, as in the monthly case. The quarterly temperature covariates enter the long-term peak electricity demand models through heating and cooling degree-days. Based on the description of variable selection given in section 3.1, the following QPED models for electricity demand are proposed:
Model 4: LQR without interactions model is given as
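$ Q_{\tau}(\mbox{QPED}|\pmb{x}_{t}) = \alpha_{0} + \alpha_{1}(\mbox{DH}) + \alpha_{2}(\mbox{DAH}) + \alpha_{3}(\mbox{Daytype}) + \alpha_{4}(\mbox{CDDAminTCI}) + \alpha_{5}(\mbox{CDDAmaxTCI}) + \alpha_{6}(\mbox{HDDAAQTCI}) + \alpha_{7}(\mbox{noltrend}) + \varepsilon_{t,\tau}, $ | (3.6) |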
where HDDAAQTCI is quarterly heating degree days average of average coastal and interior temperature.
Model 5: LQR with interactions model is given as
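$ Q_{\tau}(\mbox{QPED}|\pmb{x}_{t}) = \alpha_{0} + \alpha_{1}(\mbox{DH}) + \alpha_{2}(\mbox{DAH}) + \alpha_{3}(\mbox{Daytype}) + \alpha_{4}(\mbox{CDDAminTCI}) + \alpha_{5}(\mbox{CDDAmaxTCI}) + \alpha_{6}(\mbox{HDDAAQTCI} \ast \mbox{Daytype}) + \alpha_{7}(\mbox{noltrend}) + \varepsilon_{t,\tau}. $ | (3.7) |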
Model 6: QRA is given in equation (3.8)
$ \mbox{QPED} = \alpha_{0} + \alpha_{1}(\mbox{fOLS}) + \alpha_{2}(\mbox{fGAM}) + \alpha_{3}(\mbox{fQR}) + \varepsilon_{t,\tau}. $ | (3.8) |
The QR model defined in Eq 2.3 will be helpful in this analysis. The parameter for OLS over $ \pmb {\beta} $ is estimated by minimizing:
$ \sum_{t=1}^{n}r(y_{t}-\pmb{x}^{T}_{t}\pmb{\beta}) = \sum_{t=1}^{n}(y_{t}-\pmb{x}^{T}_{t}\pmb{\beta})^{2}, $ | (3.9) |
where $ r $ is the quadratic loss function and the parameter for MLE over $ \pmb {\beta} $ based on the sample $ \{\pmb {x}_{t}, y_{t}\}_{t = 1}^{n} $ is calculated by maximizing
$ L(\pmb{\beta}) \propto \exp\left\{-\frac{1}{2\sigma^{2}}\sum_{t=1}^{n}(y_{t}-\pmb{x}^{T}_{t}\pmb{\beta})^{2}\right\}. $ | (3.10) |
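The link between Eq 3.9 and Eq 3.10 can be made explicit with a short worked step. As a sketch, assuming independent Gaussian errors with a fixed variance $ \sigma^{2} $ (an assumption added here for illustration), taking logarithms of Eq 3.10 gives

$ \log L(\pmb{\beta}) = c - \frac{1}{2\sigma^{2}}\sum_{t=1}^{n}(y_{t}-\pmb{x}^{T}_{t}\pmb{\beta})^{2}, $

where $ c $ is a constant that does not depend on $ \pmb{\beta} $. Because the sum of squares enters with a negative sign, maximising $ \log L(\pmb{\beta}) $ is equivalent to minimising the quadratic loss in Eq 3.9, so under this assumption the MLE and OLS estimates of $ \pmb{\beta} $ coincide.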
Stochastic gradient boosting (SGB) is a modification of Gradient boosting (GB) which is a machine learning technique. It fits an additive model in a stage-wise way. The additive model can take the form given in Eq 3.11 ([22]).
$ f(x) = \sum_{m=1}^{M}\beta_{m}b(x;\gamma_{m}), $ | (3.11) |
where $ b(x; \gamma_m) \in \mathbb{R} $ are functions of $ x $ which are characterised by the expansion parameters $ \gamma_m $, $ \beta_m $. The parameters $ \beta_m $ and $ \gamma_m $ are fitted in a stage-wise way, a process which slows down over-fitting ([22]). With SGB, a random sample of the training data set is taken without replacement. A more detailed discussion of this is found in [23].
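The stage-wise fitting described above can be sketched as follows. This is a deliberately small Python illustration under several assumptions that are not taken from the study: squared-error loss, one-split regression stumps as the basis functions $ b(x; \gamma_m) $, a fixed learning rate playing the role of $ \beta_m $, and a made-up demand series with a single predictor.

```python
import random


def fit_stump(xs, residuals):
    """Best single-split regression stump (least squares) on one feature."""
    best = None
    for split in sorted(set(xs))[1:]:
        left = [r for x, r in zip(xs, residuals) if x < split]
        right = [r for x, r in zip(xs, residuals) if x >= split]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, split, left_mean, right_mean)
    return best[1:]


def stump_predict(stump, x):
    split, left_mean, right_mean = stump
    return left_mean if x < split else right_mean


def stochastic_gradient_boost(xs, ys, n_stages=50, learning_rate=0.1,
                              subsample=0.5, seed=0):
    """Fit an additive model stage by stage on random subsamples."""
    rng = random.Random(seed)
    base = sum(ys) / len(ys)            # initial constant fit
    preds = [base] * len(ys)
    stumps = []
    k = max(2, int(subsample * len(xs)))
    for _ in range(n_stages):
        idx = rng.sample(range(len(xs)), k)       # drawn without replacement
        sub_x = [xs[i] for i in idx]
        sub_r = [ys[i] - preds[i] for i in idx]   # residuals = negative gradient
        stump = fit_stump(sub_x, sub_r)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, x)
                 for p, x in zip(preds, xs)]
    return base, stumps, learning_rate


def boost_predict(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(stump_predict(s, x) for s in stumps)


# Hypothetical monthly series (MW): linear trend plus a winter uplift.
xs = list(range(24))
ys = [30000.0 + 150.0 * x + (1500.0 if x % 12 in (5, 6, 7) else 0.0) for x in xs]

model = stochastic_gradient_boost(xs, ys)
mean_y = sum(ys) / len(ys)
mse_base = sum((y - mean_y) ** 2 for y in ys) / len(ys)
mse_boost = sum((y - boost_predict(model, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)
```

Each stage only sees half of the training points, which is the stochastic element; the small learning rate means that no single stage dominates the final fit.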
Support vector regression (SVR) which is based on support vector machines (SVM) uses different kernel functions which map low dimensional data to high dimensional space. SVR was introduced by [24] and estimates a regression function of the form given in Eq (3.12).
$ f(x) = x^{T}\omega + b, $ | (3.12) |
where $ b $ is a scalar and $ \omega $ is a vector of weights. Eq (3.12) can be reformulated as a quadratic programming problem (QPP) by introducing an $ \varepsilon $-insensitive loss function as given in Eq (3.13) ([24]).
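$ \min_{(\omega, b, \xi_{1}, \xi_{2})} \frac{1}{2}\omega^{T}\omega + C(e^{T}\xi_{1} + e^{T}\xi_{2}) \quad \text{s.t.} \quad y_{i} - x_{i}^{T}\omega - b \leq \varepsilon + \xi_{1i}, \quad x_{i}^{T}\omega + b - y_{i} \leq \varepsilon + \xi_{2i} \quad \text{and} \quad \xi_{1i}, \xi_{2i} \geq 0, \ i = 1, ..., m, $ | (3.13) |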
where $ C > 0, \varepsilon > 0 $ are input parameters, and $ \xi_1 = (\xi_{11}, ..., \xi_{1m})^T $ and $ \xi_2 = (\xi_{21}, ..., \xi_{2m})^T $ are slack variables.
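The effect of the $ \varepsilon $-insensitive loss in Eq (3.13) can be illustrated with a small Python sketch. For fixed weights and bias, the smallest feasible slack for each observation equals the amount by which the absolute residual exceeds $ \varepsilon $, so the objective can be evaluated directly. The weights, data and parameter values below are hypothetical, and no optimisation is performed.

```python
def epsilon_insensitive_loss(y, prediction, epsilon):
    """Zero inside the epsilon tube around the prediction, linear outside it."""
    return max(abs(y - prediction) - epsilon, 0.0)


def svr_objective(weights, bias, xs, ys, c, epsilon):
    """Objective of Eq (3.13) with each slack at its smallest feasible value."""
    penalty = 0.5 * sum(w * w for w in weights)
    total_slack = 0.0
    for x, y in zip(xs, ys):
        prediction = sum(w * xi for w, xi in zip(weights, x)) + bias
        total_slack += epsilon_insensitive_loss(y, prediction, epsilon)
    return penalty + c * total_slack


# Hypothetical one-feature data and a candidate fit f(x) = 2x + 1.
xs = [[1.0], [2.0], [3.0]]
ys = [3.2, 4.0, 8.5]
objective = svr_objective([2.0], 1.0, xs, ys, c=10.0, epsilon=0.5)
```

The first observation lies inside the tube and contributes nothing, which is why small residuals do not influence the fit.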
In this study, South African monthly and quarterly electricity demand data from Eskom, ranging from January 2000 to March 2014, are used. The aggregate energy data with 5204 daily observations are used to create 171 monthly and 57 quarterly observations for analysis. The data are split into training (Jan 2000–Dec 2008) and testing (Jan 2009–Mar 2014) sets for MPED and QPED modelling. Temperature data for the coastal and interior regions from 28 South African weather stations are considered. The study uses the variables most commonly included in electricity demand models, namely average monthly and quarterly electricity demand, non-linear trend and temperature.
The study further analyses three benchmark models: GAMs, SVR and SGB. Using at least two benchmark models is consistent with current practice in forecasting, which compares a variety of statistical models with machine learning models. In this study we use one statistical learning benchmark, the generalized additive model, and two machine learning benchmarks, stochastic gradient boosting and support vector regression. Benchmarking against well-known methods that produce accurate forecasts shows how good our model is; see, for instance, [25].
Figures 1 and 2 show monthly and quarterly peak electricity demand (MPED and QPED) plots respectively for the period 2000 to 2014, together with density, q-q and box plots. Both MPED and QPED plots show upward trends with strong seasonality. The increasing trends are indicated by the overall upward drift in the time series. The seasonal variations produce regular fluctuations from year to year that appear similar across months and quarters.
Table 1 shows the summary statistics for both monthly and quarterly electricity demand data ranging from 1 January 2000 to 31 March 2014. The minimum and maximum values are compared and also used to assess the spread of both quarterly and monthly electricity demand. The minimum values of MPED and QPED, recorded in January–March 2000 and on Monday, January 2000, are 24083 and 24760 MW respectively. Moreover, the maximum value of both MPED and QPED, recorded in April–June 2007 and on Thursday, 24 May 2007, is 37158 MW. The kurtosis values of MPED and QPED are -0.31 and -0.21 respectively. These values indicate that the MPED and QPED data do not perfectly follow the normal distribution. Furthermore, negative kurtosis shows that the distribution has lighter tails and a flatter peak than the normal distribution. Table 1 also reveals that both monthly and quarterly electricity demand data are left skewed.
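For readers who want to reproduce this kind of summary, the sketch below computes moment-based skewness and kurtosis in Python. It assumes that the kurtosis reported in Table 1 is excess kurtosis (kurtosis minus 3, so that a normal distribution scores zero), which the negative values suggest but the text does not state. The two samples are made up for illustration.

```python
def central_moment(values, order):
    """Central moment of the given order, using the 1/n convention."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)


def skewness(values):
    """Moment-based skewness: negative values indicate a longer left tail."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Kurtosis minus 3: negative values indicate lighter tails than the normal."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0


# Hypothetical samples: one with a long left tail, one flat and symmetric.
left_tailed = [1.0, 5.0, 6.0, 6.0, 7.0, 7.0, 7.0, 8.0]
flat = [float(v) for v in range(1, 11)]

skew_left = skewness(left_tailed)
kurt_flat = excess_kurtosis(flat)
```

Statistical packages often apply small-sample corrections to these formulas, so values computed this way may differ slightly from packaged output.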
Tables 2 and 3 present the summary results of OLS forecast and QR, QRA, GAMs forecasts for both MPED and QPED models based on RMSE, MAE and MAPE respectively. In Table 2, M1 and M2 represent OLS forecast without and with interactions, respectively, for MPED model, while M4 and M5 represent OLS forecast without and with interactions, respectively, for QPED model.
Likewise, in Table 3, M1 and M2 represent linear QR forecasts without and with interactions, respectively, for the MPED model, while M4 and M5 represent linear QR forecasts without and with interactions, respectively, for the QPED model. M3 and M6 are the QRA forecasts for the MPED and QPED models respectively, and M7 and M8 are the GAM forecasts for the MPED and QPED models respectively. In Table 2, the values RMSE = 79.95 MW, MAE = 72.45 MW and MAPE = 0.22% for M4 are smaller than those of M1, M2 and M5. Likewise, in Table 3 the values RMSE = 76.55 MW, MAE = 51.61 MW and MAPE = 0.16% for M6 are smaller than those of M1, M2, M3, M4, M5, M7 and M8. Moreover, the RMSE, MAE and MAPE for M6 (QRA for QPED) are smaller than those of M4 in Table 2 (OLS for QPED). The results for M6 (QRA for QPED) are also better than those of M3 (QRA for MPED). In addition, the RMSE, MAE and MAPE of the benchmark models (SVR and SGB) in Table 4 are larger than those of M6.
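The three accuracy measures used in these comparisons can be computed as in the following Python sketch. The actual and forecast values are invented for illustration and are not taken from the tables; MAPE is expressed here as a percentage.

```python
import math


def rmse(actuals, forecasts):
    """Root mean squared error, in the units of the data (here MW)."""
    squared = [(a - f) ** 2 for a, f in zip(actuals, forecasts)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actuals, forecasts):
    """Mean absolute error, in the units of the data (here MW)."""
    return sum(abs(a - f) for a, f in zip(actuals, forecasts)) / len(actuals)


def mape(actuals, forecasts):
    """Mean absolute percentage error, in percent (unit free)."""
    ratios = [abs(a - f) / abs(a) for a, f in zip(actuals, forecasts)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical actual and forecast peak demand (MW).
actuals = [32000.0, 33000.0, 31000.0, 34000.0]
forecasts = [32100.0, 32900.0, 31050.0, 33950.0]

rmse_value = rmse(actuals, forecasts)
mae_value = mae(actuals, forecasts)
mape_value = mape(actuals, forecasts)
```

RMSE penalises large errors more heavily than MAE, which is why the two measures can rank models differently.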
Figures 8 and 10 show the cross validation estimates of the mean squared prediction error for the ridge, Lasso and elastic net models as a function of the regularization parameter (log of lambda). The numbers on top of each figure show the number of non-zero regression coefficients. The mean squared prediction errors point to the smaller values of the parameter as those that shrink the coefficients optimally. Figures 9 and 11 show the coefficient paths for the ridge, Lasso and elastic net models as a function of log of lambda for $ \alpha $ = 0, 1 and 0.5 respectively. The analysis was also repeated using different values of $ \alpha $.
The comparison of the MPED cross validation estimates of the mean squared prediction error for ridge, Lasso and elastic net in Figure 8 shows that Lasso is superior to the other variable selection methods. There are also noticeable differences between the cross validation estimates of the methods, with Lasso giving the best estimates. For model selection with Lasso and elastic net, the single non-zero coefficient shows that the function has chosen the second vertical line on the cross validation plot. This suggests that the model might be doing a good job. In Figure 9, all coefficients of the ridge regression model are essentially zero when the log of lambda is 15. However, as lambda is relaxed, the coefficients move away from zero smoothly. For the Lasso fit, much of the R-squared is explained while the coefficients are still heavily shrunk, but towards the end of the path the coefficients grow very large. This may suggest that the model is overfitted at the end of the path. Furthermore, the elastic net regression model looks like Lasso with slight differences, which confirms that it is an extension of the Lasso proposed by [20]. However, elastic net does not perform satisfactorily because it contains two shrinkage procedures, ridge and Lasso. Double shrinkage might introduce unnecessary bias. Likewise, Figure 10 shows the QPED model selection for ridge, Lasso and elastic net, where the dotted red lines show the cross validation curve. The two dashed lines in Figure 10 show the log of lambda value that gives the minimum cross validation mean squared error (left dashed line) and the value that is within one standard error of it (right dashed line). Figure 11 shows all coefficients of the ridge regression model to be zero when the value of lambda is 14 or more. It can be seen from Figure 11 that Lasso regression shrinks the regression coefficients to zero as lambda increases. For instance, when log of lambda $ = 2 $ there are 6 non zero coefficients and when log of lambda $ = 0 $ all coefficients are zero.
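The two dashed lines in the cross validation plots correspond to two common ways of choosing lambda, which can be sketched in Python as follows. The grid of lambda values, the cross validation errors and their standard errors are hypothetical, and the function name is an illustrative assumption.

```python
def select_lambda(lambdas, cv_mean, cv_se):
    """Return (lambda at the minimum CV error, lambda by the one-standard-error rule).

    The one-standard-error rule picks the largest lambda whose CV error is
    within one standard error of the minimum, favouring a sparser model.
    """
    i_min = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    threshold = cv_mean[i_min] + cv_se[i_min]
    within = [lam for lam, err in zip(lambdas, cv_mean) if err <= threshold]
    return lambdas[i_min], max(within)


# Hypothetical cross validation results over a grid of lambda values.
lambdas = [0.01, 0.1, 1.0, 10.0, 100.0]
cv_mean = [105.0, 100.0, 102.0, 108.0, 140.0]
cv_se = [6.0, 5.0, 5.0, 6.0, 8.0]

lambda_min, lambda_1se = select_lambda(lambdas, cv_mean, cv_se)
```

Choosing the larger of the two values trades a small increase in estimated error for stronger shrinkage and fewer non-zero coefficients.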
Table 1. Summary statistics for monthly and quarterly electricity demand.
Data | Mean | Median | Min | Max | Kurtosis | Skewness | St.Dev |
Monthly data | 31560 | 31822 | 24083 | 37158 | -0.31 | -0.42 | 30.44 |
Quarterly data | 32452 | 32675 | 24760 | 37158 | -0.21 | -0.54 | 30.71 |
M1 | M2 | M4 | M5 | |
RMSE | 327.91 | 338.12 | 79.95 | 172.17 |
MAE | 252.06 | 262.83 | 72.45 | 128.41 |
MAPE | 0.77 | 0.80 | 0.22 | 0.37 |
M1 | M2 | M3 | M4 | M5 | M6 | M7 | M8 | |
RMSE | 324.89 | 345.38 | 315.46 | 87.92 | 87.97 | 76.55 | 315.96 | 165.94 |
MAE | 249.31 | 273.41 | 231.70 | 73.55 | 73.64 | 51.61 | 235.01 | 123.57 |
MAPE | 0.76 | 0.83 | 0.71 | 0.22 | 0.22 | 0.16 | 0.72 | 0.36 |
GAMs are considered a simple yet powerful technique because they provide flexible predictor functions that uncover hidden patterns in the data while regularizing those functions. The technique is also widely regarded as indispensable in long-term peak electricity demand forecasting. It is used here as one of the benchmark models in the MPED and QPED analysis.
Model 7: Monthly data model GAM is given as
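$ \mathrm{MPED} = \alpha_0 + s(\mathrm{Daytype}) + \alpha_1(\mathrm{DH}) + \alpha_2(\mathrm{CDDAminTCI}) + \alpha_3(\mathrm{CDDAmaxTCI}) + \alpha_4(\mathrm{HDDAAMTCI}) + \alpha_5(\mathrm{HDDAminTCI}) + s(\mathrm{noltrend}) + \varepsilon_{t,\tau}, $ | (4.1) |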
Model 8: Quarterly data model GAM is given as
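$ \mathrm{QPED} = \alpha_0 + \alpha_1(\mathrm{DH}) + \alpha_2(\mathrm{DAH}) + s(\mathrm{Daytype}) + \alpha_3(\mathrm{CDDAminTCI}) + \alpha_4(\mathrm{CDDAmaxTCI}) + \alpha_5(\mathrm{HDDAAQTCI} * \mathrm{Daytype}) + s(\mathrm{noltrend}) + \varepsilon_{t,\tau}, $ | (4.2) |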
where $ s $ represents smoothing.
Metric | SVR (benchmark) | SGB (benchmark) |
RMSE (MW) | 398.35 | 728.19 |
MAE (MW) | 325.73 | 558.98 |
MAPE (%) | 0.98 | 1.66 |
Figures 1 and 2 display the monthly and quarterly peak electricity demand plots for the period 2000 to 2014, respectively. Figure 3 shows the monthly and quarterly average temperature plots, which also show strong seasonal effects. The smallest values of RMSE, MAE and MAPE in the OLS forecast model (M4) for both MPED and QPED are 79.95 MW, 72.45 MW and 0.22% respectively. Likewise, the smallest RMSE, MAE and MAPE in the QRA model (M6) are 76.55 MW, 51.61 MW and 0.16% respectively.
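The three error measures used to compare the models can be computed as in the following minimal sketch. It uses the standard definitions of RMSE, MAE and MAPE, with MAPE expressed as a percentage; the function names are illustrative, and the demand values in the usage lines are hypothetical numbers in MW rather than data from this study.

```python
import math


def rmse(actual, forecast):
    """Root mean squared error, in the units of the data (here MW)."""
    squared = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, forecast):
    """Mean absolute error, in the units of the data (here MW)."""
    absolute = [abs(a - f) for a, f in zip(actual, forecast)]
    return sum(absolute) / len(absolute)


def mape(actual, forecast):
    """Mean absolute percentage error, in percent (unit free)."""
    ratios = [abs((a - f) / a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical peak demand values in MW, for illustration only.
actual = [31000.0, 32000.0, 33000.0]
forecast = [31100.0, 31900.0, 33300.0]
print(rmse(actual, forecast), mae(actual, forecast), mape(actual, forecast))
```

Because MAPE divides each error by the actual demand, it is a percentage rather than a quantity in MW, which is why MAPE values are so much smaller than the RMSE and MAE values for the same model.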
Table 4 shows that the error levels of SVR (RMSE = 398.35 MW, MAE = 325.73 MW, MAPE = 0.98%) are lower than those of the SGB model (RMSE = 728.19 MW, MAE = 558.98 MW, MAPE = 1.66%). Figures 6 and 7 show the SVR and SGB benchmark model forecasts for MPED. Clearly, the SVR forecasts in Figure 6 fit the data much better than the SGB forecasts in Figure 7. Figures 4 and 5 show the actual values of MPED and QPED, respectively, together with the quantile regression forecast (fQR), the generalised additive model forecast (fGAM) and the quantile regression averaging forecast (fQRA). Figure 5 shows that fQRA fits the QPED data very well.
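Quantile regression, and therefore QRA, estimates a quantile by minimising the pinball (check) loss instead of the squared error. The sketch below shows that loss for a single quantile level `tau`; the function name and the numbers in the usage lines are illustrative assumptions, and the study itself reports RMSE, MAE and MAPE rather than this loss.

```python
def pinball_loss(actual, forecast, tau):
    """Average pinball loss of quantile forecasts at level tau (0 < tau < 1)."""
    total = 0.0
    for y, q in zip(actual, forecast):
        diff = y - q
        # Under-forecasts (diff >= 0) are weighted by tau,
        # over-forecasts (diff < 0) by 1 - tau.
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(actual)


# Hypothetical values in MW: the same 2 MW miss costs more when a
# 0.9-quantile forecast falls below the actual value than above it.
print(pinball_loss([10.0], [8.0], tau=0.9))
print(pinball_loss([10.0], [12.0], tau=0.9))
```

At `tau = 0.5` the loss is half the mean absolute error, so the median forecast is the special case that treats over- and under-forecasting symmetrically.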
In this study, long-term monthly and quarterly peak electricity demand forecasting using South African data has been presented. Policies to control the supply of electricity are required to ensure that electricity demand is met in support of the South African community. By using monthly and quarterly data and then applying QR models, more precise long-term forecasting models are developed. QR models are becoming more attractive for long-term electricity demand modelling these days because they enable new quantile functions to be constructed easily. We believe this study will make an important contribution to research application areas, especially short-, medium- and long-term peak electricity demand methodology. The QRA model in QPED has the smallest RMSE, MAE and MAPE compared with the other models. The Lasso model gives the best estimates compared with the other variable selection methods. The empirical results of the proposed models are suitable not just for long-term electricity demand forecasting, but also for testing policies geared towards the expansion of the electricity infrastructure, which should be intensified in South Africa in order to cope with the increasing demand driven by the country's economic growth and rapid industrialisation programme. QRA is a new technique that performed well on the QPED data for long-term electricity demand and did not perform badly on the MPED data either. However, it is not advisable to rely on a single approach for determining electricity demand in an economy as fast growing and changing as South Africa's.
The authors are grateful to the National Research Foundation of South Africa for funding this research, to Eskom, South Africa's power utility company, for providing the data, to the University of Limpopo for its resources, and to the numerous people who provided helpful comments on this paper.
The authors declare no conflict of interest.
[1] | Roth GA, Huffman MD, Moran AE, et al. (2015) Global and regional patterns in cardiovascular mortality from 1990 to 2013. Circulation 132: 1667–1678. doi: 10.1161/CIRCULATIONAHA.114.008720 |
[2] | Woolcott OO, Castillo OA, Gutierrez C, et al. (2014) Inverse association between diabetes and altitude: a cross-sectional study in the adult population of the United States. Obesity 22: 2080–2090. doi: 10.1002/oby.20800 |
[3] | Miele CH, Schwartz AR, Gilman RH, et al. (2016) Increased Cardiometabolic Risk and Worsening Hypoxemia at High Altitude. High Alt Med Biol 17: 93–100. doi: 10.1089/ham.2015.0084 |
[4] | Ramos DA, Kruger H, Muro M, et al. (1967) Patologia del hombre nativo de las grandes alturas: investigacion de las causos de muerte en 300 autopsias, Bol Of Sanit Panamn. 62: 497–501. |
[5] | Niermeyer S, Zamdio S, Moore LG (2001) The people, In High altitude: an exploration of human adaptation, ed. TF Hornbein, RB Schoene, New York: Marcel Dekker, 42–100. |
[6] | De Ferrari A, Miranda JJ, Gilman RH, et al. (2014) Prevalence, clinical profile, iron status and subject-specific traits for excessive erythrocytosis in Andean adults living permanently at 3825 meters above sea level. Chest 146: 1327–1336. doi: 10.1378/chest.14-0298 |
[7] | Horscroft JA, Kotwica AO, Laner V, et al. (2017) Metabolic basis to Sherpa altitude adaptation. Proc Natl Acad Sci USA 114: 6382–6387. doi: 10.1073/pnas.1700527114 |
[8] | Hirschler V, Maccallini G, Aranda C, et al. (2012) Dyslipidemia without obesity in indigenous Argentinean children living at high altitude. J. Pediatr 161: 646–651. doi: 10.1016/j.jpeds.2012.04.008 |
[9] | http://www.censo2010.indec.gov.ar/ Accessed September 3, 2015. INDEC, 2010. Instituto Nacional de Estadística y Censos de la Republica Argentina (Accessed February, 2017). |
[10] | National High Blood Pressure Education Program Working Group on High Blood Pressure in Children and Adolescents. The 4th Report on the Diagnosis, Evaluation and Treatment of High Blood Pressure in Children and Adolescents. Pediatrics 2004; 114 (Suppl. 2, 4th Report): 555–576. |
[11] | Kuczmarski RJ, Ogden CL, Guo SS, et al. (2002) 2000 CDC growth charts for the United States: methods and development. Vital Health Stat 246: 1–190. |
[12] | National Cholesterol Education Program (NCEP) Expert Panel on Detection, Evaluation, and Treatment of High Blood Cholesterol in Adults (Adult Treatment Panel III).Third Report of the National Cholesterol Education Program (NCEP) Expert Panel on Detection, Evaluation and Treatment of High Blood Cholesterol in Adults (Adult Treatment Panel III) Final Report. Circulation 2002;106: 3143–3421. |
[13] | Sherpa LY, Deji, Stigum H, et al. (2010) Obesity in Tibetans aged 30-70 living at different altitudes under the north and south faces of Mt. Everest. Int J Environ Res Public Health 7: 1670–1680. doi: 10.3390/ijerph7041670 |
[14] | Hansen JC, Gilman AP, Odland JO (2010) Is thermogenesis a significant causal factor in preventing the "globesity" epidemic? Med Hypotheses 75: 250–256. doi: 10.1016/j.mehy.2010.02.033 |
[15] | Sierra-Johnson J, Romero-Corral A, Somers VK, et al. (2008) Effect of altitude on leptin levels, does it go up or down? J Appl Physiol 105: 1684–1685. doi: 10.1152/japplphysiol.01284.2007 |
[16] | van den Elzen APM, de Ridder MAJ, Grobbee DE et al. (2004) Families and the natural history of blood pressure: A 27-year follow-up study. Am J Hypertens 17: 936–940. |
[17] | Kivimäki M, Lawlor DA, Smith GD, et al. (2006) Early socioeconomic position and blood pressure in childhood and adulthood: the Cardiovascular Risk in Young Finns Study. Hypertension 47: 39–44. doi: 10.1161/01.HYP.0000196682.43723.8a |
[18] | de Moraes AC, Lacerda MB, Moreno LA, et al. (2014) Prevalence of high blood pressure in 122,053 adolescents: a systematic review and meta-regression. Medicine (Baltimore) 93: e232. doi: 10.1097/MD.0000000000000232 |
[19] | Jafar TH, Islam M, Poulter N, et al. (2005) Children in South Asia have higher body mass-adjusted blood pressure levels than white children in the United States: a comparative study. Circulation 111: 1291–1297. doi: 10.1161/01.CIR.0000157699.87728.F1 |
[20] | Lane DA, Gill P (2004) Ethnicity and tracking blood pressure in children. J Hum Hypertens 18: 223–228. doi: 10.1038/sj.jhh.1001674 |
[21] | Wolfel EE, Selland MA, Mazzeo RS, et al. (1994) Systemic hypertension at 4300 m is related to sympathoadrenal activity. J Appl Physiol 76: 1643–1650. doi: 10.1152/jappl.1994.76.4.1643 |
[22] | Howden R, Kleeberger SR (2012) Genetic and environmental influences on gas exchange. Compr Physiol 2: 2595–2614. |
[23] | Verberk WJ, Mieke S (2016) Are blood pressure monitors affected by high altitude? Heart Asia 8: 52–53. doi: 10.1136/heartasia-2016-010814 |
[24] | Faeh D, Moser A, Panczak R, et al. (2016) Independent at heart: persistent association of altitude with ischaemic heart disease mortality after consideration of climate, topography and built environment. J Epidemiol Community Health 70: 798–806. doi: 10.1136/jech-2015-206210 |
[25] | Fiori G, Facchini F, Pettener D, et al. (2000) Relationships between blood pressure, anthropometric characteristics and blood lipids in high- and low-altitude populations from Central Asia. Ann Hum Biol 27: 19–28. doi: 10.1080/030144600282343 |
[26] | Mohanna S, Baracco R, Seclén S (2006) Lipid profile, waist circumference, and body mass index in a high altitude population. High Alt Med Biol 7: 245–255. doi: 10.1089/ham.2006.7.245 |
[27] | Millán J, Pintó X, Muñoz A, et al. (2009) Lipoprotein ratios: Physiological significance and clinical usefulness in cardiovascular prevention. Vasc Health Risk Manag 5: 757–765. |
[28] | McLaughlin T, Abbasi F, Cheal K, et al. (2003) Use of metabolic markers to identify overweight individuals who are insulin resistant. Ann Intern Med 139: 802–809. doi: 10.7326/0003-4819-139-10-200311180-00007 |
[29] | Hanak V, Munoz J, Teague J, et al. (2004) Accuracy of the triglyceride to high-density lipoprotein cholesterol ratio for prediction of the low-density lipoprotein phenotype B. Am J Cardiol 94: 219–222. doi: 10.1016/j.amjcard.2004.03.069 |
[30] | Miele CH, Schwartz AR, Gilman RH, et al. (2016) Increased Cardiometabolic Risk and Worsening Hypoxemia at High Altitude. High Alt Med Biol 17: 93–100. doi: 10.1089/ham.2015.0084 |
[31] | Bailey DM, Davies B, Young I, et al. (2001) A potential role for free radical-mediated skeletal muscle soreness in the pathophysiology of acute mountain sickness. Aviat Space Environ Med 72: 513–521. |
[32] | Jefferson JA, Simoni J, Escudero E, et al. (2004) Increased oxidative stress following acute and chronic high altitude. High Alt Med Biol 5: 61–69. doi: 10.1089/152702904322963690 |
[33] | Askew EW (2002) Work at high altitude and oxidative stress: antioxidant nutrients. Toxicology 180: 107–119. doi: 10.1016/S0300-483X(02)00385-2 |
[34] | Zhang JZ, Behrooz A, Ismail-Beigi F (1999) Regulation of glucose transport by hypoxia. Am J Kidney Dis 34: 189–202. doi: 10.1016/S0272-6386(99)70131-9 |
1. | Luis F.M. Sepulveda, Petterson S. Diniz, João O.B. Diniz, Stelmo M.B. Netto, Carolina L.S. Cipriano, Alexandre C. Araújo, Victor H.B. Lemos, Alexandre C.P. Pessoa, Darlan B.P. Quintanilha, João D.S. Almeida, Aristófanes C. Silva, Anselmo C. Paiva, Geraldo Braz, Márcia I.A. Silva, Eliana M.G. Monteiro, Italo F.S. Silva, Eduardo C. Fernandes, Forecasting of individual electricity consumption using Optimized Gradient Boosting Regression with Modified Particle Swarm Optimization, 2021, 105, 09521976, 104440, 10.1016/j.engappai.2021.104440 |