
This study tried to demonstrate the role of Time Series models in a modeling and forecasting process using publicly available long-term records of monthly global price of bananas during the period of January 1990 to November 2021 reported in the International Monetary Fund. Following the Box–Jenkins methodology, an ARIMA (2,1,4) with a drift model was selected as the best-fit model for the Time Series, according to its lowest AIC value. Using the Levenberg-Marquardt algorithm, the results revealed that the NARNN model with 12 neurons in the hidden layer and 6 times delays provided the best performance in the nonlinear autoregressive neural network models at its smaller MSE value. The ARIMA and NARNN models are good at modelling linear and nonlinear problems for the Time Series, respectively. However, using the HYBRID model, a combination of the ARIMA and NARNN models that has both linear and nonlinear modeling capabilities can be a better choice for modeling the Time Series. The comparative results revealed that the HYBRID model with 11 neurons in the hidden layer and 3 times delays yielded higher accuracy than the NARNN model with 12 neurons in the hidden layer and 6 times delays, and the ARIMA (2,1,4) with a drift model, according to its lowest MSE in this study.
Citation: Yeong Nain Chi, Orson Chi. Time Series forecasting global price of bananas using Hybrid ARIMA-NARNN model[J]. Data Science in Finance and Economics, 2022, 2(3): 254-274. doi: 10.3934/DSFE.2022013
[1] | Ming Li, Ying Li . Research on crude oil price forecasting based on computational intelligence. Data Science in Finance and Economics, 2023, 3(3): 251-266. doi: 10.3934/DSFE.2023015 |
[2] | Kexian Zhang, Min Hong . Forecasting crude oil price using LSTM neural networks. Data Science in Finance and Economics, 2022, 2(3): 163-180. doi: 10.3934/DSFE.2022008 |
[3] | Tsumbedzo Mashamba, Modisane Seitshiro, Isaac Takaidza . A comprehensive high pure momentum equity timing framework using the Kalman filter and ARIMA forecasting. Data Science in Finance and Economics, 2024, 4(4): 548-569. doi: 10.3934/DSFE.2024023 |
[4] | Ezekiel NN Nortey, Edmund F. Agyemang, Enoch Sakyi-Yeboah, Obu-Amoah Ampomah, Louis Agyekum . AI meets economics: Can deep learning surpass machine learning and traditional statistical models in inflation time series forecasting?. Data Science in Finance and Economics, 2025, 5(2): 136-155. doi: 10.3934/DSFE.2025007 |
[5] | Yunus Emre Gur . Development and application of machine learning models in US consumer price index forecasting: Analysis of a hybrid approach. Data Science in Finance and Economics, 2024, 4(4): 469-513. doi: 10.3934/DSFE.2024020 |
[6] | Qian Shen, Yifan Zhang, Jiale Xiao, Xuhua Dong, Zifei Lin . Research of daily stock closing price prediction for new energy companies in China. Data Science in Finance and Economics, 2023, 3(1): 14-29. doi: 10.3934/DSFE.2023002 |
[7] | Wojciech Kuryłek . Can we profit from BigTechs' time series models in predicting earnings per share? Evidence from Poland. Data Science in Finance and Economics, 2024, 4(2): 218-235. doi: 10.3934/DSFE.2024008 |
[8] | Samuel Asante Gyamerah, Collins Abaitey . Modelling and forecasting the volatility of bitcoin futures: the role of distributional assumption in GARCH models. Data Science in Finance and Economics, 2022, 2(3): 321-334. doi: 10.3934/DSFE.2022016 |
[9] | Alejandro Rodriguez Dominguez, Om Hari Yadav . A causal interactions indicator between two time series using extreme variations in the first eigenvalue of lagged correlation matrices. Data Science in Finance and Economics, 2024, 4(3): 422-445. doi: 10.3934/DSFE.2024018 |
[10] | Morten Risstad, Mathias Holand . On the relevance of realized quarticity for exchange rate volatility forecasts. Data Science in Finance and Economics, 2024, 4(4): 514-530. doi: 10.3934/DSFE.2024021 |
This study tried to demonstrate the role of Time Series models in a modeling and forecasting process using publicly available long-term records of monthly global price of bananas during the period of January 1990 to November 2021 reported in the International Monetary Fund. Following the Box–Jenkins methodology, an ARIMA (2,1,4) with a drift model was selected as the best-fit model for the Time Series, according to its lowest AIC value. Using the Levenberg-Marquardt algorithm, the results revealed that the NARNN model with 12 neurons in the hidden layer and 6 times delays provided the best performance in the nonlinear autoregressive neural network models at its smaller MSE value. The ARIMA and NARNN models are good at modelling linear and nonlinear problems for the Time Series, respectively. However, using the HYBRID model, a combination of the ARIMA and NARNN models that has both linear and nonlinear modeling capabilities can be a better choice for modeling the Time Series. The comparative results revealed that the HYBRID model with 11 neurons in the hidden layer and 3 times delays yielded higher accuracy than the NARNN model with 12 neurons in the hidden layer and 6 times delays, and the ARIMA (2,1,4) with a drift model, according to its lowest MSE in this study.
Like COVID-19, Tropical Race 4 (TR4), also known as Panama Disease, has adversely affected the global banana industry. Recently, TR4 has suddenly accelerated, spreading from Asia to Australia, the Middle East, Africa and Latin America, where the majority of the bananas are shipped globally. Currently, TR4 is now in more than 20 countries, prompting fears of a banana pandemic and shortages of the world's favorite fruit bananas (Altendorf, 2019).
Bananas, the world's most popular fruit with different tastes, sizes, and colors, are the fourth most important food crop after wheat, rice, and maize in terms of production, and they are the world's favorite fruit in terms of consumption quantity (Ruiz et al., 2017). Excluding plantains, 22.7 million tons of bananas were traded, which was almost 20% of global production in 2017. The value of this trading was $11 billion, which is higher than the export value of any other exported fruit (Voora and Bermudez, 2020).
Asia is the largest banana-producing region, while India and China are the two leadings banana-producing countries. The Asia-Pacific is the market leader with a 61% share of global consumption, while India is the world's leading producer of bananas by accounting for nearly 25.7% of the total output. The Philippines consolidated its position as the second largest exporter of bananas behind Ecuador (Voora and Bermudez, 2020).
Latin America and the Caribbean are the largest exporting regions, responsible for approximately 80% of global exports. The global banana exports were estimated at 23.3 million tons in 2018, and Ecuador accounted for 24.7% of these global exports, as the largest exporter of bananas. Belgium, Costa Rica, and Columbia were the other top banana exporters in the world, while the United States was the leading importer of bananas with an 18% share of the world's imports (Voora and Bermudez, 2020).
The retail value of the banana sector was estimated to be between $20 billion and $25 billion in 2016 (Voora and Bermudez, 2020). According to the Market Reports World (2019), the global banana sector should experience a compound annual growth of 1.21% in consumption from 2019 to 2024, reaching a global consumption volume of 136.0 million tons by 2025, compared to 116.2 million tons in 2017.
According to the U.S. Bureau of Labor Statistics, the average consumer price of bananas (all fresh, traditional, first-quality organic and non-organic yellow bananas) was $0.53 (per pound on average in U.S.) with a standard deviation of $0.06 (minimum: $0.40, maximum: $0.64, and median: $0.51). Furthermore, the average consumer price for bananas was 39.91% higher in 2020 versus 1990 (a $13.29 difference in value). Between 1990 and 2020, bananas costing $33.30 in 1990 cost $46.59 in 2020 for an equivalent purchase. Compared to the overall inflation rate of 2.30% during this same period, inflation for bananas was lower.
Time Series forecasting a model to predict future values based on previously observed values is one of the most applied data science techniques in business. This model is used extensively in finance, supply chain-management, production, and inventory planning. It has a well-established theoretical grounding in statistics. Several articles have used Time Series techniques to forecast banana production (Hamjah, 2014; Hossain et al., 2016; Eyduran et al., 2020), but only a few studies have focused on banana price forecasting (Omar et al., 2014; Fatin et al., 2020).
Neural networks have become one of the most popular trends in machine learning for Time Series modeling and forecasting. Recently, there is increasing interest in using neural networks to model and forecast banana harvest yields (Rathod et al., 2017; Rathod and Mishra, 2018; Rebortera and Fajardo, 2019). Despite the importance of banana demand and supply in the global markets, there is still a lack of studies in the technical literature available on global banana price forecasting schemes. Hence, the primary purpose of this study was to apply an autoregressive integrated moving average (ARIMA) and nonlinear autoregressive neural network (NARNN) hybrid (HYBRID) model to forecast global price of bananas during the period of January 1990 to November 2021. The findings in this study may be able to bridge an important gap in Time Series forecasting by combining the best statistical and machine learning methods. The rest of the paper is made up of the following sections: Section 2 introduces Time Series data and ARIMA, NARNN, and HYBRID models for this study, Section 3 presents and analyzes the empirical results, and finally Section 4 concludes the study.
The long-term records of the monthly global price of bananas (units: U.S. dollars per metric ton, not seasonally adjusted) from January 1990 to November 2021 (Figure 1) are available to the public from the International Monetary Fund, retrieved from FRED, Federal Reserve Bank of St. Louis (https://fred.stlouisfed.org/series/PBANSOPUSDM). The average monthly global price of bananas was $726.15 per metric ton with a standard deviation of $281.88 (minimum: $250.51, maximum: $1,298.34, and median: $659.98).
Time Series involves data collected sequentially in time. In a univariate Time Series, it is a set of single (scalar) observations recorded at a specific time t. The sequence of random variables {yt: t = 1, 2, ⋯, T}, yt ∈ R where t ∈ T is the time indexing when the data was observed, is called a stochastic process. This stochastic process is often used in modeling Time Series data to find the parameters of the Time Series, and then use them as a model in predicting future values of the Time Series.
Autoregressive Integrated Moving Average (ARIMA) is a statistical analysis model for modeling and forecasting Time Series data to predict future trends. The purpose of each of these parts is to make the model fit better to predict future points in the Time Series (Montgomery et al., 2008). Statistically, ARIMA (p, d, q) model can be expressed as:
yt=ϕ1yt−1+ϕ2yt−2+⋯+ϕpyt−p+et+θ1et−1+θ2et−2+⋯+θqet−q=∑i=1pϕiyt−i−∑j=1qθjet−j+et | (1) |
where p = the order of the autoregressive process (the number of lagged terms),
d = the number of differences required to make the Time Series stationary,
q = the order of the moving average process (the number of lagged terms),
ϕ = (ϕ1, ϕ2, ⋯, ϕp) is the vector of model coefficients for the autoregressive process,
θ = (θ 1, θ 2, ⋯, θq) is the vector of model coefficients for the moving average process, and
et = the residual error (i.e., white noise).
In backshift notation B, "backshift operator" or "lag operator", is a useful notational device when working with Time Series lags: Byt = yt−1 (means "back up by one time unit") and Bkyt = yt−k (means "backshift k times"). Thus, the ARIMA (p, d, q) model can be expressed in backshift notation as:
ϕp(B)(1−B)dyt=c+θq(B)et | (2) |
where ϕp(B) = (1 – ϕ1B - … - ϕpBp) = (1 – Σp i = 1 ϕiBi), θq(B) = (1 - θ1B - … - θqBq) = (1 – Σq j = 1 θjBj), c is a constant, and et is the error term.
In Time Series forecasting, the Box–Jenkins methodology (Box & Jenkins, 1970) refers to a systematic method of identifying, estimating, checking, and forecasting ARIMA models (Box et al., 2016). It is used as the process to forecast the ARIMA model in this study based on its autocorrelation function (ACF) and partial autocorrelation function (PACF) as a means of determining the stationarity of the univariate Time Series and the lag lengths.
In a Time Series, a common assumption is that the Time Series is stationary, which means that the statistical properties (i.e., mean and variance) of the process do not change over time. Therefore, the Box-Jenkins methodology starts with the assumption that the Time Series should be as on stationary status. Differencing can be used if the Time Series is not stationary. Empirically, plots and summary statistics can be used to identify trends and autoregressive elements to get an idea of the amount of differencing and the size of the lag that will be required for model identification.
In order to figure out good parameters for the Time Series model, the Akaike's Information Criterion (AIC) or the Bayesian Information Criterion (BIC) can be used to determine the orders of an ARIMA model when its minimized value is attempted. In the diagnostic checking step, plots and statistical tests of the residual errors can also be used to determine the model fitting, to evaluate the model fitting in the context of the Time Series data, and to check where the model can be improved.
The idea behind the autoregressive (AR) process is to explain the present value of the Time Series, yt, by a function of p past values, (yt-1, yt-2, ⋯, yt-p). Thus, the AR process of order p, AR(p), is defined by the equation:
yt=ϕ1yt−1+ϕ2yt−2+⋯+ϕpyt−p+et=∑i=1pϕiyt−i+et | (3) |
where ϕ = (ϕ1, ϕ2, ⋯, ϕp) is the vector of model coefficients for the autoregressive process, and et is white noise, et ~ N (0, σ2).
The NARNN is a natural generalization of the classic linear AR(p) process. The NARNN of order p can be expressed as:
yt=Φ(yt−1,yt−2,⋯,yt−p,w)+εt | (4) |
where Φ (∙) is an unknown function determined by the neural network structure and connection weights, w is a vector of all parameters (weights), and ɛt is the error term. Thus, it performs a nonlinear functional mapping from the past observations, (yt-1, yt-2, ⋯, yt-p), to the future value, yt, which is equivalent to a nonlinear autoregressive model (Zhang, 2003).
With the Time Series data, lagged values of the Time Series can be used as inputs to a neural network, which is called the NARNN model. Mathematically, the NARNN model (Benrhmach et al., 2020) can be written by the equation of the form as:
yt=a0+∑j=1kwjΦ(b0j+∑i=1dwijyt−i)+εt | (5) |
where d = the number of input units,
k is the number of hidden units,
a0 is the constant corresponding to the output unit,
b0j is the constant corresponding to the hidden unit j,
wj is the weight of the connection between the hidden unit j and the output unit,
wij is the parameter corresponding to the weight of the connection between the input unit i and the hidden unit j, and Φ (∙) is a nonlinear function, so-called this the transfer (activation) function. The logistic function (i.e., sigmoid) is commonly used as the hidden layer transfer function, that is, Φ(y) = 1 / (1 + exp(-y)).
The most common learning rules for the NARNN model are the Levenberg-Marquardt (LM), Bayesian Regularization, and Scaled Conjugate Gradient training algorithms. In this study, the LM algorithm was considered, because it works without computing the exact Hessian matrix, which increases the training speed, and it has stable convergence (Gavin, 2020). The LM algorithm, first published by Levenberg (1944) and then rediscovered by Marquardt (1963), has become a standard technique for nonlinear least-squares problems. The LM algorithm is an iterative technique that locates the minimum of an objective function F(x) that is expressed as the sum of squares of nonlinear functions (Madsen et al., 2004):
F(x)=(1/2)∑i=1n[fi(x)]2 | (6) |
Furthermore, the LM algorithm steps to search the direction of the iteration given by the solution φi to the equations,
(JTiJi+λiI)φi=−JTifi | (7) |
where Ji is the Jacobian of fi, I am the identity matrix, and λi are the non-negative scalars, called the combination coefficient.
In the LM algorithm, for some scalar Δ > 0 related to λi, the vector φi is the solution of the constrained sub-problem of minimizing (1/2) || Ji φ + fi ||22 subject to || φ ||2 ≤ Δ (Gill et al., 1981, p. 136).
The ARIMA and NARNN models are good at modeling linear and nonlinear problems for the Time Series, respectively. The Hybrid model, a combination of the ARIMA and NARNN models has both linear and nonlinear modelling capabilities, can be a better choice for modelling the Time Series. Assuming that an unknown function can be used to demonstrate the relationship between linear and nonlinear components in the Time Series, that relationship can be illustrated as follows:
yt=f(Lt,Nt) | (8) |
where the linear component is represented by Lt, and the nonlinear component is shown by Nt. Assuming that the linear and nonlinear components in the Time Series have simple additive relationships. Zhang (2003) states that the Time Series can be considered as a combination of the linear and nonlinear components as follows:
yt=Lt+Nt | (9) |
First, the linear component will be modelled by the ARIMA model. Then, the residuals from the ARIMA model will have only the nonlinear relationship, which can be obtained by taking the difference of the observed values and the predicted values as follows:
et=yt−ˆLt | (10) |
where et is the residual of the linear model at time t, and ˆLt is the predicted value for time t. To find the nonlinear relationship, residuals can be modelled by the NARNN model in this study as follows:
ˆNt=et=f(et−1,et−2,⋯,et−n)+εt | (11) |
where f is the transformation function modeled by the NARNN model, and ɛt is the random error. The forecasts from the ARIMA and NARNN models are combined to obtain the forecast of the Time Series ˆyt, which is denoted by:
ˆyt=ˆLt+ˆNt | (12) |
R 4.0.2 for Windows, an open source for statistical computing and graphics supported by the R Foundation for Statistical Computing, was used as the tool to model and forecast global price of bananas during the period of January 1990 to November 2021 for this study. The function "decompose" in R can be applied to estimate the seasonal, trend and irregular components of a seasonal Time Series. The results revealed that the estimated trend component showed a steady increase over time, and the estimated seasonal component definitely displayed seasonality, with a pattern recurrence occurring once every 12 months (yearly).
Seasonal adjustment is the estimation and removal of seasonal effects that are not explainable by the dynamics of trends or cycles from a Time Series to reveal certain non-seasonal features. This can be done by subtracting the estimated seasonal component from the original Time Series. After removed the seasonal variation, the seasonally adjusted Time Series only contained the trend component and an irregular component.
The ACF of the Time Series, seasonal adjusted global price of bananas from January 1990 to November 2021, showed a strongly positive correlations at up to 26 lags that never decay to zero. Meanwhile, the test statistic of the Augmented Dickey-Fuller Test was Dickey-Fuller = −3.1414 with lag order = 7, and the p-value of the test was 0.09806. This suggested that the Time Series was non-stationary and needed to be differenced. This was also illustrated by the single spike at the first lag followed by small apparently random values after the first lag for the PACF. Since the PACF cut off after the first lag, it seems that the Time Series followed the autoregressive (AR) process.
In Time Series analysis, differencing can be used to transform a non-stationary Time Series into a stationary one. When both trend and seasonality are present, both a non-seasonal first difference and a seasonal difference should be applied separately. According to the Augmented Dickey-Fuller Test, Dickey-Fuller = −10.352 with lag order = 7, and the p-value of the test was smaller than 0.01. It rejected the null hypothesis that was non-stationary, and suggested that the first difference of the Time Series was stationary. The ACF of the first differenced Time Series showed a significant positive spike at the first lag, followed by correlations that were statistically significant. The corresponding PACF of the first differenced Time Series showed a most likely gradual decrease after the first few negative lags. Since the ACF cut off after the first lag and the PACF decreased gradually, a reasonable conclusion was that the first differenced Time Series followed the moving average (MA) process.
In order to find a solution, the function "auto.arima()" from the "forecast" package in R 4.0.2 for Windows was employed to identify both the structure of the Time Series (stationary or not) and type (seasonal or not), and sets the model's parameters, that takes into account the AIC, AICc, or BIC values generated to determine the best fit ARIMA model. Consequently, the ARIMA (2,1,4) with a drift model was selected to be the best fit model for the Time Series, seasonal adjusted global price of banana from January 1990 to November 2021, according to the lowest AIC value ( = 4313.15) in this study, and the parameters of the ARIMA (2,1,4) with a drift model were presented in Table 1.
Parameter | Estimate | Standard Error |
Constant | 1.7817 | 0.8797 |
AR Lag 1 | −0.3986 | 0.1539 |
AR Lag 2 | 0.1309 | 0.1281 |
Difference | ||
MA Lag 1 | 0.1042 | 0.1484 |
MA Lag 2 | −0.3869 | 0.1159 |
MA Lag 3 | −0.1057 | 0.0822 |
MA Lag 4 | −0.2921 | 0.0564 |
Sigma2 estimated as 4751, Log Likelihood = −2148.58 | ||
AIC = 4313.15, AICc = 4313.54, BIC = 4344.72 | ||
Training Set Error Measures: | ||
RMSE = 66.89923, MSE = 4475.50697, MAE = 47.63762, MAPE = 8.500749 | ||
Source: own work |
A common task when building a forecasting model is to check that the residuals satisfy some assumptions that they are uncorrelated, normally distributed, etc. The top figure of Figure 2 shows that the residuals from the ARIMA (2,1,4) with a drift model did not violate the assumption of constant location and scale. The bottom right figure of Figure 2 shows that the residual histogram did not reveal a series deviation from normality in this case. The ACF plot of the residuals (the bottom left figure of Figure 2) also shows that all sample autocorrelations were within the threshold limits, indicating that the residuals appeared to be random.
The Ljung-Box Q-test (Ljung and Box, 1978) is a diagnostic tool used to test the lack of fit of a Time Series model. In this case, the test statistic of the Ljung-Box Q-test was Q = 20.292 with 17 degrees of freedom, and the p-value of the test was 0.2596 with model degrees of freedom: 7 and total lags used: 24, indicating that the residuals were random and that the model provided an adequate fit to the Time Series. Combined with the Ljung-Box Q-test statistic, this suggested that the ARIMA (2,1,4) with a drift model appropriately modeled the dynamics for this Time Series. Forecasting process with the ARIMA (2,1,4) with a drift model also indicated a good fit of the model for forecasting at a constant increasing rate in the short-term (Figure 3).
In MATLAB, the NARNN model applied to Time Series prediction using its past values of a univariate Time Series can be expressed as follows:
y(t)=Φ(y(t−1),y(t−2),⋯,y(t−d))+e(t) | (13) |
where y(t) is the Time Series value at time t, d is the time delay, and e(t) is the error of the approximation of the Time Series at time t. This equation describes how the NARNN model is used to predict the future value of a Time Series, y(t), using the past values of the Time Series, (y(t-1), y(t-2), ⋯, y(t-d)). The function Φ (∙) is an unknown nonlinear function, and the training of the neural network aims to approximate the function by means of the optimization of the network weights and neuron bias. This tends to minimize the sum of the squared differences between the observed (yi) and predicted (ŷi) values (i.e., MSE=(1/n)∑ i=1n(yi−ˆyi)2) (Beale et al., 2019).
In this study, the NARNN model was applied to model and forecast the Time Series, global price of bananas from January 1990 to November 2021. Furthermore, the logistic sigmoid and linear transfer functions at the hidden and output layers were used respectively. The number of hidden neurons and the number of delays was set experimentally after a data pre‐processing and analysis stage. The extracted features were trained using the LM training algorithm for the target Time Series in the MATLAB (2019a) Neural Network Toolbox: 383 timesteps of one element, global price of bananas from January 1990 to November 2021.
The training target timesteps are presented to the network during training, and the network is adjusted according to its error. The validation target timesteps are used to measure network generalization and to halt training when generalization stops improving. The testing target timesteps have no effect on training and so provide an independent measure of network performance during and after training (Beale et al., 2019). The division of the Time Series in this analytical work was 70% for the training, 15% for the validation, and 15% for the testing. Randomly, 383 data samples were divided into 269 data for the training, 57 data for the validation, and 57 data for the testing
The development of the optimal architecture for the NARNN model requires determination of time delays, the number of hidden neurons, and an efficient training algorithm. The optimum number of time delays and hidden neurons were obtained through a trial-and-error procedure. Furthermore, the LM algorithms were employed for training of the NARNN model and its performance was evaluated under the optimal neural network structure. The prediction performance of the models was evaluated by its MSE. The error analysis showed that the NARNN model with 12 neurons in the hidden layer and 6 times delays provided the best performance (MSE = 3058.11185) using the LM algorithm (Table 2).
Number of hidden neurons | Number of delays (d) | ||||
2 | 3 | 4 | 5 | 6 | |
8 | 5495.27669 | 4894.51856 | 5619.10430 | 4526.92313 | 3947.92018 |
9 | 4957.78393 | 4824.05006 | 5716.89865 | 3964.32643 | 3212.59409 |
10 | 5179.43173 | 5084.66744 | 4904.24470 | 4626.09369 | 6040.92292 |
11 | 6407.52867 | 4092.28144 | 4162.03297 | 4210.05822 | 3305.64168 |
12 | 5142.34650 | 4757.30244 | 4276.64426 | 3266.47802 | 3058.11185 |
Source: Author's work |
In order to train the NARNN, open loop architecture (Figure 4) shows a block diagram of the NARNN generated during MATLAB processing in the MATLAB (2019a) Neural Network Toolbox. Figure 3 displays the training progress using the LM algorithm stopped when the validation error increased for six iterations with Performance = 2800, Gradient = 11862.397, and Mu = 10.00 at epoch 16. In terms of processing time, the LM algorithm took 0:00:00 during training. The term epoch represents the number of iterations during training in which it is attempted to minimize the error function. The LM algorithm typically requires more memory but less time. Training automatically stops when generalization stops improving, as indicated by an increase in the mean square error of the validation samples (Beale et al., 2019).
The performance plot illustrated the relationship between the training, validation, and testing phases in forecasting global price of bananas, in terms of MSE versus the number of epochs. The performance was evaluated by taking MSE and epochs after the training was completed, and then the values were generated. The performance plot is a useful diagnostic tool to plot the training, validation, and testing errors to check the progress of training. It also illustrated that the training stopped when the validation error increased at the circled epoch. As illustrated in Figure 5, the best performance for the validation phase was 7321.0222 at epoch 10 for the NARNN model. The results showed a good network performance because the validation error and testing error have similar characteristics, and it did not appear that any significant overfitting had occurred.
In the regression plots, the dashed line in each plot represents the perfect result outputs = targets, which can be seen on the regression plots. The solid line in each plot represents the best-fit linear regression line between outputs and targets. On top of each plot, the regression R value measures the correlation between the outputs and the targets. If R equals one, this indicates that there is an exact linear relationship between the outputs and the targets. If R is close to zero, then there is no linear relationship between the outputs and the targets.
As illustrated in Figure 5, the regression R value for the training phase was 0.98054; for the validation phase, 0.95846; for the testing phase, 0.9628; and for all the samples, 0.97416 for the NARNN model. In the training phase, the all R values were above 0.9. This can be seen in Figure 6, indicating that all three models fit equally well statistically. Similarly, all the R values were above 0.9 thus displaying acceptable fit from the testing phase perspective, which indicated good predictive abilities for all three models for the new datasets.
The error histogram gives an indication of outliers, which are data points where the fit is significantly worse than that of most of the data. In the error histograms (Figure 7), the blue bars represent the training data, the green bars represent the validation data, and the red bars represent the testing data. The results showed that there were a few training points and some testing points outside of the range. These outliers were also visible on the training and testing regression plots (Figure 6). If the outliers are valid data points but are unlike the rest of the data, then the network is extrapolating for these points. This means more data similar to the outlier points should be considered in training analysis and that the network should be retrained.
The dynamic network time-series response plots were displayed in Figure 8 for the NARNN model, showing that the outputs were distributed evenly on both sides of the response curve, and the errors versus time were small in the training, validation, and testing phases. The results indicated that the all three models were able to predict the Time Series over the simulation period efficiently.
The error autocorrelation function describes how the prediction errors are related in time. For a perfect prediction model, there should only be one nonzero value of the autocorrelation function, and it should occur at zero lag (this is the MSE). This would mean that the prediction errors are completely uncorrelated with each other (white noise). If there is significant correlation in the prediction errors, then it should be possible to improve the prediction perhaps by increasing the number of delays in the tapped delay lines.
The correlations for the NARNN model (Figure 9), except for the one at zero lag, all fell approximately within the 95% confidence limits around zero, so the models seemed to be adequate. There are, however, some exceptions which suggests that the created network can be improved by retraining it or by increasing the number of neurons in the hidden layer. If even more accurate results are required, retraining the network will change the initial weights and biases of the network, and may produce an improved network after retraining.
In MATLAB, the HYBRID model applied to Time Series prediction using its past residuals from the SARIMA model can be expressed as follows:
e(t)=Φ(e(t−1),e(t−2),⋯,e(t−d))+ε(t) | (14) |
where e(t) is the residual of the Time Series at time t, d is the time delay, and ɛ(t) is the error term. This equation describes how the HYBRID model is used to predict the future residual of a Time Series, e(t), using the past residuals of the Time Series, (e(t-1), e(t-2), ⋯, e(t-d)).
Similarly, the development of the optimal architecture for the HYBRID model requires the determination of time delays, the number of hidden neurons, and an efficient training algorithm. The results of the error analysis showed that the HYBRID model with 11 neurons in the hidden layer and 3 time delays also provided the best performance (MSE = 2982.66655) with the LM algorithm (HYBRID) (Table 3). At the same time, the training progress using the LM algorithm for the Mixed-LM model stopped when the validation error increased for six iterations with Performance = 2700, Gradient = 175.1257, and Mu = 10.00 at epoch 13 (Figure 10).
Number of hidden neurons | Number of delays (d) | ||||
2 | 3 | 4 | 5 | 6 | |
8 | 3920.93665 | 3859.27889 | 3302.37490 | 4348.93180 | 4955.52066 |
9 | 4341.29026 | 3843.05508 | 3645.42461 | 3062.75879 | 4927.80700 |
10 | 3675.49724 | 3739.71316 | 3456.30464 | 3914.27581 | 5361.73173 |
11 | 5313.30679 | 2982.66655 | 4256.62635 | 4250.12751 | 3101.86040 |
12 | 3679.04563 | 3952.80522 | 3311.36531 | 3860.98596 | 4491.00417 |
Source: Author's work |
As illustrated in Figure 11, the best performance for the validation phase was 5943.2994 at epoch 7 for the HYBRID model. The results also showed a good network performance because the validation and testing errors have similar characteristics, and did not appear that any significant overfitting has occurred. In the error histograms (Figure 12), the blue bars represent the training data, the green bars represent the validation data, and the red bars represent the testing data. The results showed that there had a few testing and validation points outside of the range.
The dynamic network time-series response plots were displayed in Figure 13 for the HYBRID model, showed that the outputs were distributed evenly on both sides of the response curve, and the errors versus time were small in the training, validation, and testing phases. The results indicated that the HYBRID model was able to predict the Time Series over the simulation period efficiently as well. For the HYBRID model, the correlations except for the one at zero lag, all fell approximately within the 95% confidence limits around zero, so the model was adequate (Figure 14).
Assuming normal weather conditions and no further spread of banana plant diseases, banana production and trade volumes are rapidly increasing in response to fast population growth in producing countries as well as expanding global import demand. Obviously, the future will provide more intensive knowledge, with a better understanding of the natural complexities of living systems. Bringing together a wide variety of perspectives and concepts requires holistic solutions that involve working across disciplines, principles, and methods to support interdisciplinarity and transdisciplinarity, to explore and formalize systems concepts, and to develop systemic methods for learning and change.
Despite the importance of banana demand and supply in the global markets, there is a lack of studies in the technical literature available on global price of bananas forecasting schemes. Forecasting is a kind of dynamic filtering, in which past values of the Time Series can be used to predict future values. In Time Series forecasting, how to increase the prediction accuracy as much as possible with the non-stationary and noise data is a significant challenge. Experimentally, we hope to explore a hybrid model that can effectively improve the performance of global price of bananas throughput forecasting. In the proposed HYBRID model, a combination of statistical model in linear modeling and machine learning model to capture nonlinear pattern from the residuals of linear model, can be a better choice for modeling and forecasting the Time Series for this study. The fundamental idea is that this combination supports for the limitations of one model with the strengths of the other.
Empirically, the ARIMA and NARNN models are good at modelling linear and nonlinear problems for the Time Series, respectively. In this study, the best choice Time Series model was an ARIMA (2,1,4) with a drift model as its lowest AIC values among other models. It was noticed that this ARIMA (2,1,4) with a drift model gave evidence about future global price of bananas would increase over time. Furthermore, the NARNN model with 12 neurons in the hidden layer and 6 times delays was evaluated as the optimal neural network structure using the LM algorithm.
This study not only wanted to find the best fit model for the Time Series, global price of bananas from January 1990 to November 2021, but also tried to evaluate the accuracy of the ARIMA, NARNN, and HYBRID models used in the forecasting of global price of bananas. The comparative results revealed that the HYBRID model with 11 neurons in the hidden layer and 3 times delays (MSE = 2982.66655) yielded higher accuracy than the NARNN model with 12 neurons in the hidden layer and 6 times delays (MSE = 3058.11185), and the ARIMA (2,1,4) with a drift model (MSE = 4475.50697) in this study (Table 4).
ARIMA (2,1,4) | NARNN (Hidden Layer = 12, d = 6) | HYBRID (Hidden Layer = 11, d = 3) | |
MSE | 4475.50697 | 3058.11185 | 2982.66655 |
Source: Author's work |
According to the results of this study, this HYBRID ARIMA-NARNN model can provide richer information which is important in the decision-making process related to the future global price of bananas, it can be employed in forecasting the future performance for global price of bananas change outcomes. Thus, this study may provide an integrated modeling approach as a decision-making supportive method for forecasting global price of bananas in advance. Understanding past global price of bananas is important for the analysis of current and future global price changes for bananas.
In order to sustain these observations, research programs utilizing the resulting data should be able to improve significantly our understanding and narrow our projections of the future changes and variabilities in the global price of bananas. For further research tasks, the nonlinear autoregressive exogenous (NARX) neural network can consider past information of the same Time Series (global price of bananas), current and past information of the externally determined Time Series that influences the Time Series of interest (i.e., banana production, banana sales, weather information, disease impact information, etc.) in terms of forecasting accurately.
All authors declare no conflicts of interest in this paper.
[1] | Altendorf S (2019) Banana Fusarium Wilt Tropical Race 4: A mounting threat to global banana markets? In 2019 Food Outlook - Biannual Report on Global Food Markets, FAO, Rome, 13–20. Available from: http://www.fao.org/3/ca6911en/CA6911EN_TR4EN.pdf. |
[2] | Beale MH, Hagan MT, Demuth HB (2019) Deep Learning ToolboxTM: Getting Started Guide. Deep Learn Natick, MA: The MathWorks, Inc. |
[3] |
Benrhmach G, Namir K, Namir A, et al. (2020) Nonlinear autoregressive neural network and extended Kalman filters do prediction of financial Time Series. J Appl Math 2020: 1–6. https://doi.org/10.1155/2020/5057801 doi: 10.1155/2020/5057801
![]() |
[4] | Box GEP, Jenkins GM, Reinsel GC, et al. (2015) Time Series Analysis: Forecasting and Control (5th ed.). John Wiley and Sons Inc. |
[5] |
Eyduran SP, Akın M, Eyduran E, et al. (2020) Forecasting banana harvest area and production in Turkey using Time Series analysis. Erwerbs-Obstbau 62: 281–291. https://doi.org/10.1007/s10341-020-00490-1 doi: 10.1007/s10341-020-00490-1
![]() |
[6] |
Fatin ZN, Titik E, Mulyatno SB (2020) The analysis of price and market integration of banana commodities in Lampung, Indonesia. Russ J Agric Socio-Econ Sci 3: 61–68. https://doi.org/10.18551/rjoas.2020-03.07 doi: 10.18551/rjoas.2020-03.07
![]() |
[7] |
Gardner M, Dorling SR (1998) Artificial neural networks (the multilayer perceptron) - a review of applications in the atmospheric sciences. Atmos Environ 32: 2627–2636. https://doi.org/10.1016/S1352-2310(97)00447-0 doi: 10.1016/S1352-2310(97)00447-0
![]() |
[8] | Gavin HP (2013) The Levenberg-Marquardt algorithm for nonlinear least squares curve-fitting problems. Department of Civil and Environmental Engineering Duke University, 19. Available from: http://people.duke.edu/~hpgavin/ce281/lm.pdf. |
[9] | Hamjah MA (2014) Forecasting major fruit crops productions in Bangladesh using Box-Jenkins ARIMA Model. J Econ Sustainable Dev 5: 96–107. Available from: https://core.ac.uk/download/pdf/234646336.pdf. |
[10] |
Hossain MM, Abdulla F, Majumder AK (2016) Forecasting of banana production in Bangladesh. Am J Agric Biol Sci 11: 93–99. https://doi.org/10.3844/ajabssp.2016.93.99 doi: 10.3844/ajabssp.2016.93.99
![]() |
[11] |
Levenberg K (1944) A method for the solution of certain non-linear problems in least squares. Q Appl Math 2: 164–168. https://doi.org/10.1090/qam/10666 doi: 10.1090/qam/10666
![]() |
[12] |
Ljung GM, Box GEP (1978) On a measure of lack of fit in Time Series models. Biometrika 65: 297–303. https://doi.org/10.2307/2335207 doi: 10.2307/2335207
![]() |
[13] |
MacKay DJC (1992) Bayesian Interpolation. Neural Comput 4: 415–447. https://doi.org/10.1162/neco.1992.4.3.415 doi: 10.1162/neco.1992.4.3.415
![]() |
[14] | Market Reports World (2019) Banana market size, share, analysis - segmented by geography - growth, trends, and forecast (2019–2024). Global Banana Market Research Report, Market Reports World, 94. Available from: https://www.marketreportsworld.com/banana-market-13487750. |
[15] |
Marquardt DW (1963) An algorithm for least-squares estimation of nonlinear parameters. J Soc Ind and Appl Math 11: 431–441. https://doi.org/10.1137/0111030 doi: 10.1137/0111030
![]() |
[16] |
Moller MF (1993) A scaled conjugate gradient algorithm for fast supervised learning. Neural Networks 6: 525–533. https://doi.org/10.1016/S0893-6080(05)80056-5 doi: 10.1016/S0893-6080(05)80056-5
![]() |
[17] | Montgomery DC, Jennings CL, Kulahci M (2008) Introduction to Time Series Analysis and Forecasting. Hoboken, John Wiley & Sons. Inc. |
[18] | Omar MI, Dewan MF, Hoq MS (2014) Analysis of price forecasting and spatial co-integration of banana in Bangladesh. Eur J Bus Manage 6: 244–255. Available from: https://www.iiste.org/Journals/index.php/EJBM/article/view/11463. |
[19] | Rathod S, Mishra GC, Singh KN (2017) Hybrid Time Series models for forecasting banana production in Karnataka State, India. J Indian Soc Agric Stat 71: 193–200. Available from: https://www.researchgate.net/publication/322165651_Hybrid_Time_Series_Models_for_Forecasting_Banana_Production_in_Karnataka_State_India. |
[20] | Rathod S, Mishra GC (2018) Statistical models for forecasting mango and banana yield of Karnataka, India. J Agric Sci Technol 20: 803–816. Available from: http://jast.modares.ac.ir/article-23-19768-en.html. |
[21] |
Rebortera MA, Fajardo AC (2019) An enhanced deep learning approach in forecasting banana harvest yields. Int J Adv Comput Sci Appl 10: 275–280. https://doi.org/10.14569/IJACSA.2019.0100935 doi: 10.14569/IJACSA.2019.0100935
![]() |
[22] | Ruiz A, Fobelets V, Grosscurt C, et al. (2017) The external costs of banana production: A global study. Research Report Prepared for Fairtrade International, True Price Trucost. Available from: http://makefruitfair.org/wp-content/uploads/2017/07/170224_Research_Report_External_Cost_ of_Bananas_-_final.pdf. |
[23] | Voora V, Larrea C, Bermudez S (2020) Sustainable Commodities Marketplace Series 2019, The International Institute for Sustainable Development, Global market report: Bananas 12. Available from: https://www.iisd.org/system/files/publications/ssi-global-market-report-banana.pdf. |
[24] |
Sariev E, Germano G (2020) Bayesian regularized artificial neural networks for the estimation of the probability of default. Quant Financ 20: 311–328. https://doi.org/10.1080/14697688.2019.1633014 doi: 10.1080/14697688.2019.1633014
![]() |
[25] | Young WL (1977) The Box-Jenkins approach to Time Series analysis and forecasting: principles and applications. RAIRO. Rech opérationnelle, 11: 129–143. Available from: http://www.numdam.org/article/RO_1977__11_2_129_0.pdf. |
[26] |
Zhang GP, Patuwo BE, Hu MY (1998) Forecasting with artificial neural networks: The state of the art. Int J Forecast 14: 35–62. https://doi.org/10.1016/S0169-2070(97)00044-7 doi: 10.1016/S0169-2070(97)00044-7
![]() |
[27] |
Zhang GP (2003) Time Series forecasting using a hybrid ARIMA and neural network model. Neurocomputing 50: 159–175. https://doi.org/10.1016/S0925-2312(01)00702-0 doi: 10.1016/S0925-2312(01)00702-0
![]() |
1. | Jiajia Zhang, Zhifu Tao, Jinpei Liu, Xi Liu, Huayou Chen, A hybrid interval‐valued time series prediction model incorporating intuitionistic fuzzy cognitive map and fuzzy neural network, 2025, 44, 0277-6693, 93, 10.1002/for.3181 |
Parameter | Estimate | Standard Error |
Constant | 1.7817 | 0.8797 |
AR Lag 1 | −0.3986 | 0.1539 |
AR Lag 2 | 0.1309 | 0.1281 |
Difference | ||
MA Lag 1 | 0.1042 | 0.1484 |
MA Lag 2 | −0.3869 | 0.1159 |
MA Lag 3 | −0.1057 | 0.0822 |
MA Lag 4 | −0.2921 | 0.0564 |
Sigma2 estimated as 4751, Log Likelihood = −2148.58 | ||
AIC = 4313.15, AICc = 4313.54, BIC = 4344.72 | ||
Training Set Error Measures: | ||
RMSE = 66.89923, MSE = 4475.50697, MAE = 47.63762, MAPE = 8.500749 | ||
Source: own work |
Number of hidden neurons | Number of delays (d) | ||||
2 | 3 | 4 | 5 | 6 | |
8 | 5495.27669 | 4894.51856 | 5619.10430 | 4526.92313 | 3947.92018 |
9 | 4957.78393 | 4824.05006 | 5716.89865 | 3964.32643 | 3212.59409 |
10 | 5179.43173 | 5084.66744 | 4904.24470 | 4626.09369 | 6040.92292 |
11 | 6407.52867 | 4092.28144 | 4162.03297 | 4210.05822 | 3305.64168 |
12 | 5142.34650 | 4757.30244 | 4276.64426 | 3266.47802 | 3058.11185 |
Source: Author's work |
Number of hidden neurons | Number of delays (d) | ||||
2 | 3 | 4 | 5 | 6 | |
8 | 3920.93665 | 3859.27889 | 3302.37490 | 4348.93180 | 4955.52066 |
9 | 4341.29026 | 3843.05508 | 3645.42461 | 3062.75879 | 4927.80700 |
10 | 3675.49724 | 3739.71316 | 3456.30464 | 3914.27581 | 5361.73173 |
11 | 5313.30679 | 2982.66655 | 4256.62635 | 4250.12751 | 3101.86040 |
12 | 3679.04563 | 3952.80522 | 3311.36531 | 3860.98596 | 4491.00417 |
Source: Author's work |
ARIMA (2,1,4) | NARNN (Hidden Layer = 12, d = 6) | HYBRID (Hidden Layer = 11, d = 3) | |
MSE | 4475.50697 | 3058.11185 | 2982.66655 |
Source: Author's work |
Parameter | Estimate | Standard Error |
Constant | 1.7817 | 0.8797 |
AR Lag 1 | −0.3986 | 0.1539 |
AR Lag 2 | 0.1309 | 0.1281 |
Difference | ||
MA Lag 1 | 0.1042 | 0.1484 |
MA Lag 2 | −0.3869 | 0.1159 |
MA Lag 3 | −0.1057 | 0.0822 |
MA Lag 4 | −0.2921 | 0.0564 |
Sigma2 estimated as 4751, Log Likelihood = −2148.58 | ||
AIC = 4313.15, AICc = 4313.54, BIC = 4344.72 | ||
Training Set Error Measures: | ||
RMSE = 66.89923, MSE = 4475.50697, MAE = 47.63762, MAPE = 8.500749 | ||
Source: own work |
Number of hidden neurons | Number of delays (d) | ||||
2 | 3 | 4 | 5 | 6 | |
8 | 5495.27669 | 4894.51856 | 5619.10430 | 4526.92313 | 3947.92018 |
9 | 4957.78393 | 4824.05006 | 5716.89865 | 3964.32643 | 3212.59409 |
10 | 5179.43173 | 5084.66744 | 4904.24470 | 4626.09369 | 6040.92292 |
11 | 6407.52867 | 4092.28144 | 4162.03297 | 4210.05822 | 3305.64168 |
12 | 5142.34650 | 4757.30244 | 4276.64426 | 3266.47802 | 3058.11185 |
Source: Author's work |
Number of hidden neurons | Number of delays (d) | ||||
2 | 3 | 4 | 5 | 6 | |
8 | 3920.93665 | 3859.27889 | 3302.37490 | 4348.93180 | 4955.52066 |
9 | 4341.29026 | 3843.05508 | 3645.42461 | 3062.75879 | 4927.80700 |
10 | 3675.49724 | 3739.71316 | 3456.30464 | 3914.27581 | 5361.73173 |
11 | 5313.30679 | 2982.66655 | 4256.62635 | 4250.12751 | 3101.86040 |
12 | 3679.04563 | 3952.80522 | 3311.36531 | 3860.98596 | 4491.00417 |
Source: Author's work |
ARIMA (2,1,4) | NARNN (Hidden Layer = 12, d = 6) | HYBRID (Hidden Layer = 11, d = 3) | |
MSE | 4475.50697 | 3058.11185 | 2982.66655 |
Source: Author's work |