
This paper introduces a novel non-homogeneous stochastic diffusion process, useful for modeling both decreasing and increasing trend data. The model is based on a generalized Gamma-like curve. We derive the probabilistic characteristics of the proposed process, including a closed-form unique solution to the stochastic differential equation, the transition probability density function, and both conditional and unconditional trend functions. The process parameters are estimated using the maximum likelihood (ML) method with discrete sampling paths. A small Monte Carlo experiment is conducted to evaluate the finite sample behavior of the trend function. The practical utility of the proposed process is demonstrated by fitting it to two real-world data sets, one exhibiting a decreasing trend and the other an increasing trend.
Citation: Safa' Alsheyab, Mohammed K. Shakhatreh. A new stochastic diffusion process based on generalized Gamma-like curve: inference, computation, with applications[J]. AIMS Mathematics, 2024, 9(10): 27687-27703. doi: 10.3934/math.20241344
Modelling real data, particularly in fields like finance and biology, often involves latent randomness. Consequently, using deterministic models, such as those based on ordinary or partial differential equations, to represent this data may lead to inaccuracies. Alternatively, stochastic diffusion processes (SDPs) can be used, as they are designed to account for the random behavior inherent in these data sets. Typically, a diffusion process X(t) is a solution of the stochastic differential equation (SDE)
$$dX(t) = a(t, X(t), \theta)\,dt + b(t, X(t), \theta)\,dW(t), \tag{1.1}$$
where $W$ is a Wiener process, $a(t,X(t),\theta)$ and $b(t,X(t),\theta)$ are known functions called the drift and diffusion functions, respectively, and $\theta$ is a vector of unknown parameters. Recently, (1.1) has been employed to develop various new homogeneous/non-homogeneous SDPs, including the Log-Logistic [1], Schumacher [2], Lomax [3], Weibull [4], Square-Brennan-Schwartz [5], Brennan-Schwartz [6], Gamma [7], Gompertz [8], and Rayleigh [9] diffusion processes. Most of these studies have focused on modeling growth data, such as microorganism culture growth [1], the evolution of electricity net consumption [2], population growth [5], and the growth of the total stock of private car-petrol [7]. On the other hand, modeling data with a declining trend, such as mortality rates, unemployment rates, and infectious diseases, remains of great interest. However, there are few SDPs available for such data. For example, the authors of [3] proposed the Lomax SDP to model the adolescent fertility rate in Morocco, and the authors of [4] introduced the Weibull SDP to model the age dependency ratio in Morocco.
It is worth mentioning that homogeneous SDPs assume a constant rate for the phenomenon under consideration, whereas non-homogeneous SDPs allow for variability in the rate. The latter is more reasonable in many situations. For example, interest rates are typically a function of time, making it impractical to assume a constant rate. Similarly, while mortality rates can sometimes be constant over a short period, they often change over time due to various factors, notably education and health care improvements. Therefore, modeling such phenomena requires non-homogeneous SDPs to provide insightful analysis and more accurate trend forecasting.
The SDPs introduced in the above-cited references are solved using stochastic calculus methods. Moreover, statistical inference techniques, particularly maximum likelihood (ML) estimation, are employed to estimate the parameters involved in these models. Consequently, the trend function of the process, which is a function of these parameters, can be estimated immediately by the invariance property. While the ML estimates are satisfactory, their derivation requires the functional form of the process, which is summarized by the transition probability density function. Unfortunately, the transition probability density function is difficult to obtain for many processes, which makes exact ML estimation infeasible; approximation methods are required in these cases, for example, [10,11,12], among others.
In this article, we introduce a novel stochastic model related to a particular type of the generalized Gamma-like stochastic diffusion model. The generalized Gamma probability distribution, proposed in [13], is particularly useful for modeling diverse types of data, especially lifetime data. The probability density function of the generalized Gamma distribution is
$$f(t; a, \alpha, p) \propto t^{\alpha-1}\exp\left\{-\left(\frac{t}{a}\right)^{p}\right\}, \tag{1.2}$$
where a,α,p>0. Note that a is a scale parameter, while the other two parameters are shape parameters. Additionally, this distribution includes several other common distributions, such as the exponential, gamma, and Weibull distributions, as special cases. However, we shall consider the following version of the generalized Gamma distribution with one parameter:
$$f(t;\alpha) \propto t^{\alpha}\exp\left\{-b_1(\alpha)\,t^{b_2(\alpha)}\right\},$$
where $b_1(\alpha), b_2(\alpha) > 0$.
We are motivated to introduce a novel SDP that is designed to model various types of trend data, particularly decreasing and increasing trends. This contrasts with many existing SDPs in the literature, which typically model only a single type of trend. Interestingly, the proposed SDP, based primarily on a generalized gamma-like distribution with a single parameter, effectively accommodates both decreasing and increasing trend data. It is important to note that adding more parameters does not necessarily improve the modeling of such data. For instance, the two-parameter Weibull stochastic diffusion, as discussed in [4], does not significantly enhance the analysis of the data described in the data section. Moreover, introducing additional parameters can complicate the process of obtaining maximum likelihood estimates.
Our main contributions in this paper are as follows. Firstly, we define a novel SDP capable of modeling decreasing and increasing trend data through careful selection of its drift function, which involves a single parameter. Secondly, we thoroughly study the main characteristics of the proposed process by demonstrating its existence and uniqueness, determining the transition probability density function, calculating the moments of the process, and analyzing both the trend and conditional trends. Notably, the trend function is proportional to the one-parameter generalized Gamma-like density introduced after (1.2). Thirdly, the SDP parameters are estimated using the maximum likelihood method. While the likelihood function appears intractable, which is common in various SDPs, it becomes manageable because the transition density is log-normal; after taking logarithms, the likelihood simplifies with some elementary algebra. Fourthly, we conduct simulation experiments to assess the consistency of the maximum likelihood estimates. Finally, two real-world applications exhibiting decreasing and increasing trends are analyzed, and the proposed SDP outperforms several existing processes.
The paper is organized as follows: Section 2 provides a description of the novel SDP and its characteristics, including the solution of the process, the transition probability density function (TPDF), and the moments. In Section 3, the process parameters are estimated using the maximum likelihood method with discrete sampling of the process. A small Monte Carlo experiment is conducted in Section 4. The proposed process is applied to two real-world data sets in Section 5. Finally, some concluding remarks are given in Section 6.
In this section, we introduce a new one-dimensional stochastic diffusion process. Some features of the process, such as the existence of the solution, transition probability distributions, and moments, are explained and derived.
The GGC (generalized Gamma-like curve) process is defined as the non-homogeneous diffusion process $\{X(t), t\in[t_1,T]\}$, $t_1>0$, taking values in $(0,\infty)$ with infinitesimal moments given by
$$a(t,X(t),\theta) = \left(\frac{\alpha}{t} - \frac{10^3}{\alpha}\,t^{-10^2/\alpha}\right)X(t), \qquad b^{1/2}(t,X(t),\theta) = \sigma X(t), \tag{2.1}$$
where $t_1$ is the initial time and $T$ is the last time. The above-described process can be formally viewed as a solution to the following SDE:
$$dX(t) = \left(\frac{\alpha}{t} - \frac{10^3}{\alpha}\,t^{-10^2/\alpha}\right)X(t)\,dt + \sigma X(t)\,dW(t); \qquad X(t_1) = x_1, \tag{2.2}$$
where $\sigma>0$ and $\alpha\in\mathbb{R}\setminus\{0\}$ is a non-zero real constant. The solution of (2.2) can be written in integral form via the one-dimensional Itô integral as follows:
$$X(t) = x_1 + \int_{t_1}^{t}\left(\frac{\alpha}{s} - \frac{10^3}{\alpha}\,s^{-10^2/\alpha}\right)X(s)\,ds + \sigma\int_{t_1}^{t}X(s)\,dW(s).$$
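As a reading aid, the infinitesimal moments in (2.1) can be written as two small functions. This is only an illustrative Python sketch (the paper reports using R for its computations); the function names `drift` and `diffusion` are hypothetical and do not come from the paper.

```python
import math

def drift(t, x, alpha):
    """Drift a(t, x) = (alpha/t - (10^3/alpha) * t^(-10^2/alpha)) * x, as in (2.1)."""
    if alpha == 0 or t <= 0:
        raise ValueError("requires alpha != 0 and t > 0")
    return (alpha / t - (1e3 / alpha) * t ** (-1e2 / alpha)) * x

def diffusion(x, sigma):
    """Square root of the infinitesimal variance, b^{1/2}(t, x) = sigma * x."""
    return sigma * x

# Both coefficients are linear in x, which is what the existence proof relies on.
print(drift(1000.0, 9000.0, 50.0), diffusion(9000.0, 0.05))
```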
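Here, we show the existence and uniqueness of the solution for the GGC process given in (2.2). Toward this goal, it is enough to verify that the infinitesimal moments satisfy the uniform Lipschitz and linear growth conditions; see [11].
Theorem 2.1. The SDE in (2.2) possesses a unique solution.
Proof. First, we show that the GGC process satisfies a uniform Lipschitz condition. To do so, consider $x,y\in\mathbb{R}^{+}$ and $t\in[t_1,T]$. Since both coefficients are linear in the state variable, it follows that
$$\begin{aligned}
|a(t,x)-a(t,y)| + |\sqrt{b(t,x)}-\sqrt{b(t,y)}| &= |a(t,x-y)| + |\sqrt{b(t,x-y)}| \\
&= \left|\left(\frac{\alpha}{t}-\frac{10^3}{\alpha}\,t^{-10^2/\alpha}\right)(x-y)\right| + |\sigma(x-y)| \\
&= \left(\left|\frac{\alpha}{t}-\frac{10^3}{\alpha}\,t^{-10^2/\alpha}\right| + |\sigma|\right)|x-y| \\
&\le \left(\sup_{t_1\le t\le T}\left|\frac{\alpha}{t}-\frac{10^3}{\alpha}\,t^{-10^2/\alpha}\right| + |\sigma|\right)|x-y|.
\end{aligned}$$
The supremum is finite because the coefficient is continuous in $t$ on the compact interval $[t_1,T]$ with $t_1>0$. On the other hand, the process satisfies the linear growth condition since, taking $y=0$ above, we have
$$\begin{aligned}
|a(t,x)|^2 + |\sqrt{b(t,x)}|^2 &\le \left(|a(t,x)| + |\sqrt{b(t,x)}|\right)^2 \\
&\le \left[\left(\sup_{t_1\le t\le T}\left|\frac{\alpha}{t}-\frac{10^3}{\alpha}\,t^{-10^2/\alpha}\right| + |\sigma|\right)|x|\right]^2 \\
&\le \left(\sup_{t_1\le t\le T}\left|\frac{\alpha}{t}-\frac{10^3}{\alpha}\,t^{-10^2/\alpha}\right| + |\sigma|\right)^2(1+|x|)^2.
\end{aligned}$$
Thus, there exists an almost surely (a.s.) continuous process $\{X(t), t\in[t_1,T]\}$, $t_1>0$, that is the unique solution of the SDE (2.2) with probability 1.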
The determination of the probability distribution for the solution of the GGC process plays a central role in studying the fundamental characteristics of the proposed process, especially its mean function, which serves as a basis for trend analysis. Moreover, since the process parameters are unknown, estimating these quantities using methods like maximum likelihood requires knowledge of the probability distribution of the sample path. The explicit solution of the SDE (2.2) can be obtained by considering the transformation $Y(t)=\log(X(t))$. Upon applying Itô's formula to $Y(t)$, we have the following:
$$\begin{aligned}
dY(t) &= \frac{1}{X(t)}\,dX(t) - \frac{1}{2X^2(t)}\,\sigma^2X^2(t)\,dt \\
&= \left(\frac{\alpha}{t}-\frac{10^3}{\alpha}\,t^{-10^2/\alpha}-\frac{\sigma^2}{2}\right)dt + \sigma\,dW(t),
\end{aligned}$$
where $Y(t_1)=\log(x_1)$. On integrating the above equation, we obtain
$$Y(t)-Y(t_1) = \int_{t_1}^{t}\left(\frac{\alpha}{s}-\frac{10^3}{\alpha}\,s^{-10^2/\alpha}-\frac{\sigma^2}{2}\right)ds + \sigma\left(W(t)-W(t_1)\right),$$
and hence
$$Y(t) = Y(t_1) + \alpha\log\left(\frac{t}{t_1}\right) - \frac{10^3/\alpha}{1-10^2/\alpha}\left(t^{1-10^2/\alpha}-t_1^{1-10^2/\alpha}\right) - \frac{\sigma^2}{2}(t-t_1) + \sigma\left(W(t)-W(t_1)\right).$$
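The two deterministic integrals behind the last expression can be spelled out as below. The extra restriction $\alpha\neq 10^2$ is an assumption added here so that the power-rule antiderivative applies; the paper itself only states $\alpha\neq 0$.

```latex
\begin{aligned}
\int_{t_1}^{t}\frac{\alpha}{s}\,ds &= \alpha\log\left(\frac{t}{t_1}\right),\\
\int_{t_1}^{t}\frac{10^{3}}{\alpha}\,s^{-10^{2}/\alpha}\,ds
 &= \frac{10^{3}/\alpha}{1-10^{2}/\alpha}\left(t^{1-10^{2}/\alpha}-t_1^{1-10^{2}/\alpha}\right)
  = \frac{10^{3}}{\alpha-10^{2}}\left(t^{1-10^{2}/\alpha}-t_1^{1-10^{2}/\alpha}\right).
\end{aligned}
```

The second equality multiplies numerator and denominator by $\alpha$, which gives the constant $10^3/(\alpha-10^2)$ that appears in the closed-form solution and in the trend functions.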
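Therefore, the solution in terms of the original GGC process is
$$X(t) = x_1\left(\frac{t}{t_1}\right)^{\alpha}\exp\left(-\frac{10^3}{\alpha-10^2}\left(t^{1-10^2/\alpha}-t_1^{1-10^2/\alpha}\right)-\frac{\sigma^2}{2}(t-t_1)\right)e^{\sigma(W(t)-W(t_1))}. \tag{2.3}$$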
Observe that the process $Y(t)$ is a Gaussian process if and only if the initial condition $Y_1$ is a constant or is normally distributed. Since the initial condition is constant a.s., i.e., $P(Y_1=y_1)=1$, and $Y(t)$ is a Markovian process, it then follows that the finite-dimensional distributions of $Y(t)$ are normal. Consequently, the finite-dimensional distributions of $X(t)$ are log-normal. Additionally, the transition distribution of $X(t)$ given $X(s)=x_s$, where $s<t$, is log-normal, denoted by $\Lambda_1(\mu(s,t,x_s),\sigma^2(t-s))$, where $\mu(s,t,x)$ is given by
$$\mu(s,t,x) = \log(x) + \alpha\log\left(\frac{t}{s}\right) - \frac{10^3}{\alpha-10^2}\left(t^{1-10^2/\alpha}-s^{1-10^2/\alpha}\right) - \frac{\sigma^2}{2}(t-s).$$
Therefore, the TPDF of the process considered has the following form:
$$f(y,t\,|\,x_s,s) = \frac{1}{y}\left[2\pi\sigma^2(t-s)\right]^{-1/2}\exp\left(-\frac{\left[\log(y)-\mu(s,t,x_s)\right]^2}{2\sigma^2(t-s)}\right). \tag{2.4}$$
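The transition density (2.4) is straightforward to evaluate numerically. The following is a minimal Python sketch under the assumption that $\alpha\notin\{0,10^2\}$; the names `mu` and `tpdf` are illustrative choices, not part of the paper.

```python
import math

def mu(s, t, x, alpha, sigma):
    """Location parameter mu(s, t, x) of the log-normal transition law."""
    k = 1e3 / (alpha - 1e2)      # 10^3 / (alpha - 10^2)
    p = 1.0 - 1e2 / alpha        # exponent 1 - 10^2/alpha
    return (math.log(x) + alpha * math.log(t / s)
            - k * (t ** p - s ** p) - 0.5 * sigma ** 2 * (t - s))

def tpdf(y, t, x_s, s, alpha, sigma):
    """Transition density f(y, t | x_s, s) from (2.4), for y > 0 and t > s."""
    var = sigma ** 2 * (t - s)
    z = math.log(y) - mu(s, t, x_s, alpha, sigma)
    return math.exp(-z * z / (2.0 * var)) / (y * math.sqrt(2.0 * math.pi * var))

# Density of moving from 9353 at time 1977 to 9000 at time 1978.
print(tpdf(9000.0, 1978.0, 9353.0, 1977.0, -1779.057, 0.022))
```

A quick sanity check is that the density integrates to one over $y>0$, which can be verified with a simple quadrature in $\log y$.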
Since $X(t)$ given $X(s)=x_s$ is distributed according to $\Lambda_1(\mu(s,t,x_s),\sigma^2(t-s))$, it then follows from the properties of the log-normal distribution that the $n$th conditional moment of $X(t)$ given $X(s)$ is
$$E[X^n(t)\,|\,X(s)=x_s] = \exp\left(n\mu(s,t,x_s)+\frac{n^2\sigma^2}{2}(t-s)\right).$$
The conditional mean, which serves as the conditional trend function, is obtained by setting $n=1$. That is,
$$E[X(t)\,|\,X(s)=x_s] = x_s\left(\frac{t}{s}\right)^{\alpha}\exp\left(-\frac{10^3}{\alpha-10^2}\left(t^{1-10^2/\alpha}-s^{1-10^2/\alpha}\right)\right). \tag{2.5}$$
On the other hand, the mean function or the unconditional trend of the process is given by
$$E[X(t)] = x_1\left(\frac{t}{t_1}\right)^{\alpha}\exp\left(-\frac{10^3}{\alpha-10^2}\left(t^{1-10^2/\alpha}-t_1^{1-10^2/\alpha}\right)\right), \tag{2.6}$$
where the above equation is obtained under the assumption that $P(X(t_1)=x_1)=1$. Notice that when $\alpha>0$, the trend function is proportional to the one-parameter generalized Gamma density. Similarly, other statistical measures such as the variance, skewness, and kurtosis can be obtained. In particular, the variance of the process is
$$\mathrm{Var}[X(t)] = E[X^2(t)]-\left(E[X(t)]\right)^2 = x_1^2\left(\frac{t}{t_1}\right)^{2\alpha}\exp\left(-\frac{2(10^3)}{\alpha-10^2}\left(t^{1-10^2/\alpha}-t_1^{1-10^2/\alpha}\right)\right)\left(e^{\sigma^2(t-t_1)}-1\right).$$
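The trend functions (2.5) and (2.6) and the variance can be evaluated with a few lines of Python. This is a sketch, not the authors' code: the function names are hypothetical, and the computation is done on the log scale because for large $|\alpha|$ the factors $(t/s)^{\alpha}$ and the exponential term are individually extreme while their product is moderate.

```python
import math

def log_trend_increment(s, t, alpha):
    """alpha*log(t/s) - 10^3/(alpha - 10^2) * (t^p - s^p), with p = 1 - 10^2/alpha."""
    k = 1e3 / (alpha - 1e2)
    p = 1.0 - 1e2 / alpha
    return alpha * math.log(t / s) - k * (t ** p - s ** p)

def conditional_mean(t, s, x_s, alpha):
    """E[X(t) | X(s) = x_s] from (2.5); with s = t1 and x_s = x1 this is (2.6)."""
    return x_s * math.exp(log_trend_increment(s, t, alpha))

def variance(t, t1, x1, alpha, sigma):
    """Var[X(t)] = E[X(t)]^2 * (exp(sigma^2 (t - t1)) - 1)."""
    m = conditional_mean(t, t1, x1, alpha)
    return m * m * math.expm1(sigma ** 2 * (t - t1))

print(conditional_mean(1978.0, 1977.0, 9353.0, -1779.057))
```

With the estimate $\hat\alpha=-1779.057$ and the starting value 9353 in 1977 reported in the application to UK infant deaths, the call above returns approximately 9000, close to the fitted value 8999.828 listed for 1978 in that section.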
Once the process, along with its basic properties, is introduced and discussed, it becomes important to examine its significance in simulation and application. However, the presence of unknown parameters makes trend analysis in practice difficult. Consequently, these parameters need to be estimated, and the maximum likelihood estimate emerges as a suitable choice, given the known functional form of the TPDF. Furthermore, ML estimates possess advantageous properties such as invariance, efficiency, and asymptotic normality.
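Here, the two parameters involved in the drift and diffusion functions are estimated using the ML method. We consider discrete sampling observations of the process $x(t_1),x(t_2),\dots,x(t_n)$ at times $t_1<t_2<\dots<t_n=T$. For simplicity, put $t_{j+1}-t_j=h$ and write $x_j=x(t_j)$. Taking into account the initial condition $P(X(t_1)=x_1)=1$, the likelihood function of $\theta=(\alpha,\sigma^2)^{\top}$ obtained from Eq (2.4) is
$$\begin{aligned}
L(\theta) &= \prod_{j=1}^{n-1}f(x_{j+1},t_{j+1}\,|\,x_j,t_j) \\
&= \prod_{j=1}^{n-1}\frac{1}{x_{j+1}}\left[2\pi\sigma^2(t_{j+1}-t_j)\right]^{-1/2}\exp\left(-\frac{\left[\log(x_{j+1})-\mu(t_j,t_{j+1},x_j)\right]^2}{2\sigma^2(t_{j+1}-t_j)}\right).
\end{aligned}$$
The log-likelihood function is
$$\ell(\alpha,\sigma^2) = -\frac{n-1}{2}\log(2\pi h)-\frac{n-1}{2}\log(\sigma^2)-\sum_{j=1}^{n-1}\log(x_{j+1})-\frac{1}{2\sigma^2h}\sum_{j=1}^{n-1}\left[\log\left(\frac{x_{j+1}}{x_j}\right)-\alpha\log\left(\frac{t_{j+1}}{t_j}\right)+\frac{10^3}{\alpha-10^2}\left(t_{j+1}^{1-10^2/\alpha}-t_j^{1-10^2/\alpha}\right)+\frac{\sigma^2h}{2}\right]^2. \tag{3.1}$$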
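The log-likelihood (3.1) translates directly into code. The sketch below is illustrative Python with hypothetical function names; it assumes equally spaced observation times and $\alpha\notin\{0,10^2\}$.

```python
import math

def log_increment_residual(x_a, x_b, t_a, t_b, alpha):
    """log(x_b/x_a) - alpha*log(t_b/t_a) + 10^3/(alpha - 10^2) * (t_b^p - t_a^p)."""
    k = 1e3 / (alpha - 1e2)
    p = 1.0 - 1e2 / alpha
    return math.log(x_b / x_a) - alpha * math.log(t_b / t_a) + k * (t_b ** p - t_a ** p)

def log_likelihood(alpha, sigma2, times, xs):
    """Log-likelihood (3.1) for observations xs taken at equally spaced times."""
    h = times[1] - times[0]
    n1 = len(xs) - 1                      # number of transitions, n - 1
    total = -0.5 * n1 * math.log(2.0 * math.pi * h) - 0.5 * n1 * math.log(sigma2)
    for j in range(n1):
        resid = (log_increment_residual(xs[j], xs[j + 1], times[j], times[j + 1], alpha)
                 + 0.5 * sigma2 * h)
        total -= math.log(xs[j + 1]) + resid * resid / (2.0 * sigma2 * h)
    return total

# Three observations (the first three years of the UK infant-death series).
print(log_likelihood(-1779.057, 0.0005, [1977.0, 1978.0, 1979.0], [9353.0, 9011.0, 9041.0]))
```

Such a function can be handed to any general-purpose numerical optimizer; which optimizer the authors used is not stated in the text.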
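The function $\ell(\alpha,\sigma^2)$ can be maximized by solving the nonlinear likelihood equations obtained by differentiating with respect to $\theta=(\alpha,\sigma^2)^{\top}$. The first partial derivatives of $\ell(\alpha,\sigma^2)$ are given by
$$\frac{\partial\ell(\alpha,\sigma^2)}{\partial\alpha} = -\frac{1}{\sigma^2h}\sum_{j=1}^{n-1}\left(H_{\alpha,j}+\frac{\sigma^2h}{2}\right)\left[-\log\left(\frac{t_{j+1}}{t_j}\right)+\frac{10^5}{\alpha^2(\alpha-10^2)}\left(\log(t_{j+1})\,t_{j+1}^{1-10^2/\alpha}-\log(t_j)\,t_j^{1-10^2/\alpha}\right)-\frac{10^3}{(\alpha-10^2)^2}\left(t_{j+1}^{1-10^2/\alpha}-t_j^{1-10^2/\alpha}\right)\right], \tag{3.2}$$
$$\frac{\partial\ell(\alpha,\sigma^2)}{\partial\sigma^2} = -\frac{n-1}{2\sigma^2}+\frac{1}{2\sigma^4h}\sum_{j=1}^{n-1}H_{\alpha,j}^2-\frac{(n-1)h}{8}, \tag{3.3}$$
where
$$H_{\alpha,j} = \log\left(\frac{x_{j+1}}{x_j}\right)-\alpha\log\left(\frac{t_{j+1}}{t_j}\right)+\frac{10^3}{\alpha-10^2}\left(t_{j+1}^{1-10^2/\alpha}-t_j^{1-10^2/\alpha}\right).$$
Let $S(\theta)=(\partial\ell(\theta)/\partial\alpha,\partial\ell(\theta)/\partial\sigma^2)^{\top}$ be the score function. Then, the MLE $\hat\theta=(\hat\alpha,\hat\sigma^2)^{\top}$ of $\theta$ is obtained by solving $S(\theta)=0$. Unfortunately, the MLEs cannot be obtained in closed form, and numerical methods are required to obtain these estimates. From Eq (3.3), we obtain
$$\frac{\hat\sigma^2}{2} = \frac{1}{h}\left[\left(1+\frac{1}{n-1}\sum_{j=1}^{n-1}H_{\hat\alpha,j}^2\right)^{1/2}-1\right]. \tag{3.4}$$
On substituting (3.4) in Eq (3.2) and equating to zero, the following nonlinear equation is obtained for the estimator $\hat\alpha$:
$$\sum_{j=1}^{n-1}\left(H_{\hat\alpha,j}+\frac{\hat\sigma^2h}{2}\right)\left[-\log\left(\frac{t_{j+1}}{t_j}\right)+\frac{10^5}{\hat\alpha^2(\hat\alpha-10^2)}\left(\log(t_{j+1})\,t_{j+1}^{1-10^2/\hat\alpha}-\log(t_j)\,t_j^{1-10^2/\hat\alpha}\right)-\frac{10^3}{(\hat\alpha-10^2)^2}\left(t_{j+1}^{1-10^2/\hat\alpha}-t_j^{1-10^2/\hat\alpha}\right)\right] = 0. \tag{3.5}$$
Let $g(\alpha)$ denote the left-hand side of Eq (3.5), with $\hat\sigma^2$ expressed through (3.4). Therefore, the ML estimate of $\alpha$ can be obtained by solving the nonlinear equation $g(\alpha)=0$.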
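One way to solve $g(\alpha)=0$ numerically is plain bisection. The paper does not say which root finder was used, so the sketch below is an assumption in that respect. It implements $H_{\alpha,j}$, the closed form (3.4), and the left-hand side of (3.5); the function names and the user-supplied bracket `[lo, hi]` are hypothetical, and the bracket must contain a sign change of $g$ while excluding $\alpha=0$ and $\alpha=10^2$.

```python
import math

def _residuals_and_derivs(alpha, times, xs):
    """Return the lists of H_{alpha,j} and dH_{alpha,j}/d(alpha) over all transitions."""
    k = 1e3 / (alpha - 1e2)
    p = 1.0 - 1e2 / alpha
    hs, ds = [], []
    for j in range(len(xs) - 1):
        ta, tb = times[j], times[j + 1]
        pa, pb = ta ** p, tb ** p
        hs.append(math.log(xs[j + 1] / xs[j]) - alpha * math.log(tb / ta) + k * (pb - pa))
        ds.append(-math.log(tb / ta)
                  + 1e5 / (alpha ** 2 * (alpha - 1e2)) * (math.log(tb) * pb - math.log(ta) * pa)
                  - 1e3 / (alpha - 1e2) ** 2 * (pb - pa))
    return hs, ds

def sigma2_hat(alpha, times, xs):
    """Closed-form sigma^2 from (3.4) for a given alpha."""
    h = times[1] - times[0]
    hs, _ = _residuals_and_derivs(alpha, times, xs)
    m = sum(v * v for v in hs) / len(hs)
    # Equals (2/h) * (sqrt(1 + m) - 1), rearranged to avoid cancellation for small m.
    return (2.0 / h) * m / (math.sqrt(1.0 + m) + 1.0)

def g(alpha, times, xs):
    """Left-hand side of (3.5) with sigma^2 replaced by sigma2_hat(alpha)."""
    h = times[1] - times[0]
    s2 = sigma2_hat(alpha, times, xs)
    hs, ds = _residuals_and_derivs(alpha, times, xs)
    return sum((hv + 0.5 * s2 * h) * dv for hv, dv in zip(hs, ds))

def solve_alpha(times, xs, lo, hi, tol=1e-10, max_iter=200):
    """Bisection for g(alpha) = 0 on a bracket [lo, hi] where g changes sign."""
    g_lo, g_hi = g(lo, times, xs), g(hi, times, xs)
    if g_lo * g_hi > 0:
        raise ValueError("g does not change sign on the bracket")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        g_mid = g(mid, times, xs)
        if g_lo * g_mid <= 0:
            hi, g_hi = mid, g_mid
        else:
            lo, g_lo = mid, g_mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)
```

Once `solve_alpha` returns $\hat\alpha$, calling `sigma2_hat` at that value gives $\hat\sigma^2$. Bisection is slow but robust; choosing a valid bracket (for example by scanning $g$ on a grid) is left to the user.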
Once the ML estimates of $\alpha$ and $\sigma^2$ are obtained, we proceed to provide estimates for the conditional mean and the mean of the process. Due to the invariance property of the ML estimates, see for example Theorem [5.28,308] in [14], the ML estimates for the conditional mean and the unconditional mean can be obtained. Let $\hat\alpha$ and $\hat\sigma^2$ be the ML estimates of $\alpha$ and $\sigma^2$, respectively. Then the estimated conditional mean function (ECMF) of the process is
$$\hat E(X(t)\,|\,X(s)=x_s) = x_s\left(\frac{t}{s}\right)^{\hat\alpha}\exp\left(-\frac{10^3}{\hat\alpha-10^2}\left(t^{1-10^2/\hat\alpha}-s^{1-10^2/\hat\alpha}\right)\right). \tag{3.6}$$
Similarly, the estimated mean function (EMF) of the process is
$$\hat E(X(t)\,|\,X(t_1)=x_1) = x_1\left(\frac{t}{t_1}\right)^{\hat\alpha}\exp\left(-\frac{10^3}{\hat\alpha-10^2}\left(t^{1-10^2/\hat\alpha}-t_1^{1-10^2/\hat\alpha}\right)\right), \tag{3.7}$$
where in the above equation we used the assumption that $P(X(t_1)=x_1)=1$.
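Next, we would like to obtain a confidence band for the CMF and MF of the process. From Eq (2.4), we have that for $t>s$, $X(t)\,|\,X(s)=x_s$ follows $\Lambda_1(\mu(s,t,x_s),\sigma^2(t-s))$. Therefore, we have that
$$Z = \frac{\log(X(t))-\mu(s,t,x_s)}{\sigma\sqrt{t-s}}\sim N(0,1).$$
Consequently, a $(1-\gamma)100\%$ confidence band for $Z$ is determined by $P(-Z_{\gamma/2}\le Z\le Z_{\gamma/2})=1-\gamma$, for $\gamma\in(0,1)$, where $Z_{\gamma/2}$ is the upper $\gamma/2$ quantile of the standard normal distribution. Taking $s=t_1$, this gives
$$P\left[-Z_{\gamma/2}\le\frac{\log\left(\frac{x(t)}{x_1}\right)-\alpha\log\left(\frac{t}{t_1}\right)+\frac{10^3}{\alpha-10^2}\left(t^{1-10^2/\alpha}-t_1^{1-10^2/\alpha}\right)+\frac{\sigma^2}{2}(t-t_1)}{\sigma\sqrt{t-t_1}}\le Z_{\gamma/2}\right]\approx 1-\gamma.$$
Therefore, the $(1-\gamma)100\%$ confidence band (CB) for $X(t)$ is
$$x_{\mathrm{lower}}(t)\le x(t)\le x_{\mathrm{upper}}(t), \tag{3.8}$$
where
$$\begin{aligned}
x_{\mathrm{lower}}(t) &= x_1\exp\left[-Z_{\gamma/2}\,\sigma\sqrt{t-t_1}+\alpha\log\left(\frac{t}{t_1}\right)-\frac{10^3}{\alpha-10^2}\left(t^{1-10^2/\alpha}-t_1^{1-10^2/\alpha}\right)-\frac{\sigma^2}{2}(t-t_1)\right],\\
x_{\mathrm{upper}}(t) &= x_1\exp\left[Z_{\gamma/2}\,\sigma\sqrt{t-t_1}+\alpha\log\left(\frac{t}{t_1}\right)-\frac{10^3}{\alpha-10^2}\left(t^{1-10^2/\alpha}-t_1^{1-10^2/\alpha}\right)-\frac{\sigma^2}{2}(t-t_1)\right].
\end{aligned}$$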
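The band (3.8) can be computed as follows. This is a hedged Python sketch: the function name `confidence_band` is hypothetical, and the normal quantile is taken from the standard-library `statistics.NormalDist` (Python 3.8 or later). In practice the ML estimates would be plugged in for $\alpha$ and $\sigma$.

```python
import math
from statistics import NormalDist

def confidence_band(t, t1, x1, alpha, sigma, gamma=0.05):
    """Return (x_lower, x_upper) from (3.8) at a time t > t1."""
    z = NormalDist().inv_cdf(1.0 - gamma / 2.0)   # Z_{gamma/2}
    k = 1e3 / (alpha - 1e2)
    p = 1.0 - 1e2 / alpha
    dt = t - t1
    centre = (alpha * math.log(t / t1) - k * (t ** p - t1 ** p)
              - 0.5 * sigma ** 2 * dt)
    half_width = z * sigma * math.sqrt(dt)
    return x1 * math.exp(centre - half_width), x1 * math.exp(centre + half_width)

# 95% band one year ahead, using the estimates reported for the UK data.
print(confidence_band(1978.0, 1977.0, 9353.0, -1779.057, 0.02208178))
```

On the log scale the band is symmetric around $\mu(t_1,t,x_1)$ and its width grows like $\sqrt{t-t_1}$, so it widens as the forecast horizon increases.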
Here, our primary goal is to illustrate the pattern of the GGC process by simulating sample trajectories. Additionally, we use these sample paths to evaluate the frequentist performance of the ML estimates for the trend function of the GGC process. The following algorithm describes the procedure for simulating a trajectory of the GGC process computed at N time points.
Algorithm
1: Initialize $t_1$, $T$, $N$, $x_1$, $\alpha$, $\sigma$.
2: Compute $\Delta := (T-t_1)/N$.
3: For $j=2,\dots,N$, do
(a) Discretize the time interval $[t_1,T]$ into $N$ time points, with each point computed as $t_j=t_{j-1}+\Delta$.
(b) Generate a standard normal observation $Z_j\sim N(0,1)$ and compute the Wiener path as $W_j=W_{j-1}+\sqrt{\Delta}\,Z_j$.
(c) Obtain the sample path of the GGC process from (2.3), i.e., $X_j=X_{j-1}\left(\frac{t_j}{t_{j-1}}\right)^{\alpha}\exp\left\{-\frac{10^3}{\alpha-10^2}\left(t_j^{1-10^2/\alpha}-t_{j-1}^{1-10^2/\alpha}\right)-\frac{\sigma^2}{2}(t_j-t_{j-1})\right\}\exp\left(\sigma(W_j-W_{j-1})\right)$.
4: End do
5: Repeat step 3 $m$ times to obtain $m$ sample paths.
6: For $i=1,\dots,m$, use the sample path $X_i(t_j)$ from step 5 to compute the ML estimates $\hat\alpha_i$ and $\hat\sigma_i$ using Eqs (3.2) and (3.3).
7: The ML estimates are then given by $\hat\alpha=m^{-1}\sum_{i=1}^{m}\hat\alpha_i$ and $\hat\sigma=m^{-1}\sum_{i=1}^{m}\hat\sigma_i$, with mean square errors $\mathrm{MSE}(\hat\alpha)=(m-1)^{-1}\sum_{i=1}^{m}(\hat\alpha_i-\alpha)^2$ and $\mathrm{MSE}(\hat\sigma^2)=(m-1)^{-1}\sum_{i=1}^{m}(\hat\sigma_i^2-\sigma^2)^2$, respectively.
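Steps 1 to 5 of the algorithm can be sketched in Python as below (the paper's own computations were done in R, so this is an illustration rather than the authors' code). The function names are hypothetical, and the value $\alpha=50$ in the usage line is an arbitrary choice for demonstration, since the parameter values behind the figures are not listed in the text. Steps 6 and 7 require an estimation routine and are not covered here.

```python
import math
import random

def simulate_ggc_path(t1, T, N, x1, alpha, sigma, rng=None):
    """Simulate one GGC trajectory at N time points via the exact recursion from (2.3)."""
    if rng is None:
        rng = random.Random()
    delta = (T - t1) / N
    k = 1e3 / (alpha - 1e2)
    p = 1.0 - 1e2 / alpha
    times, xs = [t1], [x1]
    for _ in range(2, N + 1):                          # j = 2, ..., N
        t_prev = times[-1]
        t_next = t_prev + delta
        dw = math.sqrt(delta) * rng.gauss(0.0, 1.0)    # W_j - W_{j-1}
        log_step = (alpha * math.log(t_next / t_prev)
                    - k * (t_next ** p - t_prev ** p)
                    - 0.5 * sigma ** 2 * delta
                    + sigma * dw)
        times.append(t_next)
        xs.append(xs[-1] * math.exp(log_step))
    return times, xs

def simulate_paths(m, seed=0, **kwargs):
    """Repeat the simulation m times with one seeded generator (step 5)."""
    rng = random.Random(seed)
    return [simulate_ggc_path(rng=rng, **kwargs) for _ in range(m)]

paths = simulate_paths(50, seed=1, t1=1000.0, T=1010.0, N=100,
                       x1=9000.0, alpha=50.0, sigma=0.05)
print(len(paths), len(paths[0][1]))
```

Because the recursion uses the closed-form solution rather than an Euler discretization, the simulated values carry no time-discretization error at the grid points.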
We generate m=50 training sample paths using the algorithm described above, varying the parameters in the graphs. All computations in this paper were performed using R software [15].
(1) In the first scenario, we set the time interval [t1,T]=[1000,1010], N=100, and X(t1)=9000. Figure 1 shows that the trajectories vary in direction based on the sign of the parameter α. The dispersion of the process is influenced by the value of σ; smaller values result in trajectories that closely follow a single curve, while larger σ values lead to more dispersed yet similarly trending curves. Using the 50 sample paths, we obtain the maximum likelihood (ML) estimates of the process parameters and subsequently the ML estimates of the trend function. Figure 1 demonstrates that the estimated trend closely matches the actual trend function, indicating the reliability of our estimates.
(2) In the second scenario, we consider [t1,T]=[500,510], N=100, and X(t1)=3000. Similarly, Figure 2 reveals that the process exhibits either a decreasing or increasing trend. The sample paths tend to either closely follow a single curve or scatter depending on whether the dispersion parameter σ is small or large, respectively. Additionally, using 50 sample trajectories, we obtain the ML estimates of α and σ, and consequently the estimate of the trend function. A closer examination shows that the ML estimates behave consistently across different scenarios.
This section presents the application of our model to two real-world datasets: the number of infant deaths in the United Kingdom from 1977 to 2020 and CO2 emissions (in kilotons) in Morocco from 1990 to 2020. These annualized data are sourced from the World Bank database. The first dataset is fitted using the GGC process and compared with two existing models: the two-parameter Weibull SDP [4] and the Lomax SDP [3], both of which are suitable for modeling data with decreasing trends. Similarly, the second dataset is fitted using the GGC process and compared with the log-logistic SDP [1], which is appropriate for modeling data with increasing trends.
According to the Royal College of Paediatrics and Child Health (RCPCH), infant death rates in all countries of the UK have significantly fallen over the past 40 years. Most childhood deaths occur during the first year of life, particularly in the first month (neonatal period). Newborn deaths account for 70% to 80% of infant deaths. The vast majority of newborn deaths are due to perinatal causes, especially preterm birth, and are strongly linked to maternal health and congenital anomalies. The remaining infant deaths occur due to a wide variety of causes, including sudden unexplained death in infancy (SUDI). The overall decline in infant death rates since 1980 likely reflects general improvements in health care, specifically prenatal and newborn care. Breastfeeding and safe sleeping positions are protective factors for infant survival, especially for premature babies. This study aims to use a stochastic diffusion process to model the number of infant deaths and to forecast the number of infant deaths in the United Kingdom. We use the data from 1977 to 2018, available at https://databank.worldbank.org/source/world-development-indicators, as training data to estimate the model parameters: ˆα=−1779.057 and ˆσ=0.02208178. Consequently, the estimated trend functions (unconditional EMF and conditional ECMF) are obtained immediately due to the invariance of ML estimates. Additionally, we use the years 2019 and 2020 to forecast the number of infant deaths in the UK. The results for the forecasted values for these two years are presented in Table 1 and shown in Figure 3. These results demonstrate that the GGC process is quite effective in predicting the values for these years, particularly when employing ECMF. Figure 3 (left panel) displays the observed data, estimated trend function (EMF), and estimated confidence bands, revealing that the GGC process fits the current data well and provides accurate predictions. 
Figure 3 (right panel) further illustrates the conditional trend function estimation along with forecasts. Moreover, we compare the proposed diffusion process with the Weibull [4] and Lomax [3] diffusion processes using the current data. Table 2 lists the estimated parameters and the Akaike information criterion (AIC), showing that the GGC process has the lowest AIC value and therefore outperforms the two-parameter diffusion processes. This is also well demonstrated in Figure 3. Moreover, Figure 4 shows the fits made using the methods mentioned in Table 2, revealing that the GGC process performed much better than the Weibull [4] and Lomax [3] diffusion processes in modelling the current data.
Year | Real Data | EMF | ECMF |
2019 | 2703 | 2790.843 | 2763.366 |
2020 | 2571 | 2738.968 | 2653.739 |
Model | α | β | σ | AIC |
GGC | -1779.057 | NA | 0.02208178 | 500.9154 |
Weibull | -58.01113386 | 7.200000 | 0.02290836 | 505.927 |
Lomax | -59.05057463 | -0.29692752 | 0.02290831 | 505.924 |
The accuracy of the forecast can be quantified using measures such as the mean absolute error (MAE), the root mean square error (RMSE), and the mean absolute percentage error (MAPE), defined as follows: $\mathrm{MAE}=\frac{1}{42}\sum_{i=1}^{42}|x(t_i)-\hat x(t_i)|=257.9876$, $\mathrm{RMSE}=\sqrt{\frac{1}{42}\sum_{i=1}^{42}|x(t_i)-\hat x(t_i)|^2}=330.2669$, and $\mathrm{MAPE}=\frac{1}{42}\sum_{i=1}^{42}\frac{|x(t_i)-\hat x(t_i)|}{x(t_i)}\times 100=5.1237$. Table 3 shows the observed data along with their corresponding forecast. The value obtained for MAPE is less than 10%, and this indicates that we obtained a high-accuracy prediction.
Year(t) | x(t) | ˆx(t) | Year(t) | x(t) | ˆx(t) |
1977 | 9353 | 9353.000 | 1998 | 4121 | 4606.083 |
1978 | 9011 | 8999.828 | 1999 | 3979 | 4476.542 |
1979 | 9041 | 8664.142 | 2000 | 3814 | 4352.688 |
1980 | 8942 | 8344.971 | 2001 | 3675 | 4234.245 |
1981 | 8433 | 8041.403 | 2002 | 3600 | 4120.956 |
1982 | 7849 | 7752.580 | 2003 | 3619 | 4012.576 |
1983 | 7479 | 7477.700 | 2004 | 3681 | 3908.874 |
1984 | 7264 | 7216.006 | 2005 | 3680 | 3809.631 |
1985 | 7183 | 6966.78 | 2006 | 3676 | 3714.641 |
1986 | 7096 | 6729.379 | 2007 | 3700 | 3623.708 |
1987 | 6991 | 6503.152 | 2008 | 3696 | 3536.649 |
1988 | 6858 | 6287.515 | 2009 | 3614 | 3453.288 |
1989 | 6539 | 6081.914 | 2010 | 3515 | 3373.459 |
1990 | 6189 | 5885.826 | 2011 | 3424 | 3297.008 |
1991 | 5831 | 5698.758 | 2012 | 3324 | 3223.784 |
1992 | 5415 | 5520.246 | 2013 | 3179 | 3153.648 |
1993 | 5031 | 5349.855 | 2014 | 3051 | 3086.466 |
1994 | 4724 | 5187.171 | 2015 | 3010 | 3022.114 |
1995 | 4492 | 5031.808 | 2016 | 2988 | 2960.472 |
1996 | 4342 | 4883.399 | 2017 | 2925 | 2901.426 |
1997 | 4239 | 4741.599 | 2018 | 2817 | 2844.871 |
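The three accuracy measures used for the table above are simple to compute from paired observed and fitted values. The following Python sketch uses a hypothetical helper name, `forecast_errors`, and a made-up two-point input purely to show the arithmetic.

```python
import math

def forecast_errors(observed, fitted):
    """Return (MAE, RMSE, MAPE in percent) for paired observed and fitted values."""
    n = len(observed)
    abs_err = [abs(x - f) for x, f in zip(observed, fitted)]
    mae = sum(abs_err) / n
    rmse = math.sqrt(sum(e * e for e in abs_err) / n)
    mape = 100.0 * sum(e / x for e, x in zip(abs_err, observed)) / n
    return mae, rmse, mape

# Toy input (not from the paper): errors of 10 and 20 on values 100 and 200.
print(forecast_errors([100.0, 200.0], [110.0, 180.0]))
```

Passing the 42 observed and fitted values from the table to this function is how the reported MAE, RMSE, and MAPE could be reproduced.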
Another application of our model is to analyze carbon dioxide (CO2) emissions in Morocco, measured in kilotons (kt). CO2 emissions primarily result from burning fuels and cement manufacturing. Since the Industrial Revolution, the increasing combustion of carbon-based fuels has significantly raised atmospheric CO2 concentrations, accelerating global warming and contributing to man-made climate change. Additionally, CO2 dissolves in water to form carbonic acid, which leads to ocean acidification. This subsection examines CO2 emissions (kt) in Morocco from 1990 to 2020, with data sourced from https://databank.worldbank.org/source/world-development-indicators. We use the data from 1990 to 2018 as training data to estimate the model parameters: ˆα=81.55085457 and ˆσ=0.02977168. Therefore, the estimated trend functions (both the unconditional EMF and the conditional ECMF) are derived directly due to the invariance of the ML estimates. Similarly, we use the data from 2019 and 2020 to project CO2 emissions in Morocco. The predicted values for these two years are reported in Table 4 and illustrated in Figure 5. Once again, the results demonstrate that the GGC process is capable of accurately predicting the values for these years using both EMF and ECMF. Figure 5 (left panel) depicts the observed data, the estimated trend function (EMF), and the estimated confidence intervals, indicating that the GGC process accurately fits the data and yields precise forecasts. Figure 5 (right panel) further demonstrates the conditional trend function estimation and projections. Furthermore, we compare the proposed diffusion process with the log-logistic diffusion process [1] using the current dataset. Table 5 presents the estimated parameters and the AIC, showing that the GGC process has a lower AIC value and thus surpasses the two-parameter diffusion process. This is also clearly illustrated in Figure 5.
Furthermore, Figure 6 displays the fits performed using the methods outlined in Table 5, demonstrating that the GGC process outperformed the Log-Logistic diffusion process in modeling this data.
Year | Real Data | EMF | ECMF |
2019 | 70986.3 | 67764.45 | 66863.89 |
2020 | 66719.5 | 70480.36 | 73831.33 |
Model | α | β | σ | AIC |
GGC | 81.55085457 | NA | 0.02977168 | 481.1204 |
log-logistic | -0.03638392 | 0.64735468 | 0.03437501 | 490.76 |
Similarly, the accuracy of the forecast can be assessed using the metrics: MAE=2567.19, RMSE=3115.123, and MAPE=6.032141. Table 6 shows the observed data along with their corresponding forecast. The MAPE value obtained is less than 10, indicating a high-accuracy prediction.
Year(t) | x(t) | ˆx(t) | Year(t) | x(t) | ˆx(t) |
1990 | 21497.8 | 21497.80 | 2005 | 43579.9 | 39011.22 |
1991 | 23119.0 | 22372.13 | 2006 | 44991.0 | 40585.80 |
1992 | 24879.0 | 23281.55 | 2007 | 46341.0 | 42223.10 |
1993 | 25577.0 | 24227.46 | 2008 | 48630.2 | 43925.60 |
1994 | 27712.5 | 25211.30 | 2009 | 48765.7 | 45695.85 |
1995 | 28791.8 | 26234.57 | 2010 | 51749.5 | 47536.51 |
1996 | 28484.7 | 27298.83 | 2011 | 55923.5 | 49450.35 |
1997 | 29903.8 | 28405.71 | 2012 | 258076 | 51440.24 |
1998 | 30634.6 | 29556.88 | 2013 | 57595.5 | 53509.16 |
1999 | 32198.7 | 30754.09 | 2014 | 58691.7 | 55660.21 |
2000 | 32876.5 | 31999.17 | 2015 | 60362.5 | 57896.61 |
2001 | 36273.0 | 33294.00 | 2016 | 60289.9 | 60221.69 |
2002 | 37470.0 | 34640.54 | 2017 | 63014.9 | 62638.94 |
2003 | 37251.1 | 36040.83 | 2018 | 64286.1 | 65151.94 |
2004 | 41009.0 | 37496.99 |
We introduce a new stochastic diffusion process based on generalized Gamma-like curves, referred to as the GGC process. The GGC process is capable of modeling both increasing and decreasing trend data. We investigate and derive several structural properties, including the explicit solution of the process, the transition probability density function, and the moments of the process. The maximum likelihood method is employed to estimate the GGC process parameters. Simulation studies demonstrate that the ML estimates are consistent even in small samples. The potential of the GGC process for analyzing data with a declining trend is illustrated by fitting it to infant mortality data in the UK. The proposed diffusion process outperforms two existing popular stochastic diffusion processes that are designed to fit decreasing trend data. Additionally, it is applied to CO2 emissions in Morocco, which exhibit an increasing trend. The GGC process again showed satisfactory performance in modeling this data and outperformed the log-logistic SDP, which is suitable for modeling growth data.
Like any stochastic diffusion model, the GGC process has limitations: it may be inappropriate for modeling unimodal or bathtub-shaped data. This could be addressed by choosing different infinitesimal moments or by introducing additional suitable parameters.
In future work, we plan to study the first-passage time of the GGC process and to consider further applications. We will also explore other estimation methods, in particular the Bayesian approach, which may be better suited to other application scenarios.
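The kind of Monte Carlo check mentioned in the conclusions can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' simulation design: it assumes a multiplicative diffusion dX(t) = h(t) X(t) dt + σ X(t) dW(t), whose transition law is log-normal, and it uses a hypothetical drift h(t) = A / (1 + t) that is not the GGC drift. The starting value and σ are merely of the order of the values reported in the tables; the names `simulate_path`, `cum_drift`, and `A` are introduced here for illustration.

```python
import math
import random

A = -0.5  # hypothetical drift constant, NOT an estimate from the paper


def cum_drift(t):
    """Integral of the hypothetical drift h(s) = A / (1 + s) over [0, t]."""
    return A * math.log1p(t)


def simulate_path(x0, sigma, times, rng):
    """Sample one path at `times` from the exact log-normal transition law."""
    path = [x0]
    for t_prev, t_next in zip(times, times[1:]):
        dt = t_next - t_prev
        # Mean and standard deviation of the increment of log X over [t_prev, t_next].
        mean = cum_drift(t_next) - cum_drift(t_prev) - 0.5 * sigma**2 * dt
        step = rng.gauss(mean, sigma * math.sqrt(dt))
        path.append(path[-1] * math.exp(step))
    return path


x0, sigma = 9353.0, 0.022               # illustrative magnitudes only
times = [float(k) for k in range(42)]   # 42 yearly observation times
rng = random.Random(2024)               # fixed seed for reproducibility

paths = [simulate_path(x0, sigma, times, rng) for _ in range(2000)]

# Monte Carlo mean at each time versus the theoretical trend x0 * exp(H(t)).
mc_mean = [sum(p[k] for p in paths) / len(paths) for k in range(len(times))]
trend = [x0 * math.exp(cum_drift(t)) for t in times]
max_rel_gap = max(abs(m - tr) / tr for m, tr in zip(mc_mean, trend))

print(f"largest relative gap between Monte Carlo mean and trend: {max_rel_gap:.4%}")
```

Under these assumptions the mean of the simulated paths should track the trend function closely, and the gap should shrink as the number of replications grows. Replacing `cum_drift` with the integral of the actual drift would adapt the sketch to a specific process.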
Conceptualization, M.K.S.; methodology, M.K.S.; writing (original draft preparation), M.K.S.; software, M.K.S. and S.A.; validation, S.A.; formal analysis, M.K.S. and S.A. All authors have read and agreed to the published version of the manuscript.
We would like to thank the Editor, Associate Editor, and the three anonymous referees for their insightful comments and valuable suggestions, which greatly improved our paper.
No potential conflict of interest was reported by the authors.
[1] |
A. El Azri, N. Ahmed, A stochastic log-logistic diffusion process: statistical computational aspects and application to real data, Stoch. Models, 40 (2024), 261–277. https://doi.org/10.1080/15326349.2023.2241070 doi: 10.1080/15326349.2023.2241070
![]() |
[2] |
A. Nafidi, A. El Azri, R. Gutiérrez-Sánchez, A stochastic Schumacher diffusion process: probability characteristics computation and statistical analysis, Methodol. Comput. Appl. Probab., 25 (2023), 66. https://doi.org/10.1007/s11009-023-10031-4 doi: 10.1007/s11009-023-10031-4
![]() |
[3] |
A. Nafidi, I. Makroz, R. Gutiérrez-Sánchez, A stochastic Lomax diffusion process: statistical inference and application, Mathematics, 9 (2021), 100. https://doi.org/10.3390/math9010100 doi: 10.3390/math9010100
![]() |
[4] |
A. Nafidi, M. Bahij, R. Gutiérrez-Sánchez, B. Achchab, Two-parameter stochastic Weibull diffusion model: statistical inference and application to real modeling example, Mathematics, 8 (2020), 160. https://doi.org/10.3390/math8020160 doi: 10.3390/math8020160
![]() |
[5] |
A. Nafidi, G. Moutabir, R. Gutiérrez-Sánchez, E. Ramos-Ábalos, Stochastic square of the Brennan-Schwartz diffusion process: statistical computation and application, Methodol. Comput. Appl. Probab., 7 (2020), 455–476. https://doi.org/10.1007/s11009-019-09714-8 doi: 10.1007/s11009-019-09714-8
![]() |
[6] |
A. Nafidi, G. Moutabir, R. Gutiérrez-Sánchez, Stochastic Brennan–Schwartz diffusion process: statistical computation and application, Mathematics, 7 (2019), 1062. https://doi.org/10.3390/math7111062 doi: 10.3390/math7111062
![]() |
[7] |
R. Gutiérrez, R. Gutiérrez-Sánchez, A. Nafidi, The trend of the total stock of the private car-petrol in Spain: stochastic modelling using a new gamma diffusion process, Appl. Energy, 86 (2009), 18–24. https://doi.org/10.1016/j.apenergy.2008.03.016 doi: 10.1016/j.apenergy.2008.03.016
![]() |
[8] |
R. Gutiérrez, R. Gutiérrez-Sánchez, A. Nafidi, Modelling and forecasting vehicle stocks using the trends of stochastic Gompertz diffusion models: the case of Spain, Appl. Stoch. Model. Bus. Ind., 25 (2009), 385–405. https://doi.org/10.1002/asmb.754 doi: 10.1002/asmb.754
![]() |
[9] |
R. Gutiérrez, R. Gutiérrez-Sánchez, A. Nafidi, The stochastic Rayleigh diffusion model: statistical inference and computational aspects. Applications to modelling of real cases, Appl. Math. Comput., 175 (2006), 628–644. https://doi.org/10.1016/j.amc.2005.07.047 doi: 10.1016/j.amc.2005.07.047
![]() |
[10] |
B. M. Bibby, M. Sørensen, Martingale estimation functions for discretely observed diffusion processes, Bernoulli, 1 (1995), 17–39. https://doi.org/10.2307/3318679 doi: 10.2307/3318679
![]() |
[11] | P. E. Kloeden, E. Platen, Numerical solution of stochastic differential equations, Springer Berlin, Heidelberg, 1992. https://doi.org/10.1007/978-3-662-12616-5 |
[12] | B. L. S. Prakasa Rao, Statistical inference for diffusion type processes, Arnold, London, UK, 1999. |
[13] |
E. W. Stacy, A generalization of the Gamma distribution, Ann. Math. Statist., 33 (1962), 1187–1192. https://doi.org/10.1214/aoms/1177704481 doi: 10.1214/aoms/1177704481
![]() |
[14] | M. J. Schervish, Theory of statistics, Springer-Verlag, New York, USA, 1995. https://doi.org/10.1007/978-1-4612-4250-5 |
[15] | The R Core Team, R: A language and environment for statistical computing, R Foundation for Statistical Computing, Vienna, Austria, 2016. Available from: https://web.mit.edu/r_v3.3.1/fullrefman.pdf. |
Year | Real Data | EMF | ECMF |
2019 | 2703 | 2790.843 | 2763.366 |
2020 | 2571 | 2738.968 | 2653.739 |
Model | α | β | σ | AIC |
GGC | -1779.057 | NA | 0.02208178 | 500.9154 |
Weibull | -58.01113386 | 7.200000 | 0.02290836 | 505.927 |
Lomax | -59.05057463 | -0.29692752 | 0.02290831 | 505.924 |
Year(t) | x(t) | $\hat{x}(t)$ | Year(t) | x(t) | $\hat{x}(t)$ |
1977 | 9353 | 9353.000 | 1998 | 4121 | 4606.083 |
1978 | 9011 | 8999.828 | 1999 | 3979 | 4476.542 |
1979 | 9041 | 8664.142 | 2000 | 3814 | 4352.688 |
1980 | 8942 | 8344.971 | 2001 | 3675 | 4234.245 |
1981 | 8433 | 8041.403 | 2002 | 3600 | 4120.956 |
1982 | 7849 | 7752.580 | 2003 | 3619 | 4012.576 |
1983 | 7479 | 7477.700 | 2004 | 3681 | 3908.874 |
1984 | 7264 | 7216.006 | 2005 | 3680 | 3809.631 |
1985 | 7183 | 6966.78 | 2006 | 3676 | 3714.641 |
1986 | 7096 | 6729.379 | 2007 | 3700 | 3623.708 |
1987 | 6991 | 6503.152 | 2008 | 3696 | 3536.649 |
1988 | 6858 | 6287.515 | 2009 | 3614 | 3453.288 |
1989 | 6539 | 6081.914 | 2010 | 3515 | 3373.459 |
1990 | 6189 | 5885.826 | 2011 | 3424 | 3297.008 |
1991 | 5831 | 5698.758 | 2012 | 3324 | 3223.784 |
1992 | 5415 | 5520.246 | 2013 | 3179 | 3153.648 |
1993 | 5031 | 5349.855 | 2014 | 3051 | 3086.466 |
1994 | 4724 | 5187.171 | 2015 | 3010 | 3022.114 |
1995 | 4492 | 5031.808 | 2016 | 2988 | 2960.472 |
1996 | 4342 | 4883.399 | 2017 | 2925 | 2901.426 |
1997 | 4239 | 4741.599 | 2018 | 2817 | 2844.871 |
Year | Real Data | EMF | ECMF |
2019 | 70986.3 | 67764.45 | 66863.89 |
2020 | 66719.5 | 70480.36 | 73831.33 |
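The two forecast tables above can be summarized by the absolute percentage error of each forecast. The sketch below copies the 2019 and 2020 rows as printed and assumes that EMF and ECMF denote the estimated mean (trend) function and the estimated conditional mean function, respectively. The data set labels are inferred from the magnitudes of the values, and the helper `ape` is a name introduced here for illustration.

```python
# Rows copied from the forecast tables: year -> (real, EMF, ECMF).
forecasts = {
    "infant mortality (UK)": {
        2019: (2703.0, 2790.843, 2763.366),
        2020: (2571.0, 2738.968, 2653.739),
    },
    "CO2 emissions (Morocco)": {
        2019: (70986.3, 67764.45, 66863.89),
        2020: (66719.5, 70480.36, 73831.33),
    },
}


def ape(real, pred):
    """Absolute percentage error of a forecast, in percent."""
    return 100.0 * abs(pred - real) / real


# errors[data set][year] = {"EMF": ..., "ECMF": ...}
errors = {
    name: {
        year: {"EMF": ape(real, emf), "ECMF": ape(real, ecmf)}
        for year, (real, emf, ecmf) in rows.items()
    }
    for name, rows in forecasts.items()
}

for name, by_year in errors.items():
    print(name)
    for year, e in by_year.items():
        print(f"  {year}: EMF {e['EMF']:.2f}%  ECMF {e['ECMF']:.2f}%")
```

On these four observations, the conditional forecast (ECMF) is closer to the observed value for the infant mortality series, while the unconditional forecast (EMF) is closer for the CO2 series. Two forecast years per data set are too few to support a general ranking of the two predictors.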
Model | α | β | σ | AIC |
GGC | 81.55085457 | NA | 0.02977168 | 481.1204 |
log-logistic | -0.03638392 | 0.64735468 | 0.03437501 | 490.76 |
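The AIC columns of the two model-comparison tables above can be read more easily as AIC differences and Akaike weights, a standard way to express relative support among candidate models. The sketch below uses only the AIC values printed in the tables. The data set labels are inferred from the conclusions, and `akaike_weights` is a helper name introduced here for illustration.

```python
import math

# AIC values copied from the two model-comparison tables.
aic = {
    "infant mortality (UK)": {"GGC": 500.9154, "Weibull": 505.927, "Lomax": 505.924},
    "CO2 emissions (Morocco)": {"GGC": 481.1204, "log-logistic": 490.76},
}


def akaike_weights(scores):
    """Return {model: (delta_aic, weight)} for the models fitted to one data set."""
    best = min(scores.values())
    # Relative likelihood of each model given its AIC difference from the best one.
    rel = {m: math.exp(-0.5 * (a - best)) for m, a in scores.items()}
    total = sum(rel.values())
    return {m: (scores[m] - best, rel[m] / total) for m in scores}


results = {name: akaike_weights(scores) for name, scores in aic.items()}

for name, res in results.items():
    print(name)
    for model, (delta, weight) in sorted(res.items(), key=lambda kv: kv[1][0]):
        print(f"  {model:<12} dAIC = {delta:6.3f}  weight = {weight:.3f}")
```

The AIC differences are roughly 5 for the Weibull and Lomax processes and roughly 9.6 for the log-logistic process, so the GGC process carries most of the Akaike weight in both comparisons. The weights are relative to the listed candidates only and say nothing about models outside each table.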