1. Introduction
In many applied fields, such as biology, medicine, insurance, engineering, economics, physics and finance, the modeling of the observed phenomenon from experimental data is crucial. The more appropriate and precise the model, the more we are able to understand the phenomenon in great detail. One area of study is the development of flexible probability distributions, which are frequently based on these models. For this aim, the most standard method is based on a parametric transformation of existing distributions. Such a transformation aims to increase the adaptability of a reference distribution for broader perspectives in terms of data analysis. Modern computer technology has contributed to the applicability of these distributions, allowing the processing of complex functions, accessibility of efficient optimization techniques and achievement of a high degree of numerical precision. The most famous and useful classes of distributions are described in detail in the surveys of [19,23,38]. Post-2020 classes include the transmuted Muth generated class by [5], Box-Cox gamma generated class by [6], Topp-Leone odd Fréchet generated class by [7], exponentiated power generalized Weibull power series generated class by [8], truncated Cauchy power generated class by [9], exponentiated truncated inverse Weibull generated class by [10], transmuted odd Fréchet generated class by [13], Marshall-Olkin exponentiated generalized generated class by [41], exponentiated M generated class by [16], type II power Topp-Leone generated class by [17], modified T-X generated class by [12], modified odd Weibull generated class by [22], type II general inverse exponential generated class by [27], odd generalized gamma generated class by [32], and truncated generalized Fréchet generated (TGF-G) class by [42].
However, modern classes of distributions based on a simple, original and well-motivated transformation of the reference distribution remain rare. In this article, such a candidate is proposed; we introduce the logarithmically-exponential generated (LE-G) class defined by the following cumulative distribution function (cdf):
$$F(x;\alpha,\omega)=\frac{\ln\left(1-e^{-\alpha G(x;\omega)^{-1}}\right)}{\ln\left(1-e^{-\alpha}\right)},\quad x\in\mathbb{R}, \tag{1.1}$$
where $G(x;\omega)$ denotes the cdf of a reference continuous distribution with $\omega$ as parameter vector. One can notice that $F(x;\alpha,\omega)=T_\alpha[G(x;\omega)]$, where $T_\alpha(y)$ denotes the logarithmically-exponential transformation: $T_\alpha(y)=\ln(1-e^{-\alpha y^{-1}})/\ln(1-e^{-\alpha})$, with $y\in(0,1)$. Three major facts behind the LE-G class are formulated below.
1. We emphasize the simplicity and originality of the logarithmically-exponential definition of $F(x;\alpha,\omega)$, with only one transformation parameter, which has never been considered before to the best of our knowledge.
2. The LE-G class can be viewed as a natural "limit class" of the TGF-G class introduced by [42]; the cdf of TGF-G class tends to Equation (1.1) when β tends to 0.
3. When $\alpha$ tends to $+\infty$, the following equivalence result holds:
$$F(x;\alpha,\omega)\sim e^{-\alpha\left(G(x;\omega)^{-1}-1\right)},$$
which corresponds to the cdf defining the odd inverse exponential generated (OIE-G) class formerly introduced in [28] with α=1. See also [26] and [33] for further developments in this direction. A deeper relation between the LE-G and OIE-G classes will be revealed in this study. The respective successes of the TGF-G and OIE-G classes in statistics motivate further investigations into the LE-G class, which is the aim of this study. The first part is devoted to the important functions of the class, especially the probability density function (pdf), hazard rate function (hrf) and quantile function (qf). Then, as concrete examples, special members of the LE-G class are presented. In particular, we intend to show the importance of the new class by focusing on a new heavy right-tailed distribution with three parameters derived from the reference Lomax (Lom) distribution. This choice is motivated by the curvature flexibility of the corresponding pdf and hrf, demonstrating rarely observed shapes for a three-parameter heavy-tailed distribution. Specifically, we see decreasing and unimodal shapes for the pdf, and increasing, decreasing and unimodal shapes for the hrf. The general properties of the LE-G class are examined, describing equivalences of the cdf, pdf and hrf at the boundaries, mode analysis, some stochastic ordering structure, functional expansions of the cdf and the exponentiated pdf, diverse moment measures, information measures related to the entropy concept and a crucial reliability measure. The properties of the modified Lom distribution are also specified, and supported by numerical and graphical illustrations. Then the practical use of the LE-G class is developed for the purpose of data fitting. The parameter estimation of a special model is carried out with the maximum likelihood method. In particular, by considering the proposed modified Lom model, a short Monte Carlo simulation is performed to assess the efficiency of the obtained estimates.
Next, our modeling strategy is applied to two real heavy-tailed data sets. Well-referenced statistical measures are in favor of the proposed modified Lom model, surpassing four competitors also derived from the Lom model.
Section 2 refines the presentation of the LE-G class. Its theoretical properties are developed in Section 3. Our statistical approach is described in Section 4. Applications are given in Section 5. We end the paper with concluding remarks in Section 6.
2. The LE-G class
Here, we refine the presentation of the LE-G class with functions of interest and examples of specific distributions.
2.1. Important functions
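Based on Eq (1.1), the pdf of the LE-G class is obtained upon differentiation (in the almost everywhere sense). It is defined by
$$f(x;\alpha,\omega)=\frac{\alpha\, g(x;\omega)\, e^{-\alpha G(x;\omega)^{-1}}}{-\ln\left(1-e^{-\alpha}\right) G(x;\omega)^{2}\left(1-e^{-\alpha G(x;\omega)^{-1}}\right)},\quad x\in\mathbb{R}, \tag{2.1}$$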
where $g(x;\omega)$ denotes the pdf associated with $G(x;\omega)$. As usual, if we consider a random variable $X$ with such a pdf, for any $a$ and $b$ with $a<b$, the distribution of $X$ is characterized by $P(a\le X\le b)=\int_a^b f(t;\alpha,\omega)dt$. The cdf of $X$ at the point $x$ is recovered by taking $a=-\infty$ and $b=x$.
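As a crucial reliability function, the corresponding hrf is derived as
$$h(x;\alpha,\omega)=\frac{f(x;\alpha,\omega)}{1-F(x;\alpha,\omega)}=\frac{\alpha\, g(x;\omega)\, e^{-\alpha G(x;\omega)^{-1}}}{G(x;\omega)^{2}\left(1-e^{-\alpha G(x;\omega)^{-1}}\right)\left[\ln\left(1-e^{-\alpha G(x;\omega)^{-1}}\right)-\ln\left(1-e^{-\alpha}\right)\right]},\quad x\in\mathbb{R}. \tag{2.2}$$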
The hrf can be increasing or decreasing, non-monotonic, or discontinuous in the broadest sense. In survival analysis, it is informative about the underlying mechanism of failure of a lifetime system. See, for instance, [1].
The qf of the LE-G class is defined as the inverse function of Eq (1.1). After elementary operations, we arrive at
$$Q(u;\alpha,\omega)=Q_G\left(-\frac{\alpha}{\ln\left[1-\left(1-e^{-\alpha}\right)^{u}\right]};\omega\right),\quad u\in(0,1), \tag{2.3}$$
where $Q_G(u;\omega)$ denotes the qf associated with $G(x;\omega)$, that is, $Q_G(u;\omega)=G^{-1}(u;\omega)$ with $u\in(0,1)$. This function is fundamental in defining various parameters, measures and statistical models of the LE-G class. Further details are available in [25].
The above three functions are fundamental for further statistical processing of the LE-G class, and their relative simplicity motivates further study in this regard.
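The cdf, pdf, hrf and qf of the class can be sketched in a few lines of code once the baseline quantities are supplied. The snippet below is a minimal illustration, not the authors' implementation: the function names are hypothetical, the formulas are those implied by $T_\alpha(y)=\ln(1-e^{-\alpha y^{-1}})/\ln(1-e^{-\alpha})$, and the unit exponential baseline is an arbitrary assumption made only for the demonstration.

```python
import math

def le_cdf(G, alpha):
    """LE-G cdf evaluated from a baseline cdf value G in [0, 1]."""
    if G <= 0.0:
        return 0.0
    return math.log1p(-math.exp(-alpha / G)) / math.log1p(-math.exp(-alpha))

def le_pdf(G, g, alpha):
    """LE-G pdf from the baseline cdf value G and baseline pdf value g."""
    if G <= 0.0:
        return 0.0
    e = math.exp(-alpha / G)
    return -alpha * g * e / (G * G * (1.0 - e) * math.log1p(-math.exp(-alpha)))

def le_hrf(G, g, alpha):
    """LE-G hazard rate f / (1 - F); requires G < 1."""
    return le_pdf(G, g, alpha) / (1.0 - le_cdf(G, alpha))

def le_baseline_level(u, alpha):
    """Level p in (0, 1) such that the LE-G quantile at u equals Q_G(p)."""
    c = math.log1p(-math.exp(-alpha))        # ln(1 - e^{-alpha}) < 0
    # 1 - (1 - e^{-alpha})^u is computed as -expm1(u * c) for accuracy.
    return -alpha / math.log(-math.expm1(u * c))

# Demonstration with a unit exponential baseline (assumption for this sketch).
alpha = 1.5
x = 0.7
G = 1.0 - math.exp(-x)
g = math.exp(-x)
F = le_cdf(G, alpha)
# Baseline quantile of the unit exponential: Q_G(p) = -ln(1 - p).
x_back = -math.log(1.0 - le_baseline_level(F, alpha))
print(F, le_pdf(G, g, alpha), le_hrf(G, g, alpha), x_back)
```

Working with the baseline values `G` and `g` rather than with `x` keeps the sketch independent of the reference distribution; only `le_baseline_level` has to be composed with the baseline quantile function.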
2.2. List of special distributions
The LE-G class contains a wide variety of distributions. Based on useful reference distributions, Table 1 lists some special distributions of the class, which have various domains and numbers of parameters.
All of these special distributions have their own mathematical and practical characteristics; they can be studied independently for various purposes. The next part is devoted to the LELom distribution, defined with the Lom distribution as distribution of reference.
2.3. On the LELom distribution
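In order to motivate the LELom distribution, a retrospective on the Lom distribution is necessary. First, the cdf and pdf of the Lom distribution are specified by
$$G(x;\rho,\theta)=1-\left(1+\frac{x}{\rho}\right)^{-\theta},\quad x>0,$$
and G(x;ρ,θ)=0 for x≤0, and
$$g(x;\rho,\theta)=\frac{\theta}{\rho}\left(1+\frac{x}{\rho}\right)^{-\theta-1},\quad x>0,$$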
and g(x;ρ,θ)=0 for x≤0, respectively. Thus, the Lom distribution has a decreasing pdf which tends to 0 when x tends to +∞ with a polynomial decay rate. Moreover, it is skewed to the right with a heavy tail. It is known to be one of the simplest distributions of this type in the literature. For this reason, the Lom distribution has served as a model in various contexts of applied sciences dealing with heavy-tailed data. More information on this topic can be found in [2,3,4,14,15,20].
Despite undeniable qualities, the Lom distribution has some modeling limitations, including a pdf which is "desperately only decreasing", and a hrf which lacks flexibility; it cannot be increasing or have N shapes, among others. In order to overcome these limitations, several modifications or extensions of the Lom distribution have been proposed, such as the power Lomax (PLom) distribution by [35], odd inverse exponential Lomax (OInLom) distribution by [26] and exponential Lomax (ExLom) distribution by [24], to cite a few. The LELom distribution, briefly presented in Table 1, constitutes a new modification of the Lom distribution. As a preliminary statement, it may be better adapted to some data sets than the existing modifications. First, the LELom distribution is defined with the cdf given as
$$F(x;\alpha,\rho,\theta)=\frac{\ln\left(1-e^{-\alpha\left[1-(1+x/\rho)^{-\theta}\right]^{-1}}\right)}{\ln\left(1-e^{-\alpha}\right)},\quad x>0,$$
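and $F(x;\alpha,\rho,\theta)=0$ for $x\le 0$. From Equation (2.1), standard operations give the following pdf:
$$f(x;\alpha,\rho,\theta)=\frac{\alpha\theta\,(1+x/\rho)^{-\theta-1}\,e^{-\alpha\left[1-(1+x/\rho)^{-\theta}\right]^{-1}}}{-\rho\ln\left(1-e^{-\alpha}\right)\left[1-(1+x/\rho)^{-\theta}\right]^{2}\left(1-e^{-\alpha\left[1-(1+x/\rho)^{-\theta}\right]^{-1}}\right)},\quad x>0,$$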
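and $f(x;\alpha,\rho,\theta)=0$ for $x\le 0$. From Equation (2.2), the hrf of the LELom distribution is obtained as
$$h(x;\alpha,\rho,\theta)=\frac{\alpha\theta\,(1+x/\rho)^{-\theta-1}\,e^{-\alpha\left[1-(1+x/\rho)^{-\theta}\right]^{-1}}}{\rho\left[1-(1+x/\rho)^{-\theta}\right]^{2}\left(1-e^{-\alpha\left[1-(1+x/\rho)^{-\theta}\right]^{-1}}\right)\left[\ln\left(1-e^{-\alpha\left[1-(1+x/\rho)^{-\theta}\right]^{-1}}\right)-\ln\left(1-e^{-\alpha}\right)\right]},\quad x>0,$$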
and $h(x;\alpha,\rho,\theta)=0$ for $x\le 0$. In order to illustrate the flexibility of the LELom distribution, various curvatures of $f(x;\alpha,\rho,\theta)$ and $h(x;\alpha,\rho,\theta)$ are shown in Figure 1.
In Figure 1 (i), we see that the pdf of the LELom distribution is unimodal and right-skewed with a possibly heavy tail; the mathematical heavy-tailed nature of the LELom distribution will be discussed later. Also, a short left tail is sometimes observed. This detail is more important than it seems at first glance; it can be of interest for the modeling of certain lifetime phenomena presenting low values before a numerical peak. Two concrete examples of this kind are studied in the applications part of this study (see Section 5). The corresponding hrf is revealed to be very flexible; in Figure 1 (ii), we see that it possesses increasing, decreasing and unimodal (upside-down bathtub) shapes, with varying degrees of flatness. The modeling qualities of the LELom distribution are thus illustrated.
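We end this part by presenting the qf of the LELom distribution. Based on Equation (2.3), it is given as
$$Q(u;\alpha,\rho,\theta)=\rho\left\{\left[1+\frac{\alpha}{\ln\left[1-\left(1-e^{-\alpha}\right)^{u}\right]}\right]^{-1/\theta}-1\right\},\quad u\in(0,1).$$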
The analytical expression of this function allows a direct quantile analysis of the LELom distribution, including the expressions of the quantile density function, hazard quantile function, median and the two other quartiles, quantile asymmetry measures and quantile flatness measures. The importance of such a manageable quantile function in survival studies is discussed in [25] and [31]. This is an additional advantage of the LELom distribution.
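Because the qf is available in closed form, quantiles and random samples can be obtained by direct inversion. The following sketch assumes the Lomax parametrization $G(x;\rho,\theta)=1-(1+x/\rho)^{-\theta}$; the function names and the parameter values are hypothetical choices made for illustration only.

```python
import math
import random

def lelom_cdf(x, alpha, rho, theta):
    """LELom cdf under the assumed Lomax baseline."""
    if x <= 0.0:
        return 0.0
    G = 1.0 - (1.0 + x / rho) ** (-theta)
    if G <= 0.0:                       # x too small to be resolved in floats
        return 0.0
    return math.log1p(-math.exp(-alpha / G)) / math.log1p(-math.exp(-alpha))

def lelom_qf(u, alpha, rho, theta):
    """LELom quantile function for u in [0, 1)."""
    if u <= 0.0:
        return 0.0
    c = math.log1p(-math.exp(-alpha))            # ln(1 - e^{-alpha})
    p = -alpha / math.log(-math.expm1(u * c))    # baseline probability level
    if p >= 1.0:                                 # guard against rounding at u ~ 1
        return math.inf
    return rho * ((1.0 - p) ** (-1.0 / theta) - 1.0)

def lelom_sample(n, alpha, rho, theta, rng):
    """Inverse-transform sampling: apply the qf to uniform draws."""
    return [lelom_qf(rng.random(), alpha, rho, theta) for _ in range(n)]

params = (1.2, 0.5, 0.8)                         # (alpha, rho, theta), illustrative
quartiles = [lelom_qf(u, *params) for u in (0.25, 0.5, 0.75)]
draws = lelom_sample(5, *params, random.Random(0))
print(quartiles, draws)
```

The same `lelom_qf` function gives the median and the other quartiles, and therefore any quantile-based asymmetry or flatness measure.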
3. Some theoretical properties
This section is devoted to some theoretical properties of the LE-G class, with applications to the LELom distribution at all stages.
3.1. Functional equivalences
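The functional equivalences of the cdf, pdf and hrf of the LE-G class are now discussed, with the cases $G(x;\omega)\to 0$ and $G(x;\omega)\to 1$ distinguished. First, when $G(x;\omega)\to 0$, a direct analysis gives
$$F(x;\alpha,\omega)\sim-\frac{1}{\ln\left(1-e^{-\alpha}\right)}e^{-\alpha G(x;\omega)^{-1}},\quad f(x;\alpha,\omega)\sim-\frac{\alpha}{\ln\left(1-e^{-\alpha}\right)}g(x;\omega)G(x;\omega)^{-2}e^{-\alpha G(x;\omega)^{-1}}.$$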
This last equivalence function also holds for h(x;α,ω). We see that α affects the decay rate of the exponential term for the pdf and hrf; it plays an important role in this regard.
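For the complementary case $G(x;\omega)\to 1$, by putting $S_G(x;\omega)=1-G(x;\omega)$ and using standard equivalence results, we get
$$1-F(x;\alpha,\omega)\sim-\frac{\alpha}{\left(e^{\alpha}-1\right)\ln\left(1-e^{-\alpha}\right)}S_G(x;\omega),\quad f(x;\alpha,\omega)\sim-\frac{\alpha}{\left(e^{\alpha}-1\right)\ln\left(1-e^{-\alpha}\right)}g(x;\omega),$$
and $h(x;\alpha,\omega)\sim h(x;\omega)$, where $h(x;\omega)=g(x;\omega)/S_G(x;\omega)$ is the hrf associated with $G(x;\omega)$. Here, the role of $\alpha$ is minor; it is involved in the proportionality constants only. However, these constants can be substantial for a fixed $x$ such that $G(x;\omega)\approx 1$.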
As an example of application, if we focus on the LELom distribution, for x→0, we get
and
Thus, in all circumstances, $f(x;\alpha,\rho,\theta)$ tends to 0 when $x$ tends to 0, with an exponential decay rate. Note that, in this case, $\alpha[1-(1+x/\rho)^{-\theta}]\sim\alpha\theta x/\rho$, meaning that all the parameters affect this convergence in the exponential sense. As already stated, the asymptotic behavior of $h(x;\alpha,\rho,\theta)$ corresponds to that of $f(x;\alpha,\rho,\theta)$. On the other hand, when $x\to+\infty$, by invoking the standard equivalence results, we get
and
Thus, in all cases, $f(x;\alpha,\rho,\theta)$ tends to 0 with a polynomial decay rate when $x$ tends to $+\infty$, more or less rapidly, depending on $\theta$. Also, by this equivalence and the Riemann integrability criterion, we have $\int_0^{+\infty}e^{tx}f(x;\alpha,\rho,\theta)dx=+\infty$ for any $t>0$. Therefore, the LELom distribution is a true heavy-tailed distribution in the mathematical sense. Concerning the hrf, when $x\to+\infty$, we have $h(x;\alpha,\rho,\theta)\sim\theta x^{-1}\to 0$. Thus, only the parameter $\theta$ affects the equivalence function as a proportionality constant.
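These boundary behaviors can be checked numerically. The sketch below is an illustration under the assumed Lomax parametrization $G(x;\rho,\theta)=1-(1+x/\rho)^{-\theta}$; the helper names and parameter values are hypothetical. It evaluates $x\,h(x)$ for large $x$, which should approach $\theta$, and the ratio $f/g$, which should approach the constant $-\alpha/[(e^{\alpha}-1)\ln(1-e^{-\alpha})]$ obtained by letting $G\to 1$ in the pdf.

```python
import math

def lom_cdf(x, rho, theta):
    return 1.0 - (1.0 + x / rho) ** (-theta)

def lom_pdf(x, rho, theta):
    return theta / rho * (1.0 + x / rho) ** (-theta - 1.0)

def lelom_pdf(x, alpha, rho, theta):
    G, g = lom_cdf(x, rho, theta), lom_pdf(x, rho, theta)
    e = math.exp(-alpha / G)                    # underflows to 0.0 for tiny x
    return -alpha * g * e / (G * G * (1.0 - e) * math.log1p(-math.exp(-alpha)))

def lelom_hrf(x, alpha, rho, theta):
    G, g = lom_cdf(x, rho, theta), lom_pdf(x, rho, theta)
    e = math.exp(-alpha / G)
    # ln(1 - e^{-alpha/G}) - ln(1 - e^{-alpha}) > 0 for G < 1
    diff = math.log1p(-e) - math.log1p(-math.exp(-alpha))
    return alpha * g * e / (G * G * (1.0 - e) * diff)

alpha, rho, theta = 1.0, 2.0, 1.5               # illustrative values
tail_const = -alpha / (math.expm1(alpha) * math.log1p(-math.exp(-alpha)))

for x in (1e2, 1e3, 1e4):
    ratio = lelom_pdf(x, alpha, rho, theta) / lom_pdf(x, rho, theta)
    print(x, x * lelom_hrf(x, alpha, rho, theta), ratio, tail_const)

# Left boundary: the pdf is negligible close to 0 (exponential decay).
print(lelom_pdf(1e-3, alpha, rho, theta))
```

The upper range of `x` is kept moderate on purpose: for extremely large `x` the difference of logarithms in `lelom_hrf` suffers from cancellation in floating-point arithmetic.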
3.2. Mode analysis
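The mode analysis of a distribution consists in identifying the mode(s) or maximal point(s) of the corresponding pdf. For the LE-G class, a maximal point for $f(x;\alpha,\omega)$ can be determined via the study of $df(x;\alpha,\omega)/dx$, or more simply, the following derivative, where $g'(x;\omega)=dg(x;\omega)/dx$:
$$\frac{d\ln[f(x;\alpha,\omega)]}{dx}=\frac{g'(x;\omega)}{g(x;\omega)}-\frac{2g(x;\omega)}{G(x;\omega)}+\frac{\alpha g(x;\omega)}{G(x;\omega)^{2}\left(1-e^{-\alpha G(x;\omega)^{-1}}\right)}.$$
Thus, a maximal point of $f(x;\alpha,\omega)$ satisfies the following non-linear equation: $d\ln[f(x;\alpha,\omega)]/dx=0$, corresponding to
$$\frac{g'(x;\omega)}{g(x;\omega)}-\frac{2g(x;\omega)}{G(x;\omega)}+\frac{\alpha g(x;\omega)}{G(x;\omega)^{2}\left(1-e^{-\alpha G(x;\omega)^{-1}}\right)}=0.$$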
If only one maximum is obtained, the LE-G distribution is unimodal; if more than one is obtained, it is multimodal. The outcome depends on the definition of the reference distribution. If the above equations are too complicated from the analytical point of view, numerical or graphical approaches are more convenient for the mode analysis.
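Concerning the LELom distribution, this aspect has already been commented on; it is mainly unimodal, as can be observed in Figure 1, and the mode satisfies the following non-linear equation:
$$-\frac{\theta+1}{\rho+x}+\frac{\theta(1+x/\rho)^{-\theta-1}}{\rho\left[1-(1+x/\rho)^{-\theta}\right]}\left\{\frac{\alpha}{\left[1-(1+x/\rho)^{-\theta}\right]\left(1-e^{-\alpha\left[1-(1+x/\rho)^{-\theta}\right]^{-1}}\right)}-2\right\}=0.$$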
This equation has no closed-form solution and must be solved numerically. Mathematica, Python or R can be used in this regard.
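As an illustration of the numerical route, the mode can be located by maximizing the log-pdf directly instead of solving the non-linear equation. The sketch below uses a hand-written golden-section search, which is a swapped-in generic technique and not a method prescribed by the study; it assumes that the pdf is unimodal on the search interval and uses the Lomax parametrization $G(x;\rho,\theta)=1-(1+x/\rho)^{-\theta}$. All names, bounds and parameter values are hypothetical.

```python
import math

def lelom_logpdf(x, alpha, rho, theta):
    """Log of the LELom pdf for x > 0 (written in log form to avoid underflow)."""
    z = 1.0 + x / rho
    G = 1.0 - z ** (-theta)
    log_g = math.log(theta / rho) - (theta + 1.0) * math.log(z)
    c = math.log1p(-math.exp(-alpha))           # ln(1 - e^{-alpha}) < 0
    return (math.log(alpha) + log_g - alpha / G - 2.0 * math.log(G)
            - math.log1p(-math.exp(-alpha / G)) - math.log(-c))

def golden_max(func, lo, hi, tol=1e-10):
    """Golden-section search for the maximizer of a unimodal function."""
    invphi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - invphi * (b - a)
    d = a + invphi * (b - a)
    fc, fd = func(c), func(d)
    while b - a > tol:
        if fc > fd:                             # maximizer lies in [a, d]
            b, d, fd = d, c, fc
            c = b - invphi * (b - a)
            fc = func(c)
        else:                                   # maximizer lies in [c, b]
            a, c, fc = c, d, fd
            d = a + invphi * (b - a)
            fd = func(d)
    return 0.5 * (a + b)

def lelom_mode(alpha, rho, theta, lo=1e-6, hi=50.0):
    return golden_max(lambda x: lelom_logpdf(x, alpha, rho, theta), lo, hi)

mode = lelom_mode(1.2, 0.5, 0.8)                # illustrative parameter values
print(mode, math.exp(lelom_logpdf(mode, 1.2, 0.5, 0.8)))
```

If the search interval does not bracket the mode, or if the pdf were not unimodal for some parameter values, the result would have to be confirmed graphically.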
3.3. Stochastic ordering
The following proposition demonstrates an important first order stochastic dominance result between the LE-G and OIE-G classes.
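Proposition 1. The LE-G class first order stochastically dominates the OIE-G class. That is, we have
$$F(x;\alpha,\omega)\le F_*(x;\alpha,\omega),\quad x\in\mathbb{R},$$
where $F_*(x;\alpha,\omega)=e^{-\alpha\left(G(x;\omega)^{-1}-1\right)}$.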
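Proof. First, let us set $q(x)=\ln(1-x)/x$ for $x\in(0,1)$. Then, its derivative with respect to $x$ is $dq(x)/dx=-[\ln(1-x)+x/(1-x)]/x^2$, which is negative since $\ln(1-y)\ge -y/(1-y)$ for $y<1$, the equality being achieved for $y=0$ only. Therefore, $q(x)$ is a decreasing function. Thus, for any $(u,v)\in(0,1)^2$ with $u\le v$, we have $q(v)\le q(u)$, and, since $\ln(1-v)<0$, it is equivalent to
$$\frac{\ln(1-u)}{\ln(1-v)}\le\frac{u}{v}.$$
By applying this inequality with $u=e^{-\alpha G(x;\omega)^{-1}}$ and $v=e^{-\alpha}$ satisfying $u\le v$ with $(u,v)\in(0,1)^2$, we get
$$F(x;\alpha,\omega)=\frac{\ln\left(1-e^{-\alpha G(x;\omega)^{-1}}\right)}{\ln\left(1-e^{-\alpha}\right)}\le e^{-\alpha\left(G(x;\omega)^{-1}-1\right)}=F_*(x;\alpha,\omega).$$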
This ends the proof of Proposition 1.
Proposition 1 proves that the OIE-G class is not only a "limit case" of the LE-G class; a well-identified stochastic dominance exists. Roughly speaking, the LE-G models can achieve certain objectives that the OIE-G models cannot, and vice versa. It also motivates the LE-G models as alternatives to the OIE-G models.
In addition, the LE-G class satisfies the following monotone likelihood ratio property.
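Proposition 2. For $\alpha_1\le\alpha_2$, the following ratio function
$$\kappa(x;\alpha_1,\alpha_2,\omega)=\frac{f(x;\alpha_1,\omega)}{f(x;\alpha_2,\omega)}$$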
is decreasing with respect to x.
Proof. First, after some simplifications, we get
Therefore
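Now, let us study the positivity of the numerator term. Let us consider the following intermediary function: $q(\alpha)=\alpha e^{\alpha y^{-1}}/[e^{\alpha y^{-1}}-1]$ with $y=G(x;\omega)$. Then, we have
$$\frac{dq(\alpha)}{d\alpha}=\frac{e^{\alpha y^{-1}}\left[y\left(e^{\alpha y^{-1}}-1\right)-\alpha\right]}{y\left(e^{\alpha y^{-1}}-1\right)^{2}}.$$
By the general inequality $e^z\ge 1+z$ for any $z\in\mathbb{R}$, we get $y(e^{\alpha y^{-1}}-1)-\alpha\ge y\alpha y^{-1}-\alpha=0$. Therefore, $dq(\alpha)/d\alpha\ge 0$, implying that $q(\alpha)$ is an increasing function with respect to $\alpha$. Therefore, $\alpha_1\le\alpha_2$ implies that $q(\alpha_1)\le q(\alpha_2)$, which is equivalent to
$$\frac{\alpha_1e^{\alpha_1G(x;\omega)^{-1}}}{e^{\alpha_1G(x;\omega)^{-1}}-1}\le\frac{\alpha_2e^{\alpha_2G(x;\omega)^{-1}}}{e^{\alpha_2G(x;\omega)^{-1}}-1}.$$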
It follows that dκ(x;α1,α2,ω)/dx≤0, and thus κ(x;α1,α2,ω) is a decreasing function with respect to x. This ends the proof of Proposition 2.
The demonstrated stochastic ordering properties imply many other properties described in [37]. We refer the interested reader to this reference for more information in this direction.
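Both propositions can be sanity-checked numerically on a grid of baseline values $y=G(x;\omega)\in(0,1)$, since the reference distribution enters only through $y$ (the factor $g(x;\omega)$ cancels in the likelihood ratio). The code below is a hedged illustration with hypothetical function names and arbitrary values of $\alpha$; it is a numerical check, not a substitute for the proofs.

```python
import math

def le_T(y, alpha):
    """LE transformation T_alpha(y), i.e. the LE-G cdf at G = y."""
    return math.log1p(-math.exp(-alpha / y)) / math.log1p(-math.exp(-alpha))

def oie_T(y, alpha):
    """OIE-G cdf at G = y: exp(-alpha (1/y - 1))."""
    return math.exp(-alpha * (1.0 / y - 1.0))

def likelihood_ratio(y, a1, a2):
    """f(x; a1) / f(x; a2) expressed through y = G(x); g(x) cancels out."""
    c1 = math.log1p(-math.exp(-a1))
    c2 = math.log1p(-math.exp(-a2))
    return (a1 * c2 * math.expm1(a2 / y)) / (a2 * c1 * math.expm1(a1 / y))

grid = [i / 100.0 for i in range(1, 100)]

# Proposition 1: the LE-G cdf never exceeds the OIE-G cdf.
dominance_ok = all(le_T(y, 2.0) <= oie_T(y, 2.0) + 1e-12 for y in grid)

# Proposition 2: for a1 <= a2 the ratio decreases as y (hence x) increases.
ratios = [likelihood_ratio(y, 0.5, 2.0) for y in grid]
monotone_ok = all(ratios[i] > ratios[i + 1] for i in range(len(ratios) - 1))

print(dominance_ok, monotone_ok)
```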
3.4. Functional expansions
The following expansion investigates a linear relation between F(x;α,ω) and basic functions of the reference distribution.
Proposition 3. The following expansion holds:
where
$\binom{-\ell}{m}$ refers to the general binomial coefficient at $m$ and $-\ell$, and $S_G(x;\omega)=1-G(x;\omega)$ defines the survival function associated with $G(x;\omega)$.
Proof. Based on the logarithmic and exponential series expansions, and the generalized binomial theorem, we have
Proposition 3 is proved.
Proposition 3 applied to the LELom distribution gives the following simple expansion:
The following expansion investigates a linear relation between exponentiated f(x;α,ω) and functions of the reference distribution.
Proposition 4. Let δ>0. Then, the following expansion holds:
where
and
Proof. We have
The general binomial theorem is the key of the proof. By applying it three times in a row, it comes
Proposition 4 is proved.
Notice that
(i) A direct application of Proposition 4 with δ=1 gives the following expansion for the pdf of the LE-G class:
where $v^{[1]}_m(x;\omega)=g(x;\omega)S_G(x;\omega)^m$ corresponds, when multiplied by the constant $m+1$, to the pdf of the minimum of $m+1$ independent random variables with the reference distribution as common distribution. Thus, it is well identified for most of the standard distributions of the literature.
In particular, in the context of the LELom distribution, we have
where
understanding that $v^{[1]}_m(x;\rho,\theta)=0$ for $x\le 0$. One can notice that $v^{[1]}_m(x;\rho,\theta)=g(x;\rho,\theta(m+1))/(m+1)$, meaning that $f(x;\alpha,\rho,\theta)$ is expressed as an infinite linear combination of pdfs of the Lom distribution.
(ii) Proposition 4 applied to the LELom distribution gives the following simple expansion for $f(x;\alpha,\rho,\theta)^\delta$:
where
understanding that v[δ]m(x;ρ,θ)=0 for x≤0.
These expansion results are useful to obtain manageable analytical expressions of various measures of importance, such as different types of moment, entropy and reliability measures. Some of them will be the subject of further development in the next subsections.
3.5. Moment measures
Now, let us introduce a random variable $X$ whose distribution belongs to the LE-G class, and let $E$ denote the expectation operator. Then, if the $r$th order moment about the origin of $X$ converges in the mathematical sense, it is given by
$$\mu'_r=E(X^r)=\int_{-\infty}^{+\infty}x^rf(x;\alpha,\omega)dx.$$
The chances of finding a simple expression for this integral are slim. For the evaluation of μ′r, two standard solutions can be studied: (i) a numerical approach by applying approximation integral procedure and (ii) an expansion of μ′r involving manageable coefficients. This last approach requires the result in Proposition 4 applied with δ=1; subject to the mathematical validity on the interchanges of the sums and integral signs, we have
where $\xi^{[r]}_m=\int_{-\infty}^{+\infty}x^rv^{[1]}_m(x;\omega)dx$. Since $v^{[1]}_m(x;\omega)$ is a well identified function for most of the standard distributions, $\xi^{[r]}_m$ is often easily calculable via standard integration techniques or can be extracted from published works. That is, $\mu'_r$ can be obtained by summing well-identified coefficients. An approximation of $\mu'_r$ follows by replacing the infinite bound by a large integer; in most practical situations, truncating at 40 is satisfactory.
Let us now study the moments about the origin of a random variable following the LELom distribution, starting by establishing their convergence. First, at the neighborhood of $x=0$, we recall that the following function equivalence holds: $f(x;\alpha,\rho,\theta)\sim\{-\alpha\rho/[\theta\ln(1-e^{-\alpha})]\}x^{-2}e^{-\alpha[1-(1+x/\rho)^{-\theta}]^{-1}}$, implying that $x^rf(x;\alpha,\rho,\theta)=o(x^{-1/2})$ for any $r\ge 0$ and, by the Riemann integrability criterion, the integral of $x^rf(x;\alpha,\rho,\theta)$ over $(0,\epsilon)$ with $\epsilon>0$ converges. At the neighborhood of $x=+\infty$, we recall that $f(x;\alpha,\rho,\theta)\sim\{-\alpha\theta\rho^{\theta}/[\rho(e^{\alpha}-1)\ln(1-e^{-\alpha})]\}x^{-\theta-1}$ and, by the Riemann integrability criterion, the integral of $x^{r-\theta-1}$ over $(\epsilon,+\infty)$ with $\epsilon>0$ converges if and only if $r<\theta$. Hence, the condition $r<\theta$ is necessary to guarantee the convergence of $\mu'_r$; the LELom distribution does not admit moments about the origin of all orders. Thus, under the condition $r<\theta$, numerical approximations of $\mu'_r$ make sense. Also, for an evaluation of $\mu'_r$, the complementary expansion approach described in Eq (3.3) gives
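where, by introducing the beta function defined by $B(a,b)=\int_0^1t^{a-1}(1-t)^{b-1}dt$ with $a>0$ and $b>0$,
$$\xi^{[r]}_m=\int_0^{+\infty}x^rv^{[1]}_m(x;\rho,\theta)dx=\rho^r\theta B(r+1,\theta(m+1)-r).$$
We obtain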
As sketched previously, a precise approximation of μ′r follows by replacing the infinite bound with 40. As an example of numerical work, some values for the mean and variance of X are collected in Table 2. In Table 2, we see some monotonic behavior of the mean and variance of X, being more or less small for the considered values.
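The Lomax building block of the expansion approach can be evaluated with the beta function. The sketch below computes $\int_0^{+\infty}x^rg(x;\rho,\theta)S_G(x;\rho,\theta)^mdx=\rho^r\theta B(r+1,\theta(m+1)-r)$, a closed form derived here from the assumed Lomax parametrization $G(x;\rho,\theta)=1-(1+x/\rho)^{-\theta}$ via the substitution $t=x/\rho$. The function names are hypothetical, and the coefficients of the series are not reproduced, so this is only one ingredient of the truncated sum.

```python
import math

def beta_fn(a, b):
    """Beta function B(a, b) computed through log-gamma for stability."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

def lomax_block_moment(r, m, rho, theta):
    """Integral of x^r g(x) S_G(x)^m over (0, inf) for the Lomax baseline.

    Finite only when r < theta * (m + 1).
    """
    if r >= theta * (m + 1):
        raise ValueError("moment diverges: need r < theta * (m + 1)")
    return rho ** r * theta * beta_fn(r + 1.0, theta * (m + 1.0) - r)

# With m = 0 the block reduces to the ordinary Lomax moments, for instance
# the mean rho / (theta - 1) when theta > 1.
rho, theta = 2.0, 3.0
print(lomax_block_moment(1, 0, rho, theta), rho / (theta - 1.0))
print(lomax_block_moment(2, 0, rho, theta),
      2.0 * rho ** 2 / ((theta - 1.0) * (theta - 2.0)))
```

The explicit divergence check mirrors the convergence condition discussed above: since $\theta(m+1)\ge\theta$, every block is finite as soon as $r<\theta$.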
In order to complete the study above, one can mention that the moment generating function of $X$ is specified by
$$M(t)=E\left(e^{tX}\right)=\int_{-\infty}^{+\infty}e^{tx}f(x;\alpha,\omega)dx,$$
with $t\in\mathbb{R}$ such that the integral converges. Subject to the mathematical validity of the interchanges of the sum and integral signs, the consequence of Proposition 4 with $\delta=1$ is
where $\upsilon_m(t)=\int_{-\infty}^{+\infty}e^{tx}v^{[1]}_m(x;\omega)dx$. This last term is already available in the literature for a wide panel of parent distributions. If we deal with the parent Lom distribution as in the LELom distribution, the moment generating function exists at least for $t\le 0$. It is however quite complex, but two analytical results involving special functions are provided in [29, Section 5].
Alternatively, if $X$ admits moments of all orders, which is not the case for the LELom distribution, then we can use the following formula:
$$M(t)=\sum_{r=0}^{+\infty}\frac{t^r}{r!}\mu'_r.$$
The characteristic function can be determined through the following relationship: $\varphi(t)=M(it)$, where $i^2=-1$.
In addition, the incomplete version of the rth moment about the origin of X can be treated in a similar way. From it, one can define various measures of deviations, and diverse types of residual and reversed residual lifetime functions. Further developments on this topic can be found in [23].
3.6. Information measures
In order to measure the variation or uncertainty of distributions belonging to the LE-G class, several information measures can be used. The most well-known are described in [11]. Here, we focus our attention on the Rényi and Tsallis entropy measures introduced by [36,39], respectively. First, if the Rényi entropy measure of the LE-G class converges in the mathematical sense, it can be defined as
$$I^{[p]}_R=\frac{1}{1-p}\ln\left[\int_{-\infty}^{+\infty}f(x;\alpha,\omega)^pdx\right],$$
with p>0 and p≠1. By arguing similarly as for moments about the origin, the integral term can be evaluated numerically by any integral approximation procedure. A more direct approach, which is "exact" in the mathematical sense, consists of giving a simple expansion of I[p]R. Owing to Proposition 4, subject to the mathematical validity on the interchanges of the sums and integral signs, we have
where $\phi^{[p]}_m=\int_{-\infty}^{+\infty}v^{[p]}_m(x;\omega)dx$, which is quite manageable for most of the standard distributions.
In the setting of the LELom distribution, let us discuss the convergence of $I^{[p]}_R$, which reduces to the convergence of the integral term. First, at the neighborhood of $x=0$, we have $f(x;\alpha,\rho,\theta)^p\sim\{-\alpha\rho/[\theta\ln(1-e^{-\alpha})]\}^px^{-2p}e^{-\alpha p[1-(1+x/\rho)^{-\theta}]^{-1}}$, implying that $f(x;\alpha,\rho,\theta)^p=o(x^{-1/2})$. By the Riemann integrability criterion, the integral of $f(x;\alpha,\rho,\theta)^p$ over $(0,\epsilon)$ with $\epsilon>0$ converges. At the neighborhood of $x=+\infty$, we have $f(x;\alpha,\rho,\theta)^p\sim\{-\alpha\theta\rho^{\theta}/[\rho(e^{\alpha}-1)\ln(1-e^{-\alpha})]\}^px^{-p(\theta+1)}$ and, by the Riemann integrability criterion, the integral of $x^{-p(\theta+1)}$ over $(\epsilon,+\infty)$ with $\epsilon>0$ converges if and only if $p(\theta+1)>1$. Therefore, we must have $p(\theta+1)>1$ to ensure the convergence of $I^{[p]}_R$. Under this condition, numerical approximations of $I^{[p]}_R$ are possible. Also, owing to Equation (3.4), one can write
where
Therefore, the following formula is valid:
as well as the following practical approximation:
Similarly, the Tsallis entropy measure has an integral of the form $\int_{-\infty}^{+\infty}f(x;\alpha,\omega)^qdx$ as main term; it can be defined as
$$I^{[q]}_T=\frac{1}{q-1}\left[1-\int_{-\infty}^{+\infty}f(x;\alpha,\omega)^qdx\right],$$
where $q>0$ and $q\ne 1$. A mathematical treatment similar to the one applied to $I^{[p]}_R$ can be performed. In particular, for the LELom distribution, $I^{[q]}_T$ converges if and only if $q(\theta+1)>1$. Moreover, the following result holds:
As an example of numerical work, some values for the above entropy measures are presented in Table 3.
From Table 3, we see that the amount of uncertainty of the LELom distribution is versatile, reaching small or large values. This aspect also reveals the statistical richness and complexity of the LELom distribution.
3.7. A crucial reliability measure
The stress-strength model concerns the service life of a component exposed to a certain stress and having a certain strength (or resistance). The stress and the strength are modeled by two random variables, say $Y$ and $X$, respectively. It is assumed that if the stress exceeds the strength, the component fails. The reliability of such a random system is expressed by the following measure: $R=P(Y<X)$. Here, we investigate the expression of $R$ in the context of the LE-G class. More precisely, we suppose that $X$ and $Y$ are independent variables from the LE-G class in the following scenario: $X$ has the cdf $F(x;\alpha_1,\omega)$ and pdf $f(x;\alpha_1,\omega)$, and $Y$ has the cdf $F(x;\alpha_2,\omega)$. Thus, the parameter $\alpha$ can differ from one distribution to the other, but the reference distribution is the same. In this setting, we have
$$R=\int_{-\infty}^{+\infty}F(x;\alpha_2,\omega)f(x;\alpha_1,\omega)dx.$$
The obtained integral remains manageable; its numerical computation is quite possible. Alternatively, by virtue of Propositions 3 and 4, subject to the mathematical validity on the interchanges of the sums and integral signs, we obtain
where $\beta^{(\alpha_2)}_{k,\ell,m}$ is defined by Eq (3.1) with $\alpha=\alpha_2$, $\tau^{[1];(\alpha_1)}_{k',\ell',m'}$ is defined by Eq (3.2) with $\delta=1$ and $\alpha=\alpha_1$, and
Therefore, we have
Thus, it can be approximated via standard computation techniques. Discussions and applications on the role of R in various settings can be found in [18].
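For a direct numerical evaluation, the change of variable $y=G(x;\omega)$ turns $R=\int F(x;\alpha_2,\omega)f(x;\alpha_1,\omega)dx$ into an integral over $(0,1)$ that no longer involves the reference distribution. The sketch below is an illustration of this remark and not the series approach of the text; it applies a plain midpoint rule, and the function names and the number of nodes are hypothetical choices.

```python
import math

def le_T(y, alpha):
    """LE transformation T_alpha(y) (the LE-G cdf as a function of y = G)."""
    return math.log1p(-math.exp(-alpha / y)) / math.log1p(-math.exp(-alpha))

def le_t(y, alpha):
    """Derivative of T_alpha with respect to y."""
    e = math.exp(-alpha / y)
    return -alpha * e / (y * y * (1.0 - e) * math.log1p(-math.exp(-alpha)))

def stress_strength(alpha1, alpha2, n=20000):
    """R = P(Y < X) with X ~ LE-G(alpha1) and Y ~ LE-G(alpha2), same baseline.

    Midpoint rule for the integral of T_{alpha2}(y) * t_{alpha1}(y) on (0, 1).
    """
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        y = (i + 0.5) * h
        total += le_T(y, alpha2) * le_t(y, alpha1)
    return total * h

print(stress_strength(1.0, 1.0))   # identical distributions: R should be 1/2
print(stress_strength(0.5, 2.0))
print(stress_strength(2.0, 0.5))
```

Two simple consistency checks are available: $R=1/2$ when $\alpha_1=\alpha_2$, and the values obtained by swapping $\alpha_1$ and $\alpha_2$ must sum to 1.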
4. Statistical approach
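An important objective of the LE-G class is to produce flexible parametric models for concrete problems dealing with data. Here, we present a common strategy to construct accurate LE-G models for data fitting. Let $x_1,\ldots,x_n$ be $n$ values supposed to be generated from a given LE-G distribution. Then, we can estimate the unknown parameters of the corresponding LE-G model by the maximum likelihood approach. Specifically, by denoting $(\alpha,\omega)$ the vector of unknown parameters, its maximum likelihood estimate (MLE) is given as
$$(\hat\alpha,\hat\omega)=\operatorname{argmax}_{(\alpha,\omega)}L(x_1,\ldots,x_n;\alpha,\omega),$$
where $L(x_1,\ldots,x_n;\alpha,\omega)$ denotes the likelihood function defined by
$$L(x_1,\ldots,x_n;\alpha,\omega)=\prod_{i=1}^nf(x_i;\alpha,\omega).$$
The components of $(\hat\alpha,\hat\omega)$ are called MLEs of those of $(\alpha,\omega)$, respectively. Subject to mathematical validity, one can find the MLEs through the maximization of the log-likelihood function defined by
$$\ell(x_1,\ldots,x_n;\alpha,\omega)=\sum_{i=1}^n\ln f(x_i;\alpha,\omega)=n\ln\alpha-n\ln\left[-\ln\left(1-e^{-\alpha}\right)\right]+\sum_{i=1}^n\ln g(x_i;\omega)-2\sum_{i=1}^n\ln G(x_i;\omega)-\alpha\sum_{i=1}^nG(x_i;\omega)^{-1}-\sum_{i=1}^n\ln\left(1-e^{-\alpha G(x_i;\omega)^{-1}}\right).$$
The desired MLEs are solutions of the following system of equations:
$$\frac{\partial}{\partial\alpha}\ell(x_1,\ldots,x_n;\alpha,\omega)=0$$
and, by denoting $\omega=(\omega_1,\ldots,\omega_p)$, for $j=1,\ldots,p$,
$$\frac{\partial}{\partial\omega_j}\ell(x_1,\ldots,x_n;\alpha,\omega)=0.$$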
For the great majority of the reference distributions, closed-form expressions of these solutions are not available. Therefore, the use of a numerical procedure is necessary to obtain an acceptable approximation of $(\hat\alpha,\hat\omega)$. For brevity, the existence and uniqueness of the MLEs are not proven here, but this is of course an essential aspect of the method. The established theory of the maximum likelihood approach applies; the construction of confidence intervals or likelihood ratio tests for the parameters of LE-G models is entirely possible. We may refer to [21] for the generalities.
The above theory holds for the LELom model, under the following configuration: ω=(ρ,θ), for x>0,
and
Then, precise numerical approximations of the MLEs of (α,ρ,θ) can be obtained. We illustrate this claim by conducting a Monte Carlo simulation study with respect to the sample size n. The R software is used (see [34]) and we proceed as follows:
1. We draw N=5000 samples of size n=100, 200, 300 and 1000 from the LELom distribution under the following sets of parameters: Set1 =(α=0.5,ρ=0.5,θ=0.5), Set2 =(α=1.5,ρ=0.5,θ=0.8), Set3 =(α=1.2,ρ=0.5,θ=0.5) and Set4 =(α=1.2,ρ=0.5,θ=0.8).
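2. We calculate the MLEs of $\alpha$, $\rho$ and $\theta$ for each of the $N$ samples, say $\hat\alpha_j$, $\hat\rho_j$ and $\hat\theta_j$, respectively, for $j=1,\ldots,N$.
3. We compute the average MLEs (AMLEs) and mean-squared errors (MSEs) for $c=\alpha$, $\rho$ and $\theta$ through the following formulas: $\mathrm{AMLE}(\hat c)=N^{-1}\sum_{j=1}^N\hat c_j$ and $\mathrm{MSE}(\hat c)=N^{-1}\sum_{j=1}^N(\hat c_j-c)^2$.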
4. The obtained numerical results are indicated in Table 4 for Sets 1 and 2, and Table 5 for Sets 3 and 4.
From Tables 4 and 5, the following comments can be formulated. As the value of n increases, the AMLEs tend to the corresponding true values of the parameters. The MSEs decrease to 0 as the value of n increases. These two observations are consistent with the theoretical first-order properties of the MLEs. Therefore, the LELom model combined with the maximum likelihood estimation method seems to be a good solution for fitting data with a heavy tail, and a possible short left tail, a claim that will be confirmed in the next section.
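The ingredients of the simulation steps above can be sketched as follows: sampling by inversion of the qf, evaluation of the LELom log-likelihood, and the AMLE and MSE summaries. The study itself uses R; this Python sketch is only an illustration under the assumed Lomax parametrization $G(x;\rho,\theta)=1-(1+x/\rho)^{-\theta}$, with hypothetical function names, and it deliberately leaves the numerical maximization to whichever optimizer is preferred.

```python
import math
import random

def lelom_logpdf(x, alpha, rho, theta):
    """Log of the LELom pdf for x > 0."""
    z = 1.0 + x / rho
    G = 1.0 - z ** (-theta)
    log_g = math.log(theta / rho) - (theta + 1.0) * math.log(z)
    c = math.log1p(-math.exp(-alpha))
    return (math.log(alpha) + log_g - alpha / G - 2.0 * math.log(G)
            - math.log1p(-math.exp(-alpha / G)) - math.log(-c))

def lelom_qf(u, alpha, rho, theta):
    """LELom quantile function for u in (0, 1)."""
    c = math.log1p(-math.exp(-alpha))
    p = -alpha / math.log(-math.expm1(u * c))
    return rho * ((1.0 - p) ** (-1.0 / theta) - 1.0)

def loglik(data, alpha, rho, theta):
    """Log-likelihood: sum of the log-pdf over the sample."""
    return sum(lelom_logpdf(x, alpha, rho, theta) for x in data)

def amle_mse(estimates, true_value):
    """Average estimate and mean-squared error over the replications."""
    n = len(estimates)
    amle = sum(estimates) / n
    mse = sum((e - true_value) ** 2 for e in estimates) / n
    return amle, mse

rng = random.Random(1)
true = (1.2, 0.5, 0.8)                           # Set4 of the simulation design
# 1 - rng.random() lies in (0, 1], which keeps the uniform draw away from 0.
sample = [lelom_qf(1.0 - rng.random(), *true) for _ in range(2000)]
sample = [x for x in sample if math.isfinite(x) and x > 0.0]

print(loglik(sample, *true))                     # near the maximum
print(loglik(sample, 1.2, 5.0, 0.8))             # badly misspecified rho
print(amle_mse([0.9, 1.1], 1.0))                 # toy replications
```

In a full study, `loglik` would be maximized for each of the N samples and the resulting estimates passed to `amle_mse` parameter by parameter.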
5. Applications
Here, the accuracy of the fit of the LELom model is compared to that of some competing models, namely the three-parameter power Lomax (PLom) model by [35], two-parameter odd inverse exponential Lomax (OInLom) model by [26], three-parameter exponential Lomax (ExLom) model by [24], and standard two-parameter Lomax (Lom) model.
Based on a given data set, the following fit measures are taken into account: Cramér-von Mises (W), Anderson-Darling (A) and Kolmogorov-Smirnov (KS) statistics with the p-value of the related KS statistical test. In addition, the following criteria based on the maximum likelihood approach are considered: Akaike information criterion (AIC) and Bayesian information criterion (BIC), both defined from the estimated log-likelihood function denoted as ˆℓ. For such a data set, the model with the smallest values for W, A, KS, AIC, and BIC, and the largest p-value is considered the best model.
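For reference, the likelihood-based criteria and the KS statistic can be computed as in the sketch below. It uses the standard definitions $\mathrm{AIC}=2k-2\hat\ell$ and $\mathrm{BIC}=k\ln(n)-2\hat\ell$, with $k$ the number of parameters and $n$ the sample size. The function names are hypothetical, and the log-likelihood value in the example is an assumption: it is back-computed from the AIC reported later for the LELom model on the second data set ($k=3$, $n=59$), not quoted from the tables.

```python
import math

def aic(loglik_hat, k):
    """Akaike information criterion."""
    return 2.0 * k - 2.0 * loglik_hat

def bic(loglik_hat, k, n):
    """Bayesian information criterion."""
    return k * math.log(n) - 2.0 * loglik_hat

def ks_statistic(data, cdf):
    """Kolmogorov-Smirnov distance between the empirical cdf and a fitted cdf."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        F = cdf(x)
        d = max(d, i / n - F, F - (i - 1) / n)
    return d

# Assumed value: recovered from AIC = 390.7361 with k = 3 parameters.
loglik_hat = -(390.7361 - 2.0 * 3) / 2.0
print(aic(loglik_hat, 3), bic(loglik_hat, 3, 59))

# Toy check of the KS distance against the uniform cdf on (0, 1).
print(ks_statistic([0.25, 0.5, 0.75], lambda x: min(max(x, 0.0), 1.0)))
```

The BIC obtained this way agrees to four decimals with the value 396.9687 reported for the same model, which supports the back-computation.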
The results are obtained using the R software.
The first data set was originally reported by [40]. It represents the Maximum Annual Flood Discharges (MAFD) of the North Saskatchewan River over a period of 47 years. It contains 47 observations, one per year, and the unit is 1000 cubic feet per second. The data set is given as {19.885, 20.940, 21.820, 23.700, 24.888, 25.460, 25.760, 26.720, 27.500, 28.100, 28.600, 30.200, 30.380, 31.500, 32.600, 32.680, 34.400, 35.347, 35.700, 38.100, 39.020, 39.200, 40.000, 40.400, 40.400, 42.250, 44.020, 44.730, 44.900, 46.300, 50.330, 51.442, 57.220, 58.700, 58.800, 61.200, 61.740, 65.440, 65.597, 66.000, 74.100, 75.800, 84.100, 106.600, 109.700, 121.970, 121.970, 185.560}.
The second data set includes the actual monthly tax revenue in Egypt from January 2006 to November 2010. It was originally provided by [30]. The data set is presented as {5.9, 20.4, 14.9, 16.2, 17.2, 7.8, 6.1, 9.2, 10.2, 9.6, 13.3, 8.5, 21.6, 18.5, 5.1, 6.7, 17.0, 8.6, 9.7, 39.2, 35.7, 15.7, 9.7, 10.0, 4.1, 36.0, 8.5, 8.0, 9.2, 26.2, 21.9, 16.7, 21.3, 35.4, 14.3, 8.5, 10.6, 19.1, 20.5, 7.1, 7.7, 18.1, 16.5, 11.9, 7.0, 8.6, 12.5, 10.3, 11.2, 6.1, 8.4, 11.0, 11.6, 11.9, 5.2, 6.8, 8.9, 7.1, 10.8}.
Let us begin with the analysis of the first data set. The MLEs related to the considered models are indicated in Table 6, along with their related standard errors (SEs).
In order to measure the accuracy of the fits of the models, Table 7 gives the values of −ˆℓ, AIC, BIC, W, A and KS (statistic and p-value) for the considered models.
Based on the results in Table 7, we see that the LELom model is the best with AIC =439.1414, BIC =444.7550, W =0.0280, A =0.1990, KS =0.0793 and the p-value of the KS test given as p-value =0.9234. We can notice the proximity of this last value to its maximal value of 1. The second best model is the OInLom model. We illustrate this in Figure 2 by plotting the estimated pdfs and cdfs of the models over the histogram and empirical cdf, respectively.
Thanks to its short left tail and peakedness properties, the estimated functions of the LELom model capture the characteristics of the data well, contrary to the other models.
Figure 3 completes the previous study by providing the probability-probability (P-P) and quantile-quantile (Q-Q) plots of the LELom model.
This figure shows that the black lines in the P-P and Q-Q plots closely follow the corresponding scatter plots, demonstrating the good fit of the LELom model to the data.
We now use our methodology for the second data set. The MLEs related to the considered models are indicated in Table 8, along with their related SEs.
Table 9 provides the values of AIC, BIC, W, A and KS (statistic and p-value) for the considered models.
From Table 9, it is noted that the best model is the LELom model with AIC =390.7361, BIC =396.9687, W =0.0433, A =0.2617, KS =0.0823 and p-value =0.8189. The second best model is the OInLom model. However, it remains far from the performance of the LELom model on the basis of the benchmarks considered.
The main estimated functions of the models over the corresponding empirical objects are presented in Figure 4.
We see that the small left tail and peak properties of the LELom model made the difference with the fits of the other models.
Figure 5 concludes the previous study by showing the P-P and Q-Q plots of the LELom model.
The black lines in the P-P and Q-Q plots adjust the scatter plots nicely, demonstrating the good fit of the LELom model to the data.
In view of the above results, we can recommend the LELom model for the analysis of data similar to those considered.
6. Conclusions
In this paper, we have motivated a modeling strategy based on a class of distributions defined from an original and simple logarithmically-exponential transformation. A new flexible modification of the Lomax model with three parameters is derived, allowing the fit of data having a heavy right tail and a possible short left tail. The proposed class is the subject of in-depth work on its probabilistic and statistical properties, exploring the functional equivalences, mode analysis, stochastic ordering, functional expansions, moment measures, information measures and reliability measures. The new Lomax model is then applied to two real-life data sets, one related to the environment and the other to finance. It is shown to provide a very good fit, outperforming some existing models also based on the reference Lomax model, and with a comparable number of parameters. It is understood that only a special distribution of the new class has been applied; other distributions with different features can be used quite efficiently for applications in medicine, economics, insurance, reliability, and life testing, among others.
Conflict of interest
The authors declare that the publication of this paper does not involve any conflicts of interest.
Acknowledgments
The authors are grateful to the reviewers for their constructive comments.
This work was supported by King Saud University (KSU). The first author, therefore, gratefully acknowledges the KSU for technical and financial support.
This project is supported by Researchers Supporting Project number (RSP-2020/156) King Saud University, Riyadh, Saudi Arabia.