
An extended interval is a range A=[A_,¯A] where A_ may be bigger than ¯A. This is not really natural, but it is what has been used as the definition of an extended interval so far. In the present work we introduce a new, natural, and very intuitive way to see an extended interval. From now on, an extended interval is a subset of the Cartesian product R×Z2, where Z2={0,1} is the set of directions; the direction 0 is for increasing intervals, and the direction 1 for decreasing ones. For instance, [3,6]×{1} is the decreasing version of [6,3]. Thereafter, we introduce on the set of extended intervals a family of metrics dγ, depending on a function γ(t), and show that there exists a unique metric dγ for which γ(t)dt is what we have called an "adapted measure". This unique metric has very good properties, is simple to compute, and has been implemented in the software R. Furthermore, we use this metric to define variability for random extended intervals. We further study extended interval-valued ARMA time series and prove the Wold decomposition theorem for stationary extended interval-valued time series.
Citation: Babel Raïssa GUEMDJO KAMDEM, Jules SADEFO KAMDEM, Carlos OGOUYANDJOU. An abelian way approach to study random extended intervals and their ARMA processes[J]. Data Science in Finance and Economics, 2024, 4(1): 132-159. doi: 10.3934/DSFE.2024005
Interval analysis (see Bauch and Neumaier (1992); Moore (1966); Jaulin et al. (2001); Alefeld and Herzberger (2012)), initially developed in the 1960s to account rigorously for different types of uncertainty (rounding errors due to finite-precision computations, measurement uncertainties, linearization errors), makes it possible to build supersets of the domain of variation of a real function. Coupled with the usual existence theorems, for example the Brouwer or Miranda theorems, interval theory also makes it possible to rigorously prove the existence of solutions of a system of equations (see Goldsztejn et al. (2005)). With interval analysis, it became possible to model interval data.
In recent years, more precisely since the end of the 1980s, interval modeling has caught the attention of a growing number of researchers. The advantage of an interval-valued time series over a point-valued one is that it contains both trend (or level) information and volatility information (e.g., the range between the boundaries), whereas some information loss occurs with a conventional point-valued data set, e.g., the closing prices of a stock collected at a specific time point within each period, since it fails to record valuable intraday information. Higher-frequency point-valued observations, on the other hand, make it hard to discriminate information from noise. A solution is to analyze the information in interval format by collecting the maximum and minimum prices of each day, which avoids undesirable noise in the intraday data and contains more information than point-valued observations (Sun et al. (2018)). For instance, Lu et al. (2022) proposed modified threshold autoregressive interval-valued models with interval-valued factors to analyze and forecast interval-valued crude oil prices, and showed that oil price range information is more valuable than oil price level information in forecasting crude oil prices.
Huge progress in the field of interval-valued time series was made by Billard and Diday (2000, 2003), who first proposed a linear regression model for the center points of interval-valued data. They were followed by other authors (Maia et al. (2008); Hsu and Wu (2008); Wang and Li (2011); González-Rivera and Lin (2013); Wang et al. (2016)). To study interval data, all those references apply point-valued techniques to the center, the left bound, or the right bound; by doing so, they may not make efficient use of the information contained in interval data. In 2016, Han et al. (2016) developed a minimum-distance estimator to match the interval model predictor with the observed interval data as closely as possible. They proposed a parsimonious autoregressive model for a vector of interval-valued time series processes with exogenous explanatory interval variables, in which an interval observation is considered as a set of ordered numbers. It is shown that their model can efficiently utilize the information contained in interval data, and thus provides more efficient inferences than point-based data and models (Han et al. (2015)). As a recent development in the field, one can refer to the work of Dai et al. (2023), where a new set-valued GARCH model was constructed. We also advise readers to look at the work of Wu et al. (2023).
Despite all these advances, the classical theory of interval modeling has some inconveniences. We can enumerate two which are addressed in another work and in the present paper, respectively.
First, the set of random intervals (or, more generally, random sets) is not a vector space; indeed, the set of intervals is not an abelian group under the classical addition of intervals. So, the useful theorems obtained through orthogonal projection, such as the Wold decomposition theorem, cannot be extended to interval-valued processes. Second, in time series, interval-valued data does not take into account some specifics of the study period; for instance, in financial markets, a movement in stock prices during a given trading period is observed as an interval bounded by the maximum and minimum daily prices (see Han et al. (2016)). Each of these inconveniences can be addressed with its own concept. One can consider the set of random intervals as a "pseudovector space", in which vectors do not necessarily have opposites; this concept was developed in Kamdem et al. (2020) to address the first inconvenience stated above. The second inconvenience can be addressed by working with "extended intervals" instead of classical intervals, as in the present paper.
Indeed, regarding stock prices, it may be more relevant to consider extended intervals formed by the opening and closing prices. Likewise, for the daily temperature in meteorology, instead of taking the max and min, it would be better in some cases to take the morning and evening temperatures, and the same goes for the systolic and diastolic blood pressures in medicine. For this last example, when plotting somebody's blood pressure as extended intervals of morning and evening records, one can easily see the days where the morning blood pressure was higher than the evening one, which can indicate illness or emotional issues.
Therefore, given the constraints imposed by classical interval theory and its application to time series, our approach is based on the concept of extended (or generalized) intervals, for which the left bound is not necessarily less than the right one. This generalization makes our modeling approach relevant for time series analysis: it guarantees the completeness of the interval space and the consistency between interval operations. Extended intervals are also used for time series analysis in Han et al. (2012), but their approach does not highlight the advantages of generalized interval-valued variables.
Our contribution is therefore both theoretical and empirical. In other words, we have conceptualized and redefined some of the specific characteristics of the set of extended intervals. More precisely, we define on the set of extended intervals a topology which generalizes the natural topology on the set of classical intervals, unlike the topology introduced by Ortolf (1969) on generalized intervals, whose restriction to classical intervals differs from the natural topology.
The rest of the work is organized as follows: The main purpose of Section 2 is to fix notations and give a novel and consistent definition of extended intervals. In Section 3 we introduce a suitable class of distances on the set of random extended intervals, which overcomes a disadvantage of the Hausdorff distance. We use this new distance to define the variance and covariance of random extended intervals, and we show that they share some useful properties with point-valued random variables (see Propositions 3.3 and 3.4). Section 4 is concerned with stationary extended interval-valued time series, and ARMA models are investigated. In Section 5, we prove the Wold decomposition theorem for extended interval-valued time series. Section 6 is about numerical studies: we present an algorithm to efficiently convert point-valued data to extended interval-valued data, we simulate an I-AR(1) process, and we illustrate the interpretation of a plot of extended intervals on a few data on blood pressure. We also carry out an empirical analysis and forecasting of the French CAC 40 market index from June 1st to July 26, 2019. The paper ends with a short conclusion.
In this section, we first recall some basic concepts related to standard intervals. Next, we define what is meant by "extended interval", and we introduce the set R← of real numbers traveled in the reverse direction as a Cartesian product. At the end of this section, we present a novel representation of extended intervals.
Let Kkc(R) be the set of nonempty compact (and convex) intervals. For A=[a1,a2],B=[b1,b2]∈Kkc(R), and λ∈R, we recall the operations
A+B=[a1+b1,a2+b2] | (2.1) |
λA=[λa1,λa2] if λ≥0, and λA=[λa2,λa1] if λ≤0. | (2.2) |
It is noteworthy that Kkc(R) is closed under those operations, but it is not a vector space, since A+(−1)A is not necessarily {0}, unless A={0}. The Hausdorff distance dH is defined for closed intervals [a1,a2] and [b1,b2] by
dH([a1,a2],[b1,b2])=max(|b1−a1|,|b2−a2|). |
It is well-known that (Kkc(R),dH) is a complete metric space (see Yang and Li (2005) for details). For A∈Kkc(R), the support function of A is the function s(⋅,A):R→R defined by
s(x,A)=sup{ax;a∈A}. | (2.3) |
Equivalently, if we set A=[a1,a2],
s(x,A)=max(xa1,xa2). |
Keep in mind that s(x,A) returns x times the left bound of A when x is negative, and x times the right bound of A when x is positive. This observation will be used to extend the support function on "extended closed intervals".
Definition 1. An extended interval is a range A of real numbers between A_ and ¯A, with A_,¯A∈R∪{±∞}, traveled through from A_ to ¯A.
The difference with standard intervals is that, for extended intervals, we do not impose that A_≤¯A, but the running direction is important. We say that A is an increasing extended interval or a proper interval when A_<¯A, a decreasing extended interval or an improper interval when A_>¯A, and a degenerate interval when A_=¯A. When A_ and ¯A are in A, we say that A is an extended closed interval and denote it by A=⌊A_,¯A⌋. We also have extended open intervals ⌋A_,¯A⌊, R=⌋−∞,∞⌊ and R←:=⌋∞,−∞⌊.
Every non-degenerate extended interval A represents the classical interval from min(A_,¯A) to max(A_,¯A) in the increasing direction (for an increasing extended interval) or in the decreasing direction (for a decreasing extended interval). We call A_ the left bound and ¯A the right bound of the extended interval A.
A bounded extended interval can be seen as a subset of the product set
R⇄:=R×Z2=R×{0,1}=:R×{+,−}. |
An element of R⇄ is then a pair (x,α) where x∈R, and the direction α∈{0,1}. In this structure, we have two kinds of degenerate extended intervals, namely {a}+:={a}×{0} and {a}−:={a}×{1}. A decreasing extended interval (when A_>¯A) is written as ⌊A_,¯A⌋:=[¯A,A_]×{1}, and an increasing one (when A_<¯A) as ⌊A_,¯A⌋:=[A_,¯A]×{0}.
Thus, R⇄ is the set of real numbers R endowed with two directions represented by the elements of the Abelian group Z2. The direction 0 (or +) means you move on the real line from the left to the right, and the direction 1 (or −) means you move from the right to the left. Further, the product [2,4]×{0,1} is the subset of R⇄ in which one can move either from 2 to 4 or from 4 to 2. Equivalently, [2,4]×{0,1}=([2,4]×{0})∪([2,4]×{1}).
We denote [a,b]×{0} by [a,b]+, or just [a,b], and [a,b]×{1} by [a,b]−. Also, we denote (x,0) by x+ or just x, and (x,1) by x−. For instance, 3∈[2,4] and 3∉[2,4]−, while 3−∉[2,4] and 3−∈[2,4]−.
Practically, talking about the French CAC40 index, if we say that we got 4922− today, this will mean that we got a value of 4922 and the index was decreasing when we got this value. This is an example of how this new structure of extended intervals can be very useful in the context of the trading market, and more.
The best choice of topology on the second factor {0,1} of R⇄ is the discrete topology: every subset is open. So, if we also endow R with its natural topology, the only compact and convex subsets for the product topology on R⇄ are the closed extended intervals ⌊A_,¯A⌋.
We now need to clarify how to compute the intersection of extended intervals with our notations. First observe that A⊆B means that B_≤A_≤¯A≤¯B or B_≥A_≥¯A≥¯B. For instance, ⌊1,2⌋⊈⌊3,1⌋. In fact, the elements of ⌊1,2⌋ are 1+, 1.2+, 1.5+, and so on, and they do not belong to ⌊3,1⌋=[1,3]−. The only obstruction to the inclusion in this example is the difference in running direction between the two intervals.
Proposition 2.1. Let A and B be two compact extended intervals. If A and B are running in opposite directions, then A∩B=∅. Otherwise, the intersection A∩B is the biggest extended interval C such that C⊆A and C⊆B. This is naturally extended to general subsets A and B.
Example 2.1. ⌊0,1⌋∩⌊1,2⌋={1}, ⌊1,0⌋∩⌊2,1⌋={1}−, ⌊0,1⌋∩⌊2,1⌋=∅, ⌊2,1⌋∩⌊3,1⌋=⌊2,1⌋, ⌊3,1⌋∩⌊4,2⌋=⌊3,2⌋, R∩R←=∅.
Now that union and intersection are well defined for subsets of R⇄, one can define topologies on the latter.
Definition 2. The natural topology of R⇄ is the topology generated by the set of extended open intervals.
The topology induced on R by R⇄ coincides with the natural topology of R. We denote by K(R) the set of all compact extended intervals, except decreasing degenerate extended intervals. That means that all degenerate intervals in K(R) are increasing. We extend the Hausdorff distance on K(R) as
dH(A,B)=max(|A_−B_|,|¯A−¯B|). | (2.4) |
Example 2.2. In K(R), the extended closed intervals ⌊A_,¯A⌋ and ⌊¯A,A_⌋ are different, unless A_=¯A, and dH(⌊A_,¯A⌋,⌊¯A,A_⌋)=|¯A−A_|. This distance can be viewed as the effort needed to turn ⌊A_,¯A⌋ into ⌊¯A,A_⌋.
It is simple to see that each extended interval A∈K(R) is uniquely defined by the restriction of its support function on {−1,1}. Moreover, the map (K(R),dH)→(R{−1,1},dmax) is an isometry. (To be precise, dmax here is the maximum distance given by dmax(f,g)=max(|g(−1)−f(−1)|,|g(1)−f(1)|).) Thus, the following result is a consequence of the completeness of (R{−1,1},dmax).
Theorem 2.1. (K(R),dH) is a complete metric space.
We endow K(R) with the topology induced by the Hausdorff distance dH. We extend the multiplication (2.2) to extended intervals in such a way that multiplying an increasing extended interval by a negative number gives a decreasing extended interval, and vice versa. This ensures the consistency of the extensions to K(R) of the internal composition laws (2.1)–(2.2):
λA=⌊λA_,λ¯A⌋,A+B=⌊A_+B_,¯A+¯B⌋,A−B=⌊A_−B_,¯A−¯B⌋,∀λ∈R. | (2.5) |
The operator − can be seen as an extension of the Hukuhara difference, defined for standard intervals by A−B=[min(A_−B_,¯A−¯B),max(A_−B_,¯A−¯B)]. It is simple to see that (K(R),+,⋅) is a vector space and 0:=⌊0,0⌋ is the zero vector.
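Since the laws (2.5) act bound-wise, they are immediate to compute with. Below is a minimal R sketch (our own illustrative representation, not the paper's implementation): an extended interval ⌊A_,¯A⌋ is stored as the numeric pair c(A_, ¯A), the order of the two components carrying the running direction, so that ordinary vector arithmetic realizes (2.5).

# Assumed representation: the extended interval ⌊A_, Abar⌋ is the pair c(A_, Abar)
A <- c(3, 1)   # the decreasing interval ⌊3,1⌋ = [1,3]×{1}
B <- c(0, 2)   # the increasing interval ⌊0,2⌋ = [0,2]×{0}
A + B          # ⌊3,3⌋ : a degenerate interval
-1 * A         # ⌊-3,-1⌋ : a negative scalar reverses the running direction
A - A          # ⌊0,0⌋ : every extended interval now has an opposite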
For extended closed intervals A and B the support function reads
sA(u)=sup{ux;x∈A} if A_≤¯A, and sA(u)=inf{ux;x∈A} if ¯A<A_. | (2.6) |
For instance, sA(−1)=−A_ and sA(1)=¯A. Hence, the support function from the vector space of extended closed intervals to the vector space R{−1,1} of maps from {−1,1} to R, is linear. That is, for all compact extended intervals A,B,
sA+B=sA+sB, sλA=λsA ∀λ∈R, sA−B=sA−sB.
For any extended interval A, we associate with sA the column vector SA=(−sA(−1),sA(1))′; note that SA=(A_,¯A)′.
Let (Ω,A,P) be a probability space. For any A∈K(R), we set
hits(A)={B∈K(R);A∩B≠∅} |
the set of compact extended intervals that hit A. We endow the set K(R) of compact extended intervals with the σ−algebra B(K(R)) generated by {hits(A);A∈K(R)}. For simplicity, we denote X−1(hits(A)):={ω∈Ω;X(ω)∩A≠∅} by X−1(A) and call it the inverse image of A by X. This inverse image X−1(A) is the collection of ω∈Ω such that X(ω) hits A. The following three definitions are equivalent to the ones given in Han et al. (2012).
Definition 3. A random extended interval on a probability space (Ω,A,P) is a map X:Ω→K(R) such that, for any A∈K(R), X−1(A)∈A.
So, a random extended interval is a measurable map X:Ω→K(R) from the underlying probability space to K(R), endowed with the σ−algebra B(K(R)). We denote by U[Ω,K(R)] the set of random extended intervals; it inherits the vector space structure of K(R). The distribution of X∈U[Ω,K(R)] is the map PX:B(K(R))→[0,1] defined on O∈B(K(R)) by
PX(O):=P(X∈O). |
Definition 4. A map f:Ω→R is called a selection map for a random extended interval X when f(ω)∈X(ω) for almost every ω∈Ω.
Selection maps for X=⌊X_,¯X⌋ are thus maps lying between X_ and ¯X; for instance, X_ and ¯X are themselves selection maps for X. The expectation of X is the set of expectations of measurable selection maps for X. More precisely:
Definition 5. The expectation of a random extended interval X on a probability space (Ω,A,P) is the extended interval
E[X]=⌊E[X_],E[¯X]⌋. | (3.7) |
Proposition 3.2. For any X,Y∈U[Ω,K(R)] and λ∈R, E[X+λY]=E[X]+λE[Y].
We denote by SX={f∈L1(Ω) such that f is a selection map for X} the set of integrable selection maps for X, and by SX(A0)={f∈L1(Ω,A0) such that f is a selection map for X} the set of (Ω,A0)−integrable selection maps for X, where A0 is a sub−σ−field of A. The expectation of X is the classical interval {E[f];f∈SX} together with the running direction coming from X.
To quantify the variability of X, that is, the dispersion of X around its expectation, we need a suitable distance on random extended intervals. The first distance that comes to mind is the Hausdorff distance. But a disadvantage of the Hausdorff distance is, for instance, that dH([0,2],[5,6])=5=dH([0,2],[5,7]), while intuitively the distance between [0,2] and [5,6] should be less than the distance between [0,2] and [5,7].
In Bertoluzza et al. (1995), the authors defined the squared distance d2γ(A,B) between two standard intervals as follows: For any interval A=[A_,¯A], consider the one-to-one map ∇A:[0,1]→A, t↦tA_+(1−t)¯A. Then, the squared distance d2γ(A,B) is given by
d2γ(A,B)=∫10(∇A(t)−∇B(t))2γ(t)dt=∫10(t(A_−B_)+(1−t)(¯A−¯B))2γ(t)dt, | (3.8) |
where γ(t)dt is a Borel measure on [0,1] such that:
γ(t)≥0 for every t∈[0,1]; | (3.9a) |
∫10γ(t)dt=1; | (3.9b) |
γ(t)=γ(1−t); | (3.9c) |
γ(0)>0 | (3.9d) |
We extend dγ to extended intervals with the same formula (3.8) and assumptions (3.9a)–(3.9d). If d2γ(A,B)=0, then ∇A(t)=∇B(t) for almost every t∈[0,1], which implies that A_=B_ and ¯A=¯B; thus A=B. For the triangle inequality, we first write
(∇A(t)−∇C(t))2=(∇A(t)−∇B(t))2+(∇B(t)−∇C(t))2+2(∇A(t)−∇B(t))(∇B(t)−∇C(t)). |
Hence,
d2γ(A,C)=d2γ(A,B)+d2γ(B,C)+2∫10(∇A(t)−∇B(t))(∇B(t)−∇C(t))γ(t)dt. | (3.10) |
From here, using Hölder's inequality, one gets the triangle inequality. Thus, dγ is a distance on the set K(R) of extended intervals. The two extended intervals A=⌊A_,¯A⌋ and ˜A=⌊¯A,A_⌋ represent the same standard interval but are different in K(R), and dγ(A,˜A)=cst·|A_−¯A| (with cst=(∫10(2t−1)2γ(t)dt)1/2≠0), which vanishes if and only if A_=¯A. This distance can be seen as the effort needed to turn ˜A into A.
Conditions (3.9a)–(3.9b) are required if we want the distance dγ between the degenerate intervals [a,a] and [b,b] to be the usual distance |b−a|. On the other hand, the distance dγ is suitable for intervals since it does not share some disadvantages of the Hausdorff distance; see Bertoluzza et al. (1995) for more details.
The norm of an interval A is its distance to 0: ‖A‖=dγ(A,0). Condition (3.9c) means that there is no preferred position between the left and right bounds. More precisely, this condition implies that ‖⌊a,0⌋‖=‖⌊0,a⌋‖=|a|(∫10t2γ(t)dt)1/2. The previous observation justifies the following definition.
Definition 6. We say that γ(t)dt is an adapted measure if, in addition to conditions (3.9a)–(3.9d) one has
∫10t2γ(t)dt=1 | (3.9f) |
Example 3.3. One can check that, with
γ(t)=t(1−t)(480−(10240/(3π))√(t(1−t)))+1,
γ(t)dt is an adapted measure. We will refer to it as the standard adapted measure. It has been used in the software R (R Core Team (2021)) to check Lemma 3.1.
Generally, for any c∈(0,∞), the formula
γc(t)=t(1−t)(a+b√(t(1−t)))+c,
defines an adapted measure for a=−30c+510 and b=512(c−21)/(3π).
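The claim of Example 3.3 is easy to check numerically. The following R sketch (with gamma_c transcribed from the formula above) verifies conditions (3.9b), (3.9c), and (3.9f) using the base function integrate:

# Numerical check (sketch): gamma_c transcribes the family above
gamma_c <- function(t, c0) {
  a <- -30 * c0 + 510
  b <- 512 * (c0 - 21) / (3 * pi)
  t * (1 - t) * (a + b * sqrt(t * (1 - t))) + c0
}
c0 <- 1  # c0 = 1 recovers the standard adapted measure of Example 3.3
integrate(function(t) gamma_c(t, c0), 0, 1)$value        # ~ 1 : condition (3.9b)
integrate(function(t) t^2 * gamma_c(t, c0), 0, 1)$value  # ~ 1 : condition (3.9f)
all.equal(gamma_c(0.3, c0), gamma_c(0.7, c0))            # TRUE : symmetry (3.9c)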
This dγ distance can be related to the DK distance measure developed by Körner and Näther (2002) as follows:
d2γ(A,B)=(sA(−1)−sB(−1))2K(−1,−1)+(sA(1)−sB(1))2K(1,1)−2(sA(−1)−sB(−1))(sA(1)−sB(1))K(−1,1)=S′A−BKγSA−B, | (3.11) |
where the kernel Kγ=(K(i,j))i,j=−1,1 introduced by Han et al. (2012) is given by
K(−1,−1)=∫10t2γ(t)dt, K(1,1)=∫10(1−t)2γ(t)dt, K(−1,1)=K(1,−1)=∫10t(1−t)γ(t)dt. | (3.12) |
We will often write ⟨SA−B,SA−B⟩γ:=d2γ(A,B). As observed before by Han et al. (2012), the kernel Kγ is symmetric positive definite and defines an inner product on K(R). We use some properties of this inner product to carry out the proofs of Lemma 3.2 and Theorem 3.2. The following lemma shows that all adapted measures γ(t)dt induce one and the same distance dγ. This lemma is also useful for numerical simulations.
Lemma 3.1. All adapted measures induce the same metric given by
Kγ=(1 −1/2; −1/2 1), i.e., K(−1,−1)=K(1,1)=1 and K(−1,1)=K(1,−1)=−1/2, and d2γ(A,B)=(A_−B_)2+(¯A−¯B)2−(A_−B_)(¯A−¯B).
Proof. If γ(t)dt is an adapted measure, then K(−1,−1)=∫10t2γ(t)dt=1 and, by the symmetry condition (3.9c), K(1,1)=∫10(1−t)2γ(t)dt=K(−1,−1)=1. Moreover, (3.9b) and (3.9c) give ∫10tγ(t)dt=1/2, so K(−1,1)=K(1,−1)=∫10t(1−t)γ(t)dt=∫10tγ(t)dt−∫10t2γ(t)dt=1/2−1=−1/2.
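In R, the metric of Lemma 3.1 takes one line; the sketch below (d2_gamma is our own hypothetical helper, with intervals stored as pairs of bounds) also shows how it repairs the defect of the Hausdorff distance pointed out at the beginning of this section:

# Squared metric of Lemma 3.1; A and B are extended intervals stored as c(left, right)
d2_gamma <- function(A, B) {
  dl <- A[1] - B[1]                 # difference of the left bounds
  dr <- A[2] - B[2]                 # difference of the right bounds
  dl^2 + dr^2 - dl * dr
}
sqrt(d2_gamma(c(0, 2), c(5, 6)))    # ~ 4.58
sqrt(d2_gamma(c(0, 2), c(5, 7)))    # = 5 : farther from ⌊0,2⌋, as intuition demands,
                                    # whereas dH returns 5 in both cases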
Let X and Y be two random extended intervals. For any ω∈Ω, X(ω) and Y(ω) are two extended intervals, and one can compute the distance dγ(X(ω),Y(ω)). We define a new distance on random extended intervals by taking the square root of the mean of the squared distance d2γ(X(ω),Y(ω)) over (Ω,A,P).
Definition 7. The Dγ distance is defined for two random extended intervals X,Y by
Dγ(X,Y)=(E[d2γ(X,Y)])1/2=√∫Ω∫10(∇X(ω)(t)−∇Y(ω)(t))2γ(t)dtdP(ω), |
provided the integral converges.
We denote by L2[Ω,K(R)] the set of random extended intervals X such that E‖X‖2γ:=E(d2γ(X,0))=D2γ(X,0)<∞.
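Under an adapted measure, Dγ is just as simple to approximate by Monte Carlo as dγ is to compute; a possible sketch (our own helper name, with the n paired draws of X and Y stored as the rows of two n×2 matrices of bounds):

# Sketch: sample version of D_gamma from n paired draws of X and Y
D_gamma_hat <- function(X, Y) {
  dl <- X[, 1] - Y[, 1]; dr <- X[, 2] - Y[, 2]   # bound-wise differences
  sqrt(mean(dl^2 + dr^2 - dl * dr))              # mean of d2_gamma, then square root
}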
Lemma 3.2. L2[Ω,K(R)] is a vector space under laws (2.5).
Proof. It is enough to show that L2[Ω,K(R)] is a sub-vector space of U[Ω,K(R)]. Let X,Y∈L2[Ω,K(R)] and λ∈R. Then, Dγ(λX,0)=|λ|Dγ(X,0) and
D2γ(X+Y,0)=E[S′X+YKγSX+Y]=E[(SX+SY)′Kγ(SX+SY)]=D2γ(X,0)+D2γ(Y,0)+2E[S′XKγSY]≤2D2γ(X,0)+2D2γ(Y,0). |
The last inequality comes from the fact that, using Cauchy-Schwarz inequality,
2S′XKγSY=2⟨SX,SY⟩γ≤2√⟨SX,SX⟩γ√⟨SY,SY⟩γ≤⟨SX,SX⟩γ+⟨SY,SY⟩γ |
It is simple to see that for any X,Y∈L2[Ω,K(R)], 0≤Dγ(X,Y)<∞, and the triangle inequality for Dγ follows from that of dγ. However, Dγ is not a metric on L2[Ω,K(R)], since Dγ(X,Y)=0 does not imply the strict equality X=Y, but only that X and Y are equal almost everywhere. We denote by L2[Ω,K(R)] the quotient of this set under the equivalence relation "being equal almost everywhere"; then Dγ is a metric on L2[Ω,K(R)]. We will keep denoting any class in L2[Ω,K(R)] by a representative X∈L2[Ω,K(R)].
Theorem 3.2. (K(R),dγ) and (L2[Ω,K(R)],Dγ) are complete metric spaces.
Proof. Assume that (An=⌊A_n,¯An⌋)n is a dγ−Cauchy sequence in K(R). Then (A_n,¯An)′n is a Cauchy sequence in R2 and so converges, say, to (A_,¯A)′. In fact, since d2γ(Ap,Aq)=S′Ap−AqKγSAp−Aq goes to 0 as p,q go to infinity, SAp−Aq=(A_p−A_q,¯Ap−¯Aq)′ goes to 0 as p,q go to infinity. Also, (An)n converges to A=⌊A_,¯A⌋ since d2γ(An,A)=S′An−AKγSAn−A. Hence, (K(R),dγ) is a complete metric space. Now, assume that (Xn=⌊X_n,¯Xn⌋)n is a Dγ−Cauchy sequence in L2[Ω,K(R)]. Then, from Fatou's lemma and Definition 7,
E[lim infp,q→∞d2γ(Xp(ω),Xq(ω))]≤lim infp,q→∞E[d2γ(Xp(ω),Xq(ω))]=0. |
Hence, E[lim infp,q→∞d2γ(Xp(ω),Xq(ω))]=0, which implies that for almost every ω∈Ω, lim infp,q→∞d2γ(Xp(ω),Xq(ω))=0. Hence there exists a subsequence (Xnk) such that, for almost every ω, (Xnk(ω))k is a Cauchy sequence in the complete metric space (K(R),dγ) and so dγ-converges to some X(ω)=⌊X_(ω),¯X(ω)⌋; setting X(ω) to be 0 for the remaining ω, one obtains a random extended interval X. As limk→∞d2γ(Xnk,X)=0, we also have that limk→∞d2γ(Xn,Xnk)=d2γ(Xn,X) for any n. Using Fatou's lemma again,
limn→∞E[d2γ(Xn,X)]=limn→∞E[lim infk→∞d2γ(Xn,Xnk)]≤limn→∞lim infk→∞E[d2γ(Xn,Xnk)]=0, |
since limp,q→∞E[d2γ(Xp(ω),Xq(ω))]=0 implies that limn,k→∞E[d2γ(Xn,Xnk)]=0.
Remark 3.1. It is clear that the space K(R) of compact extended intervals can be identified with a 2−dimensional vector space, and the metric dγ can be written as
dγ(A,B)=‖USA−USB‖,
where U is a matrix such that Kγ=U′U. Thus, (L2[Ω,K(R)],Dγ) is identified with the 2−dimensional random vector space on (Ω,A,P) with Dγ(X,Y)=E(‖USX−USY‖2)1/2, and the previous result follows from the completeness of that space.
Definition 8. We say that a sequence (Xn) of random extended intervals converges to X in probability under the metric dγ when (d2γ(Xn,X)) converges to 0 in probability, that is
∀ε>0,limn→∞P(d2γ(Xn,X)≥ε)=0. |
Theorem 3.3. A sequence (Xn) such that supnE‖Xn‖<∞ converges to X in (L2[Ω,K(R)],Dγ) if and only if (Xn) converges to X in probability under the metric dγ.
Proof. Let us assume that (Xn) converges to X, that is (D2γ(Xn,X)=E[d2γ(Xn,X)]) converges to 0. That means that (dγ(Xn,X)) converges to 0 in norm L2 in (Ω,A,P), which implies that (d2γ(Xn,X)) converges to 0 in probability. Conversely, assume that (Xn) converges to X in probability under the metric dγ. So, the inequality |dγ(Xn,0)−dγ(X,0)|≤dγ(Xn,X) implies that (‖Xn‖) converges to ‖X‖ in probability. By Fatou's Lemma,
E‖X‖≤lim infn→∞E‖Xn‖≤supnE‖Xn‖<∞. |
The inequality
d2γ(Xn,X)≤2‖Xn‖2+2‖X‖2 |
implies that (d2γ(Xn,X)) is uniformly integrable. Finally, the dominated convergence theorem implies that (Dγ(Xn,X)) converges to 0.
Corollary 3.1. Let (Xn) be a sequence of random extended intervals such that supnE‖Xn‖<∞ and (λn) a family of nonnegative real numbers such that ∑λ2n<∞. Then, (Sn=∑ni=0λiXi) converges in probability under the metric dγ.
Definition 9 (Han et al. (2012)). The covariance of two random extended intervals X, Y is the real number
Cov(X,Y):=E⟨SX−E[X],SY−E[Y]⟩γ=∫Ω∫10(∇X(ω)(t)−∇E[X](t))(∇Y(ω)(t)−∇E[Y](t))γ(t)dtdP(ω). | (3.13) |
The variance of X is the real number
Var(X)=Cov(X,X)=E⟨SX−E[X],SX−E[X]⟩γ=D2γ(X,E[X]). | (3.14) |
The next proposition is the extended interval version of Theorem 4.1 of Yang and Li (2005).
Proposition 3.3. For all random extended intervals X,Y,Z, the following hold:
① Var(C)=0, for every constant interval C;
② Var(X+Y)=Var(X)+2Cov(X,Y)+Var(Y);
③ Cov(X,Y)=Cov(Y,X);
④ Cov(X+Y,Z)=Cov(X,Z)+Cov(Y,Z);
⑤ Cov(λX,Y)=λCov(X,Y);
⑥ Var(λX)=λ2Var(X), for every λ∈R;
⑦ P(dγ(X,E[X])≥ε)≤Var(X)/ε2 for every ε>0 (Chebyshev inequality).
Proof. For any constant extended interval C, one has E[C]=C, and Var(C)=0 follows. Using the linearity of S and the form (3.11) of the metric dγ, one proves items ②–⑥. The Chebyshev inequality follows from the fact that P(dγ(X,E[X])≥ε)≤E[dγ(X,E[X])2]/ε2.
In the particular case of adapted measures, we have the following results, which are very useful in numerical simulations.
Proposition 3.4. If γ(t)dt is an adapted measure, a,b are random variables, and X is a random extended interval, then
① Var(⌊a,0⌋)=Var(⌊0,a⌋)=Var(a);
② Var(⌊a,a⌋)=Var(a);
③ Cov(⌊a,0⌋,⌊0,b⌋)=−(1/2)Cov(a,b);
④ Var(X)=Var(X_)−Cov(X_,¯X)+Var(¯X);
⑤ Cov(X,Y)=Cov(X_,Y_)+Cov(¯X,¯Y)−(1/2)Cov(X_,¯Y)−(1/2)Cov(Y_,¯X);
⑥ E‖X‖2=E[X_2]+E[¯X2]−E[X_¯X].
Item ⑤ of the above proposition is similar to the result obtained for classical intervals in Example 4.1 of Yang and Li (2005), but the two last terms, −(1/2)Cov(X_,¯Y)−(1/2)Cov(Y_,¯X), are not present in Yang and Li's formula. This difference can be explained by the fact that, for our distance dγ, there is no preference between the left and the right bound, which is not the case for the distance dp used by Yang and Li (2005). From Yang's formula, if the left bounds of X,Y are independent and their right bounds are also independent, then Cov(X,Y)=0, which is not the case for our formula ⑤ above.
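Items ④ and ⑤ make the variance and covariance directly computable from the bounds. Here is a possible sample version in R (a sketch; ei_var and ei_cov are our own hypothetical names, with intervals stored as the rows of T×2 matrices of bounds); the last lines illustrate the remark above, since the left bounds of X and Y are independent and so are their right bounds, yet the covariance is nonzero:

# Sample counterparts of items ④ and ⑤ of Proposition 3.4 (sketch)
ei_var <- function(X) var(X[, 1]) - cov(X[, 1], X[, 2]) + var(X[, 2])
ei_cov <- function(X, Y) {
  cov(X[, 1], Y[, 1]) + cov(X[, 2], Y[, 2]) -
    0.5 * cov(X[, 1], Y[, 2]) - 0.5 * cov(Y[, 1], X[, 2])
}
set.seed(1)
X <- cbind(rnorm(1000), rnorm(1000))            # independent left and right bounds
Y <- cbind(rnorm(1000), X[, 1] + rnorm(1000))   # right bound of Y tracks left bound of X
ei_cov(X, Y)   # ~ -1/2 : nonzero by item ⑤, while Yang and Li's formula would give 0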
Let L2[Ω,K(R)]0={X∈U[Ω,K(R)];E[X]=0 and E[‖X‖2γ]<∞}, that is, the sub-vector space of L2[Ω,K(R)] made up of random extended intervals with mean zero. For a random extended interval X∈L2[Ω,K(R)]0, Cov(X,X)=0 only means that X=E[X]=0 almost everywhere; hence, formula (3.14) cannot define a scalar product on L2[Ω,K(R)]0 itself. We denote by L2[Ω,K(R)]0 the set of classes of zero-mean random extended intervals that are equal almost everywhere, and we will keep denoting any class in L2[Ω,K(R)]0 by a representative X∈L2[Ω,K(R)]0. L2[Ω,K(R)]0 inherits the vector space structure of L2[Ω,K(R)]0, and for X,Y∈L2[Ω,K(R)]0, formula (3.13) reads
Cov(X,Y)=E⟨SX,SY⟩γ=∫Ω∫10∇X(ω)(t)∇Y(ω)(t)γ(t)dtdP(ω) | (3.15) |
and is a scalar product on L2[Ω,K(R)]0.
Theorem 3.4. (L2[Ω,K(R)]0,Cov) is a Hilbert space.
Proof. From what is written above, Cov is a scalar product on L2[Ω,K(R)]0. For the completeness, use the fact that ⟨⋅,⋅⟩γ defines a scalar product on R2.
Example 3.4. Take Ω=R, A the Borel σ-algebra, and P=dx the Borel measure. Let us consider the random extended interval
X=⌊f(ω),g(ω)⌋, | (3.16) |
where the left and right bounds are, respectively,
f(ω)=(1/√2π)exp(−0.5ω2) and g(ω)=0.3exp(−0.3ω).
We may write X⇝NE(0,1,0.3) to say that the left bound of X follows the standard normal distribution and its right bound follows the exponential distribution with parameter 0.3. The density functions of those random variables are plotted in Figure 1.
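For simulation purposes, a sample from X⇝NE(0,1,0.3) can be drawn bound-wise. A hedged R sketch (here the two bounds are drawn independently, which Example 3.4 permits but does not impose):

# Drawing n observations of X ⇝ NE(0, 1, 0.3) (sketch; bounds drawn independently)
set.seed(42)
n     <- 200
left  <- rnorm(n)              # left bounds: standard normal
right <- rexp(n, rate = 0.3)   # right bounds: exponential with parameter 0.3
X     <- cbind(left, right)    # row i is the extended interval ⌊left_i, right_i⌋
mean(left > right)             # proportion of decreasing (improper) observations
colMeans(X)                    # estimates E[X] = ⌊E[X_], E[¯X]⌋ = ⌊0, 10/3⌋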
Let (Xt)t∈Z be an extended interval time series; that is, for any integer t, Xt is a random extended interval. We denote by At the expectation of Xt and by Ct(j)=Cov(Xt,Xt−j) the auto-covariance function.
Definition 10. We say that an extended interval time series (Xt) is stationary when neither At nor Ct(j) depends on t. In this case, we just denote them A and C(j), respectively.
For any n∈Z+, the auto-covariance matrix is given by
Cn=(C(i−j))1≤i,j≤n=
( C(0)    C(1)    ⋯  C(n−1)
  C(1)    C(0)    ⋯  C(n−2)
  ⋮       ⋮       ⋱  ⋮
  C(n−1)  C(n−2)  ⋯  C(0) ). | (4.17) |
The proof of the following theorem is similar to the one of Theorem 4 in Wang et al., (2016).
Theorem 4.5. The auto-covariance function of any stationary process satisfies:
① C(k)=C(−k) for all k∈Z;
② |C(k)|≤C(0) for all k∈Z;
③ the auto-covariance matrix Cn is positive semi-definite;
④ if C(0)>0 and (C(k)) converges to 0, then Cn is positive definite.
Let X1,…,XT be a sample of a stationary extended interval time series (Xt) with expectation A. An unbiased estimator of A is given by
mX=X1+⋯+XTT | (4.18) |
and the sample-covariance is given by
ˆC(k)=(1/T)∑T−|k|i=1∫10(∇Xi+|k|(t)−∇mX(t))(∇Xi(t)−∇mX(t))γ(t)dt. | (4.19) |
Theorem 4.6. Let (Xt) be a stationary extended interval-valued time series with expectation A and auto-covariance function C(k) such that (C(k)) converges to 0. Then, mX is a consistent estimator of A; that is, for any ε>0, limT→∞P(dγ(mX,A)≥ε)=0.
Proof. One has
Var(mX)=D2γ(mX,A)=E⟨SmX−A,SmX−A⟩γ=(1/T2)∑Ti,j=1E⟨SXi−A,SXj−A⟩γ=(1/T2)∑Ti,j=1C(i−j)=(1/T2)∑Tk=−T(T−|k|)C(k)=(1/T)∑Tk=−T(1−|k|/T)C(k).
So, Var(mX) goes to 0 as T goes to infinity, since (C(k)) converges to 0. By the Chebyshev inequality, for all ε>0, P(dγ(mX,A)≥ε)≤Var(mX)/ε2 goes to 0 as T goes to infinity.
As usual, ˆC(k) is not an unbiased estimator of C(k) (unless mX=A), but:
Theorem 4.7. If (C(k)) converges to 0 as k goes to infinity, then for any k, ˆC(k) is an asymptotically unbiased estimator of C(k), that is limT→∞E[ˆC(k)]=C(k).
Proof.
ˆC(k)=(1/T)∑T−|k|i=1∫10(∇Xi+|k|(t)−∇mX(t))(∇Xi(t)−∇mX(t))γ(t)dt
=(1/T)∑T−|k|i=1∫10(∇Xi+|k|(t)−∇A(t))(∇Xi(t)−∇A(t))γ(t)dt
+(1/T)∑T−|k|i=1∫10(∇mX(t)−∇A(t))2γ(t)dt
−(1/T)∑T−|k|i=1∫10(∇mX(t)−∇A(t))(∇Xi+|k|(t)+∇Xi(t)−2∇A(t))γ(t)dt.
Hence,
limT→∞E[ˆC(k)]=limT→∞(1/T)∑T−|k|i=1C(k)+limT→∞(1/T)∑T−|k|i=1Var(mX)−limT→∞(1/T)∑T−|k|i=1(Cov(mX,Xi+|k|)+Cov(mX,Xi))
=C(k)−limT→∞(1/T2)∑T−|k|i=1∑Tj=1(Cov(Xj,Xi+|k|)+Cov(Xj,Xi)) (the middle term vanishes since Var(mX)→0)
=C(k)−limT→∞(1/T2)∑Tl=−T(T−|l|)(C(l−|k|)+C(l))
=C(k)−limT→∞(1/T)∑Tl=−T(1−|l|/T)(C(l−|k|)+C(l))=C(k).
Let (Xt) be an extended interval-valued stationary time series with expectation A, and auto-covariance function C(k). To capture the dynamics of (Xt) one can assume that it follows an interval autoregressive moving-average (I-ARMA) process of order (p,q), that is
Xt=K+∑pi=1θiXt−i+εt+∑qi=1ϕiεt−i, | (4.20) |
where K is a constant extended interval, the ϕi and θi are the parameters of the model, (εt)⇝IID({0},σ2), and, for each t, εt is uncorrelated with the past of Xt. This model was introduced and studied by Han et al. (2012), who call it an Autoregressive Conditional Interval Model and proposed a DK−distance based estimation method for the parameters. Our interest here is forecasting, and we propose a different estimation method, based on the Yule-Walker equation.
By taking expectations on both sides of (4.20), one finds
λA=K, | (4.21) |
where λ=1−θ1−⋯−θp. So, as in the case of real random variables, the expectation of Xt does not depend on t, and the new series X′t=Xt−(1/λ)K is a zero-mean I-ARMA process, i.e., it satisfies Equation (4.20) with K=0. In what follows, until the numerical study section, we assume that K=0, that is, (Xt) is a zero-mean stationary process. When p=0, the process (Xt) is called an extended interval-valued moving-average time series process of order q, I-MA(q), and when q=0, one obtains an extended interval-valued autoregressive time series process of order p, I-AR(p).
Let L be the delay operator, thus LXt=Xt−1. Setting Θ(L)=1−θ1L−⋯−θpLp and Φ(L)=1+ϕ1L+⋯+ϕqLq, equation (4.20) can be written as
Θ(L)Xt=Φ(L)εt. | (4.22) |
The functions Θ and Φ are called the autoregressive and moving-average polynomials, respectively.
In particular, if (Xt) is an I-MA(1) process: Xt=εt+ϕεt−1, then
C(1)=ϕσ2. | (4.23) |
In Section 5 we show that any non-deterministic zero-mean stationary random extended interval process can be expressed as an MA(∞).
If the moving-average polynomial Φ=1, then (4.22) leads to
Xt=(1−Θ(L))Xt+εt, | (4.24) |
which is an extended interval-valued autoregressive process of order p, I-AR(p). In this case, the existence and uniqueness of a stationary solution is not guaranteed. However, when a stationary solution exists, using Proposition 3.3 it is simple to show that its auto-covariance function satisfies
C(k)−p∑i=1θiC(k−i)=0, for any 1≤k≤p. | (4.25) |
Hence, the parameters of an I-AR(p) process satisfy the Yule-Walker equation
CpΘ=cp, | (4.26) |
where cp=(C(1),…,C(p))T, Θ=(θ1,…,θp)T and Cp is the auto-covariance matrix (4.17).
Theorem 4.8. Any AR(1) process Xt=θXt−1+εt, with 0<θ<1 and suptE‖εt‖<∞, possesses a unique stationary solution given by Xt=∑∞i=0θiεt−i.
Proof. One has
Xt=θXt−1+εt=θ2Xt−2+θεt−1+εt=⋯=θn+1Xt−n−1+∑ni=0θiεt−i.
As 0<θ<1, one has ∑θ2i<∞. This, together with suptE‖εt‖<∞, implies that (Sn=∑ni=0θiεt−i) converges in probability under the metric dγ, by Corollary 3.1. Since (Xt) is stationary, Var(Xt)=E‖Xt‖2 is constant, and
E‖Xt−∑ni=0θiεt−i‖2=E‖θn+1Xt−n−1‖2=θ2(n+1)E‖Xt−n−1‖2
goes to 0 as n goes to infinity. Hence, E‖Xt−∑∞i=0θiεt−i‖2=0. This implies that Xt=∑∞i=0θiεt−i a.e. From this solution, we have
Cov(Xt+k,Xt)=σ2∑∞i=kθiθi−k=σ2θk/(1−θ2).
Now, if (Xt) is an I-ARMA(1,1) process: Xt=θXt−1+εt+ϕεt−1, then
C(2)=θC(1) and C(1)=θC(0)+ϕσ2. | (4.27) |
Let (Xt)t∈Z be a zero-mean extended interval-valued stationary process. The sets St=¯Span({Xk}tk=−∞) and S−∞=⋂∞t=−∞St are Hilbert subspaces of L2[Ω,K(R)]0. For any j≥0, the projection PSt−jXt of Xt on St−j is called the prediction of Xt on St−j. We shall say that an extended interval-valued process (Xt)t∈Z is deterministic if Xt∈St−1 for every t∈Z. Xt−PSt−1Xt is called the error in the projection of Xt on St−1, and when PSt−1Xt=Xt one says that (Xt)t∈Z is (perfectly) predictable. As (L2[Ω,K(R)]0,Cov) is a Hilbert space, we have the following Wold decomposition for extended interval time series.
Theorem 5.9. Let (Xt)t∈Z be a non-deterministic extended interval-valued stationary time series process with expectation {0} and auto-covariance function (C(k)). Then, Xt can be expressed as
Xt=∑∞k=0αkεt−k+Wt a.s. | (5.28) |
where:
(i) αk=(1/σ2)Cov(Xt,εt−k), α0=1, and ∑∞k=0α2k<∞;
(ii) {εt}⇝WN({0},σ2), with σ2=Var(Xt−PSt−1Xt);
(iii) Cov(Wt,εs)=0 for all t,s∈Z;
(iv) (Wt)t∈Z is zero-mean, stationary and deterministic.
Proof. For any t∈Z, the application of Theorem 4 in Bierens (2012) to the regular sequence (Xt−k)∞k=0 gives that Xt can be expressed as
Xt=∑∞k=0θket−k+Wt a.s. | (5.29) |
where {et−k}∞k=0 is an uncorrelated process with Cov(ei,ej)=δij, θk=Cov(Xt,et−k), ∑∞k=1θ2k<∞, and Wt∈U⊥t with Ut=¯Span({ek}tk=−∞)⊂St. Since the process (Xt)t∈Z is non-deterministic, the residual εt=Xt−PSt−1Xt is different from 0, and εt=‖εt‖et; hence (5.28) holds with αk=θk/‖εt−k‖, and (εt) is also uncorrelated. As Wt,εt∈L2[Ω,K(R)]0, we have E[Wt]=0=E[εt]. Wt∈U⊥t implies that Cov(Wt,εs)=0 for any s≤t. For s>t, taking the scalar product of (5.29) with εs, one has Cov(Wt,εs)=Cov(Xt,εs)=0, since εs∈S⊥s−1 and Xt∈St⊂Ss−1 for s>t. This proves (iii). Let Xt,n be the projection of Xt on St,n=span({Xt−j}nj=1), and εt,n the residual. Then, Xt,n takes the form
Xt,n=∑nj=1βj,nXt−j,
where the scalars βk,n do not depend on t, since they are solutions of the system of equations
∑nj=1βj,nC(j−k)=C(k), k=1,…,n.
Hence, E[Xt,n]=0, E[εt,n]=0. Moreover,
Var(εt,n)=‖Xt−Xt,n‖2=‖Xt−∑nj=1βj,nXt−j‖2=C(0)+∑ni,j=1βi,nβj,nC(i−j)−2∑nj=1βj,nC(j).
Hence, Var(εt,n)=σ2n does not depend on t, and the same holds for σ=‖εt‖=limn→∞σn. Also,
Cov(Xt+k,εt,n)=C(k)−∑nj=1βj,nC(k+j),
which does not depend on t. Using the Cauchy-Schwarz inequality,
limn→∞|Cov(Xt+k,εt,n−εt)|≤√C(0)limn→∞‖εt,n−εt‖=0, |
which implies that Cov(Xt+k,εt)=limn→∞Cov(Xt+k,εt,n) and does not depend on t. So,
αk=(1/‖εt‖)Cov(Xt+k,et)=(1/‖εt‖2)Cov(Xt+k,εt)
does not depend on t. Moreover, α0=Cov(Xt,εt)/‖εt‖2=1. All this completes the proof of (i) and (ii). For k≥0,
Cov(Wt,Wt−k)=Cov(Xt−k−∑∞j=0αjεt−k−j, Xt−∑∞j=0αjεt−j)
=C(k)−∑∞j=0αjCov(Xt,εt−k−j)−∑∞j=kαjCov(Xt−k,εt−j)+σ2∑∞j=0αj+kαj
=C(k)−σ2∑∞j=0αj+kαj,
which does not depend on t. As Wt∈St, one can write Wt=∑∞k=0akXt−k. Taking the covariance with εt and using the fact that εt⊥Span(Xt−1,Xt−2,…), one gets Cov(Wt,εt)=a0Cov(Xt,εt)=a0‖εt‖2. Since Cov(Wt,εt)=0, one deduces that a0=0, hence Wt∈St−1, and thus (Wt) is deterministic from the past of (Xt). This completes the proof of (iv).
Let (Xt) be an AR(1) process:
Xt=K+θXt−1+εt. | (6.30) |
Then, from the Yule-Walker equation, the parameter θ can be estimated by ˆθ=ˆC(1)/ˆC(0) with
ˆC(0)=(1/T)∑Ti=1∫10(∇Xi(t)−∇mX(t))2γ(t)dt=(1/T)∑Ti=1d2γ(Xi,mX),
ˆC(1)=(1/T)∑T−1i=1∫10(∇Xi+1(t)−∇mX(t))(∇Xi(t)−∇mX(t))γ(t)dt=(1/(2T))∑T−1i=1(d2γ(Xi+1,mX)+d2γ(Xi,mX)−d2γ(Xi+1,Xi)),
where ˆC(0) and ˆC(1) are the sample auto-covariances.
More generally, if we assume that the I-AR(p) process (4.24) is stationary, then from Theorem 4.5, when C(0)>0 and (C(k)) converges to 0, the Yule-Walker equation (4.26) is well-posed, and from a large sample X1,…,XT the coefficients of the I-AR(p) process can be estimated by
ˆΘ=ˆC−1pˆcp.
Using (3.10) and (4.19), the sample-covariance can be written as
ˆC(k)=(1/(2T))∑T−|k|i=1(d2γ(Xi+k,mX)+d2γ(Xi,mX)−d2γ(Xi+k,Xi)). | (6.31) |
It is natural to assume that γ(t)dt is an adapted measure; in this case, the distance dγ is given by Lemma 3.1 and is easy to compute numerically.
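Putting (6.31) and (4.26) together, the whole estimation fits in a few lines of R. The sketch below (our own helper names; d2_gamma is the metric of Lemma 3.1) estimates the coefficients of an I-AR(p) from a sample stored as a T×2 matrix of bounds, and recovers θ on a simulated zero-mean I-AR(1) with hypothetical parameter θ=0.2:

d2_gamma <- function(A, B) {          # squared metric of Lemma 3.1
  dl <- A[1] - B[1]; dr <- A[2] - B[2]
  dl^2 + dr^2 - dl * dr
}
C_hat <- function(X, k) {             # sample auto-covariance (6.31)
  n <- nrow(X); m <- colMeans(X)      # m is the sample mean interval (4.18)
  s <- 0
  for (i in 1:(n - k))
    s <- s + d2_gamma(X[i + k, ], m) + d2_gamma(X[i, ], m) - d2_gamma(X[i + k, ], X[i, ])
  s / (2 * n)
}
yule_walker <- function(X, p) {       # solve the Yule-Walker system (4.26)
  Ck <- sapply(0:p, function(k) C_hat(X, k))
  solve(toeplitz(Ck[1:p]), Ck[2:(p + 1)])
}
set.seed(7)                           # simulate a zero-mean I-AR(1) with theta = 0.2
n <- 500; X <- matrix(0, n, 2); eps <- cbind(rnorm(n), rnorm(n))
for (t in 2:n) X[t, ] <- 0.2 * X[t - 1, ] + eps[t, ]
yule_walker(X, 1)                     # close to 0.2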
Extended intervals can be very useful for displaying data. The plot of a single extended interval A already gives much information: (a) the range of values of the considered index during the recording; (b) the direction of variation of the considered index: decreasing when the arrow points down, and increasing when the arrow points up.
Figure 2 displays the systolic (in blue) and diastolic (in red) blood pressure of a person, recorded in the morning (left bounds) and in the afternoon (right bounds) over 4 days in 2004. One easily sees that on 11/03/04 the blood pressure recorded in the morning was higher than the one recorded in the afternoon, for both the systolic and the diastolic pressure.
Now, we simulate the model (6.30) with θ=0.2 and K=[13.31,14.2], with ¯εt and εt_ following independent standard normal distributions. Figure 3(a) shows a sample of size T=100 from this model, the interval standard normal distribution used being the one plotted in Figure 3(b). Most of the outputs of this sample are standard intervals (71 standard intervals versus 29 decreasing ones), while for the error (the interval standard normal distribution) the two kinds occur in comparable proportions (41 standard intervals versus 59 decreasing). Figure 4 displays the estimated auto-covariance function C(k) and shows that it goes to 0 as k becomes large. Also, K is estimated using the formula ˆK=(1−ˆθ)mX.
T | ˆK | C(T−2) | ˆθ | Error |
100 | [13.31,14.2] | −0.02807759 | 0.1747072 | 0.02529285 |
500 | [13.51569,14.41001] | 0.01240641 | 0.1892873 | 0.01071265 |
In Figure 5, we have plotted, as standard min-max intervals (in blue) and open-close extended intervals (in red), the CAC 40 stock index from January 2nd to May 31st, 2019 (105 trading days). Extended intervals are formed by the opening values (left bounds) and the closing values (right bounds). This figure shows that, most often, neither the opening nor the closing value is the lowest or the highest value of the index for the day. Notice that, for such an index, what is most important is often not just the opening and closing values, but also how the index has been fluctuating along the day; for instance, the plot shows many days where the opening and closing values are the same despite fluctuations throughout the day. Now, we wish to find the I-ARMA model which best fits this data. The first step is to test stationarity: the augmented Dickey–Fuller test shows that neither the data nor its first difference is stationary, but its second difference is. So, we take the second-difference data and use the AIC to determine the optimal order (p,q); we define the AIC of a random interval as the sum of the AICs of its bounds, and we let p and q run over 1,2,3,4. Figure 6 shows that the optimal order is p=q=1. Finally, using Equation (4.27), we estimated the coefficients of the I-ARMA model by ˆθ=ˆC(2)/ˆC(1) and ˆϕ=(ˆC(1)−ˆθˆC(0))/ˆσ2, and we found
ˆθ=−0.2519991andˆϕ=−0.5326387. | (6.32) |
Figure 7 shows the forecast of the differenced CAC 40 for the next 40 trading days. From this graph, it appears that the direction of variation of the CAC 40 throughout the day has been well predicted for 25 of the 40 days. Also, the predicted arrow most often lies on top of the real value. This prediction can certainly be improved by using extended intervals with non-linear estimation methods.
In this paragraph, we present how data are usually pretreated and show that this process can be performed better when one wishes to use extended intervals.
Let us consider an index ID (for example the French CAC40 index) that we try to model for predicting future values. Let us assume that the values of this index are changing every minute and that we want to analyze it over one year. That will make a huge set of data to analyze if we consider every single value of the index.
What people most often do is consider a frequency; in the case of ID, one can decide to analyze daily values. But we have something like 1440 values every day and have to decide on the value of the day. In point-value analysis, people consider either the opening value, the closing value, or the average value of the day as the value of that day. It is clear that a lot of values are neglected, and this could lead to an inconsistent analysis.
In an analysis with standard intervals, people most often consider the highest and lowest values of a day to form the interval representing the value of the index that day (see, for example, Wang and Li (2011)). By doing so, every interval contains all the values of the index that day; but the interval can be unreasonably large and does not reflect the variations of the index during the day. One can still do better by using extended intervals.
With extended intervals, one can proceed as follows. The first value is the left bound of the first interval. If the next value is smaller (resp. bigger), then we keep scanning the following values until either the index is no longer significantly decreasing (resp. increasing), or we have passed 1440 values (the period cannot exceed 1 day). The right bound of the first interval is then the last value recorded, the current value becomes the left bound of the second interval, and we repeat this process until the end of the data set. This process is summarized as Algorithm 1, which returns the sequence Res of extended intervals obtained and the corresponding sequence of time intervals; "corresponding" means that the left bound of the first time interval is the time when the left bound of the first extended interval of Res was recorded, and so on.
By applying this algorithm, we do not obtain a regular period, which is needed for a time series analysis. The period can be taken here as the average of the periods of the extended intervals obtained.
We have implemented Algorithm 1 in R and tested it on the CAC 40 stock index recorded minute by minute during five days, from June 22 to June 26, 2020. After treating the 2169 data points, we obtained 787 extended intervals, as shown in Figure 8. The initial data was recorded every day from 9:00 am to 6:05 pm, except the last day, which ends at 10:52 am; so the total recording time was 38 hours and 12 minutes. As we obtained 787 extended intervals, we can take 3 minutes as the period for the time series analysis, and we then assume that every extended interval is recorded during a lapse of 3 minutes.
Observe that the minimum value per day, as well as the maximum value, of the CAC 40 is the same over the five days we considered; so, those data could not be analyzed with min-max standard intervals. In Figure 9, we have plotted the extended intervals that we obtained.
Algorithm 1 Transform point-values to extended intervals |
Require: data, time, ε, frequency = 1440
Res←{}, ResTime←{}
N←length(data), i←1
A_←data[i], T_←time[i]
while i<N do
  if data[i+1]≤data[i] then
    i←i+1, j←1
    while the index is decreasing or is not significantly (use ε) increasing do
      i←i+1, j←j+1
      if j>frequency then break end if
    end while
    ¯A←data[i], ¯T←time[i]
    add A=⌊A_,¯A⌋ to Res and T=⌊T_,¯T⌋ to ResTime
    i←i+1
  end if
  if data[i+1]>data[i] then
    i←i+1, j←1
    while the index is increasing or is not significantly (use ε) decreasing do
      i←i+1, j←j+1
      if j>frequency then break end if
    end while
    ¯A←data[i], ¯T←time[i]
    add A=⌊A_,¯A⌋ to Res and T=⌊T_,¯T⌋ to ResTime
    i←i+1
  end if
end while
return Res and ResTime
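For reference, here is one possible R transcription of Algorithm 1 (a sketch reflecting our reading of the "significantly" test: a move of size less than ε against the current direction does not end the run; the paper's actual implementation is not reproduced here):

to_extended_intervals <- function(data, time, eps, frequency = 1440) {
  Res <- list(); ResTime <- list()
  N <- length(data); i <- 1
  while (i < N) {
    left <- data[i]; tleft <- time[i]     # left bound of the current interval
    down <- data[i + 1] <= data[i]        # direction of the current run
    j <- 0
    # extend the run while the series keeps moving in the current direction,
    # tolerating moves smaller than eps against it, for at most 'frequency' steps
    while (i < N && j < frequency &&
           ((down  && data[i + 1] < data[i] + eps) ||
            (!down && data[i + 1] > data[i] - eps))) {
      i <- i + 1; j <- j + 1
    }
    Res     <- c(Res,     list(c(left,  data[i])))   # the interval ⌊left, data[i]⌋
    ResTime <- c(ResTime, list(c(tleft, time[i])))
    i <- i + 1                                        # next value opens a new interval
  }
  list(intervals = Res, times = ResTime)
}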
In this work, we have redefined extended intervals in a more natural manner and written an algorithm to efficiently transform point-valued data into extended interval-valued data. An extended interval is a standard interval endowed with a direction α, which is an element of the Abelian group Z2={0,1}. The direction 0 means you move on the real line from the left to the right, and the direction 1 means you move from the right to the left. This construction can be generalized to Rn. For example, one could define extended rectangles in R2 with 4 directions represented by the Abelian group Z4.
We have seen that, by using extended intervals to record the values of a given index, every extended interval gives the value of the index and the direction of variation at the time of recording. We have proposed a language that we hope will be used in the future in the trading markets. Precisely, talking about the French CAC40 index, if we say that we got 4922− today, this would mean that we got a value of 4922 and the index was decreasing when we got this value. This is an example of how this new structure of extended intervals can be very useful in the context of trading markets, and more. A suitable distance has been defined on extended intervals and used to define the variance and covariance of random extended intervals in a natural way. We have studied ARMA processes with extended intervals both theoretically and numerically. In the numerical part, we forecasted the CAC 40 stock index from January 2nd to July 26, 2019.
The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.
All authors declare no conflicts of interest in this paper.
[1] | Alefeld G, Herzberger J (2012) Introduction to interval computation. Academic press. |
[2] | Bauch H, Neumaier A (1990) Interval methods for systems of equations. Cambridge University Press. Zamm-Z Angew Math Me 72: 590–590. |
[3] | Bertoluzza C, Corral Blanco N, Salas A (1995) On a new class of distances between fuzzy numbers. Mathware soft comput 2. |
[4] | Bierens HJ (2012) The Wold decomposition. Available from: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.231.2308. |
[5] | Billard L, Diday E (2000) Regression analysis for interval-valued data. In Data Analysis, Classification, and Related Methods, 369–374. Springer. |
[6] | Billard L, Diday E (2003) From the statistics of data to the statistics of knowledge: symbolic data analysis. J Am Stat Assoc 98: 470–487. https://doi.org/10.1198/016214503000242 |
[7] | Dai X, Cerqueti R, Wang Q, et al. (2023) Volatility forecasting: a new GARCH-type model for fuzzy sets-valued time series. Ann Oper Res 1–41. https://doi.org/10.1007/s10479-023-05746-z |
[8] | Goldsztejn A, Daney D, Rueher M, et al. (2005) Modal intervals revisited: a mean-value extension to generalized intervals. In Proceedings of QCP-2005 (Quantification in Constraint Programming), Barcelona, Spain. |
[9] | González-Rivera G, Lin W (2013) Constrained regression for interval-valued data. J Bus Econ Stat 31: 473–490. https://doi.org/10.1080/07350015.2013.818004 |
[10] | Han A, Hong Y, Wang S (2012) Autoregressive conditional models for interval-valued time series data. The 3rd International Conference on Singular Spectrum Analysis and Its Applications, 27. |
[11] | Han A, Hong Y, Wang S (2015) Autoregressive conditional models for interval-valued time series data. Working Paper. |
[12] | Han A, Hong Y, Wang S, et al. (2016) A vector autoregressive moving average model for interval-valued time series data. Essays in Honor of Aman Ullah, 417–460. Emerald Group Publishing Limited. |
[13] | Hsu HL, Wu B (2008) Evaluating forecasting performance for interval data. Comput Math Appl 56: 2155–2163. https://doi.org/10.1016/j.camwa.2008.03.042 |
[14] | Jaulin L, Kieffer M, Didrit O, et al. (2001) Interval analysis. Appl Interval Anal, Springer London. https://doi.org/10.1007/978-1-4471-0249-6_2 |
[15] | Kamdem JS, Kamdem BRG, Ougouyandjou C (2020) S-ARMA model and Wold decomposition for covariance stationary interval-valued time series processes. New Math Natl Comput 17: 191–213. https://doi.org/10.1142/S1793005721500101 |
[16] | Kaucher E (1973) Über metrische und algebraische Eigenschaften einiger beim numerischen Rechnen auftretender Räume. na. |
[17] | Körner R, Näther W (2002) On the variance of random fuzzy variables. In Statistical modeling, analysis and management of fuzzy data, 25–42. Springer. |
[18] | Lu Q, Sun Y, Hong Y, et al. (2022) Forecasting interval-valued crude oil prices using asymmetric interval models. Quantit Financ 22: 2047–2061. https://doi.org/10.1080/14697688.2022.2112065 |
[19] | Maia ALS, de Carvalho FdA, Ludermir TB (2008) Forecasting models for interval-valued time series. Neurocomputing 71: 3344–3352. https://doi.org/10.1016/j.neucom.2008.02.022 |
[20] | Moore RE (1966) Interval analysis. Prentice-Hall, Englewood Cliffs, NJ. |
[21] | Ortolf HJ (1969) Eine Verallgemeinerung der Intervallarithmetik. Gesellschaft für Mathematik und Datenverarbeitung. |
[22] | R Core Team (2021) R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. |
[23] | Sun Y, Han A, Hong Y, et al. (2018) Threshold autoregressive models for interval-valued time series data. J Econometrics 206: 414–446. https://doi.org/10.1016/j.jeconom.2018.06.009 |
[24] | Wang X, Li S (2011) The interval autoregressive time series model. In Fuzzy Systems (FUZZ), 2011 IEEE International Conference on, 2528–2533. IEEE. |
[25] | Wang X, Zhang Z, Li S (2016) Set-valued and interval-valued stationary time series. J Multivariate Anal 145: 208–223. https://doi.org/10.1016/j.jmva.2015.12.010 |
[26] | Wu D, Dai X, Zhao R, et al. (2023) Pass-through from temperature intervals to China's commodity futures' interval-valued returns: Evidence from the varying-coefficient ITS model. Financ Res Lett 58: 104289. https://doi.org/10.1016/j.frl.2023.104289 |
[27] | Yang X, Li S (2005) The Dp-metric space of set-valued random variables and its application to covariances. Int J Innov Comput Inf Contr 1: 73–82. |