The epidemiology of X-linked recessive diseases, a class of genetic disorders, is modeled with a discrete-time, structured, non linear mathematical system. The model accounts for both de novo mutations (i.e., affected sibling born to unaffected parents) and selection (i.e., distinct fitness rates depending on individual's health conditions). Assuming that the population is constant over generations and relying on Lyapunov theory we found the domain of attraction of model's equilibrium point and studied the convergence properties of the degenerate equilibrium where only affected individuals survive. Examples of applications of the proposed model to two among the most common X-linked recessive diseases (namely the red and green color blindness and the Hemophilia A) are described.
1. Introduction
In statistical analysis, the change point problem pertains to the phenomenon wherein observations demonstrate distinct distributions before and after an unspecified temporal threshold. This issue is crucial across various fields, including financial analysis, economics, and medical research. Tracing back to 1954, Page first introduced the concept of a change point and identified the presence of a singular change point using the well-established cumulative sum procedure [1]. Over subsequent decades, a plethora of methods for detecting change points has been developed, yielding substantial advancements in research, particularly within parametric methodologies. Arellano-Valle discussed the identification of change points in parameters of the location-scale skew normal distribution via Bayesian inference [2]. Moreover, Chernoff and Zacks explored the occurrence of mean shifts in normal distributions from a Bayesian viewpoint [3]. Hsu conducted tests for variance shifts in sequences of independent normal random variables, addressing scenarios where the initial variance level is unknown, employing two test statistics: one locally most powerful and the other based on the cumulative sum of chi-square values [4]. Hinkley inferred the change points of binomial parameter shifts in sequences of zero-one variables through maximum likelihood estimation [5]. Further details on the parametric approach are discussed in [6,7], while studies on non-parametric methods for change point detection are detailed in [8,9,10].
The likelihood ratio test is a method extensively employed for addressing the problem of detecting change points. Cai [11] introduced a test statistic of the likelihood ratio type to detect potential bathtub-shaped changes in the scale parameter of independent exponential distributions. Wang [12] utilized the likelihood ratio test statistic to identify changes in the parameters of a skew slash distribution. Ferreira [13] enhanced the use of the likelihood ratio test in their development of the mean-shift method for detecting outliers in regression models under skew scale-mixtures of normal distributions. Simulation studies have demonstrated the efficiency of the proposed method in detecting outliers. From a model selection perspective, determining whether a process has undergone change involves choosing between a model with a single set of parameters and one with multiple sets of parameters and change points. In this regard, information criteria serve as useful tools for change point detection. Chen [14] applied the Schwarz information criterion (SIC) to identify and locate all possible variance change points in sequences of independent Gaussian random variables. Ngunkeng [15] proposed a test statistic based on SIC to detect parameter changes in a skew normal distribution. Chen [16] developed a modified information criterion (MIC) by refining the measure of model complexity, which consistently selects the correct model and effectively detects changes at both early and late observation stages. Pan [17] applied the modified information criterion to detect and locate multiple change points in sequences of independent random variables. Said [18] proposed a change point detection procedure based on MIC for skew normal distributions. Kilai [19] conducted change point detection on the exponentiated generalized Gull alpha power exponential distribution using MIC and applied this method to COVID-19 data. 
It is noteworthy that both the likelihood ratio test and information criterion methods rely on the log-likelihood function. However, for complex models associated with intractable densities, deriving the log-likelihood function can be highly intricate, resulting in substantial computational burdens.
The expectation-maximization (EM) algorithm, introduced by Dempster [20], is a versatile technique for iteratively computing maximum likelihood estimates in models with incomplete data. It handles complex statistical models efficiently by treating latent variables as hypothetical missing data. Numerous studies have discussed the convergence of the parameter estimate sequences obtained through the EM algorithm [21,22,23,24]. The EM algorithm is widely applied to complex linear mixed models and finite mixture models, as illustrated in [25,26,27]. Abe [28] introduced an overparameterized stochastic representation for the skew normal distribution, enhancing the EM algorithm's efficiency. This methodology has been extended to multivariate and mixture models, significantly improving computational performance. Moreover, Zhu developed an approach for assessing the local influence of incomplete data using the EM algorithm, employing a procedure similar to Cook's in which the traditional observed data log-likelihood function is replaced with the conditional expectation of the complete data log-likelihood function [29]. Zhu's work inspired our proposed method for detecting change points in complex models.
The rest of the paper is organized as follows. In Section 2, we introduce the Q-function modified information criterion (QMIC) method for change point detection and discuss its properties and motivation. In Section 3, we apply the QMIC method to detect parameter changes in the skew normal distribution and propose a detection procedure. Simulations are performed in Section 4 to investigate the performance of the proposed method and to compare it with methods based on the MIC and SIC statistics. In Section 5, we apply the QMIC procedure to three significant Latin American stock return markets to detect change points. Finally, conclusions are given in Section 6.
2. Q-function modified information criterion
In this section, we introduce a novel method for change point detection in complex models, discussing its advantageous properties and underlying motivations.
2.1. A brief description of the EM algorithm
To elucidate our approach, we provide a concise overview of the expectation-maximization (EM) algorithm.
Let Yc=(Yo,Ym) be the complete dataset with a probability density function (PDF) f(Yc|θ), where θ is an r-dimensional parameter vector, and Yo and Ym denote the observed data and missing data, respectively. In this setting, the complete data log-likelihood function can be given by
whereas the observed data log-likelihood function is given by
The complete data log-likelihood function is generally straightforward, in contrast to the observed data log-likelihood function, which is often complex in many statistical applications. This complexity forms the central focus of our study. The EM algorithm provides an alternative method for managing the observed data log-likelihood function, effectively simplifying the computational challenges associated with incomplete datasets.
The standard EM algorithm primarily comprises two steps: the expectation step (E-step) and the maximization step (M-step).
E-step: Given the observed data Yo and the parameter estimate ˆθ(h) from the h-th iteration, take the expectation of the complete data log-likelihood function with respect to the conditional distribution f(Ym|Yo,ˆθ(h)); the Q-function is then given by
M-step: Determine ˆθ(h+1) by
Wu [24] illustrated that the sequence {ˆθ(h)} obtained by the EM algorithm converges to the maximum likelihood estimates ˆθ under some mild conditions.
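For reference, the quantities named in this subsection can be written in their standard textbook form as follows. This is a sketch in the notation used here (the labels L_c and L_o are chosen to match the usage in Lemma 2.1), not a verbatim restatement of the numbered displays.

```latex
\begin{align*}
L_c(\theta \mid Y_c) &= \log f(Y_c \mid \theta), \\
L_o(\theta \mid Y_o) &= \log \int f(Y_c \mid \theta)\, dY_m, \\
Q(\theta \mid \hat\theta^{(h)}) &= E\big\{ L_c(\theta \mid Y_c) \,\big|\, Y_o, \hat\theta^{(h)} \big\}, \\
\hat\theta^{(h+1)} &= \arg\max_{\theta}\; Q(\theta \mid \hat\theta^{(h)}).
\end{align*}
```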
2.2. Method of QMIC
Let Y1,Y2,⋯,Yn be a sequence of independent random variables, where Yi has the PDF f(y;θi) for i=1,2,⋯,n. We assume that all f(y;θi) belong to the same parametric family {f(y;θ):θ∈Rd}. Our primary question is whether a change point exists in this sequence and, if so, where it is located. Consequently, we are interested in testing the null hypothesis
versus
The log-likelihood function under the null hypothesis can be expressed as
whereas under the alternative hypothesis, it has the form
Chen, Gupta, and Pan [16] considered the effect of change point location on model selection and proposed the modified information criterion. Under the null hypothesis, we define
The modified information criterion for the change point problem is given by
where ˆθ is the maximum likelihood estimator of θ, and ˆθ1k and ˆθ2k maximize L(θ1k,θ2k|Y) for the given k.
Let
The addition of the term dim(ˆθ)logn eliminates the constant term in the difference between MIC(k) and MIC(n). Chen, Gupta, and Pan [16] used the statistic Sn to test the null hypothesis (2.1), which is rejected when Sn is sufficiently large.
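In symbols, the hypotheses and the MIC quantities described above take the following form, as the criterion of [16] is commonly stated. The notation L_{H_0} and L_{H_1} for the two log-likelihoods is an assumption made here for readability.

```latex
\begin{align*}
H_0 &: \theta_1 = \theta_2 = \cdots = \theta_n = \theta
  \quad\text{versus}\quad
  H_1 : \theta_1 = \cdots = \theta_k \neq \theta_{k+1} = \cdots = \theta_n, \\
L_{H_0}(\theta) &= \sum_{i=1}^{n} \log f(y_i;\theta), \qquad
L_{H_1}(\theta_{1k},\theta_{2k}) = \sum_{i=1}^{k} \log f(y_i;\theta_{1k})
  + \sum_{i=k+1}^{n} \log f(y_i;\theta_{2k}), \\
\mathrm{MIC}(n) &= -2\,L_{H_0}(\hat\theta) + \dim(\theta)\log n, \\
\mathrm{MIC}(k) &= -2\,L_{H_1}(\hat\theta_{1k},\hat\theta_{2k})
  + \Big[\, 2\dim(\theta) + \big(\tfrac{2k}{n}-1\big)^2 \Big]\log n, \qquad 1 \le k < n, \\
S_n &= \mathrm{MIC}(n) - \min_{1\le k<n}\mathrm{MIC}(k) + \dim(\theta)\log n .
\end{align*}
```

The penalty in MIC(k) is smallest when k is near n/2 and grows toward the ends of the sequence, which is how the criterion accounts for the location of the change point.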
Despite the utility of the MIC test, substantial challenges persist with complex models due to the intractability of the log-likelihood function. Consequently, it is imperative to explore alternative approaches to the log-likelihood function to address these difficulties effectively.
Motivated by the EM algorithm, we propose the following Q-function modified information criterion (QMIC) for change point detection. The idea is to apply the same procedure as MIC to the Q-function instead of the log-likelihood function. Under the null hypothesis, we define
For 1≤k<n, the QMIC in the presence of change points is
where ˆθ is the maximum likelihood estimate of θ obtained by the EM algorithm under the null hypothesis, and ˆθ1k,ˆθ2k are the estimates of θ1k,θ2k which satisfy
The difference between QMIC(n) and min1≤k<nQMIC(k) plays a key role in the model selection. Based on this, we define a test statistic Wn to test whether the null hypothesis of no change holds.
Let
The statistic Wn can be considered as a measure of the difference between θ and ˆθ1k,ˆθ2k. The null hypothesis is rejected with a sufficiently large value of Wn, and then we suggest that there are change points in this sample and the location of the change point can be estimated by
Owing to the promising outcomes derived from the EM algorithm, the solutions ˆθ1k and ˆθ2k achieved in maximizing the Q-function lead to the formulation of the following lemma:
Lemma 2.1. Suppose that Lo(θ|Yo) is unimodal in Ω with only one stationary point, where Ω is a subset of the d-dimensional space, and that the first-order partial derivatives of Q(ψ|θ) are continuous in ψ and θ. Then, under Wald conditions W1–W7 [16] and the null hypothesis, we obtain
consistently in probability for all k, as n→∞.
Under Wald conditions W1–W7 and the alternative hypothesis with a change point at k such that k/n converges to a limit in (0,1) as n→∞, we obtain
consistently in probability for all k, as n→∞.
The above lemma is derived from the global convergence theorem [24] and the consistency lemma [16]; therefore, details of the proof are omitted here. Another motivation for our approach is its set of desirable properties, which are analogous to those of the MIC criterion.
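The QMIC construction of this subsection can be summarized in symbols as follows. This is a sketch obtained by carrying the MIC penalty of [16] over to the Q-function; the penalty term in QMIC(k) in particular is an assumption made by analogy. The form of W_n agrees with the values reported in Section 5 (for example, −707.3299 − (−723.3759) + 3 log 261 ≈ 32.7396 for the Chilean series).

```latex
\begin{align*}
\mathrm{QMIC}(n) &= -2\,Q(\hat\theta \mid \hat\theta) + \dim(\theta)\log n, \\
\mathrm{QMIC}(k) &= -2\,Q(\hat\theta_{1k},\hat\theta_{2k} \mid \hat\theta)
  + \Big[\, 2\dim(\theta) + \big(\tfrac{2k}{n}-1\big)^2 \Big]\log n, \qquad 1 \le k < n, \\
W_n &= \mathrm{QMIC}(n) - \min_{1\le k<n}\mathrm{QMIC}(k) + \dim(\theta)\log n, \\
\hat k &= \arg\min_{1\le k<n}\mathrm{QMIC}(k).
\end{align*}
```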
Note that a sufficient number of sample observations are required to obtain the maximum likelihood estimates of the parameters. Perron and Vogelsang [30] considered a trimmed procedure for the method; the key idea is to trim k0 samples at the beginning and k1 samples at the end, which means detecting the change point in the location interval [k0+1,n−k1]. They pointed out that k0 and k1 can be arbitrarily selected according to individual needs.
Attention should be paid to multiple changes in the data. Vostrikova [31] proposed a binary segmentation method that assumes at most one change within each segmentation and repeats several consecutive steps to detect multiple changes. Based on the binary segmentation method, it is always possible to split the problem of multiple changes into several single change problems to deal with. Consequently, this paper focuses on developing a testing procedure for detecting a single change point.
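The binary segmentation idea can be sketched as a short recursion. The code below is a minimal illustration, not the implementation used in this paper: `binary_segmentation` and `toy_mean_shift_detector` are hypothetical names, and the toy detector (a thresholded difference of segment means) merely stands in for a single change point test such as the Wn test with a bootstrap critical value.

```python
from typing import Callable, List, Optional, Sequence

Detector = Callable[[Sequence[float]], Optional[int]]


def binary_segmentation(data: Sequence[float], detect_single: Detector,
                        min_size: int = 2) -> List[int]:
    """Return sorted change point positions k, meaning data[:k] | data[k:].

    detect_single(segment) must return the size k of the left part
    (1 <= k < len(segment)) if a change is detected, otherwise None.
    """
    found: List[int] = []

    def recurse(lo: int, hi: int) -> None:
        # Work on the half-open segment data[lo:hi].
        if hi - lo < 2 * min_size:
            return
        k = detect_single(data[lo:hi])
        if k is None:
            return
        found.append(lo + k)
        recurse(lo, lo + k)   # search the left sub-segment
        recurse(lo + k, hi)   # search the right sub-segment

    recurse(0, len(data))
    return sorted(found)


def toy_mean_shift_detector(threshold: float) -> Detector:
    """Illustrative stand-in for a single change point test (an assumption)."""
    def detect(segment: Sequence[float]) -> Optional[int]:
        n = len(segment)
        best_k, best_gap = None, threshold
        for k in range(1, n):
            left = sum(segment[:k]) / k
            right = sum(segment[k:]) / (n - k)
            gap = abs(left - right)
            if gap > best_gap:
                best_k, best_gap = k, gap
        return best_k
    return detect


if __name__ == "__main__":
    series = [0.0] * 10 + [5.0] * 10 + [10.0] * 10
    print(binary_segmentation(series, toy_mean_shift_detector(1.0)))
```

Passing the single change point test in as a function keeps the recursion independent of the criterion, so the same skeleton could wrap a QMIC-based test.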
3. QMIC for change point detection in skew normal distribution
In this section, we use the proposed QMIC approach to detect the change points in the skew normal distribution.
Let Y=(Y1,Y2,…,Yn) be a random sample of size n from the skew normal distribution SN(μ,σ2,λ). We use the QMIC procedure to examine the possibility of changes in the location parameter μ, the scale parameter σ2, and the shape parameter λ. The null hypothesis of interest is
versus the alternative hypothesis
Before proceeding to the change point detection, we discuss the following lemma, which is indispensable for obtaining the Q-function.
Lemma 3.1. A random variable Y∼SN(μ,σ2,λ) has a stochastic representation given by
where δ=λ/√(1+λ2),T=|T0|, |⋅| denotes the absolute value, and T0 and T1 are independent standard normal random variables.
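A stochastic representation of this kind can be checked by simulation. The sketch below assumes the standard form Y = μ + σ(δT + √(1−δ²)T1) with T = |T0|, which is consistent with the definitions Δ=σδ and Γ=(1−δ²)σ² used later in this section. The function names `rsn` and `sn_mean` are hypothetical, and `sigma` here denotes the scale σ, not σ².

```python
import math
import random
from typing import List


def rsn(n: int, mu: float, sigma: float, lam: float,
        rng: random.Random) -> List[float]:
    """Draw n skew normal variates via Y = mu + sigma*(delta*|T0| + sqrt(1-delta^2)*T1)."""
    delta = lam / math.sqrt(1.0 + lam * lam)
    root = math.sqrt(1.0 - delta * delta)
    sample = []
    for _ in range(n):
        t = abs(rng.gauss(0.0, 1.0))   # half-normal component T = |T0|
        t1 = rng.gauss(0.0, 1.0)       # independent standard normal T1
        sample.append(mu + sigma * (delta * t + root * t1))
    return sample


def sn_mean(mu: float, sigma: float, lam: float) -> float:
    """Theoretical mean mu + sigma*delta*sqrt(2/pi), using E|T0| = sqrt(2/pi)."""
    delta = lam / math.sqrt(1.0 + lam * lam)
    return mu + sigma * delta * math.sqrt(2.0 / math.pi)


if __name__ == "__main__":
    rng = random.Random(0)
    draws = rsn(50_000, mu=2.0, sigma=2.0, lam=1.0, rng=rng)
    print(sum(draws) / len(draws), sn_mean(2.0, 2.0, 1.0))
```

The sample mean should be close to the theoretical mean, which gives a quick sanity check before the representation is used to build the EM algorithm.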
To develop an effective EM algorithm for the skew normal distribution, following [32], we consider the following hierarchical representation of the SN model based on Lemma 3.1:
where Δ=σδ, Γ=(1−δ2)σ2, and HN(0,1) is the half-normal distribution. We consider an important result that Ti given Yi follows a truncated normal distribution on (0, ∞) with parameters μTi and M2T before truncation, denoted by Ti|Yi∼N(μTi,M2T)I(0,∞), where
Let Y=(y1,⋯,yn)T be the observed data and T=(t1,⋯,tn)T the corresponding missing variables, so that the complete data are W=(YT,TT)T. To facilitate the implementation, let θ=(μ,Δ,Γ) be the parameter vector of interest.
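The hierarchical representation and the conditional moments of Ti given Yi referred to above can be written out as follows. This is a sketch derived by completing the square in the joint density of (Yi, Ti) under the assumed representation Y = μ + ΔT + √Γ T1; it is offered as the standard form rather than a verbatim copy of (3.2) and (3.3).

```latex
\begin{align*}
Y_i \mid T_i = t_i &\sim N(\mu + \Delta t_i,\ \Gamma), \qquad T_i \sim HN(0,1), \\
f(t_i \mid y_i) &\propto \exp\Big\{ -\frac{(y_i - \mu - \Delta t_i)^2}{2\Gamma} - \frac{t_i^2}{2} \Big\}, \qquad t_i > 0, \\
\mu_{T_i} &= \frac{\Delta\,(y_i - \mu)}{\Gamma + \Delta^2}, \qquad
M_T^2 = \frac{\Gamma}{\Gamma + \Delta^2}.
\end{align*}
```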
Invoking the hierarchical representation of yi, under the null hypothesis, the complete data log-likelihood function can be expressed as follows:
It is well known that the EM algorithm consists of two steps; namely, the expectation step (E-step) and the maximization step (M-step).
E-step: Given the observed dataset Y=(y1,⋯,yn) and the current parameter estimate ˆθ(h) from the h-th iteration, take the expectation of the complete data log-likelihood function with respect to the conditional distribution f(T|Y,ˆθ(h)); the Q-function is then obtained as
where
where μ(h)Ti and M2(h)T are defined as in (3.3), ϕ(⋅) denotes the PDF of the standard normal distribution, and Φ(⋅) is its cumulative distribution function (CDF).
In certain cases, the complexity of the Q-function derived from the E-step can make the M-step analytically challenging. For skew normal distribution, this challenge is efficiently addressed by employing a series of conditional maximization (CM) steps.
CM-steps: Update ˆθ(h) by maximizing Q(ˆθ|ˆθ(h)) over ˆθ, which leads to the following results:
For a prescribed tolerance ε>0, the iteration is terminated once the log-likelihood values of two consecutive iterations satisfy |LH0(Y|ˆθ(h+1))/LH0(Y|ˆθ(h))−1|<ε. In our study, we set ε=10^(−5), as recommended by Zeller et al. [33]. More details on the maximum likelihood estimates of the SN distribution can be found in [34].
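The E-step, CM-steps, and stopping rule can be put together in a compact sketch. The updates below are assumptions: they are the closed forms obtained by maximizing the Q-function of the hierarchical model Yi|Ti ∼ N(μ+ΔTi, Γ), Ti ∼ HN(0,1) one parameter at a time, with s1 = E[T|y] and s2 = E[T²|y] taken from the truncated normal distribution. The starting values and all function names are illustrative and do not come from this paper.

```python
import math

SQRT_2PI = math.sqrt(2.0 * math.pi)


def norm_pdf(x):
    return math.exp(-0.5 * x * x) / SQRT_2PI


def norm_cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def sn_loglik(y, mu, Delta, Gamma):
    """Observed-data log-likelihood with sigma^2 = Gamma + Delta^2, lambda = Delta/sqrt(Gamma)."""
    sigma = math.sqrt(Gamma + Delta * Delta)
    lam = Delta / math.sqrt(Gamma)
    total = 0.0
    for yi in y:
        z = (yi - mu) / sigma
        total += (math.log(2.0 / sigma) - 0.5 * z * z - math.log(SQRT_2PI)
                  + math.log(max(norm_cdf(lam * z), 1e-300)))
    return total


def e_step(y, mu, Delta, Gamma):
    """Conditional moments s1 = E[T|y], s2 = E[T^2|y] of the truncated normal."""
    m2 = Gamma / (Gamma + Delta * Delta)
    m = math.sqrt(m2)
    s1, s2 = [], []
    for yi in y:
        mu_t = Delta * (yi - mu) / (Gamma + Delta * Delta)
        ratio = norm_pdf(mu_t / m) / max(norm_cdf(mu_t / m), 1e-300)
        s1.append(mu_t + m * ratio)
        s2.append(mu_t * mu_t + m2 + m * mu_t * ratio)
    return s1, s2


def cm_step(y, s1, s2, Delta):
    """Conditional maximization: update mu, then Delta, then Gamma."""
    n = len(y)
    mu = sum(yi - Delta * a for yi, a in zip(y, s1)) / n
    Delta = sum((yi - mu) * a for yi, a in zip(y, s1)) / sum(s2)
    Gamma = sum((yi - mu) ** 2 - 2.0 * Delta * (yi - mu) * a + Delta ** 2 * b
                for yi, a, b in zip(y, s1, s2)) / n
    return mu, Delta, Gamma


def em_skew_normal(y, eps=1e-5, max_iter=2000):
    """Iterate until |L(h+1)/L(h) - 1| < eps; returns (mu, Delta, Gamma, loglik)."""
    n = len(y)
    mean = sum(y) / n
    var = sum((yi - mean) ** 2 for yi in y) / n
    mu, Delta, Gamma = mean, 0.5 * math.sqrt(var), 0.75 * var  # crude start (assumed)
    old = new = sn_loglik(y, mu, Delta, Gamma)
    for _ in range(max_iter):
        s1, s2 = e_step(y, mu, Delta, Gamma)
        mu, Delta, Gamma = cm_step(y, s1, s2, Delta)
        new = sn_loglik(y, mu, Delta, Gamma)
        if abs(new / old - 1.0) < eps:
            break
        old = new
    return mu, Delta, Gamma, new
```

The original parameters are recovered as σ² = Γ + Δ² and λ = Δ/√Γ. Since the relative stopping rule can trigger while EM is still improving slowly, a tighter ε or several starting values may be worth trying in practice.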
The maximum likelihood estimates of the parameters can be obtained using the EM algorithm, denoted as ˆθ=(ˆμ,ˆΔ,ˆΓ). Therefore, the expression of the Q(ˆθ|ˆθ) under the null hypothesis is
where ˆs1i and ˆs2i can be obtained from (3.5) and (3.6), respectively, by replacing ˆθ(h) in them with ˆθ.
Under the alternative hypothesis, the complete data log-likelihood function is
where Δik=σikλik/√(1+λik2), Γik=σ2ik/(1+λik2), i=1,2.
We treat the alternative hypothesis of a single change point as a combination of two skew normal samples, one of size k and the other of size n−k, so that maximizing the overall Q-function is equivalent to maximizing each of the two components. Therefore, with the current estimate ˆθ, the EM algorithm is applied to each of the two components to obtain the maximum likelihood estimates ˆθ1k=(ˆμ1k,ˆΔ21k,ˆΓ1k) and ˆθ2k=(ˆμ2k,ˆΔ22k,ˆΓ2k), respectively. Then we can obtain
By substituting (3.7) and (3.9) into (2.5), we can obtain the Wn statistic under the null hypothesis (3.1) of the skew normal distribution
The null hypothesis (3.1) is rejected for a sufficiently large value of Wn. We use the bootstrap method to simulate the distribution of the statistic Wn; a suggested framework for obtaining the critical value cα of Wn for a given significance level α, together with the P-value, is shown in Algorithm 1.
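A bootstrap calibration of this kind can be sketched generically. The code below is not Algorithm 1 itself but a minimal parametric bootstrap under stated assumptions: `simulate_null` draws one sample from the model fitted under the null hypothesis, `statistic` recomputes the test statistic on that sample, and the P-value is the plain proportion of bootstrap statistics at least as large as the observed one. All names are hypothetical.

```python
import math
import random
from typing import Callable, Sequence, Tuple


def bootstrap_test(observed_stat: float,
                   simulate_null: Callable[[random.Random], Sequence[float]],
                   statistic: Callable[[Sequence[float]], float],
                   B: int = 2000,
                   alpha: float = 0.05,
                   seed: int = 0) -> Tuple[float, float]:
    """Return (critical value c_alpha, bootstrap P-value) for a right-tailed test."""
    rng = random.Random(seed)
    # Recompute the statistic on B samples generated under the fitted null model.
    stats = sorted(statistic(simulate_null(rng)) for _ in range(B))
    # Empirical (1 - alpha) quantile of the bootstrap statistics.
    idx = min(B - 1, max(0, math.ceil((1.0 - alpha) * B) - 1))
    critical_value = stats[idx]
    p_value = sum(s >= observed_stat for s in stats) / B
    return critical_value, p_value


if __name__ == "__main__":
    # Toy stand-ins: standard normal null model, |sample mean| * sqrt(n) as statistic.
    def simulate(rng: random.Random):
        return [rng.gauss(0.0, 1.0) for _ in range(20)]

    def stat(x):
        return abs(sum(x) / len(x)) * math.sqrt(len(x))

    print(bootstrap_test(2.5, simulate, stat, B=2000, alpha=0.05))
```

For the skew normal case, `simulate_null` would draw n observations from the skew normal distribution fitted under the null hypothesis and `statistic` would return Wn for that sample. Each replicate then requires refitting the model for every candidate k, which is where most of the computing time goes.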
The QMIC method combined with the binary segmentation method can be utilized to address the detection of multiple change points of skew normal distribution.
4. Simulation results
In this section, we investigate the critical values of Wn and use simulations to compare the power of several test methods. In addition to our proposed QMIC information criterion, we also consider the modified information criterion [18] and the Bayesian information criterion [15].
4.1. Critical values
We performed simulations of critical values under different combinations of parameters and sample sizes using Algorithm 1 in Section 3. The critical values were estimated with B = 5000 bootstrap samples of n=50,100,200,300,400 at nominal test sizes α=0.1,0.05,0.01, respectively. The results, summarized in Table 1, indicate that the empirical critical values exhibit noticeable variation with the sample size at each significance level.
4.2. Power comparison
In this section, we conducted simulations under different scenarios to investigate the performance of the test procedures in terms of power. Additionally, we constructed the test statistic Tn based on the classical Bayesian information criterion (BIC), defined as follows, to facilitate comparison.
where BIC(n) under H0 and BIC(k) under H1 are defined as follows:
The pre-change distribution was set to follow SN(μ1,σ1,λ1) with parameters (μ1,σ1,λ1)=(2,2,1). Post-change, the distribution was set to SN(μn,σn,λn) with parameters θn=(μn,σn,λn) taking values (3,3,0), (2.5,2.5,2), (3,3,2), and (1.5,1.5,1.5). The sample sizes considered were n=50, 100, and 150. Changes were introduced approximately at the beginning (1/4), at the center (1/2), and at the end (3/4) of the sample size n. The results are listed in Tables 2 and 3.
From the simulation results, it is evident that the power of the proposed test increases with both the sample size and the magnitude of parameter changes. Notably, the power tends to be higher when the change occurs toward the middle of the data sequence, compared to changes occurring near the beginning or end.
Furthermore, our comparative analysis indicates that the proposed QMIC procedure is highly competitive relative to the MIC and BIC methods. In general, the QMIC procedure outperforms the BIC and MIC methods across a range of sample sizes and change locations. The power of all three tests increases with sample size, further underscoring the robustness of the QMIC procedure in detecting changes under various sample sizes and parameter shifts. Additionally, the QMIC procedure maintains strong performance even in scenarios involving changes in one or two parameters, further highlighting its effectiveness in a broader range of conditions.
5. Application
In this section, we show that the proposed method yields results similar to those obtained from the modified information criterion [18] based on the classical log-likelihood function. To illustrate this, we focus on three significant Latin American stock return markets: the Chilean, Brazilian, and Mexican markets. The stock returns for each of the three countries were recorded weekly from October 31, 1995, to October 31, 2000, resulting in three time series, each with 261 data points. Arellano-Valle [2] fitted different models to the three datasets, and almost all the criteria suggest that the skew normal distribution fits the stock returns of these three markets better than the normal distribution. The data sources and further details regarding these datasets can be found in Ngunkeng and Wei [15] and Said, Wei, and Tian [18].
Let St be the stock return value at week t. To ensure independence in the time series data and to study changes in stock returns, we analyze the stock return rate rather than the stock return itself; the return rate is defined as
Before proceeding with the actual analysis, we need to test the independence of the transformed data Rt. Hsu [35] proposed several methods that can be used for testing the independence of such a transformed dataset. Here, we choose the Portmanteau test defined as
where ri is the auto-correlation coefficient (ACF) at lag i, and k is the lag up to which the auto-correlation coefficient function is considered. Under the condition that the null hypothesis of independence holds, Qk has an asymptotic χ2 distribution with the degree of freedom k, denoted as Qk∼χ2(k).
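The portmanteau statistic is straightforward to compute. The sketch below assumes the Box-Pierce form Qk = n∑r_i² summed over lags i = 1,…,k, which matches the expression used for Q24 in Section 5.3. The helper names are hypothetical and only the standard library is used.

```python
from typing import Sequence


def autocorr(x: Sequence[float], lag: int) -> float:
    """Sample autocorrelation coefficient r_lag of the series x."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    return num / denom


def portmanteau_q(x: Sequence[float], k: int) -> float:
    """Q_k = n * sum_{i=1}^{k} r_i^2; compare with a chi-square(k) quantile."""
    n = len(x)
    return n * sum(autocorr(x, i) ** 2 for i in range(1, k + 1))


if __name__ == "__main__":
    # A strongly alternating toy series gives a large Q_1.
    toy = [1.0, -1.0] * 50
    print(portmanteau_q(toy, 1))
    # Independence is rejected at the 5% level when Q_k exceeds the
    # 0.95 quantile of chi-square(k), e.g. 36.415 for k = 24.
```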
5.1. Chilean stock market
We consider the stock return rate series Rt of the Chilean market. The ACF plot and the normal Q-Q plot of Rt are depicted in Figure 1. The left graph in Figure 1 presents the auto-correlation coefficient of Rt, from which it is clear that Rt is not severely auto-correlated. Furthermore, the test statistic Qk of Rt for the Chilean market is obtained as follows:
which indicates that the null hypothesis of independence of the transformed data is not rejected.
The right graph in Figure 1 illustrates that the Rt data deviates from the normal distribution, which indicates that the normality assumption fails. Therefore, as suggested by Arellano-Valle [2], we treat the skew normal distribution as a plausible candidate for fitting the Rt series for Chile. The estimated parameters for the fitted skew normal distribution are as follows:
Here, "SE" denotes the standard error of the estimated parameters.
We apply the proposed QMIC procedure with the bootstrap method to test the hypothesis in Section 2, and implement the binary segmentation method to detect all possible change points in the Rt data. To enhance the robustness of our analysis, consistent with the approach described by Perron and Vogelsang [30], we trimmed 20 data points from both the beginning and the end of each series. Figure 2 shows the corresponding values of the Wn test statistics for each series of Rt, as well as the possible locations of the change points detected by the proposed method.
● Consider the sequence Rt for the Chilean stock market, t = 1, 2, ⋯, 261. Under the null hypothesis, QMIC(261) = −707.3299, min1≤k≤261QMIC(k) = QMIC(112) = −723.3759, and the test statistic Wn=32.7396. Following the bootstrap procedure described in Section 3, we simulate the distribution of the test statistic Wn with B = 2000 and obtain the approximate P-value = 0.000 < 0.05. Therefore, we reject the null hypothesis and conclude that there is a change in the Rt series, with the first change located at the 112th position. This corresponds to the 113th position (December 26, 1997) of the stock return data St, with an associated stock return value of 736.133. This change point may be due to the Asian financial crisis of 1997, which reached its climax with a mini crash on October 27, 1997.
● With the binary segmentation method, we consider the series Rt for t from 1 to 112. Under the null hypothesis, we obtain QMIC(112) = −500.0035, min1≤k≤112QMIC(k) = QMIC(24) = −467.5512, and the test statistic Wn=−18.2969. This indicates that the QMIC value under the null hypothesis is smaller than the QMIC value under the alternative hypothesis, and thus we do not reject the null hypothesis of no change point in this sequence.
● Then we consider the sequence Rt for t from 113 to 261 and compute the QMIC value under the null hypothesis. We obtain QMIC(149)=−362.0137, min1≤k≤149QMIC(k)=QMIC(114)=−356.9705, the test statistic Wn=9.9687, and the approximate P-value = 0.007, which leads us to conclude that there is another change point in Rt. The change occurs at the 112+114=226th position in Rt, corresponding to the 227th position (March 3, 2000) of St with the stock return value 764.411. This change point may be related to Argentina's financial crisis, which emerged in December 1999.
● Next, we consider the sequence Rt for t from 113 to 226, and we have QMIC(114)=−269.8757, min1≤k≤114QMIC(k)=QMIC(57)=−262.1506, and the test statistic Wn=6.4836 with the approximate P-value = 0.0145. Thus, we conclude that there is a further change point at the 113+57=170th position in Rt, corresponding to the 171st position (February 5, 1999) in St with the associated stock return 524.462. The Russian financial crisis of 1998, in which the ruble was devalued and government payments to foreign creditors were suspended, may have been responsible for this change point.
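The Wn values reported in the steps above can be reproduced from the corresponding QMIC values. The sketch below assumes Wn = QMIC(n) − min QMIC(k) + dim(θ) log n with dim(θ) = 3 for θ=(μ,Δ,Γ), where n is the length of the segment under study; the function name `w_statistic` is hypothetical.

```python
import math


def w_statistic(qmic_null: float, qmic_min: float, n: int, dim_theta: int = 3) -> float:
    """W_n = QMIC(n) - min_k QMIC(k) + dim(theta) * log(n)."""
    return qmic_null - qmic_min + dim_theta * math.log(n)


# (segment length n, QMIC(n), min_k QMIC(k), reported W_n) for the Chilean series
reported = [
    (261, -707.3299, -723.3759, 32.7396),
    (112, -500.0035, -467.5512, -18.2969),
    (149, -362.0137, -356.9705, 9.9687),
    (114, -269.8757, -262.1506, 6.4836),
]

for n, q_null, q_min, w_reported in reported:
    w = w_statistic(q_null, q_min, n)
    print(f"n={n}: computed W_n={w:.4f}, reported W_n={w_reported}")
```

A negative Wn, as in the segment of length 112, means that the penalized null model already has the smaller criterion value, so no bootstrap P-value is needed to retain the null hypothesis.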
The graphs of the Chilean stock return St and return rate Rt with the locations of the corresponding change points are presented in Figure 3.
In our analysis, we identified three significant change points in the Chilean stock returns, each corresponding to major global financial events. As shown in Table 4, the first change point on December 26, 1997, aligns with the Asian financial crisis, reflecting its global impact on markets like Chile. The second, on February 5, 1999, corresponds with the Russian financial crisis, leading to a notable dip in returns due to global investor uncertainty. The final change point on March 3, 2000, is linked to the Argentine crisis, which also affected Chile through regional economic ties. These change points illustrate the sensitivity of the Chilean market to external financial shocks.
Several change point analyses of the Chilean return data have been reported. Arellano-Valle, Castro, and Loschi [2] detected only one change point in this dataset using a Bayesian method and located it in the first week of February 1998. Ngunkeng and Wei [15] detected two change points in this dataset based on the Schwarz information criterion, at positions 113 and 170. The same result was obtained by Said, Wei, and Tian [18] based on the modified information criterion. The method proposed in this paper detects change points in approximate agreement with those of the latter two studies and, in addition, a further change point located at the 227th position. From Figure 3, we can see that the additional change point is also reasonable.
5.2. The Brazilian stock market
Subsequently, the Brazilian stock return data are examined using the identical analytical procedure employed for the Chilean stock returns. To test the independence in the Brazilian stock return data, we plot the ACF for the Brazilian stock return rate Rt, as shown in the left panel of Figure 4. The ACF plot exhibits no strong auto-correlation for the Rt data, and to be more convincing, we calculate the Portmanteau test statistic Qk of Rt for the Brazilian market, and the result is given by
Therefore, we fail to reject the null hypothesis of independence of the Rt data for the Brazilian stock market. The right panel in Figure 4 shows that the normality of the Rt data is violated, so we use the skew normal distribution instead for change point detection. The estimated parameters for the fitted skew normal distribution are as follows:
In the following, the suggested QMIC method and the binary segmentation procedure are applied to the Rt dataset to detect the potential change points in it. As with the previous analysis, we applied the same data trimming method. Figure 5 shows the corresponding values of the Wn test statistics across k for each sub-sequence of Rt and the possible locations of the change points. The procedures and findings of the change point detection for the Rt dataset of the Brazilian stock market are listed below.
● We first screen the whole Rt series from 1 to 261 for possible changes. Under the null hypothesis, we obtain that QMIC(261)=−491.5966, min1≤k≤261QMIC(k)=QMIC(88)=−518.8437, the test statistic Wn=43.9407, and the approximated P-value = 0.000 < 0.05 with B = 2000 bootstrap sampling. Therefore, we reject the null hypothesis and conclude that there exists a change point in the Rt series and that the change occurs at the 88th position. This change point corresponds to the 89th position of the original series with a stock return value of 1278.702 on July 11, 1997, which may have been caused by the Asian financial crisis of 1997.
● Then we consider the sub-sequence Rt for t from 89 to 261, and the results are QMIC(173)=−250.2238, min1≤k≤173QMIC(k)=QMIC(103)=−244.9595, and Wn=10.1956. Through B = 2000 bootstrap sampling, the P-value is approximately equal to 0.006, and hence, there is a change point in the sub-sequence of Rt. The change point occurs at position 103 of the sub-sequence, corresponding to the 103+88=191st position of the whole Rt, which is at the 192nd position of the original data with a stock return 675.134 on July 2, 1999. We believe that the Russian financial crisis of 1998 may have contributed to this change point.
● Next, we consider the sub-sequence Rt for t from 89 to 191, and we observe that QMIC(103) = −90.7087, min1≤k≤103QMIC(k)=QMIC(80)=−80.0023, Wn=3.1978, and the approximated P-value =0.1065>0.05. Therefore, there is no sufficient reason to assume that there is a change point in this sub-sequence.
● Furthermore, the Rt sequence for t from 1 to 88 is considered. The QMIC(88)=−261.4871, Wn=7.39759, and min1≤k≤88QMIC(k)=QMIC(40)=−255.4527. The P-value obtained from B = 2000 bootstrap samples is 0.018, and thus there is a change point at position 40, which is equivalent to the 41st position in the Brazilian St series with a stock return of 728 on August 9, 1996. This change point may be due to the financial crisis of Mexico in 1995.
The graphs of the Brazilian stock return St and return rate Rt with the locations of the corresponding change points are presented in Figure 6.
In summary, we can see that there are three change points in the Brazilian stock return, and the corresponding results can be found in Table 5. The first change point on August 9, 1996, coincides with the aftermath of the 1995 Mexican financial crisis, reflecting the broader regional economic instability. The second change point on July 11, 1997, corresponds to the onset of the 1997 Asian financial crisis, highlighting the global impact on emerging markets, including Brazil. The third change point on July 2, 1999, aligns with the aftermath of the 1998 Russian financial crisis, demonstrating the sensitivity of the Brazilian market to global financial disturbances.
Arellano-Valle, Castro, and Loschi [2] showed that there was one change point in Brazilian stock returns and that it occurred in the first week of August 1997, while Ngunkeng and Wei [15] suggested that there were four change points in the data, occurring on July 11, 1997, March 8, 1996, July 31, 1998, and June 2, 2000, respectively. In this paper, using our proposed method, a total of three change points are detected, including the most probable one (July 11, 1997); the other two change points occur where there are significant changes in the Brazilian stock return data. Compared to Ngunkeng's results, our method ensures that the change points do not occur at the beginning or the end of the process.
5.3. The Mexican stock market
Additionally, the Mexican stock return data are subjected to a similar analytical scrutiny, maintaining methodological consistency across all datasets.
To examine the independence of the Mexican stock market data, we calculated the statistic $Q_{24} = 261 \times \sum_{i=1}^{24} r_i^2 = 23.77342$, which is less than the critical value $\chi^2_{0.95}(24) = 36.415$. We therefore do not reject the null hypothesis, which suggests that the observations in the $R_t$ series for the Mexican market are independent. The left graph in Figure 7 depicts the ACF values of the $R_t$ series, and the right graph presents the normal Q-Q plot, which indicates a deviation from normality. Normality tests such as the Shapiro-Wilk test likewise reject the normality assumption. The estimated parameters for the fitted skew normal distribution are as follows:
The procedures and results of change point detection in the $R_t$ dataset of the Mexican stock market are presented below:
● We first screen the whole $R_t$ series from 1 to 261 for possible changes. Under the null hypothesis, we obtain $\mathrm{QMIC}(261) = -522.5629$ and $\min_{1 \le k \le 261} \mathrm{QMIC}(k) = \mathrm{QMIC}(103) = -528.2583$, so the test statistic is $W_n = 22.3890$, and the approximate P-value is 0.000 < 0.05 with B = 2000 bootstrap samples. Therefore, we reject the null hypothesis and conclude that there is a change point in the $R_t$ series at the 103rd position. This change point corresponds to the 104th position of the original series, with a stock return value of 1242.851 on October 24, 1997.
● Similarly, we examine all possible subsequences using the binary segmentation method and identify another change at the 152nd position in the $R_t$ series, corresponding to a stock return value of 791.198 on September 25, 1998.
In summary, we have identified changes at the 104th and 152nd positions, corresponding to October 24, 1997, and September 25, 1998, respectively. These changes may have been influenced by the Asian financial crisis of 1997 and the Russian financial crisis of 1998. Figure 8 displays the monthly stock return rate and the monthly stock return index for Mexico, including identified change points.
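The screen-then-split procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' QMIC implementation: a Gaussian mean-and-variance likelihood-ratio cost stands in for QMIC, and permutation resampling stands in for the paper's bootstrap. All function names and the `min_len` guard are hypothetical, and the demonstration data are simulated.

```python
import math
import random


def gaussian_cost(x):
    """n * log(variance): -2 * max normal log-likelihood, up to a constant."""
    n = len(x)
    mean = sum(x) / n
    var = max(sum((v - mean) ** 2 for v in x) / n, 1e-12)
    return n * math.log(var)


def best_split(x, min_len=5):
    """Return (statistic, k) for the split x[:k] | x[k:] that lowers cost most."""
    full = gaussian_cost(x)
    best_stat, best_k = -math.inf, None
    for k in range(min_len, len(x) - min_len + 1):
        stat = full - gaussian_cost(x[:k]) - gaussian_cost(x[k:])
        if stat > best_stat:
            best_stat, best_k = stat, k
    return best_stat, best_k


def resample_p_value(x, observed, n_boot, rng, min_len=5):
    """Share of shuffled (no-change) series whose statistic is >= observed."""
    hits = 0
    for _ in range(n_boot):
        shuffled = rng.sample(x, len(x))
        stat, _ = best_split(shuffled, min_len)
        if stat >= observed:
            hits += 1
    return hits / n_boot


def binary_segmentation(x, offset=0, alpha=0.05, n_boot=100, rng=None, min_len=5):
    """Recursively test each segment; return sorted change positions."""
    rng = rng or random.Random(0)
    if len(x) < 2 * min_len:
        return []
    stat, k = best_split(x, min_len)
    if resample_p_value(x, stat, n_boot, rng, min_len) >= alpha:
        return []  # no significant change in this segment
    left = binary_segmentation(x[:k], offset, alpha, n_boot, rng, min_len)
    right = binary_segmentation(x[k:], offset + k, alpha, n_boot, rng, min_len)
    return left + [offset + k] + right


# Simulated series with one mean shift after the 40th observation.
data_rng = random.Random(1)
series = [data_rng.gauss(0, 1) for _ in range(40)] + [data_rng.gauss(5, 1) for _ in range(40)]
change_points = binary_segmentation(series)
print(change_points)
```

Each detected position splits the series in two, and the same test is rerun on both halves until no segment yields a significant statistic. This mirrors how the second change is found after the first in the procedure above.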
Arellano-Valle, Castro, and Loschi [2] found a single change point in the Mexican stock returns, occurring in the first week of September 1997. In contrast, the proposed QMIC approach identifies two distinct changes, at the 104th and 152nd positions.
These three examples show that our approach not only produces results very similar to those obtained from a classical method based on the log-likelihood function but also provides a workable procedure for complex models with intractable likelihood functions.
6.
Conclusions
In this study, we propose an enhancement to the modified information criterion (MIC) for detecting change points in skew normal distribution models by integrating the expectation-maximization (EM) algorithm's Q-function. This novel QMIC framework establishes a procedure for detecting simultaneous changes in all three parameters of a skew normal distribution, thereby improving the precision and robustness of change point detection across various data sequences. The complexity of deriving an analytic asymptotic distribution for the test statistic under QMIC is addressed through bootstrap simulations, allowing for flexible and accurate determination of critical values at different significance levels. Extensive simulations demonstrate that QMIC outperforms traditional methods, such as the MIC and the Bayesian information criterion (BIC), by more effectively capturing the nuances of skew normal distributions, particularly in datasets with non-normal characteristics.
The application of QMIC to stock market datasets highlights its practical utility, successfully identifying multiple change points that may be overlooked by conventional methods. This capability is especially valuable in financial data analysis, where detecting subtle shifts can significantly impact decision-making processes. The method's ability to uncover intricate structural changes underscores its potential for broader applications across various domains. In summary, the QMIC framework presents a valuable contribution to change point detection methodologies, offering a robust, sensitive, and versatile tool that complements existing state-of-the-art techniques, enhancing both the theoretical understanding and practical analysis of complex data.
Author contributions
Yang Du: conceptualization, methodology, writing-original draft, and writing-review & editing; Weihu Cheng: validation and supervision. All authors have thoroughly reviewed and approved the final version of the manuscript for publication.
Conflict of interest
The authors declare no conflict of interest.