
Volatility, a pivotal factor in the financial stock market, encapsulates the dynamic nature of asset prices and reflects both instability and risk. A volatility quantitative investment strategy is a methodology that uses information about volatility to guide investors in trading and profit-making. To enhance the effectiveness and robustness of investment strategies, our methodology combined three prominent time series models with six machine learning models: K-nearest neighbors, AdaBoost, CatBoost, LightGBM, XGBoost, and random forest, which capture the intricate patterns within historical volatility data. These models were combined to create eighteen novel fusion models for predicting volatility. By integrating the forecasting results with quantitative investing principles, we constructed a new strategy that achieved better returns on twelve selected American financial stocks. For investors navigating the real stock market, our findings serve as a valuable reference, potentially securing an average annualized return of approximately 5 to 10% for the American financial stocks examined in our research.
Citation: Keyue Yan, Ying Li. Machine learning-based analysis of volatility quantitative investment strategies for American financial stocks[J]. Quantitative Finance and Economics, 2024, 8(2): 364-386. doi: 10.3934/QFE.2024014
Survival analysis is a branch of statistics for analyzing time-to-event data. When looking into survival data, one frequently encounters the problem of competing risks, in which samples are subject to multiple kinds of failure. The Cox proportional hazards model, introduced by Cox [1], is popular in survival analysis for describing the relationship between the distributions of survival times and covariates, and is commonly employed to analyze cause-specific survival data. The traditional approach is to separately fit a Cox proportional hazards model to the data for each failure type, treating the data with other kinds of failure as censored. However, this conventional method has drawbacks: the estimates are hard to interpret and the confidence bands of the estimated hazards are wide, because each separate fit does not cover all failure types [2,3].
An alternative approach is to fit competing risks data using a mixture model that incorporates the distinct types of failure to partition the population into groups; it assumes an individual will fail from each risk with probabilities given by the proportions of each group. Moreover, the mixture approach is helpful for estimating the effects of covariates in each group through parametric proportional hazards regressions such as Cox's model. McLachlan and Peel [4] noted that a mixture model allows for both dependent and independent competing risks and can fit the data better than the traditional approach, in which the causes of failure are assumed to be independent. Mixture models are popular in competing risks analysis because their resultant estimates are easy to interpret [2], even though the models themselves are more complex.
Semi-parametric mixture models are a generalization of parametric mixture models and have become a prominent approach for modelling data with competing risks. Semiparametric approaches to mixture models are preferable for their ability to adjust for the associated variables and allow for assessing the effects of these variables on both the probabilities of eventual causes of failure through a logistic model and the relevant conditional hazard functions by applying the Cox proportional hazards model (cf. [2]). Below, we review the existing semiparametric methods of mixture models for competing risks data.
Ng and McLachlan [5] proposed an ECM-based semi-parametric mixture method, without specifying the baseline hazard function, to analyze competing risks data. They noted that when the component-baseline hazard is not monotonically increasing, their semi-parametric approach consistently produces less biased estimates than fully parametric approaches; when the component-baseline hazard is monotonically increasing, the two approaches demonstrate comparable efficiency in parameter estimation under mild and moderate censoring. Chang et al. [6] studied non-parametric maximum-likelihood estimators through a semiparametric mixture model for competing risks data. Their model is feasible for right-censored data and can provide estimates of quantities like a covariate-specific fatality rate or a covariate-specific expected time length. Moreover, Lu and Peng [7] set up a semiparametric mixture regression model to analyze competing risks data under the ordinary mechanism of conditionally independent censoring. Choi and Huang [8] offered a maximum likelihood scheme for semiparametric mixture models that makes efficient and reliable estimation of the cumulative hazard function. One advantage of their approach is that the joint estimation of model parameters connects all considered competing risks under the constraint that the probabilities of failing from the respective causes sum to 1 given any covariates. These studies [5,6,7,8] illustrate the breadth of semiparametric mixture modelling for competing risks data.
Although the mixture hazard model is preferable to direct approaches, two important but challenging issues frequently encountered in the applications are the determination of the number of risk types and the identification of the failure type of each individual.
It is understandable that the results of a mixture model analysis depend heavily on the number of components, and it is conceivably hard to cover all types of competing risks in a mixture model. Validity indices are a vital technique in model selection. A cluster validity index is a criterion function used to determine the optimal number of clusters. Some cluster validity indices presented by [9,10,11] are designed to find an optimal cluster number for fuzzy clustering algorithms; some depend only on the memberships, while others also take into account the distances between the data points and cluster centers. Wu et al. [12] proposed median-type validity indices, which are robust to noise and outliers. Zhou et al. [13] introduced a weighted summation type of validity index for fuzzy clustering, but these are infeasible for mixture regression models. Conversely, Henson et al. [14] evaluated the ability of several statistical criteria, such as the Akaike information criterion (AIC) and Bayesian information criterion (BIC), to produce a proper number of components for latent variable mixture modeling. However, AIC and BIC may over- and under-estimate the number of components [15], respectively.
As to the identification of failure types, many studies on competing risks problems, such as [5,6,7,8], assumed that the failure type of an individual is known if the subject's failure time is observed; but if an individual is censored and only the censored time is known, then the failure type the subject would fail from is unknown. In fact, even if one observes the failure time, the true cause of failure might not be clear and needs further investigation. Thus, deciding the number of competing risks and recognizing the failure type of each individual are critical in competing risks analysis, but scant work has been done on them.
Besides the above problems, another critical issue, particular to mixture Cox hazards models, is the estimation of the baseline hazard function. The Cox proportional hazards model consists of two parts: the baseline hazard function and the proportional regression model. Bender et al. [16] assumed that the baseline hazard function follows a specific lifetime distribution, but this assumption is obviously restrictive: a single lifetime distribution may not adequately describe all data, for example when the failure rate is neither monotonically increasing nor decreasing. Alternatively, some scholars adopted more flexible nonparametric approaches to estimate the baseline hazard function. Ng and McLachlan [5] assumed the baseline hazard function to be piecewise constant by treating each observed survival time as a cut-off point, but the piecewise constant assumption has the disadvantage that the estimated curve is not smooth, while smoothing is required in several applications [17]. In fact, our simulation results also show that the derived estimates based on a piecewise constant hazard function are not sufficient in some cases (e.g., model 4 in Figure 4). Understandably, an inadequate estimate of the baseline function affects the selection of the number of model components and hence leads to insufficient estimates of the model parameters.
In order to solve the above-mentioned problems with mixture Cox hazards modelling for competing risks data, we propose four validity indices and a kernel estimator for the baseline hazard function in this paper. Validity indices are a vital technique in model selection, but they have been little utilized for deciding the number of components of a mixture regression model. By using posterior probabilities and residual functions, we propose four validity indices that are applicable to regression models. Under the EM-based mixture model, the posterior probabilities play an important role in classifying data, taking the role that memberships play in fuzzy clustering. Unlike the traditional regression model, the survival model does not meet the assumption that the variation of the survival time is constant across covariate values. Therefore, we combine functions of the posterior probabilities with sums of standardized residuals to constitute the new validity indices, and we verify the effectiveness of the proposed indices through extensive simulations. Moreover, we extend the kernel method of Guilloux et al. [18] to estimate the baseline hazard function smoothly and hence more accurately.
The remainder of this paper is organized as follows. Section 2 introduces the mixture Cox proportional hazards regression model, develops an EM-based algorithm to estimate the model parameters, and also discusses kernel estimations for the baseline hazard function. Section 3 constructs four validity indices for selecting the number of model components in a mixture Cox proportional hazards regression model. Section 4 carries out several simulations and assesses the effectiveness of our validity indices. Section 5 analyzes a practical data set of prostate cancer patients treated with different dosages of the drug diethylstilbestrol. Finally, Section 6 states conclusions and a discussion.
For mixture model analysis, suppose each member of a population can be categorized into $g$ mutually exclusive clusters according to its failure type. Let $D=\{(t_j, X_j^T, \delta_j): j=1,\dots,n\}$ be a sample drawn from this population, where $T$ denotes the transpose of a vector, $t_j$ is the failure or right-censoring time, $X_j=(x_{j1},x_{j2},\dots,x_{jd})^T$ is a $d$-dimensional vector of covariates, and:

$$\delta_j=\begin{cases}1, & \text{if the } j\text{th individual is uncensored},\\ 0, & \text{if the } j\text{th individual is censored}.\end{cases}$$
The mixture probability density function (pdf) of t is defined by:
$$f(t)=\sum_{i=1}^{g} p_i\, f_i(t), \qquad \text{subject to } \sum_{i=1}^{g} p_i = 1, \tag{1}$$

where $p_i$ is the mixing probability of failure due to the $i$th type of risk and $g$ is the number of model components.
In the $i$th component, the hazard function $h_i(t\mid X_j,\beta_i)$ given covariate $X_j$ follows a Cox proportional hazards model defined by

$$h_i(t\mid X_j,\beta_i)=h_{0i}(t)\exp\left(X_j^T\beta_i\right), \tag{2}$$

where $\beta_i=(\beta_{i1},\beta_{i2},\dots,\beta_{id})^T$ is the vector of regression coefficients, and $h_{0i}(t)$ is the baseline hazard function of the $i$th component. We define the $i$th component survival function and pdf by:

$$S_i(t\mid X_j,\beta_i)=\exp\left[-H_{0i}(t)\exp\left(X_j^T\beta_i\right)\right]$$

and

$$f_i(t\mid X_j,\beta_i)=h_{0i}(t)\exp\left[X_j^T\beta_i-H_{0i}(t)\exp\left(X_j^T\beta_i\right)\right],$$

where $H_{0i}(t)=\int_0^t h_{0i}(s)\,ds$ is the $i$th component cumulative baseline hazard function.
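As a quick numerical check of these definitions, the sketch below evaluates $S_i$ and $f_i$ for a single covariate value under an assumed Weibull baseline $h_{0i}(t)=\lambda\rho t^{\rho-1}$ (the model leaves $h_{0i}$ unspecified; the parameter values here are arbitrary illustrations) and verifies the identity $f_i = h_i S_i$:

```python
import math

def weibull_cum_baseline(t, lam, rho):
    # H0i(t) = integral of lam*rho*s^(rho-1) ds from 0 to t = lam * t^rho
    return lam * t ** rho

def component_survival(t, x_beta, lam, rho):
    # S_i(t|X) = exp(-H0i(t) * exp(X^T beta)); x_beta plays the role of X^T beta
    return math.exp(-weibull_cum_baseline(t, lam, rho) * math.exp(x_beta))

def component_density(t, x_beta, lam, rho):
    # f_i(t|X) = h0i(t) * exp(X^T beta - H0i(t) * exp(X^T beta))
    h0 = lam * rho * t ** (rho - 1)
    return h0 * math.exp(x_beta - weibull_cum_baseline(t, lam, rho) * math.exp(x_beta))

# sanity check of the identity f_i(t) = h_i(t) * S_i(t) at arbitrary values
t, x_beta, lam, rho = 1.7, 0.4, 0.5, 1.3
h = lam * rho * t ** (rho - 1) * math.exp(x_beta)   # h_i(t|X)
assert abs(component_density(t, x_beta, lam, rho)
           - h * component_survival(t, x_beta, lam, rho)) < 1e-12
```

The Weibull choice is only for concreteness; any nonnegative baseline with a computable integral would serve equally well here.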
The unknown parameters are the mixing probabilities $p=(p_1,p_2,\dots,p_{g-1})^T$ and regression coefficients $\beta=(\beta_1^T,\beta_2^T,\dots,\beta_g^T)^T$, where $\beta_i=(\beta_{i1},\beta_{i2},\dots,\beta_{id})^T$. Based on (1) and Zhou [19], the log-likelihood function under the mixture hazards model with right-censored data is given by

$$l(p,\beta)=\sum_{j=1}^{n}\sum_{i=1}^{g}\left\{\delta_j\log\left[p_i\,f_i(t_j\mid X_j,\beta_i)\right]+(1-\delta_j)\log\left[p_i\,S_i(t_j\mid X_j,\beta_i)\right]\right\},$$

where $f(t_j\mid X_j,\beta)=\sum_{i=1}^{g}p_i\,f_i(t_j\mid X_j,\beta_i)$ and $S(t_j\mid X_j,\beta)=\sum_{i=1}^{g}p_i\,S_i(t_j\mid X_j,\beta_i)$.
Assume that the true causes of failure for an individual are unobserved, and hence the data are incomplete. We introduce the latent variable zij as:
$$z_{ij}=\begin{cases}1, & \text{if the } j\text{th individual fails due to the } i\text{th type of risk};\\ 0, & \text{otherwise}.\end{cases}$$
The complete-data log-likelihood function is given by:
$$l_c(p,\beta)=\sum_{j=1}^{n}\sum_{i=1}^{g} z_{ij}\left\{\delta_j\log\left[p_i\,f_i(t_j\mid X_j,\beta_i)\right]+(1-\delta_j)\log\left[p_i\,S_i(t_j\mid X_j,\beta_i)\right]\right\}. \tag{3}$$
Subsequently, the parameters are estimated through the expectation and maximization (EM) algorithm.
E-step: On the (k+1)th iteration of E-step, we calculate the conditional expectation of the complete-data log-likelihood (3) given the current estimates of the parameters, i.e.:
$$\begin{aligned} E\left[l_c(p,\beta)\mid p^{(k)},\beta^{(k)}\right] &=\sum_{j=1}^{n}\sum_{i=1}^{g} z_{ij}^{(k)}\left\{\delta_j\log\left[p_i\,f_i(t_j\mid X_j,\beta_i)\right]+(1-\delta_j)\log\left[p_i\,S_i(t_j\mid X_j,\beta_i)\right]\right\}\\ &=\sum_{j=1}^{n}\sum_{i=1}^{g} z_{ij}^{(k)}\log p_i +\sum_{j=1}^{n} z_{1j}^{(k)}\left[\delta_j\log f_1(t_j\mid X_j,\beta_1)+(1-\delta_j)\log S_1(t_j\mid X_j,\beta_1)\right]+\cdots\\ &\quad+\sum_{j=1}^{n} z_{gj}^{(k)}\left[\delta_j\log f_g(t_j\mid X_j,\beta_g)+(1-\delta_j)\log S_g(t_j\mid X_j,\beta_g)\right]\\ &=Q_0+Q_1+\cdots+Q_g. \end{aligned} \tag{4}$$
Here, $p^{(k)}$ and $\beta^{(k)}$ are the estimates of $p$ and $\beta$, respectively, in the $k$th iteration. By Bayes' theorem, we have:

$$z_{ij}^{(k)}=E\left(z_{ij}\mid p^{(k)},\beta^{(k)}\right)=\frac{p_i^{(k)}\,f_i(t_j\mid X_j,\beta_i^{(k)})^{\delta_j}\,S_i(t_j\mid X_j,\beta_i^{(k)})^{1-\delta_j}}{\sum_{l=1}^{g} p_l^{(k)}\,f_l(t_j\mid X_j,\beta_l^{(k)})^{\delta_j}\,S_l(t_j\mid X_j,\beta_l^{(k)})^{1-\delta_j}}, \tag{5}$$
which is the posterior probability that the jth individual with survival time tj fails due to the ith type of risk.
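The E-step computation in (5) can be sketched as follows; the mixing probabilities, component densities, and survival values below are hypothetical toy numbers, not fitted quantities:

```python
import numpy as np

def posterior_probs(p, f, S, delta):
    """Posterior probability z_ij that subject j fails from risk i (Eq. (5)).
    p: (g,) mixing probabilities; f, S: (g, n) component densities/survival
    functions evaluated at each t_j; delta: (n,) censoring indicators."""
    num = p[:, None] * f ** delta * S ** (1 - delta)
    return num / num.sum(axis=0)   # normalize over components

# toy numbers (hypothetical): 2 risks, 3 subjects
p = np.array([0.6, 0.4])
f = np.array([[0.30, 0.10, 0.20],
              [0.05, 0.25, 0.20]])
S = np.array([[0.70, 0.40, 0.50],
              [0.60, 0.30, 0.50]])
delta = np.array([1, 0, 1])        # subject 2 is censored
z = posterior_probs(p, f, S, delta)
assert np.allclose(z.sum(axis=0), 1.0)  # each column is a probability vector
```

For an uncensored subject the density $f_i$ enters; for a censored subject the survival function $S_i$ enters, exactly as the exponents $\delta_j$ and $1-\delta_j$ dictate.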
M-step: The $(k+1)$th iteration of the M-step provides the updated estimates $p^{(k+1)}$ and $\beta^{(k+1)}$ that maximize (4) with respect to $p$ and $\beta$.
Under the constraint $\sum_{i=1}^{g}p_i=1$, maximizing $Q_0=\sum_{j=1}^{n}\sum_{i=1}^{g}z_{ij}^{(k)}\log p_i$ from (4) gives the estimate of the mixing probability:

$$p_i^{(k+1)}=\frac{\sum_{j=1}^{n} z_{ij}^{(k)}}{n}. \tag{6}$$
The term $Q_i$ from (4) for $i=1,\dots,g$ can be written as:

$$Q_i=\sum_{j=1}^{n} z_{ij}^{(k)}\left\{\delta_j\left[\log h_{0i}(t_j)+X_j^T\beta_i\right]-\exp\left(X_j^T\beta_i\right)H_{0i}(t_j)\right\}. \tag{7}$$
Define the score vector $U(\beta_i)$ for $i=1,\dots,g$ as the first derivative of (7) with respect to the vector $\beta_i$ with $H_{0i}(t)$ fixed at $H_{0i}^{(k+1)}(t)$; the estimate $\beta_i^{(k+1)}$ satisfies the equation:

$$U(\beta_i)=\left.\frac{\partial Q_i}{\partial\beta_i}\right|_{H_{0i}(t_j)=H_{0i}^{(k+1)}(t_j)}=\sum_{j=1}^{n} z_{ij}^{(k)}\left[\delta_j-\exp\left(X_j^T\beta_i\right)H_{0i}^{(k+1)}(t_j)\right]X_j=0. \tag{8}$$
To estimate the baseline hazard function under the mixture hazards model, we propose a kernel estimator. Define $N_j(t)=I(t_j\le t \wedge \delta_j=1)$ as an event counting process and $Y_j(t)=I(t_j\ge t)$ as the at-risk process. The updated kernel estimator of the $i$th component baseline hazard function $h_{0i}(t)$ on the $(k+1)$th iteration is defined by:

$$h_{0i}^{(k+1)}\left(t\mid X,Z_i^{(k)},\beta_i^{(k)},b^{(k)}\right)=\frac{1}{b^{(k)}}\int_0^{\tau}K\!\left(\frac{t-u}{b^{(k)}}\right)dH_{0i}^{(k+1)}\left(u\mid X,Z_i^{(k)},\beta_i^{(k)}\right),\qquad \tau\ge 0, \tag{9}$$
where K:R→R is a kernel function, and b(k) is a positive parameter called the bandwidth. There are several types of kernel functions commonly used, appearing in Table 1 and Figure 1. We try these kernel functions in the simulated examples and find no significant differences. In this paper, we choose biweight as the kernel function to estimate the baseline hazard function.
Kernel function | $K(u)$
Gaussian | $K(u)=\frac{1}{\sqrt{2\pi}}e^{-u^2/2},\ -\infty<u<\infty$
Epanechnikov | $K(u)=\frac{3}{4}(1-u^2),\ \lvert u\rvert\le 1$
Biweight | $K(u)=\frac{15}{16}(1-u^2)^2,\ \lvert u\rvert\le 1$
Triweight | $K(u)=\frac{35}{32}(1-u^2)^3,\ \lvert u\rvert\le 1$
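A minimal sketch of the four kernels in Table 1, checking numerically that each integrates to one over its support:

```python
import numpy as np

# The four kernels from Table 1; each is a density integrating to 1 on its support.
def gaussian(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

def epanechnikov(u):
    return np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)

def biweight(u):
    return np.where(np.abs(u) <= 1, 15 / 16 * (1 - u**2)**2, 0.0)

def triweight(u):
    return np.where(np.abs(u) <= 1, 35 / 32 * (1 - u**2)**3, 0.0)

# numerical check: Riemann sums of each kernel are close to 1
u = np.linspace(-6, 6, 120001)
du = u[1] - u[0]
for K in (gaussian, epanechnikov, biweight, triweight):
    assert abs(float((K(u) * du).sum()) - 1.0) < 1e-4
```

All four are symmetric and bounded; they differ mainly in smoothness at the support boundary, which is why the smoothed estimates they produce look so similar in practice.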
Derived by smoothing the increments of the Breslow estimator, the kernel estimator (9) can be written as:

$$h_{0i}^{(k+1)}\left(t\mid X,Z_i^{(k)},\beta_i^{(k)},b^{(k)}\right)=\frac{1}{b^{(k)}}\sum_{j=1}^{n}\int_0^{\tau}K\!\left(\frac{t-u}{b^{(k)}}\right)\frac{z_{ij}^{(k)}\,I(\bar{Y}(u)>0)}{S_{ni}\left(u\mid X,Z_i^{(k)},\beta_i^{(k)}\right)}\,dN_j(u), \tag{10}$$

where $\bar{Y}=\frac{1}{n}\sum_{j=1}^{n}Y_j$ and $S_{ni}(u\mid X,Z_i,\beta_i)=\sum_{j=1}^{n}z_{ij}\exp\left(X_j^T\beta_i\right)Y_j(u)$.
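The smoothed-Breslow idea behind (10) can be sketched for a single component as below; the event times, posterior probabilities, and weighted at-risk sums are made-up illustrative values, and ties as well as the indicator $I(\bar{Y}(u)>0)$ are omitted for brevity:

```python
import numpy as np

def biweight(u):
    return np.where(np.abs(u) <= 1, 15 / 16 * (1 - u**2)**2, 0.0)

def smoothed_baseline(t_grid, times, delta, z_i, risk_weight, b):
    """Kernel-smoothed Breslow estimator for one component (sketch of Eq. (10)).
    times: observed times; delta: event indicators; z_i: posterior probabilities
    for component i; risk_weight[j]: the weighted at-risk sum S_ni at time t_j."""
    increments = z_i * delta / risk_weight            # Breslow increments dH0i
    u = (t_grid[:, None] - times[None, :]) / b        # scaled distances
    return (biweight(u) * increments[None, :]).sum(axis=1) / b

# toy data (hypothetical values, single component)
times = np.array([0.5, 1.0, 1.5, 2.0])
delta = np.array([1, 1, 0, 1])
z_i = np.array([0.9, 0.8, 0.7, 0.95])
risk_weight = np.array([4.0, 3.0, 2.0, 1.0])          # made up for illustration
h0 = smoothed_baseline(np.linspace(0, 2.5, 6), times, delta, z_i, risk_weight, b=0.8)
assert np.all(h0 >= 0)
```

Each event time contributes a jump of size $z_{ij}\delta_j/S_{ni}(t_j)$, and the kernel spreads that jump smoothly over a window of width $b$ around $t_j$.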
Horova et al. [20] and Patil [21] introduced the cross-validation method to select the bandwidth of the kernel estimator. We define the cross-validation function for bandwidth b written as CV(b) under our model as:
$$CV(b)=\sum_{i=1}^{g}\sum_{j=1}^{n}z_{ij}^{(k)}\left[h_{0i}^{(k+1)}\left(t_j\mid X,Z_i^{(k)},\beta_i^{(k)},b^{(k)}\right)\right]^2-2\sum_{i=1}^{g}\sum_{j=1}^{n}\sum_{l\neq j}\frac{1}{b^{(k)}}K\!\left(\frac{t_l-t_j}{b^{(k)}}\right)\frac{z_{il}^{(k)}\delta_l}{Y(t_l)}\,\frac{z_{ij}^{(k)}\delta_j}{Y(t_j)}.$$
The selection of the bandwidth on the $(k+1)$th iteration is given by:

$$b^{(k+1)}=\arg\min_{b\in B_n} CV(b), \tag{11}$$

where $B_n$ denotes the set of acceptable bandwidths.
The algorithm is as follows. We fix $n$ and $g$ and set initial values for the mixing probabilities $p^{(0)}$ (usually $1/g$ each), the regression coefficients $\beta^{(0)}$, the baseline hazard rates $h_0^{(0)}$, and the bandwidth $b^{(0)}=0.5$, together with a termination value $\varepsilon>0$.
Set the initial counter $l=1$.
Step 1: Compute $Z^{(l-1)}$ from $p^{(l-1)}$, $h_0^{(l-1)}$, and $\beta^{(l-1)}$ by (5);
Step 2: Update $p^{(l)}$ with $Z^{(l-1)}$ by (6);
Step 3: Update $h_0^{(l)}$ with $Z^{(l-1)}$, $\beta^{(l-1)}$, and $b^{(l-1)}$ by (10);
Step 4: Update the bandwidth $b^{(l)}$ with $Z^{(l-1)}$, $h_0^{(l)}$, and $\beta^{(l-1)}$ by (11);
Step 5: Update $\beta^{(l)}$ with $Z^{(l-1)}$, $h_0^{(l)}$, and $\beta^{(l-1)}$ by (8);
Step 6: IF $\|p^{(l)}-p^{(l-1)}\|^2+\|h_0^{(l)}-h_0^{(l-1)}\|^2+\|\beta^{(l)}-\beta^{(l-1)}\|^2<\varepsilon$, THEN stop;
ELSE let $l=l+1$ and GOTO Step 1.
Note that the superscript $(\cdot)$ represents the iteration number, $h_0^{(0)}=\left(h_{01}^{(0)},h_{02}^{(0)},\dots,h_{0g}^{(0)}\right)^T$ is a $g\times n$ matrix, where $h_{0i}^{(0)}=\left(h_{0i}^{(0)}(t_1),h_{0i}^{(0)}(t_2),\dots,h_{0i}^{(0)}(t_n)\right)^T$, and each row is initialized by a constant vector.
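To illustrate the control flow of Steps 1-6, here is a deliberately simplified EM loop: it replaces the kernel-estimated baseline with a parametric exponential hazard per component and drops the covariate and bandwidth steps, keeping only the E-step (5), the mixing-probability update (6), and the squared-norm stopping rule of Step 6. It is a sketch of the iteration pattern, not the paper's full algorithm:

```python
import numpy as np

def em_exponential_mixture(times, delta, g=2, eps=1e-8, max_iter=500):
    """Schematic EM for a g-component mixture of exponential hazards
    (simplified stand-in for Steps 1-6: no covariates, parametric rate
    lambda_i in place of the kernel-smoothed baseline h0i)."""
    rng = np.random.default_rng(0)
    p = np.full(g, 1.0 / g)                 # p(0) = 1/g each, as in the text
    lam = rng.uniform(0.5, 1.5, size=g)     # arbitrary initial hazard rates
    for _ in range(max_iter):
        # E-step (Eq. (5)) with S_i = exp(-lam_i t), f_i = lam_i S_i
        S = np.exp(-lam[:, None] * times[None, :])
        f = lam[:, None] * S
        num = p[:, None] * f ** delta * S ** (1 - delta)
        z = num / num.sum(axis=0)
        # M-step: Eq. (6) for p; weighted exponential MLE for lam
        p_new = z.mean(axis=1)
        lam_new = (z * delta).sum(axis=1) / (z * times).sum(axis=1)
        # Step 6: squared-norm stopping rule
        converged = np.sum((p_new - p) ** 2) + np.sum((lam_new - lam) ** 2) < eps
        p, lam = p_new, lam_new
        if converged:
            break
    return p, lam

rng = np.random.default_rng(1)
t = np.concatenate([rng.exponential(1.0, 150), rng.exponential(0.2, 150)])
d = np.ones_like(t)                          # all failures observed, no censoring
p_hat, lam_hat = em_exponential_mixture(t, d)
assert abs(p_hat.sum() - 1.0) < 1e-9
```

The full procedure interleaves two extra updates in each pass (the smoothed baseline (10) and the bandwidth (11)), but the alternation-until-small-change structure is the same.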
In traditional regression analysis, we select the best model by picking the one that minimizes the sum of squared residuals, but unlike the traditional regression model, the survival model does not meet the assumption that the standard deviation of the survival time is a constant at each covariate. From Figure 2(a), we see that the survival time with higher expectation has higher standard deviation. Therefore, to select the best model we need to adjust the standard deviation to avoid being greatly affected by data that have large standard deviations. Moreover, if the model fits the data well, then each observed survival time will be close to the expectation of the component model, which has the largest posterior probability corresponding to one’s risk type.
Figure 2(b) illustrates that observation A is closer to the mean line (red line) of the component model corresponding to risk type 1, say model 1, than to the mean line (blue line) of model 2. From (5), we see that the posterior probability of observation A corresponding to the first type of risk (red line) will be much larger than that of the second type of risk (blue line). Hence, to build validity indices for mixture models, we use the posterior probabilities as weights and define the mixture sum of standard absolute residuals (MsSAE) and the mixture sum of standard squared residuals (MsSSE) as follows:
$$\text{MsSAE}=\sum_{i=1}^{g}\sum_{j=1}^{n}\hat{z}_{ij}\frac{\left|t_j-\hat{E}_i(t_j)\right|}{\sqrt{\widehat{\mathrm{Var}}_i(t_j)}};\qquad \text{MsSSE}=\sum_{i=1}^{g}\sum_{j=1}^{n}\hat{z}_{ij}\frac{\left[t_j-\hat{E}_i(t_j)\right]^2}{\widehat{\mathrm{Var}}_i(t_j)},$$

where $\hat{E}_i(t_j)=\int_0^{\infty}\exp\left[-\hat{H}_{0i}(t)\exp\left(x_j^T\hat{\beta}_i\right)\right]dt$ and $\widehat{\mathrm{Var}}_i(t_j)=2\int_0^{\infty}t\exp\left[-\hat{H}_{0i}(t)\exp\left(x_j^T\hat{\beta}_i\right)\right]dt-\hat{E}_i(t_j)^2$. The squared distance is also considered because it catches an abnormal model more easily.
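Given fitted posterior probabilities and component moments, MsSAE and MsSSE reduce to weighted sums; the sketch below uses hypothetical values of $\hat{z}$, $\hat{E}_i(t_j)$, and $\widehat{\mathrm{Var}}_i(t_j)$ rather than integrals of a fitted model:

```python
import numpy as np

def mixture_residual_sums(t, z, E, Var):
    """MsSAE and MsSSE: posterior-weighted sums of standardized residuals.
    t: (n,) observed times; z, E, Var: (g, n) posterior probabilities,
    component means, and component variances of t_j (hypothetical inputs)."""
    r = t[None, :] - E                     # raw residual of t_j per component
    msae = np.sum(z * np.abs(r) / np.sqrt(Var))
    msse = np.sum(z * r**2 / Var)
    return msae, msse

# toy values (hypothetical): 2 components, 3 subjects
t = np.array([1.0, 2.0, 3.0])
z = np.array([[0.9, 0.2, 0.5], [0.1, 0.8, 0.5]])
E = np.array([[1.2, 1.5, 2.5], [2.5, 2.2, 3.5]])
Var = np.array([[0.25, 0.25, 1.0], [1.0, 1.0, 1.0]])
msae, msse = mixture_residual_sums(t, z, E, Var)
assert msae >= 0 and msse >= 0
```

Standardizing by $\sqrt{\widehat{\mathrm{Var}}_i(t_j)}$ is what keeps components with long, variable survival times from dominating the sums.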
From Figure 2(c) we can see that the expectation (green line) of the survival time according to the third type of risk (ER3) is close to that (blue line) corresponding to the second type of risk (ER2). In order to penalize the overfitting model, which is the model with too many model components, we consider the distance between the expectations of each survival time according to any two types of risk as the penalty. Define the average absolute separation ¯ASep , the average squared separation ¯SSep , the minimum absolute separation minASep and the minimum squared separation minSSep as:
$$\overline{ASep}=\frac{2}{g(g-1)}\sum_{i=1}^{g}\sum_{l>i}^{g}\sum_{j=1}^{n}\left|\hat{E}_i(t_j)-\hat{E}_l(t_j)\right|;\qquad \overline{SSep}=\frac{2}{g(g-1)}\sum_{i=1}^{g}\sum_{l>i}^{g}\sum_{j=1}^{n}\left[\hat{E}_i(t_j)-\hat{E}_l(t_j)\right]^2;$$

$$\min ASep=\min_{i\neq l}\sum_{j=1}^{n}\left|\hat{E}_i(t_j)-\hat{E}_l(t_j)\right|;\qquad \min SSep=\min_{i\neq l}\sum_{j=1}^{n}\left[\hat{E}_i(t_j)-\hat{E}_l(t_j)\right]^2.$$
A good model will possess small mixture standard residuals and large separation of expectations. Hence, based on the above-mentioned functions of residuals and separation of means, we propose four validity indices V1 ~ V4 for selecting the number of model components under the mixture hazards regression model.
(V1). Absolute standard residuals and average separation $V_{aRaS}$

$$V_{aRaS}=\frac{\text{MsSAE}}{\overline{ASep}}=\frac{\sum_{i=1}^{g}\sum_{j=1}^{n}\hat{z}_{ij}\left|t_j-\hat{E}_i(t_j)\right|\Big/\sqrt{\widehat{\mathrm{Var}}_i(t_j)}}{\frac{2}{g(g-1)}\sum_{i=1}^{g}\sum_{l>i}^{g}\sum_{j=1}^{n}\left|\hat{E}_i(t_j)-\hat{E}_l(t_j)\right|}.$$

We find an optimal number $g$ of types of risk by solving $\min_{2\le g\le n-1}V_{aRaS}$.

(V2). Squared standard residuals and average separation $V_{sRaS}$

$$V_{sRaS}=\frac{\text{MsSSE}}{\overline{SSep}}=\frac{\sum_{i=1}^{g}\sum_{j=1}^{n}\hat{z}_{ij}\left[t_j-\hat{E}_i(t_j)\right]^2\Big/\widehat{\mathrm{Var}}_i(t_j)}{\frac{2}{g(g-1)}\sum_{i=1}^{g}\sum_{l>i}^{g}\sum_{j=1}^{n}\left[\hat{E}_i(t_j)-\hat{E}_l(t_j)\right]^2}.$$

We find an optimal number $g$ of types of risk by solving $\min_{2\le g\le n-1}V_{sRaS}$.

(V3). Absolute standard residuals and minimum separation $V_{aRmS}$

$$V_{aRmS}=\frac{\text{MsSAE}}{\min ASep}=\frac{\sum_{i=1}^{g}\sum_{j=1}^{n}\hat{z}_{ij}\left|t_j-\hat{E}_i(t_j)\right|\Big/\sqrt{\widehat{\mathrm{Var}}_i(t_j)}}{\min_{i\neq l}\sum_{j=1}^{n}\left|\hat{E}_i(t_j)-\hat{E}_l(t_j)\right|}.$$

We find an optimal number $g$ of types of risk by solving $\min_{2\le g\le n-1}V_{aRmS}$.

(V4). Squared standard residuals and minimum separation $V_{sRmS}$

$$V_{sRmS}=\frac{\text{MsSSE}}{\min SSep}=\frac{\sum_{i=1}^{g}\sum_{j=1}^{n}\hat{z}_{ij}\left[t_j-\hat{E}_i(t_j)\right]^2\Big/\widehat{\mathrm{Var}}_i(t_j)}{\min_{i\neq l}\sum_{j=1}^{n}\left[\hat{E}_i(t_j)-\hat{E}_l(t_j)\right]^2}.$$

We find an optimal number $g$ of types of risk by solving $\min_{2\le g\le n-1}V_{sRmS}$.
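All four indices combine the residual sums with a separation term in the denominator; a compact sketch, again with hypothetical fitted quantities in place of model output, is:

```python
import numpy as np
from itertools import combinations

def validity_indices(t, z, E, Var):
    """V1-V4 (VaRaS, VsRaS, VaRmS, VsRmS): posterior-weighted standardized
    residual sums divided by the average / minimum separation of means."""
    g = E.shape[0]
    r = t[None, :] - E
    msae = np.sum(z * np.abs(r) / np.sqrt(Var))
    msse = np.sum(z * r**2 / Var)
    pairs = list(combinations(range(g), 2))
    asep = [np.sum(np.abs(E[i] - E[l])) for i, l in pairs]
    ssep = [np.sum((E[i] - E[l])**2) for i, l in pairs]
    return (msae / np.mean(asep), msse / np.mean(ssep),
            msae / np.min(asep), msse / np.min(ssep))

# toy values (hypothetical): 2 components, 3 subjects, unit variances
t = np.array([1.0, 2.0, 3.0])
z = np.array([[0.9, 0.2, 0.5], [0.1, 0.8, 0.5]])
E = np.array([[1.2, 1.5, 2.5], [2.5, 2.2, 3.5]])
Var = np.ones((2, 3))
v1, v2, v3, v4 = validity_indices(t, z, E, Var)
assert v1 == v3 and v2 == v4   # with g = 2 the average and minimum separations coincide
```

In model selection one would evaluate these indices for each candidate $g$ and keep the $g$ attaining the minimum.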
For the simulated data we consider four different models M1~M4. Under the mixture Cox proportional hazards model (2), the ith component hazard function is:
$$h_i(t\mid X_j,\beta_i)=h_{0i}(t)\exp\left(x_{j1}\beta_{i,1,1}+\cdots+x_{j1}^{k}\beta_{i,1,k}+\cdots+x_{jd}\beta_{i,d,1}+\cdots+x_{jd}^{k}\beta_{i,d,k}\right),$$

where $d$ is the number of covariates, $k$ is the degree of the models, $\beta_i=(\beta_{i,1,1},\dots,\beta_{i,1,k},\dots,\beta_{i,d,1},\dots,\beta_{i,d,k})^T$ is the vector of regression coefficients, and $h_{0i}(t)$ is the $i$th component baseline hazard function.

Consider two common distributions for the baseline hazard functions, Weibull and Gompertz; the $i$th component Weibull and Gompertz baselines are defined by $h_{0i}(t)=\lambda_i\rho_i t^{\rho_i-1}$ and $h_{0i}(t)=\lambda_i\exp(\rho_i t)$, respectively, where $\lambda_i$ and $\rho_i$ are the scale and shape parameters. Let $\lambda=(\lambda_1,\dots,\lambda_g)^T$, $\rho=(\rho_1,\dots,\rho_g)^T$, and $\beta=(\beta_1^T,\beta_2^T,\dots,\beta_g^T)^T$. The covariates $X=(x_1,x_2,\dots,x_n)^T$ in all cases are generated independently from a uniform distribution $U(-4,4)$. The information for the four models is shown in Table 2, and the scatter plots of a sample dataset are presented in Figure 3.
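The simulation design can be sketched for the Weibull case (using M1's parameters from Table 2) by inverse-transform sampling from $S_i(t\mid x)=\exp(-\lambda_i t^{\rho_i}e^{x\beta_i})$ with independent uniform censoring; this is an illustrative reconstruction of the design, not the authors' exact generator:

```python
import numpy as np

def simulate_mixture_weibull(n, p, lam, rho, beta, cens_lo, cens_hi, seed=0):
    """Simulate right-censored mixture data (Weibull baselines): pick a risk
    type with probability p_i, draw the failure time by inverse transform from
    S_i(t|x) = exp(-lam_i * t^rho_i * e^{x beta_i}), then apply an independent
    Uniform(a_i, b_i) censoring time."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-4, 4, n)                    # covariates as in the text
    comp = rng.choice(len(p), size=n, p=p)       # latent risk type
    u = rng.uniform(size=n)
    scale = lam[comp] * np.exp(x * beta[comp])
    t_fail = (-np.log(u) / scale) ** (1.0 / rho[comp])   # S(t|x) = u inverted
    c = rng.uniform(cens_lo[comp], cens_hi[comp])        # censoring times
    t = np.minimum(t_fail, c)
    delta = (t_fail <= c).astype(int)
    return t, delta, x, comp

# M1 from Table 2: n=200, g=2, Weibull, p=(0.7,0.3), lam=(0.005,1.5),
# rho=(3.0,2.0), beta=(0.3,0.5), censoring U1(5,9), U2(2,6)
t, d, x, comp = simulate_mixture_weibull(
    200, p=np.array([0.7, 0.3]), lam=np.array([0.005, 1.5]),
    rho=np.array([3.0, 2.0]), beta=np.array([0.3, 0.5]),
    cens_lo=np.array([5.0, 2.0]), cens_hi=np.array([9.0, 6.0]))
assert t.shape == (200,)
```

The Gompertz case differs only in the inversion step, since its cumulative baseline is $H_{0i}(t)=\frac{\lambda_i}{\rho_i}(e^{\rho_i t}-1)$.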
Model | n1 | g2 | d3 | k4 | p | BH5 | λ | ρ | β | Ui 6
M1 | 200 | 2 | 1 | 1 | (0.7, 0.3) | Weibull | (0.005, 1.5) | (3.0, 2.0) | (0.3, 0.5) | U1(5,9), U2(2,6)
M2 | 200 | 2 | 1 | 2 | (0.5, 0.5) | Gompertz | (0.2, 0.7) | (1.5, 2.0) | (0.8, 0.1; −0.6, 0.1) | U1(4,9), U2(4,9)
M3 | 400 | 2 | 2 | 1 | (0.5, 0.5) | Weibull | (0.003, 0.002) | (0.5, 0.7) | (0.8, −0.5; −0.6, 0.5) | U1(12,15), U2(10,13)
M4 | 400 | 3 | 1 | 1 | (0.35, 0.30, 0.35) | Gompertz | (0.0002, 0.002, 0.0003) | (0.7, 2.0, 0.8) | (−0.8, 0.2, 1.0) | U1(10,15), U2(4,6), U3(15,20)
1: sample size; 2: number of risk types; 3: number of covariates; 4: degree of models; 5: baseline hazard function; 6: censored times are generated from a uniform distribution Ui(a,b) for i = 1, …, g. Vectors list one entry per component; semicolons separate the coefficient sets of different components.
We consider an EM-based semi-parametric mixture hazards model to analyze the simulated data and compare two methods of estimating the baseline hazard function. The first method, proposed by Ng and McLachlan [5], assumes the baseline hazard function is piecewise constant and computes it by maximum likelihood estimation (MLE). The second method, introduced in this paper, uses a kernel estimator for the baseline hazard rates with the biweight kernel function.
In order to graphically demonstrate the results, we first show the results of a single simulation run in Table 3 and Figure 4. The correct rate (CR) in Table 3 is the percentage of individuals matched to their true attributable type of risk. Based on the estimation results, we assign each individual to the risk type with the largest posterior probability. Thus, the correct rate is defined as:

$$CR=\frac{1}{n}\sum_{j=1}^{n}\sum_{i=1}^{g}I\left\{j\in \mathrm{risk}(i)\cap \hat{z}_{ij}=\max_i\left(\hat{Z}_j\right)\right\},\qquad \text{where } \hat{Z}_j=\left(\hat{z}_{1j},\hat{z}_{2j},\dots,\hat{z}_{gj}\right)^T.$$
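Computed from the fitted posterior probabilities, CR is a one-liner; the matrix below is a toy example, not output of the fitted model:

```python
import numpy as np

def correct_rate(z_hat, true_comp):
    """CR: fraction of individuals whose largest posterior probability
    points at their true risk type."""
    return np.mean(np.argmax(z_hat, axis=0) == true_comp)

# toy posterior matrix (g=2 risks, n=4 subjects) and true risk labels
z_hat = np.array([[0.9, 0.3, 0.6, 0.2],
                  [0.1, 0.7, 0.4, 0.8]])
true_comp = np.array([0, 1, 1, 1])
assert correct_rate(z_hat, true_comp) == 0.75   # subject 3 is misclassified
```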
 | | p | β | CR | MsSSE/n
M1 | True1 | (0.7, 0.3) | (0.3, 0.5) | |
 | Piecewise2 | (0.561, 0.439) | (0.528, 0.851) | 0.860 | 0.810
 | Kernel3, bw4=1.0 | (0.672, 0.328) | (0.336, 0.586) | 0.945 | 0.659
M2 | True | (0.5, 0.5) | (0.8, 0.1; −0.6, 0.1) | |
 | Piecewise | (0.641, 0.958) | (0.674, 0.136; −1.136, 0.298) | 0.705 | 0.963
 | Kernel, bw=0.5 | (0.523, 0.476) | (0.738, 0.078; −0.762, 0.146) | 0.855 | 0.910
M3 | True | (0.5, 0.5) | (0.8, −0.5; −0.6, 0.5) | |
 | Piecewise | (0.508, 0.491) | (0.993, −0.562; −0.562, 0.608) | 0.838 | 1.240
 | Kernel, bw=0.4 | (0.478, 0.522) | (0.885, −0.534; −0.628, 0.521) | 0.843 | 1.142
M4 | True | (0.35, 0.30, 0.35) | (−0.8, 0.2, 1.0) | |
 | Piecewise | (0.399, 0.265, 0.335) | (−0.938, 0.920, 1.137) | 0.693 | 1.211
 | Kernel, bw=0.9 | (0.368, 0.306, 0.325) | (−0.806, 0.192, 0.927) | 0.873 | 0.828
1: true parameters; 2: piecewise constant estimates; 3: kernel estimates; 4: bandwidth.
When using the piecewise constant estimator under M1, the estimated mixing probabilities, the CR in Table 3, and the expectation line in Figure 4(M1-1) show that some data whose true risk type is the 1st are misclassified into the 2nd type of risk. As a result, the estimates of the regression coefficients in Table 3 and of the cumulative baseline hazard rate in Figure 4(M1-2) are not close to the true model. Furthermore, under M4, the expectation lines for the 1st and 2nd types of risk in Figure 4(M4-1) show that some data are misclassified between the 1st and 2nd types of risk when using the piecewise constant estimator; the corresponding estimates of the regression coefficients in Table 3 and of the cumulative baseline hazard rate in Figure 4(M4-2) are mismatched with the real model. It is obvious that using the kernel procedure for baseline hazard estimation increases CR compared with the piecewise constant procedure.
We next show the results for 1000 simulations in Table 4. The absolute relative bias (ARB) for parameter θ is defined by:
$$ARB(\theta)=\left|\frac{E(\hat{\theta})-\theta}{E(\hat{\theta})}\right|.$$
 | | bias_p 3 | MSE_p 4 | bias_β 5 | MSE_β 6 | $\overline{ARB}$ | $\overline{CR}$ | $\overline{MsSSE}$
M1 | Piecewise1 | (−0.160, 0.160) | (0.026, 0.026) | (0.088, 0.275) | (0.020, 0.076) | 0.401 | 0.699 | 0.796
 | Kernel2 | (−0.035, 0.035) | (0.002, 0.002) | (−0.073, −0.007) | (0.007, 0.000) | 0.107 | 0.856 | 0.653
M2 | Piecewise | (0.132, −0.132) | (0.017, 0.017) | (−0.097, 0.041; −0.652, 0.172) | (0.010, 0.001; 0.429, 0.029) | 0.646 | 0.680 | 1.329
 | Kernel | (0.089, −0.089) | (0.008, 0.008) | (−0.123, 0.054; −0.311, 0.017) | (0.018, 0.006; 0.124, 0.000) | 0.292 | 0.774 | 1.009
M3 | Piecewise | (0.028, −0.028) | (0.000, 0.000) | (0.167, −0.091; −0.079, 0.046) | (0.028, 0.008; 0.006, 0.002) | 0.122 | 0.847 | 1.271
 | Kernel | (−0.006, 0.006) | (0.000, 0.000) | (0.033, −0.020; 0.069, −0.051) | (0.001, 0.000; 0.004, 0.002) | 0.054 | 0.849 | 1.097
M4 | Piecewise | (0.043, −0.055, 0.012) | (0.001, 0.003, 0.000) | (−0.003, 0.791, 0.251) | (0.002, 0.627, 0.063) | 0.766 | 0.646 | 0.737
 | Kernel | (0.018, −0.042, 0.023) | (0.000, 0.001, 0.000) | (0.032, 0.071, −0.014) | (0.002, 0.009, 0.000) | 0.112 | 0.799 | 0.565
1: piecewise constant estimates; 2: kernel estimates; 3: bias of p; 4: mean square error (MSE) of p; 5: bias of β; 6: MSE of β.
In Table 4, the mean absolute relative bias ($\overline{ARB}$) of a model with $k$ parameters is defined by $\overline{ARB}=\frac{1}{k}\sum_{i=1}^{k}ARB(\theta_i)$. Moreover, $\overline{CR}$ and $\overline{MsSSE}$ are the means of CR and MsSSE/$n$ over the simulation runs. Table 4 shows that the $\overline{ARB}$ and $\overline{MsSSE}$ of the kernel estimate are smaller than those of the piecewise constant estimate. Moreover, the $\overline{CR}$ of the kernel estimate is larger than that of the piecewise constant estimate in all cases considered. Thus, the model with the baseline hazard functions estimated by the kernel method fits the data better than the one with a piecewise constant baseline.
In Section 4.2, we considered an EM-based semi-parametric mixture hazards model to analyze data simulated under models M1~M4 for several possible numbers of risk types (i.e., model components), using the kernel estimator with the biweight kernel for the baseline hazard rates. Next, we use validity indices to select the optimal number of model components. The following six validity indices are compared with the indices we propose ($V_{aRaS}$, $V_{sRaS}$, $V_{aRmS}$, and $V_{sRmS}$).
1. Partition coefficient VPC proposed by Bezdek [22].
2. Normalized partition coefficient VNPC proposed by Dave [23].
3. Partition entropy VPE proposed by Bezdek [24].
4. Normalized partition entropy VNPE proposed by Dunn [25].
5. Akaike information criterion AIC.
6. Bayesian information criterion BIC.
It is well known that memberships play an important role in fuzzy clustering. Similarly, under the EM-based mixture model, the posterior probabilities are closely related to the role of memberships. Therefore, we replace the role of memberships with posterior probabilities in the validity indices VPC , VNPC , VPE , and VNPE . Moreover, the formulas for AIC and BIC are computed by
$$AIC=-2\,l_c(\hat{p},\hat{\beta})+2k;\qquad BIC=-2\,l_c(\hat{p},\hat{\beta})+k\log(n),$$
where lc(ˆp,ˆβ) is the complete-data log-likelihood (3) given the estimated parameters, and k is the number of parameters for estimation.
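The two criteria differ only in the penalty term; a minimal sketch (the log-likelihood value below is an arbitrary placeholder, not a fitted quantity):

```python
import math

def aic_bic(loglik, k, n):
    """AIC and BIC from a model's (complete-data) log-likelihood,
    number of free parameters k, and sample size n."""
    return -2 * loglik + 2 * k, -2 * loglik + k * math.log(n)

aic, bic = aic_bic(loglik=-512.3, k=7, n=200)
assert bic > aic   # log(200) > 2, so BIC penalizes each parameter more
```

Since $\log n > 2$ whenever $n \ge 8$, BIC penalizes extra components more heavily than AIC, which is consistent with AIC tending to over-estimate and BIC to under-estimate the number of components [15].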
In total, we consider ten indices, namely VPC , VNPC , VPE , VNPE , AIC, BIC, VaRaS , VsRaS , VaRmS , and VsRmS , to select the optimal number of model components. Table 5 reports the proportion of choosing the correct number of model components over 1000 simulation runs for each index. In each simulation run, each of the models M1~M4 is fitted with 2, 3, and 4 components separately. Note that we assume the number of model components is greater than one, to satisfy the requirement of the proposed validity indices. The proportion of choosing the correct number of risk types by each index in Table 5 is defined as:
#(choose correct g by index) / #(simulation runs).
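The selection rule and this proportion can be sketched as follows (the index values below are hypothetical; AIC-style indices are minimized, while indices such as VPC are maximized):

```python
def select_g(index_values, smaller_is_better=True):
    """Pick the candidate number of components g that optimizes an index.

    index_values maps each candidate g (here 2, 3, 4) to its index value."""
    choose = min if smaller_is_better else max
    return choose(index_values, key=index_values.get)

def proportion_correct(runs, true_g, smaller_is_better=True):
    """Fraction of simulation runs in which the index selects the true g."""
    hits = sum(select_g(run, smaller_is_better) == true_g for run in runs)
    return hits / len(runs)

# Three hypothetical runs of an AIC-like index over candidates g = 2, 3, 4,
# with the true number of components equal to 2:
runs = [{2: 4.15, 3: 4.50, 4: 4.80},
        {2: 4.60, 3: 4.55, 4: 4.90},
        {2: 4.20, 3: 4.40, 4: 4.70}]
print(proportion_correct(runs, true_g=2))
```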
VPC | VNPC | VPE | VNPE | AIC | BIC | VaRaS | VsRaS | VaRmS | VsRmS | |
M1 | 0.962 | 0.880 | 0.976 | 0.894 | 0.964 | 0.950 | 0.896 | 0.896 | 0.984 | 0.992 |
M2 | 0.954 | 0.564 | 0.963 | 0.485 | 0.524 | 0.631 | 0.863 | 0.851 | 0.981 | 0.990 |
M3 | 1.000 | 0.798 | 1.000 | 0.868 | 0.998 | 0.998 | 0.994 | 0.998 | 1.000 | 1.000 |
M4 | 0.486 | 0.780 | 0.413 | 0.810 | 0.646 | 0.660 | 0.923 | 0.916 | 0.813 | 0.703 |
Table 5 shows that the proportions of choosing the correct g by the traditional indices VPC , VNPC , VPE , VNPE , AIC, and BIC are not consistent across models M1~M4: each of these indices performs poorly on at least one model. On the other hand, the proposed indices ( VaRaS , VsRaS , VaRmS , and VsRmS ) are consistent and attain high proportions for every model; the only exception is VsRmS under M4, whose proportion of 0.703 is somewhat low but still higher than that of most traditional indices. Hence, the proposed validity indices are superior to the others in selecting the correct number of components.
As a practical illustration of the proposed EM-based semi-parametric mixture hazard model, we consider the survival times of 506 patients with prostate cancer who entered a clinical trial during 1967–1969. The patients were randomly allocated to different levels of treatment with the drug diethylstilbestrol (DES); the data were considered by Byar and Green [26] and published by Andrews and Herzberg [27]. Kay [28] analyzed a subset of the data involving eight categorical covariates: drug treatment (RX: 0, 0.0 or 0.2 mg estrogen; 1, 1.0 or 5.0 mg estrogen); age group (AG: 0, < 75 years; 1, 75 to 79 years; 2, > 79 years); weight index (WT: 0, > 99 kg; 1, 80 to 99 kg; 2, < 80 kg); performance rating (PF: 0, normal; 1, limitation of activity); history of cardiovascular disease (HX: 0, no; 1, yes); serum haemoglobin (HG: 0, > 12 g/100 ml; 1, 9 to 12 g/100 ml; 2, < 9 g/100 ml); size of primary lesion (SZ: 0, < 30 cm²; 1, ≥ 30 cm²); and Gleason stage/grade category (SG: 0, ≤ 10; 1, > 10). Cheng et al. [3] classified this dataset with three types of risk: (1) death due to prostate cancer; (2) death due to cardiovascular disease (CVD); and (3) death due to other causes.
We analyze the same dataset with the eight categorical variables (RX, AG, WT, PF, HX, HG, SZ, SG). There are 483 patients with complete information on these covariates, and the proportion of censored observations is 28.8%. We ignore the information about the risk factors and use the indices VPC , VNPC , VPE , VNPE , AIC, BIC, VaRaS , VsRaS , VaRmS , and VsRmS to select the optimal number of risk types. From Table 6, the number of risk types selected by VaRaS , VsRaS , VaRmS , and VsRmS is three, while that selected by the other indices is two. The number of model components selected by the proposed indices agrees with the previous study by Cheng et al. [3].
VPC | VNPC | VPE | VNPE | AIC | BIC | VaRaS | VsRaS | VaRmS | VsRmS | |
g = 2 | 0.7813 | 0.5626 | 0.3369 | 0.5720 | 4.1518 | 4.2989 | 0.5894 | 0.4437 | 0.5894 | 0.4437 |
g = 3 | 0.6684 | 0.5027 | 0.5260 | 0.6135 | 4.5012 | 4.7262 | 0.3783 | 0.1974 | 0.5016 | 0.2943 |
g = 4 | 0.5581 | 0.4109 | 0.7564 | 0.7075 | 4.7967 | 5.0996 | 0.4746 | 57.572 | 0.6123 | 98.534 |
Note: (1) g represents the number of risk types when estimating. (2) The optimal values of g according to each index are highlighted in bold. |
From existing medical experience and a previous study, we presume that these model components may correspond to particular types of risk, and we can thus decide whether there are significant relationships between the covariates and the survival times by using the Wald test. Based on the cause-specific hazard approach, Cheng et al. [3] found that treatment with a higher DES dosage (RX = 1) significantly reduces the risk of death due to prostate cancer. Table 7 shows that the DES dosage has a significant effect on time to death due to the 1st type of risk, and that the estimated regression coefficient of RX is negative. Byar and Green [26] found that patients with a large primary lesion (SZ = 1) and high-grade tumors (SG = 1) are at greater risk of prostate cancer death. Table 7 shows that SZ and SG have significant effects on time to death due to the 1st type of risk, and that their estimated regression coefficients are both positive. Accordingly, we presume the 1st type of risk relates to prostate cancer. Furthermore, based on the cause-specific hazard approach, Cheng et al. [3] found that treatment with a higher DES dosage (RX = 1) significantly increases the risk of death due to CVD. From Table 7, we see that the DES dosage has a significant effect on time to death due to the 2nd and 3rd types of risk, and that the corresponding estimated regression coefficients of RX are positive.
1st type of risk | 2nd type of risk | 3rd type of risk | |||
p | 0.2132 | 0.3930 | 0.3936 | ||
β | RX | −0.0296*(0.1267) | 0.3546*(0.1414) | 0.7589*(0.1425) | |
AG | 0.3144*(0.1143) | 1.7445*(0.1041) | 1.8104*(0.1396) | ||
WT | −0.0817*(0.0916) | 1.7915*(0.0967) | −0.5555*(0.1290) | ||
PF | 1.4742*(0.2233) | 0.1244*(0.2527) | 1.6468*(0.3325) | ||
HX | 3.0027*(0.1176) | 1.2829*(0.1377) | −0.6092*(0.1486) | ||
HG | 0.8489*(0.1536) | 1.6074*(0.1669) | −5.2153*(0.7267) | ||
SZ | 0.8567*(0.2119) | 3.0334*(0.1998) | −3.2661*(0.4074) | ||
SG | 4.3184*(0.1010) | −0.3907*(0.1419) | −0.9933*(0.1560) | ||
Note: * denotes P-value < 0.05. |
Patients with a history of cardiovascular disease (HX = 1) are known to have a higher probability of death due to CVD than patients without such a history. Table 7 shows that the estimated regression coefficient of HX for the 2nd type of risk is positive. Hence, we presume the 2nd type of risk may relate to CVD. There is no explicit relationship between the covariates and the survival times attributable to the 3rd type of risk, so we presume only that the 3rd type of risk may relate to other, unspecified causes of death. According to the significant relationships between the covariates and the survival times, we conclude that the 1st, 2nd, and 3rd types of risk estimated by the EM-based semi-parametric mixture hazard model correspond to prostate cancer, CVD, and other unspecified causes, respectively.
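The Wald tests behind Table 7 can be reproduced for any single coefficient from its point estimate and the parenthesized value, assuming the latter is the standard error; a sketch using the SG entry for the 1st type of risk:

```python
import math

def wald_test(beta_hat, se):
    """Wald z-statistic and two-sided p-value under the normal approximation."""
    z = beta_hat / se
    # two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

# SG coefficient for the 1st type of risk in Table 7: 4.3184 (assumed SE 0.1010).
z, p = wald_test(4.3184, 0.1010)
print(round(z, 2), p < 0.05)
```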
This study introduces four new validity indices, VaRaS , VsRaS , VaRmS , and VsRmS , for deciding the number of model components when applying an EM-based Cox proportional hazards mixture model to competing-risks data. We incorporate the posterior probabilities and the sums of standard residuals to construct the new validity indices. Moreover, our study develops an extended kernel approach to estimate the baseline hazard functions more smoothly and accurately. Extensive simulations show that the kernel procedure for baseline hazard estimation helps increase the correct rate of classifying individuals into their true attributable types of risk. Furthermore, the simulation results demonstrate that the proposed validity indices are consistent and select the optimal number of model components more often than the traditional competitors. Thus, the proposed indices are superior to several traditional indices, including the most commonly used ones in statistics, AIC and BIC. We also apply the proposed method to a prostate cancer dataset to illustrate its practicality.
Applying the four new validity indices simultaneously clearly gives the best chance of selecting the optimal number of model components. A natural question is which of the proposed indices to prefer if only one is used. The average separation versions ( VaRaS , VsRaS ) readily neutralize the effects of small and large distances among the expectations of the component models. On the other hand, as long as there is a small distance among the expectations of the component models, the minimum separation versions ( VaRmS , VsRmS ) will capture information about an overfitting model. In the analysis of the prostate cancer data, VaRmS and VsRmS behave more sensitively than VaRaS and VsRaS in detecting overfitting models (i.e., the differences in the indices between overfitting and optimal models are much larger than those between underfitting and optimal models). Furthermore, since the simulation results show that VsRmS performs slightly poorly on one model, we recommend VaRmS if just one of the proposed validity indices is to be used.
In the future, we may test the effectiveness of the proposed validity indices on statistical models other than mixture Cox proportional hazards regression models. We could also improve the efficiency of the proposed indices in determining the number of components of mixture models. Another issue is reducing the computational cost: for instance, the bandwidth of the kernel procedure for the baseline hazard estimates is recalculated at each iteration, which consumes computation time. All these factors need further investigation and will be covered in our future research.
The authors thank the anonymous reviewers for their insightful comments and suggestions which have greatly improved this article. This work was partially supported by the Ministry of Science and Technology, Taiwan [Grant numbers MOST 108-2118-M-025-001-and MOST 108-2118-M-003-003-].
Kernel function | K(u) |
Gaussian | K(u) = (1/√(2π)) e^(−u²/2), −∞ < u < ∞
Epanechnikov | K(u) = (3/4)(1 − u²), |u| ≤ 1
Biweight | K(u) = (15/16)(1 − u²)², |u| ≤ 1
Triweight | K(u) = (35/32)(1 − u²)³, |u| ≤ 1
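The kernels in this table translate directly into code; the sketch below (function names ours) checks numerically that each integrates to one:

```python
import math

# Kernel functions from the table above.
def gaussian(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def epanechnikov(u):
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def biweight(u):
    return 15.0 / 16.0 * (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0

def triweight(u):
    return 35.0 / 32.0 * (1.0 - u * u) ** 3 if abs(u) <= 1.0 else 0.0

def integrate(kernel, a, b, n=100000):
    """Midpoint-rule approximation of the integral of kernel over [a, b]."""
    h = (b - a) / n
    return sum(kernel(a + (i + 0.5) * h) for i in range(n)) * h

for name, kernel, a, b in [("Gaussian", gaussian, -8.0, 8.0),
                           ("Epanechnikov", epanechnikov, -1.0, 1.0),
                           ("Biweight", biweight, -1.0, 1.0),
                           ("Triweight", triweight, -1.0, 1.0)]:
    # the numerical integral over the (effective) support should be close to 1
    print(name, round(integrate(kernel, a, b), 4))
```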
Model | n¹ | g² | d³ | k⁴ | p = (p1, …, pg) | BH⁵ | λ = (λ1, …, λg) | ρ = (ρ1, …, ρg) | β = (β1ᵀ, …, βgᵀ) | Ui⁶
M1 | 200 | 2 | 1 | 1 | (0.7, 0.3) | Weibull | (0.005, 1.5) | (3.0, 2.0) | (0.3, 0.5) | U1(5, 9), U2(2, 6)
M2 | 200 | 2 | 1 | 2 | (0.5, 0.5) | Gompertz | (0.2, 0.7) | (1.5, 2.0) | ((0.8, 0.1), (−0.6, 0.1)) | U1(4, 9), U2(4, 9)
M3 | 400 | 2 | 2 | 1 | (0.5, 0.5) | Weibull | (0.003, 0.002) | (0.5, 0.7) | ((0.8, −0.5), (−0.6, 0.5)) | U1(12, 15), U2(10, 13)
M4 | 400 | 3 | 1 | 1 | (0.35, 0.30, 0.35) | Gompertz | (0.0002, 0.002, 0.0003) | (0.7, 2.0, 0.8) | (−0.8, 0.2, 1.0) | U1(10, 15), U2(4, 6), U3(15, 20)
1: sample size; 2: number of risk types; 3: number of covariates; 4: degree of models; 5: baseline hazard function; 6: censored times are generated from a uniform distribution Ui(a,b) for i=1, …, g. |
Model | Estimates | p | β | CR | MsSSE/n
M1 | True¹ | (0.7, 0.3) | (0.3, 0.5) | |
 | Piecewise² | (0.561, 0.439) | (0.528, 0.851) | 0.860 | 0.810
 | Kernel³, bw⁴ = 1.0 | (0.672, 0.328) | (0.336, 0.586) | 0.945 | 0.659
M2 | True | (0.5, 0.5) | ((0.8, 0.1), (−0.6, 0.1)) | |
 | Piecewise | (0.641, 0.958) | ((0.674, 0.136), (−1.136, 0.298)) | 0.705 | 0.963
 | Kernel, bw = 0.5 | (0.523, 0.476) | ((0.738, 0.078), (−0.762, 0.146)) | 0.855 | 0.910
M3 | True | (0.5, 0.5) | ((0.8, −0.5), (−0.6, 0.5)) | |
 | Piecewise | (0.508, 0.491) | ((0.993, −0.562), (−0.562, 0.608)) | 0.838 | 1.240
 | Kernel, bw = 0.4 | (0.478, 0.522) | ((0.885, −0.534), (−0.628, 0.521)) | 0.843 | 1.142
M4 | True | (0.35, 0.30, 0.35) | (−0.8, 0.2, 1.0) | |
 | Piecewise | (0.399, 0.265, 0.335) | (−0.938, 0.920, 1.137) | 0.693 | 1.211
 | Kernel, bw = 0.9 | (0.368, 0.306, 0.325) | (−0.806, 0.192, 0.927) | 0.873 | 0.828
1: true parameters; 2: piecewise constant estimates; 3: kernel estimates; 4: bandwidth. |
Model | Estimates | bias_p³ | MSE_p⁴ | bias_β⁵ | MSE_β⁶ | ¯ARB | ¯CR | ¯MsSSE
M1 | Piecewise¹ | (−0.160, 0.160) | (0.026, 0.026) | (0.088, 0.275) | (0.020, 0.076) | 0.401 | 0.699 | 0.796
 | Kernel² | (−0.035, 0.035) | (0.002, 0.002) | (−0.073, −0.007) | (0.007, 0.000) | 0.107 | 0.856 | 0.653
M2 | Piecewise | (0.132, −0.132) | (0.017, 0.017) | ((−0.097, 0.041), (−0.652, 0.172)) | ((0.010, 0.001), (0.429, 0.029)) | 0.646 | 0.680 | 1.329
 | Kernel | (0.089, −0.089) | (0.008, 0.008) | ((−0.123, 0.054), (−0.311, 0.017)) | ((0.018, 0.006), (0.124, 0.000)) | 0.292 | 0.774 | 1.009
M3 | Piecewise | (0.028, −0.028) | (0.000, 0.000) | ((0.167, −0.091), (−0.079, 0.046)) | ((0.028, 0.008), (0.006, 0.002)) | 0.122 | 0.847 | 1.271
 | Kernel | (−0.006, 0.006) | (0.000, 0.000) | ((0.033, −0.020), (0.069, −0.051)) | ((0.001, 0.000), (0.004, 0.002)) | 0.054 | 0.849 | 1.097
M4 | Piecewise | (0.043, −0.055, 0.012) | (0.001, 0.003, 0.000) | (−0.003, 0.791, 0.251) | (0.002, 0.627, 0.063) | 0.766 | 0.646 | 0.737
 | Kernel | (0.018, −0.042, 0.023) | (0.000, 0.001, 0.000) | (0.032, 0.071, −0.014) | (0.002, 0.009, 0.000) | 0.112 | 0.799 | 0.565
1: piecewise constant estimates; 2: kernel estimates; 3: bias of p; 4: mean square error (MSE) of p; 5: bias of β; 6: mean square error (MSE) of β.