Research article

Flexible functional data smoothing and optimization using beta spline

  • Functional data analysis (FDA) is a method used to analyze data represented in its functional form. The method is particularly useful for exploring curve and longitudinal data in both exploratory and inferential contexts, with minimal constraints on the parameters. In FDA, the choice of basis function is crucial for the smoothing process. However, traditional basis functions lack flexibility, limiting the ability to modify the shape of curves and accurately represent abnormal details in modern and complex datasets. This study introduced a novel and flexible data smoothing technique for interpreting functional data, employing the beta spline introduced by Barsky in 1981. The beta spline offers flexibility due to the inclusion of two shape parameters. The proposed methodology integrated the roughness penalty approach and generalized cross-validation (GCV) to identify the optimal curve that best fitted the data, ensuring appropriate parameters were considered for transforming data into a functional form. The effectiveness of the approach was assessed by analyzing the GCV color grid chart to determine the optimal curve. In contrast to existing methodologies, the proposed method enhanced flexibility by incorporating the beta spline into the smoothing procedure. This approach was anticipated to effectively handle various forms of time series data, offering improved interpretability and accuracy in data analysis, including forecasting.

    Citation: Wan Anis Farhah Wan Amir, Md Yushalify Misro, Mohd Hafiz Mohd. Flexible functional data smoothing and optimization using beta spline[J]. AIMS Mathematics, 2024, 9(9): 23158-23181. doi: 10.3934/math.20241126




    Survival analysis is a branch of statistics for analyzing time-to-event data. When looking into survival data, one frequently encounters the problem of competing risks, in which samples are subject to multiple kinds of failure. The Cox proportional hazards model, introduced by Cox [1], is popular in survival analysis for describing the relationship between the distribution of survival times and covariates and is commonly employed to analyze cause-specific survival data. The traditional approach is to fit a Cox proportional hazards model separately to the data for each failure type, treating the data with other kinds of failure as censored. However, this conventional method has drawbacks: the resulting estimates are hard to interpret and the confidence bands of the estimated hazards are wide, because the method does not account for all failure types simultaneously [2,3].

    An alternative approach is to fit competing risks data using a mixture model that incorporates the distinct types of failure to partition the population into groups; it assumes an individual may fail from each risk with probability equal to the proportion of the corresponding group. Moreover, the mixture approach is helpful for estimating the effects of covariates in each group through proportional hazards regressions such as Cox's model. McLachlan and Peel [4] noted that a mixture model allows for both dependent and independent competing risks and can fit the data better than the traditional approach, in which the causes of failure are assumed to be independent. Mixture models are popular in competing risks analysis because, although more complex, their resulting estimates are easy to interpret [2].

    Semiparametric mixture models generalize parametric mixture models and have become a prominent approach for modelling data with competing risks. Semiparametric mixture models are preferable for their ability to adjust for associated variables and to assess the effects of these variables both on the probabilities of the eventual causes of failure, through a logistic model, and on the corresponding conditional hazard functions, through the Cox proportional hazards model (cf. [2]). Below, we review existing semiparametric mixture model methods for competing risks data.

    Ng and McLachlan [5] proposed an ECM-based semi-parametric mixture method, which does not specify the baseline hazard function, to analyze competing risks data. They noted that when the component-baseline hazard is not monotonically increasing, their semi-parametric approach consistently produces less biased estimates than fully parametric approaches; when the component-baseline hazard is monotonically increasing, the two approaches have comparable efficiency under mild and moderate censoring. Chang et al. [6] studied nonparametric maximum-likelihood estimation in a semiparametric mixture model for competing risks data; their model handles right-censored data and provides estimates of quantities such as a covariate-specific fatality rate or a covariate-specific expected survival time. Moreover, Lu and Peng [7] set up a semiparametric mixture regression model to analyze competing risks data under the usual mechanism of conditionally independent censoring. Choi and Huang [8] offered a maximum likelihood scheme for semiparametric mixture models that yields efficient and reliable estimates of the cumulative hazard function; one advantage of their approach is that the joint estimation of the model parameters links all considered competing risks under the constraint that the probabilities of failing from the respective causes sum to 1 for any covariates. Other studies of competing risks data are likewise based on semiparametric mixture models, e.g., [5,6,7,8].

    Although the mixture hazard model is preferable to direct approaches, two important but challenging issues frequently encountered in the applications are the determination of the number of risk types and the identification of the failure type of each individual.

    It is understandable that the results of a mixture model analysis depend highly on the number of components. It is also conceivably hard to cover all types of competing risks in a mixture model. Validity indices are a vital technique in model selection. A cluster validity index is a criterion function used to determine the optimal number of clusters. Some cluster validity indices presented in [9,10,11] are designed to find an optimal cluster number for fuzzy clustering algorithms; some depend only on the memberships, while others also take into account the distances between the data points and the cluster centers. Wu et al. [12] proposed median-type validity indices, which are robust to noise and outliers. Zhou et al. [13] introduced weighted-summation validity indices for fuzzy clustering, but they are infeasible for mixture regression models. In contrast, Henson et al. [14] evaluated the ability of several statistical criteria, such as the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), to identify the proper number of components in latent variable mixture modeling. However, AIC and BIC may over- and under-estimate the number of components, respectively [15].

    As to the identification of failure types, many studies on competing risks, such as [5,6,7,8], assume that the failure type of an individual is known if the subject's failure time is observed; if an individual is censored and only the censoring time is known, then the failure type remains unknown. In fact, even when the failure time is observed, the true cause of failure might not be clear and may need further investigation. Thus, deciding the number of competing risks and recognizing the failure type of each individual are critical in competing risks analysis, but scant work has been done on these problems.

    Besides the above problems, another critical issue specific to mixture Cox hazard models is the estimation of the baseline hazard function. The Cox proportional hazards model consists of two parts: the baseline hazard function and the proportional regression model. Bender et al. [16] assumed that the baseline hazard function follows a specific lifetime distribution, but this assumption is restrictive: a single lifetime distribution may not adequately describe all data, for example when the failure rate is neither monotonically increasing nor decreasing. Alternatively, some scholars adopted more flexible nonparametric approaches to estimate the baseline hazard function. Ng and McLachlan [5] assumed the baseline hazard function to be piecewise constant, treating each observed survival time as a cut-off point, but the piecewise constant assumption has the disadvantage that the estimated curve is not smooth, while smoothness is required in several applications [17]. In fact, our simulation results also show that the estimates based on a piecewise constant hazard function are not adequate in some cases (e.g., model M4 in Figure 4). Understandably, an inadequate estimate of the baseline function affects the selection of the number of model components and hence leads to poor estimates of the model parameters.

    Figure 1.  Plot of different types of kernel function.
    Figure 2.  The scatter plot of the observed data and the expectation of the survival time E(T) from non-mixture model (a), mixture models with two types of risk (b), and three types of risk (c). Note that ER represents the type of risk for estimation.
    Figure 3.  The scatter plot of a sample data set for models M1~M4 respectively.
    Figure 4.  Plots of a single run simulated series by models M1~M4 respectively; E(T)Ri: the ith type of risk for estimation; CBH: cumulative baseline hazard rate; RRi: real cumulative baseline hazard function for the ith type of risk; (Mi-j): model Mi is fitted and j = 1, 3: the scatter plots of the observed data and the estimated expectation curves; j = 2, 4: plots of the estimated cumulative baseline hazard functions; j = 1, 2: estimated by piecewise constant assumption; j = 3, 4: estimated by kernel method.

    In order to solve the above-mentioned problems with Cox mixture hazard modelling for competing risks data, we propose four validity indices and a kernel estimator for the baseline hazard function in this paper. Validity indices are a vital technique in model selection, but they have been little used for deciding the number of components of a mixture regression model. Using posterior probabilities and residual functions, we propose four validity indices that are applicable to regression models. Under the EM-based mixture model, the posterior probabilities play an important role in classifying data, taking the role that memberships play in fuzzy clustering. Unlike the traditional regression model, the survival model does not satisfy the assumption that the survival time variation is constant for each covariate. Therefore, we combine the posterior probabilities with the sum of standardized residuals to construct the new validity indices and verify their effectiveness through extensive simulations. Moreover, we extend the kernel method of Guilloux et al. [18] to estimate the baseline hazard function smoothly and hence more accurately.

    The remainder of this paper is organized as follows. Section 2 introduces the mixture Cox proportional hazards regression model, develops an EM-based algorithm to estimate the model parameters, and also discusses kernel estimations for the baseline hazard function. Section 3 constructs four validity indices for selecting the number of model components in a mixture Cox proportional hazards regression model. Section 4 carries out several simulations and assesses the effectiveness of our validity indices. Section 5 analyzes a practical data set of prostate cancer patients treated with different dosages of the drug diethylstilbestrol. Finally, Section 6 states conclusions and a discussion.

    For mixture model analysis, suppose each member of a population can be categorized into g mutually exclusive clusters according to its failure type. Let D = \{(t_j, \mathbf{X}_j^T, \delta_j): j = 1, \ldots, n\} be a sample drawn from this population, where T denotes the transpose of a vector, t_j is the failure or right-censoring time, \mathbf{X}_j = (x_{j1}, x_{j2}, \ldots, x_{jd})^T is a d-dimensional vector of covariates, and:

    \delta_j = \begin{cases} 1, & \text{if the } j\text{th individual is uncensored}, \\ 0, & \text{if the } j\text{th individual is censored}. \end{cases}

    The mixture probability density function (pdf) of t is defined by:

    f(t) = \sum_{i=1}^{g} p_i f_i(t), \quad \text{subject to } \sum_{i=1}^{g} p_i = 1, (1)

    where p_i is the mixing probability of failure due to the ith type of risk and g is the number of model components.

    In the ith component, the hazard function h_i(t|\mathbf{X}_j, \boldsymbol{\beta}_i) given covariate \mathbf{X}_j follows a Cox proportional hazards model defined by

    h_i(t|\mathbf{X}_j, \boldsymbol{\beta}_i) = h_{0i}(t)\exp(\mathbf{X}_j^T\boldsymbol{\beta}_i), (2)

    where \boldsymbol{\beta}_i = (\beta_{i1}, \beta_{i2}, \ldots, \beta_{id})^T is the vector of regression coefficients, and h_{0i}(t) is the baseline hazard function of the ith component. We define the ith component survival function and pdf by:

    S_i(t|\mathbf{X}_j, \boldsymbol{\beta}_i) = \exp[-H_{0i}(t)\exp(\mathbf{X}_j^T\boldsymbol{\beta}_i)]

    and

    f_i(t|\mathbf{X}_j, \boldsymbol{\beta}_i) = h_{0i}(t)\exp[\mathbf{X}_j^T\boldsymbol{\beta}_i - H_{0i}(t)\exp(\mathbf{X}_j^T\boldsymbol{\beta}_i)],

    where H_{0i}(t) = \int_0^t h_{0i}(s)\,ds is the ith component cumulative baseline hazard function.
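    To make these component functions concrete, the following is a minimal sketch (not the authors' code) of S_i and f_i under the Cox form above, assuming for illustration a Weibull baseline hazard h_{0i}(t) = \lambda\rho t^{\rho-1}; all function names and parameter values are illustrative.

```python
import numpy as np

# Illustrative Weibull baseline: h_0i(t) = lam * rho * t**(rho - 1),
# so H_0i(t) = lam * t**rho.
def cum_baseline(t, lam, rho):
    return lam * t**rho

def component_survival(t, x, beta, lam, rho):
    # S_i(t | x) = exp(-H_0i(t) * exp(x' beta))
    return np.exp(-cum_baseline(t, lam, rho) * np.exp(x @ beta))

def component_density(t, x, beta, lam, rho):
    # f_i(t | x) = h_0i(t) * exp(x' beta) * S_i(t | x)
    h0 = lam * rho * t**(rho - 1)
    return h0 * np.exp(x @ beta) * component_survival(t, x, beta, lam, rho)
```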

    The unknown parameters are the mixing probabilities \mathbf{p} = (p_1, p_2, \ldots, p_{g-1})^T and the regression coefficients \boldsymbol{\beta} = (\boldsymbol{\beta}_1^T, \boldsymbol{\beta}_2^T, \ldots, \boldsymbol{\beta}_g^T)^T, where \boldsymbol{\beta}_i = (\beta_{i1}, \beta_{i2}, \ldots, \beta_{id})^T. Based on (1) and Zhou [19], the log-likelihood function under the mixture hazards model with right-censored data is given by

    l(\mathbf{p}, \boldsymbol{\beta}) = \sum_{j=1}^{n}\left\{\delta_j \log f(t_j|\mathbf{X}_j, \boldsymbol{\beta}) + (1-\delta_j)\log S(t_j|\mathbf{X}_j, \boldsymbol{\beta})\right\},

    where f(t_j|\mathbf{X}_j, \boldsymbol{\beta}) = \sum_{i=1}^{g} p_i f_i(t_j|\mathbf{X}_j, \boldsymbol{\beta}_i) and S(t_j|\mathbf{X}_j, \boldsymbol{\beta}) = \sum_{i=1}^{g} p_i S_i(t_j|\mathbf{X}_j, \boldsymbol{\beta}_i).

    Assume that the true cause of failure for an individual is unobserved, and hence the data are incomplete. We introduce the latent variable z_{ij} as:

    z_{ij} = \begin{cases} 1, & \text{if the } j\text{th individual fails due to the } i\text{th type of risk}; \\ 0, & \text{otherwise}. \end{cases}

    The complete-data log-likelihood function is given by:

    l_c(\mathbf{p}, \boldsymbol{\beta}) = \sum_{j=1}^{n}\sum_{i=1}^{g} z_{ij}\left\{\delta_j \log[p_i f_i(t_j|\mathbf{X}_j, \boldsymbol{\beta}_i)] + (1-\delta_j)\log[p_i S_i(t_j|\mathbf{X}_j, \boldsymbol{\beta}_i)]\right\}. (3)

    Subsequently, the parameters are estimated through the expectation-maximization (EM) algorithm.

    E-step: On the (k+1)th iteration of E-step, we calculate the conditional expectation of the complete-data log-likelihood (3) given the current estimates of the parameters, i.e.:

    E[l_c(\mathbf{p}, \boldsymbol{\beta})|\mathbf{p}^{(k)}, \boldsymbol{\beta}^{(k)}] = \sum_{j=1}^{n}\sum_{i=1}^{g} z_{ij}^{(k)}\left\{\delta_j \log[p_i f_i(t_j|\mathbf{X}_j, \boldsymbol{\beta}_i)] + (1-\delta_j)\log[p_i S_i(t_j|\mathbf{X}_j, \boldsymbol{\beta}_i)]\right\} = \sum_{j=1}^{n}\sum_{i=1}^{g} z_{ij}^{(k)}\log p_i + \sum_{j=1}^{n} z_{1j}^{(k)}\left[\delta_j \log f_1(t_j|\mathbf{X}_j, \boldsymbol{\beta}_1) + (1-\delta_j)\log S_1(t_j|\mathbf{X}_j, \boldsymbol{\beta}_1)\right] + \cdots + \sum_{j=1}^{n} z_{gj}^{(k)}\left[\delta_j \log f_g(t_j|\mathbf{X}_j, \boldsymbol{\beta}_g) + (1-\delta_j)\log S_g(t_j|\mathbf{X}_j, \boldsymbol{\beta}_g)\right] = Q_0 + Q_1 + \cdots + Q_g. (4)

    Here, \mathbf{p}^{(k)} and \boldsymbol{\beta}^{(k)} are the estimates of \mathbf{p} and \boldsymbol{\beta}, respectively, at the kth iteration. By Bayes' theorem, we have:

    z_{ij}^{(k)} = E(z_{ij}|\mathbf{p}^{(k)}, \boldsymbol{\beta}^{(k)}) = \frac{p_i^{(k)}\, f_i(t_j|\mathbf{X}_j, \boldsymbol{\beta}_i^{(k)})^{\delta_j}\, S_i(t_j|\mathbf{X}_j, \boldsymbol{\beta}_i^{(k)})^{1-\delta_j}}{\sum_{l=1}^{g} p_l^{(k)}\, f_l(t_j|\mathbf{X}_j, \boldsymbol{\beta}_l^{(k)})^{\delta_j}\, S_l(t_j|\mathbf{X}_j, \boldsymbol{\beta}_l^{(k)})^{1-\delta_j}}, (5)

    which is the posterior probability that the jth individual with survival time t_j fails due to the ith type of risk.
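    As a small illustration of (5), the E-step can be vectorized as below. The array names (f_vals, S_vals, delta) are assumptions for this sketch, not the authors' notation; each column of the returned matrix sums to one.

```python
import numpy as np

def e_step(p, f_vals, S_vals, delta):
    """p: (g,) mixing probabilities; f_vals, S_vals: (g, n) component densities
    and survival functions evaluated at (t_j, X_j); delta: (n,) censoring flags."""
    # numerator of (5): p_i * f_i^delta_j * S_i^(1 - delta_j)
    num = p[:, None] * f_vals**delta[None, :] * S_vals**(1 - delta[None, :])
    return num / num.sum(axis=0, keepdims=True)   # posterior probabilities z_ij
```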

    M-step: The (k+1)th iteration of the M-step provides the updated estimates \mathbf{p}^{(k+1)} and \boldsymbol{\beta}^{(k+1)} that maximize (4) with respect to \mathbf{p} and \boldsymbol{\beta}.

    Under the constraint \sum_{i=1}^{g} p_i = 1, maximizing Q_0 = \sum_{j=1}^{n}\sum_{i=1}^{g} z_{ij}^{(k)}\log p_i from (4) yields the estimate of the mixing probability:

    p_i^{(k+1)} = \frac{\sum_{j=1}^{n} z_{ij}^{(k)}}{n}. (6)

    The term Q_i in (4), for i = 1, \ldots, g, can be written as:

    Q_i = \sum_{j=1}^{n} z_{ij}^{(k)}\left\{\delta_j\left[\log h_{0i}(t_j) + \mathbf{X}_j^T\boldsymbol{\beta}_i\right] - \exp(\mathbf{X}_j^T\boldsymbol{\beta}_i)\, H_{0i}(t_j)\right\}. (7)

    Define the score vector U(\boldsymbol{\beta}_i), for i = 1, \ldots, g, as the first derivative of (7) with respect to \boldsymbol{\beta}_i with H_{0i}(t) fixed at H_{0i}^{(k+1)}(t); the estimate \boldsymbol{\beta}_i^{(k+1)} satisfies the equation:

    U(\boldsymbol{\beta}_i) = \left.\frac{\partial Q_i}{\partial \boldsymbol{\beta}_i}\right|_{H_{0i}(t_j) = H_{0i}^{(k+1)}(t_j)} = \sum_{j=1}^{n} z_{ij}^{(k)}\left[\delta_j - \exp(\mathbf{X}_j^T\boldsymbol{\beta}_i)\, H_{0i}^{(k+1)}(t_j)\right]\mathbf{X}_j = \mathbf{0}. (8)
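    In practice, the update for \boldsymbol{\beta}_i can be obtained by solving the score equation (8) numerically with H_{0i} held fixed. The sketch below uses scipy's general root finder purely as one possible choice; the argument names are illustrative and not part of the paper.

```python
import numpy as np
from scipy.optimize import root

def score(beta, X, z_i, delta, H0i):
    # U(beta) = sum_j z_ij * [delta_j - exp(X_j' beta) * H_0i(t_j)] * X_j
    r = z_i * (delta - np.exp(X @ beta) * H0i)   # (n,)
    return X.T @ r                               # (d,)

def m_step_beta(beta_init, X, z_i, delta, H0i):
    # Solve U(beta_i) = 0 with the cumulative baseline hazard fixed
    return root(score, beta_init, args=(X, z_i, delta, H0i)).x
```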

    To estimate the baseline hazard function under the mixture hazards model, we propose a kernel estimator. Define N_j(t) = I(t_j \le t, \delta_j = 1) as the event counting process and Y_j(t) = I(t_j \ge t) as the at-risk process. The updated kernel estimator of the ith component baseline hazard function h_{0i}(t) at the (k+1)th iteration is defined by:

    h_{0i}^{(k+1)}(t|\mathbf{X}, \mathbf{Z}_i^{(k)}, \boldsymbol{\beta}_i^{(k)}, b^{(k)}) = \frac{1}{b^{(k)}}\int_0^{\tau} K\left(\frac{t-u}{b^{(k)}}\right) dH_{0i}^{(k+1)}(u|\mathbf{X}, \mathbf{Z}_i^{(k)}, \boldsymbol{\beta}_i^{(k)}), \quad \tau \ge 0, (9)

    where K: \mathbb{R} \to \mathbb{R} is a kernel function and b^{(k)} is a positive parameter called the bandwidth. Several kernel functions are commonly used; see Table 1 and Figure 1. We tried these kernel functions in the simulated examples and found no significant differences. In this paper, we choose the biweight kernel to estimate the baseline hazard function.

    Table 1.  Different types of kernel function.
    Kernel function K(u)
    Gaussian K(u) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}u^2}, \quad -\infty < u < \infty
    Epanechnikov K(u) = \frac{3}{4}(1-u^2), \quad |u| \le 1
    Biweight K(u) = \frac{15}{16}(1-u^2)^2, \quad |u| \le 1
    Triweight K(u) = \frac{35}{32}(1-u^2)^3, \quad |u| \le 1

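    For reference, the four kernels of Table 1 can be written as simple vectorized functions, each truncated to its support; the biweight kernel is the one used in the remainder of the paper.

```python
import numpy as np

def gaussian(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

def epanechnikov(u):
    return 0.75 * (1 - u**2) * (np.abs(u) <= 1)

def biweight(u):
    return (15 / 16) * (1 - u**2)**2 * (np.abs(u) <= 1)

def triweight(u):
    return (35 / 32) * (1 - u**2)**3 * (np.abs(u) <= 1)
```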

    Derived by smoothing the increments of the Breslow estimator, the kernel estimator (9) can be written as:

    h_{0i}^{(k+1)}(t|\mathbf{X}, \mathbf{Z}_i^{(k)}, \boldsymbol{\beta}_i^{(k)}, b^{(k)}) = \frac{1}{b^{(k)}}\sum_{j=1}^{n}\int_0^{\tau} K\left(\frac{t-u}{b^{(k)}}\right)\frac{z_{ij}^{(k)}\, I(\bar{Y}(u) > 0)}{S_{ni}(u|\mathbf{X}, \mathbf{Z}_i^{(k)}, \boldsymbol{\beta}_i^{(k)})}\, dN_j(u), (10)

    where \bar{Y} = \frac{1}{n}\sum_{j=1}^{n} Y_j and S_{ni}(u|\mathbf{X}, \mathbf{Z}_i, \boldsymbol{\beta}_i) = \sum_{j=1}^{n} z_{ij}\exp(\mathbf{X}_j^T\boldsymbol{\beta}_i)\, Y_j(u).
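    Because N_j(u) jumps only at an uncensored t_j, the integral in (10) reduces to a sum over the observed event times. The following sketch (with illustrative names, assuming a kernel such as the biweight defined above) evaluates the smoothed baseline hazard on a grid of time points; it is an illustration of (10), not the authors' implementation.

```python
import numpy as np

def smoothed_baseline_hazard(t_grid, times, delta, X, beta_i, z_i, b, kernel):
    # S_ni(t_j) = sum_l z_il * exp(X_l' beta_i) * I(t_l >= t_j)
    w = z_i * np.exp(X @ beta_i)                              # (n,)
    at_risk = times[None, :] >= times[:, None]                # [j, l] = I(t_l >= t_j)
    S_ni = (at_risk * w[None, :]).sum(axis=1)                 # (n,)
    # Breslow-type increments dH_0i at each uncensored time t_j
    dH = np.where(S_ni > 0, z_i * delta / np.clip(S_ni, 1e-12, None), 0.0)
    u = (t_grid[:, None] - times[None, :]) / b
    return (kernel(u) * dH[None, :]).sum(axis=1) / b          # h_0i on t_grid
```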

    Horova et al. [20] and Patil [21] introduced the cross-validation method for selecting the bandwidth of a kernel estimator. Under our model, we define the cross-validation function CV(b) for bandwidth b as:

    CV(b) = \sum_{i=1}^{g}\sum_{j=1}^{n} z_{ij}^{(k)}\left[h_{0i}^{(k+1)}(t_j|\mathbf{X}, \mathbf{Z}_i^{(k)}, \boldsymbol{\beta}_i^{(k)}, b)\right]^2 - 2\sum_{i=1}^{g}\sum_{l \ne j}\frac{1}{b}\, K\left(\frac{t_l - t_j}{b}\right)\frac{z_{il}^{(k)}\delta_l}{\bar{Y}(t_l)}\,\frac{z_{ij}^{(k)}\delta_j}{\bar{Y}(t_j)}.

    The selection of bandwidth on the (k+1)th iteration is given by:

    b^{(k+1)} = \arg\min_{b \in B_n} CV(b), (11)

    where B_n denotes the set of admissible bandwidths.

    The algorithm is as follows. We fix n and g, and set initial values for the mixing probabilities \mathbf{p}^{(0)} (usually 1/g for each component), the regression coefficients \boldsymbol{\beta}^{(0)}, the baseline hazard rates \mathbf{h}_0^{(0)}, the bandwidth b^{(0)} = 0.5, and a termination value \varepsilon > 0.

    Set the initial counter l = 1.

    Step 1: Compute \mathbf{Z}^{(l-1)} from \mathbf{p}^{(l-1)}, \mathbf{h}_0^{(l-1)} and \boldsymbol{\beta}^{(l-1)} by (5);

    Step 2: Update \mathbf{p}^{(l)} with \mathbf{Z}^{(l-1)} by (6);

    Step 3: Update \mathbf{h}_0^{(l)} with \mathbf{Z}^{(l-1)}, \boldsymbol{\beta}^{(l-1)} and b^{(l-1)} by (10);

    Step 4: Update the bandwidth b^{(l)} with \mathbf{Z}^{(l-1)}, \mathbf{h}_0^{(l)} and \boldsymbol{\beta}^{(l-1)} by (11);

    Step 5: Update \boldsymbol{\beta}^{(l)} with \mathbf{Z}^{(l-1)}, \mathbf{h}_0^{(l)} and \boldsymbol{\beta}^{(l-1)} by (8);

    Step 6: IF \|\mathbf{p}^{(l)} - \mathbf{p}^{(l-1)}\|^2 + \|\mathbf{h}_0^{(l)} - \mathbf{h}_0^{(l-1)}\|^2 + \|\boldsymbol{\beta}^{(l)} - \boldsymbol{\beta}^{(l-1)}\|^2 < \varepsilon, THEN stop;

    ELSE let l = l + 1 and GOTO Step 1.

    Note that the superscript (\cdot) denotes the iteration number, \mathbf{h}_0^{(0)} = (\mathbf{h}_{01}^{(0)}, \mathbf{h}_{02}^{(0)}, \ldots, \mathbf{h}_{0g}^{(0)})^T is a g \times n matrix with \mathbf{h}_{0i}^{(0)} = (h_{0i}^{(0)}(t_1), h_{0i}^{(0)}(t_2), \ldots, h_{0i}^{(0)}(t_n))^T, and each row is initialized as a constant vector.
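    Putting Steps 1-6 together, the following is a compact, self-contained sketch of the procedure, assuming the biweight kernel and, to keep the sketch short, a fixed bandwidth b in place of the cross-validation step (11). It illustrates the algorithm above under those assumptions and is not the authors' implementation; initial values and numerical details (e.g., the crude quadrature for H_{0i}) are illustrative choices.

```python
import numpy as np
from scipy.optimize import root

def biweight(u):
    return (15 / 16) * (1 - u**2)**2 * (np.abs(u) <= 1)

def em_mixture_cox(times, delta, X, g, b=0.5, eps=1e-4, max_iter=100):
    n, d = X.shape
    order = np.argsort(times)                       # sort so cumulative sums work
    times, delta, X = times[order], delta[order], X[order]
    p = np.full(g, 1.0 / g)                         # p^(0) = 1/g
    beta = np.linspace(-0.5, 0.5, g)[:, None] * np.ones((1, d))  # spread-out starts
    h0 = np.full((g, n), 0.1)                       # h_0i^(0)(t_j): constant rows
    for _ in range(max_iter):
        dt = np.diff(np.concatenate(([0.0], times)))
        H0 = np.cumsum(h0 * dt[None, :], axis=1)    # H_0i(t_j) by crude quadrature
        lin = X @ beta.T                            # (n, g) linear predictors
        S = np.exp(-H0.T * np.exp(lin))             # S_i(t_j | X_j)
        f = h0.T * np.exp(lin) * S                  # f_i(t_j | X_j)
        # Step 1 (E-step, eq. (5))
        num = p[None, :] * f**delta[:, None] * S**(1 - delta[:, None])
        z = (num / num.sum(axis=1, keepdims=True)).T          # (g, n)
        # Step 2: mixing probabilities (6)
        p_new = z.mean(axis=1)
        # Step 3: kernel-smoothed baseline hazards (10)
        h0_new = np.empty_like(h0)
        for i in range(g):
            w = z[i] * np.exp(X @ beta[i])
            S_ni = ((times[None, :] >= times[:, None]) * w[None, :]).sum(axis=1)
            dH = z[i] * delta / np.clip(S_ni, 1e-12, None)
            u = (times[:, None] - times[None, :]) / b
            h0_new[i] = (biweight(u) * dH[None, :]).sum(axis=1) / b
        # Step 5: regression coefficients via the score equation (8)
        H0_new = np.cumsum(h0_new * dt[None, :], axis=1)
        beta_new = np.empty_like(beta)
        for i in range(g):
            def U(bi, i=i):
                return X.T @ (z[i] * (delta - np.exp(X @ bi) * H0_new[i]))
            beta_new[i] = root(U, beta[i]).x
        # Step 6: convergence check
        diff = (np.linalg.norm(p_new - p) + np.linalg.norm(h0_new - h0)
                + np.linalg.norm(beta_new - beta))
        p, beta, h0 = p_new, beta_new, h0_new
        if diff < eps:
            break
    return p, beta, h0, z
```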

    In traditional regression analysis, we select the best model as the one minimizing the sum of squared residuals. Unlike the traditional regression model, however, the survival model does not satisfy the assumption that the standard deviation of the survival time is constant across covariate values. From Figure 2(a), we see that survival times with higher expectation also have higher standard deviation. Therefore, to select the best model we need to standardize the residuals so that the criterion is not dominated by data with large standard deviations. Moreover, if the model fits the data well, then each observed survival time will be close to the expectation of the component model that has the largest posterior probability for that individual's risk type.

    Figure 2(b) illustrates that observation A is closer to the mean line (red) of the component model corresponding to risk type 1, say model 1, than to the mean line (blue) of model 2. From (5), the posterior probability of observation A corresponding to the first type of risk (red line) will be much larger than that corresponding to the second type of risk (blue line). Hence, to build validity indices for mixture models, we use the posterior probabilities as weights and define the mixture sum of standard absolute residuals (MsSAE) and the mixture sum of standard squared residuals (MsSSE) as follows:

    MsSAE = \sum_{i=1}^{g}\sum_{j=1}^{n}\hat{z}_{ij}\frac{|t_j - \hat{E}_i(t_j)|}{\sqrt{\widehat{\mathrm{Var}}_i(t_j)}}; \quad MsSSE = \sum_{i=1}^{g}\sum_{j=1}^{n}\hat{z}_{ij}\frac{[t_j - \hat{E}_i(t_j)]^2}{\widehat{\mathrm{Var}}_i(t_j)},

    where \hat{E}_i(t_j) = \int_0^{\infty}\exp[-\hat{H}_{0i}(t)\exp(\mathbf{x}_j^T\hat{\boldsymbol{\beta}}_i)]\,dt and \widehat{\mathrm{Var}}_i(t_j) = 2\int_0^{\infty} t\exp[-\hat{H}_{0i}(t)\exp(\mathbf{x}_j^T\hat{\boldsymbol{\beta}}_i)]\,dt - \hat{E}_i(t_j)^2. The squared distance is also considered because it more easily detects an abnormal model.
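    The expectation and variance entering MsSAE and MsSSE can be obtained by numerical integration of the fitted survival curve, using E_i(T) = \int_0^{\infty} S_i(t|\mathbf{x}_j)\,dt and E_i(T^2) = 2\int_0^{\infty} t\, S_i(t|\mathbf{x}_j)\,dt. The sketch below assumes the fitted cumulative baseline hazard is available on a time grid; all names are illustrative.

```python
import numpy as np

def expectation_and_variance(t_grid, H0_grid, x, beta_i):
    """t_grid: increasing time grid; H0_grid: fitted H_0i evaluated on t_grid."""
    S = np.exp(-H0_grid * np.exp(x @ beta_i))   # fitted S_i(t | x_j) on the grid
    E = np.trapz(S, t_grid)                     # E_i(T)
    E2 = 2 * np.trapz(t_grid * S, t_grid)       # E_i(T^2)
    return E, E2 - E**2                         # (E_i(t_j), Var_i(t_j))
```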

    From Figure 2(c) we can see that the expectation (green line) of the survival time according to the third type of risk (ER3) is close to that (blue line) of the second type of risk (ER2). In order to penalize an overfitting model, that is, a model with too many components, we use the distance between the expectations of each survival time under any two types of risk as a penalty. Define the average absolute separation \overline{ASep}, the average squared separation \overline{SSep}, the minimum absolute separation \min ASep, and the minimum squared separation \min SSep as:

    \overline{ASep} = \frac{2}{g(g-1)}\sum_{i=1}^{g}\sum_{l>i}^{g}\sum_{j=1}^{n}|\hat{E}_i(t_j) - \hat{E}_l(t_j)|; \quad \overline{SSep} = \frac{2}{g(g-1)}\sum_{i=1}^{g}\sum_{l>i}^{g}\sum_{j=1}^{n}[\hat{E}_i(t_j) - \hat{E}_l(t_j)]^2;
    \min ASep = \min_{i \ne l}\sum_{j=1}^{n}|\hat{E}_i(t_j) - \hat{E}_l(t_j)|; \quad \min SSep = \min_{i \ne l}\sum_{j=1}^{n}[\hat{E}_i(t_j) - \hat{E}_l(t_j)]^2.

    A good model will possess small mixture standard residuals and large separation of expectations. Hence, based on the above functions of residuals and separation of means, we propose four validity indices V1 ~ V4 for selecting the number of model components under the mixture hazards regression model (a computational sketch of the four indices is given after their definitions).

    (V1). Absolute standard residuals and average separation V_{aRaS}

    V_{aRaS} = \frac{MsSAE}{\overline{ASep}} = \frac{\sum_{i=1}^{g}\sum_{j=1}^{n}\hat{z}_{ij}\,|t_j - \hat{E}_i(t_j)|\big/\sqrt{\widehat{\mathrm{Var}}_i(t_j)}}{\frac{2}{g(g-1)}\sum_{i=1}^{g}\sum_{l>i}^{g}\sum_{j=1}^{n}|\hat{E}_i(t_j) - \hat{E}_l(t_j)|}

    We find the optimal number g of risk types by solving \min_{2 \le g \le n-1} V_{aRaS}.

    (V2). Squared standard residuals and average separation V_{sRaS}

    V_{sRaS} = \frac{MsSSE}{\overline{SSep}} = \frac{\sum_{i=1}^{g}\sum_{j=1}^{n}\hat{z}_{ij}\,[t_j - \hat{E}_i(t_j)]^2\big/\widehat{\mathrm{Var}}_i(t_j)}{\frac{2}{g(g-1)}\sum_{i=1}^{g}\sum_{l>i}^{g}\sum_{j=1}^{n}[\hat{E}_i(t_j) - \hat{E}_l(t_j)]^2}

    We find the optimal number g of risk types by solving \min_{2 \le g \le n-1} V_{sRaS}.

    (V3). Absolute standard residuals and minimum separation V_{aRmS}

    V_{aRmS} = \frac{MsSAE}{\min ASep} = \frac{\sum_{i=1}^{g}\sum_{j=1}^{n}\hat{z}_{ij}\,|t_j - \hat{E}_i(t_j)|\big/\sqrt{\widehat{\mathrm{Var}}_i(t_j)}}{\min_{i \ne l}\sum_{j=1}^{n}|\hat{E}_i(t_j) - \hat{E}_l(t_j)|}

    We find the optimal number g of risk types by solving \min_{2 \le g \le n-1} V_{aRmS}.

    (V4). Squared standard residuals and minimum separation V_{sRmS}

    V_{sRmS} = \frac{MsSSE}{\min SSep} = \frac{\sum_{i=1}^{g}\sum_{j=1}^{n}\hat{z}_{ij}\,[t_j - \hat{E}_i(t_j)]^2\big/\widehat{\mathrm{Var}}_i(t_j)}{\min_{i \ne l}\sum_{j=1}^{n}[\hat{E}_i(t_j) - \hat{E}_l(t_j)]^2}

    We find the optimal number g of risk types by solving \min_{2 \le g \le n-1} V_{sRmS}.
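    The following is a compact sketch of the four indices, assuming posterior probabilities z_hat (g by n), fitted component expectations E_hat and variances Var_hat (g by n), and observed times t; the variable names are illustrative. The candidate number of components g that minimizes an index is selected.

```python
import numpy as np
from itertools import combinations

def validity_indices(t, z_hat, E_hat, Var_hat):
    g = z_hat.shape[0]
    MsSAE = (z_hat * np.abs(t[None, :] - E_hat) / np.sqrt(Var_hat)).sum()
    MsSSE = (z_hat * (t[None, :] - E_hat)**2 / Var_hat).sum()
    # pairwise separations between the component expectation curves
    a_sep = [np.abs(E_hat[i] - E_hat[l]).sum() for i, l in combinations(range(g), 2)]
    s_sep = [((E_hat[i] - E_hat[l])**2).sum() for i, l in combinations(range(g), 2)]
    return {"VaRaS": MsSAE / np.mean(a_sep),   # average absolute separation
            "VsRaS": MsSSE / np.mean(s_sep),   # average squared separation
            "VaRmS": MsSAE / np.min(a_sep),    # minimum absolute separation
            "VsRmS": MsSSE / np.min(s_sep)}    # minimum squared separation
```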

    For the simulated data we consider four different models M1~M4. Under the mixture Cox proportional hazards model (2), the ith component hazard function is:

    h_i(t|\mathbf{X}_j, \boldsymbol{\beta}_i) = h_{0i}(t)\exp\left(x_{j1}\beta_{i,1,1} + \cdots + x_{j1}^{k}\beta_{i,1,k} + \cdots + x_{jd}\beta_{i,d,1} + \cdots + x_{jd}^{k}\beta_{i,d,k}\right),

    where d is the number of covariates, k is the degree of the model, \boldsymbol{\beta}_i = (\beta_{i,1,1}, \ldots, \beta_{i,1,k}, \ldots, \beta_{i,d,1}, \ldots, \beta_{i,d,k})^T is the vector of regression coefficients, and h_{0i}(t) is the ith component baseline hazard function.

    Consider two common distributions for the baseline hazard functions, Weibull and Gompertz; the ith component Weibull and Gompertz baselines are defined by h_{0i}(t) = \lambda_i\rho_i t^{\rho_i - 1} and h_{0i}(t) = \lambda_i\exp(\rho_i t), respectively, where \lambda_i and \rho_i are the scale and shape parameters. Let \boldsymbol{\lambda} = (\lambda_1, \ldots, \lambda_g)^T, \boldsymbol{\rho} = (\rho_1, \ldots, \rho_g)^T, and \boldsymbol{\beta} = (\boldsymbol{\beta}_1^T, \boldsymbol{\beta}_2^T, \ldots, \boldsymbol{\beta}_g^T)^T. The covariates \mathbf{X} = (x_1, x_2, \ldots, x_n)^T in all cases are generated independently from a uniform distribution U(-4, 4). The information for the four models is shown in Table 2, and the scatter plots of a sample dataset are presented in Figure 3 (a sketch of how such data can be generated follows Table 2).

    Table 2.  The information for models M1~M4 respectively.
    Model n1 g2 d3 k4 \mathit{\boldsymbol{p}} = \left[{\begin{array}{*{20}{c}} {{p_{\rm{1}}}} \\ \vdots \\ {{p_g}} \end{array}} \right] BH5 {\bf{ \pmb{\mathit{ λ}} }} = \left[{\begin{array}{*{20}{c}} {{\lambda _{\rm{1}}}} \\ \vdots \\ {{\lambda _g}} \end{array}} \right] {\bf{ \pmb{\mathit{ ρ}} }} = \left[{\begin{array}{*{20}{c}} {{\rho _{\rm{1}}}} \\ \vdots \\ {{\rho _g}} \end{array}} \right] {\bf{ \pmb{\mathit{ β}} }} = \left[{\begin{array}{*{20}{c}} {{{\bf{ \pmb{\mathit{ β}} }}_{\rm{1}}}^T} \\ \vdots \\ {{{\bf{ \pmb{\mathit{ β}} }}_g}^T} \end{array}} \right] {U_i} 6
    M1 200 2 1 1 \left[{\begin{array}{*{20}{c}} {{\rm{0}}{\rm{.7}}} \\ {{\rm{0}}{\rm{.3}}} \end{array}} \right] Weibull \left[{\begin{array}{*{20}{c}} {{\rm{0}}{\rm{.005}}} \\ {{\rm{1}}{\rm{.5}}} \end{array}} \right] \left[{\begin{array}{*{20}{c}} {{\rm{3}}{\rm{.0}}} \\ {{\rm{2}}{\rm{.0}}} \end{array}} \right] \left[{\begin{array}{*{20}{c}} {{\rm{0}}{\rm{.3}}} \\ {{\rm{0}}{\rm{.5}}} \end{array}} \right] \begin{gathered} {U_1}(5, 9) \\ {U_2}(2, 6) \\ \end{gathered}
    M2 200 2 1 2 \left[{\begin{array}{*{20}{c}} {{\rm{0}}{\rm{.5}}} \\ {{\rm{0}}{\rm{.5}}} \end{array}} \right] Gompertz \left[{\begin{array}{*{20}{c}} {{\rm{0}}{\rm{.2}}} \\ {{\rm{0}}{\rm{.7}}} \end{array}} \right] \left[{\begin{array}{*{20}{c}} {{\rm{1}}{\rm{.5}}} \\ {{\rm{2}}{\rm{.0}}} \end{array}} \right] \left[{\begin{array}{*{20}{c}} {{\rm{0}}{\rm{.8}}} & {{\rm{0}}{\rm{.1}}} \\ { - {\rm{0}}{\rm{.6}}} & {{\rm{0}}{\rm{.1}}} \end{array}} \right] \begin{gathered} {U_1}({\rm{4}}, {\rm{9}}) \\ {U_2}({\rm{4}}, {\rm{9}}) \\ \end{gathered}
    M3 400 2 2 1 \left[{\begin{array}{*{20}{c}} {{\rm{0}}{\rm{.5}}} \\ {{\rm{0}}{\rm{.5}}} \end{array}} \right] Weibull \left[{\begin{array}{*{20}{c}} {{\rm{0}}{\rm{.003}}} \\ {{\rm{0}}{\rm{.002}}} \end{array}} \right] \left[{\begin{array}{*{20}{c}} {{\rm{0}}{\rm{.5}}} \\ {{\rm{0}}{\rm{.7}}} \end{array}} \right] \left[{\begin{array}{*{20}{c}} {{\rm{0}}{\rm{.8}}} & { - {\rm{0}}{\rm{.5}}} \\ { - 0.6} & {0.5} \end{array}} \right] \begin{gathered} {U_{\rm{1}}}(12, 15) \\ {U_{\rm{2}}}(10, 13) \\ \end{gathered}
    M4 400 3 1 1 \left[{\begin{array}{*{20}{c}} {{\rm{0}}{\rm{.35}}} \\ {{\rm{0}}{\rm{.30}}} \\ {{\rm{0}}{\rm{.35}}} \end{array}} \right] Gompertz \left[{\begin{array}{*{20}{c}} {{\rm{0}}{\rm{.0002}}} \\ {{\rm{0}}{\rm{.002}}} \\ {{\rm{0}}{\rm{.0003}}} \end{array}} \right] \left[{\begin{array}{*{20}{c}} {{\rm{0}}{\rm{.7}}} \\ {{\rm{2}}{\rm{.0}}} \\ {{\rm{0}}{\rm{.8}}} \end{array}} \right] \left[{\begin{array}{*{20}{c}} { - 0.8} \\ {{\rm{0}}{\rm{.2}}} \\ {{\rm{1}}{\rm{.0}}} \end{array}} \right] \begin{gathered} {U_{\rm{1}}}(10, 15) \\ {U_{\rm{2}}}(4, 6) \\ {U_{\rm{3}}}(15, 20) \\ \end{gathered}
    1: sample size; 2: number of risk types; 3: number of covariates; 4: degree of models; 5: baseline hazard function; 6: censored times are generated from a uniform distribution U_{i}(a, b) for i=1, …, g.

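    For concreteness, the following is a sketch of how a sample like model M1 could be generated under the setup above: a covariate from U(-4, 4), a latent risk type drawn with probabilities p, a survival time from the Weibull-baseline Cox model by inverse transform (since H_{0i}(t)e^{x\beta_i} = \lambda_i t^{\rho_i} e^{x\beta_i}, we have T = (-\log U/(\lambda_i e^{x\beta_i}))^{1/\rho_i}), and uniform censoring. The function name, seed, and structure are illustrative, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_weibull_mixture(n, p, lam, rho, beta, cens_lo, cens_hi):
    x = rng.uniform(-4, 4, size=n)                      # covariate
    comp = rng.choice(len(p), size=n, p=p)              # latent risk type
    u = rng.uniform(size=n)
    rate = lam[comp] * np.exp(x * beta[comp])
    t_event = (-np.log(u) / rate)**(1.0 / rho[comp])    # inverse-transform sample
    c = rng.uniform(cens_lo[comp], cens_hi[comp])       # censoring time per subject
    return np.minimum(t_event, c), (t_event <= c).astype(int), x, comp

# parameters of model M1 in Table 2
t, delta, x, comp = simulate_weibull_mixture(
    200, np.array([0.7, 0.3]), np.array([0.005, 1.5]), np.array([3.0, 2.0]),
    np.array([0.3, 0.5]), np.array([5.0, 2.0]), np.array([9.0, 6.0]))
```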

    We consider the EM-based semi-parametric mixture hazards model to analyze the simulated data and compare two methods of estimating the baseline hazard function. For the first method, proposed by Ng and McLachlan [5], the baseline hazard function is assumed piecewise constant and is calculated by maximum likelihood estimation (MLE). For the second method, introduced in this paper, we use a kernel estimator with the biweight kernel to estimate the baseline hazard rates.

    In order to demonstrate the results graphically, we first show the results for a single simulation run in Table 3 and Figure 4. The correct rate (CR) in Table 3 is the percentage of individuals matched to their true attributable type of risk. According to the estimation results, we assign each individual to the type of risk with the largest posterior probability. Thus, the correct rate is defined as:

    CR = \frac{1}{n}\sum_{j=1}^{n}\sum_{i=1}^{g} I\left\{j \in \mathrm{risk}(i) \;\cap\; \hat{z}_{ij} = \max_{i}(\hat{\mathbf{Z}}_j)\right\}, \quad \text{where } \hat{\mathbf{Z}}_j = (\hat{z}_{1j}, \hat{z}_{2j}, \ldots, \hat{z}_{gj})^T.
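    A one-line version of the correct rate: each subject is assigned to the risk type with the largest posterior probability and compared with its true (simulated) type; the names below are illustrative.

```python
import numpy as np

def correct_rate(z_hat, true_type):
    """z_hat: (g, n) posterior probabilities; true_type: (n,) true risk labels 0..g-1."""
    return float(np.mean(np.argmax(z_hat, axis=0) == true_type))
```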
    Table 3.  The estimation of a simulated series by models M1~M4 respectively.
    {\bf{p}} {\bf{ \pmb{\mathit{ β}} }} CR MsSSE/n
    M1 True1 \left[{\begin{array}{*{20}{c}} {{\rm{0}}{\rm{.7}}} \\ {{\rm{0}}{\rm{.3}}} \end{array}} \right] \left[{\begin{array}{*{20}{c}} {{\rm{0}}{\rm{.3}}} \\ {{\rm{0}}{\rm{.5}}} \end{array}} \right]
    Piecewise2 \left[{\begin{array}{*{20}{c}} {{\rm{0}}{\rm{.561}}} \\ {{\rm{0}}{\rm{.439}}} \end{array}} \right] \left[{\begin{array}{*{20}{c}} {{\rm{0}}{\rm{.528}}} \\ {{\rm{0}}{\rm{.851}}} \end{array}} \right] 0.860 0.810
    Kernel3, bw4=1.0 \left[{\begin{array}{*{20}{c}} {{\rm{0}}{\rm{.672}}} \\ {{\rm{0}}{\rm{.328}}} \end{array}} \right] \left[{\begin{array}{*{20}{c}} {{\rm{0}}{\rm{.336}}} \\ {{\rm{0}}{\rm{.586}}} \end{array}} \right] 0.945 0.659
    M2 True \left[{\begin{array}{*{20}{c}} {{\rm{0}}{\rm{.5}}} \\ {{\rm{0}}{\rm{.5}}} \end{array}} \right] \left[{\begin{array}{*{20}{c}} {{\rm{0}}{\rm{.8}}} & {{\rm{0}}{\rm{.1}}} \\ { - {\rm{0}}{\rm{.6}}} & {{\rm{0}}{\rm{.1}}} \end{array}} \right]
    Piecewise \left[{\begin{array}{*{20}{c}} {{\rm{0}}{\rm{.641}}} \\ {{\rm{0}}{\rm{.958}}} \end{array}} \right] \left[{\begin{array}{*{20}{c}} {{\rm{0}}{\rm{.674}}} & {{\rm{0}}{\rm{.136}}} \\ { - 1.136} & {{\rm{0}}{\rm{.298}}} \end{array}} \right] 0.705 0.963
    Kernel, bw=0.5 \left[{\begin{array}{*{20}{c}} {{\rm{0}}{\rm{.523}}} \\ {{\rm{0}}{\rm{.476}}} \end{array}} \right] \left[{\begin{array}{*{20}{c}} {{\rm{0}}{\rm{.738}}} & {{\rm{0}}{\rm{.078}}} \\ { - {\rm{0}}{\rm{.762}}} & {{\rm{0}}{\rm{.146}}} \end{array}} \right] 0.855 0.910
    M3 True \left[{\begin{array}{*{20}{c}} {{\rm{0}}{\rm{.5}}} \\ {{\rm{0}}{\rm{.5}}} \end{array}} \right] \left[{\begin{array}{*{20}{c}} {{\rm{0}}{\rm{.8}}} & { - {\rm{0}}{\rm{.5}}} \\ { - 0.6} & {0.5} \end{array}} \right]
    Piecewise \left[{\begin{array}{*{20}{c}} {{\rm{0}}{\rm{.508}}} \\ {{\rm{0}}{\rm{.491}}} \end{array}} \right] \left[{\begin{array}{*{20}{c}} {{\rm{0}}{\rm{.993}}} & { - {\rm{0}}{\rm{.562}}} \\ { - 0.{\rm{562}}} & {0.{\rm{608}}} \end{array}} \right] 0.838 1.240
    Kernel, bw=0.4 \left[{\begin{array}{*{20}{c}} {{\rm{0}}{\rm{.478}}} \\ {{\rm{0}}{\rm{.522}}} \end{array}} \right] \left[{\begin{array}{*{20}{c}} {{\rm{0}}{\rm{.885}}} & { - {\rm{0}}{\rm{.534}}} \\ { - 0.6{\rm{28}}} & {0.5{\rm{21}}} \end{array}} \right] 0.843 1.142
    M4 True \left[{\begin{array}{*{20}{c}} {{\rm{0}}{\rm{.35}}} \\ {{\rm{0}}{\rm{.30}}} \\ {{\rm{0}}{\rm{.35}}} \end{array}} \right] \left[{\begin{array}{*{20}{c}} { - 0.8} \\ {{\rm{0}}{\rm{.2}}} \\ {{\rm{1}}{\rm{.0}}} \end{array}} \right]
    Piecewise \left[{\begin{array}{*{20}{c}} {{\rm{0}}{\rm{.399}}} \\ {{\rm{0}}{\rm{.265}}} \\ {{\rm{0}}{\rm{.335}}} \end{array}} \right] \left[{\begin{array}{*{20}{c}} { - 0.{\rm{938}}} \\ {{\rm{0}}{\rm{.920}}} \\ {{\rm{1}}{\rm{.137}}} \end{array}} \right] 0.693 1.211
    Kernel, bw=0.9 \left[{\begin{array}{*{20}{c}} {{\rm{0}}{\rm{.368}}} \\ {{\rm{0}}{\rm{.306}}} \\ {{\rm{0}}{\rm{.325}}} \end{array}} \right] \left[{\begin{array}{*{20}{c}} { - 0.8{\rm{06}}} \\ {{\rm{0}}{\rm{.192}}} \\ {{\rm{0}}{\rm{.927}}} \end{array}} \right] 0.873 0.828
    1: true parameters; 2: piecewise constant estimates; 3: kernel estimates; 4: bandwidth.


    When using the piecewise constant estimator under M1, the estimated mixing probabilities, the CR in Table 3, and the expectation line in Figure 4(M1-1) show that some data whose true risk type is the 1st are misclassified into the 2nd type of risk. As a result, the estimates of the regression coefficients in Table 3 and of the cumulative baseline hazard in Figure 4(M1-2) are not close to the true model. Furthermore, under M4, the expectation lines for the 1st and 2nd types of risk in Figure 4(M4-1) show that some data are misclassified between these two risk types when the piecewise constant estimator is used, and the corresponding estimates of the regression coefficients in Table 3 and the cumulative baseline hazard in Figure 4(M4-2) do not match the real model. It is clear that using the kernel procedure for the baseline hazard estimation increases CR compared to the piecewise constant procedure.

    We next show the results for 1000 simulations in Table 4. The absolute relative bias (ARB) for parameter θ is defined by:

    ARB(\theta ) = \left| {\frac{{E(\hat \theta ) - \theta }}{{E(\hat \theta )}}} \right|.
    Table 4.  The estimation of 1000 simulated series by models M1~M4 respectively.
    bias_ {\bf{p}} 3 MSE_ {\bf{p}} 4 bias_ {\bf{ \pmb{\mathit{ β}} }} 5 MSE_ {\bf{ \pmb{\mathit{ β}} }} 6 \overline {ARB} \overline {CR} \overline {MsSSE}
    M1 Piecewise1 \left[{\begin{array}{*{20}{c}} { - 0.160} \\ {{\rm{0}}{\rm{.160}}} \end{array}} \right] \left[{\begin{array}{*{20}{c}} {{\rm{0}}{\rm{.026}}} \\ {{\rm{0}}{\rm{.026}}} \end{array}} \right] \left[{\begin{array}{*{20}{c}} {{\rm{0}}{\rm{.088}}} \\ {{\rm{0}}{\rm{.275}}} \end{array}} \right] \left[{\begin{array}{*{20}{c}} {{\rm{0}}{\rm{.020}}} \\ {{\rm{0}}{\rm{.076}}} \end{array}} \right] 0.401 0.699 0.796
    Kernel2 \left[{\begin{array}{*{20}{c}} { - 0.{\rm{035}}} \\ {{\rm{0}}{\rm{.035}}} \end{array}} \right] \left[{\begin{array}{*{20}{c}} {{\rm{0}}{\rm{.002}}} \\ {{\rm{0}}{\rm{.002}}} \end{array}} \right] \left[{\begin{array}{*{20}{c}} { - 0.073} \\ { - 0.007} \end{array}} \right] \left[{\begin{array}{*{20}{c}} {{\rm{0}}{\rm{.007}}} \\ {{\rm{0}}{\rm{.000}}} \end{array}} \right] 0.107 0.856 0.653
    M2 Piecewise \left[{\begin{array}{*{20}{c}} {{\rm{0}}{\rm{.132}}} \\ { - 0.132} \end{array}} \right] \left[{\begin{array}{*{20}{c}} {{\rm{0}}{\rm{.017}}} \\ {0.{\rm{017}}} \end{array}} \right] \left[{\begin{array}{*{20}{c}} { - 0.097} & {{\rm{0}}{\rm{.041}}} \\ { - {\rm{0}}{\rm{.652}}} & {{\rm{0}}{\rm{.172}}} \end{array}} \right] \left[{\begin{array}{*{20}{c}} {{\rm{0}}{\rm{.010}}} & {{\rm{0}}{\rm{.001}}} \\ {{\rm{0}}{\rm{.429}}} & {{\rm{0}}{\rm{.029}}} \end{array}} \right] 0.646 0.680 1.329
    Kernel \left[{\begin{array}{*{20}{c}} {{\rm{0}}{\rm{.089}}} \\ { - 0.{\rm{089}}} \end{array}} \right] \left[{\begin{array}{*{20}{c}} {{\rm{0}}{\rm{.008}}} \\ {0.{\rm{008}}} \end{array}} \right] \left[{\begin{array}{*{20}{c}} { - 0.{\rm{123}}} & {{\rm{0}}{\rm{.054}}} \\ { - {\rm{0}}{\rm{.311}}} & {{\rm{0}}{\rm{.017}}} \end{array}} \right] \left[{\begin{array}{*{20}{c}} {{\rm{0}}{\rm{.018}}} & {{\rm{0}}{\rm{.006}}} \\ {{\rm{0}}{\rm{.124}}} & {{\rm{0}}{\rm{.000}}} \end{array}} \right] 0.292 0.774 1.009
    M3 Piecewise \left[{\begin{array}{*{20}{c}} {{\rm{0}}{\rm{.028}}} \\ { - 0.{\rm{028}}} \end{array}} \right] \left[{\begin{array}{*{20}{c}} {{\rm{0}}{\rm{.000}}} \\ {0.{\rm{000}}} \end{array}} \right] \left[{\begin{array}{*{20}{c}} {{\rm{0}}{\rm{.167}}} & { - {\rm{0}}{\rm{.091}}} \\ { - {\rm{0}}{\rm{.079}}} & {{\rm{0}}{\rm{.046}}} \end{array}} \right] \left[{\begin{array}{*{20}{c}} {{\rm{0}}{\rm{.028}}} & {{\rm{0}}{\rm{.008}}} \\ {{\rm{0}}{\rm{.006}}} & {{\rm{0}}{\rm{.002}}} \end{array}} \right] 0.122 0.847 1.271
    Kernel \left[{\begin{array}{*{20}{c}} { - {\rm{0}}{\rm{.006}}} \\ {0.{\rm{006}}} \end{array}} \right] \left[{\begin{array}{*{20}{c}} {{\rm{0}}{\rm{.000}}} \\ {0.{\rm{000}}} \end{array}} \right] \left[{\begin{array}{*{20}{c}} {{\rm{0}}{\rm{.033}}} & { - {\rm{0}}{\rm{.020}}} \\ {{\rm{0}}{\rm{.069}}} & { - {\rm{0}}{\rm{.051}}} \end{array}} \right] \left[{\begin{array}{*{20}{c}} {{\rm{0}}{\rm{.001}}} & {{\rm{0}}{\rm{.000}}} \\ {{\rm{0}}{\rm{.004}}} & {{\rm{0}}{\rm{.002}}} \end{array}} \right] 0.054 0.849 1.097
    M4 Piecewise \left[{\begin{array}{*{20}{c}} {{\rm{0}}{\rm{.043}}} \\ { - 0.055} \\ {{\rm{0}}{\rm{.012}}} \end{array}} \right] \left[{\begin{array}{*{20}{c}} {{\rm{0}}{\rm{.001}}} \\ {0.0{\rm{03}}} \\ {{\rm{0}}{\rm{.000}}} \end{array}} \right] \left[{\begin{array}{*{20}{c}} { - 0.003} \\ {0.791} \\ {{\rm{0}}{\rm{.251}}} \end{array}} \right] \left[{\begin{array}{*{20}{c}} {0.00{\rm{2}}} \\ {0.{\rm{627}}} \\ {{\rm{0}}{\rm{.063}}} \end{array}} \right] 0.766 0.646 0.737
    Kernel \left[{\begin{array}{*{20}{c}} {{\rm{0}}{\rm{.018}}} \\ { - 0.0{\rm{42}}} \\ {{\rm{0}}{\rm{.023}}} \end{array}} \right] \left[{\begin{array}{*{20}{c}} {{\rm{0}}{\rm{.000}}} \\ {0.0{\rm{01}}} \\ {{\rm{0}}{\rm{.000}}} \end{array}} \right] \left[{\begin{array}{*{20}{c}} {0.0{\rm{32}}} \\ {0.{\rm{071}}} \\ { - {\rm{0}}{\rm{.014}}} \end{array}} \right] \left[{\begin{array}{*{20}{c}} {0.00{\rm{2}}} \\ {0.{\rm{009}}} \\ {{\rm{0}}{\rm{.000}}} \end{array}} \right] 0.112 0.799 0.565
    1: piecewise constant estimates; 2: kernel estimates; 3: bias of {\bf{p}} ; 4: mean square error (MSE) of {\bf{p}} ; 5: bias of {\bf{ \pmb{\mathit{ β}} }} ; 6: mean square error (MSE) of {\bf{ \pmb{\mathit{ β}} }} .


    In Table 4, the mean absolute relative bias \overline{ARB} of a model with k parameters is defined by \overline{ARB} = \frac{1}{k}\sum_{i=1}^{k} ARB(\theta_i). Moreover, \overline{CR} and \overline{MsSSE} are the means of CR and MsSSE/n over the simulation runs. Table 4 shows that the \overline{ARB} and \overline{MsSSE} of the kernel estimates are smaller than those of the piecewise constant estimates, and the \overline{CR} of the kernel estimates is larger in all cases considered. Thus, the model with the baseline hazard functions estimated by the kernel method fits the data better than that with a piecewise constant baseline.

    In Section 4.2, we fit the EM-based semi-parametric mixture hazards model to data simulated under models M1~M4 with several possible numbers of risk types (that is, model components), using the kernel estimator with the biweight kernel for the baseline hazard rates. Next, we use validity indices to select the optimal number of model components. The following six validity indices are compared with the proposed indices ( {V_{aRaS}} , {V_{sRaS}} , {V_{aRmS}} , and {V_{sRmS}} ).

    1. Partition coefficient {V_{PC}} proposed by Bezdek [22].

    2. Normalized partition coefficient {V_{NPC}} proposed by Dave [23].

    3. Partition entropy {V_{PE}} proposed by Bezdek [24].

    4. Normalized partition entropy {V_{NPE}} proposed by Dunn [25].

    5. Akaike information criterion AIC.

    6. Bayesian information criterion BIC.

    It is well known that memberships play an important role in fuzzy clustering. Similarly, under the EM-based mixture model, the posterior probabilities are closely related to the role of memberships. Therefore, we replace the role of memberships with posterior probabilities in the validity indices {V_{PC}} , {V_{NPC}} , {V_{PE}} , and {V_{NPE}} . Moreover, the formulas for AIC and BIC are computed by

    AIC = -2\, l_c(\hat{\mathbf{p}}, \hat{\boldsymbol{\beta}}) + 2k; \quad BIC = -2\, l_c(\hat{\mathbf{p}}, \hat{\boldsymbol{\beta}}) + k\log(n),

    where \; {l_c}(\mathit{\boldsymbol{\hat p}}, \mathit{\boldsymbol{\hat \beta }}) is the complete-data log-likelihood (3) given the estimated parameters, and k is the number of parameters for estimation.
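    For completeness, a sketch of the competing criteria as used here: the partition coefficient and partition entropy are computed from the posterior probabilities (in place of fuzzy memberships), and AIC/BIC from the complete-data log-likelihood l_c with k estimated parameters. The function names are illustrative.

```python
import numpy as np

def partition_coefficient(z_hat):
    # V_PC = (1/n) * sum_ij z_ij^2  (larger is better)
    return (z_hat**2).sum() / z_hat.shape[1]

def partition_entropy(z_hat, eps=1e-12):
    # V_PE = -(1/n) * sum_ij z_ij * log(z_ij)  (smaller is better)
    return -(z_hat * np.log(z_hat + eps)).sum() / z_hat.shape[1]

def aic(loglik_c, k):
    return -2.0 * loglik_c + 2.0 * k

def bic(loglik_c, k, n):
    return -2.0 * loglik_c + k * np.log(n)
```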

    All in all, we consider ten indices, including {V_{PC}} , {V_{NPC}} , {V_{PE}} , {V_{NPE}} , AIC, BIC, {V_{aRaS}} , {V_{sRaS}} , {V_{aRmS}} , and {V_{sRmS}} , to select the optimal number of model components. Table 5 shows, for each index, the proportion of choosing the correct number of model components over 1000 simulation runs. In each simulation run, each of the models M1~M4 is fitted with 2, 3, and 4 components separately. Note that we assume the number of model components is greater than one, as required by the proposed validity indices. We define the proportion of choosing the correct number of risk types by each index in Table 5 as:

    \frac{\#(\text{choose correct } g \text{ by index})}{\#(\text{simulations})}.
    Table 5.  The proportion of choosing the correct g by each index for 1000 simulation runs under models M1~M4 respectively.
    {V_{PC}} {V_{NPC}} {V_{PE}} {V_{NPE}} AIC BIC {V_{aRaS}} {V_{sRaS}} {V_{aRmS}} {V_{sRmS}}
    M1 0.962 0.880 0.976 0.894 0.964 0.950 0.896 0.896 0.984 0.992
    M2 0.954 0.564 0.963 0.485 0.524 0.631 0.863 0.851 0.981 0.990
    M3 1.000 0.798 1.000 0.868 0.998 0.998 0.994 0.998 1.000 1.000
    M4 0.486 0.780 0.413 0.810 0.646 0.660 0.923 0.916 0.813 0.703


    Table 5 shows that the proportions of choosing the correct g by the traditional indices {V_{PC}} , {V_{NPC}} , {V_{PE}} , {V_{NPE}} , AIC, and BIC are not consistent across models M1~M4: each of them performs poorly on at least one model. On the other hand, the proposed indices ( {V_{aRaS}} , {V_{sRaS}} , {V_{aRmS}} , and {V_{sRmS}} ) are consistent and attain high proportions for every model, except that the proportion of {V_{sRmS}} under M4 is 0.703, which is slightly low but still higher than that of most traditional indices. Hence, the proposed validity indices are superior to the others in selecting the correct number of components.

    As a practical illustration of the proposed EM-based semi-parametric mixture hazard model, we consider the survival times of 506 patients with prostate cancer who entered a clinical trial during 1967–1969. The patients were randomly allocated to different levels of treatment with the drug diethylstilbestrol (DES); the data were considered by Byar and Green [26] and published by Andrews and Herzberg [27]. Kay [28] analyzed a subset of the data using eight categorical variables: drug treatment (RX: 0, 0.0 or 0.2 mg estrogen; 1, 1.0 or 5.0 mg estrogen); age group (AG: 0, < 75 years; 1, 75 to 79 years; 2, > 79 years); weight index (WT: 0, > 99 kg; 1, 80 to 99 kg; 2, < 80 kg); performance rating (PF: 0, normal; 1, limitation of activity); history of cardiovascular disease (HX: 0, no; 1, yes); serum haemoglobin (HG: 0, > 12 g/100 ml; 1, 9 to 12 g/100 ml; 2, < 9 g/100 ml); size of primary lesion (SZ: 0, < 30 cm²; 1, ≥ 30 cm²); and Gleason stage/grade category (SG: 0, ≤ 10; 1, > 10). Cheng et al. [3] classified this dataset into three types of risk: (1) death due to prostate cancer; (2) death due to cardiovascular disease (CVD); and (3) other causes.

    We analyze the same dataset with the eight categorical variables (RX, AG, WT, PF, HX, HG, SZ, SG). There are 483 patients with complete information on these covariates, and the proportion of censored observations is 28.8%. We ignore the recorded failure types and use the indices {V_{PC}} , {V_{NPC}} , {V_{PE}} , {V_{NPE}} , AIC, BIC, {V_{aRaS}} , {V_{sRaS}} , {V_{aRmS}} , and {V_{sRmS}} to select the optimal number of risk types. From Table 6, the number of risk types selected by {V_{aRaS}} , {V_{sRaS}} , {V_{aRmS}} , and {V_{sRmS}} is three, while the other indices select two. The number of model components selected by the proposed indices agrees with the previous study by Cheng et al. [3].

    Table 6.  The value of each index with different number of risk types under prostate cancer data.
    {V_{PC}} {V_{NPC}} {V_{PE}} {V_{NPE}} AIC BIC {V_{aRaS}} {V_{sRaS}} {V_{aRmS}} {V_{sRmS}}
    g = 2 0.7813 0.5626 0.3369 0.5720 4.1518 4.2989 0.5894 0.4437 0.5894 0.4437
    g = 3 0.6684 0.5027 0.5260 0.6135 4.5012 4.7262 0.3783 0.1974 0.5016 0.2943
    g = 4 0.5581 0.4109 0.7564 0.7075 4.7967 5.0996 0.4746 57.572 0.6123 98.534
    Note: (1) g represents the number of risk types when estimating. (2) The optimal values of g according to each index are highlighted in bold.


    From existing medical experience and a previous study, we presume that these model components may correspond to particular types of risk and can thus decide whether there are significant relationships between the covariates and the survival times by using the Wald test. Based on the cause-specific hazard approach, Cheng et al. [3] found that treatment with a higher DES dosage (RX = 1) significantly reduces the risk of death due to prostate cancer. Table 7 shows that the DES dosage has a significant effect on time to death due to the 1st type of risk and that the estimated regression coefficient of RX is negative. Byar and Green [26] found that patients with a large primary lesion (SZ = 1) and high-grade tumors (SG = 1) are at greater risk of prostate cancer death. Table 7 shows that SZ and SG have significant effects on time to death due to the 1st type of risk and that their estimated regression coefficients are positive. Accordingly, we presume the 1st type of risk relates to prostate cancer. Furthermore, based on the cause-specific hazard approach, Cheng et al. [3] found that treatment with a higher DES dosage (RX = 1) significantly increases the risk of death due to CVD. From Table 7, we see that the DES dosage has a significant effect on time to death due to the 2nd and 3rd types of risk and that the estimated regression coefficient of RX is positive.

    Table 7.  The model estimates (with standard errors) of prostate cancer data given the number of risk types equal to 3.
    1st type of risk 2nd type of risk 3rd type of risk
    \mathit{\boldsymbol{p}} 0.2132 0.3930 0.3936
    \mathit{\boldsymbol{\beta }} RX −0.0296*(0.1267) 0.3546*(0.1414) 0.7589*(0.1425)
    AG 0.3144*(0.1143) 1.7445*(0.1041) 1.8104*(0.1396)
    WT −0.0817*(0.0916) 1.7915*(0.0967) −0.5555*(0.1290)
    PF 1.4742*(0.2233) 0.1244*(0.2527) 1.6468*(0.3325)
    HX 3.0027*(0.1176) 1.2829*(0.1377) −0.6092*(0.1486)
    HG 0.8489*(0.1536) 1.6074*(0.1669) −5.2153*(0.7267)
    SZ 0.8567*(0.2119) 3.0334*(0.1998) −3.2661*(0.4074)
    SG 4.3184*(0.1010) −0.3907*(0.1419) −0.9933*(0.1560)
    Note: * denotes P-value < 0.05.


    We know that patients with a history of cardiovascular disease (HX = 1) have a higher probability of death due to CVD than patients without such a history. Table 7 shows that the estimated regression coefficient of HX is positive for the 2nd type of risk. Hence, we presume the 2nd type of risk may relate to CVD. There is no explicit relationship between the covariates and the survival times for the 3rd type of risk; thus, we only presume the 3rd type of risk may relate to other, unspecified causes of death. According to the significant relationships between covariates and survival times, we assess that the 1st, 2nd, and 3rd types of risk estimated from the EM-based semi-parametric mixture hazard model correspond to prostate cancer, CVD, and other unspecified causes, respectively.

    This study introduces four new validity indices, {V_{aRaS}} , {V_{sRaS}} , {V_{aRmS}} , and {V_{sRmS}} , for deciding the number of model components when applying an EM-based Cox proportional hazards mixture model to competing risks data. We incorporate the posterior probabilities and the sum of standardized residuals to construct the new validity indices. Moreover, our study sets up an extended kernel approach to estimate the baseline hazard functions more smoothly and accurately. Extensive simulations show that the kernel procedure for baseline hazard estimation helps increase the rate of correctly classifying individuals into their true attributable type of risk. Furthermore, the simulation results demonstrate that the proposed validity indices are consistent and select the optimal number of model components more often than the traditional competitors, including the most commonly used criteria, AIC and BIC. We also apply the proposed method to a prostate cancer dataset to illustrate its practicability.

    If we apply the four new validity indices at the same time, we have the best chance of selecting the optimal number of model components. One remaining concern is picking the best one among the proposed indices. The average separation versions ( {V_{aRaS}} , {V_{sRaS}} ) easily neutralize the effects of small and large distances among the expectations of the component models. On the other hand, as long as there is a small distance among the expectations of the component models, the minimum separation versions ( {V_{aRmS}} , {V_{sRmS}} ) will catch the information about an overfitting model. In the analysis of the prostate cancer data, we see that {V_{aRmS}} and {V_{sRmS}} behave more sensitively than {V_{aRaS}} and {V_{sRaS}} in detecting overfitting models (i.e., the differences in the indices between overfitting and optimal models are much larger than those between underfitting and optimal models). Furthermore, according to the simulation results, the index {V_{sRmS}} performs slightly poorly on one model; we thus recommend {V_{aRmS}} if just one of the proposed validity indices is to be used.

    In the future, we may test the effectiveness of the proposed validity indices on statistical models other than the mixture Cox proportional hazards regression model. We could also improve the efficiency of the proposed indices in determining the number of components of mixture models. Another issue is reducing the computational cost; for instance, the bandwidth of the kernel procedure for the baseline hazard estimates is recalculated at each iteration, which consumes computation time. All these issues need further investigation and will be covered in our future research.

    The authors thank the anonymous reviewers for their insightful comments and suggestions which have greatly improved this article. This work was partially supported by the Ministry of Science and Technology, Taiwan [Grant numbers MOST 108-2118-M-025-001-and MOST 108-2118-M-003-003-].



    [1] Y. Xu, Functional Data Analysis, London: Springer, 2023. https://doi.org/10.1007/978-1-4471-7503-2_4
    [2] P. Hall, M. Hosseini-Nasab, On Properties of Functional Principal Components Analysis, J. R. Stat. Soc. Ser. B: Stat. Methodol., 68 (2006), 109–126. https://doi.org/10.1111/j.1467-9868.2005.00535.x doi: 10.1111/j.1467-9868.2005.00535.x
    [3] W. Seo, Functional principal component analysis for cointegrated functional time series, J. Time Ser. Anal., 45 (2023), 320–330. https://doi.org/10.1111/jtsa.12707 doi: 10.1111/jtsa.12707
    [4] O. A. Montesinos López, A. Montesinos López, J. Crossa, Multivariate Statistical Machine Learning Methods for Genomic Prediction, Cham: Springer, 2022. https://doi.org/10.1007/978-3-030-89010-0
    [5] H. Hullait, D. S. Leslie, N. G. Pavlidis, S. King, Robust Function-on-Function Regression, Technometrics, 63 (2020), 396–409. https://doi.org/10.1080/00401706.2020.1802350 doi: 10.1080/00401706.2020.1802350
    [6] J. O. Razo-De-Anda, L. L. Romero-Castro, F. Venegas-Martínez, Contagion Patterns Classification in Stock Indices: A Functional Clustering Analysis Using Decision Trees, Mathematics, 11 (2023), 2961. https://doi.org/10.3390/math11132961 doi: 10.3390/math11132961
    [7] F. Centofanti, A. Lepore, B. Palumbo, Sparse and smooth functional data clustering, Stat. Pap., 65 (2024), 795–825. https://doi.org/10.1007/s00362-023-01408-1 doi: 10.1007/s00362-023-01408-1
    [8] J. A. Arias-López, C. Cadarso-Suárez, P. Aguiar-Fernánde, Computational Issues in the Application of Functional Data Analysis to Imaging Data, Lect. Notes Comput. Sci., 42 (2021), 630–638. https://doi.org/10.1007/978-3-030-86960-1_46 doi: 10.1007/978-3-030-86960-1_46
    [9] C. Tang, T. Wang, P. Zhang, Functional data analysis: An application to COVID‐19 data in the United States in 2020, Quant. Bio., 10 (2022), 172–187. https://doi.org/10.15302/J-QB-022-0300 doi: 10.15302/J-QB-022-0300
    [10] C. Zhang, H. Lin, L. Liu, J. Liu, Y. Li, Functional Data Analysis with Covariate-Dependent Mean and Covariance Structures, Biometrics, 79 (2023), 2232–2245. https://doi.org/10.1111/biom.13744 doi: 10.1111/biom.13744
    [11] I. Shah, P. Mubassir, S. Ali, O. Albalawi, A functional autoregressive approach for modeling and forecasting short-term air temperature, Front. Environ. Sci., 12 (2024), 1411237. https://doi.org/10.3389/fenvs.2024.1411237 doi: 10.3389/fenvs.2024.1411237
    [12] V. Villani, E. Romano, J. Mateu, Climate model selection via conformal clustering of spatial functional data, Environ. Ecol. Stat., 31 (2024), 365–385. https://doi.org/10.1007/s10651-024-00616-8 doi: 10.1007/s10651-024-00616-8
    [13] A. Palummo, E. Arnone, L. Formaggia, L. M. Sangalli, Functional principal component analysis for incomplete space-time data, Environ. Ecol. Stat., 31 (2024), 555–582. https://doi.org/10.1007/s10651-024-00598-7 doi: 10.1007/s10651-024-00598-7
    [14] J. O. Ramsay, B. W. Silverman, Functional Data Analysis, 2 Eds., New York: Springer, 2005. https://doi.org/10.1007/b98888
    [15] M. A. Hael, Unveiling air pollution patterns in Yemen: A spatial-temporal functional data analysis, Environ. Sci. Pollut. Res., 30 (2023), 50067–50095. https://doi.org/10.1007/s11356-023-25790-3 doi: 10.1007/s11356-023-25790-3
    [16] M. Gong, R. O'Donnell, C. Miller, M. Scott, S. Simis, S. Groom, et. al, Adaptive smoothing to identify spatial structure in global lake ecological processes using satellite remote sensing data, Spat. Stat., 50 (2022), 100615. https://doi.org/10.1016/j.spasta.2022.100615 doi: 10.1016/j.spasta.2022.100615
    [17] R. Raturi, Large Data Analysis via Interpolation of Functions: Interpolating Polynomials vs Artificial Neural Networks, Amer. J. Intell. Syst., 8 (2018), 6–11. https://doi.org/10.5923/j.ajis.20180801.02 doi: 10.5923/j.ajis.20180801.02
    [18] N. A. Mazelan, J. Suhaila, Exploring rainfall variabilities using statistical functional data analysis, IOP Conf. Ser.: Earth Environ. Sci., 1167 (2023), 012007. https://doi.org/10.1088/1755-1315/1167/1/012007 doi: 10.1088/1755-1315/1167/1/012007
    [19] C. Sözen, Y. Öner, The investigation of temperature data in Turkey's Black Sea Region using functional data analysis, J. Appl. Stat., 49 (2021), 2403–2415. https://doi.org/10.1080/02664763.2021.1896683 doi: 10.1080/02664763.2021.1896683
    [20] J. Baz, J. Davis, L. Han, C. Stracke, The value of smoothing, J. Portfolio Manag., 48 (2022), 73–85. https://doi.org/10.3905/jpm.2022.1.399 doi: 10.3905/jpm.2022.1.399
    [21] A. Falini, F. Mazzia, C. Tamborrino, Spline based Hermite quasi-interpolation for univariate time series, Discrete Cont. Dyn. Syst. - S, 15 (2022), 3667–3688. https://doi.org/10.3934/dcdss.2022039 doi: 10.3934/dcdss.2022039
    [22] L. Brugnano, D. Giordano, F. Iavernaro, G. Rubino, An entropy-based approach for a robust least squares spline approximation, J. Comput. Appl. Math., 443 (2024), 115773. https://doi.org/10.1016/j.cam.2024.115773 doi: 10.1016/j.cam.2024.115773
    [23] M. Spreafico, F. Ieva, M. Fiocco, Modelling time-varying covariates effect on survival via functional data analysis: Application to the MRC BO06 trial in osteosarcoma, Stat. Methods Appl., 32 (2023), 271–298. https://doi.org/10.1007/s10260-022-00647-0 doi: 10.1007/s10260-022-00647-0
    [24] A. Rahman, D. Jiang, Regional and temporal patterns of influenza: Application of functional data analysis, Infect. Dis. Modell., 6 (2021), 1061–1072. https://doi.org/10.1016/j.idm.2021.08.006 doi: 10.1016/j.idm.2021.08.006
    [25] M. Rangata, S. Das, M. Ali, Analysing Maximum Monthly Temperatures in South Africa for 45 years Using Functional Data Analysis, Adv. Decis. Sci., 24 (2020), 1–27.
    [26] U. Beyaztas, S. Q. Salih, K.-W. Chau, N. Al-Ansari, Z. M. Yaseen, Construction of functional data analysis modeling strategy for global solar radiation prediction: Application of cross-station paradigm, Eng. Appl. Comput. Fluid Mech., 13 (2019), 1165–1181. http://doi.org/10.1080/19942060.2019.1676314 doi: 10.1080/19942060.2019.1676314
    [27] S. Curceac, C. Ternynck, T. B. Ouarda, F. Chebana, S. D. Niang, Short-term air temperature forecasting using Nonparametric Functional Data Analysis and SARMA models, Environ. Modell. Software, 111 (2019), 394–408. http://doi.org/10.1016/j.envsoft.2018.09.017 doi: 10.1016/j.envsoft.2018.09.017
    [28] M. Ammad, M. Y. Misro, A. Ramli, A novel generalized trigonometric Bézier curve: Properties, continuity conditions and applications to the curve modeling, J. Amer. Math. Soc., 194 (2022), 744–763. http://doi.org/10.1016/j.matcom.2021.12.011 doi: 10.1016/j.matcom.2021.12.011
    [29] S. A. A. A. Said Mad Zain, M. Y. Misro, K. T. Miura, Generalized Fractional Bézier Curve with Shape Parameters, Mathematics, 9 (2021), 2141. https://doi.org/10.3390/math9172141 doi: 10.3390/math9172141
    [30] B. A. Barsky, The Beta-Spline: A Local Representation based on Shape Parameters and Fundamental Geometric Measures, PhD thesis, The University of Utah, 1981.
    [31] B. A. Barsky, Rational Beta-splines for representing curves and surfaces, IEEE Comput. Graph. Appl., 13 (1993), 24–32. http://doi.org/10.1109/38.252550 doi: 10.1109/38.252550
    [32] N. A. Hadi, A. Ibrahim, F. Yahya, J. M. Ali, A Comparative Study on Cubic Bezier and Beta-Spline Curves, Mathematika, 29 (2013), 55–64.
    [33] B. Sambhunath, C. L. Brian, Bézier and Splines in Image Processing and Machine Vision, London: Springer, 2008. https://doi.org/10.1007/978-1-84628-957-6
    [34] N. A. Hadi, N. S. M. Kamal, H. Nordin, Computational Method for Digital Khat Calligraphy Using Beta-Spline Curve Fitting, ASM Sc. J., 13 (2020). https://doi.org/10.32802/asmscj.2020.sm26(5.8) doi: 10.32802/asmscj.2020.sm26(5.8)
    [35] S. A. Suliman, N. A. Hadi, Optimizing the Shape Parameters of Beta-Spline Using Particle Swarm Optimization, Int. J. Eng. Technol., 7 (2018), 93–97. http://doi.org/10.14419/ijet.v7i4.33.23492 doi: 10.14419/ijet.v7i4.33.23492
    [36] M. S. A. Halim, N. A. Hadi, H. Sulaiman, S. Abd Halim, An algorithm for beta-spline surface reconstruction from multi slice CT scan images using MATLAB pmode, 2017 IEEE Symposium on Computer Applications & Industrial Electronics (ISCAIE), 2017, 1–6. http://doi.org/10.1109/ISCAIE.2017.8074939 doi: 10.1109/ISCAIE.2017.8074939
    [37] B. A. Barsky, J. C. Beatty, Local Control of Bias and Tension in Beta-splines, ACM Trans. Graph., 2 (1983), 109–134. http://doi.org/10.1145/357318.357321 doi: 10.1145/357318.357321
    [38] B. A. Barsky, Computer Graphics and Geometric Modeling Using Beta-splines, Berlin, Heidelberg: Springer, 1988. https://doi.org/10.1007/978-3-642-72292-9
    [39] B. A. Barsky, J. C. Beatty, Varying the Betas in Beta-splines, Technical Report UCB/CSD-83-112, EECS Department, University of California, Berkeley, 1982. Available from: https://digicoll.lib.berkeley.edu/record/137388/files/CSD-83-112.pdf.
    [40] E. Holtanová, T. Mendlik, J. Koláček, I. Horová, J. Mikšovský, Similarities within a multi-model ensemble: functional data analysis framework, Geosci. Model Dev., 12 (2019), 735–747. http://doi.org/10.5194/gmd-12-735-2019 doi: 10.5194/gmd-12-735-2019
    [41] D. A. Shah, E. D. De Wolf, P. A. Paul, L. V. Madden, Functional Data Analysis of Weather Variables Linked to Fusarium Head Blight Epidemics in the United States, Phytopathology®, 109 (2019), 96–110. http://doi.org/10.1094/PHYTO-11-17-0386-R doi: 10.1094/PHYTO-11-17-0386-R
    [42] B. Guo, H. Wu, L. Pei, X. Zhu, D. Zhang, Y. Wang, et al., Study on the spatiotemporal dynamic of ground-level ozone concentrations on multiple scales across China during the blue sky protection campaign, Environ. Int., 170 (2022), 107606. http://doi.org/10.1016/j.envint.2022.107606 doi: 10.1016/j.envint.2022.107606
    [43] P. Craven, G. Wahba, Smoothing noisy data with spline functions, Numer. Math., 31 (1978), 377–403. http://doi.org/10.1007/BF01404567 doi: 10.1007/BF01404567
    [44] M. Gubian, F. Torreira, L. Boves, Using Functional Data Analysis for investigating multidimensional dynamic phonetic contrasts, J. Phonetics, 49 (2015), 16–40. http://doi.org/10.1016/j.wocn.2014.10.001 doi: 10.1016/j.wocn.2014.10.001
    [45] L. Tavi, T. Kinnunen, R. González Hautamäki, Improving speaker de-identification with functional data analysis of f0 trajectories, Speech Commun, , 140 (2022), 1–10. http://doi.org/10.1016/j.specom.2022.03.010 doi: 10.1016/j.specom.2022.03.010
    © 2024 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0).