
A review of spatial scan statistics for survival data

  • We review spatial scan statistics approaches for survival data. After presenting the general principle of spatial scan statistics, we review the literature and find that few approaches exist. We distinguish between the early parametric approaches, which rely on a specific distribution model and are therefore not very flexible, and a semi-parametric method. However, these approaches cannot account for the spatial dependence frequently observed in the data. We then present a more recent approach that takes it into account. Finally, we describe how to adjust cluster detection for covariates before illustrating the methods on the detection of clusters of abnormal survival times following a diagnosis of leukemia.

    Citation: Camille Frévent. A review of spatial scan statistics for survival data[J]. AIMS Mathematics, 2025, 10(6): 14088-14101. doi: 10.3934/math.2025634




    In epidemiology, the detection of spatial clusters enables the identification of geographical areas presenting an abnormally high (or low) risk. Spatial scan statistics are a well-known method for objectively detecting spatial clusters, and providing an indication of their statistical significance.

    In the context of health data, spatial scan statistics based on Poisson and Bernoulli models have been proposed by Kulldorff and Nagarwalla [1] and Kulldorff [2]. From observed cases and at-risk populations, these can be used, for example, to detect geographical areas where the risk of having or dying from a disease is higher than elsewhere.

    A spatial scan statistic based on an ordinal model has also been proposed by Jung et al. [3] and enables the detection of spatial clusters in which the stages of patients suffering from a disease are more severe than elsewhere.

    Researchers may also be interested in detecting areas where the time to an event of interest (typically death or recovery from a disease) is higher or lower than elsewhere. In this case, the data observed is not a number of cases and a number of people at risk, but for each individual, a time to the event of interest. However, some individuals may never experience the event during the study, in which case they are said to be censored.

    This article provides a review of the literature on spatial scan statistics methods for survival data. Section 2 introduces the general principle of spatial scan statistics, while Section 3 presents the different methods proposed in the literature for survival data. Section 4 explains how to adjust cluster detection on covariates and illustrates the approaches on the LeukSurv dataset. Finally, Section 5 concludes the article.

    Let $s_1, \dots, s_K$ be $K$ nonoverlapping spatial locations of an observation domain $S \subset \mathbb{R}^2$. Spatial scan statistics aim at detecting spatial clusters in which the distribution of the observed data differs from that elsewhere, and at determining their statistical significance. More precisely, they consider the following test hypotheses:

    $$\mathcal{H}_0: \text{There is no cluster in the data}, \quad \text{vs.} \quad \mathcal{H}_1: \text{There is at least one spatial cluster in which the data present abnormal values compared with the rest of the domain.}$$

    These test hypotheses can be expressed more explicitly, depending on the type of data and model considered. They will be clarified for each survival data model in the following.

    The spatial scan statistic approach is a two-stage process. The first stage is the scanning step, which consists in determining the most likely cluster (MLC) from a set of potential clusters $\mathcal{W}$. Here, we consider the set of circular clusters containing between 1 and 50% of observations, but other shapes exist, notably elliptical clusters [4], graph-based clusters [5], and arbitrarily shaped clusters [6,7]. A concentration index is then computed for each potential cluster $w \in \mathcal{W}$. It compares the distribution within the potential cluster with that outside, so that the greater the difference, the higher the concentration index. Finally, the spatial scan statistic $\Lambda$ is defined as the maximum of the concentration index over $\mathcal{W}$, and the MLC is the potential cluster for which this maximum is reached.
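    The construction of the set of circular potential clusters can be sketched as follows. This is only an illustration under stated assumptions: the function name `circular_candidates`, the representation of each spatial unit by a centroid, and the cap expressed as a fraction of individuals are choices made here, not part of the reviewed methods. Units at equal distance from a centre are added one at a time, whereas a strict circle would add them together.

```python
import math

def circular_candidates(coords, counts, max_fraction=0.5):
    """Enumerate circular candidate clusters as frozensets of unit indices.

    coords: list of (x, y) centroids, one per spatial unit (assumed representation).
    counts: number of individuals observed in each unit.
    A circle is centred on each unit and grown through its nearest neighbours
    until it would contain more than max_fraction of all individuals.
    """
    total = sum(counts)
    candidates = set()
    for cx, cy in coords:
        # Units ordered by increasing distance from the current centre.
        order = sorted(
            range(len(coords)),
            key=lambda j: math.hypot(coords[j][0] - cx, coords[j][1] - cy),
        )
        inside, size = [], 0
        for j in order:
            size += counts[j]
            if size > max_fraction * total:
                break
            inside.append(j)
            candidates.add(frozenset(inside))
    return candidates
```

    The concentration index is then evaluated on each returned set, and the scan statistic is the maximum over these sets.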

    The second stage assesses the statistical significance of the MLC. Since the distribution of the spatial scan statistic is generally impossible to determine under $\mathcal{H}_0$, two Monte Carlo approaches are commonly used: (i) $M$ permutations of the data are generated, and the scan statistic $\Lambda^{(m)}$ is calculated on each permuted dataset $m \in \{1, \dots, M\}$ [8,9,10]; (ii) if the distribution of the data is known under $\mathcal{H}_0$, $M$ datasets are generated under $\mathcal{H}_0$, and the scan statistic is calculated on each generated dataset [1,2,3]. In both approaches, the p-value is then estimated by

    $$\hat{p} = \frac{1 + \sum_{m=1}^{M} \mathbb{1}_{\Lambda^{(m)} \ge \Lambda}}{M + 1}.$$
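    The estimator above counts how many of the $M$ simulated or permuted statistics are at least as large as the observed one. A minimal sketch, in which the function name `monte_carlo_p_value` is a hypothetical choice:

```python
def monte_carlo_p_value(observed, simulated):
    """Estimate the p-value as (1 + #{m : simulated[m] >= observed}) / (M + 1)."""
    exceed = sum(1 for value in simulated if value >= observed)
    return (1 + exceed) / (len(simulated) + 1)
```

    For example, with an observed statistic of 5.2 and four simulated values 1.0, 6.3, 2.4, and 0.7, only one simulated value reaches the observed one, so the estimate is (1 + 1) / (4 + 1) = 0.4. The smallest attainable value is 1 / (M + 1), which is why M is usually chosen large (for instance 999).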

    In the context of spatial survival data, spatial scan statistics allow researchers to identify risk or protective factors related to a study event. The test hypotheses are the following:

    $$\mathcal{H}_0: \text{There is no cluster of abnormal survival times}, \quad \text{vs.} \quad \mathcal{H}_1: \text{There is at least one cluster } w \in \mathcal{W} \text{ of abnormal survival times.}$$

    We can also define the alternative hypothesis $\mathcal{H}_1^{(w)}$ associated with a potential cluster $w$ as

    $$\mathcal{H}_1^{(w)}: w \in \mathcal{W} \text{ is a cluster of abnormal survival times.}$$

    Then, $\mathcal{H}_1 = \bigcup_{w \in \mathcal{W}} \mathcal{H}_1^{(w)}$.

    Let $i_1^{(1)}, \dots, i_{N_1}^{(1)}, \dots, i_1^{(K)}, \dots, i_{N_K}^{(K)}$ be the observed individuals in $s_1, \dots, s_K$, where $i_n^{(k)}$ corresponds to the $n$th individual in spatial unit $s_k$. For each individual $i_n^{(k)}$, we observe survival data consisting of an observed time $T_{i_n^{(k)}}$ and a censoring indicator $\delta_{i_n^{(k)}}$ (equal to 1 if $T_{i_n^{(k)}}$ is the true time to the event, and 0 otherwise, which corresponds to censoring). In the following, we only consider right-censoring (assuming that the event of interest could not have occurred before the beginning of the study). Censoring is assumed to be uninformative, and the event times are assumed to be independent of the censoring times.

    This section presents the different scan statistics for survival data proposed in the literature, as well as their limitations.

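    Several parametric approaches have been proposed in the literature. Huang et al. [11] first proposed a scan statistic assuming that the true (but not necessarily observed) survival times $Y_{i_n^{(k)}}$ follow an exponential model. The test hypotheses can be rewritten as

    $$\mathcal{H}_0: \text{For all } k \in \{1, \dots, K\},\ n \in \{1, \dots, N_k\},\ Y_{i_n^{(k)}} \sim \mathcal{E}\left(\tfrac{1}{\theta}\right), \quad \text{vs.} \quad \mathcal{H}_1: \text{There exists } w \in \mathcal{W} \text{ such that for all } i_n^{(k)} \text{ with } s_k \in w,\ Y_{i_n^{(k)}} \sim \mathcal{E}\left(\tfrac{1}{\theta_w}\right), \text{ and for all } i_n^{(k)} \text{ with } s_k \in w^C,\ Y_{i_n^{(k)}} \sim \mathcal{E}\left(\tfrac{1}{\theta_{w^C}}\right),\ \theta_w \neq \theta_{w^C}.$$

    The alternative hypothesis associated with a potential cluster $w$ is then

    $$\mathcal{H}_1^{(w)}: \text{For all } i_n^{(k)} \text{ with } s_k \in w,\ Y_{i_n^{(k)}} \sim \mathcal{E}\left(\tfrac{1}{\theta_w}\right), \text{ for all } i_n^{(k)} \text{ with } s_k \in w^C,\ Y_{i_n^{(k)}} \sim \mathcal{E}\left(\tfrac{1}{\theta_{w^C}}\right),\ \theta_w \neq \theta_{w^C}.$$

    Next, the log-likelihood under $\mathcal{H}_0$ is

    $$\ell_{\mathcal{H}_0}(\theta) = \sum_{k=1}^{K} \sum_{n=1}^{N_k} \left[ -\delta_{i_n^{(k)}} \ln(\theta) - \frac{T_{i_n^{(k)}}}{\theta} \right],$$

    which is maximized when $\hat{\theta} = \frac{\sum_{k=1}^{K} \sum_{n=1}^{N_k} T_{i_n^{(k)}}}{\sum_{k=1}^{K} \sum_{n=1}^{N_k} \delta_{i_n^{(k)}}}$, and the log-likelihood under $\mathcal{H}_1^{(w)}$ is defined as

    $$\ell_{\mathcal{H}_1^{(w)}}(\theta_w, \theta_{w^C}) = \sum_{i_n^{(k)},\, s_k \in w} \left[ -\delta_{i_n^{(k)}} \ln(\theta_w) - \frac{T_{i_n^{(k)}}}{\theta_w} \right] + \sum_{i_n^{(k)},\, s_k \in w^C} \left[ -\delta_{i_n^{(k)}} \ln(\theta_{w^C}) - \frac{T_{i_n^{(k)}}}{\theta_{w^C}} \right],$$

    which is maximized when $\hat{\theta}_w = \frac{\sum_{i_n^{(k)},\, s_k \in w} T_{i_n^{(k)}}}{\sum_{i_n^{(k)},\, s_k \in w} \delta_{i_n^{(k)}}}$ and $\hat{\theta}_{w^C} = \frac{\sum_{i_n^{(k)},\, s_k \in w^C} T_{i_n^{(k)}}}{\sum_{i_n^{(k)},\, s_k \in w^C} \delta_{i_n^{(k)}}}$.

    Then the spatial scan statistic is defined as

    $$\Lambda_{\text{exp}} = \max_{w \in \mathcal{W}} \ \ell_{\mathcal{H}_1^{(w)}}(\hat{\theta}_w, \hat{\theta}_{w^C}) - \ell_{\mathcal{H}_0}(\hat{\theta}).$$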
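    Because the maximum likelihood estimators are available in closed form, the exponential scan statistic can be computed directly. The sketch below is illustrative only: the function names, the encoding of each individual's spatial unit as an integer in `units`, and the representation of candidate clusters as sets of unit indices are assumptions. It uses the fact that, at $\hat{\theta}$, the sum of the $T/\hat{\theta}$ terms equals the number of events.

```python
import math

def exp_max_loglik(times, events):
    """Exponential log-likelihood evaluated at theta_hat = sum(T) / sum(delta)."""
    d = sum(events)
    if d == 0:
        return 0.0  # no observed event: the supremum (theta -> infinity) is 0
    theta_hat = sum(times) / d
    return -d * math.log(theta_hat) - d  # sum(T) / theta_hat equals d

def exponential_scan(times, events, units, candidates):
    """Return (scan statistic, most likely cluster) over the candidate clusters."""
    null = exp_max_loglik(times, events)
    best_llr, best_cluster = -math.inf, None
    for w in candidates:
        t_in = [t for t, u in zip(times, units) if u in w]
        e_in = [e for e, u in zip(events, units) if u in w]
        t_out = [t for t, u in zip(times, units) if u not in w]
        e_out = [e for e, u in zip(events, units) if u not in w]
        if not t_in or not t_out:
            continue  # a cluster must leave individuals on both sides
        llr = exp_max_loglik(t_in, e_in) + exp_max_loglik(t_out, e_out) - null
        if llr > best_llr:
            best_llr, best_cluster = llr, w
    return best_llr, best_cluster
```

    Each log-likelihood ratio is non-negative because the null model is nested in the alternative.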

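    However, the exponential distribution is often too simplistic in practice, since it assumes a constant hazard rate over time. An alternative parametric model is the Weibull model, which allows for an increasing or decreasing hazard rate over time. Bhatt and Tiwari [12] proposed a scan statistic in this context, where the test hypotheses can be rewritten as

    $$\mathcal{H}_0: \text{For all } k \in \{1, \dots, K\},\ n \in \{1, \dots, N_k\},\ Y_{i_n^{(k)}} \sim \mathrm{Wei}\left(\tfrac{1}{\theta}, \alpha\right), \quad \text{vs.} \quad \mathcal{H}_1: \text{There exists } w \in \mathcal{W} \text{ such that for all } i_n^{(k)} \text{ with } s_k \in w,\ Y_{i_n^{(k)}} \sim \mathrm{Wei}\left(\tfrac{1}{\theta_w}, \alpha_w\right), \text{ and for all } i_n^{(k)} \text{ with } s_k \in w^C,\ Y_{i_n^{(k)}} \sim \mathrm{Wei}\left(\tfrac{1}{\theta_{w^C}}, \alpha_{w^C}\right),\ \theta_w \neq \theta_{w^C}.$$

    The log-likelihood under $\mathcal{H}_0$ is

    $$\ell_{\mathcal{H}_0}(\theta, \alpha) = \sum_{k=1}^{K} \sum_{n=1}^{N_k} \left\{ \delta_{i_n^{(k)}} \left[ \ln(\alpha) + (\alpha - 1) \ln\left(T_{i_n^{(k)}}\right) - \ln(\theta) \right] - \frac{T_{i_n^{(k)}}^{\alpha}}{\theta} \right\},$$

    which is maximized when $\hat{\theta} = \frac{\sum_{k=1}^{K} \sum_{n=1}^{N_k} T_{i_n^{(k)}}^{\hat{\alpha}}}{\sum_{k=1}^{K} \sum_{n=1}^{N_k} \delta_{i_n^{(k)}}}$, and the log-likelihood under $\mathcal{H}_1^{(w)}$ is defined as

    $$\ell_{\mathcal{H}_1^{(w)}}(\theta_w, \theta_{w^C}, \alpha_w, \alpha_{w^C}) = \sum_{i_n^{(k)},\, s_k \in w} \left[ \delta_{i_n^{(k)}} \left( \ln(\alpha_w) + (\alpha_w - 1) \ln\left(T_{i_n^{(k)}}\right) - \ln(\theta_w) \right) - \frac{T_{i_n^{(k)}}^{\alpha_w}}{\theta_w} \right] + \sum_{i_n^{(k)},\, s_k \in w^C} \left[ \delta_{i_n^{(k)}} \left( \ln(\alpha_{w^C}) + (\alpha_{w^C} - 1) \ln\left(T_{i_n^{(k)}}\right) - \ln(\theta_{w^C}) \right) - \frac{T_{i_n^{(k)}}^{\alpha_{w^C}}}{\theta_{w^C}} \right],$$

    which is maximized when $\hat{\theta}_w = \frac{\sum_{i_n^{(k)},\, s_k \in w} T_{i_n^{(k)}}^{\hat{\alpha}_w}}{\sum_{i_n^{(k)},\, s_k \in w} \delta_{i_n^{(k)}}}$ and $\hat{\theta}_{w^C} = \frac{\sum_{i_n^{(k)},\, s_k \in w^C} T_{i_n^{(k)}}^{\hat{\alpha}_{w^C}}}{\sum_{i_n^{(k)},\, s_k \in w^C} \delta_{i_n^{(k)}}}$.

    For $\alpha$, $\alpha_w$, and $\alpha_{w^C}$, the expressions of the maximum likelihood estimators are more complicated. Since $\hat{\alpha}$, $\hat{\alpha}_w$, and $\hat{\alpha}_{w^C}$ appear in the formulas of $\hat{\theta}$, $\hat{\theta}_w$, and $\hat{\theta}_{w^C}$, an optimization algorithm can be used in practice to estimate all the parameters.

    Finally, the spatial scan statistic is

    $$\Lambda_{\text{Wei}} = \max_{w \in \mathcal{W}} \ \ell_{\mathcal{H}_1^{(w)}}(\hat{\theta}_w, \hat{\theta}_{w^C}, \hat{\alpha}_w, \hat{\alpha}_{w^C}) - \ell_{\mathcal{H}_0}(\hat{\theta}, \hat{\alpha}).$$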
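    One possible way to carry out the numerical estimation is to substitute the closed-form scale estimator into the log-likelihood and maximize the resulting one-dimensional function of the shape parameter. The sketch below does this with a golden-section search on the logarithm of the shape; the search method, the bounds, and the function names are assumptions of this illustration, not the procedure of [12]. It requires strictly positive times and at least one observed event.

```python
import math

def weibull_profile_loglik(alpha, times, events):
    """Weibull log-likelihood at shape alpha, with theta = sum(T**alpha) / sum(delta)."""
    d = sum(events)
    logs = [alpha * math.log(t) for t in times]
    m = max(logs)
    # ln(sum of T**alpha), computed in an overflow-safe way.
    log_sum = m + math.log(sum(math.exp(v - m) for v in logs))
    log_theta = log_sum - math.log(d)
    sum_log_t = sum(e * math.log(t) for t, e in zip(times, events))
    # At the profiled theta, the sum of T**alpha / theta equals d.
    return d * math.log(alpha) + (alpha - 1) * sum_log_t - d * log_theta - d

def fit_weibull(times, events, lo=0.01, hi=100.0, tol=1e-10):
    """Return (theta_hat, alpha_hat, maximized log-likelihood)."""
    a, b = math.log(lo), math.log(hi)
    inv_phi = (math.sqrt(5) - 1) / 2

    def f(u):
        return weibull_profile_loglik(math.exp(u), times, events)

    while b - a > tol:
        c = b - inv_phi * (b - a)
        e = a + inv_phi * (b - a)
        if f(c) > f(e):
            b = e  # the maximum lies in [a, e]
        else:
            a = c  # the maximum lies in [c, b]
    alpha = math.exp((a + b) / 2)
    theta = sum(t ** alpha for t in times) / sum(events)
    return theta, alpha, weibull_profile_loglik(alpha, times, events)
```

    The log-likelihood ratio for a candidate cluster is then obtained by fitting the model inside the cluster, outside the cluster, and on all individuals, and subtracting the third maximized log-likelihood from the sum of the first two.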

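    These models have been generalized by Bhatt and Tiwari [13] to any density function for the $Y_{i_n^{(k)}}$ of the form

    $$f(t; \gamma, a, b, c) = \frac{c\, t^{ac-1} \exp\left(-\frac{t^c}{\gamma^b}\right)}{\gamma^{ab}\, \Gamma(a)}, \quad t > 0,$$

    where $a, b, c > 0$ are known and $\gamma$ is to be estimated from the data.

    In this context, the test hypotheses are

    $$\mathcal{H}_0: \text{For all } i_n^{(k)}, \text{ the density function of } Y_{i_n^{(k)}} \text{ is } f(\cdot\,; \gamma, a, b, c), \quad \text{vs.} \quad \mathcal{H}_1: \text{There exists } w \in \mathcal{W} \text{ such that for all } i_n^{(k)} \text{ with } s_k \in w, \text{ the density function of } Y_{i_n^{(k)}} \text{ is } f(\cdot\,; \gamma_w, a, b, c), \text{ and for all } i_n^{(k)} \text{ with } s_k \in w^C, \text{ the density function of } Y_{i_n^{(k)}} \text{ is } f(\cdot\,; \gamma_{w^C}, a, b, c),\ \gamma_w \neq \gamma_{w^C}.$$

    As in the previous models, $\gamma$, $\gamma_w$, and $\gamma_{w^C}$ can be estimated by maximum likelihood, and the scan statistic is then

    $$\Lambda_{\text{gen}} = \max_{w \in \mathcal{W}} \ \ell_{\mathcal{H}_1^{(w)}}(\hat{\gamma}_w, \hat{\gamma}_{w^C}) - \ell_{\mathcal{H}_0}(\hat{\gamma}).$$

    It should be noted that if $a = b = c = 1$, this approach is equivalent to the exponential model; if $a = 1$ and $b = c$, it results in a Weibull model; and if $a = b = 1$ and $c = 2$, it is equivalent to a Rayleigh model.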
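    These special cases can be checked by direct substitution into the density, read here as $f(t; \gamma, a, b, c) = c\, t^{ac-1} \exp(-t^c / \gamma^b) / (\gamma^{ab}\, \Gamma(a))$ and using $\Gamma(1) = 1$. The parameter names given in the right-hand comments are the usual textbook ones and are stated here only for orientation:

```latex
\begin{align*}
a = b = c = 1 &: \quad f(t; \gamma, 1, 1, 1) = \frac{1}{\gamma} \exp\left(-\frac{t}{\gamma}\right)
  && \text{(exponential with mean } \gamma\text{)}, \\
a = 1,\ b = c &: \quad f(t; \gamma, 1, c, c) = \frac{c\, t^{c-1}}{\gamma^{c}} \exp\left(-\frac{t^{c}}{\gamma^{c}}\right)
  && \text{(Weibull with shape } c \text{ and scale } \gamma\text{)}, \\
a = b = 1,\ c = 2 &: \quad f(t; \gamma, 1, 1, 2) = \frac{2t}{\gamma} \exp\left(-\frac{t^{2}}{\gamma}\right)
  && \text{(Rayleigh)}.
\end{align*}
```

    In the Weibull case, the correspondence with the parametrization used earlier for the Weibull log-likelihood is $\alpha = c$ and $\theta = \gamma^{c}$.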

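    More recently, Usman and Rosychuk [14] proposed an approach based on a log-Weibull distribution and considering the following test hypotheses:

    $$\mathcal{H}_0: \text{For all } i_n^{(k)}, \text{ the density function of } Y_{i_n^{(k)}} \text{ is of the form } f(t; a, b) = \frac{1}{b} \exp\left(\frac{t - a}{b}\right) \exp\left[-\exp\left(\frac{t - a}{b}\right)\right], \quad \text{vs.} \quad \mathcal{H}_1: \text{There exists } w \in \mathcal{W} \text{ such that for all } i_n^{(k)} \text{ with } s_k \in w, \text{ the density function of } Y_{i_n^{(k)}} \text{ is of the form } f(t; a_w, b_w), \text{ and for all } i_n^{(k)} \text{ with } s_k \in w^C, \text{ the density function of } Y_{i_n^{(k)}} \text{ is of the form } f(t; a_{w^C}, b_{w^C}),\ b_w \neq b_{w^C}.$$

    The log-likelihoods under $\mathcal{H}_0$ and under $\mathcal{H}_1^{(w)}$ are, respectively,

    $$\ell_{\mathcal{H}_0}(a, b) = \sum_{k=1}^{K} \sum_{n=1}^{N_k} \left[ \delta_{i_n^{(k)}} \left( -\ln(b) + \frac{T_{i_n^{(k)}} - a}{b} \right) - \exp\left( \frac{T_{i_n^{(k)}} - a}{b} \right) \right]$$

    and

    $$\ell_{\mathcal{H}_1^{(w)}}(a_w, a_{w^C}, b_w, b_{w^C}) = \sum_{i_n^{(k)},\, s_k \in w} \left[ \delta_{i_n^{(k)}} \left( -\ln(b_w) + \frac{T_{i_n^{(k)}} - a_w}{b_w} \right) - \exp\left( \frac{T_{i_n^{(k)}} - a_w}{b_w} \right) \right] + \sum_{i_n^{(k)},\, s_k \in w^C} \left[ \delta_{i_n^{(k)}} \left( -\ln(b_{w^C}) + \frac{T_{i_n^{(k)}} - a_{w^C}}{b_{w^C}} \right) - \exp\left( \frac{T_{i_n^{(k)}} - a_{w^C}}{b_{w^C}} \right) \right].$$

    The spatial scan statistic is then

    $$\Lambda_{\text{log-Wei}} = \max_{w \in \mathcal{W}} \ \ell_{\mathcal{H}_1^{(w)}}(\hat{a}_w, \hat{a}_{w^C}, \hat{b}_w, \hat{b}_{w^C}) - \ell_{\mathcal{H}_0}(\hat{a}, \hat{b}).$$

    Once the spatial scan statistic $\Lambda \in \{\Lambda_{\text{exp}}, \Lambda_{\text{Wei}}, \Lambda_{\text{gen}}, \Lambda_{\text{log-Wei}}\}$ is computed, the MLC is defined as the potential cluster of $\mathcal{W}$ corresponding to this maximum. The statistical significance of the MLC is then determined using a Monte Carlo procedure with permutations of the individuals (that is, of the pairs $(T_{i_n^{(k)}}, \delta_{i_n^{(k)}})$).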
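    The permutation procedure can be sketched as follows: each pair of observed time and censoring indicator is kept together and reassigned at random to an individual, while the spatial units of the individuals stay fixed. The function name `permutation_scan_test` and the callable `scan_statistic` (any function returning the scan statistic from times, indicators, and unit labels) are hypothetical names introduced for this illustration.

```python
import random

def permutation_scan_test(times, events, units, scan_statistic, n_perm=999, seed=0):
    """Return (observed statistic, Monte Carlo p-value) by permuting (T, delta) pairs."""
    rng = random.Random(seed)
    observed = scan_statistic(times, events, units)
    pairs = list(zip(times, events))
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pairs)  # the pairs move; the locations in `units` do not
        perm_times = [t for t, e in pairs]
        perm_events = [e for t, e in pairs]
        if scan_statistic(perm_times, perm_events, units) >= observed:
            exceed += 1
    return observed, (1 + exceed) / (n_perm + 1)
```

    Keeping the time and the censoring indicator of an individual together preserves the censoring pattern, while breaking any link between survival and location, as expected when there is no cluster.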

    Although these approaches are based on conventional models for survival data, they remain parametric and are therefore less flexible than nonparametric or semiparametric approaches. Thus, a method based on a Cox model has been developed by Cook et al. [15].

    Cook et al. [15] proposed a spatial scan statistic based on a Cox model, which presents the advantage of not assuming a distribution for the data.

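    They considered the following Cox model on the hazard function $\lambda$ for a potential cluster $w$:

    $$\lambda^{(w)}_{i_n^{(k)}}(t) = \lambda^{(w)}_0(t) \exp\left(\alpha_w \mathbb{1}_{s_k \in w}\right).$$

    In the context of cluster detection, the test hypotheses are

    $$\mathcal{H}_0: \text{For all } w \in \mathcal{W},\ \alpha_w = 0, \text{ that is, for all } i_n^{(k)},\ \lambda_{i_n^{(k)}}(t) = \lambda_0(t), \quad \text{vs.} \quad \mathcal{H}_1: \text{There exists } w \in \mathcal{W} \text{ such that } \alpha_w \neq 0, \text{ that is, there exists } w \in \mathcal{W} \text{ such that for all } i_n^{(k)} \text{ with } s_k \in w,\ \lambda_{i_n^{(k)}}(t) = \lambda^{(w)}_0(t) \exp(\alpha_w), \text{ and for all } i_n^{(k)} \text{ with } s_k \in w^C,\ \lambda_{i_n^{(k)}}(t) = \lambda^{(w)}_0(t).$$

    In this context, the partial log-likelihood under $\mathcal{H}_1^{(w)}$ is

    $$\ell_{\mathcal{H}_1^{(w)}}(\alpha_w) = \sum_{k=1}^{K} \sum_{n=1}^{N_k} \delta_{i_n^{(k)}} \left[ \alpha_w \mathbb{1}_{s_k \in w} - \ln\left( \sum_{l=1}^{K} \sum_{\substack{m=1 \\ T_{i_m^{(l)}} \ge T_{i_n^{(k)}}}}^{N_l} \exp\left(\alpha_w \mathbb{1}_{s_l \in w}\right) \right) \right].$$

    In order to test $\mathcal{H}_0: \alpha_w = 0$ vs. $\mathcal{H}_1^{(w)}: \alpha_w \neq 0$, Cook et al. [15] proposed to use the score statistic defined as

    $$LR(w) = \frac{U(0)}{\sqrt{I(0)}},$$

    where $U(\alpha_w) = \frac{\partial \ell_{\mathcal{H}_1^{(w)}}(\alpha_w)}{\partial \alpha_w}$ and $I(\alpha_w) = -\mathbb{E}\left(\frac{\partial U(\alpha_w)}{\partial \alpha_w}\right)$. We obtain

    $$U(0) = \sum_{k=1}^{K} \sum_{n=1}^{N_k} \delta_{i_n^{(k)}} \left[ \mathbb{1}_{s_k \in w} - \frac{\mathrm{Card}\left(\left\{i_m^{(l)},\ s_l \in w,\ T_{i_m^{(l)}} \ge T_{i_n^{(k)}}\right\}\right)}{\mathrm{Card}\left(\left\{i_m^{(l)},\ T_{i_m^{(l)}} \ge T_{i_n^{(k)}}\right\}\right)} \right],$$

    $$I(0) = \sum_{k=1}^{K} \sum_{n=1}^{N_k} \delta_{i_n^{(k)}} \left\{ \frac{\mathrm{Card}\left(\left\{i_m^{(l)},\ s_l \in w,\ T_{i_m^{(l)}} \ge T_{i_n^{(k)}}\right\}\right)}{\mathrm{Card}\left(\left\{i_m^{(l)},\ T_{i_m^{(l)}} \ge T_{i_n^{(k)}}\right\}\right)} - \left[ \frac{\mathrm{Card}\left(\left\{i_m^{(l)},\ s_l \in w,\ T_{i_m^{(l)}} \ge T_{i_n^{(k)}}\right\}\right)}{\mathrm{Card}\left(\left\{i_m^{(l)},\ T_{i_m^{(l)}} \ge T_{i_n^{(k)}}\right\}\right)} \right]^2 \right\},$$

    and the spatial scan statistic and the MLC are defined as $\Lambda_{\text{Cox}} = \max_{w \in \mathcal{W}} |LR(w)|$ and $\text{MLC}_{\text{Cox}} = \arg\max_{w \in \mathcal{W}} |LR(w)|$, respectively.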
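    The score statistic only involves, for each observed event, the share of the individuals still at risk who are located in the candidate cluster. A minimal sketch for a single candidate cluster, in which the function name and the encoding of cluster membership as a list of 0/1 values are assumptions of this illustration (the double loop is quadratic in the number of individuals and is written for clarity rather than speed):

```python
import math

def cox_score_statistic(times, events, in_cluster):
    """Score statistic U(0) / sqrt(I(0)) for the cluster indicator in a Cox model."""
    u = 0.0
    info = 0.0
    for t_i, d_i, c_i in zip(times, events, in_cluster):
        if not d_i:
            continue  # censored individuals contribute no term of their own
        # Cluster membership of the individuals still at risk at time t_i.
        at_risk = [c for t, c in zip(times, in_cluster) if t >= t_i]
        frac = sum(at_risk) / len(at_risk)
        u += c_i - frac
        info += frac - frac ** 2
    return u / math.sqrt(info) if info > 0 else 0.0
```

    A positive value indicates more events inside the cluster than expected under the null hypothesis (a higher hazard, hence shorter survival), and a negative value indicates the opposite; the scan statistic takes the maximum absolute value over the candidate clusters.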

    Finally, the statistical significance of the MLC is determined as previously, by permuting the individuals (that is, the pairs $(T_{i_n^{(k)}}, \delta_{i_n^{(k)}})$).

    The spatial scan statistics described until now make the conventional assumption of independence of the observations. However, this assumption is very strong and rather unrealistic in practice, since the spatial nature of the observations leads to potential spatial autocorrelation, as specified by Tobler's first law of geography [16]. Moreover, for confidentiality reasons, survival data are often only available on an aggregated spatial level. Thus, we can distinguish two phenomena: (i) the survival times of individuals located in the same spatial unit may be correlated (intra-spatial unit correlation), for example, due to similar healthcare supply, and (ii) there may be the presence of spatial dependence at the level of spatial units. Thus, Frévent et al. [17] proposed a spatial scan statistic based on a Cox model with spatially correlated shared frailties. This takes into account both of the above-mentioned phenomena.

    Frévent et al. [17] considered the following Cox model with shared frailties:

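    $$\text{for all } i_n^{(k)} \text{ within spatial unit } s_k, \quad \lambda_{i_n^{(k)}}(t \mid \varphi_k) = \lambda_0(t) \exp(\varphi_k),$$

    where $\varphi_1, \dots, \varphi_K$ are the shared frailties associated with the spatial locations $s_1, \dots, s_K$, respectively, and include the cluster effect.

    Thus, Frévent et al. [17] decomposed the frailties into two terms:

    $$\text{for a potential cluster } w, \quad \varphi_k^{(w)} = \alpha_w \mathbb{1}_{s_k \in w} + X_k \quad \text{where } \mathbb{E}(X_k) = 0.$$

    The test hypotheses can be written as

    $$\mathcal{H}_0: \forall w \in \mathcal{W},\ \alpha_w = 0, \quad \text{vs.} \quad \mathcal{H}_1: \exists w \in \mathcal{W},\ \alpha_w \neq 0.$$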

    The shared frailties allow us to take into account the potential intra-spatial unit correlation. To take into account the potential spatial dependence between the spatial units, a spatial model, namely the conditional autoregressive (CAR) model, is assumed on the $X_k$:

    $$X_k \mid \{X_1, \dots, X_{k-1}, X_{k+1}, \dots, X_K\} \sim \mathcal{N}\left( \frac{\rho \sum_{l=1}^{K} w_{k,l} X_l}{\rho \sum_{l=1}^{K} w_{k,l} + 1 - \rho},\ \frac{\sigma_X^2}{\rho \sum_{l=1}^{K} w_{k,l} + 1 - \rho} \right),$$

    where $\rho \in [0, 1]$ is the spatial dependence parameter, and $w_{k,l} = 1$ if $s_k$ and $s_l$ share a common boundary and 0 if not. It should be noted that if $\rho$ is assumed to be 0, then the model takes into account the intra-spatial unit correlation but not spatial dependence.

    The proposed method is decomposed into two stages. The first one consists in estimating the $\varphi_k$ and $\rho$. To this end, Frévent et al. [17] proposed to estimate the $\varphi_k$ and $\rho$ under $\mathcal{H}_0$ and under each alternative hypothesis $\mathcal{H}_1^{(w)}$, and then to extract the estimates $\{\hat{\varphi}_1, \dots, \hat{\varphi}_K, \hat{\rho}\}$ associated with the "best model" according to the Bayes factor criterion, i.e., under $\mathcal{H}_0$ if the Bayes factors comparing the models under each $\mathcal{H}_1^{(w)}$ to the model under $\mathcal{H}_0$ never exceed 30, and the model under $\mathcal{H}_1^{(w)}$ associated with the highest Bayes factor otherwise.
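    The selection rule described above can be written in a few lines. In this sketch, the function name `select_model` and the dictionary mapping each candidate cluster to the Bayes factor of its model against the null model are hypothetical; computing the Bayes factors themselves is not covered here.

```python
def select_model(bayes_factors, threshold=30.0):
    """Return None if the null model is retained, else the cluster with the highest Bayes factor.

    bayes_factors: dict mapping each candidate cluster to the Bayes factor comparing
    the model under its alternative hypothesis with the model under the null.
    """
    best = max(bayes_factors, key=bayes_factors.get)
    return best if bayes_factors[best] > threshold else None
```

    For instance, with Bayes factors of 3.2 and 12.0 the null model is retained, whereas with 45.0 and 120.5 the estimates are taken from the model associated with the second cluster.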

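    Next, the second stage consists in computing a scan statistic on the $\varphi_k$. At this stage, the test hypotheses can be rewritten on $\varphi = (\varphi_1, \dots, \varphi_K)^\top$ as

    $$\mathcal{H}_0: \varphi \sim \mathcal{N}\left(\alpha \mathbf{1},\ \sigma^2_{(0)} A^{-1}\right), \quad \text{vs.} \quad \mathcal{H}_1: \exists w \in \mathcal{W},\ \varphi \sim \mathcal{N}\left(\alpha_w \mathbf{1}_w + \alpha_{w^C} \mathbf{1}_{w^C},\ \sigma^2_{(w)} A^{-1}\right),\ \alpha_w \neq \alpha_{w^C},$$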

    where $\mathbf{1}$, $\mathbf{1}_w$, and $\mathbf{1}_{w^C}$ denote, respectively, the column vector composed only of 1, the column vector composed of 1 for the locations in $w$ and 0 otherwise, and the column vector composed of 1 for the locations outside $w$ and 0 otherwise. $A$ is a square matrix that encodes the variance-covariance structure of the CAR model (see [17] for more details).

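    Then, the spatial scan statistic and the MLC are defined as

    $$\Lambda_{\text{frail.Cox}} = \max_{w \in \mathcal{W}} \ \ell_{\mathcal{H}_1^{(w)}}\left(\hat{\alpha}_w, \hat{\alpha}_{w^C}, \hat{\sigma}^2_{(w)}\right) - \ell_{\mathcal{H}_0}\left(\hat{\alpha}, \hat{\sigma}^2_{(0)}\right) = \max_{w \in \mathcal{W}} \ \frac{K}{2} \ln\left( \frac{\hat{\sigma}^2_{(0)}}{\hat{\sigma}^2_{(w)}} \right),$$

    and

    $$\text{MLC}_{\text{frail.Cox}} = \arg\max_{w \in \mathcal{W}} \ \ell_{\mathcal{H}_1^{(w)}}\left(\hat{\alpha}_w, \hat{\alpha}_{w^C}, \hat{\sigma}^2_{(w)}\right) - \ell_{\mathcal{H}_0}\left(\hat{\alpha}, \hat{\sigma}^2_{(0)}\right) = \arg\max_{w \in \mathcal{W}} \ \frac{K}{2} \ln\left( \frac{\hat{\sigma}^2_{(0)}}{\hat{\sigma}^2_{(w)}} \right),$$

    respectively, where

    $$\hat{\sigma}^2_{(0)} = \frac{1}{K} \left( \varphi^\top A \varphi - 2 \hat{\alpha}\, \mathbf{1}^\top A \varphi + \hat{\alpha}^2\, \mathbf{1}^\top A \mathbf{1} \right), \quad \hat{\alpha} = \frac{\mathbf{1}^\top A \varphi}{\mathbf{1}^\top A \mathbf{1}},$$

    and

    $$\hat{\sigma}^2_{(w)} = \frac{1}{K} \left( \varphi - \hat{\alpha}_w \mathbf{1}_w - \hat{\alpha}_{w^C} \mathbf{1}_{w^C} \right)^\top A \left( \varphi - \hat{\alpha}_w \mathbf{1}_w - \hat{\alpha}_{w^C} \mathbf{1}_{w^C} \right),$$

    $$\hat{\alpha}_{w^C} = \left( \mathbf{1}_{w^C}^\top A \mathbf{1}_{w^C} - \frac{\mathbf{1}_w^\top A \mathbf{1}_{w^C}\ \mathbf{1}_w^\top A \mathbf{1}_{w^C}}{\mathbf{1}_w^\top A \mathbf{1}_w} \right)^{-1} \left( \mathbf{1}_{w^C}^\top A \varphi - \frac{\mathbf{1}_w^\top A \varphi\ \mathbf{1}_w^\top A \mathbf{1}_{w^C}}{\mathbf{1}_w^\top A \mathbf{1}_w} \right), \quad \hat{\alpha}_w = \frac{\mathbf{1}_w^\top A \varphi - \hat{\alpha}_{w^C}\, \mathbf{1}_w^\top A \mathbf{1}_{w^C}}{\mathbf{1}_w^\top A \mathbf{1}_w}.$$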
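    Given the estimated frailties and the matrix $A$, the quantity $\frac{K}{2} \ln(\hat{\sigma}^2_{(0)} / \hat{\sigma}^2_{(w)})$ can be evaluated for one candidate cluster with the closed-form estimators above. The sketch below is an illustration under stated assumptions: the function names are hypothetical, $A$ is supplied by the user as a symmetric list of rows, and the cluster is a set of unit indices that is neither empty nor the whole domain.

```python
import math

def quad(u, A, v):
    """Bilinear form u' A v for lists u, v and a square matrix A given as a list of rows."""
    return sum(u[i] * A[i][j] * v[j] for i in range(len(u)) for j in range(len(v)))

def frailty_llr(phi, A, cluster):
    """(K / 2) * ln(sigma2_0 / sigma2_w) for one candidate cluster."""
    K = len(phi)
    one = [1.0] * K
    one_w = [1.0 if k in cluster else 0.0 for k in range(K)]
    one_c = [1.0 - x for x in one_w]
    # Null model: a single mean alpha for all spatial units.
    alpha = quad(one, A, phi) / quad(one, A, one)
    r0 = [p - alpha for p in phi]
    sigma2_0 = quad(r0, A, r0) / K
    # Alternative model: one mean inside the cluster, another outside.
    a_ww = quad(one_w, A, one_w)
    a_wc = quad(one_w, A, one_c)
    a_cc = quad(one_c, A, one_c)
    b_w = quad(one_w, A, phi)
    b_c = quad(one_c, A, phi)
    alpha_c = (b_c - b_w * a_wc / a_ww) / (a_cc - a_wc ** 2 / a_ww)
    alpha_w = (b_w - alpha_c * a_wc) / a_ww
    r1 = [p - alpha_w * x - alpha_c * y for p, x, y in zip(phi, one_w, one_c)]
    sigma2_w = quad(r1, A, r1) / K
    return K / 2 * math.log(sigma2_0 / sigma2_w)
```

    As a usage example, taking $A$ as the identity matrix (an assumption corresponding to no spatial dependence) with frailties 1.0, 1.2, -1.0, -0.8 and a cluster made of the first two units gives residual sums of squares of 4.04 under the null and 0.04 under the alternative, hence a value of $2 \ln(101)$.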

    Since the distribution of the $\varphi_k$ is known, Frévent et al. [17] generate $M$ datasets of the $\varphi_k$ under $\mathcal{H}_0$ to estimate the p-value associated with the MLC. It should be noted that using permutations of the $\varphi_k$ is not possible here, as this would alter the spatial dependence of the data.

    In many applications, it may be relevant to adjust cluster detection for covariates such as the age of individuals. This section therefore presents the adjustment procedures proposed by the authors of each method.

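    For the exponential model, Huang et al. [11] considered the following model to adjust for $p$ covariates $Z^{(1)}, \dots, Z^{(p)}$:

    $$\ln\left(Y_{i_n^{(k)}}\right) = \beta_0 + \beta_1 Z^{(1)}_{i_n^{(k)}} + \dots + \beta_p Z^{(p)}_{i_n^{(k)}} + \varepsilon_{i_n^{(k)}},$$

    where $\varepsilon_{i_n^{(k)}}$ is an error term with density function $f_\varepsilon(e) = \exp(e) \exp[-\exp(e)]$. The coefficients $\beta_0, \dots, \beta_p$ can be estimated from the $(T_{i_n^{(k)}}, \delta_{i_n^{(k)}})$ using an exponential regression. Next, the observed times are adjusted as

    $$T^{\text{adj}}_{i_n^{(k)}} = T_{i_n^{(k)}} \times \exp\left[ -\hat{\beta}_1 \left( Z^{(1)}_{i_n^{(k)}} - \mu_1 \right) - \dots - \hat{\beta}_p \left( Z^{(p)}_{i_n^{(k)}} - \mu_p \right) \right],$$

    where $\mu_j = \min_{i_n^{(k)}} Z^{(j)}_{i_n^{(k)}}$. Finally, the spatial scan statistic is applied on the $(T^{\text{adj}}_{i_n^{(k)}}, \delta_{i_n^{(k)}})$.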
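    The adjustment of the observed times can be sketched as follows, assuming that the regression coefficients have already been estimated by an exponential regression fitted elsewhere. The function name `adjust_times` and the layout of the covariates as one list per individual are hypothetical choices for this illustration; the sign convention follows the log-linear model above, in which a covariate effect is removed by subtracting it on the log scale.

```python
import math

def adjust_times(times, covariates, betas):
    """T_adj = T * exp(-sum_j beta_j * (Z_j - mu_j)), with mu_j the minimum of covariate j.

    times: observed times, one per individual.
    covariates: one list of p covariate values per individual.
    betas: the p estimated regression coefficients (intercept excluded).
    """
    p = len(betas)
    mu = [min(row[j] for row in covariates) for j in range(p)]
    return [
        t * math.exp(-sum(betas[j] * (row[j] - mu[j]) for j in range(p)))
        for t, row in zip(times, covariates)
    ]
```

    For example, with a single covariate (age) taking the values 50 and 60, a coefficient of -0.02, and two observed times of 10, the individual at the minimum age keeps a time of 10 while the other receives 10 × exp(0.2), i.e. the time is rescaled to what would be expected at the reference covariate value. The censoring indicators are left unchanged.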

    For the approaches based on the Weibull model, its generalization, or the log-Weibull model, the authors did not propose any adjustment on the covariates.

    When the models can directly include covariates, the conventional approach for adjusting cluster detection on covariates in spatial scan statistics is to fit the model under $\mathcal{H}_0$, in order to make the effect of covariates independent of the potential cluster.

    In the presence of covariates, the Cox model considered by Cook et al. [15] is as follows:

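    $$\lambda^{(w)}_{i_n^{(k)}}(t) = \lambda^{(w)}_0(t) \exp\left( \beta_1 Z^{(1)}_{i_n^{(k)}} + \dots + \beta_p Z^{(p)}_{i_n^{(k)}} + \alpha_w \mathbb{1}_{s_k \in w} \right).$$

    Thus, the score statistic is now expressed as

    $$LR^{(w)}_{\text{cov}}\left(\hat{\beta}_1, \dots, \hat{\beta}_p\right) = \frac{U_{\text{cov}}\left(0; \hat{\beta}_1, \dots, \hat{\beta}_p\right)}{\sqrt{I_{\text{cov}}\left(0; \hat{\beta}_1, \dots, \hat{\beta}_p\right)}}$$

    with

    $$U_{\text{cov}}(0; \beta_1, \dots, \beta_p) = \sum_{k=1}^{K} \sum_{n=1}^{N_k} \delta_{i_n^{(k)}} \left[ \mathbb{1}_{s_k \in w} - \frac{\sum_{s_l \in w} \sum_{\substack{m=1 \\ T_{i_m^{(l)}} \ge T_{i_n^{(k)}}}}^{N_l} \exp\left( \beta_1 Z^{(1)}_{i_m^{(l)}} + \dots + \beta_p Z^{(p)}_{i_m^{(l)}} \right)}{\sum_{l=1}^{K} \sum_{\substack{m=1 \\ T_{i_m^{(l)}} \ge T_{i_n^{(k)}}}}^{N_l} \exp\left( \beta_1 Z^{(1)}_{i_m^{(l)}} + \dots + \beta_p Z^{(p)}_{i_m^{(l)}} \right)} \right]$$

    and

    $$I_{\text{cov}}(0; \beta_1, \dots, \beta_p) = \sum_{k=1}^{K} \sum_{n=1}^{N_k} \delta_{i_n^{(k)}} \left[ \frac{\sum_{s_l \in w} \sum_{\substack{m=1 \\ T_{i_m^{(l)}} \ge T_{i_n^{(k)}}}}^{N_l} \exp\left( \beta_1 Z^{(1)}_{i_m^{(l)}} + \dots + \beta_p Z^{(p)}_{i_m^{(l)}} \right)}{\sum_{l=1}^{K} \sum_{\substack{m=1 \\ T_{i_m^{(l)}} \ge T_{i_n^{(k)}}}}^{N_l} \exp\left( \beta_1 Z^{(1)}_{i_m^{(l)}} + \dots + \beta_p Z^{(p)}_{i_m^{(l)}} \right)} - \left( \frac{\sum_{s_l \in w} \sum_{\substack{m=1 \\ T_{i_m^{(l)}} \ge T_{i_n^{(k)}}}}^{N_l} \exp\left( \beta_1 Z^{(1)}_{i_m^{(l)}} + \dots + \beta_p Z^{(p)}_{i_m^{(l)}} \right)}{\sum_{l=1}^{K} \sum_{\substack{m=1 \\ T_{i_m^{(l)}} \ge T_{i_n^{(k)}}}}^{N_l} \exp\left( \beta_1 Z^{(1)}_{i_m^{(l)}} + \dots + \beta_p Z^{(p)}_{i_m^{(l)}} \right)} \right)^2 \right].$$

    The spatial scan statistic is still defined as $\Lambda_{\text{Cox}} = \max_{w \in \mathcal{W}} \left| LR^{(w)}_{\text{cov}}\left(\hat{\beta}_1, \dots, \hat{\beta}_p\right) \right|$.

    The adjustment on covariates is performed similarly in the approach based on shared frailties. Frévent et al. [17] considered the following Cox model: for an individual $i_n^{(k)}$ within spatial unit $s_k$,

    $$\lambda_{i_n^{(k)}}\left(t \mid Z^{(1)}_{i_n^{(k)}}, \dots, Z^{(p)}_{i_n^{(k)}}, \varphi_k\right) = \lambda_0(t) \exp\left( \beta_1 Z^{(1)}_{i_n^{(k)}} + \dots + \beta_p Z^{(p)}_{i_n^{(k)}} + \varphi_k \right).$$

    The coefficients $\beta_1, \dots, \beta_p$ are estimated under $\mathcal{H}_0$ and fixed to these values in the models under each alternative hypothesis $\mathcal{H}_1^{(w)}$. Next, the estimates $\{\hat{\varphi}_1, \dots, \hat{\varphi}_K, \hat{\rho}\}$ of $\{\varphi_1, \dots, \varphi_K, \rho\}$ retained are those obtained with the "best model" according to the Bayes factor criterion, and the scan step is performed on them, as described above.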

    In this section, we illustrate the covariate adjustment procedure on the LeukSurv dataset studied by Henderson et al. [18] and available in the R package spBayesSurv.

    The dataset consists of 1,043 patients with acute myeloid leukemia within 24 districts in northwest England. For each patient, the survival time in days, status (dead or censored), age, sex, white blood cell count at diagnosis (wbc, truncated at 500), Townsend score (tpi, higher values indicate less affluent areas), and district of residence are available. The median survival times are presented in Figure 1.

    Figure 1.  Median survival time for acute myeloid leukemia in each district in the LeukSurv dataset.

    We applied the exponential scan statistic as well as the Cox-based approach with and without shared frailties, using the adjustment procedure described above. Cluster detection was first adjusted on age, sex and wbc at diagnosis, and then we also adjusted the clusters on the Townsend score. The estimated frailties are presented in Figure 2.

    Figure 2.  Estimated frailties $\varphi_k$ with the i.i.d. and the CAR models when adjusting cluster detection on age, sex, and wbc (panels (a) and (b)) and when adjusting on age, sex, wbc, and tpi (panels (c) and (d)).

    The MLC, presented in Figure 3, is the same for all four models (exponential, Cox without and with i.i.d. or CAR frailties) and both adjustments considered. Tables 1 and 2, respectively, describe the MLC and its statistical significance for the four models and the two covariate adjustments. Similarly to Frévent et al. [17], the hazard ratio in Table 2 was estimated in a conventional Cox model adjusted for the covariates.

    Figure 3.  MLC detected by the exponential model, the Cox model without shared frailties, and the Cox model with i.i.d. and CAR shared frailties. The MLC is identical whatever the covariate adjustment or the approach considered.
    Table 1.  Description of the MLC detected by the exponential model, the Cox model without shared frailties, and the Cox model with i.i.d. and CAR shared frailties. The MLC is identical whatever the covariate adjustment or the approach considered.
    | | Inside the MLC | Outside the MLC |
    |---|---|---|
    | Number of spatial units | 5 | 19 |
    | Number of individuals | 234 | 809 |
    | Number of events | 193 | 686 |
    | Average patient age (years) | 65.5 | 59.3 |
    | Percentage of men | 55.1% | 51.7% |
    | Average patient wbc | 33.3 | 40.1 |
    | Average tpi | -0.75 | 0.66 |

    Table 2.  Estimated p-value and hazard ratio for the MLC detected with each model and each covariate adjustment.
    | | Adjustment on age, sex, wbc | Adjustment on age, sex, wbc, tpi |
    |---|---|---|
    | Exponential model | 0.001 | 0.004 |
    | Cox model without shared frailties | 0.001 | 0.001 |
    | Cox model with i.i.d. shared frailties | 0.025 | 0.067 |
    | Cox model with CAR shared frailties | 0.032 | 0.065 |
    | Hazard ratio | 0.65 | 0.67 |


    Although the MLC is identical whatever the method, it should be noted that when we take into account the correlation of observations (with the i.i.d. and the CAR shared frailties approaches), the MLC becomes less statistically significant. When the Townsend score is included in the adjustment, the cluster is no longer statistically significant with these two models (see Table 2).

    This article presents a review of the literature on scan statistics for survival data. The first approach developed is based on an exponential model. This has the disadvantage of assuming a constant hazard rate over time, which is rather simplistic. The parametric approaches developed later avoid this problem, as does the approach based on the Cox model, which is even more flexible. However, these scan statistics rely on the strong and rather unrealistic, albeit popular, assumption that the observations are independent. A more recent approach that does not require this assumption and accounts for the potential correlation in the data is then presented.

    Most applications of spatial scan statistics for survival data require the adjustment of cluster detection on covariates. This is therefore also detailed and illustrated on the LeukSurv dataset.

    Several drawbacks of the current spatial scan statistics approaches can be mentioned. The p-value associated with the MLC is estimated by Monte Carlo simulation because the distribution of the spatial scan statistic is intractable under $\mathcal{H}_0$. This leads to high computation times, which limit practical application to large datasets. One solution is to approximate the p-value using the method proposed by Abrams et al. [19]. Briefly, this approach estimates the p-value accurately from only a small number of Monte Carlo simulations. Further work would involve obtaining the distribution of the scan statistic under $\mathcal{H}_0$.
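    As its title indicates, the approximation of [19] is based on a Gumbel distribution. A rough sketch of the idea is given below: a Gumbel distribution is fitted to a small number of simulated scan statistics, and the p-value is read from its upper tail. Fitting by the method of moments and the function name `gumbel_p_value` are assumptions of this illustration and may differ from the details of [19].

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def gumbel_p_value(observed, simulated):
    """Upper-tail probability of a Gumbel distribution fitted by moments to `simulated`."""
    mean = statistics.mean(simulated)
    sd = statistics.stdev(simulated)
    beta = sd * math.sqrt(6) / math.pi  # scale: variance = (pi * beta)**2 / 6
    mu = mean - EULER_GAMMA * beta      # location: mean = mu + gamma * beta
    z = (observed - mu) / beta
    # 1 - exp(-exp(-z)), written with expm1 to stay accurate for small p-values.
    return -math.expm1(-math.exp(-z))
```

    Unlike the Monte Carlo estimate, this approximation is not bounded below by 1 / (M + 1), which is what allows small p-values to be reported from few simulations.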

    Moreover, in practice, it is sometimes necessary to detect secondary clusters, i.e., other clusters that are also statistically significant. Several approaches have been suggested in the literature. For example, Kulldorff [2] proposed to perform statistical inference on the other potential clusters in exactly the same way as for the MLC, while other authors suggest removing the MLC from the data [20], before repeating the scan procedure. However, these approaches do not maintain the type I error. This is a challenging subject that requires further work.

    The author declares that she has not used Artificial Intelligence (AI) tools in the creation of this article.

    The author is grateful to the reviewers for their helpful comments, which improved the quality of the paper. The author would also like to thank Sophie Dabo-Niang and Michaël Genin, thanks to whom she developed an expertise in spatial scan statistics during her PhD.

    The author declares no conflict of interest in this paper.



    [1] M. Kulldorff, N. Nagarwalla, Spatial disease clusters: detection and inference, Stat. Med., 14 (1995), 799–810. http://dx.doi.org/10.1002/sim.4780140809
    [2] M. Kulldorff, A spatial scan statistic, Commun. Stat.-Theory Methods, 26 (1997), 1481–1496. http://dx.doi.org/10.1080/03610929708831995
    [3] I. Jung, M. Kulldorff, A. C. Klassen, A spatial scan statistic for ordinal data, Stat. Med., 26 (2007), 1594–1607. http://dx.doi.org/10.1002/sim.2607
    [4] M. Kulldorff, L. Huang, L. Pickle, L. Duczmal, An elliptic spatial scan statistic, Stat. Med., 25 (2006), 3929–3943. http://dx.doi.org/10.1002/sim.2490
    [5] L. Cucala, C. Demattei, P. Lopes, A. Ribeiro, A spatial scan statistic for case event data based on connected components, Comput. Stat., 28 (2013), 357–369. http://dx.doi.org/10.1007/s00180-012-0304-6
    [6] T. Tango, K. Takahashi, A flexibly shaped spatial scan statistic for detecting clusters, Int. J. Health Geogr., 4 (2005), 11. http://dx.doi.org/10.1186/1476-072X-4-11
    [7] P. S. Lin, Y. H. Kung, M. Clayton, Spatial scan statistics for detection of multiple clusters with arbitrary shapes, Biometrics, 72 (2016), 1226–1234. http://dx.doi.org/10.1111/biom.12509
    [8] M. Kulldorff, L. Huang, K. Konty, A scan statistic for continuous data based on the normal probability model, Int. J. Health Geogr., 8 (2009), 58. http://dx.doi.org/10.1186/1476-072X-8-58
    [9] L. Cucala, A distribution-free spatial scan statistic for marked point processes, Spat. Stat., 10 (2014), 117–125. http://dx.doi.org/10.1016/j.spasta.2014.03.004
    [10] I. Jung, H. J. Cho, A nonparametric spatial scan statistic for continuous data, Int. J. Health Geogr., 14 (2015), 30. http://dx.doi.org/10.1186/s12942-015-0024-6
    [11] L. Huang, M. Kulldorff, D. Gregorio, A spatial scan statistic for survival data, Biometrics, 63 (2007), 109–118. http://dx.doi.org/10.1111/j.1541-0420.2006.00661.x
    [12] V. Bhatt, N. Tiwari, A spatial scan statistic for survival data based on Weibull distribution, Stat. Med., 33 (2014), 1867–1876. http://dx.doi.org/10.1002/sim.6075
    [13] V. Bhatt, N. Tiwari, A spatial scan statistic for survival data based on generalized life distribution, Commun. Stat.-Theory Methods, 45 (2016), 5730–5744. http://dx.doi.org/10.1080/03610926.2014.948207
    [14] I. Usman, R. J. Rosychuk, A log-Weibull spatial scan statistic for time to event data, Int. J. Health Geogr., 17 (2018), 20. http://dx.doi.org/10.1186/s12942-018-0137-9
    [15] A. J. Cook, D. R. Gold, Y. Li, Spatial cluster detection for censored outcome data, Biometrics, 63 (2007), 540–549. http://dx.doi.org/10.1111/j.1541-0420.2006.00714.x
    [16] W. R. Tobler, A computer movie simulating urban growth in the Detroit region, Econ. Geogr., 46 (1970), 234–240. http://dx.doi.org/10.2307/143141
    [17] C. Frévent, M. S. Ahmed, S. Dabo-Niang, M. Genin, A shared‐frailty spatial scan statistic model for time‐to‐event data, Biometrical J., 66 (2024), e202300200. http://dx.doi.org/10.1002/bimj.202300200
    [18] R. Henderson, S. Shimakura, D. Gorst, Modeling spatial variation in leukemia survival data, J. Am. Stat. Assoc., 97 (2002), 965–972. http://dx.doi.org/10.1198/016214502388618753
    [19] A. M. Abrams, K. Kleinman, M. Kulldorff, Gumbel based p-value approximations for spatial scan statistics, Int. J. Health Geogr., 9 (2010), 61. http://dx.doi.org/10.1186/1476-072X-9-61
    [20] Z. Zhang, R. Assunção, M. Kulldorff, Spatial scan statistics adjusted for multiple clusters, J. Probab. Stat., 2010. http://dx.doi.org/10.1155/2010/642379
  • © 2025 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)