Research article Special Issues

Sex attribution, gender identity and quality of life in disorders of sex development due to 45,X/46,XY mosaicism: methods for clinical and psychosocial assessment

  • Received: 30 November 2014 Accepted: 23 March 2015 Published: 26 March 2015
  • The choice of sex in newborns with genital ambiguity is challenging. Information concerning the satisfaction of subjects with disorders of sex development from childhood to adulthood is required in order to address sex attribution policies. This study focuses on the methods that enable clinicians to investigate the alignment of phenotypes with gender identity and quality of life in people with disorders of this kind. These methods are presented as tools for studying a cohort of ten subjects with 45,X/46,XY mosaicism examined between 1985 and 2014 in the Department of Pediatric Endocrinology, Regina Margherita Children's Hospital, Turin: five children and five young adults, four reared as females and six as males. Clinical outcome was assessed by means of a clinical scoring system considering height, genital appearance, gonads and pubertal development. The Gender Identity Questionnaire for Children and the World Health Organization Quality of Life assessment were adopted. The four male children strongly identified with their assigned sex: male attribution was satisfactory until pubertal age. In young adults the clinical scores ranged between 55-65% for both genders. In the young male, the reduced sexual activity and the poor body image perception strongly affected his quality of life. The clinical scores of the two young female adults (60% for both) were not balanced with their quality of life scores (87.5% and 68.75% respectively): individual traits and social-familial context should be investigated in order to explain these differences. Clinical and psychosocial assessment in people with disorders of sex development is mandatory in order to plan care procedures; a detailed analysis requires adequate tools. Clinical scoring system, Gender Identity Questionnaire for Children and World Health Organization Quality of Life assessment can be used to investigate the alignment of physical phenotype with gender identity and quality of life.

    Citation: Roberta Risso, Silvia Einaudi, Chiara Crespi, Angela Caldarera, Francesca Verna, Emilio Merlini, Roberto Lala. Sex attribution, gender identity and quality of life in disorders of sex development due to 45,X/46,XY mosaicism: methods for clinical and psychosocial assessment[J]. AIMS Genetics, 2015, 2(2): 127-147. doi: 10.3934/genet.2015.2.127

    Related Papers:

    [1] Wei Wang, Tongping Shen, Jiaming Wang . Analysis of the impact of radiotherapy and surgical treatment regimens based on the SEER database on the survival outcomes of rectal cancer patients over 70 years. Mathematical Biosciences and Engineering, 2024, 21(3): 4463-4484. doi: 10.3934/mbe.2024197
    [2] Li-Sha Pan, Zacharia Ackbarkha, Jing Zeng, Min-Li Huang, Zhen Yang, Hao Liang . Immune marker signature helps to predict survival in uveal melanoma. Mathematical Biosciences and Engineering, 2021, 18(4): 4055-4070. doi: 10.3934/mbe.2021203
    [3] Yi-Wen Chang, Kang-Ping Lu, Shao-Tung Chang . Cluster validity indices for mixture hazards regression models. Mathematical Biosciences and Engineering, 2020, 17(2): 1616-1636. doi: 10.3934/mbe.2020085
    [4] Xin Yu, Jun Liu, Ruiwen Xie, Mengling Chang, Bichun Xu, Yangqing Zhu, Yuancai Xie, Shengli Yang . Construction of a prognostic model for lung squamous cell carcinoma based on seven N6-methylandenosine-related autophagy genes. Mathematical Biosciences and Engineering, 2021, 18(5): 6709-6723. doi: 10.3934/mbe.2021333
    [5] Peng Lu, Rong He, Dianhan Chen . Exploring S-shape curves and heterogeneity effects of rumor spreading in online collective actions. Mathematical Biosciences and Engineering, 2022, 19(3): 2355-2380. doi: 10.3934/mbe.2022109
    [6] Huili Yang, Wangren Qiu, Zi Liu . Anoikis-related mRNA-lncRNA and DNA methylation profiles for overall survival prediction in breast cancer patients. Mathematical Biosciences and Engineering, 2024, 21(1): 1590-1609. doi: 10.3934/mbe.2024069
    [7] Feiyan Ruan, Xiaotong Ding, Huiping Li, Yixuan Wang, Kemin Ye, Houming Kan . Back propagation neural network model for medical expenses in patients with breast cancer. Mathematical Biosciences and Engineering, 2021, 18(4): 3690-3698. doi: 10.3934/mbe.2021185
    [8] Pei Zhou, Caiyun Wu, Cong Ma, Ting Luo, Jing Yuan, Ping Zhou, Zhaolian Wei . Identification of an endoplasmic reticulum stress-related gene signature to predict prognosis and potential drugs of uterine corpus endometrial cancer. Mathematical Biosciences and Engineering, 2023, 20(2): 4018-4039. doi: 10.3934/mbe.2023188
    [9] Siyuan Tian, Yinan Hu, Chunmei Yang, Jiahao Yu, Jingyi Liu, Guoyun Xuan, Yansheng Liu, Keshuai Sun, Miao Zhang, Shuoyi Ma, Yulong Shang, Xia Zhou, Ying Han . A novel immune checkpoint-related gene signature for hepatocellular carcinoma to predict clinical outcomes and therapeutic response. Mathematical Biosciences and Engineering, 2022, 19(5): 4719-4736. doi: 10.3934/mbe.2022220
    [10] Jing Liu, Xuefang Zhang, Ting Ye, Yongjian Dong, Wenfeng Zhang, Fenglin Wu, Huaben Bo, Hongwei Shao, Rongxin Zhang, Han Shen . Prognostic modeling of patients with metastatic melanoma based on tumor immune microenvironment characteristics. Mathematical Biosciences and Engineering, 2022, 19(2): 1448-1470. doi: 10.3934/mbe.2022067
  • The choice of sex in newborns with genital ambiguity is challenging. Information concerning the satisfaction of subjects with disorders of sex development from childhood to adulthood is required in order to address sex attribution policies. This study focuses on the methods that enable clinicians to investigate the alignment of phenotypes with gender identity and quality of life in people with disorders of this kind. These methods are presented as tools for studying a cohort of ten subjects with 45,X/46,XY mosaicism examined between 1985 and 2014 in the Department of Pediatric Endocrinology, Regina Margherita Children's Hospital, Turin: five children and five young adults, four reared as females and six as males. Clinical outcome was assessed by means of a clinical scoring system considering height, genital appearance, gonads and pubertal development. The Gender Identity Questionnaire for Children and the World Health Organization Quality of Life assessment were adopted. The four male children strongly identified with their assigned sex: male attribution was satisfactory until pubertal age. In young adults the clinical scores ranged between 55-65% for both genders. In the young male, the reduced sexual activity and the poor body image perception strongly affected his quality of life. The clinical scores of the two young female adults (60% for both) were not balanced with their quality of life scores (87.5% and 68.75% respectively): individual traits and social-familial context should be investigated in order to explain these differences. Clinical and psychosocial assessment in people with disorders of sex development is mandatory in order to plan care procedures; a detailed analysis requires adequate tools. Clinical scoring system, Gender Identity Questionnaire for Children and World Health Organization Quality of Life assessment can be used to investigate the alignment of physical phenotype with gender identity and quality of life.


    One of the main goals of statistical modeling is to understand the effect of some explanatory variables, $ X_i, i = 1, ..., p $, on a dependent variable, $ Y $, also known as the response. This type of dependence is often modeled using a generalized linear regression model (GLM) [1] that imposes a linear relationship between explanatory and dependent variables without specifying in advance the function that links them. Generalized additive models (GAMs) [2,3] offer an extension of the GLM through the incorporation of nonlinear forms for the explanatory variables. GAM has been widely used as an effective technique for conducting nonlinear regression analysis in various fields such as economics, finance, the environment, medicine, and biology. In a wide range of these applications, problems can be observed which it might be useful to compare two or more regression curves. This often occurs when it is necessary to check if the effect of a continuous covariate on the response varies across groups defined by levels of a categorical variable. Testing the hypothesis of equality of the regression functions is a topic of statistical inference that has been widely researched in the literature. One important review on this topic may be found in [4] (see their Section 7). Examples of methods for clustering regression curves using $ L^2 $ distance and other metrics are in [5,6,7]. For nonparametric models, most approaches rely on smoothing techniques, such as splines or Nadaraya-Watson [8], in the construction of the nonparametric estimators of regression functions. The general idea is to use these nonparametric estimators directly, either by contrasting all individual estimators with the pooled estimator, or by performing pairwise comparisons. Other authors considered the use of empirical processes to avoid the necessity of selecting a smoothing parameter required in the construction of the nonparametric estimators of regression functions [9,10,11,12,13,14]. The idea behind these tests is to use the distribution of the residuals and to compare them via Kolmogorov-Smirnov or Cramér-von Mises type statistics. ANOVA-type test statistics have also been used to compare regression functions (see for example [15] and [16]). Reference [17] illustrates how the SiZer exploratory tool is capable of comparing multiple curves based on the residuals. In [18] a kernel-based nonparametric approach is considered while [19] proposed a testing procedure for single index models.

    A second, though related question is how to determine groups among a series of regression curves when the null hypothesis of equality of the regression functions is rejected. Though the aforementioned methods can be used to compare regression curves, to the best of our knowledge, those methods cannot be used to determine groups among a series of regression curves, for example, among groups defined by the levels of a categorical variable. Some of them are not recommended when confronted with a considerable number of curves. If the null hypothesis of equality of curves is rejected, then this leads to the clear conclusion that at least one regression curve is different. However, these methods cannot be used to ascertain whether individuals can be grouped into a reduced number of classes whose members all share the same regression function or if all the regression curves are different from each other. One naïve approach would be to perform pairwise comparisons. However, this approach would lead to a large number of comparisons (e.g., 7 groups would lead to 21 pairwise comparisons). This could be done but without the possibility of determining groups with similar regression curves.

    A statistical procedure to estimate the unknown group structure was proposed in [20] and later in [21]. The first paper of these authors was about financial time series whereas the second deals with environmental statistics. More recently, the comparison of survival curves was addressed [22]. Similarly, in this paper, we propose an approach that allows determining groups that share the same regression function with an automatic selection of their number. The proposed method can be used, for instance, to establish groups with similar monotonic trends in the regression curves (e.g., over the levels of categorical variables). The proposed methodology will be shown in the framework of the generalized additive model with a binary outcome and a logit link function. The generalization of the proposed methods to other link functions is possible.

    The remainder of the paper is organized as follows: The following section provides the notation and the methodological background. Then, the performance of the proposed methods is investigated through simulations, and their usage is illustrated through the analysis of a real data set from neuronal activity in the prefrontal cortex of monkeys during a discrimination task. Finally, the last section contains a discussion and the main conclusions of the work.

    Let $ Y $ denote the binary (0/1) target response and $ {\bf{X}} = (X_1, \ldots, X_p) $ a set of $ p $ continuous covariates. In this regression context, the logistic generalized additive model expresses the conditional probability $ P({\bf{X}}) = P(Y = 1| {\bf{X}}) $ as

    $ logP(X)1P(X)=α+pj=1fj(Xj)
    $
    (2.1)

    where $ \alpha $ is a constant and each partial function $ f_j $ represents the effect of the covariate $ X_j $. These unknown smooth partial functions are modeled from the data without specifying in advance any parametric structure. Therefore, the GAM in (2.1) combines flexibility with interpretability, in which each of the additive components describes the influence of each covariate separately.

    In practice, the effect of a continuous covariate $ X_j $ can vary across the levels $ 1, \ldots, K $ of a categorical factor $ F $. For simplicity of notation, here we consider that only the effect of the last covariate, $ X_p $, depends on the levels of $ F $, so the pure GAM in (2.1) can incorporate this type of factor-by-curve relationship as follows:

    $ logit(F,X)=logP(Y=1|F,X)1P(Y=1|F,X)=α+p1j=1fj(Xj)+{g1(Xp)ifF=1g2(Xp)ifF=2gK(Xp)ifF=K
    $
    (2.2)

    where $ g_1, \ldots, g_K $ represent the specific effect of $ X_p $ for each of the possible $ K $ levels established by the factor $ F $.

    The study of the partial functions $ g_1, \ldots, g_K $ can be useful in the comparison of two or more groups, which is an important problem associated with statistical inference. Interest centers on the null hypothesis $ H_0:g_1 = \ldots = g_K $, namely, that the effect of $ X_p $ does not depend on the levels of the factor $ F $. When the equality of the $ K $ curves is rejected, leading to the clear conclusion that at least one curve is different, it can be interesting to ascertain whether groups can be performed or, by contrast, that all these curves are different from each other. More clearly, if $ H_0 $ is rejected, it could be interesting to determine: a) if any pair of curves are different from each other, that is $ g_i \neq g_j $, for all $ i \neq j \in \{1, \ldots, K\} $, or b) if the indices in $ \{1, \ldots, K\} $ can be grouped in a partition of $ J (J < K) $ groups $ {\bf{I}} = ({\bf{I}}_1, \ldots, {\bf{I}}_J) $ with $ J < K $, so that $ g_i = g_j $ for each $ i, j \in I_k $, for some $ k $. Note that $ {\bf{I}} $ must be a partition of $ \{1, \ldots, K\} $ and, therefore, must satisfy $ {\bf{I}}_1 \cup \ldots \cup {\bf{I}}_J = \{1, \ldots, K\} $ and $ {\bf{I}}_i \cap {\bf{I}}_j = \emptyset $ for all $ i \neq j \in {\bf{I}} $.

    In some situations, the possible partition $ {\bf{I}} = ({\bf{I}}_1, \ldots, {\bf{I}}_J) $, can be established in advance, and the interest is to validate such partition by testing the null model

    $ logit(F,X)=α+p1j=1fj(Xj)+{g1(Xp)ifFI1g2(Xp)ifFI2gK(Xp)ifFIJ
    $
    (2.3)

    versus the general model given in (2.2). However, in general, the partitions are unknown and will have to be estimated from the data.

    Before explaining the procedure for determining the groups, let us introduce some notation. Given a partition $ {\bf{I}} = ({\bf{I}}_1, \ldots {\bf{I}}_J) $, we define a function $ G_{\bf{I}}:\{1, \ldots, K\} \rightarrow \{1, \ldots, J\} $ so $ G_{\bf{I}}(k) = j $ if $ k\in {\bf{I}}_j $. Now, given a sample $ \{F_i, {\bf{X}}_i, Y_i\}_{i = 1}^n $ and using the estimates $ \hat g_k $ ($ k = 1, \ldots, K $) of the model in (2.2), the estimated partition $ {\hat{\bf{I}}} = ({\hat{\bf{I}}}_1, \ldots {\hat{\bf{I}}}_J) $ can be obtained by minimizing the following Cramér-von Mises type distance

    $ minI=(I1,,IJ)Kk=1x(ˆgk(x)ˆCGI(k)(x))2dx
    $
    (2.4)

    Alternatively, the estimated partition can be obtained using the Kolmogorov-Smirnov type distance

    $ minI=(I1,,IJ)Kk=1x|ˆgk(x)ˆCGI(k)(x)|dx
    $
    (2.5)

    In both cases the estimated centroids for $ j = 1, \ldots, J $ are defined as

    $ ˆCj(x)=Kk=1ˆgk(x)I{GI(k)=j}Kk=1I{GI(k)=j}
    $
    (2.6)

    The above minimization problems can have a high computational cost for large values of $ K $, because they require the evaluation of all the different combinations of the $ K $ curves into $ J $ groups. To solve the quadratic minimization problem in (2.4) and L1-norm minimization problem in (2.5) we propose the use of the k-means or k-medians algorithms. In both cases, the partial functions have to be estimated in a common grid of $ X_p $, $ x_1^\bullet < \ldots < x_N^\bullet $, of size $ N $, leading to a matrix

    $ \left ({ ˆg1(x1)ˆgK(x1)ˆg1(xN)ˆgK(xN)
    } \right )_{N \times K} $

    This matrix will be the input of both heuristic methods, k-means and k-medians, and from these, the estimated partition $ {\hat{\bf{I}}} = ({\hat{\bf{I}}}_1, \ldots, {\hat{\bf{I}}}_{J}) $ is obtained.

    The procedure to obtain the estimation partition $ {\hat{\bf{I}}} = ({\hat{\bf{I}}}_1, \ldots, {\hat{\bf{I}}}_J) $ for a given $ J $ ($ J < K $) has been explained in the previous section. This section focuses on determining the number of groups for the regression functions. With this goal in mind, one possible approach consists in fitting the model in $ (2.3) $ for each value of $ j = 1, \ldots, J $, and selecting the number of groups $ K $ that minimizes the Bayesian Information Criterion (BIC). BIC is one of the most widely used tools in statistical model selection. Its popularity is derived from its computational simplicity and effective performance in many modeling approaches. Our simulations on curve clustering demonstrate that the BIC criterion can be considered a valid option, performing quite satisfactorily choosing the correct number of clusters in simulation scenarios with a large number of curves and complex data sets. Another possibility would be to test, for a given $ J $, the null hypothesis $ H_0 (J) $ that at least one partition $ {\bf{I}} $ of length $ J $ exists, so that the model in (2.3) is fulfilled, against the alternative that for any J-partition at least a group $ {\bf{I}}_j $ exists in which $ g_m \neq g_n $ for some $ m, n \in {\bf{I}}_j $. In order to test $ H_0 (J) $, for a given sample $ \{F_i, {\bf{X}}_i, Y_i\}_{i = 1}^n $, we consider the following two statistics

    $ DCM=Kk=1ni=1(ˆgk(Xip)ˆgGˆI(k)(Xip))2
    $
    (2.7)
    $ DKS=Kk=1ni=1|ˆgk(Xip)ˆgGˆI(k)(Xip)|
    $
    (2.8)

    where the estimated partition $ {\hat{\bf{I}}} = ({\hat{\bf{I}}}_1, \ldots, {\hat{\bf{I}}}_J) $ is obtained by solving the minimization problems in (2.4) or (2.5), and $ \hat g^\circ_j $ functions are obtained from model (2.3) using the same partition. Alternatively, we can use the deviance as an appropriate measure of discrepancy between observed and fitted values and consider the following likelihood ratio test

    $ DLR=ni=1Devi(Yi,ˆPi)ni=1Devi(Yi,ˆPi)ni=1Devi(Yi,ˆPi)
    $
    (2.9)

    which compares the deviance of the null model (2.3) with the deviance of the general model in (2.2), where $ \hat P_i^\circ $ and $ \hat P_i $ are the estimates values of the true probabilities $ P_i = P(Y_i = 1| {\bf{F}}_i, {\bf{X}}_i) $ under the null and general model, respectively. The individual deviance $ \mbox{Dev}_i (Y_i, \hat P_i) $ is defined as $ \mbox{Dev}_i (Y_i, \hat P_i) = -2 \left({Y_i \log \hat P_i +(1-Y_i) \log (1-\hat P_i)}\right) $.

    Note that if $ H_0(J) $ is verified, the value of the test statistic $ D $ should be close to zero. The decision rule based on each of the three statistics, $ D $, consists in rejecting the null hypothesis if $ D $ is greater than $ (1-\alpha) $-percentile obtained under the null hypothesis. Nevertheless, the theory for determining such percentiles is not closed. To approximate the distributions of the test statistics, resampling methods such as the bootstrap method can be applied [23,24,25,26]. The binary bootstrap used in our paper is a particular case of the bootstrap techniques suggested by [27] and [28] for inference in nonparametric models with response belonging to the binary family. The binary bootstrap involves the following steps:

    Step 1. Using the original sample data $ \{F_i, {\bf{X}}_i, Y_i\}_{i = 1}^n $, compute the test statistics $ D $ as explained above. Then, obtain for $ i = 1, \ldots, n $ the estimated $ \hat P_i^\circ $ of the true probabilities $ P_i = P(Y_i = 1| {\bf{F}}_i, {\bf{X}}_i) $ obtained from the null model in (2.3).

    Step 2. For $ b = 1, \ldots, B $, generate the bootstrap sample $ \{F_i, {\bf{X}}_i, Y_i^\bullet\}_{i = 1}^n $, with $ Y_i^\bullet\in Bernoulli (\hat P_i^\circ) $, and compute the bootstrap statistics $ D^{\bullet, b} $. Note that the estimation of the partition is required in this Step.

    Finally, the decision rule, based on each test statistic $ D $, consists in rejecting the null hypothesis if $ D > \hat D^{1-\alpha} $, where $ \hat D^{1-\alpha} $ is the empirical $ (1-\alpha) $-percentile of the values $ D^{\bullet 1}, \ldots, D^{\bullet B} $ obtained previously.

    The bootstrap-based test introduced here can be very useful to automatically determine the number of groups $ J $. Specifically, the rule proposed is as follows: the procedure begins by testing $ H_0(1) $. If this hypothesis is true, we decide that $ J = 1 $ (all regression curves are equal). When the previous hypothesis is rejected, it will be necessary to test $ H_0(2) $. If this new hypothesis is not rejected, we decide that $ J = 2 $. When $ H_0(2) $ is rejected, it will be required to test $ H_0(3) $ and so on until a certain $ H_0(J) $ is not rejected. Finally, note that we are aware that this approach could deal with the problem of multiple hypothesis testing where a set of $ J $ p-values corresponding to the $ J $ null hypotheses, $ H_0(1), H_0(2), \ldots, H_0(J) $, are given. Even though several methods have been proposed to deal with this problem (see e.g. [29] for an introduction to this area), there is still a challenge because there is no information about the minimum number of tests needed to apply these techniques. Accordingly, and considering that our rule finishes with a low number of them, we have not considered this problem.

    This section reports the results of a simulation study conducted to evaluate the practical performance of the proposed methodology. The response $ Y $ was generated under model (2.3) with

    $ logP(Y=1|F,X)1P(Y=1|F,X)={g1(X)=1.5XifFI1={1,2,3}g2(X)=2X23ifFI2={4,5}g3(X)=2sin(2X)2ifFI3={6,7}g4(X)=2cos(2X)ifFI4={8,9}g5(X)=2cos(2X)+aexp(0.5X)ifFI5={10,11,12,13}g6(X)=42XifFI6={14,15}
    $
    (2.10)

    The categorical covariate $ X $ was drawn from an uniform distribution $ U[-2, 2] $, and its factors, $ F $, were generated by taking a random value in $ \{1, \ldots, 15\} $ with associated probabilities $ {\bf{p}} = (p_1, \ldots, p_{15}) = {\bf{n}}/ \sum n_k $, $ k = 1, \ldots, 15 $, where:

    $ n=(n1,,n15)=(1.0,2.0,1.0,1.5,1.0,2.0,1.5,2.0,2.0,1.0,1.0,1.0,1.5,1.5,1.0)
    $

    In this way, and as can be seen in Table 1, we have considered unequal sample sizes for each $ (k = 1, \ldots, 15) $ curve.

    Table 1.  Sample size $ n_k $ for each regression curve $ k $ $ (k = 1, \ldots, 15 $) for different total sample sizes $ n = \sum n_k $.
    $ n $ $ n_1 $ $ n_2 $ $ n_3 $ $ n_4 $ $ n_5 $ $ n_6 $ $ n_7 $ $ n_8 $ $ n_9 $ $ n_{10} $ $ n_{11} $ $ n_{12} $ $ n_{13} $ $ n_{14} $ $ n_{15} $
    300 14 28 14 21 14 28 21 28 28 14 14 14 21 21 14
    500 23 47 23 35 23 47 35 47 47 23 23 23 35 35 23
    1000 47 95 47 71 47 95 71 95 95 47 47 47 71 71 47
    2000 95 190 95 142 95 190 142 190 190 95 95 95 142 142 95
    4000 190 380 190 285 190 380 285 380 380 190 190 190 285 285 190

     | Show Table
    DownLoad: CSV

    We explore the validity of the proposed tests assuming that we aim to test if the $ K = 15 $ regression curves can be grouped in five ($ J = 5 $) groups. To study the size and power of the proposed tests, different values were considered for $ a $, ranging from 0 to 2. It should be noted that $ g_5(X) = g_4(X)+a \exp(0.5X) $, so a value of $ a = 0 $ corresponds to a null hypothesis of five groups, while a value of $ a > 0 $ corresponds to a null hypothesis of six groups.

    The data analyzed in this section come from laboratory experiments of the extra-cellular single unit activity in the prefrontal cortex of a monkey. These experiments were conducted in the Laboratory of Neurophysiology of the University of Vigo (Spain). The monkey was trained to discriminate between different stimuli. The stimuli consisted of stationary bright line segments presented on a monitor screen in front of the monkey that changed their orientation over time.

    A trial was initiated when the monkey pressed a lever key with its right hand and then two stimuli (reference and test), each of 500 ms duration, were presented in sequence with a fixed inter-stimulus interval (ISI: 1000 ms). At the end of the second stimulus, the subject released the key in a 1200 ms time window and pressed one of the two switches (left or right), indicating whether the orientation of the second stimulus was clockwise (right) or counter-clockwise (left) to the reference stimulus. While the monkey worked on the task, its extracellular unit activity was recorded. The monkey was rewarded for correct discrimination.

    To gather enough data and to account for the cell response variability, the neuron was recorded over a number of $ T = 80 $ trials. For each of the trials, we consider the angle (trial specific), TA, corresponding to the test stimulus. The reference angle is $ 90^\circ $. The following TAs were considered in the experiment:

    ● $ TA \in \{ 78^\circ, 102^\circ\} $: test stimuli more separated from the reference, and therefore very easy to discriminate

    ● $ TA \in \{ 81^\circ $, $ 99^\circ\} $: test stimuli is easy to discriminate

    ● $ TA \in \{ 84^\circ $, $ 96^\circ\} $: test stimuli is difficult to discriminate

    ● $ TA \in \{ 87^\circ, 93^\circ\} $: test stimuli closest to the reference, therefore more difficult to discriminate

    In this experiment, the outcome of interest is the neuronal activity in the interval $ t \in [t_{min}, t_{max}] = [-500, 3000] $ in ms. At each instant $ t $ and trial $ j = 1, \ldots, T $, this outcome may then be represented by a temporal binary sequence, $ Y_t^j $, where $ Y_t^j $ = 1 if there is a spike in $ [t, t+1) $ ms and 0 otherwise. The explanatory variables are the time $ t $ when a spike was detected and the interval of time $ \Delta t $ since the last spike. The last one is an adjustment variable that takes into account that the neuron might fire more easily if it has been fired recently. Accordingly, for each trial, the data set consists of the following information

    $ \left \{TA, \left(t, \Delta t \right), Y_t \right\}_{t = t_{min}}^{t_{max}} $

    We use generalized additive modeling to assess whether the association between neural activity and decision-making depends on the difficulty of the discrimination task related to the angle of the test stimulus. We consider the following GAM:

    $ log(P(TA,t,Δt)1P(TA,t,Δt))=α+f(Δt)+{g1(t)ifTA=78g2(t)ifTA=102g3(t)ifTA=81g4(t)ifTA=99g5(t)ifTA=84g6(t)ifTA=96g7(t)ifTA=87g8(t)ifTA=93
    $
    (2.11)

    where $ P(TA, t, \Delta t) = P(Y_t = 1|F, t, \Delta t) $, $ \alpha $ is a fixed parameter, and $ g_k $ for $ k = 1, \ldots, 8 $ are the specific time functions associated to each of the test angles considered in the study.

    Type I error rates and power values of the proposed tests as a function of $ a $ are shown in Figure 1. They were calculated from $ 1000 $ simulation runs using $ B = 400 $ bootstrap samples at each repetition. We compare the results of the three test statistics under sample sizes $ n = 1000 $ and 2000, at the significance levels of 0.05 and 0.10. As can be seen in Figure 1, the three curves show the expected behavior pattern, with an increase in the power as $ a $ increases, and an improvement in it as the sample size grows.

    Figure 1.  Rejection probabilities (power of the hypothesis test) of the proposed test statistics as a function of $a$, for sample size $n = 1000$ and $n = 2000$ (upper and lower panels, respectively) at a 0.05 and 0.10 (left and right panels, respectively) significant levels (red line).

    The test statistic based on the likelihood ratio test (labeled as $ D_{\rm{LR}} $) is that with the highest values of power for $ n = 1000 $, but the differences between the three test statistics are minimal for $ n = 2000 $. The Cramér-von Mises–type statistic test (labeled as $ D_{\rm{CM}} $) revealed a poorer behavior with lower rejection probabilities for a sample size of 1000. Values for type Ⅰ error, for the three test statistics and for different significance levels and different sample sizes ($ n = 1000, 2000 $, and 3000), are reported in Table 2. The three test statistics work satisfactorily according to the type Ⅰ error, coming quite close to the nominal level regardless of the sample size.

    Table 2.  Estimated type Ⅰ error (in %) and nominal level percentages (1, 5, 10, 15, and 20) for different sample sizes.
    level $ D_{\rm{CM}} $ $ D_{\rm{KS}} $ $ D_{\rm{LR}} $
    n = 1000 1 1.4 1.3 1.0
    5 5.5 5.3 5.6
    10 9.7 9.8 11.5
    15 14.5 15.3 16.5
    20 18.7 20.0 21.1
    n = 2000 1 0.9 0.7 0.8
    5 4.2 3.8 5.7
    10 8.4 8.0 9.4
    15 12.8 12.4 15.6
    20 16.5 18.0 21.8
    n = 3000 1 0.9 0.7 1.2
    5 4.2 4.0 6.5
    10 8.8 7.6 11.1
    15 13.0 12.1 16.2
    20 16.2 17.0 22.3

     | Show Table
    DownLoad: CSV

    In Table 3, we report results that can be used to evaluate the accuracy of the bootstrap-based algorithm introduced in Section 2.3. Again, different values of $ a $ were considered, ranging from 0 to 2. Recall that the value $ a = 0 $ corresponds to the null hypothesis, which assumes that the fifteen regression functions can be classified into five groups, while, when $ a\neq0 $, the number of groups is six. Note that to select the correct number $ J $ of groups of regression functions, the bootstrap-based algorithm must first reject the first null hypothesis, $ H_0(1) $, then reject the second hypothesis, $ H_0(2) $, and so on until it accepts $ H_0(5) $ if $ a = 0 $, or until $ H_0(6) $ when $ a > 0 $. Results shown in Table 3 for the three test statistics, using a nominal level of 5%, display the number of times that the procedure selects the number of groups $ J $. Results in bold denote the correct classifications according to model (2.3), revealing the high accuracy of the proposed bootstrap-based algorithm for $ a = 0 $, showing the correct number of groups in percentages quite close to the nominal level. As the value of $ a $ increases, so does the percentage of cases in which the proposed method suggests six groups. When comparing the three test statistics, it can be observed that all have a similar performance with a small advantage for the test statistic based on the likelihood ratio test (labeled as $ D_{\rm{LR}} $).

    Table 3.  Percentage of simulations (over 1000 repetitions) that the bootstrap-based algorithm estimated the number of groups J. Results for the three test statistics were obtained for sample size $ n = 2000 $ and different values of $ a $, with a nominal level of 5%.
    $ D_{\rm{CM}} $ $ D_{\rm{KS}} $ $ D_{\rm{LR}} $
    $ a $ J = 5 J = 6 J = 7 J = 5 J = 6 J = 7 J = 5 J = 6 J = 7
    0.0 95.8 3.1 1.2 96.2 2.5 1.3 94.3 3.4 2.3
    0.2 95.5 3.2 1.3 94.7 3.8 1.6 94.2 3.2 2.6
    0.4 91.1 6.1 2.8 89.5 7.6 2.9 87.6 7.9 4.5
    0.6 79.3 17.6 3.1 75 21.2 3.8 72.4 22.2 5.4
    0.8 53.9 42.2 3.9 49.2 46.7 4.1 40.7 53.9 5.4
    1.0 12.3 83 4.7 17.6 78.5 3.8 10.2 84.5 5.3
    1.2 1 94.5 4.5 4 91.8 4.2 1.4 93.6 4.9
    1.4 0.1 95.3 4.6 1.3 95.2 3.5 0.2 94.6 5.2
    1.6 0.1 96.1 3.8 0.7 95.4 3.9 0.2 94.8 5.1
    1.8 0.1 95.7 4.2 1.2 94.7 4.1 0.3 94.9 4.8
    2.0 0.1 96.5 3.4 1 95.4 3.7 0.2 94.6 5.2

     | Show Table
    DownLoad: CSV

    According to our simulation scenario, the $ K = 15 $ regression curves should be grouped into five groups if $ a = 0 $ and six groups otherwise. In addition to this, the assignment of each curve should follow model (2.3). Results reported in Table 4 can be used to check if the proposed algorithm is performing well by assigning each of the fifteen regression curves to the correct group. Specifically, the results correspond to a value of $ a = 0.7 $, for different sample sizes. The entries in the first column specify the number of regression curves with an incorrect classification, whereas the remaining columns show the corresponding percentages for different sample sizes. Therefore, the results reported in the first line correspond to the ideal situation in which none of the fifteen curves has an incorrect classification. As we can see, the percentages in this line increase with an increase in the sample size, up to rates of success of around 95%, coming quite close to that established. In addition, we can also see that the number of incorrect classifications decreases with an increase in the sample size. For $ n = 2000 $, for example, the percentages of incorrect classifications have a sharp decrease whereas for $ n = 4000 $ only 1 or 2 curves have incorrect classifications with low percentage values of 4.9 and 0.6%, respectively.

    Table 4.  Percentages of misclassifications over the total of the 15 regression curves. Scenario with $ a = 0.7 $ for different sample sizes.
    n.misclass $ n=300 $ $ n=500 $ $ n=1000 $ $ n=2000 $ $ n=4000 $
    0 0.3 0.9 20.4 67.3 94.6
    1 2.7 6.1 26.4 22.7 4.9
    2 5.1 12.0 15.0 4.3 0.6
    3 12.9 32.9 26.4 4.9 0.0
    4 15.4 16.0 2.6 0.1 0.0
    5 18.1 14.0 4.1 0.3 0.0
    6 17.0 8.1 3.1 0.1 0.0
    7 10.4 4.9 0.6 0.0 0.0
    8 8.9 3.6 1.1 0.3 0.0
    9 5.6 1.0 0.1 0.0 0.0
    10 2.3 0.4 0.0 0.0 0.0
    11 1.1 0.1 0.0 0.0 0.0
    12 0.1 0.0 0.0 0.0 0.0
    13 0.0 0.0 0.0 0.0 0.0
    14 0.0 0.0 0.0 0.0 0.0
    15 0.0 0.0 0.0 0.0 0.0

     | Show Table
    DownLoad: CSV

    To measure how well the procedure assigns each of the $ k = 1, ..., 15 $ to their correct group $ {\bf{I}}_{1}, ..., {\bf{I}}_{6} $, we report in Table 4 the percentage of incorrect classifications for each regression curve, for different sample sizes, considering $ a = 0.7 $. While it can be seen that the procedure performs quite well for large sample sizes, it also reveals some difficulties in the case of small samples.

    Note that, according to Table 1, for a sample size of $ n = 300 $ the number of observations in each group is between 14 and 28. This can explain the high misclassification rates for this sample size, in particular in the case of regression curves with fewer observations. As the sample size increases, the percentage of incorrect classifications for each regression curve decreases.

    In Table 5 we show which group each of the $ k = 1, \ldots, 15 $ regression curves is assigned to. Results reported in this table were obtained for $ a = 0.7 $ and four sample sizes. The numbers in bold denote the correct classifications according to model (2.3). The results are in agreement with the previous findings, revealing the accuracy of the proposed method for higher sample sizes. We can also see that for all the samples, the assignment is, in most cases, correct. As expected, higher misclassification rates can be observed for those regression curves ($ k = 10, 11, 12, 13 $) that belong to group $ {\bf{I}}_5 $ but were classified in group $ {\bf{I}}_4 $. Results not reported here show that this misclassification is less evident as the value of $ a $ increases.

    Table 5.  Assignment of curves to groups. The entries in the table specify the percentage of times that curve $ K $ ($ k = 1, \ldots, 15 $) is classified in cluster $ {\bf{I}}_j $ ($ j = 1, \ldots, 6 $). Results for four different sample sizes $ n $ with $ a = $ 0.7.
    n = 500 n = 1000
    $ K $ $ {\bf{I}}_1 $ $ {\bf{I}}_2 $ $ {\bf{I}}_3 $ $ {\bf{I}}_4 $ $ {\bf{I}}_5 $ $ {\bf{I}}_6 $ $ {\bf{I}}_1 $ $ {\bf{I}}_2 $ $ {\bf{I}}_3 $ $ {\bf{I}}_4 $ $ {\bf{I}}_5 $ $ {\bf{I}}_6 $
    1 80.9 1.8 5.2 4.1 7.6 0.3 95.3 0.0 1.0 0.8 2.9 0.0
    2 82.4 1.0 4.0 4.2 8.4 0.0 94.6 0.0 0.4 1.0 4.1 0.0
    3 79.7 3.5 4.8 4.5 7.1 0.3 94.5 0.2 0.7 1.0 3.6 0.0
    4 2.0 89.9 5.8 1.4 0.8 0.0 0.5 97.0 0.5 0.7 1.3 0.0
    5 1.6 89.2 5.8 2.3 0.7 0.4 0.6 97.8 0.7 0.6 0.2 0.0
    6 3.4 1.8 89.8 3.4 1.6 0.0 0.2 0.1 97.7 1.4 0.5 0.0
    7 3.0 2.3 89.9 2.8 2.0 0.0 0.0 0.1 97.7 1.6 0.6 0.0
    8 0.3 0.3 4.1 66.9 28.5 0.0 0.0 0.0 0.0 83.5 16.5 0.0
    9 0.4 0.1 1.8 70.3 27.3 0.0 0.0 0.0 0.1 84.4 15.5 0.0
    10 12.6 0.4 0.7 35.1 46.0 5.1 2.6 0.0 0.1 32.6 64.1 0.5
    11 10.9 0.1 1.6 38.8 43.2 5.4 2.4 0.0 0.0 34.4 62.8 0.4
    12 10.8 0.4 2.0 37.7 44.8 4.4 3.0 0.0 0.0 32.0 64.7 0.2
    13 9.1 0.1 0.4 41.9 44.9 3.5 0.8 0.0 0.0 35.4 63.8 0.0
    14 0.3 0.1 0.0 0.8 2.1 96.6 0.0 0.0 0.0 0.0 0.4 99.6
    15 0.1 0.1 0.0 0.6 2.1 97.0 0.0 0.0 0.0 0.1 0.5 99.4
    n = 2000 n = 4000
    $ K $ $ {\bf{I}}_1 $ $ {\bf{I}}_2 $ $ {\bf{I}}_3 $ $ {\bf{I}}_4 $ $ {\bf{I}}_5 $ $ {\bf{I}}_6 $ $ {\bf{I}}_1 $ $ {\bf{I}}_2 $ $ {\bf{I}}_3 $ $ {\bf{I}}_4 $ $ {\bf{I}}_5 $ $ {\bf{I}}_6 $
    1 99.6 0.0 0.1 0.1 0.1 0.0 100.0 0.0 0.0 0.0 0.0 0.0
    2 99.5 0.0 0.1 0.1 0.3 0.0 100.0 0.0 0.0 0.0 0.0 0.0
    3 99.5 0.1 0.1 0.1 0.1 0.0 100.0 0.0 0.0 0.0 0.0 0.0
    4 0.1 99.7 0.0 0.0 0.1 0.0 0.0 100.0 0.0 0.0 0.0 0.0
    5 0.1 99.7 0.0 0.0 0.1 0.0 0.0 100.0 0.0 0.0 0.0 0.0
    6 0.0 0.0 99.9 0.1 0.0 0.0 0.0 0.0 100.0 0.0 0.0 0.0
    7 0.0 0.0 99.9 0.1 0.0 0.0 0.0 0.0 100.0 0.0 0.0 0.0
    8 0.0 0.0 0.0 98.2 1.8 0.0 0.0 0.0 0.0 100.0 0.0 0.0
    9 0.0 0.0 0.0 98.1 1.9 0.0 0.0 0.0 0.0 100.0 0.0 0.0
    10 0.0 0.0 0.0 12.0 88.0 0.0 0.0 0.0 0.0 1.3 98.7 0.0
    11 0.3 0.0 0.0 10.4 89.4 0.0 0.0 0.0 0.0 1.7 98.3 0.0
    12 0.4 0.0 0.0 12.1 87.4 0.1 0.0 0.0 0.0 1.9 98.1 0.0
    13 0.0 0.0 0.0 10.2 89.8 0.0 0.0 0.0 0.0 1.1 98.9 0.0
    14 0.0 0.0 0.0 0.0 0.0 100.0 0.0 0.0 0.0 0.0 0.0 100.0
    15 0.0 0.0 0.0 0.0 0.1 99.9 0.0 0.0 0.0 0.0 0.0 100.0

     | Show Table
    DownLoad: CSV

    The success of the proposed methods depends on recovering the true functional forms of the corresponding six curves $ \hat g_k $, $ k = 1, \ldots, 6 $, introduced in model (2.3). In this regard, we present in Figure 2 the estimated curves with the corresponding 95% confidence limits for a sample size of $ n = 2000 $. This figure reveals that the resulting estimates recover the functional form of the corresponding true curves very successfully.

    Figure 2.  Estimated centroids $ g_j $ $ (j = 1, \ldots, 6) $ for the six clusters with the 95% confidence limits. Results obtained from $ n = 2000 $ and $ a = 0.7 $.

    As mentioned above, the Bayesian Information Criterion (BIC) can be used to select the number of groups $ J $. This is a method of computational simplicity that consists of choosing the model (2.3) that minimizes the BIC statistics. Table 6 can be used to compare the performance of the BIC against the results reported in Table 3, based on the bootstrap method introduced in Section 3. Both tables report the percentage of simulations (over 1000 repetitions) that the BIC procedure and the bootstrap-based algorithm estimated to have a number of groups $ J $ for different values of $ a $. BIC is able to identify the correct number of groups (five when $ a = 0 $ and six otherwise) for sample sizes $ n\geq1000 $ (correct classifications represented in bold). Even though the results shown in the two tables indicate good agreement between the two procedures, one can see that the BIC procedure is more conservative in detecting important differences between the regression curves, whereas, the bootstrap method is more effective in detecting less important differences. Since the BIC procedure has obvious computational advantages, we suggest that it should be used to contrast and thoroughly examine conclusions obtained by applying the proposed bootstrap method. We should not consider the BIC procedure as merely an alternative to the bootstrap but rather as a supplement that offers additional information. In this sense, BIC could be used to obtain an initial approximation to the number of groups, then our bootstrapping method will be used to tune this value.

    Table 6.  Percentage of simulations (over 1000 repetitions) that the BIC procedure estimated the number of groups J. Results for different values of $ a $.
    Number of selected groups
    a J = 1 J = 2 J = 3 J = 4 J = 5 J = 6 J = 7 J > 7
    n = 500 0.0 0.0 0.0 3.4 33.3 59.1 4.2 0.0 0.0
    0.2 0.0 0.0 3.9 28.8 62.1 4.8 0.3 0.1
    0.4 0.0 0.1 4.7 28.7 61.2 5.1 0.2 0.0
    0.6 0.0 0.1 5.0 28.8 58.5 7.4 0.2 0.0
    0.8 0.0 0.3 7.9 31.4 50.8 8.8 0.7 0.1
    1.0 0.0 0.3 9.0 30.1 45.0 13.7 1.8 0.1
    1.2 0.0 0.5 8.0 31.9 39.1 18.2 2.1 0.2
    1.4 0.0 0.6 9.6 27.0 34.8 24.8 2.8 0.4
    1.6 0.0 0.2 7.8 24.8 35.2 28.0 3.6 0.4
    1.8 0.0 0.4 6.8 24.0 35.8 29.2 3.2 0.6
    2.0 0.0 0.8 4.8 21.6 36.8 32.4 3.6 0.0
    n = 1000 0.0 0.0 0.0 0.0 1.9 97.3 0.7 0.1 0.0
    0.2 0.0 0.0 0.0 1.8 97.3 0.9 0.0 0.0
    0.4 0.0 0.0 0.0 1.3 97.2 1.5 0.0 0.0
    0.6 0.0 0.0 0.0 1.2 94.8 3.8 0.2 0.0
    0.8 0.0 0.0 0.0 2.1 84.0 13.5 0.4 0.0
    1.0 0.0 0.0 0.0 1.2 60.4 37.4 1.0 0.0
    1.2 0.0 0.0 0.0 1.7 29.0 67.9 1.4 0.0
    1.4 0.0 0.0 0.0 1.4 16.4 79.8 2.4 0.0
    1.6 0.0 0.0 0.1 0.7 10.4 85.3 3.4 0.1
    1.8 0.0 0.0 0.0 0.2 11.0 85.3 3.2 0.3
    2.0 0.0 0.0 0.0 0.6 10.3 85.2 3.4 0.5
    n = 4000 0.0 0.0 0.0 0.0 0.0 99.9 0.1 0.0 0.0
    0.2 0.0 0.0 0.0 0.0 99.7 0.3 0.0 0.0
    0.4 0.0 0.0 0.0 0.0 99.3 0.7 0.0 0.0
    0.6 0.0 0.0 0.0 0.0 80.2 19.8 0.0 0.0
    0.8 0.0 0.0 0.0 0.0 13.9 86.1 0.0 0.0
    1.0 0.0 0.0 0.0 0.0 0.2 99.8 0.0 0.0
    1.2 0.0 0.0 0.0 0.0 0.0 100.0 0.0 0.0
    1.4 0.0 0.0 0.0 0.0 0.0 100.0 0.0 0.0
    1.6 0.0 0.0 0.0 0.0 0.0 100.0 0.0 0.0
    1.8 0.0 0.0 0.0 0.0 0.0 100.0 0.0 0.0
    2.0 0.0 0.0 0.0 0.0 0.0 100.0 0.0 0.0

     | Show Table
    DownLoad: CSV

    Figure 3 shows the effects of both explanatory variables, $ t $ and $ \Delta t $, on the response variable, applying model 2.11 to the experimental data.

    Figure 3.  Effect of each explanatory variable on the probability of spike.

    As can be appreciated, the neuron activation decreases with the time interval $ \Delta t $. The activation time $ t $, for its part, tends to increase during the period when the angle of the rod is changed, reaching a maximum at about 2200 ms and then decreasing. This means that it takes time for the neuron to fire from the moment the monkey sees the rod.

    Figure 4 shows the raster plot of the response of the neuron recorded in the prefrontal cortex of a monkey while it is carrying out the visual discrimination task. Each row represents one of the total of 80 trials, and each tick represents a potential action (spike). The spontaneous cell activity, i.e., in the absence of any stimuli, is very irregular, as can be seen in the interval from -500 to 0 ms, where there was no stimulus for the neuron.

    Figure 4.  Response of a neuron duringthe performanceof the discriminationtask. Theobservedspikesfor each of the 80 trials are on separatelines. The shaded regions correspond to reference stimulus (from 0 to 500 ms) and to the interval to test the response (1600–2100 ms).

    The panel on the left of Figure 5 shows the data pooled across trials represented as counts of spikes within 10 ms intervals. Units of mean firing rate are the number of spikes per trial per second. The fitted curve, obtained using a generalized additive model, is shown in the panel on the right. As can be seen in both plots, the temporal evolution of the cell firing rate during the task indicates that there is an increase in the firing rate between 1750 and 2750 ms, which corresponds to the presentation of the test stimulus and the reaction time, before the monkey motor response. Therefore, we restricted the data to the time interval between 1500 and 3000 ms. One hundred milliseconds before the test stimuli were taken as control because there were no stimuli present and the firing rate variability from trial to trial was very low. Accordingly, for each trial, 1500 measurements of the neuron activity were made.

    Figure 5.  Representation of neural activity over time. Observed spikes pool counts of spikes within 10 ms intervals (left) and the corresponding smoothing version (right plot).

    Figure 6 shows the estimated regression curves of the firing rate (spikes) for the eight different experimental conditions, i.e., for each of the eight levels of the trial angle (TA) as defined in the model (2.11). Two groups stand out, particularly those with a trial angle below $ 90^\circ $, with higher firing rates per second in the time interval between 1900 and 2700 ms, and those with a trial angle above $ 90^\circ $, with lower firing rate per second. Although it may appear strange, higher neuron firing rates for angles less than $ 90^\circ $ (clockwise rotation) were previously reported in a study [26].

    Figure 6.  Fitted regression curves for the eight different experimental conditions.

    Results for the bootstrap method using the three test statistics defined in Section 3 are shown in Table 7. As can be seen, the three test statistics indicate that there are two groups. $ D_{KD} $, and slightly less so $ D_{CM} $, are close to tipping the balance in favor of three groups.

    Table 7.  Probability values for testing the null hypothesis $ H_0(J) $. Results based on the bootstrap method with different test statistics.
    J $ D_{\rm{CM}} $ $ D_{\rm{KS}} $ $ D_{\rm{LR}} $
    1 $ < 0.001 $ $ < 0.001 $ $ < 0.001 $
    2 $ < 0.06 $ $ < 0.05 $ $ < 0.29 $
    3 $ < 0.19 $ $ < 0.13 $ $ < 0.53 $
    4 $ < 0.09 $ $ < 0.76 $ $ < 0.48 $

     | Show Table
    DownLoad: CSV

    We have also applied the BIC criterion to determine the number of groups. The results of this cluster solution also lead to the same conclusion, namely, $ J = 2 $ groups. These findings are in agreement with those reported in our simulations that indicate that BIC is more conservative, but capable of detecting important differences.

    Clusters for a fixed number of groups between $ J = 2 $ and $ J = 5 $ are shown in Figure 7 to facilitate the comprehension of the problem, although the results obtained indicate that there are only two groups.

    Figure 7.  Estimated regression curves according to the groups to which they belong. The curves assigned into two (J = 2), until five groups (J = 5) are represented in each of the four panels. The statistics used to estimate the number of groups found only two.

    In this work we propose a bootstrap-based method to determine the number of groups in generalized additive models when the effect of a continuous covariate on the response varies across groups defined by levels of a categorical variable. The simulation analysis confirms the capacity of the three proposed statistical tests to reproduce the theoretical values of type Ⅰ and type Ⅱ errors, even in the presence of a large number of curves. However, the power turned out to be higher for the test based on the likelihood test ratio. As expected, the percentage of misclassifications decreases with the sample size, $ n $, being almost null for $ n = 4000 $.

    The simulated data were also used to compare our method with BIC, a well-known approach that has been largely used previously in clustering analysis to estimate the number of clusters in an artificial data set. The number of detected groups was the same for both methods, but the simulation study showed that BIC has a lower statistical power than the bootstrap method, although they are close for larger sample sizes. Accordingly, BIC could be used to obtain an initial estimation of the number of clusters that could be tuned using our method.

    The application of the proposed method to an experiment to determine the activity of a neuron of a monkey subject to visual stimuli led to the same conclusions when we compared our approach with BIC. In both cases the neuron only reacts to two of the eight stimuli, corresponding to the most evident differences with respect to a reference state. Regarding the three statistics tested, they provide the same result, although the evidence for two clusters is stronger for the statistic based on the likelihood ratio test.

    Finally, it can be said that although the proposed method was designed to detect groups of regression curves in generalized additive models with a binary response, it can be extended without much difficulty to determine groups in models with another kind of response.

    This work was partially supported by project $ 2017/00001/006/001/097 $: Ayudas para el mantenimiento de actividades de investigación de institutos universitarios de investigación y grupos de investigación de la Universidad de Oviedo para el ejercicio 2021.

    Luís Meira-Machado acknowledges financial support from Portuguese Funds through FCT - "Fundação para a Ciência e a Tecnologia", within the projects UIDB/00013/2020, UIDP/00013/2020.

    Javier Roca-Pardiñas acknowledges financial support from Grant PID2020-118101GB-I00, Ministerio de Ciencia e Innovación (MCIN/ AEI /10.13039/501100011033).

    The authors declare there is no conflict of interest.

    [1] Lee PA, Houk CP, Ahmed SF, et al. (2006) Consensus statement on management of intersex disorders. International Consensus Conference on Intersex. Pediatrics 118: e488-500.
    [2] Rothkopf AC, John RM (2014) Understanding Disorders of Sexual Development. J Pediatr Nurs 29: e23-e34 doi: 10.1016/j.pedn.2014.04.002
    [3] Ahmed SF, Rodie M (2010) Investigation and initial management of ambiguous genitalia. Best Pract Res Clin Endocrinol Metab 24: 197-218 doi: 10.1016/j.beem.2009.12.001
    [4] Thyen U, Lanz K, Holterhus PM, et al. (2006) Epidemiology and initial management of ambiguous genitalia at birth in Germany. Horm Res 66: 195-203 doi: 10.1159/000094782
    [5] Hughes IA (2008) Disorders of sex development: a new definition and classification. Best Pract Res Clin Endocrinol Metab 22: 119-134. doi: 10.1016/j.beem.2007.11.001
    [6] Skakkebaek NE (2003) Testicular dysgenesis syndrome. Horm Res 60: 49.
    [7] Chang HJ, Clark RD, Bachman H (1990) The phenotype of 45,X/46,XY mosaicism: an analysis of 92 prenatally diagnosed cases. Am J Hum Genet 46: 156-167.
    [8] Lindhardt Johansen M, Hagen CP, Rajpert-De Meyts E, et al. (2012) 45,X/46,XY mosaicism: phenotypic characteristics, growth, and reproductive function-a retrospective longitudinal study. J Clin Endocrinol Metab 97: E1540-1549. doi: 10.1210/jc.2012-1388
    [9] Ocal G, Berberoğlu M, Sıklar Z, et al. (2012) The clinical and genetic heterogeneity of mixed gonadal dysgenesis: does ‘disorders of sexual development (DSD)' classification based on new Chicago consensus cover all sex chromosome DSD? Eur J Pediatr 171: 1497-1502. doi: 10.1007/s00431-012-1754-0
    [10] Telvi L, Lebbar A, Del Pino O, et al. (1999) 45,X/46,XY mosaicism: report of 27 cases. Pediatrics 104: 304-308. doi: 10.1542/peds.104.2.304
    [11] Cools M, Pleskacova J, Stoop H, et al. (2011) Gonadal pathology and tumor risk in relation to clinical characteristics in patients with 45,X/46,XY mosaicism. J Clin Endocrinol Metab 96: E1171-1180. doi: 10.1210/jc.2011-0232
    [12] Farrugia MK, Sebire NJ, Achermann JC, et al. (2013) Clinical and gonadal features and early surgical management of 45,X/46,XY and 45,X/47,XYY chromosomal mosaicism presenting with genital anomalies. J Pediatr Urol 9: 139-144. doi: 10.1016/j.jpurol.2011.12.012
    [13] Tosson H, Rose SR, Gartner LA (2010) Children with 45,X/46,XY karyotype from birth to adult height. Horm Res Paediatr 74: 190-200. doi: 10.1159/000281468
    [14] Richter Unruh A, Knauer Fischer S, Kaspers S, et al. (2004) Short stature in children with an apparently normal male phenotype can be caused by 45,X/46,XY mosaicism and is susceptible to growth hormone treatment. Eur J Ped 163: 251-256. doi: 10.1007/s00431-004-1406-0
    [15] Rappold GA, Durand C, Decker E, et al. (2012) New roles of SHOX as regulator of target genes. Pediatr Endocrinol Rev 9: 733-738.
    [16] Rappold GA, Fukami M, Niesler B, et al. (2002) Deletions of the homeobox gene SHOX (short stature homeobox) are an important cause of growth failure in children with short stature. J Clin Endocrinol Metab 87: 1402-1406. doi: 10.1210/jcem.87.3.8328
    [17] Vilain E, Sarafoglou K, Yehya N (2009) Disorders of Sex Development. In: Sarafoglou K (ed) Pediatric endocrinology and inborn errors of metabolism. New York: McGraw-Hill Medical; London, pp 527-555.
    [18] Cools M, Drop SLS, Wolffenbuttel KP, et al. (2006) Germ cell tumors in the intersex gonad: old paths, new directions, moving frontiers. Endocr Rev 27: 468-484. doi: 10.1210/er.2006-0005
    [19] Cohen-Kettenis PT (2009) Psychosocial and psychosexual aspects of disorders of sex development. Best Pract Res Clin Endocrinol Metab 24: 325-334.
    [20] Szarras-Czapnik M, Lew-Starowicz Z, Zucker KJ (2007) A psychosexual follow-up study of patients with mixed or partial gonadal dysgenesis. J Pediatr Adolesc Gynecol 20: 333-338. doi: 10.1016/j.jpag.2007.03.096
    [21] Birnbacher R, Marberger M, Weissenbacher G, et al. (1999) Gender identity reversal in an adolescent with mixed gonadal dysgenesis. J Pediatr Endocrinol Metab 12: 687-690.
    [22] Johnson LL, Bradley SJ, Birkenfeld-Adams AS, et al. (2004) A parent-report gender identity questionnaire for children. Arch Sex Behav 33: 105-116. doi: 10.1023/B:ASEB.0000014325.68094.f3
    [23] WHOQOL Group (1998) The World Health Organization Quality of Life Assessment (WHOQOL): development and general psychometric properties. Soc Sci Med 46: 1569-1585. doi: 10.1016/S0277-9536(98)00009-4
    [24] WHOQOL Group (1995) The World Health Organization Quality of Life assessment (WHOQOL): position paper from the World Health Organization. Soc Sci Med 41: 1403-1409. doi: 10.1016/0277-9536(95)00112-K
    [25] World Health Organization, The World Health Organization Quality of Life (WHOQOL). WHO, 2013. Available from: http://www.who.int/mental_health/publications/whoqol/en/.
    [26] Ahmed SF, Khwaja O, Hughes IA (2000) The role of a clinical score in the assessment of ambiguous genitalia. BJU Int 85: 120-124. doi: 10.1046/j.1464-410x.2000.00354.x
    [27] Tanner JM, Whitehouse RH (1976) Clinical longitudinal standards for height, weight, height velocity, weight velocity, and stages of puberty. Arch Dis Child 51: 170-179. doi: 10.1136/adc.51.3.170
    [28] Caldarera A, Brustia P (2013) Italian version of the GIQC: a preliminary study on the psychometric properties. Abstract Book XV Congresso Nazionale della sezione di Psicologia clinica e dinamica - AIP, Napoli, Sept. 27-29 (pp.42-43). Napoli: Fridericiana Editrice Universitaria.
    [29] Alizai NK, Thomas DF, Lilford RJ, et al. (1999) Feminizing genitoplasty for congenital adrenal hyperplasia: what happens at puberty? J Urol 161(5): 1588-1591.
    [30] Eroğlu E, Tekant G, Gündoğdu G, et al. (2004) Feminizing surgical management of intersex patients. Pediatr Surg Int 20: 543-547. doi: 10.1007/s00383-004-1208-5
    [31] Migeon CJ, Wisniewski AB, Gearhart JP, et al. (2002) Ambiguous genitalia with perineoscrotal hypospadias in 46,XY individuals: long-term medical, surgical, and psychosexual outcome. Pediatrics 110: e31. doi: 10.1542/peds.110.3.e31
    [32] Wiesemann C, Ude-Koeller S, Sinnecker GHG, et al. (2010) Ethical principles and recommendations for the medical management of differences of sex development (DSD)/intersex in children and adolescents. Eur J Pediatr 169: 671-679. doi: 10.1007/s00431-009-1086-x
    [33] Meyer-Bahlberg HFL, Gruen RS, New MI, et al. (1996) Gender change from female to male in classical conginital adrenal hyperplasia. Horm Behav 30: 319-332. doi: 10.1006/hbeh.1996.0039
    [34] Mazur T (2005) Gender dysphoria and gender change in androgen insensitivity or micropenis. Arch Sex Behav 34: 411-421. doi: 10.1007/s10508-005-4341-x
    [35] Cohen-Kettenis PT (2005) Gender change in 46,XY persons with 5alphareductase-2 deficiency and 17beta-hydroxysteroid dehydrogenase-3 deficiency. Arch Sex Behav 34: 399-410. doi: 10.1007/s10508-005-4339-4
    [36] Reiner WG, Reiner DT (2012) Thoughts on the Nature of Identity: How Disorders of Sex Development Inform Clinical Research about Gender Identity Disorders. J Homosex 59: 434-449. doi: 10.1080/00918369.2012.653312
    [37] Reiner WG (2005) Gender Identity and Sex-of-rearing in Children with Disorders of Sexual Differentiation. J Pediatr Endocrinol Metab 18: 549-553.
    [38] Mattila AK, Fagerholm R, Santtila P, et al. (2012) Gender Identity and Gender Role Orientation in Female Assigned Patients with Disorders of Sex Development. J Urol 188: 1930-1934. doi: 10.1016/j.juro.2012.07.018
    [39] Meyer-Bahlburg HFL (1999) Health-related quality of life in intersexuality. Acta Paediatr Suppl 428: 114-115.
    [40] Steensma TD, Kreukels BPC, de Vries ALC, et al. (2013) Gender identity development in adolescence. Horm Behav 64: 288-297. doi: 10.1016/j.yhbeh.2013.02.020
    [41] Moons P, Norekval TM (2006) Is sense of coherence a pathway for improving the quality of life of patients who grow up with chronic diseases? A hypothesis. Eur J Cardiovasc Nurs 5: 16-20. doi: 10.1016/j.ejcnurse.2005.10.009
    [42] Jürgensen M, Lux A, Wien SB, et al. (2014) Health-related quality of life in children with disorders of sex development (DSD). Eur J Pediatr 173: 893-903. doi: 10.1007/s00431-014-2264-z
    [43] Kleinemeier E, Jurgensen M, Lux A, et al. (2010) Psychological Adjustment and Sexual Development of Adolescents With Disorders of Sex Development. J Adolesc Health 47: 463-471. doi: 10.1016/j.jadohealth.2010.03.007
    [44] Jürgensen M, Kleinemeier, E, Lux A, et al. (2013) Psychosexual Development in Adolescents and Adults with Disorders of Sex Development—Results from the German Clinical Evaluation Study. J Sex Med 10: 2703-2714. doi: 10.1111/j.1743-6109.2012.02751.x
    [45] Cohen-Kettenis PT (2010) Psychosocial and psychosexual aspects of disorders of sex development. Best Pract Res Clin Endocrinol Metab 24: 325-334. doi: 10.1016/j.beem.2009.11.005
    [46] Sandberg DE, Gardner M, Cohen-Kettenis PT (2012) Psychological Aspects of the Treatment of Patients with Disorders of Sex Development. Semin Reprod Med 30: 443-452. doi: 10.1055/s-0032-1324729
  • Reader Comments
  • © 2015 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
通讯作者: 陈斌, bchen63@163.com
  • 1. 

    沈阳化工大学材料科学与工程学院 沈阳 110142

  1. 本站搜索
  2. 百度学术搜索
  3. 万方数据库搜索
  4. CNKI搜索

Metrics

Article views(5827) PDF downloads(1094) Cited by(1)

Figures and Tables

Figures(2)  /  Tables(4)

/

DownLoad:  Full-Size Img  PowerPoint
Return
Return

Catalog