
In this work, we analyzed simultaneous observations of solar particles and solar electromagnetic ultraviolet (UV) radiation during solar events from January 2024 to May 2024. Measurement campaigns to study the effects of space radiation on the terrestrial atmosphere were conducted in the framework of the project BIOSPHERE. We show the results of the campaign in Brussels from 1 January 2024 to 31 March 2024, during which several solar energetic particle (SEP) events were observed by the spacecraft GOES and OMNI, together with two big geomagnetic storms in March 2024 and May 2024 associated with solar eruptions. The last two events combine the arrival of a SEP event with a geomagnetic storm. On 11 May 2024, the biggest geomagnetic storm for the last 20 years was observed. These events enabled us to identify effects due to UV, solar particles, and geomagnetic storms. The impact of these events on the terrestrial radiation belts, illustrated by satellite observations like PROBA-V/EPT and on the atmospheric ozone using AURA/MLS is demonstrated. For the measurement campaign, muon and neutron monitors showed a Forbush decrease only during the geomagnetic storm at the end of March 2024 and in May 2024. Complemented by a simulation of radiation effects on the ionization rate of the atmosphere as a function of the altitude, the extensive range of different observations available during this measurement campaign demonstrated that SEP and geomagnetic storms due to solar eruptions had very different effects on the terrestrial atmosphere. The geomagnetic storms mainly modified the energetic electrons trapped in the space environment of the Earth and affected the ionization of the atmosphere above 60 km. They also modified the cosmic ray injections, mainly at high latitudes, creating Forbush decrease for the most intense ones. SEP events injected energetic protons in the atmosphere that could penetrate deeper in the atmosphere because they had more energy than the electrons. 
They could impact ozone, mainly at high altitude in the thermosphere. Solar activity variation associated with the rotation of the solar active regions in 27 days modulated UV. The measurements of these electromagnetic and particle radiations are crucial because they have important health implications.
Citation: Viviane Pierrard, David Bolsée, Alexandre Winant, Amer Al-Qaaod, Faton Krasniqi, Maximilien Péters de Bonhome, Edith Botek, Lionel Van Laeken, Danislav Sapundjiev, Roeland Van Malderen, Alexander Mangold, Iva Ambrozova, Marek Sommer, Jakub Slegl, Styliani A Geronikolou, Alexandros G Georgakilas, Alexander Dorn, Benjamin Rapp, Jaroslav Solc, Lukas Marek, Cristina Oancea, Lionel Doppler, Ronald Langer, Sarah Walsh, Marco Sabia, Marco Vuolo, Alex Papayannis, Carlos Granja. BIOSPHERE measurement campaign from January 2024 to March 2024 and in May 2024: Effects of the solar events on the radiation belts, UV radiation and ozone in the atmosphere[J]. AIMS Geosciences, 2025, 11(1): 117-154. doi: 10.3934/geosci.2025007
Group testing, or pooled testing, was first introduced by Dorfman [1] to identify syphilis infections among U.S. Army personnel during World War II. This approach involves combining specimens (e.g., blood, plasma, urine, swabs) from multiple individuals and conducting a single test to check for infection. According to Dorfman's procedure, if the combined sample tests negative, all individuals in this sample can be confirmed disease-free. Conversely, a positive result necessitates further testing to identify the affected individuals. This strategy gained prominence during the COVID-19 pandemic [2,3,4] and has been applied to detect various infectious diseases, including HIV [5,6], chlamydia and gonorrhea [7], influenza [8], and the Zika virus [9]. The primary motivation for pooled testing lies in its economic efficiency; for instance, the State Hygienic Laboratory at the University of Iowa saved approximately $3.1 million over five years by employing a modified Dorfman protocol for testing chlamydia and gonorrhea among residents of Iowa [10,11].
Despite its cost-effectiveness, group testing poses significant challenges for statistical analysis due to the absence of individual response data [12]. However, advancements in digital technology have provided access to rich covariate information, including demographic data, electronic health records, genomic data, lifestyle data, physiological monitoring data, imaging data, and environmental variables [13]. Integrating these covariates into statistical models for group testing has been shown to enhance accuracy and robustness, as evidenced by Mokalled et al. [14], Huang and Warasi [15], and Haber et al. [16]. This integration leads to improved estimates of individual risk probabilities, thereby reducing the number of tests required and the overall cost.
In managing covariates, single-index models offer advantages, such as less restrictive assumptions, good interpretability, and adaptability to high-dimensional data [17]. For high-dimensional single-index models, Radchenko [18] proposed a novel estimation method based on L1 regularization, extending it to generalized linear models. Elmezouar et al. [19] developed a functional single index expectile model with a nonparametric estimator to address spatial dependency in financial data, showing strong consistency and practical applicability. Chen and Samworth [20] explored generalized additive models, deriving non-parametric estimators for each additive component by maximizing the likelihood function, and adapted this approach to generalized additive index models. Kereta et al. [21] employed a k-nearest neighbor estimator, enhanced by geodesic metrics, to extend local linear regression for single-index models. However, research on generalized semi-parametric single-index models in high-dimensional contexts remains limited, particularly in group testing applications, which are still underexplored.
Most current integrations of covariate information with group testing are developed based on parametric regression models. For example, Wang et al. [22] introduced a comprehensive binary regression framework, while McMahan et al. [11] developed a Bayesian regression framework. Gregory et al. [23] adopted an adaptive elastic net method, which remains effective as data dimensionality increases. Ko et al. [24] compared commonly used group testing procedures with group lasso regarding true positive selection in high-dimensional genomic data analysis. Furthermore, nonparametric regression methods have gained traction for applying covariates in group testing. Delaigle and Hall [25] proposed a nonparametric method for estimating conditional probabilities and testing specificity and sensitivity, addressing the unique dilution effects and complex data structures inherent in group testing. Self et al. [26] introduced a Bayesian generalized additive regression method to tackle dilution effects further, while Yuan et al. [12] developed a semiparametric monotone regression model using the expectation-maximization (EM) algorithm to navigate the complexities of group testing data. Zuo et al. [27] proposed a more flexible generalized nonparametric additive model, utilizing B-splines and group lasso methods for model estimation in high-dimensional data.
This article proposes a generalized single-index group testing model aimed at enhancing flexibility in addressing various nonlinear models and facilitating the selection of important variables. Given the absence of individual disease testing results in group testing data, the EM algorithm is employed to perform the necessary calculations for the model. B-spline functions are utilized to approximate the nonlinear unknown smooth functions, with model parameters estimated by maximizing the likelihood function. In modern group testing, a substantial amount of individual covariate information is typically collected during sample testing. Consequently, a penalty term is incorporated into the likelihood function, promoting the construction of a sparse model and enabling effective variable selection. We apply the method to four group testing strategies: master pool, Dorfman, halving, and array. The method is evaluated using both simulated and real data.
The remaining sections are organized as follows. Section 2 introduces our model with B-spline approximation, detailing the corresponding algorithm employing the EM algorithm. Section 3 elaborates on the E-step in the EM algorithm, facilitating the acceleration of our algorithm's convergence. Sections 4 and 5 present comprehensive simulations and real data application, demonstrating the method's robust performance. Finally, we conclude our findings and provide some discussion in Section 6.
Consider a dataset comprising $n$ individuals. For each $i\in\{1,2,\ldots,n\}$, let the true disease status of the $i$-th individual be denoted by $\tilde{Y}_i\in\{0,1\}$, where $\tilde{Y}_i=1$ indicates disease presence and $\tilde{Y}_i=0$ indicates absence. Additionally, the dataset includes covariate information for each individual, represented as $X_i=(X_{i1},\ldots,X_{iq_n})^\top\in\mathbb{R}^{q_n}$, where $\mathbb{R}^{q_n}$ denotes the $q_n$-dimensional real vector space. We assume that the number of covariates $q_n$ is large, so that the problem is high-dimensional.
Let the risk probability for the $i$-th individual be defined as $p_i=\Pr(\tilde{Y}_i=1\mid X_i)$, where $i\in\{1,2,\ldots,n\}$. In many cases, the influence of covariates may be nonlinear; imposing linearity can result in inaccurate estimates. This study explores nonlinear scenarios, assuming that $p_i$ follows a flexible logistic single-index model, expressed as
$$\Pr(\tilde{Y}_i=1\mid X_i)=\frac{\exp[g(X_i^\top\beta)]}{1+\exp[g(X_i^\top\beta)]}, \tag{2.1}$$
where $\beta=(\beta_1,\beta_2,\ldots,\beta_{q_n})^\top\in\mathbb{R}^{q_n}$ represents the unknown parameters, and $g(\cdot)$ is an unknown smooth function capturing the relationship between covariates and risk probabilities.
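As a concrete reading of model (2.1), the following minimal sketch evaluates the risk probability for one individual. The function names are hypothetical, and the link function used in the example is an assumption chosen only for illustration; in the model itself $g(\cdot)$ is unknown and must be estimated.

```python
import math

def risk_probability(x, beta, g):
    """Evaluate p = exp(g(x'beta)) / (1 + exp(g(x'beta))) as in (2.1)."""
    index = sum(xj * bj for xj, bj in zip(x, beta))  # single index x'beta
    eta = g(index)
    # Numerically stable form of the logistic function.
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)

def example_g(u):
    """Illustrative link only (an assumption); the true g is unknown."""
    return math.exp(u) - 7.0

beta = [0.6, 0.8, 0.0]   # sparse coefficient vector with unit L2 norm
x = [1.0, -0.5, 2.0]     # covariates of one individual
p = risk_probability(x, beta, example_g)
```

Only the scalar index $X_i^\top\beta$ enters $g$, which is what keeps the model interpretable even when $q_n$ is large.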
In semiparametric single-index models, the true parameters are generally not identifiable without constraints, because both the function $g(\cdot)$ and the coefficient vector $\beta$ are unknown. To ensure the identifiability of $\beta$, we impose the classical constraint $\beta_1=\sqrt{1-\|\beta_{-1}\|_2^2}$, where $\beta_{-1}=(\beta_2,\beta_3,\ldots,\beta_{q_n})^\top\in\mathbb{R}^{q_n-1}$ and $\|\cdot\|_2$ denotes the $L_2$-norm. This is equivalent to requiring $\|\beta\|_2=1$ with $\beta_1\geq 0$. The $L_2$-norm constraint $\|\beta\|_2=1$ is crucial for the identifiability of $\beta$, as shown by Carroll et al. [28], Zhu et al. [29], Lin et al. [30], Cui et al. [31], and Guo et al. [32]. We assume that the true parameter $\beta^*$ is sparse and define the true model as $M^*=\{j\in\{1,2,\ldots,q_n\}:\beta^*_j\neq 0\}$.
For $i\in\{1,2,\ldots,n\}$, $\tilde{Y}_i$ follows a Binomial distribution with a single trial and success probability $p_i$, denoted as $\tilde{Y}_i\sim\mathrm{Binom}(1,p_i)$. In traditional single-index model studies, the true status $\tilde{Y}=\{\tilde{Y}_i, i=1,2,\ldots,n\}$ is directly observable. However, in group testing, $\tilde{Y}$ is unobservable [33]. This paper investigates parameter estimation and statistical inference for single-index models based on group testing data. Moreover, if a group test result is positive, further testing is required to identify infected individuals. These results may depend on shared characteristics, leading to correlations within group test outcomes and complicating the modeling.
In group testing, we partition the $n$ individuals into $J$ groups, denoted as $P_{1,1},P_{2,1},\ldots,P_{J,1}$. Here, $P_{j,1}$ represents the initial index set of individuals in the $j$-th group, so that $\cup_{j=1}^{J}P_{j,1}=\{1,2,\ldots,n\}$. For $j\in\{1,2,\ldots,J\}$, if the testing result for $P_{j,1}$ is positive, further testing may be warranted. Define $Z_j=\{Z_{j,l}, l=1,2,\ldots,L_j\}$ as the set of testing outcomes for the $j$-th group, where $L_j$ denotes the total number of tests conducted within the $j$-th group. Each $Z_{j,l}\in\{0,1\}$, where $Z_{j,l}=0$ indicates a negative result and $Z_{j,l}=1$ indicates a positive result. If $Z_{j,1}=0$, then $L_j=1$; otherwise, $L_j\geq 1$. Let $P_j=\{P_{j,l}, l=1,2,\ldots,L_j\}$, where $P_{j,l}$ is the set of individuals whose specimens contribute to the test $Z_{j,l}$. Define $\tilde{Z}_j=\{\tilde{Z}_{j,l}, l=1,2,\ldots,L_j\}$ as the true statuses corresponding to $Z_j$. The true statuses of the individuals determine the true status of each pool, $\tilde{Z}_{j,l}=I\big(\sum_{i\in P_{j,l}}\tilde{Y}_i>0\big)$, where $I(\cdot)$ denotes the indicator function; that is, a pool is truly positive exactly when it contains at least one diseased individual.
In practical applications, the test kits are subject to measurement error. We define $S_e=\Pr(Z_{j,l}=1\mid\tilde{Z}_{j,l}=1)$ as the sensitivity, representing the probability of correctly identifying positive samples, and $S_p=\Pr(Z_{j,l}=0\mid\tilde{Z}_{j,l}=0)$ as the specificity, denoting the probability of correctly identifying negative samples, where $l\in\{1,2,\ldots,L_j\}$ and $j\in\{1,2,\ldots,J\}$. According to the definitions of $S_e$ and $S_p$, given the true status $\tilde{Z}_{j,l}$, the testing result satisfies $Z_{j,l}\mid\tilde{Z}_{j,l}\sim\mathrm{Binom}\big(1,S_e^{\tilde{Z}_{j,l}}(1-S_p)^{1-\tilde{Z}_{j,l}}\big)$.
Our approach is based on two widely accepted fundamental assumptions in group testing. The first assumption is that $S_e$ and $S_p$ are independent of group size, supported by various studies [34,35,36,37]. The second assumption posits that, given the true statuses of the individuals in the $j$-th group $\{\tilde{Y}_i, i\in P_{j,1}\}$, the testing outcomes in $Z_j$ are mutually independent, as supported by previous research [23,34,35].
We apply our method to four group testing methods: master pool testing, Dorfman testing, halving testing, and array testing. Figure 1 illustrates the four procedures. (a) Master pool testing: a group of individuals (e.g., $P_{j,1}$ consisting of individuals 1, 2, 3, and 4) is tested as a whole to obtain the group testing result $Z_{j,1}$. (b) Dorfman testing: the group is first tested as in master pool testing, and if the result is positive ($Z_{j,1}=1$), each individual in the group is then tested separately to obtain the individual testing results $Z_{j,2}$, $Z_{j,3}$, $Z_{j,4}$, and $Z_{j,5}$. (c) Halving testing: the entire group (e.g., $P_{j,1}$) is tested as a whole; if the result is positive ($Z_{j,1}=1$), the group is divided into two subgroups (e.g., $P_{j,2}$ and $P_{j,3}$) for subgroup testing, and if the result of a subgroup test is positive (e.g., $Z_{j,2}=1$), the individuals in the positive subgroup are then tested individually. (d) Array testing: multiple individuals (e.g., 16 individuals) are arranged in an array for group testing to obtain multiple group testing results such as $Z_{j,1}$; if a specific group testing result is positive (e.g., $Z_{j,1}=1$), further subgroup testing is performed (e.g., obtaining results $Z_{j,2}$ and $Z_{j,3}$), and if the group testing results for both the row and the column in which an individual is located are positive (e.g., $Z_{j,3}=Z_{j,4}=Z_{j,7}=1$), the individuals concerned (e.g., the 6th and 10th individuals) are then tested.
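To make the notation for pools and outcomes concrete, the sketch below simulates Dorfman testing for a single group under imperfect sensitivity and specificity. It is an illustrative assumption-laden example rather than the simulation code used in the paper; the function names `run_test` and `dorfman_group` are hypothetical.

```python
import random

def run_test(true_status, se, sp, rng):
    """Observed 0/1 result for a pool whose true status is 0 or 1."""
    # P(positive) is Se for a truly positive pool and 1 - Sp otherwise.
    prob_positive = se if true_status == 1 else 1.0 - sp
    return 1 if rng.random() < prob_positive else 0

def dorfman_group(y_true, se, sp, rng):
    """Simulate Dorfman testing for one group.

    Returns a list of (pool_members, observed_result) pairs: the master
    pool first, followed by individual retests if the pool tests positive.
    """
    members = list(range(len(y_true)))
    pool_status = 1 if any(y_true) else 0   # pool is positive if anyone is
    outcomes = [(members, run_test(pool_status, se, sp, rng))]
    if outcomes[0][1] == 1:
        for i in members:
            outcomes.append(([i], run_test(y_true[i], se, sp, rng)))
    return outcomes

rng = random.Random(2024)
outcomes = dorfman_group([0, 1, 0, 0], se=0.98, sp=0.98, rng=rng)
```

The length of the returned list plays the role of $L_j$: it is 1 when the master pool tests negative and 5 for a group of four that tests positive.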
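Due to the nature of group testing, the true statuses of individuals, denoted as $\tilde{Y}$, remain unknown. Our objective is to estimate $M^*$, $\beta^*$, and $g(\cdot)$ based on the observed data $Z=\{Z_j, j=1,2,\ldots,J\}$ and the covariate information $X=(X_1,X_2,\ldots,X_n)^\top\in\mathbb{R}^{n\times q_n}$, in order to ascertain individual risk probabilities. The likelihood function based on the observed data $Z$ is defined as
$$P(Z\mid X)=\sum_{\tilde{Y}\in\{0,1\}^n}P(Z\mid\tilde{Y})\,P(\tilde{Y}\mid X), \tag{2.2}$$
where
$$P(Z\mid\tilde{Y})=\prod_{j=1}^{J}\prod_{l=1}^{L_j}P(Z_{j,l}\mid\tilde{Y}_{P_{j,l}}),$$
and $\tilde{Y}_{P_{j,l}}=\{\tilde{Y}_i, i\in P_{j,l}\}$ represents the set of true statuses for the individuals in $P_{j,l}$. Furthermore, the conditional probability $P(Z_{j,l}\mid\tilde{Y}_{P_{j,l}})$ is expressed as
$$P(Z_{j,l}\mid\tilde{Y}_{P_{j,l}})=\Big\{S_e^{\tilde{Z}_{j,l}}(1-S_p)^{1-\tilde{Z}_{j,l}}\Big\}^{Z_{j,l}}\Big\{(1-S_e)^{\tilde{Z}_{j,l}}S_p^{1-\tilde{Z}_{j,l}}\Big\}^{1-Z_{j,l}}.$$
The likelihood function for the true disease status $\tilde{Y}$ can be written as
$$P(\tilde{Y}\mid X)=\prod_{i=1}^{n}p_i^{\tilde{Y}_i}(1-p_i)^{1-\tilde{Y}_i}.$$
Combining this with the logistic single-index model defined in (2.1), we obtain the log-likelihood function for $\tilde{Y}$:
$$\ln P(\tilde{Y}\mid X)=\sum_{i=1}^{n}\Big\{\tilde{Y}_i\,g(X_i^\top\beta)-\ln\big(1+\exp[g(X_i^\top\beta)]\big)\Big\}. \tag{2.3}$$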
Since the smooth function $g(\cdot)$ is unknown, we approximate it using B-spline functions. Let the support interval of $g(\cdot)$ be $[a,b]$. We partition $[a,b]$ at the points $a=d_0<d_1<\cdots<d_N<d_{N+1}=b$, where $d_1,\ldots,d_N$ are referred to as knots or internal nodes. This division generates the subintervals $I_k=[d_k,d_{k+1})$ for $0\leq k\leq N-1$ and $I_N=[d_N,d_{N+1}]$, which are assumed to satisfy
$$\frac{\max_{0\leq k\leq N}|d_k-d_{k+1}|}{\min_{0\leq k\leq N}|d_k-d_{k+1}|}\leq M,$$
where $M\in(0,\infty)$. The B-spline basis functions of order $q$ are denoted as $\Phi(\cdot)=(\phi_1(\cdot),\phi_2(\cdot),\ldots,\phi_S(\cdot))^\top\in\mathbb{R}^{S}$, with $S=N+q$. Thus, $g(\cdot)$ can be approximated as
$$g(\cdot)\approx\sum_{s=1}^{S}\phi_s(\cdot)\gamma_s,$$
where $\gamma_s$ are the spline coefficients to be estimated [38]. Denote $\gamma=(\gamma_1,\gamma_2,\ldots,\gamma_S)^\top\in\mathbb{R}^{S}$. We approximate $g(X_i^\top\beta)$ as
$$g(X_i^\top\beta)\approx\Phi^\top(X_i^\top\beta)\gamma,$$
where $\Phi(X_i^\top\beta)=(\phi_1(X_i^\top\beta),\phi_2(X_i^\top\beta),\ldots,\phi_S(X_i^\top\beta))^\top$. Therefore, we approximate $p_i$ by a spline function and denote the spline approximation of $p_i$ by $p_{iB}$, defined as
$$p_{iB}=\frac{\exp[\Phi^\top(X_i^\top\beta)\gamma]}{1+\exp[\Phi^\top(X_i^\top\beta)\gamma]}. \tag{2.4}$$
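The basis construction can be sketched with the Cox-de Boor recursion. The code below is a minimal illustration, assuming a clamped knot vector (each boundary point repeated $q$ times), which yields exactly $S=N+q$ basis functions; the paper does not state which knot convention it uses, and the function names are hypothetical.

```python
def bspline_basis(u, interior_knots, a, b, order):
    """Values of the S = N + order B-spline basis functions at u in [a, b].

    Assumes a clamped knot vector: a and b are each repeated `order` times.
    """
    t = [a] * order + list(interior_knots) + [b] * order
    # Order-1 (piecewise constant) functions, one per knot span.
    vals = []
    for i in range(len(t) - 1):
        inside = t[i] <= u < t[i + 1]
        # Close the last non-empty span on the right so that u = b is covered.
        at_right_end = (u == b and t[i] < t[i + 1] == b)
        vals.append(1.0 if inside or at_right_end else 0.0)
    # Cox-de Boor recursion, raising the order one step at a time.
    for k in range(2, order + 1):
        new_vals = []
        for i in range(len(t) - k):
            left_den = t[i + k - 1] - t[i]
            right_den = t[i + k] - t[i + 1]
            left = (u - t[i]) / left_den * vals[i] if left_den > 0 else 0.0
            right = (t[i + k] - u) / right_den * vals[i + 1] if right_den > 0 else 0.0
            new_vals.append(left + right)
        vals = new_vals
    return vals

def spline_value(u, interior_knots, a, b, order, gamma):
    """Spline approximation Phi(u)' gamma of g(u)."""
    phi = bspline_basis(u, interior_knots, a, b, order)
    return sum(p * c for p, c in zip(phi, gamma))

# Cubic splines (order 4) with N = 3 interior knots give S = 7 functions.
phi = bspline_basis(0.3, [0.25, 0.5, 0.75], 0.0, 1.0, 4)
```

Plugging `spline_value` into the logistic function gives the approximation $p_{iB}$ in (2.4). The basis functions are non-negative and sum to one on $[a,b]$, so constant coefficients reproduce a constant function.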
In the following, we use the spline approximation $p_{iB}$ of $p_i$ to construct the log-likelihood function and the objective function in the subsequent EM algorithm. Thus, the log-likelihood function (2.3) for $\tilde{Y}$ can be reformulated as
$$\ln P_B(\tilde{Y}\mid X)=\sum_{i=1}^{n}\Big\{\tilde{Y}_i\,\Phi^\top(X_i^\top\beta)\gamma-\ln\big(1+\exp[\Phi^\top(X_i^\top\beta)\gamma]\big)\Big\}.$$
Furthermore, the target likelihood function (2.2) can be represented as
$$P_B(Z\mid X)=\sum_{\tilde{Y}\in\{0,1\}^n}P(Z\mid\tilde{Y})\,P_B(\tilde{Y}\mid X).$$
By employing the spline approximation, we transform the problem of estimating $\beta^*_{-1}$ and $g(\cdot)$ into that of estimating $\beta^*_{-1}$ and $\gamma$.
For high-dimensional group testing data, we aim to estimate $\beta^*_{-1}$ using a penalized approach within the single-index model framework. The penalized log-likelihood function is defined as follows:
$$\ln P_B(Z\mid X)-\sum_{j=2}^{q_n}P_\lambda(\beta_j), \tag{2.5}$$
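where $P_\lambda(\cdot)$ is the penalty function and $\lambda$ is a tuning parameter. We consider three common penalty functions: LASSO [39], SCAD [40], and MCP [41]. Specifically, for LASSO, $P_\lambda(x)=\lambda|x|$. For SCAD, it is defined as
$$P_\lambda(x)=\begin{cases}\lambda|x| & \text{if } |x|\leq\lambda,\\[4pt] \dfrac{-x^2+2\delta\lambda|x|-\lambda^2}{2(\delta-1)} & \text{if } \lambda<|x|\leq\delta\lambda,\\[8pt] \dfrac{(\delta+1)\lambda^2}{2} & \text{if } |x|>\delta\lambda,\end{cases}$$
where $\delta>2$. In MCP, the penalty function is given by
$$P_\lambda(x)=\begin{cases}\lambda|x|-\dfrac{x^2}{2\delta} & \text{if } |x|\leq\delta\lambda,\\[8pt] \dfrac{1}{2}\delta\lambda^2 & \text{if } |x|>\delta\lambda,\end{cases}$$
with $\delta>1$. The following section details the parameter estimation process.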
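The three penalties translate directly into code. The sketch below is a plain transcription of the formulas above; the default values of $\delta$ (3.7 for SCAD and 2 for MCP) are the ones used later in the simulation study, and the function names are hypothetical.

```python
def lasso_penalty(x, lam):
    """LASSO penalty: lam * |x|."""
    return lam * abs(x)

def scad_penalty(x, lam, delta=3.7):
    """SCAD penalty with shape parameter delta > 2."""
    ax = abs(x)
    if ax <= lam:
        return lam * ax
    if ax <= delta * lam:
        return (-ax * ax + 2.0 * delta * lam * ax - lam * lam) / (2.0 * (delta - 1.0))
    return (delta + 1.0) * lam * lam / 2.0

def mcp_penalty(x, lam, delta=2.0):
    """MCP penalty with shape parameter delta > 1."""
    ax = abs(x)
    if ax <= delta * lam:
        return lam * ax - ax * ax / (2.0 * delta)
    return delta * lam * lam / 2.0

# Penalties for a small and a large coefficient at lam = 0.5.
small = [f(0.1, 0.5) for f in (lasso_penalty, scad_penalty, mcp_penalty)]
large = [f(10.0, 0.5) for f in (lasso_penalty, scad_penalty, mcp_penalty)]
```

Unlike LASSO, both SCAD and MCP are constant for $|x|>\delta\lambda$, so large coefficients are not shrunk further; this is the usual motivation for preferring them when the nonzero effects are strong.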
The penalized log-likelihood function (2.5) does not involve the individual latent statuses $\tilde{Y}$. The complete-data penalized log-likelihood function can be expressed as
$$\ln P_B(Z,\tilde{Y}\mid X)-\sum_{j=2}^{q_n}P_\lambda(\beta_j)=\ln P(Z\mid\tilde{Y})+\ln P_B(\tilde{Y}\mid X)-\sum_{j=2}^{q_n}P_\lambda(\beta_j). \tag{2.6}$$
Notably, $\ln P(Z\mid\tilde{Y})$ depends solely on the known parameters $S_e$ and $S_p$, allowing us to disregard it in computations. The presence of the latent variable $\tilde{Y}$ complicates direct maximization of the complete-data penalized log-likelihood function (2.6). Therefore, we employ the EM algorithm, which comprises two steps: the Expectation (E) step and the Maximization (M) step.
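In the E step, given the observed data $Z$ and the parameters from the $t$-th iteration $(\beta_{-1}^{(t)},\gamma^{(t)})$, we calculate the following function:
$$\begin{aligned}S^{(t)}(\beta_{-1},\gamma)&=E\Big\{\sum_{i=1}^{n}\Big\{\tilde{Y}_i\,\Phi^\top(X_i^\top\beta)\gamma-\ln\big(1+\exp[\Phi^\top(X_i^\top\beta)\gamma]\big)\Big\}\,\Big|\,Z,\beta_{-1}^{(t)},\gamma^{(t)}\Big\}-\sum_{j=2}^{q_n}P_\lambda(\beta_j)\\&=\sum_{i=1}^{n}\Big\{w_i^{(t)}\,\Phi^\top(X_i^\top\beta)\gamma-\ln\big(1+\exp[\Phi^\top(X_i^\top\beta)\gamma]\big)\Big\}-\sum_{j=2}^{q_n}P_\lambda(\beta_j),\end{aligned} \tag{2.7}$$
where $w_i^{(t)}=E[\tilde{Y}_i\mid Z,\gamma^{(t)},\beta_{-1}^{(t)}]$, $i=1,2,\ldots,n$. The calculation of $w_i^{(t)}$ varies among the four group testing methods and is discussed in Section 3.
In the M step, we update $\beta_{-1}^{(t+1)}$ and $\gamma^{(t+1)}$ in turn. First, we update $\gamma^{(t+1)}$ by maximizing
$$S^{(t)}(\beta_{-1}^{(t)},\gamma)=\sum_{i=1}^{n}\Big\{w_i^{(t)}\,\Phi^\top(X_i^\top\beta^{(t)})\gamma-\ln\big(1+\exp[\Phi^\top(X_i^\top\beta^{(t)})\gamma]\big)\Big\}-\sum_{j=2}^{q_n}P_\lambda(\beta_j^{(t)}). \tag{2.8}$$
Subsequently, we maximize $S^{(t)}(\beta_{-1},\gamma^{(t+1)})$ to update the parameters $\beta_{-1}^{(t+1)}$:
$$S^{(t)}(\beta_{-1},\gamma^{(t+1)})=\sum_{i=1}^{n}\Big\{w_i^{(t)}\,\Phi^\top(X_i^\top\beta)\gamma^{(t+1)}-\ln\big(1+\exp[\Phi^\top(X_i^\top\beta)\gamma^{(t+1)}]\big)\Big\}-\sum_{j=2}^{q_n}P_\lambda(\beta_j). \tag{2.9}$$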
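Given that $\beta_{-1}$ appears in each B-spline basis function $\phi_s(X_i^\top\beta)$, direct iteration presents challenges. Let $\tilde{g}^{(t)}(X_i^\top\beta)=\Phi^\top(X_i^\top\beta)\gamma^{(t+1)}$. We apply the approach of Guo et al. [42], approximating $\tilde{g}^{(t)}(X_i^\top\beta)$ via a first-order Taylor expansion
$$\tilde{g}^{(t)}(X_i^\top\beta)\approx\tilde{g}^{(t)}(X_i^\top\beta^{(t)})+\tilde{g}^{(t)\prime}(X_i^\top\beta^{(t)})\,X_i^\top J(\beta^{(t)})\,(\beta_{-1}-\beta_{-1}^{(t)}),$$
where $J(\beta)=\partial\beta/\partial\beta_{-1}=\big(-\beta_{-1}/\sqrt{1-\|\beta_{-1}\|_2^2},\,I_{q_n-1}\big)^\top$ represents the Jacobian matrix of size $q_n\times(q_n-1)$ and $I_{q_n-1}$ denotes the $(q_n-1)$-dimensional identity matrix. This approximation is incorporated into $S^{(t)}(\beta_{-1},\gamma^{(t+1)})$ to maximize the expression and update $\beta_{-1}^{(t+1)}$. Therefore, we approximate $S^{(t)}(\beta_{-1},\gamma^{(t+1)})$ by $\tilde{S}^{(t)}(\beta_{-1},\gamma^{(t+1)})$ as follows:
$$\begin{aligned}\tilde{S}^{(t)}(\beta_{-1},\gamma^{(t+1)})&=\sum_{i=1}^{n}\Big\{w_i^{(t)}\,\tilde{g}^{(t)}(X_i^\top\beta)-\ln\big(1+\exp[\tilde{g}^{(t)}(X_i^\top\beta)]\big)\Big\}-\sum_{j=2}^{q_n}P_\lambda(\beta_j)\\&=\sum_{i=1}^{n}\Big\{w_i^{(t)}\big[\tilde{g}^{(t)}(X_i^\top\beta^{(t)})+\tilde{g}^{(t)\prime}(X_i^\top\beta^{(t)})\,X_i^\top J(\beta^{(t)})(\beta_{-1}-\beta_{-1}^{(t)})\big]\\&\qquad-\ln\Big(1+\exp\big[\tilde{g}^{(t)}(X_i^\top\beta^{(t)})+\tilde{g}^{(t)\prime}(X_i^\top\beta^{(t)})\,X_i^\top J(\beta^{(t)})(\beta_{-1}-\beta_{-1}^{(t)})\big]\Big)\Big\}-\sum_{j=2}^{q_n}P_\lambda(\beta_j).\end{aligned} \tag{2.10}$$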
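The role of the Jacobian in this linearization can be checked numerically. The sketch below, with hypothetical function names and made-up numbers, builds $J(\beta)$ under the constraint $\beta_1=\sqrt{1-\|\beta_{-1}\|_2^2}$ and compares the exact index $X_i^\top\beta$ with its first-order approximation around $\beta^{(t)}$. Only the inner index is linearized here; the expansion of $\tilde{g}^{(t)}$ then follows from the chain rule.

```python
import math

def full_beta(beta_rest):
    """Map beta_{-1} to beta using beta_1 = sqrt(1 - ||beta_{-1}||^2)."""
    beta1 = math.sqrt(1.0 - sum(b * b for b in beta_rest))
    return [beta1] + list(beta_rest)

def jacobian(beta_rest):
    """J(beta) = d beta / d beta_{-1}, a q x (q - 1) nested list."""
    beta1 = full_beta(beta_rest)[0]
    m = len(beta_rest)
    first_row = [-b / beta1 for b in beta_rest]   # derivative of beta_1
    identity_rows = [[1.0 if r == c else 0.0 for c in range(m)] for r in range(m)]
    return [first_row] + identity_rows

def linearized_index(x, beta_rest_t, beta_rest):
    """First-order approximation of x'beta around the current iterate."""
    beta_t = full_beta(beta_rest_t)
    jac = jacobian(beta_rest_t)
    base = sum(xi * bi for xi, bi in zip(x, beta_t))
    step = [b - bt for b, bt in zip(beta_rest, beta_rest_t)]
    # x'J is a row vector of length q - 1.
    x_jac = [sum(x[r] * jac[r][c] for r in range(len(x))) for c in range(len(step))]
    return base + sum(v * s for v, s in zip(x_jac, step))

x = [1.0, 2.0, -1.0]          # illustrative covariates
beta_rest_t = [0.3, 0.4]      # current iterate of beta_{-1}
beta_rest = [0.301, 0.399]    # nearby candidate value
exact = sum(xi * bi for xi, bi in zip(x, full_beta(beta_rest)))
approx = linearized_index(x, beta_rest_t, beta_rest)
```

For a small step the two values agree up to a second-order error, which is why the linearized objective (2.10) is a reasonable surrogate within one M-step update.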
We employ stochastic gradient descent [43] and coordinate descent [44] to update $\gamma$ and $\beta$, respectively. Let $\hat{\gamma}$ and $\hat{\beta}_{-1}$ denote the estimated parameters, and let $\hat{M}=\{j\in\{1,2,\ldots,q_n\}:\hat{\beta}_j\neq 0\}$ represent the estimated model. Furthermore, $\hat{\gamma}$ and $\hat{\beta}_{-1}$ can be used to calculate individual risk probabilities and guide subsequent testing strategies. In summary, the EM algorithm offers a structured approach to handling the latent variable $\tilde{Y}$ and estimating the model parameters. The detailed steps of this method are summarized in Algorithm 1.
Algorithm 1: Regularized single-index model for group testing.
Input: $Z$, $X$, $t_{\max}$, and the initialization $(\beta_{-1}^{(0)},\gamma^{(0)})$.
For $t=0,1,2,\ldots,t_{\max}$:
● Step 1 (E-step): Given the parameters $(\beta_{-1}^{(t)},\gamma^{(t)})$ and $Z$, calculate the conditional expectation $S^{(t)}(\beta_{-1},\gamma)$ in (2.7).
● Step 2 (M-step): Update the iterative parameters $\beta_{-1}^{(t+1)}$ and $\gamma^{(t+1)}$ in two substeps:
  1. Update $\gamma^{(t+1)}$ by maximizing $S^{(t)}(\beta_{-1}^{(t)},\gamma)$ in (2.8).
  2. Update $\beta_{-1}^{(t+1)}$ by maximizing $\tilde{S}^{(t)}(\beta_{-1},\gamma^{(t+1)})$ in (2.10).
End for: Repeat Steps 1 and 2 until the parameters converge or the maximum number of iterations $t_{\max}$ is reached.
Output: The estimates $\hat{\beta}_{-1}$ and $\hat{\gamma}$.
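The alternation between the E-step and the M-step is easiest to see in a deliberately simplified setting. The sketch below is not Algorithm 1: it assumes master pool testing with a common risk probability $p$ for all individuals (no covariates, no spline, no penalty), so the M-step reduces to averaging the E-step weights. The function name and the toy data are hypothetical.

```python
def em_prevalence(pool_results, pool_size, se, sp, p0=0.1, max_iter=500, tol=1e-10):
    """EM estimate of a common prevalence p from master pool results.

    Simplified analogue of the E/M alternation: every individual shares the
    same risk p, so the M-step is the mean of the E-step weights.
    """
    n_pools = len(pool_results)
    n_pos = sum(pool_results)
    n_neg = n_pools - n_pos
    p = p0
    for _ in range(max_iter):
        # P(Z_j = 1) under the current value of p.
        delta = se + (1.0 - se - sp) * (1.0 - p) ** pool_size
        # E-step: posterior probability of disease for a member of a
        # positive pool (w_pos) or of a negative pool (w_neg).
        w_pos = se * p / delta
        w_neg = (1.0 - se) * p / (1.0 - delta)
        # M-step: average weight over all individuals (pool sizes are equal).
        p_new = (n_pos * w_pos + n_neg * w_neg) / n_pools
        if abs(p_new - p) < tol:
            return p_new
        p = p_new
    return p

# Toy data: 30 positive and 70 negative pools of size 4.
results = [1] * 30 + [0] * 70
p_hat = em_prevalence(results, pool_size=4, se=0.98, sp=0.98)
```

In this toy model the EM fixed point can be compared with the closed-form estimate obtained by equating $P(Z_j=1)$ to the observed fraction of positive pools. In the full algorithm the averaging step is replaced by the penalized maximizations (2.8) and (2.10).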
Implementing Algorithm 1 requires deriving formulas to calculate the conditional expectations of individuals' true statuses. These expressions are essential for the effective application of the EM algorithm in various testing scenarios. Common group testing methods include master pool testing, Dorfman testing, halving testing, and array testing. We have derived the conditional expectation formula of these methods under our methodological framework, which will facilitate our other calculations.
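For master pool testing, samples are divided into $J$ distinct groups, with each sample assigned to exactly one group, and each group undergoes a single test without subsequent testing. When the $i$-th individual is assigned to the $j$-th group, we consider two cases for $w_i^{(t)}$.
When $Z_j=0$,
$$w_i^{(t)}=\frac{P(\tilde{Y}_i=1,Z_j=0)}{P(Z_j=0)}=\frac{P(Z_j=0\mid\tilde{Y}_i=1)\,P(\tilde{Y}_i=1)}{P(Z_j=0)}.$$
Since
$$\begin{aligned}P(Z_j=1)&=P(Z_j=1\mid\tilde{Z}_j=1)P(\tilde{Z}_j=1)+P(Z_j=1\mid\tilde{Z}_j=0)P(\tilde{Z}_j=0)\\&=S_e\Big[1-\prod_{i\in P_j}(1-p_{iB}^{(t)})\Big]+(1-S_p)\prod_{i\in P_j}(1-p_{iB}^{(t)})\\&=S_e+(1-S_e-S_p)\prod_{i\in P_j}(1-p_{iB}^{(t)}),\end{aligned}$$
we let $\Delta_j=S_e+(1-S_e-S_p)\prod_{i\in P_j}(1-p_{iB}^{(t)})$, where $p_{iB}^{(t)}$ is the spline approximation (2.4) of $p_i$ evaluated at the $t$-th iterate. Therefore,
$$P(Z_j=0)=1-\Big[S_e+(1-S_e-S_p)\prod_{i\in P_j}(1-p_{iB}^{(t)})\Big]=1-\Delta_j.$$
Moreover, $\tilde{Y}_i=1$ implies $\tilde{Z}_j=1$, so that $P(Z_j=0\mid\tilde{Y}_i=1)=1-S_e$. Then,
$$w_i^{(t)}=\frac{(1-S_e)\,p_{iB}^{(t)}}{1-\big[S_e+(1-S_e-S_p)\prod_{i\in P_j}(1-p_{iB}^{(t)})\big]}=\frac{(1-S_e)\,p_{iB}^{(t)}}{1-\Delta_j}.$$
When $Z_j=1$,
$$w_i^{(t)}=P(\tilde{Y}_i=1\mid Z_j=1)=\frac{P(Z_j=1\mid\tilde{Y}_i=1)\,P(\tilde{Y}_i=1)}{P(Z_j=1)}=\frac{S_e\,p_{iB}^{(t)}}{S_e+(1-S_e-S_p)\prod_{i\in P_j}(1-p_{iB}^{(t)})}=\frac{S_e\,p_{iB}^{(t)}}{\Delta_j}.$$
In conclusion,
$$w_i^{(t)}=\begin{cases}P(\tilde{Y}_i=1\mid Z_j=0)=(1-S_e)\,p_{iB}^{(t)}/(1-\Delta_j), & \text{if } Z_j=0,\\[4pt] P(\tilde{Y}_i=1\mid Z_j=1)=S_e\,p_{iB}^{(t)}/\Delta_j, & \text{if } Z_j=1.\end{cases}$$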
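The closed-form weights for master pool testing can be verified against a brute-force evaluation of the posterior. The following sketch, with hypothetical function names and illustrative risk probabilities, computes the weights from the formula above and, for comparison, by enumerating all true-status configurations of a small pool.

```python
from itertools import product

def master_pool_weights(p_group, z, se, sp):
    """Closed-form E-step weights for one master pool with result z."""
    prod_neg = 1.0
    for p in p_group:
        prod_neg *= 1.0 - p
    delta = se + (1.0 - se - sp) * prod_neg      # P(Z_j = 1)
    if z == 1:
        return [se * p / delta for p in p_group]
    return [(1.0 - se) * p / (1.0 - delta) for p in p_group]

def brute_force_weights(p_group, z, se, sp):
    """Same weights obtained by summing over all true-status vectors."""
    k = len(p_group)
    numer = [0.0] * k
    denom = 0.0
    for y in product([0, 1], repeat=k):
        prior = 1.0
        for yi, p in zip(y, p_group):
            prior *= p if yi == 1 else 1.0 - p
        prob_pos = se if any(y) else 1.0 - sp    # P(Z_j = 1 | y)
        lik = prob_pos if z == 1 else 1.0 - prob_pos
        denom += lik * prior
        for i, yi in enumerate(y):
            if yi == 1:
                numer[i] += lik * prior
    return [v / denom for v in numer]

p_group = [0.05, 0.10, 0.20, 0.02]   # illustrative risk probabilities
w_closed = master_pool_weights(p_group, 1, 0.98, 0.98)
w_brute = brute_force_weights(p_group, 1, 0.98, 0.98)
```

The enumeration grows as $2^k$ in the pool size $k$, which is why closed-form expressions are needed for the E-step in practice.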
We apply our method to four group testing algorithms: master pool testing, Dorfman testing, halving testing, and array testing. For other algorithms, detailed expressions can be found in Appendix C. Using these expressions, we apply the EM algorithm to estimate the model parameters.
In this section, we assess the performance of the proposed method using simulated datasets. The generation of covariates follows the approach described by Guo et al. [42]. Specifically, the covariates $X\in\mathbb{R}^{n\times q_n}$ are drawn from a truncated multivariate normal distribution. We first generate covariates from $N(0,\Sigma)$, where $\Sigma\in\mathbb{R}^{q_n\times q_n}$ and $\Sigma_{ij}=0.5^{|i-j|}$ for $1\leq i,j\leq q_n$. These covariates are then truncated to the range $(-2,2)$ to obtain $X$. We consider logistic single-index models to describe $p_i=\Pr(\tilde{Y}_i=1\mid X_i)$, with the function $g(X_i^\top\beta)$ in model (2.1) defined as in the following four examples.
Example 4.1. We set $n=500$ and $\beta^*=\big(\tfrac{3}{\sqrt{15.25}},\tfrac{2.5}{\sqrt{15.25}},0,\ldots,0\big)^\top$. We consider two scenarios: $q_n=50$ and $q_n=100$. The model is described as follows:
$$g(X_i^\top\beta^*)=\exp(X_i^\top\beta^*)-7.$$
Under this setting, the disease prevalence is approximately 8.93%.
Example 4.2. We set $n=1000$ and $\beta^*=\big(\tfrac{1}{\sqrt{3}},\tfrac{1}{\sqrt{3}},\tfrac{1}{\sqrt{3}},0,\ldots,0\big)^\top$. We consider two scenarios: $q_n=100$ and $q_n=500$. The model is described as follows:
$$g(X_i^\top\beta^*)=X_i^\top\beta^*(1-X_i^\top\beta^*)+\exp(X_i^\top\beta^*)-6.$$
In this example, the disease prevalence is approximately 11.41%.
Example 4.3. We set $q_n=50$ and $\beta^*=\big(\tfrac{9}{\sqrt{181}},\tfrac{8}{\sqrt{181}},\tfrac{6}{\sqrt{181}},0,\ldots,0\big)^\top$. We consider two scenarios: $n=500$ and $n=1000$. The model is described as follows:
$$g(X_i^\top\beta^*)=X_i^\top\beta^*(1-X_i^\top\beta^*)+0.5\cdot\sin\Big(\frac{\pi X_i^\top\beta^*}{2}\Big)-6.$$
In this example, the disease prevalence is approximately 9.42%.
Example 4.4. We set $q_n=100$ and $\beta^*=(0.5,0.5,0.5,0.5,0,\ldots,0)^\top$. Two scenarios are considered: $n=750$ and $n=1000$. The model is described as follows:
$$g(X_i^\top\beta^*)=X_i^\top\beta^*(1-X_i^\top\beta^*)+\exp(X_i^\top\beta^*)+0.1\cdot\sin\Big(\frac{\pi X_i^\top\beta^*}{2}\Big)-6.$$
In this scenario, the disease prevalence is approximately 10.32%.
In our simulation study, we employed four group testing algorithms to evaluate the model: master pool testing (MPT), Dorfman testing (DT), halving testing (HT), and array testing (AT). For MPT, DT, and HT, the group size was set to 4, while in AT, individuals were arranged in a $4\times 4$ array. Both sensitivity and specificity were fixed at $S_e=S_p=0.98$. Following Fan and Li [40] and Zhang [41], we set $\delta=3.7$ for SCAD and $\delta=2$ for MCP. Each scenario was simulated $B=100$ times, where $\hat{\beta}^{[b]}$ denotes the estimate of $\beta^*$ in the $b$-th simulation, with $b\in\{1,2,\ldots,B\}$.
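Following the approach of Guan et al. [45], we measured the estimation accuracy of $\hat{\beta}_j$ ($j=1,2,3,4$) using the mean squared error (MSE), defined as
$$\mathrm{MSE}=\frac{1}{B}\sum_{b=1}^{B}\big(\beta^*_j-\hat{\beta}^{[b]}_j\big)^2,\quad j=1,2,3,4.$$
We used the average mean squared error (AMSE) to assess the accuracy of $\hat{\beta}$, consistent with the methods employed by Wang and Yang [46]:
$$\mathrm{AMSE}=\frac{1}{Bq_n}\sum_{b=1}^{B}\big\|\beta^*-\hat{\beta}^{[b]}\big\|_2^2.$$
The average mean absolute error (AMAE) was used to evaluate the estimation performance for $g(\cdot)$ and the individual risk probabilities $p_i$ [42]. The AMAE for $g(\cdot)$ is defined as
$$\mathrm{AMAE}_g=\frac{1}{Bn}\sum_{b=1}^{B}\sum_{i=1}^{n}\big|g(X_i^\top\beta^*)-g(X_i^\top\hat{\beta}^{[b]})\big|,$$
while the AMAE for $\hat{p}^{[b]}_i=\dfrac{e^{g(X_i^\top\hat{\beta}^{[b]})}}{1+e^{g(X_i^\top\hat{\beta}^{[b]})}}$ is defined as
$$\mathrm{AMAE}_p=\frac{1}{Bn}\sum_{b=1}^{B}\sum_{i=1}^{n}\big|p^*_i-\hat{p}^{[b]}_i\big|,$$
where $p^*_i=\dfrac{e^{g(X_i^\top\beta^*)}}{1+e^{g(X_i^\top\beta^*)}}$.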
To evaluate variable selection performance, we employed the true positive rate (TPR) and the false positive rate (FPR). The TPR is the proportion of truly relevant predictors that are selected, while the FPR is the proportion of irrelevant predictors that are incorrectly selected. Table 1 defines the four possible outcomes of variable selection on which these rates are based. TPR and FPR are defined as follows:
$$\mathrm{TPR}=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}},\qquad \mathrm{FPR}=\frac{\mathrm{FP}}{\mathrm{FP}+\mathrm{TN}}.$$
Metric | Implication |
True positive (TP) | Actual positive and predicted positive |
False positive (FP) | Actual negative and predicted positive |
False negative (FN) | Actual positive and predicted negative |
True negative (TN) | Actual negative and predicted negative |
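The two selection rates can be computed from the supports of the true and estimated coefficient vectors. The sketch below is a minimal illustration with a hypothetical function name and made-up coefficient vectors; it assumes that the true model contains at least one relevant and at least one irrelevant predictor, so that both denominators are nonzero.

```python
def selection_rates(beta_true, beta_hat):
    """Return (TPR, FPR) for variable selection.

    A predictor is 'actual positive' if its true coefficient is nonzero and
    'predicted positive' if its estimated coefficient is nonzero.
    """
    tp = fp = fn = tn = 0
    for bt, bh in zip(beta_true, beta_hat):
        actual = bt != 0
        predicted = bh != 0
        if actual and predicted:
            tp += 1
        elif actual and not predicted:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    # Assumes tp + fn > 0 and fp + tn > 0.
    return tp / (tp + fn), fp / (fp + tn)

beta_true = [0.6, 0.8, 0.0, 0.0, 0.0]   # two relevant predictors
beta_hat = [0.5, 0.0, 0.1, 0.0, 0.0]    # one hit, one miss, one false alarm
tpr, fpr = selection_rates(beta_true, beta_hat)
```

In the simulation study these rates would be averaged over the $B$ replications.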
The simulation results are summarized in Tables 2 to 5. As shown in the tables, the TPR was approximately 97%, with a very low FPR. This indicates that the probability that $M^*$ is contained in $\hat{M}$ is very close to 1, which demonstrates the notable performance of our model in variable selection. The AMAE for $g(\cdot)$ and $p_i$ was approximately 0.5 and 0.01, respectively. This shows that the method captures the form of the unknown smooth function $g(\cdot)$ and predicts the individual risk probabilities accurately. The AMSE for the model parameters $\beta$ was around $10^{-4}$, while the MSE for the significant variables $\beta_j$ was approximately $10^{-3}$. This demonstrates the accuracy of our model in parameter estimation.
Model | Setting | Test | Penalty | TPR | FPR | AMAE $g(\cdot)$ | AMAE Prob | AMSE $\beta$ | MSE $\beta_1$ | MSE $\beta_2$ |
Example 4.1 ($n=500$) | $q_n=50$ | MPT | MCP | 0.980 | 0.061 | 0.325 | 0.011 | 0.0003 | 0.0022 | 0.0015 |
Example 4.1 ($n=500$) | $q_n=50$ | HT | MCP | 0.985 | 0.003 | 0.413 | 0.007 | 0.0002 | 0.0068 | 0.0025 |
Example 4.1 ($n=500$) | $q_n=50$ | DT | MCP | 0.968 | 0.062 | 0.295 | 0.011 | 0.0003 | 0.0001 | 0.0004 |
Example 4.1 ($n=500$) | $q_n=50$ | AT | MCP | 0.987 | 0.035 | 0.388 | 0.009 | 0.0004 | 0.0019 | 0.0009 |
Example 4.1 ($n=500$) | $q_n=50$ | MPT | SCAD | 0.967 | 0.060 | 0.508 | 0.014 | 0.0003 | 0.0012 | 0.0021 |
Example 4.1 ($n=500$) | $q_n=50$ | HT | SCAD | 0.988 | 0.001 | 0.479 | 0.008 | 0.0001 | 0.0021 | 0.0023 |
Example 4.1 ($n=500$) | $q_n=50$ | DT | SCAD | 0.980 | 0.051 | 0.511 | 0.012 | 0.0003 | 0.0005 | 0.0004 |
Example 4.1 ($n=500$) | $q_n=50$ | AT | SCAD | 0.974 | 0.063 | 0.432 | 0.009 | 0.0003 | 0.0035 | 0.0024 |
Example 4.1 ($n=500$) | $q_n=50$ | MPT | LASSO | 0.964 | 0.060 | 0.337 | 0.011 | 0.0003 | 0.0038 | 0.0051 |
Example 4.1 ($n=500$) | $q_n=50$ | HT | LASSO | 0.986 | 0.003 | 0.436 | 0.006 | 0.0001 | 0.0003 | 0.0002 |
Example 4.1 ($n=500$) | $q_n=50$ | DT | LASSO | 1.000 | 0.029 | 0.522 | 0.013 | 0.0001 | 0.0004 | 0.0003 |
Example 4.1 ($n=500$) | $q_n=50$ | AT | LASSO | 0.981 | 0.034 | 0.320 | 0.007 | 0.0001 | 0.0006 | 0.0008 |
Example 4.1 ($n=500$) | $q_n=100$ | MPT | MCP | 0.985 | 0.010 | 0.511 | 0.009 | 0.0001 | 0.0004 | 0.0004 |
Example 4.1 ($n=500$) | $q_n=100$ | HT | MCP | 0.973 | 0.038 | 0.374 | 0.009 | 0.0002 | 0.0022 | 0.0033 |
Example 4.1 ($n=500$) | $q_n=100$ | DT | MCP | 0.986 | 0.023 | 0.338 | 0.010 | 0.0001 | 0.0004 | 0.0001 |
Example 4.1 ($n=500$) | $q_n=100$ | AT | MCP | 0.982 | 0.023 | 0.470 | 0.005 | 0.0001 | 0.0004 | 0.0005 |
Example 4.1 ($n=500$) | $q_n=100$ | MPT | SCAD | 0.987 | 0.031 | 0.265 | 0.013 | 0.0002 | 0.0002 | 0.0003 |
Example 4.1 ($n=500$) | $q_n=100$ | HT | SCAD | 0.988 | 0.038 | 0.458 | 0.015 | 0.0005 | 0.0017 | 0.0001 |
Example 4.1 ($n=500$) | $q_n=100$ | DT | SCAD | 0.978 | 0.051 | 0.451 | 0.011 | 0.0001 | 0.0008 | 0.0004 |
Example 4.1 ($n=500$) | $q_n=100$ | AT | SCAD | 0.985 | 0.010 | 0.422 | 0.009 | 0.0001 | 0.0058 | 0.0047 |
Example 4.1 ($n=500$) | $q_n=100$ | MPT | LASSO | 0.987 | 0.026 | 0.478 | 0.010 | 0.0001 | 0.0008 | 0.0012 |
Example 4.1 ($n=500$) | $q_n=100$ | HT | LASSO | 0.966 | 0.044 | 0.364 | 0.011 | 0.0003 | 0.0029 | 0.0052 |
Example 4.1 ($n=500$) | $q_n=100$ | DT | LASSO | 0.984 | 0.031 | 0.503 | 0.012 | 0.0001 | 0.0001 | 0.0003 |
Example 4.1 ($n=500$) | $q_n=100$ | AT | LASSO | 0.987 | 0.031 | 0.401 | 0.008 | 0.0001 | 0.0016 | 0.0014 |
Model | Setting | Test | Penalty | TPR | FPR | AMAE g(⋅) | AMAE Prob | AMSE β | MSE β1 | MSE β2 | MSE β3 |
Example 4.2 (n=1000) | qn=100 | MPT | MCP | 0.980 | 0.001 | 0.569 | 0.011 | 0.0001 | 0.0006 | 0.0025 | 0.0073 |
HT | 0.974 | 0.001 | 0.626 | 0.012 | 0.0003 | 0.0059 | 0.0167 | 0.0054 | |||
DT | 0.971 | 0.027 | 0.601 | 0.012 | 0.0002 | 0.0035 | 0.0022 | 0.0019 | |||
AT | 0.986 | 0.010 | 0.582 | 0.011 | 0.0001 | 0.0059 | 0.0011 | 0.0032 | |||
MPT | SCAD | 0.970 | 0.019 | 0.551 | 0.010 | ∗ | 0.0014 | 0.0006 | 0.0011 | ||
HT | 0.964 | 0.029 | 0.588 | 0.011 | 0.0001 | 0.0021 | 0.0041 | 0.0001 | |||
DT | 0.972 | 0.021 | 0.572 | 0.011 | 0.0001 | 0.0037 | 0.0002 | 0.0034 | |||
AT | 0.971 | 0.021 | 0.575 | 0.011 | 0.0001 | 0.0057 | 0.0005 | 0.0042 | |||
MPT | LASSO | 0.974 | 0.048 | 0.553 | 0.010 | 0.0001 | 0.0000 | 0.0002 | 0.0003 | ||
HT | 0.972 | 0.056 | 0.601 | 0.010 | 0.0001 | 0.0003 | 0.0001 | 0.0006 | |||
DT | 0.982 | 0.021 | 0.574 | 0.010 | 0.0001 | 0.0035 | 0.0001 | 0.0042 | |||
AT | 0.986 | 0.010 | 0.584 | 0.011 | 0.0001 | 0.0041 | 0.0002 | 0.0056 | |||
qn=500 | MPT | MCP | 0.964 | 0.011 | 0.562 | 0.013 | 0.0001 | 0.0005 | 0.0015 | 0.0042 | |
HT | 0.972 | 0.010 | 0.670 | 0.018 | 0.0001 | 0.0056 | 0.0001 | 0.0115 | |||
DT | 0.987 | 0.011 | 0.567 | 0.012 | ∗ | 0.0044 | 0.0003 | 0.0058 | |||
AT | 0.986 | 0.020 | 0.669 | 0.015 | 0.0001 | 0.0022 | 0.0108 | 0.0012 | |||
MPT | SCAD | 0.965 | 0.014 | 0.515 | 0.010 | 0.0001 | 0.0003 | 0.0055 | 0.0045 | ||
HT | 0.968 | 0.018 | 0.547 | 0.015 | 0.0001 | 0.0023 | 0.0112 | 0.0069 | |||
DT | 0.989 | 0.007 | 0.534 | 0.011 | 0.0001 | 0.0048 | 0.0001 | 0.0047 | |||
AT | 0.985 | 0.005 | 0.608 | 0.010 | ∗ | 0.0042 | 0.0021 | 0.0007 | |||
MPT | LASSO | 0.978 | 0.006 | 0.536 | 0.012 | 0.0001 | 0.0013 | 0.0132 | 0.0104 | ||
HT | 0.970 | 0.002 | 0.644 | 0.015 | 0.0001 | 0.0000 | 0.0092 | 0.0126 | |||
DT | 0.987 | 0.005 | 0.545 | 0.012 | ∗ | 0.0015 | 0.0007 | 0.0019 | |||
AT | 0.981 | 0.002 | 0.526 | 0.012 | ∗ | 0.0011 | 0.0093 | 0.0045 | |||
The symbol ∗ indicates a value smaller than 0.0001.
Model | Setting | Test | Penalty | TPR | FPR | AMAE g(⋅) | AMAE Prob | AMSE β | MSE β1 | MSE β2 | MSE β3 |
Example 4.3 (qn=50) | n=500 | MPT | MCP | 0.951 | 0.103 | 0.466 | 0.019 | 0.0003 | 0.0003 | 0.0009 | 0.0011 |
HT | 0.966 | 0.091 | 0.571 | 0.021 | 0.0005 | 0.0007 | 0.0036 | 0.0045 | |||
DT | 0.982 | 0.043 | 0.360 | 0.006 | 0.0001 | 0.0002 | 0.0001 | 0.0001 | |||
AT | 0.981 | 0.021 | 0.464 | 0.012 | 0.0001 | 0.0005 | 0.0009 | 0.0006 | |||
MPT | SCAD | 0.957 | 0.139 | 0.527 | 0.023 | 0.0005 | 0.0001 | 0.0031 | 0.0098 | ||
HT | 0.968 | 0.082 | 0.433 | 0.020 | 0.0004 | 0.0006 | 0.0001 | 0.0003 | |||
DT | 0.954 | 0.140 | 0.411 | 0.013 | 0.0002 | 0.0011 | 0.0018 | 0.0012 | |||
AT | 0.972 | 0.064 | 0.793 | 0.018 | 0.0002 | 0.0038 | 0.0021 | 0.0004 | |||
MPT | LASSO | 0.981 | 0.024 | 0.604 | 0.021 | 0.0003 | 0.0042 | 0.0014 | 0.0019 | ||
HT | 0.983 | 0.021 | 0.432 | 0.026 | 0.0001 | 0.0017 | 0.0005 | 0.0016 | |||
DT | 0.971 | 0.094 | 0.470 | 0.013 | 0.0002 | 0.0004 | 0.0014 | 0.0023 | |||
AT | 0.980 | 0.061 | 0.447 | 0.013 | 0.0002 | 0.0002 | 0.0004 | 0.0015 | |||
n=1000 | MPT | MCP | 0.988 | 0.040 | 0.358 | 0.015 | 0.0002 | 0.0011 | 0.0024 | 0.0042 | |
HT | 0.984 | 0.021 | 0.399 | 0.017 | 0.0006 | 0.0008 | 0.0009 | 0.0013 | |||
DT | 0.989 | 0.000 | 0.583 | 0.014 | 0.0001 | 0.0001 | 0.0019 | 0.0024 | |||
AT | 0.985 | 0.009 | 0.405 | 0.013 | 0.0001 | 0.0017 | 0.0041 | 0.0012 | |||
MPT | SCAD | 0.989 | 0.043 | 0.537 | 0.016 | 0.0002 | 0.0025 | 0.0004 | 0.0038 | ||
HT | 0.987 | 0.003 | 0.512 | 0.012 | 0.0001 | 0.0012 | 0.0032 | 0.0031 | |||
DT | 0.986 | 0.003 | 0.515 | 0.012 | 0.0001 | 0.0001 | 0.0002 | 0.0004 | |||
AT | 1.000 | 0.000 | 0.410 | 0.013 | 0.0001 | 0.0013 | 0.0022 | 0.0013 | |||
MPT | LASSO | 0.988 | 0.004 | 0.441 | 0.011 | 0.0002 | 0.0029 | 0.0012 | 0.0021 | ||
HT | 0.982 | 0.007 | 0.326 | 0.007 | 0.0001 | 0.0002 | 0.0004 | 0.0002 | |||
DT | 0.987 | 0.008 | 0.489 | 0.013 | 0.0001 | 0.0008 | 0.0001 | 0.0032 | |||
AT | 0.977 | 0.043 | 0.283 | 0.007 | 0.0001 | 0.0012 | 0.0024 | 0.0034 |
Model | Setting | Test | Penalty | TPR | FPR | AMAE g(⋅) | AMAE Prob | AMSE β | MSE β1 | MSE β2 | MSE β3 | MSE β4 |
Example 4.4 (qn=100) | n=750 | MPT | MCP | 0.979 | 0.053 | 0.744 | 0.019 | 0.0004 | 0.0028 | 0.0015 | 0.0036 | 0.0076 |
HT | 0.959 | 0.100 | 0.970 | 0.027 | 0.0011 | 0.0024 | 0.0005 | 0.0022 | 0.0018 | |||
DT | 0.986 | 0.035 | 0.611 | 0.011 | 0.0001 | 0.0001 | 0.0025 | 0.0034 | 0.0016 | |||
AT | 0.984 | 0.043 | 0.789 | 0.013 | 0.0002 | 0.0012 | 0.0051 | 0.0026 | 0.0012 | |||
MPT | SCAD | 0.966 | 0.059 | 0.723 | 0.014 | 0.0003 | 0.0030 | 0.0078 | 0.0081 | 0.0001 | ||
HT | 0.978 | 0.069 | 0.576 | 0.014 | 0.0002 | 0.0004 | 0.0083 | 0.0052 | 0.0002 | |||
DT | 0.989 | 0.063 | 0.698 | 0.013 | 0.0003 | 0.0034 | 0.0155 | 0.0011 | 0.0051 | |||
AT | 0.981 | 0.052 | 0.671 | 0.022 | 0.0005 | 0.0078 | 0.0023 | 0.0085 | 0.0073 | |||
MPT | LASSO | 0.977 | 0.072 | 0.620 | 0.014 | 0.0003 | 0.0047 | 0.0041 | 0.0141 | 0.0002 | ||
HT | 0.964 | 0.069 | 0.680 | 0.015 | 0.0003 | 0.0018 | 0.0071 | 0.0073 | 0.0007 | |||
DT | 0.986 | 0.041 | 0.581 | 0.016 | 0.0003 | 0.0034 | 0.0090 | 0.0014 | 0.0005 | |||
AT | 0.984 | 0.065 | 0.679 | 0.016 | 0.0003 | 0.0001 | 0.0095 | 0.0065 | 0.0003 | |||
n=1000 | MPT | MCP | 0.967 | 0.029 | 0.706 | 0.015 | 0.0002 | 0.0068 | 0.0097 | 0.0022 | 0.0015 | |
HT | 0.986 | 0.001 | 0.818 | 0.012 | 0.0001 | 0.0035 | 0.0061 | 0.0007 | 0.0001 | |||
DT | 0.987 | 0.032 | 0.872 | 0.012 | 0.0002 | 0.0007 | 0.0074 | 0.0017 | 0.0016 | |||
AT | 0.988 | 0.037 | 0.800 | 0.027 | 0.0002 | 0.0013 | 0.0061 | 0.0002 | 0.0025 | |||
MPT | SCAD | 0.961 | 0.059 | 0.724 | 0.015 | 0.0002 | 0.0081 | 0.0087 | 0.0030 | 0.0006 | ||
HT | 0.974 | 0.010 | 0.779 | 0.013 | 0.0001 | 0.0036 | 0.0066 | 0.0012 | 0.0001 | |||
DT | 0.983 | 0.071 | 0.405 | 0.010 | 0.0001 | 0.0013 | 0.0059 | 0.0008 | 0.0001 | |||
AT | 0.981 | 0.041 | 0.422 | 0.010 | 0.0001 | 0.0003 | 0.0009 | 0.0020 | 0.0011 | |||
MPT | LASSO | 0.977 | 0.029 | 0.819 | 0.017 | 0.0004 | 0.0057 | 0.0012 | 0.0083 | 0.0079 | ||
HT | 0.951 | 0.004 | 0.545 | 0.043 | 0.0001 | 0.0093 | 0.0004 | 0.0004 | 0.0025 | |||
DT | 0.985 | 0.021 | 0.408 | 0.009 | 0.0001 | 0.0002 | 0.0011 | 0.0026 | 0.0007 | |||
AT | 0.989 | 0.008 | 0.581 | 0.010 | 0.0001 | 0.0042 | 0.0003 | 0.0004 | 0.0008 |
For each example, we considered either two sample sizes (n) or two covariate dimensions (qn). The results of Examples 4.1 and 4.2 suggest that our method maintains robust estimation performance as the dimensionality increases while the sample size stays small. Furthermore, the results of Examples 4.3 and 4.4 show that estimation accuracy improves as the sample size increases. Figure 2 illustrates the estimation performance of g(⋅) and the individual risk probabilities pi, confirming our method's efficacy in estimating unknown functions and risk probabilities.
We also evaluated our method's performance under different group sizes. Using Example 4.4, we investigated group sizes of 2, 4, 6, and 8 with the Dorfman algorithm and the LASSO penalty function. The results are presented in Table 6, which reports the means of $\hat{\beta}_j$ for $j=1,2,3,4$. The simulation results indicate that our method consistently delivers strong estimation performance across the group sizes considered. In addition, we ran comparative experiments with different values of the sensitivity $S_e$ and specificity $S_p$; the simulation results are shown in Tables 8 to 11 in Appendix A. As shown in these tables, our model remains stable: the TPR stays above 0.96 even when $S_e=S_p=0.85$, so $M^*$ is still contained in $\hat{M}$ with high probability.
Model | Setting | Group Size | TPR | FPR | AMAE g(⋅) | AMAE Prob | Mean β1 | Mean β2 | Mean β3 | Mean β4 |
Example 4.4 (qn=100) | n=750 | 2 | 0.970 | 0.015 | 0.611 | 0.011 | 0.452 | 0.465 | 0.478 | 0.460 |
Example 4.4 (qn=100) | n=750 | 4 | 0.965 | 0.020 | 0.581 | 0.016 | 0.445 | 0.405 | 0.464 | 0.477 |
Example 4.4 (qn=100) | n=750 | 6 | 0.986 | 0.041 | 0.627 | 0.009 | 0.519 | 0.497 | 0.487 | 0.495 |
Example 4.4 (qn=100) | n=750 | 8 | 0.973 | 0.020 | 0.594 | 0.012 | 0.471 | 0.467 | 0.484 | 0.477 |
Example 4.4 (qn=100) | n=1000 | 2 | 0.974 | 0.014 | 0.447 | 0.009 | 0.468 | 0.484 | 0.473 | 0.485 |
Example 4.4 (qn=100) | n=1000 | 4 | 0.964 | 0.018 | 0.408 | 0.009 | 0.489 | 0.468 | 0.450 | 0.474 |
Example 4.4 (qn=100) | n=1000 | 6 | 0.985 | 0.021 | 0.440 | 0.011 | 0.486 | 0.478 | 0.443 | 0.471 |
Example 4.4 (qn=100) | n=1000 | 8 | 0.974 | 0.010 | 0.466 | 0.009 | 0.494 | 0.494 | 0.447 | 0.437 |
In this section, we validate the effectiveness of our method using the diabetes dataset from the National Health and Nutrition Examination Survey (NHANES) conducted between 1999 and 2004. NHANES is a probability-based cross-sectional survey representing the U.S. population that collects demographic, health history, and behavioral information through household interviews. Participants were also invited to visit mobile examination centers for detailed physical, psychological, and laboratory assessments. The dataset is accessible at https://wwwn.cdc.gov/Nchs/Nhanes/.
The dataset comprises n=5515 records and 17 variables, and categorizes individuals as diabetic or non-diabetic. The covariates include age (X1), waist circumference (X2), BMI (X3), height (X4), weight (X5), smoking age (X6), alcohol use (X7), leg length (X8), total cholesterol (X9), hypertension (X10), education level (X11), household income (X12), family history (X13), physical activity (X14), gender (X15), and race (X16). The nominal variables X10 to X16 are transformed using one-hot encoding, resulting in qn=47 covariates per individual. The first nine covariates are continuous, while the remainder are binary. A detailed explanation of the variables and the content of the questionnaire can be found at https://wwwn.cdc.gov/Nchs/Nhanes/search/default.aspx. For convenience, the nominal variables are explained in Table 12 in Appendix B.
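The count qn=47 can be checked with a small sketch of the one-hot encoding step. The level codes below are read off the category labels reported in Table 7 and are assumed to be the complete level sets; the dictionary `LEVELS` and the helper names are hypothetical and serve only as an illustration.

```python
# Category codes per nominal variable, as they appear in Table 7
# (assumption: these are the complete level sets used for encoding).
LEVELS = {
    "hypertension": [1, 2],
    "education": [1, 2, 3, 4, 5, 7],
    "household income": list(range(1, 14)) + [77, 99],
    "family history": [1, 2, 9],
    "physical activity": [1, 2, 3, 4, 9],
    "sex": [1, 2],
    "race": [1, 2, 3, 4, 5],
}
N_CONTINUOUS = 9  # age, waist circumference, ..., total cholesterol


def one_hot(record):
    """Encode a dict {variable name: observed code} as a 0/1 list."""
    row = []
    for name, codes in LEVELS.items():
        row.extend(1 if record[name] == code else 0 for code in codes)
    return row


def total_covariates():
    """Number of covariates after encoding: continuous plus indicator columns."""
    return N_CONTINUOUS + sum(len(codes) for codes in LEVELS.values())
```

Under these assumptions the seven nominal variables contribute 2 + 6 + 15 + 3 + 5 + 2 + 5 = 38 indicator columns, which together with the nine continuous covariates gives 47.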
For $i\in\{1,2,\ldots,n\}$, we define $\tilde{Y}_i=1$ for diabetes and $\tilde{Y}_i=0$ for non-diabetes. Individual covariate information is represented as $X_i=(X_{i1},X_{i2},\ldots,X_{iq_n})^\top$. We construct the following single-index model for the probability of diabetes risk for the $i$-th individual:
\begin{equation*}
\Pr(\tilde{Y}_i=1 \mid X_i)=\frac{\exp[g(X_i^\top\beta)]}{1+\exp[g(X_i^\top\beta)]},
\end{equation*}
where the smooth function $g(\cdot)$ is unknown, and our objective is to estimate the coefficients $\beta$.
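To make the structure of the single-index model concrete, the following sketch evaluates the risk probability for one individual given a coefficient vector and a candidate link function. It is an illustration under stated assumptions: in the paper $g(\cdot)$ is unknown and estimated with B-splines, whereas here `g` is any user-supplied Python callable, and the function names are hypothetical.

```python
import math


def expit(v):
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def risk_probability(x, beta, g):
    """Pr(Y = 1 | x) = expit(g(x^T beta)) for one covariate vector x.

    g is a stand-in for the unknown smooth function; it is supplied by
    the caller here purely for illustration.
    """
    index = sum(xk * bk for xk, bk in zip(x, beta))  # single index x^T beta
    return expit(g(index))
```

With the identity function as `g`, the model reduces to ordinary logistic regression, which is a convenient sanity check.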
To verify the accuracy of our method, we compare the results with those obtained from two other methods. The first method is penalized logistic regression (PLR), which uses the true individual status, $\tilde{Y}_i$. This method is implemented using the R package "glmnet". The second method is the adaptive elastic net for group testing (aenetgt) data, as introduced by Gregory et al. [23]. This approach utilizes group testing data and employs a penalized Expectation-Maximization (EM) algorithm to fit an adaptive elastic net logistic regression model. The R package "aenetgt" is used for implementation. We generate Dorfman group testing data with a group size of 6, setting both sensitivity and specificity at $S_e=S_p=0.98$.
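The generation of Dorfman group testing data from known individual statuses can be sketched as follows. This is a minimal illustration, not the code used for the analysis: it assumes that pooled and individual assays share the same sensitivity and specificity regardless of pool size, that individuals are grouped in the order given, and that test errors are independent. The function names are hypothetical.

```python
import random


def noisy_test(true_status, se, sp, rng):
    """Simulate one assay: positive w.p. se if truly positive, 1 - sp otherwise."""
    if true_status:
        return 1 if rng.random() < se else 0
    return 1 if rng.random() >= sp else 0


def dorfman_data(y_true, group_size, se, sp, seed=0):
    """Pool consecutive individuals, test each pool, retest members of positive pools."""
    rng = random.Random(seed)
    records = []
    for start in range(0, len(y_true), group_size):
        members = list(range(start, min(start + group_size, len(y_true))))
        pool_truth = max(y_true[i] for i in members)  # pool is positive if any member is
        z = noisy_test(pool_truth, se, sp, rng)
        # Individual retests are only carried out when the pool tests positive.
        retests = {i: noisy_test(y_true[i], se, sp, rng) for i in members} if z == 1 else {}
        records.append({"members": members, "Z": z, "Y": retests})
    return records
```

Calling `dorfman_data(y, 6, 0.98, 0.98)` on a 0/1 list `y` would correspond to the configuration described above.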
To ensure comparability, we adhere to the standardization techniques referenced in Cui et al. [31]. First, we center the covariates to facilitate the comparison of relative effects across different explanatory variables. Second, we normalize the PLR and aenetgt coefficients by dividing them by their L2-norm, as follows:
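\begin{equation*}
\hat{\beta}^{\mathrm{norm}}_{\mathrm{PLR}}=\frac{\hat{\beta}_{\mathrm{PLR}}}{\|\hat{\beta}_{\mathrm{PLR}}\|_2},\qquad \hat{\beta}^{\mathrm{norm}}_{\mathrm{aenet}}=\frac{\hat{\beta}_{\mathrm{aenet}}}{\|\hat{\beta}_{\mathrm{aenet}}\|_2},
\end{equation*}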
thereby obtaining coefficients with unit norm. This enables a comparison of regression coefficients from PLR, aenetgt, and the single-index group testing model.
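The normalization step amounts to dividing a coefficient vector by its Euclidean length. A minimal sketch, with the hypothetical helper name `unit_norm`, is:

```python
import math


def unit_norm(beta):
    """Rescale a coefficient vector to have L2-norm equal to one."""
    norm = math.sqrt(sum(b * b for b in beta))
    if norm == 0:
        raise ValueError("cannot normalize a zero coefficient vector")
    return [b / norm for b in beta]
```

Rescaling leaves the signs and relative magnitudes of the coefficients unchanged, which is what makes the three sets of estimates comparable.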
The estimated coefficients from the three models are summarized in Table 7, where the parameter estimate of our method is denoted by $\hat{\beta}_{\mathrm{our}}$. The estimated coefficients for age, $\hat{\beta}^{\mathrm{norm}}_{\mathrm{PLR}}$ and $\hat{\beta}_{\mathrm{our}}$, are 0.280 and 0.307, respectively, indicating that the risk of diabetes increases with age, consistent with the findings of Turi et al. [47]. In contrast, the coefficient $\hat{\beta}^{\mathrm{norm}}_{\mathrm{aenet}}$ is slightly negative (-0.085). For waist circumference, the coefficients $\hat{\beta}^{\mathrm{norm}}_{\mathrm{PLR}}$, $\hat{\beta}_{\mathrm{our}}$, and $\hat{\beta}^{\mathrm{norm}}_{\mathrm{aenet}}$ are 0.178, 0.194, and 0.271, respectively, suggesting a positive association between waist circumference and diabetes risk, which is supported by Bai et al. [48] and Snijder et al. [49]. In addition, all three methods identified leg length [50], hypertension [51], race [52], family history [53], and sex [54] as variables associated with diabetes. These covariates are widely recognized as being related to diabetes in the biomedical field [55].
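Variable | $\hat{\beta}^{\mathrm{norm}}_{\mathrm{PLR}}$ | $\hat{\beta}_{\mathrm{our}}$ | $\hat{\beta}^{\mathrm{norm}}_{\mathrm{aenet}}$ | Variable | $\hat{\beta}^{\mathrm{norm}}_{\mathrm{PLR}}$ | $\hat{\beta}_{\mathrm{our}}$ | $\hat{\beta}^{\mathrm{norm}}_{\mathrm{aenet}}$ | Variable | $\hat{\beta}^{\mathrm{norm}}_{\mathrm{PLR}}$ | $\hat{\beta}_{\mathrm{our}}$ | $\hat{\beta}^{\mathrm{norm}}_{\mathrm{aenet}}$ |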
age | 0.280 | 0.307 | -0.085 | Family history | Household income | ||||||
waist circumference | 0.178 | 0.194 | 0.271 | family history1 | 0.000 | 0.000 | 0.000 | household income1 | 0.000 | 0.000 | 0.000 |
BMI | 0.000 | 0.000 | 0.000 | family history2 | -0.492 | -0.567 | -0.466 | household income2 | 0.024 | 0.000 | 0.000 |
height | 0.000 | 0.000 | 0.000 | family history9 | 0.000 | 0.000 | 0.000 | household income3 | 0.000 | 0.000 | 0.000 |
weight | 0.000 | 0.000 | 0.000 | Physical activity | household income4 | 0.000 | -0.069 | 0.000 | |||
smoking age | 0.000 | 0.007 | 0.000 | physical activity1 | 0.000 | 0.056 | 0.000 | household income5 | 0.000 | 0.000 | 0.000 |
alcohol use | 0.009 | 0.013 | 0.000 | physical activity2 | -0.086 | -0.018 | 0.000 | household income6 | 0.000 | 0.000 | 0.000 |
leg length | -0.048 | -0.100 | -0.043 | physical activity3 | -0.134 | -0.039 | 0.000 | household income7 | 0.000 | 0.000 | 0.000 |
total cholesterol | 0.000 | 0.000 | 0.000 | physical activity4 | -0.088 | 0.000 | 0.000 | household income8 | 0.001 | 0.065 | 0.000 |
Hypertension | physical activity9 | 0.000 | 0.000 | 0.000 | household income9 | 0.000 | 0.000 | 0.000 | |||
hypertension1 | 0.000 | 0.000 | 0.000 | Sex | household income10 | 0.000 | 0.000 | 0.000 | |||
hypertension2 | -0.350 | -0.372 | -0.641 | sex1 | -0.010 | 0.000 | 0.000 | household income11 | 0.000 | 0.000 | 0.000 |
Education | sex2 | -0.237 | -0.225 | -0.424 | household income12 | 0.000 | 0.000 | 0.000 | |||
education1 | 0.000 | 0.000 | 0.000 | race | household income13 | 0.000 | 0.000 | 0.000 | |||
education2 | 0.000 | 0.000 | 0.000 | race1 | 0.000 | 0.000 | 0.000 | household income77 | 0.000 | 0.231 | 0.000 |
education3 | 0.000 | 0.000 | 0.000 | race2 | -0.019 | -0.073 | 0.000 | household income99 | 0.000 | 0.000 | 0.000 |
education4 | 0.000 | 0.000 | 0.000 | race3 | -0.399 | -0.380 | -0.330 | ||||
education5 | -0.014 | -0.052 | 0.000 | race4 | 0.000 | 0.000 | 0.000 | ||||
education7 | -0.523 | -0.335 | 0.000 | race5 | 0.000 | 0.124 | 0.000 |
We found that the covariate physical activity is associated with diabetes, whereas the aenetgt method failed to identify this association. The results of a study by Yu et al. [55], which used the same dataset as ours, are consistent with this finding. In addition, we found that education level is associated with diabetes ($\hat{\beta}^{\mathrm{norm}}_{\mathrm{PLR}}$ and $\hat{\beta}_{\mathrm{our}}$ for education7 are -0.523 and -0.335, respectively). Evidence for this association can also be found in the study by Aldossari et al. [56]; in this dataset, none of the participants in this category (those who refused to report their education level) has diabetes. We also identified household income as being associated with diabetes, which is consistent with the study by Yen et al. [57]. In this dataset, 60% of the participants who refused to report their household income have diabetes. Furthermore, our model yields results similar to those obtained by the PLR method, which uses the individual observations $\tilde{Y}_i$, suggesting that our method is able to extract this information from group observations.
This study presents a group testing framework based on a logistic regression single-index model for disease screening in low-prevalence environments. By employing B-splines to estimate unknown functions and incorporating penalty functions, our approach achieves high flexibility in capturing the relationships between covariates and individual risk probabilities while accurately identifying important variables. To address potential computational challenges in individual disease status estimation, we implemented an iterative EM algorithm for model estimation. Our simulation experiments demonstrate the proposed method's performance in high-dimensional covariate contexts with limited sample sizes, while application to real data confirms its efficacy. Our framework offers a unified approach for various group testing methods, showcasing its practical application value.
Despite these promising outcomes, our study has several limitations. First, our model assumes that the sensitivity and specificity of testing are independent of group size, which may not always hold in practical applications. Second, data quality and variations in the testing population can affect the model's applicability, so exploring how to integrate prior information to enhance model accuracy and practical value remains an important research direction. Furthermore, the potentially high dimensionality of individual covariates poses significant challenges, necessitating the development of models capable of handling ultra-high-dimensional data.
Future research could explore the following directions. Firstly, examining model performance under varying group testing configurations, such as changes in testing errors and group sizes, could yield valuable insights. Secondly, investigating methods to incorporate additional prior knowledge to improve estimation accuracy is a worthwhile endeavor. Additionally, considering computational efficiency, developing faster algorithms for processing large-scale datasets will be a key focus for future work.
Changfu Yang: Methodology, formal analysis, writing-original draft; Wenxin Zhou: Methodology, formal analysis; Wenjun Xiong: Conceptualization, methodology, writing-original draft, funding acquisition; Junjian Zhang: Conceptualization, methodology, writing-review and editing, funding acquisition; Juan Ding: Conceptualization, formal analysis, writing-review and editing, funding acquisition. All authors have read and approved the final version of the manuscript for publication.
The authors declare that they have not used Artificial Intelligence (AI) tools in the creation of this article.
This research was supported by the National Natural Science Foundation of China (Grant Nos. 12361055, 11801102), Guangxi Natural Science Foundation (2021GXNSFAA220054), and the Fundamental Research Funds for the Central Universities (B240201095).
The authors declare that there are no conflicts of interest regarding the publication of this paper.
In this part, we test the performance of the four examples under different sensitivity and specificity levels, using the Dorfman algorithm and the LASSO penalty function. The simulation results are shown in Tables 8 to 11.
Model | Setting | (Se,Sp) | TPR | FPR | AMAE g(⋅) | AMAE Prob | AMSE β | MSE β1 | MSE β2 |
Example 4.1 (qn=50) | n=500 | (0.98, 0.98) | 1.000 | 0.029 | 0.522 | 0.013 | 0.0001 | 0.0004 | 0.0003 |
Example 4.1 (qn=50) | n=500 | (0.95, 0.95) | 0.987 | 0.020 | 0.474 | 0.011 | 0.0001 | 0.0003 | 0.0003 |
Example 4.1 (qn=50) | n=500 | (0.90, 0.90) | 0.982 | 0.036 | 0.532 | 0.011 | 0.0001 | 0.0006 | 0.0007 |
Example 4.1 (qn=50) | n=500 | (0.85, 0.85) | 0.984 | 0.040 | 0.578 | 0.016 | 0.0003 | 0.0001 | 0.0002 |
Model | Setting | (Se,Sp) | TPR | FPR | AMAE g(⋅) | AMAE Prob | AMSE β | MSE β1 | MSE β2 | MSE β3 |
Example 4.2 (qn=100) | n=1000 | (0.98, 0.98) | 0.982 | 0.021 | 0.574 | 0.010 | 0.0001 | 0.0035 | 0.0001 | 0.0042 |
Example 4.2 (qn=100) | n=1000 | (0.95, 0.95) | 0.975 | 0.030 | 0.612 | 0.011 | 0.0001 | 0.0047 | 0.0001 | 0.0069 |
Example 4.2 (qn=100) | n=1000 | (0.90, 0.90) | 0.978 | 0.020 | 0.556 | 0.012 | 0.0001 | 0.0023 | 0.0002 | 0.0049 |
Example 4.2 (qn=100) | n=1000 | (0.85, 0.85) | 0.965 | 0.020 | 0.717 | 0.016 | 0.0004 | 0.0158 | 0.0002 | 0.0212 |
Model | Setting | (Se,Sp) | TPR | FPR | AMAE g(⋅) | AMAE Prob | AMSE β | MSE β1 | MSE β2 | MSE β3 |
Example 4.3 (qn=50) | n=1000 | (0.98, 0.98) | 0.987 | 0.008 | 0.489 | 0.013 | 0.0001 | 0.0008 | 0.0001 | 0.0032 |
Example 4.3 (qn=50) | n=1000 | (0.95, 0.95) | 0.971 | 0.064 | 0.404 | 0.011 | 0.0003 | 0.0005 | 0.0033 | 0.0085 |
Example 4.3 (qn=50) | n=1000 | (0.90, 0.90) | 0.963 | 0.048 | 0.465 | 0.011 | 0.0001 | 0.0002 | 0.0012 | 0.0055 |
Example 4.3 (qn=50) | n=1000 | (0.85, 0.85) | 0.966 | 0.018 | 0.377 | 0.015 | 0.0004 | 0.0016 | 0.0023 | 0.0007 |
Model | Setting | (Se,Sp) | TPR | FPR | AMAE g(⋅) | AMAE Prob | AMSE β | MSE β1 | MSE β2 | MSE β3 | MSE β4 |
Example 4.4 (qn=100) | n=750 | (0.98, 0.98) | 0.986 | 0.041 | 0.581 | 0.016 | 0.0003 | 0.0034 | 0.0090 | 0.0014 | 0.0005 |
Example 4.4 (qn=100) | n=750 | (0.95, 0.95) | 0.981 | 0.026 | 0.534 | 0.018 | 0.0001 | 0.0016 | 0.0045 | 0.0018 | 0.0005 |
Example 4.4 (qn=100) | n=750 | (0.90, 0.90) | 0.974 | 0.018 | 0.546 | 0.016 | 0.0002 | 0.0004 | 0.0024 | 0.0014 | 0.0028 |
Example 4.4 (qn=100) | n=750 | (0.85, 0.85) | 0.976 | 0.024 | 0.539 | 0.011 | 0.0002 | 0.0047 | 0.0085 | 0.0004 | 0.0039 |
Variable | Implication | Variable | Implication |
Hypertension circumstance | Family history of diabetes | ||
hypertension1 | Have a history of hypertension | family history1 | Blood relatives with diabetes |
hypertension2 | No history of hypertension | family history2 | Blood relatives do not have diabetes |
Education level | family history9 | Not known if any blood relatives have diabetes | |
education1 | Less Than 9th Grade | Physical activity | |
education2 | 9 - 11th Grade (Includes 12th grade with no diploma) | physical activity1 | Sit during the day and do not walk about very much |
education3 | High School Grad/GED or Equivalent | physical activity2 | Stand or walk about a lot during the day, but do not have to carry or lift things very often |
education4 | Some College or AA degree | physical activity3 | Lift light load or has to climb stairs or hills often |
education5 | College Graduate or above | physical activity4 | Do heavy work or carry heavy loads |
education7 | Refuse to answer about the level of education | physical activity9 | Don't know physical activity level |
Household income | Sex | ||
household income1 | $0 to $4,999 | sex1 | Male |
household income2 | $5,000 to $9,999 | sex2 | Female |
household income3 | $10,000 to $14,999 | Race/Ethnicity | |
household income4 | $15,000 to $19,999 | race1 | Mexican American |
household income5 | $20,000 to $24,999 | race2 | Other Hispanic |
household income6 | $25,000 to $34,999 | race3 | Non-Hispanic White |
household income7 | $35,000 to $44,999 | race4 | Non-Hispanic Black |
household income8 | $45,000 to $54,999 | race5 | Other Race - Including Multi-Racial |
household income9 | $55,000 to $64,999 | ||
household income10 | $65,000 to $74,999 | ||
household income11 | $75,000 and Over | ||
household income12 | Over $20,000 | ||
household income13 | Under $20,000 | ||
household income77 | Refusal to answer about household income | ||
household income99 | Don't know household income |
In this part, we derive the conditional expectation formulas for Dorfman testing, halving testing, and array testing within the framework of our method. Before proceeding, we clarify some notation. Let $P_j\setminus\{i\}$ denote the set of individuals in $P_j$ excluding the $i$-th individual, and let $|P_j|$ denote the number of individuals in $P_j$. Let $Y_i$ denote the test result of the $i$-th individual, and let $\mathcal{Y}_{P_{j,l}}=\{Y_i, i\in P_{j,l}\}$ denote the set of testing results of the individuals in $P_{j,l}$.
In Dorfman testing, if the initial group testing result is negative ($Z_{j,1}=0$), no re-testing is performed. However, if $Z_{j,1}=1$, each individual in the group undergoes a separate re-test.
1) When $Z_{j,1}=0$, the result is the same as in master pool testing:
\begin{equation*}
w_{i,0}^{(t)}=\frac{P(\tilde{Y}_i=1, Z_{j,1}=0)}{P(Z_{j,1}=0)}=\frac{(1-S_e)\, p_{iB}^{(t)}}{1-\Big[S_e+(1-S_e-S_p)\prod_{i'\in P_j}\big(1-p_{i'B}^{(t)}\big)\Big]}.
\end{equation*}
2) When Zj,1=1, each individual in the group must undergo a separate re-test. In total, the group has undergone |Pj|+1 tests.
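\begin{equation*}
w_{i,1}^{(t)}=\frac{P(\tilde{Y}_i=1, Z_{j,1}, \mathcal{Y}_{P_j})}{P(Z_{j,1}, \mathcal{Y}_{P_j})}=\frac{P(\tilde{Y}_i=1)\,P(Z_{j,1}, \mathcal{Y}_{P_j}\mid \tilde{Y}_i=1)}{P(Z_{j,1}, \mathcal{Y}_{P_j})}.
\end{equation*}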
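The denominator is
\begin{align*}
P(Z_{j,1}, \mathcal{Y}_{P_j}) &= \sum_{\tilde{\mathcal{Y}}_{P_j}} P(Z_{j,1}, \mathcal{Y}_{P_j}\mid \tilde{\mathcal{Y}}_{P_j})\,P(\tilde{\mathcal{Y}}_{P_j}) \\
&= \sum_{\tilde{\mathcal{Y}}_{P_j}} P(Z_{j,1}\mid \tilde{Z}_{j,1}) \prod_{i\in P_j} P(Y_i\mid \tilde{Y}_i)\,P(\tilde{Y}_i) \\
&= \sum_{\tilde{\mathcal{Y}}_{P_j}} \Big[S_e^{\tilde{Z}_{j,1}}(1-S_p)^{1-\tilde{Z}_{j,1}}\Big] \prod_{i\in P_j} \Big[S_e^{Y_i}(1-S_e)^{1-Y_i}\Big]^{\tilde{Y}_i} \Big[(1-S_p)^{Y_i}S_p^{1-Y_i}\Big]^{1-\tilde{Y}_i} \big[p_{iB}^{(t)}\big]^{\tilde{Y}_i}\big[1-p_{iB}^{(t)}\big]^{1-\tilde{Y}_i} \\
&= \sum_{\tilde{\mathcal{Y}}_{P_j}} \Big[S_e^{\tilde{Z}_{j,1}}(1-S_p)^{1-\tilde{Z}_{j,1}}\Big] \prod_{i\in P_j} \Big[S_e^{Y_i}(1-S_e)^{1-Y_i}\,p_{iB}^{(t)}\Big]^{\tilde{Y}_i} \Big[(1-S_p)^{Y_i}S_p^{1-Y_i}\big(1-p_{iB}^{(t)}\big)\Big]^{1-\tilde{Y}_i}.
\end{align*}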
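The numerator is
\begin{align*}
P(\tilde{Y}_i=1, Z_{j,1}, \mathcal{Y}_{P_j}) &= P(Z_{j,1}, \mathcal{Y}_{P_j}\mid \tilde{Y}_i=1)\,P(\tilde{Y}_i=1) \\
&= \sum_{\tilde{\mathcal{Y}}_{P_j\setminus i}} P(Z_{j,1}, \mathcal{Y}_{P_j}\mid \tilde{Y}_i=1, \tilde{\mathcal{Y}}_{P_j\setminus i})\,P(\tilde{\mathcal{Y}}_{P_j\setminus i})\,P(\tilde{Y}_i=1) \\
&= \sum_{\tilde{\mathcal{Y}}_{P_j\setminus i}} P(Z_{j,1}\mid \tilde{Z}_{j,1}=1)\,P(Y_i\mid \tilde{Y}_i=1) \prod_{i'\in P_j\setminus\{i\}} P(Y_{i'}\mid \tilde{Y}_{i'})\,P(\tilde{Y}_{i'})\,P(\tilde{Y}_i=1) \\
&= \sum_{\tilde{\mathcal{Y}}_{P_j\setminus i}} S_e\, S_e^{Y_i}(1-S_e)^{1-Y_i} \prod_{i'\in P_j\setminus\{i\}} \Big[S_e^{Y_{i'}}(1-S_e)^{1-Y_{i'}}\Big]^{\tilde{Y}_{i'}} \Big[(1-S_p)^{Y_{i'}}S_p^{1-Y_{i'}}\Big]^{1-\tilde{Y}_{i'}} \big[p_{i'B}^{(t)}\big]^{\tilde{Y}_{i'}}\big[1-p_{i'B}^{(t)}\big]^{1-\tilde{Y}_{i'}}\, p_{iB}^{(t)} \\
&= \sum_{\tilde{\mathcal{Y}}_{P_j\setminus i}} S_e^{1+Y_i}(1-S_e)^{1-Y_i}\,p_{iB}^{(t)} \prod_{i'\in P_j\setminus\{i\}} \Big[S_e^{Y_{i'}}(1-S_e)^{1-Y_{i'}}\,p_{i'B}^{(t)}\Big]^{\tilde{Y}_{i'}} \Big[(1-S_p)^{Y_{i'}}S_p^{1-Y_{i'}}\big(1-p_{i'B}^{(t)}\big)\Big]^{1-\tilde{Y}_{i'}}.
\end{align*}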
Therefore, the final expression is
\begin{equation*}
w_i^{(t)}=Z_{j,1}\,w_{i,1}^{(t)}+(1-Z_{j,1})\,w_{i,0}^{(t)}.
\end{equation*}
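For small pools, the Dorfman weights above can be checked numerically by enumerating every configuration of true statuses in the pool. The sketch below is an illustration under stated assumptions, not the authors' implementation: `p` holds the current risk estimates $p_{iB}^{(t)}$ of the pool members, `y` holds their individual retest results (used only when the pool is positive), and the function name `dorfman_weight` is hypothetical. The enumeration costs $2^{|P_j|}$ terms, so it is only practical for small group sizes.

```python
from itertools import product


def dorfman_weight(i, p, z, y, se, sp):
    """Posterior probability that member i is positive given Dorfman results.

    i: position of the individual within the pool (0-based).
    p: list of current risk probabilities of the pool members.
    z: pooled test result (0 or 1).
    y: list of individual retest results, used only when z == 1.
    """
    if z == 0:
        # Closed form for a negative pool (same as master pool testing).
        none_positive = 1.0
        for pk in p:
            none_positive *= 1.0 - pk
        prob_z0 = 1.0 - (se + (1.0 - se - sp) * none_positive)
        return (1.0 - se) * p[i] / prob_z0
    num = den = 0.0
    for status in product((0, 1), repeat=len(p)):
        # Likelihood of a positive pool given the true pool status.
        lik = se if max(status) else 1.0 - sp
        for k, s in enumerate(status):
            prior = p[k] if s else 1.0 - p[k]
            if s:
                retest = se if y[k] else 1.0 - se
            else:
                retest = 1.0 - sp if y[k] else sp
            lik *= prior * retest
        den += lik
        if status[i]:
            num += lik
    return num / den
```

The positive-pool branch sums exactly the terms of the numerator and denominator given above, with the pool-level factor $S_e^{\tilde{Z}_{j,1}}(1-S_p)^{1-\tilde{Z}_{j,1}}$ evaluated for each configuration.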
In halving testing, assume that the maximum number of partitions required during testing is two. Let $Z_{j,1}$ be the result of the first test, which is performed on the set of all unpartitioned individuals $P_{j,1}$, containing $|P_j|$ individuals. The first partition divides $P_{j,1}$ into two equal parts, $P_{j,2}$ and $P_{j,3}$, whose test results are $Z_{j,2}$ and $Z_{j,3}$, respectively. There are five types of testing results in halving testing.
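1) When $Z_{j,1}=0$:
Only one test is performed, and the process is the same as in master pool testing. Since the result of this test is negative, no further partitioning or testing is performed. At this time,
\begin{align*}
w_i^{(t)} &= P(\tilde{Y}_i=1\mid Z_{j,1}=0)=\frac{P(\tilde{Y}_i=1, Z_{j,1}=0)}{P(Z_{j,1}=0)}=\frac{P(\tilde{Y}_i=1)\,P(Z_{j,1}=0\mid \tilde{Y}_i=1)}{P(Z_{j,1}=0)} \\
&= \frac{p_{iB}^{(t)}(1-S_e)}{1-\Big[S_e+(1-S_e-S_p)\prod_{i'\in P_j}\big(1-p_{i'B}^{(t)}\big)\Big]}.
\end{align*}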
2) When $Z_{j,1}=1$, $Z_{j,2}=0$, $Z_{j,3}=0$:
The result of the first test is $Z_{j,1}=1$, so the first partition is performed, dividing the pool into two equal parts $P_{j,2}$ and $P_{j,3}$. Each of the two sets is then tested, with testing results $Z_{j,2}=Z_{j,3}=0$. At this time,
\begin{equation*}
w_i^{(t)}=P(\tilde{Y}_i=1\mid Z_{j,1}=1, Z_{j,2}=0, Z_{j,3}=0)=\frac{P(Z_{j,1}=1, Z_{j,2}=0, Z_{j,3}=0\mid \tilde{Y}_i=1)\,P(\tilde{Y}_i=1)}{P(Z_{j,1}=1, Z_{j,2}=0, Z_{j,3}=0)}.
\end{equation*}
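The denominator is
\begin{align*}
P(Z_{j,1}=1, Z_{j,2}=0, Z_{j,3}=0) &= \sum_{\tilde{\mathcal{Y}}_{P_{j,1}}} P(Z_{j,1}=1, Z_{j,2}=0, Z_{j,3}=0\mid \tilde{\mathcal{Y}}_{P_{j,1}})\,P(\tilde{\mathcal{Y}}_{P_{j,2}})\,P(\tilde{\mathcal{Y}}_{P_{j,3}}) \\
&= \sum_{\tilde{\mathcal{Y}}_{P_{j,1}}} P(Z_{j,1}=1\mid \tilde{\mathcal{Y}}_{P_{j,1}})\,P(Z_{j,2}=0\mid \tilde{\mathcal{Y}}_{P_{j,2}})\,P(Z_{j,3}=0\mid \tilde{\mathcal{Y}}_{P_{j,3}})\,P(\tilde{\mathcal{Y}}_{P_{j,2}})\,P(\tilde{\mathcal{Y}}_{P_{j,3}}) \\
&= \sum_{\tilde{\mathcal{Y}}_{P_{j,1}}} \Big[S_e^{\tilde{Z}_{j,1}}(1-S_p)^{1-\tilde{Z}_{j,1}}\Big] \Big[(1-S_e)^{\tilde{Z}_{j,2}}S_p^{1-\tilde{Z}_{j,2}}\Big] \prod_{i\in P_{j,2}} \big[p_{iB}^{(t)}\big]^{\tilde{Y}_i}\big[1-p_{iB}^{(t)}\big]^{1-\tilde{Y}_i} \\
&\quad \times \Big[(1-S_e)^{\tilde{Z}_{j,3}}S_p^{1-\tilde{Z}_{j,3}}\Big] \prod_{i\in P_{j,3}} \big[p_{iB}^{(t)}\big]^{\tilde{Y}_i}\big[1-p_{iB}^{(t)}\big]^{1-\tilde{Y}_i} \\
&= \sum_{\tilde{\mathcal{Y}}_{P_{j,1}}} \Big[S_e^{\tilde{Z}_{j,1}}(1-S_p)^{1-\tilde{Z}_{j,1}}\Big] \prod_{u=2}^{3} (1-S_e)^{\tilde{Z}_{j,u}}S_p^{1-\tilde{Z}_{j,u}} \prod_{i\in P_j} \big[p_{iB}^{(t)}\big]^{\tilde{Y}_i}\big[1-p_{iB}^{(t)}\big]^{1-\tilde{Y}_i}.
\end{align*}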
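Since the placement of the $i$-th individual in the sets $P_{j,2}$ and $P_{j,3}$ is symmetric, assume that the $i$-th individual is placed in the set $P_{j,2}$. Then, the numerator is
\begin{align*}
P(Z_{j,1}=1, Z_{j,2}=0, Z_{j,3}=0, \tilde{Y}_i=1) &= P(Z_{j,1}=1, Z_{j,2}=0, Z_{j,3}=0\mid \tilde{Y}_i=1)\,P(\tilde{Y}_i=1) \\
&= \sum_{\tilde{\mathcal{Y}}_{P_j\setminus i}} P(Z_{j,1}=1, Z_{j,2}=0, Z_{j,3}=0\mid \tilde{Y}_i=1, \tilde{\mathcal{Y}}_{P_{j,2}\setminus i}, \tilde{\mathcal{Y}}_{P_{j,3}}) \\
&\quad \times P(\tilde{\mathcal{Y}}_{P_{j,2}\setminus i})\,P(\tilde{\mathcal{Y}}_{P_{j,3}})\,P(\tilde{Y}_i=1) \\
&= \sum_{\tilde{\mathcal{Y}}_{P_j\setminus i}} P(Z_{j,1}=1\mid \tilde{Z}_{j,1}=1)\,P(Z_{j,2}=0\mid \tilde{Z}_{j,2}=1)\,P(Z_{j,3}=0\mid \tilde{Z}_{j,3}) \\
&\quad \times P(\tilde{\mathcal{Y}}_{P_{j,2}\setminus i})\,P(\tilde{\mathcal{Y}}_{P_{j,3}})\,P(\tilde{Y}_i=1) \\
&= \sum_{\tilde{\mathcal{Y}}_{P_j\setminus i}} S_e(1-S_e) \prod_{i'\in P_{j,2}\setminus\{i\}} \big[p_{i'B}^{(t)}\big]^{\tilde{Y}_{i'}}\big[1-p_{i'B}^{(t)}\big]^{1-\tilde{Y}_{i'}}\, (1-S_e)^{\tilde{Z}_{j,3}}S_p^{1-\tilde{Z}_{j,3}} \prod_{i'\in P_{j,3}} \big[p_{i'B}^{(t)}\big]^{\tilde{Y}_{i'}}\big[1-p_{i'B}^{(t)}\big]^{1-\tilde{Y}_{i'}}\, p_{iB}^{(t)} \\
&= \sum_{\tilde{\mathcal{Y}}_{P_j\setminus i}} S_e(1-S_e)^{1+\tilde{Z}_{j,3}}S_p^{1-\tilde{Z}_{j,3}}\, p_{iB}^{(t)} \prod_{i'\in P_j\setminus\{i\}} \big[p_{i'B}^{(t)}\big]^{\tilde{Y}_{i'}}\big[1-p_{i'B}^{(t)}\big]^{1-\tilde{Y}_{i'}}.
\end{align*}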
3) When $Z_{j,1}=1$, $Z_{j,2}=0$, $Z_{j,3}=1$:
In this case, a second partition is performed. The first partition divides all individuals into two sets, $P_{j,2}$ and $P_{j,3}$, with testing results $Z_{j,2}=0$ and $Z_{j,3}=1$, respectively. The individuals in $P_{j,3}$ are then tested individually, and the set of their testing results is $\mathcal{Y}_{P_{j,3}}$. At this time,
\begin{equation*}
w_i^{(t)}=P(\tilde{Y}_i=1\mid Z_{j,1}=1, Z_{j,2}=0, Z_{j,3}=1, \mathcal{Y}_{P_{j,3}})=\frac{P(Z_{j,1}=1, Z_{j,2}=0, Z_{j,3}=1, \mathcal{Y}_{P_{j,3}}\mid \tilde{Y}_i=1)\,P(\tilde{Y}_i=1)}{P(Z_{j,1}=1, Z_{j,2}=0, Z_{j,3}=1, \mathcal{Y}_{P_{j,3}})}.
\end{equation*}
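The denominator is
\begin{align*}
P(Z_{j,1}=1, Z_{j,2}=0, Z_{j,3}=1, \mathcal{Y}_{P_{j,3}}) &= \sum_{\tilde{\mathcal{Y}}_{P_j}} P(Z_{j,1}=1, Z_{j,2}=0, Z_{j,3}=1, \mathcal{Y}_{P_{j,3}}\mid \tilde{\mathcal{Y}}_{P_{j,2}}, \tilde{\mathcal{Y}}_{P_{j,3}})\,P(\tilde{\mathcal{Y}}_{P_{j,2}})\,P(\tilde{\mathcal{Y}}_{P_{j,3}}) \\
&= \sum_{\tilde{\mathcal{Y}}_{P_j}} P(Z_{j,1}=1\mid \tilde{\mathcal{Y}}_{P_j})\,P(Z_{j,2}=0\mid \tilde{\mathcal{Y}}_{P_{j,2}})\,P(Z_{j,3}=1\mid \tilde{\mathcal{Y}}_{P_{j,3}}) \\
&\quad \times P(\tilde{\mathcal{Y}}_{P_{j,3}})\,P(\tilde{\mathcal{Y}}_{P_{j,2}})\,P(\mathcal{Y}_{P_{j,3}}\mid \tilde{\mathcal{Y}}_{P_{j,3}}) \\
&= \sum_{\tilde{\mathcal{Y}}_{P_j}} P(Z_{j,1}=1\mid \tilde{Z}_{j,1})\,P(Z_{j,2}=0\mid \tilde{Z}_{j,2})\,P(Z_{j,3}=1\mid \tilde{Z}_{j,3}) \\
&\quad \times \prod_{i\in P_{j,3}} P(Y_i\mid \tilde{Y}_i)\,P(\tilde{\mathcal{Y}}_{P_{j,2}})\,P(\tilde{\mathcal{Y}}_{P_{j,3}}) \\
&= \sum_{\tilde{\mathcal{Y}}_{P_j}} \Big[S_e^{\tilde{Z}_{j,1}}(1-S_p)^{1-\tilde{Z}_{j,1}}\Big] \Big[(1-S_e)^{\tilde{Z}_{j,2}}S_p^{1-\tilde{Z}_{j,2}}\Big] \Big[S_e^{\tilde{Z}_{j,3}}(1-S_p)^{1-\tilde{Z}_{j,3}}\Big] \\
&\quad \times \prod_{i\in P_{j,2}} \big[p_{iB}^{(t)}\big]^{\tilde{Y}_i}\big[1-p_{iB}^{(t)}\big]^{1-\tilde{Y}_i} \prod_{i\in P_{j,3}} \big[p_{iB}^{(t)}\big]^{\tilde{Y}_i}\big[1-p_{iB}^{(t)}\big]^{1-\tilde{Y}_i} \\
&\quad \times \prod_{i\in P_{j,3}} \Big[S_e^{Y_i}(1-S_e)^{1-Y_i}\Big]^{\tilde{Y}_i} \Big[(1-S_p)^{Y_i}S_p^{1-Y_i}\Big]^{1-\tilde{Y}_i} \\
&= \sum_{\tilde{\mathcal{Y}}_{P_j}} \Big[S_e^{\tilde{Z}_{j,1}+\tilde{Z}_{j,3}}(1-S_p)^{2-\tilde{Z}_{j,1}-\tilde{Z}_{j,3}}\Big] \Big[(1-S_e)^{\tilde{Z}_{j,2}}S_p^{1-\tilde{Z}_{j,2}}\Big] \\
&\quad \times \prod_{i\in P_j} \big[p_{iB}^{(t)}\big]^{\tilde{Y}_i}\big[1-p_{iB}^{(t)}\big]^{1-\tilde{Y}_i} \prod_{i\in P_{j,3}} \Big[S_e^{Y_i}(1-S_e)^{1-Y_i}\Big]^{\tilde{Y}_i} \Big[(1-S_p)^{Y_i}S_p^{1-Y_i}\Big]^{1-\tilde{Y}_i}.
\end{align*}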
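Since the $i$-th individual may belong to either set $P_{j,2}$ or $P_{j,3}$, the numerator is discussed for each case.
(a) Assume that the $i$-th individual belongs to set $P_{j,2}$. Then, the numerator is
\begin{align*}
P(\tilde{Y}_i=1, Z_{j,1}=1, Z_{j,2}=0, Z_{j,3}=1, \mathcal{Y}_{P_{j,3}}) &= \sum_{\tilde{\mathcal{Y}}_{P_j\setminus i}} P(Z_{j,1}=1, Z_{j,2}=0, Z_{j,3}=1, \mathcal{Y}_{P_{j,3}}\mid \tilde{Y}_i=1, \tilde{\mathcal{Y}}_{P_{j,2}\setminus i}, \tilde{\mathcal{Y}}_{P_{j,3}}) \\
&\quad \times P(\tilde{\mathcal{Y}}_{P_{j,2}\setminus i}, \tilde{\mathcal{Y}}_{P_{j,3}})\,P(\tilde{Y}_i=1) \\
&= \sum_{\tilde{\mathcal{Y}}_{P_j\setminus i}} P(Z_{j,1}=1\mid \tilde{Z}_{j,1}=1)\,P(Z_{j,2}=0\mid \tilde{Z}_{j,2}=1)\,P(Z_{j,3}=1\mid \tilde{Z}_{j,3}) \\
&\quad \times P(\mathcal{Y}_{P_{j,3}}\mid \tilde{\mathcal{Y}}_{P_{j,3}})\,P(\tilde{\mathcal{Y}}_{P_{j,2}\setminus i})\,P(\tilde{\mathcal{Y}}_{P_{j,3}})\,P(\tilde{Y}_i=1) \\
&= \sum_{\tilde{\mathcal{Y}}_{P_j\setminus i}} S_e(1-S_e)\,S_e^{\tilde{Z}_{j,3}}(1-S_p)^{1-\tilde{Z}_{j,3}} \prod_{i'\in P_{j,2}\setminus\{i\}} \big[p_{i'B}^{(t)}\big]^{\tilde{Y}_{i'}}\big[1-p_{i'B}^{(t)}\big]^{1-\tilde{Y}_{i'}} \\
&\quad \times \prod_{i'\in P_{j,3}} \big[p_{i'B}^{(t)}\big]^{\tilde{Y}_{i'}}\big[1-p_{i'B}^{(t)}\big]^{1-\tilde{Y}_{i'}}\, p_{iB}^{(t)} \\
&\quad \times \prod_{i'\in P_{j,3}} \Big[S_e^{Y_{i'}}(1-S_e)^{1-Y_{i'}}\Big]^{\tilde{Y}_{i'}} \Big[(1-S_p)^{Y_{i'}}S_p^{1-Y_{i'}}\Big]^{1-\tilde{Y}_{i'}}.
\end{align*}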
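(b) Assume that the $i$-th individual belongs to set $P_{j,3}$. Then, the numerator is
\begin{align*}
P(\tilde{Y}_i=1, Z_{j,1}=1, Z_{j,2}=0, Z_{j,3}=1, \mathcal{Y}_{P_{j,3}}) &= \sum_{\tilde{\mathcal{Y}}_{P_j\setminus i}} P(Z_{j,1}=1, Z_{j,2}=0, Z_{j,3}=1, \mathcal{Y}_{P_{j,3}}\mid \tilde{Y}_i=1, \tilde{\mathcal{Y}}_{P_{j,2}}, \tilde{\mathcal{Y}}_{P_{j,3}\setminus i}) \\
&\quad \times P(\tilde{\mathcal{Y}}_{P_{j,2}}, \tilde{\mathcal{Y}}_{P_{j,3}\setminus i})\,P(\tilde{Y}_i=1) \\
&= \sum_{\tilde{\mathcal{Y}}_{P_j\setminus i}} P(Z_{j,1}=1\mid \tilde{Z}_{j,1}=1)\,P(Z_{j,2}=0\mid \tilde{Z}_{j,2})\,P(Z_{j,3}=1\mid \tilde{Z}_{j,3}=1) \\
&\quad \times P(\mathcal{Y}_{P_{j,3}\setminus i}\mid \tilde{\mathcal{Y}}_{P_{j,3}\setminus i})\,P(Y_i\mid \tilde{Y}_i=1)\,P(\tilde{\mathcal{Y}}_{P_{j,2}})\,P(\tilde{\mathcal{Y}}_{P_{j,3}\setminus i})\,P(\tilde{Y}_i=1) \\
&= \sum_{\tilde{\mathcal{Y}}_{P_j\setminus i}} S_e^{2}(1-S_e)^{\tilde{Z}_{j,2}}S_p^{1-\tilde{Z}_{j,2}}\, p_{iB}^{(t)}\, S_e^{Y_i}(1-S_e)^{1-Y_i} \\
&\quad \times \prod_{i'\in P_{j,3}\setminus\{i\}} \Big[S_e^{Y_{i'}}(1-S_e)^{1-Y_{i'}}\Big]^{\tilde{Y}_{i'}} \Big[(1-S_p)^{Y_{i'}}S_p^{1-Y_{i'}}\Big]^{1-\tilde{Y}_{i'}} \\
&\quad \times \prod_{i'\in P_j\setminus\{i\}} \big[p_{i'B}^{(t)}\big]^{\tilde{Y}_{i'}}\big[1-p_{i'B}^{(t)}\big]^{1-\tilde{Y}_{i'}}.
\end{align*}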
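4) When $Z_{j,1}=1$, $Z_{j,2}=1$, and $Z_{j,3}=0$, the process is the same as when $Z_{j,1}=1$, $Z_{j,2}=0$, and $Z_{j,3}=1$, with the roles of $P_{j,2}$ and $P_{j,3}$ exchanged, and the numerator again depends on which set contains the $i$-th individual. At this time,
\begin{equation*}
w_i^{(t)}=P(\tilde{Y}_i=1\mid Z_{j,1}=1, Z_{j,2}=1, Z_{j,3}=0, \mathcal{Y}_{P_{j,2}})=\frac{P(Z_{j,1}=1, Z_{j,2}=1, Z_{j,3}=0, \mathcal{Y}_{P_{j,2}}\mid \tilde{Y}_i=1)\,P(\tilde{Y}_i=1)}{P(Z_{j,1}=1, Z_{j,2}=1, Z_{j,3}=0, \mathcal{Y}_{P_{j,2}})}.
\end{equation*}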
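First, the denominator is
\begin{align*}
P(Z_{j,1}=1, Z_{j,2}=1, Z_{j,3}=0, \mathcal{Y}_{P_{j,2}}) &= \sum_{\tilde{\mathcal{Y}}_{P_j}} P(Z_{j,1}=1, Z_{j,2}=1, Z_{j,3}=0, \mathcal{Y}_{P_{j,2}}\mid \tilde{\mathcal{Y}}_{P_{j,2}}, \tilde{\mathcal{Y}}_{P_{j,3}})\,P(\tilde{\mathcal{Y}}_{P_{j,2}})\,P(\tilde{\mathcal{Y}}_{P_{j,3}}) \\
&= \sum_{\tilde{\mathcal{Y}}_{P_j}} P(Z_{j,1}=1\mid \tilde{Z}_{j,1})\,P(Z_{j,2}=1\mid \tilde{Z}_{j,2})\,P(Z_{j,3}=0\mid \tilde{Z}_{j,3}) \\
&\quad \times \prod_{i\in P_{j,2}} P(Y_i\mid \tilde{Y}_i)\,P(\tilde{\mathcal{Y}}_{P_{j,2}})\,P(\tilde{\mathcal{Y}}_{P_{j,3}}) \\
&= \sum_{\tilde{\mathcal{Y}}_{P_j}} S_e^{\tilde{Z}_{j,1}+\tilde{Z}_{j,2}}(1-S_p)^{2-\tilde{Z}_{j,1}-\tilde{Z}_{j,2}}(1-S_e)^{\tilde{Z}_{j,3}}S_p^{1-\tilde{Z}_{j,3}} \\
&\quad \times \prod_{i\in P_j} \big[p_{iB}^{(t)}\big]^{\tilde{Y}_i}\big[1-p_{iB}^{(t)}\big]^{1-\tilde{Y}_i} \prod_{i\in P_{j,2}} \Big[S_e^{Y_i}(1-S_e)^{1-Y_i}\Big]^{\tilde{Y}_i} \Big[(1-S_p)^{Y_i}S_p^{1-Y_i}\Big]^{1-\tilde{Y}_i}.
\end{align*}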
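Next, the numerator is discussed.
(a) Assume that the $i$-th individual belongs to set $P_{j,2}$. Then, the numerator is
\begin{align*}
P(\tilde{Y}_i=1, Z_{j,1}=1, Z_{j,2}=1, Z_{j,3}=0, \mathcal{Y}_{P_{j,2}}) &= \sum_{\tilde{\mathcal{Y}}_{P_j\setminus i}} P(Z_{j,1}=1, Z_{j,2}=1, Z_{j,3}=0, \mathcal{Y}_{P_{j,2}}\mid \tilde{Y}_i=1, \tilde{\mathcal{Y}}_{P_{j,2}\setminus i}, \tilde{\mathcal{Y}}_{P_{j,3}})\,P(\tilde{\mathcal{Y}}_{P_{j,2}\setminus i}, \tilde{\mathcal{Y}}_{P_{j,3}})\,P(\tilde{Y}_i=1) \\
&= \sum_{\tilde{\mathcal{Y}}_{P_j\setminus i}} P(Z_{j,1}=1\mid \tilde{Z}_{j,1}=1)\,P(Z_{j,2}=1\mid \tilde{Z}_{j,2}=1)\,P(Z_{j,3}=0\mid \tilde{Z}_{j,3}) \\
&\quad \times P(\mathcal{Y}_{P_{j,2}\setminus i}\mid \tilde{\mathcal{Y}}_{P_{j,2}\setminus i})\,P(Y_i\mid \tilde{Y}_i=1)\,P(\tilde{\mathcal{Y}}_{P_{j,2}\setminus i})\,P(\tilde{\mathcal{Y}}_{P_{j,3}})\,P(\tilde{Y}_i=1) \\
&= \sum_{\tilde{\mathcal{Y}}_{P_j\setminus i}} S_e^{2}(1-S_e)^{\tilde{Z}_{j,3}}S_p^{1-\tilde{Z}_{j,3}}\, S_e^{Y_i}(1-S_e)^{1-Y_i}\, p_{iB}^{(t)} \\
&\quad \times \prod_{i'\in P_{j,2}\setminus\{i\}} \Big[S_e^{Y_{i'}}(1-S_e)^{1-Y_{i'}}\Big]^{\tilde{Y}_{i'}} \Big[(1-S_p)^{Y_{i'}}S_p^{1-Y_{i'}}\Big]^{1-\tilde{Y}_{i'}} \\
&\quad \times \prod_{i'\in P_j\setminus\{i\}} \big[p_{i'B}^{(t)}\big]^{\tilde{Y}_{i'}}\big[1-p_{i'B}^{(t)}\big]^{1-\tilde{Y}_{i'}}.
\end{align*}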
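(b) Assume that the $i$-th individual belongs to set $P_{j,3}$. Since the individuals in $P_{j,3}$ are not retested in this case, the numerator is
\begin{align*}
P(\tilde{Y}_i=1, Z_{j,1}=1, Z_{j,2}=1, Z_{j,3}=0, \mathcal{Y}_{P_{j,2}}) &= P(Z_{j,1}=1, Z_{j,2}=1, Z_{j,3}=0, \mathcal{Y}_{P_{j,2}}\mid \tilde{Y}_i=1)\,P(\tilde{Y}_i=1) \\
&= \sum_{\tilde{\mathcal{Y}}_{P_j\setminus i}} P(Z_{j,1}=1, Z_{j,2}=1, Z_{j,3}=0, \mathcal{Y}_{P_{j,2}}\mid \tilde{Y}_i=1, \tilde{\mathcal{Y}}_{P_{j,2}}, \tilde{\mathcal{Y}}_{P_{j,3}\setminus i})\,P(\tilde{\mathcal{Y}}_{P_{j,2}}, \tilde{\mathcal{Y}}_{P_{j,3}\setminus i})\,P(\tilde{Y}_i=1) \\
&= \sum_{\tilde{\mathcal{Y}}_{P_j\setminus i}} P(Z_{j,1}=1\mid \tilde{Z}_{j,1}=1)\,P(Z_{j,2}=1\mid \tilde{Z}_{j,2})\,P(Z_{j,3}=0\mid \tilde{Z}_{j,3}=1) \\
&\quad \times P(\mathcal{Y}_{P_{j,2}}\mid \tilde{\mathcal{Y}}_{P_{j,2}})\,P(\tilde{\mathcal{Y}}_{P_{j,2}})\,P(\tilde{\mathcal{Y}}_{P_{j,3}\setminus i})\,P(\tilde{Y}_i=1) \\
&= \sum_{\tilde{\mathcal{Y}}_{P_j\setminus i}} S_e(1-S_e)\,S_e^{\tilde{Z}_{j,2}}(1-S_p)^{1-\tilde{Z}_{j,2}}\, p_{iB}^{(t)} \\
&\quad \times \prod_{i'\in P_{j,2}} \Big[S_e^{Y_{i'}}(1-S_e)^{1-Y_{i'}}\Big]^{\tilde{Y}_{i'}} \Big[(1-S_p)^{Y_{i'}}S_p^{1-Y_{i'}}\Big]^{1-\tilde{Y}_{i'}} \prod_{i'\in P_j\setminus\{i\}} \big[p_{i'B}^{(t)}\big]^{\tilde{Y}_{i'}}\big[1-p_{i'B}^{(t)}\big]^{1-\tilde{Y}_{i'}}.
\end{align*}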
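5) When $Z_{j,1}=1$, $Z_{j,2}=1$, and $Z_{j,3}=1$, the pool is partitioned as above, and individual retests are conducted for all individuals in $P_j$. At this time, $\mathcal{Y}_{P_j}=\mathcal{Y}_{P_{j,2}}\cup \mathcal{Y}_{P_{j,3}}$, and we have
\begin{equation*}
w_i^{(t)}=P(\tilde{Y}_i=1\mid Z_{j,1}=1, Z_{j,2}=1, Z_{j,3}=1, \mathcal{Y}_{P_{j,2}}, \mathcal{Y}_{P_{j,3}})=\frac{P(Z_{j,1}=1, Z_{j,2}=1, Z_{j,3}=1, \mathcal{Y}_{P_j}\mid \tilde{Y}_i=1)\,P(\tilde{Y}_i=1)}{P(Z_{j,1}=1, Z_{j,2}=1, Z_{j,3}=1, \mathcal{Y}_{P_j})}.
\end{equation*}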
The denominator is
P(Zj,1=1,Zj,2=1,Zj,3=1,YPj)=∑˜YPjP(Zj,1=1,Zj,2=1,Zj,3=1,YPj|˜YPj)P(˜YPj)=∑˜YPjP(Zj,1=1|˜Zj,1)P(Zj,2=1|˜Zj,2)P(Zj,3=1|˜Zj,3)∏i∈PjP(Yi|˜Yi)P(˜Yi)=∑˜YPjS˜Zj,1e(1−Sp)1−˜Zj,13∏u=2S˜Zj,ue(1−Sp)1−˜Zj,u×∏i∈Pj[SYie(1−Se)1−Yip(t)iB]˜Yi[(1−Sp)YiS1−Yip(1−p(t)iB)]1−˜Yi. |
The results of i-th individual belonging to either set Pj,2 or Pj,3 are symmetric. Assume that i-th individual belongs to set Pj,2. Then, the numerator is
P(Zj,1=1,Zj,2=1,Zj,3=1,YPj,˜Yi=1)=∑˜YPj∖iP(Zj,1=1,Zj,2=1,Zj,3=1,YPj|˜Yi=1,˜YPj,2∖i,˜YPj,3)P(˜YPj,2∖i,˜YPj,3)P(˜Yi=1)=∑˜YPj∖iP(Zj,1=1|˜Zj,1=1)P(Zj,2=1|˜Zj,2=1)P(Zj,3=1|˜Zj,3)×P(Yi|˜Yi=1)P(YPj∖i|˜YPj∖i)P(˜YPj∖i)P(˜Yi=1)=∑˜YPj∖iS2eS˜Zj,3e(1−Sp)1−˜Zj,3p(t)iBSYie(1−Se)1−Yi×∏i∈Pj∖{i}[SYie(1−Se)1−Yip(t)iB]˜Yi[(1−Sp)YiS1−Yip(1−p(t)iB)]1−˜Yi. |
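The case-by-case sums above share one structure: a sum, over all latent true-status vectors, of the prior weight times the likelihood of every observed pool result and retest. A minimal Python sketch of that structure, using brute-force enumeration, may make this concrete. The function names, the 0-based indexing, and the numerical values are illustrative assumptions rather than part of the derivation; `prior[k]` plays the role of p_{kB}^{(t)}, and the same S_e and S_p are assumed for pooled and individual tests, as in the formulas above.

```python
from itertools import product


def result_likelihood(observed, truth, se, sp):
    """P(observed result | true status) for a test with sensitivity se and specificity sp."""
    if truth == 1:
        return se if observed == 1 else 1.0 - se
    return 1.0 - sp if observed == 1 else sp


def posterior_positive(i, prior, pools, retests, se, sp):
    """P(true status of individual i is 1 | pool results, retests), by full enumeration.

    prior   : list of prior probabilities, one per individual (hypothetical input)
    pools   : list of (member_indices, observed_result) pairs for pooled tests
    retests : dict mapping individual index -> observed individual result
    """
    num = den = 0.0
    for latent in product((0, 1), repeat=len(prior)):
        # Prior weight of this latent configuration.
        weight = 1.0
        for k, p in enumerate(prior):
            weight *= p if latent[k] else 1.0 - p
        # A pool is truly positive if any member is truly positive.
        for members, z in pools:
            weight *= result_likelihood(z, max(latent[k] for k in members), se, sp)
        # Individual retests depend only on that individual's true status.
        for k, y in retests.items():
            weight *= result_likelihood(y, latent[k], se, sp)
        den += weight
        if latent[i] == 1:
            num += weight
    return num / den


# Case 4 above: master pool and subpool 2 positive, subpool 3 negative,
# so only the members of subpool 2 (indices 0 and 1) are retested.
prior = [0.05, 0.05, 0.05, 0.05]
pools = [([0, 1, 2, 3], 1), ([0, 1], 1), ([2, 3], 0)]
retests = {0: 1, 1: 0}
print(posterior_positive(0, prior, pools, retests, se=0.95, sp=0.98))
```

Enumeration costs 2^n evaluations for a pool of n individuals, so this is only practical for small pools; it is meant as a check on the closed-form expressions, not as a replacement for them.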
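For convenience, assume that the set of all individuals is G , and all individuals can be arranged into an R \times C array, that is, G = \{(r, c), r \in R, c \in C\} . Define \mathcal{R} = (R_1, R_2, \cdots, R_R) and \mathcal{C} = (C_1, C_2, \cdots, C_C) as the collections of row and column testing results, respectively. Let R = \max R_r and C = \max C_c , so that the pair (R, C) records whether any row and whether any column tested positive. Furthermore, define \tilde{R}_r = \max_c \tilde{Y}_{rc} and \tilde{C}_c = \max_r \tilde{Y}_{rc} as the true statuses of the rows and columns, respectively. Let Y_{rc} denote the testing result of the individual in the r-th row and c-th column of the array, and let \tilde{Y}_{rc} denote the true disease status of that individual. Let
\begin{equation*} Q = \{(s, t) \mid R_s = 1, C_t = 1, 1 \leq s \leq R, 1 \leq t \leq C, \text{ or } R_s = 1, C_1 = \cdots = C_C = 0, 1 \leq s \leq R, \text{ or } R_1 = \cdots = R_R = 0, C_t = 1, 1 \leq t \leq C\}. \end{equation*}
\mathcal{Y}_Q represents the collection of responses from all potentially positive individuals, and \tilde{\mathcal{Y}}_Q denotes the true disease statuses of all potentially positive individuals. Let \mathcal{Z}_G = (R, C) denote the group testing responses. Since (r, c) \in G , define
\begin{equation*} \tilde{\mathcal{Y}}_{G \setminus (r, c)} = \{\tilde{Y}_{r'c'}, r' \in R \setminus \{r\}, c' \in C \setminus \{c\}\}. \end{equation*}
Then,
\begin{equation*} {w}^{(t)}_{rc} = P\big(\tilde{Y}_{rc} = 1 \mid \mathcal{Z}_G, \mathcal{Y}_Q\big) = \frac{P\big(\tilde{Y}_{rc} = 1, \mathcal{Z}_G, \mathcal{Y}_Q\big)}{P\big(\mathcal{Z}_G, \mathcal{Y}_Q\big)}. \end{equation*}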
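The three clauses in the definition of the retest set Q can be read as a small decision rule on the row and column results. The following Python sketch is one way to write that rule down; the function name `potentially_positive` and the 0-based indices are assumptions made for illustration (the text indexes rows and columns from 1).

```python
def potentially_positive(rows, cols):
    """Return the set Q of (s, t) cells to retest, given 0/1 row and column results.

    rows[s] is the observed result of row s, cols[t] that of column t (0-based).
    """
    n_rows, n_cols = len(rows), len(cols)
    if any(rows) and any(cols):
        # Positive rows and positive columns: retest their intersections.
        return {(s, t) for s in range(n_rows) if rows[s]
                for t in range(n_cols) if cols[t]}
    if any(rows):
        # Positive rows but no positive column: retest whole positive rows.
        return {(s, t) for s in range(n_rows) if rows[s] for t in range(n_cols)}
    if any(cols):
        # Positive columns but no positive row: retest whole positive columns.
        return {(s, t) for t in range(n_cols) if cols[t] for s in range(n_rows)}
    # All rows and columns negative: nobody is retested.
    return set()


print(sorted(potentially_positive([1, 0], [0, 1, 0])))
```

The four branches correspond to the outcomes (R, C) = (1, 1), (1, 0), (0, 1), and (0, 0) treated in the cases that follow.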
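1) When \mathcal{Z}_G = (0, 0) , there is no need to retest individuals within the group. In this case,
\begin{equation*} {w}^{(t)}_{rc} = P\big(\tilde{Y}_{rc} = 1 \mid \mathcal{Z}_G = (0, 0)\big) = \frac{P\big(\tilde{Y}_{rc} = 1, \mathcal{Z}_G = (0, 0)\big)}{P\big(\mathcal{Z}_G = (0, 0)\big)}. \end{equation*}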
The denominator is
\begin{align*} P\big(\mathcal{Z}_G = (0, 0)\big) = &\sum\limits_{\tilde{\mathcal{Y}}_G}P\big(\mathcal{Z}_G = (0, 0) | \tilde{\mathcal{Y}}_G\big)P(\tilde{\mathcal{Y}}_G)\\ = &\sum\limits_{\tilde{\mathcal{Y}}_G}P(\mathcal{R} = 0 | \tilde{\mathcal{Y}}_G)P(\mathcal{C} = 0 | \tilde{\mathcal{Y}}_G)P(\tilde{\mathcal{Y}}_G)\\ = &\sum\limits_{\tilde{\mathcal{Y}}_G} \bigg[ \prod\limits_{r' = 1}^{R} P\Big(R_{r'} = 0 | \tilde{Y}_{r'1}, \tilde{Y}_{r'2}, \cdots, \tilde{Y}_{r'C} \Big) \bigg] \bigg[ \prod\limits_{c' = 1}^{C} P\Big(C_{c'} = 0 | \tilde{Y}_{1c'}, \tilde{Y}_{2c'}, \cdots, \tilde{Y}_{Rc'} \Big) \bigg] \\ & \times \prod\limits_{r' \in R} \prod\limits_{c' \in C} {p_{r'c'B}^{(t)}}^{\tilde{Y}_{r'c'}} \Big(1-{p_{r'c'B}^{(t)}}\Big)^{1-{\tilde{Y}_{r'c'}}}\\ = &\sum\limits_{\tilde{\mathcal{Y}}_G} \prod\limits_{r' = 1}^{R} \bigg[(1-S_e)^{\tilde{R}_{r'}} S_p^{1-\tilde{R}_{r'}} \bigg] \prod\limits_{c' = 1}^{C} \bigg[(1-S_e)^{\tilde{C}_{c'}} S_p^{1-\tilde{C}_{c'}} \bigg] \prod\limits_{(r', c') \in G} \bigg\{ {p_{r'c'B}^{(t)}}^{\tilde{Y}_{r'c'}} \Big(1-{p_{r'c'B}^{(t)}}\Big)^{1-{\tilde{Y}_{r'c'}}} \bigg\}\\ = &\sum\limits_{\tilde{\mathcal{Y}}_G} \prod\limits_{r' = 1}^{R} \prod\limits_{c' = 1}^{C} \bigg[ (1-S_e)^{\tilde{R}_{r'}+\tilde{C}_{c'}} S_p^{2-\tilde{R}_{r'}-\tilde{C}_{c'}} \bigg] \prod\limits_{(r', c') \in G} \bigg\{ {p_{r'c'B}^{(t)}}^{\tilde{Y}_{r'c'}} \Big(1-{p_{r'c'B}^{(t)}}\Big)^{1-{\tilde{Y}_{r'c'}}} \bigg\}. \end{align*} |
The numerator is
\begin{align*} P\big(\mathcal{Z}_G = (0, 0), \tilde{Y}_{rc} = 1\big) = &P\big(\mathcal{Z}_G = (0, 0) | \tilde{Y}_{rc} = 1\big) P(\tilde{Y}_{rc} = 1)\\ = &\sum\limits_{\tilde{\mathcal{Y}}_{G \setminus (r, c)}} P(\mathcal{R} = 0, \mathcal{C} = 0 | \tilde{\mathcal{Y}}_{G \setminus (r, c)}, \tilde{Y}_{rc} = 1) P(\tilde{\mathcal{Y}}_{G \setminus (r, c)}) P(\tilde{Y}_{rc} = 1)\\ = &\sum\limits_{\tilde{\mathcal{Y}}_{G \setminus (r, c)}} P(R_r = 0 | \tilde{R}_r = 1) \bigg [\prod\limits_{r'\in R\setminus \{r\}} P(R_{r'} = 0 | \tilde{Y}_{r'1}, \tilde{Y}_{r'2}, \cdots, \tilde{Y}_{r'C}) \bigg ]\\ & \times P(C_c = 0 | \tilde{C}_c = 1) \bigg [\prod\limits_{c'\in C\setminus \{c\}} P(C_{c'} = 0 | \tilde{Y}_{1c'}, \cdots, \tilde{Y}_{Rc'}) \bigg ]\\ & \times \prod\limits_{r' \in R\setminus \{r\}} \prod\limits_{c' \in C\setminus \{c\}} {p_{r'c'B}^{(t)}}^{\tilde{Y}_{r'c'}} (1-{p_{r'c'B}^{(t)}})^{1-{\tilde{Y}_{r'c'}}} p_{rcB}^{(t)}\\ = &\sum\limits_{\tilde{\mathcal{Y}}_{G \setminus (r, c)}} (1-S_e)^2 \prod\limits_{r'\in R \setminus \{r\}} (1-S_e)^{\tilde{R}_{r'}} S_p^{1-\tilde{R}_{r'}} \prod\limits_{c'\in C \setminus \{c\}} (1-S_e)^{\tilde{C}_{c'}} S_p^{1-\tilde{C}_{c'}}\\ & \times \prod\limits_{(r', c') \in G\setminus(r, c)} {p_{r'c'B}^{(t)}}^{\tilde{Y}_{r'c'}} (1-{p_{r'c'B}^{(t)}})^{1-{\tilde{Y}_{r'c'}}} p_{rcB}^{(t)}\\ = &\sum\limits_{\tilde{\mathcal{Y}}_{G \setminus (r, c)}} (1-S_e)^{2} \prod\limits_{r' \in R\setminus \{r\}} \prod\limits_{c' \in C\setminus \{c\}} (1-S_e)^{\tilde{R}_{r'}+\tilde{C}_{c'}} S_p^{2-\tilde{R}_{r'}-\tilde{C}_{c'}}\\ & \times \prod\limits_{(r', c') \in G\setminus(r, c)} {p_{r'c'B}^{(t)}}^{\tilde{Y}_{r'c'}} (1-{p_{r'c'B}^{(t)}})^{1-{\tilde{Y}_{r'c'}}} p_{rcB}^{(t)}. \end{align*}
2) When \mathcal{Z}_G \neq (0, 0) , \mathcal{Z}_G = (R, C) has multiple scenarios, specifically \mathcal{Z}_G = (R, C) = (1, 0) , (R, C) = (0, 1) , and (R, C) = (1, 1) . Therefore, when \mathcal{Z}_G \neq (0, 0) , the following classifications can be discussed:
(a) When (R, C) = (1, 0) ,
\begin{equation*} {w}^{(t)}_{rc} = P\big(\tilde{Y}_{rc} = 1 \mid \mathcal{Z}_G = (1, 0)\big) = \frac{P\big(\tilde{Y}_{rc} = 1, \mathcal{Z}_G = (1, 0)\big)}{P\big(\mathcal{Z}_G = (1, 0)\big)}. \end{equation*} |
The denominator is
\begin{align*} P\big(\mathcal{Z}_G = (1, 0)\big) = & \sum\limits_{\tilde{\mathcal{Y}}_G} P(\mathcal{R} \neq 0, \mathcal{C} = 0, \mathcal{Y}_Q | \tilde{\mathcal{Y}}_G) P(\tilde{\mathcal{Y}}_G) \\ = & \sum\limits_{\tilde{\mathcal{Y}}_G} \bigg[ \prod\limits_{r' = 1}^{R} P\Big(R_{r'} | \tilde{Y}_{r'1}, \tilde{Y}_{r'2}, \dots, \tilde{Y}_{r'C}\Big) \bigg] \bigg[ \prod\limits_{c' = 1}^{C} P\Big(C_{c'} = 0 | \tilde{Y}_{1c'}, \tilde{Y}_{2c'}, \dots, \tilde{Y}_{Rc'}\Big) \bigg] \\ &\times \bigg[ \prod\limits_{(s, t) \in Q} P\Big(Y_{st} | \tilde{Y}_{st}\Big) \bigg] \bigg[ \prod\limits_{r' \in R} \prod\limits_{c' \in C} {p_{r'c'B}^{(t)}}^{\tilde{Y}_{r'c'}} \Big(1 - {p_{r'c'B}^{(t)}}\Big)^{1 - \tilde{Y}_{r'c'}} \bigg] \\ = & \sum\limits_{\tilde{\mathcal{Y}}_G} \prod\limits_{r' = 1}^{R} \bigg[ S_e^{R_{r'}} (1 - S_e)^{1 - R_{r'}} \bigg]^{\tilde{R}_{r'}} \bigg[ (1 - S_p)^{R_{r'}} S_p^{1 - R_{r'}} \bigg]^{1 - \tilde{R}_{r'}} \\ & \times \prod\limits_{c' = 1}^{C} \bigg[ (1 - S_e)^{1 - C_{c'}} \bigg]^{\tilde{C}_{c'}} \bigg[ S_p^{1 - C_{c'}} \bigg]^{1 - \tilde{C}_{c'}} \times \prod\limits_{(s, t) \in Q} \bigg[ S_e^{Y_{st}} (1 - S_e)^{1 - Y_{st}} \bigg]^{\tilde{Y}_{st}} \bigg[ (1 - S_p)^{Y_{st}} S_p^{1 - Y_{st}} \bigg]^{1 - \tilde{Y}_{st}} \\ &\times \prod\limits_{(r', c') \in G} {p_{r'c'B}^{(t)}}^{\tilde{Y}_{r'c'}} \Big(1 - {p_{r'c'B}^{(t)}}\Big)^{1 - \tilde{Y}_{r'c'}}. \end{align*}
At this point, the numerator requires further discussion:
(i) If (r, c) \in Q , with R_r = 1 and C_c = 0 , then
\begin{align*} &P\big(\tilde{Y}_{rc} = 1, \mathcal{Z}_G = (1, 0)\big)\\ = &\sum\limits_{\tilde{\mathcal{Y}}_{G \setminus (r, c)}}P\big(\mathcal{Z}_G = \big(1, 0\big), \mathcal{Y}_Q\big|\tilde{Y}_{rc} = 1, \tilde{\mathcal{Y}}_{G \setminus (r, c)}\big)P\big(\tilde{Y}_{rc} = 1, \tilde{\mathcal{Y}}_{G \setminus (r, c)}\big) \\ = &\sum\limits_{\tilde{\mathcal{Y}}_{G \setminus (r, c)}}P\big(\mathcal{R}\neq0\big|\tilde{Y}_{rc} = 1, \tilde{\mathcal{Y}}_{G \setminus (r, c)}\big)P\big(\mathcal{C} = 0\big|\tilde{Y}_{rc} = 1, \tilde{\mathcal{Y}}_{G \setminus (r, c)}\big) \\ & \times P\big(\mathcal{Y}_Q\big|\tilde{Y}_{rc} = 1, \tilde{\mathcal{Y}}_{Q \setminus (r, c)}\big)P\big(\tilde{Y}_{rc} = 1\big)P\big(\tilde{\mathcal{Y}}_{G \setminus (r, c)}\big) \\ = &\sum\limits_{\tilde{\mathcal{Y}}_{G \setminus (r, c)}}P\Big(R_r = 1|\tilde{R}_r = 1\Big) \bigg[\prod\limits_{r'\in R\setminus \{r\}} P\Big(R_{r'}|\tilde{Y}_{r'1}, \tilde{Y}_{r'2}, \cdots, \tilde{Y}_{r'C}\Big)\bigg] \\ & \times P\Big(C_c = 0|\tilde{C}_c = 1\Big)\bigg[\prod\limits_{c'\in C\setminus \{c\}}P\Big(C_{c'} = 0|\tilde{Y}_{1c'}, \cdots, \tilde{Y}_{Rc'}\Big)\bigg]P\Big(Y_{rc}|\tilde{Y}_{rc} = 1\Big) \\ & \times \bigg[\prod\limits_{(s, t)\in Q\setminus \{(r, c)\}}P\Big(Y_{st}|\tilde{Y}_{st}\Big)\bigg] p_{rcB}^{(t)} \times \bigg[\prod\limits_{r'\in R\setminus \{r\}}\prod\limits_{c'\in C\setminus \{c\}} {p_{r'c'B}^{(t)}}^{\tilde{Y}_{r'c'}} \Big(1-{p_{r'c'B}^{(t)}}\Big)^{1-{\tilde{Y}_{r'c'}}}\bigg] \\ = &\sum\limits_{\tilde{\mathcal{Y}}_{G \setminus (r, c)}}S_e^{1+Y_{rc}}\Big(1-S_e\Big)^{2-Y_{rc}}p_{rcB}^{(t)}\\ & \times \prod\limits_{r' \in R\setminus \{r\}}\bigg[S_e^{R_{r'}}\Big(1-S_e\Big)^{1-{R_{r'}}}\bigg]^{\tilde{R}_{r'}} \Big[(1-S_p)^{R_{r'}}S_p^{1-R_{r'}}\Big]^{1-\tilde{R}_{r'}} \\ & \times \prod\limits_{c' \in C\setminus \{c\}}\bigg[(1-S_e)^{1-{C_{c'}}}\bigg]^{\tilde{C}_{c'}} \Big[S_p^{1-C_{c'}}\Big]^{1-\tilde{C}_{c'}} \\ & \times \prod\limits_{(s, t)\in Q\setminus\{(r, c)\}} \bigg[S_e^{Y_{st}}\Big(1-S_e\Big)^{1-Y_{st}}\bigg]^{\tilde{Y}_{st}} \bigg[(1-S_p)^{Y_{st}}S_p^{1-Y_{st}}\bigg]^{1-\tilde{Y}_{st}} \\ & \times \prod\limits_{(r', c')\in G\setminus \{(r, c)\}} {p_{r'c'B}^{(t)}}^{\tilde{Y}_{r'c'}} \Big(1-{p_{r'c'B}^{(t)}}\Big)^{1-{\tilde{Y}_{r'c'}}} . \end{align*}
(ii) If (r, c) \notin Q , then \mathcal{R} \neq 0 , \mathcal{C} = 0 , but R_r = 0 and C_c = 0 :
\begin{align*} &P\big(\tilde{Y}_{rc} = 1, \mathcal{Z}_G = (1, 0)\big)\\ = &\sum\limits_{\tilde{\mathcal{Y}}_{G \setminus (r, c)}}P\bigg(\mathcal{R}, \mathcal{C}, \mathcal{Y}_Q|\tilde{Y}_{rc} = 1, \tilde{\mathcal{Y}}_{G \setminus (r, c)}\bigg)P\big(\tilde{Y}_{rc} = 1, \tilde{\mathcal{Y}}_{G \setminus (r, c)}\big)\\ = &\sum\limits_{\tilde{\mathcal{Y}}_{G \setminus (r, c)}}P\bigg(\mathcal{R}|\tilde{Y}_{rc} = 1, \tilde{\mathcal{Y}}_{G \setminus (r, c)}\bigg) P\bigg(\mathcal{C}|\tilde{Y}_{rc} = 1, \tilde{\mathcal{Y}}_{G \setminus (r, c)}\bigg) P\big(\mathcal{Y}_Q\big|\tilde{\mathcal{Y}}_Q\big)P\big(\tilde{Y}_{rc} = 1\big)P\big(\tilde{\mathcal{Y}}_{G \setminus (r, c)}\big)\\ = &\sum\limits_{\tilde{\mathcal{Y}}_{G \setminus (r, c)}}P\Big(R_r = 0|\tilde{R}_r = 1\Big) \bigg[\prod\limits_{r'\in R\setminus \{r\}} P\Big(R_{r'}|\tilde{Y}_{r'1}, \tilde{Y}_{r'2}, \cdots, \tilde{Y}_{r'C}\Big)\bigg]\\ & \times P\Big(C_c = 0|\tilde{C}_c = 1\Big) \bigg[\prod\limits_{c'\in C\setminus \{c\}}P\Big(C_{c'} = 0|\tilde{Y}_{1c'}, \cdots, \tilde{Y}_{Rc'}\Big)\bigg]\\ & \times \bigg[\prod\limits_{(s, t)\in Q}P\Big(Y_{st}|\tilde{Y}_{st}\Big)\bigg] p_{rcB}^{(t)} \prod\limits_{(r', c')\in G\setminus \{(r, c)\}} {p_{r'c'B}^{(t)}}^{\tilde{Y}_{r'c'}} \Big(1-{p_{r'c'B}^{(t)}}\Big)^{1-{\tilde{Y}_{r'c'}}}\\ = &\sum\limits_{\tilde{\mathcal{Y}}_{G \setminus (r, c)}}S_e^{Y_{rc}}\Big(1-S_e\Big)^{3-Y_{rc}}p_{rcB}^{(t)}\\ & \times \prod\limits_{r' \in R\setminus \{r\}} \bigg[S_e^{R_{r'}}\big(1-S_e\big)^{1-{R_{r'}}}\bigg]^{\tilde{R}_{r'}} \Big[(1-S_p)^{R_{r'}}S_p^{1-R_{r'}}\Big]^{1-\tilde{R}_{r'}}\\ & \times \prod\limits_{c' \in C\setminus \{c\}} \bigg[(1-S_e)^{1-{C_{c'}}}\bigg]^{\tilde{C}_{c'}} \Big[S_p^{1-C_{c'}}\Big]^{1-\tilde{C}_{c'}}\\ & \times \prod\limits_{(s, t)\in Q} \bigg[S_e^{Y_{st}}\big(1-S_e\big)^{1-Y_{st}}\bigg]^{\tilde{Y}_{st}} \Big[(1-S_p)^{Y_{st}}S_p^{1-Y_{st}}\Big]^{1-\tilde{Y}_{st}}\\ & \times \prod\limits_{(r', c')\in G\setminus \{(r, c)\}} \bigg[{p_{r'c'B}^{(t)}}^{\tilde{Y}_{r'c'}} \Big(1-{p_{r'c'B}^{(t)}}\Big)^{1-{\tilde{Y}_{r'c'}}}\bigg]. \end{align*}
(b) When (R, C) = (0, 1) , the denominator is
\begin{align*} P\big(\mathcal{Z}_G = (0, 1)\big) = &\sum\limits_{\tilde{\mathcal{Y}}_G}P(\mathcal{R} = 0, \mathcal{C}\neq0, \mathcal{Y}_Q|\tilde{\mathcal{Y}}_G)P\big(\tilde{\mathcal{Y}}_G\big)\\ = &\sum\limits_{\tilde{\mathcal{Y}}_G} \bigg[ \prod\limits_{r' = 1}^{R} P(R_{r'} = 0|\tilde{Y}_{r'1}, \tilde{Y}_{r'2}, \cdots, \tilde{Y}_{r'C}) \bigg] \\ & \times \bigg[ \prod\limits_{c' = 1}^{C} P(C_{c'}|\tilde{Y}_{1c'}, \tilde{Y}_{2c'}, \cdots, \tilde{Y}_{Rc'}) \bigg] \\ & \times \prod\limits_{(s, t)\in Q} P\big(Y_{st}\big|\tilde{Y}_{st}\big) \prod\limits_{r' \in R}\prod\limits_{c' \in C} {p_{r'c'B}^{(t)}}^{\tilde{Y}_{r'c'}} (1-{p_{r'c'B}^{(t)}})^{1-{\tilde{Y}_{r'c'}}} \\ = &\sum\limits_{\tilde{\mathcal{Y}}_G} \prod\limits_{r' = 1}^{R} \bigg[ \big(1-S_e\big)^{1-R_{r'}} \bigg]^{\tilde{R}_{r'}} \Big[ S_p^{1-R_{r'}} \Big]^{1-\tilde{R}_{r'}} \\ & \times \prod\limits_{c' = 1}^{C} \bigg[ S_e^{C_{c'}}\big(1-S_e\big)^{1-C_{c'}} \bigg]^{\tilde{C}_{c'}} \Big[ (1-S_p)^{C_{c'}}S_p^{1-C_{c'}} \Big]^{1-\tilde{C}_{c'}} \\ & \times \prod\limits_{(s, t)\in Q} \bigg[ S_e^{Y_{st}}\big(1-S_e\big)^{1-Y_{st}} \bigg]^{\tilde{Y}_{st}} \Big[ (1-S_p)^{Y_{st}}S_p^{1-Y_{st}} \Big]^{1-\tilde{Y}_{st}} \\ & \times \prod\limits_{(r', c') \in G} \bigg[ {p_{r'c'B}^{(t)}}^{\tilde{Y}_{r'c'}} (1-{p_{r'c'B}^{(t)}})^{1-{\tilde{Y}_{r'c'}}} \bigg]. \end{align*}
The numerator requires further discussion:
(ⅰ) If (r, c) \in Q and R_r = 0 and C_c = 1 , then
\begin{align*} &P\big(\tilde{Y}_{rc} = 1, \mathcal{Z}_G = (0, 1)\big) = P(\tilde{Y}_{rc} = 1, \mathcal{R}, \mathcal{C}, \mathcal{Y}_Q)\\ = &\sum\limits_{\tilde{\mathcal{Y}}_{G \setminus (r, c)}}P(\mathcal{R}, \mathcal{C}, \mathcal{Y}_Q|\tilde{Y}_{rc} = 1, \tilde{\mathcal{Y}}_{G \setminus (r, c)})P\big(\tilde{Y}_{rc} = 1, \tilde{\mathcal{Y}}_{G \setminus (r, c)}\big)\\ = &\sum\limits_{\tilde{\mathcal{Y}}_{G \setminus (r, c)}}S_e^{1+Y_{rc}}(1-S_e)^{2-Y_{rc}}p_{rcB}^{(t)} \prod\limits_{r' \in R\setminus \{r\}} \bigg[\big(1-S_e\big)^{1-{R_{r'}}}\bigg]^{\tilde{R}_{r'}} \Big[S_p^{1-R_{r'}}\Big]^{1-\tilde{R}_{r'}}\\ & \times \prod\limits_{c' \in C\setminus \{c\}} \bigg[S_e^{C_{c'}}\big(1-S_e\big)^{1-{C_{c'}}}\bigg]^{\tilde{C}_{c'}} \Big[(1-S_p)^{C_{c'}}S_p^{1-C_{c'}}\Big]^{1-\tilde{C}_{c'}}\\ & \times \prod\limits_{(s, t)\in Q\setminus \{(r, c)\}} \bigg[S_e^{Y_{st}}\big(1-S_e\big)^{1-Y_{st}}\bigg]^{\tilde{Y}_{st}} \Big[(1-S_p)^{Y_{st}}S_p^{1-Y_{st}}\Big]^{1-\tilde{Y}_{st}}\\ & \times \prod\limits_{(r', c')\in G\setminus \{(r, c)\}} \bigg[{p_{r'c'B}^{(t)}}^{\tilde{Y}_{r'c'}} (1-{p_{r'c'B}^{(t)}})^{1-{\tilde{Y}_{r'c'}}}\bigg]. \end{align*}
(ⅱ) If (r, c) \notin Q , then \mathcal{R} = 0 , \mathcal{C} \neq 0 , but R_r = 0 and C_c = 0 :
\begin{align*} P\big(\tilde{Y}_{rc} = 1, \mathcal{Z}_G = (0, 1)\big) = &P(\tilde{Y}_{rc} = 1, \mathcal{R}, \mathcal{C}, \mathcal{Y}_Q)\\ = &\sum\limits_{\tilde{\mathcal{Y}}_{G \setminus (r, c)}}P(\mathcal{R}, \mathcal{C}, \mathcal{Y}_Q|\tilde{Y}_{rc} = 1, \tilde{\mathcal{Y}}_{G \setminus (r, c)})P(\tilde{Y}_{rc} = 1, \tilde{\mathcal{Y}}_{G \setminus (r, c)})\\ = &\sum\limits_{\tilde{\mathcal{Y}}_{G \setminus (r, c)}}S_e^{Y_{rc}}(1-S_e)^{3-Y_{rc}}p_{rcB}^{(t)} \prod\limits_{r' \in R\setminus \big\{r\big\}} \bigg[ \big(1-S_e\big)^{1-{R_{r'}}} \bigg]^{\tilde{R}_{r'}} \bigg[ S_p^{1-R_{r'}} \bigg]^{1-\tilde{R}_{r'}}\\ & \times \prod\limits_{c' \in C\setminus \big\{c\big\}} \bigg[ S_e^{C_{c'}}\big(1-S_e\big)^{1-{C_{c'}}} \bigg]^{\tilde{C}_{c'}} \bigg[ \big(1-S_p\big)^{C_{c'}}S_p^{1-C_{c'}} \bigg]^{1-\tilde{C}_{c'}}\\ & \times \prod\limits_{(s, t)\in Q} \bigg[ S_e^{Y_{st}}\big(1-S_e\big)^{1-Y_{st}} \bigg]^{\tilde{Y}_{st}} \bigg[ \big(1-S_p\big)^{Y_{st}}S_p^{1-Y_{st}} \bigg]^{1-\tilde{Y}_{st}}\\ & \times \prod\limits_{(r', c')\in G\setminus \big\{(r, c)\big\}} \bigg[ {p_{r'c'B}^{(t)}}^{\tilde{Y}_{r'c'}} \big(1-{p_{r'c'B}^{(t)}}\big)^{1-{\tilde{Y}_{r'c'}}} \bigg]. \end{align*}
(c) When (R, C) = (1, 1) , the denominator is
\begin{align*} P\big(\mathcal{Z}_G = (1, 1)\big) = &\sum\limits_{\tilde{\mathcal{Y}}_G}P(\mathcal{R}, \mathcal{C}, \mathcal{Y}_Q | \tilde{\mathcal{Y}}_G)P(\tilde{\mathcal{Y}}_G)\\ = &\sum\limits_{\tilde{\mathcal{Y}}_G} \bigg[\prod\limits_{r' = 1}^{R} P(R_{r'} | \tilde{Y}_{r'1}, \tilde{Y}_{r'2}, \cdots, \tilde{Y}_{r'C} ) \bigg] \bigg[\prod\limits_{c' = 1}^{C} P(C_{c'} | \tilde{Y}_{1c'}, \tilde{Y}_{2c'}, \cdots, \tilde{Y}_{Rc'} ) \bigg] \\ & \times \prod\limits_{(s, t)\in Q} P(Y_{st} | \tilde{Y}_{st} ) \prod\limits_{r' \in R} \prod\limits_{c' \in C} {p_{r'c'B}^{(t)}}^{\tilde{Y}_{r'c'}} (1-{p_{r'c'B}^{(t)}})^{1-{\tilde{Y}_{r'c'}}}\\ = &\sum\limits_{\tilde{\mathcal{Y}}_G} \prod\limits_{r' = 1}^{R} \bigg[ S_e^{R_{r'}}(1-S_e)^{1-{R_{r'}}} \bigg]^{\tilde{R}_{r'}} \bigg[ (1-S_p)^{R_{r'}}S_p^{1-R_{r'}} \bigg]^{1-\tilde{R}_{r'}}\\ & \times \prod\limits_{c' = 1}^{C} \bigg[ S_e^{C_{c'}}(1-S_e)^{1-C_{c'}} \bigg]^{\tilde{C}_{c'}} \bigg[ (1-S_p)^{C_{c'}}S_p^{1-C_{c'}} \bigg]^{1-\tilde{C}_{c'}}\\ & \times \prod\limits_{(s, t)\in Q} \bigg[ S_e^{Y_{st}}(1-S_e)^{1-Y_{st}} \bigg]^{\tilde{Y}_{st}} \bigg[ (1-S_p)^{Y_{st}}S_p^{1-Y_{st}} \bigg]^{1-\tilde{Y}_{st}}\\ & \times \prod\limits_{(r', c') \in G} \bigg\{ {p_{r'c'B}^{(t)}}^{\tilde{Y}_{r'c'}} (1-{p_{r'c'B}^{(t)}})^{1-{\tilde{Y}_{r'c'}}} \bigg\}. \end{align*}
For the numerator, we provide the following derivations:
(ⅰ) If (r, c) \in Q and R_r = 1 and C_c = 1 , then
\begin{align*} &P\big(\tilde{Y}_{rc} = 1, \mathcal{Z}_G = (1, 1)\big) = P(\tilde{Y}_{rc} = 1, \mathcal{R}, \mathcal{C}, \mathcal{Y}_Q)\\ = &\sum\limits_{\tilde{\mathcal{Y}}_{G \setminus (r, c)}}P(\mathcal{R}, \mathcal{C}, \mathcal{Y}_Q | \tilde{Y}_{rc} = 1, \tilde{\mathcal{Y}}_{G \setminus (r, c)})P(\tilde{Y}_{rc} = 1, \tilde{\mathcal{Y}}_{G \setminus (r, c)})\\ = &\sum\limits_{\tilde{\mathcal{Y}}_{G \setminus (r, c)}}S_e^{2+Y_{rc}}(1-S_e)^{1-Y_{rc}} \cdot p_{rcB}^{(t)}\\ & \times \prod\limits_{r' \in R\setminus \{r\}}\bigg[ S_e^{R_{r'}}(1-S_e)^{1-{R_{r'}}} \bigg]^{\tilde{R}_{r'}} \bigg[ (1-S_p)^{R_{r'}}S_p^{1-R_{r'}} \bigg]^{1-\tilde{R}_{r'}}\\ & \times \prod\limits_{c' \in C\setminus \{c\}}\bigg[ S_e^{C_{c'}}(1-S_e)^{1-{C_{c'}}} \bigg]^{\tilde{C}_{c'}} \bigg[ (1-S_p)^{C_{c'}}S_p^{1-C_{c'}} \bigg]^{1-\tilde{C}_{c'}}\\ & \times \prod\limits_{(s, t)\in Q\setminus \{(r, c)\}} \bigg[ S_e^{Y_{st}}(1-S_e)^{1-Y_{st}} \bigg]^{\tilde{Y}_{st}} \bigg[ (1-S_p)^{Y_{st}}S_p^{1-Y_{st}} \bigg]^{1-\tilde{Y}_{st}}\\ & \times \prod\limits_{(r', c')\in G\setminus \{(r, c)\}} \bigg\{ {p_{r'c'B}^{(t)}}^{\tilde{Y}_{r'c'}} ( 1-{p_{r'c'B}^{(t)}} )^{1-{\tilde{Y}_{r'c'}}} \bigg\}. \end{align*}
(ⅱ) If (r, c) \notin Q , then \mathcal{R} \neq 0 , \mathcal{C} \neq 0 , but R_r = 0 and C_c = 0 :
\begin{align*} &P\big(\tilde{Y}_{rc} = 1, \mathcal{Z}_G = (1, 1)\big) = P(\tilde{Y}_{rc} = 1, \mathcal{R}, \mathcal{C}, \mathcal{Y}_Q)\\ = &\sum\limits_{\tilde{\mathcal{Y}}_{G \setminus (r, c)}}P(\mathcal{R}, \mathcal{C}, \mathcal{Y}_Q|\tilde{Y}_{rc} = 1, \tilde{\mathcal{Y}}_{G \setminus (r, c)})P(\tilde{Y}_{rc} = 1, \tilde{\mathcal{Y}}_{G \setminus (r, c)})\\ = &\sum\limits_{\tilde{\mathcal{Y}}_{G \setminus (r, c)}}(1-S_e)^2S_e^{Y_{rc}}(1-S_e)^{1-Y_{rc}}p_{rcB}^{(t)}\\ & \times \prod\limits_{r' \in R\setminus \{r\}}\bigg [S_e^{R_{r'}}(1-S_e)^{1-{R_{r'}}}\bigg ]^{\tilde{R}_{r'}} \bigg [(1-S_p)^{R_{r'}}S_p^{1-R_{r'}}\bigg ]^{1-\tilde{R}_{r'}}\\ & \times \prod\limits_{c' \in C\setminus \{c\}}\bigg [S_e^{C_{c'}}(1-S_e)^{1-{C_{c'}}}\bigg ]^{\tilde{C}_{c'}} \bigg [(1-S_p)^{C_{c'}}S_p^{1-C_{c'}}\bigg ]^{1-\tilde{C}_{c'}}\\ & \times \prod\limits_{(s, t)\in Q} \bigg [S_e^{Y_{st}}(1-S_e)^{1-Y_{st}}\bigg ]^{\tilde{Y}_{st}}\bigg [(1-S_p)^{Y_{st}}S_p^{1-Y_{st}}\bigg ]^{1-\tilde{Y}_{st}}\\ & \times \prod\limits_{(r', c')\in G\setminus \{(r, c)\}} \bigg\{ {p_{r'c'B}^{(t)}}^{\tilde{Y}_{r'c'}} (1-{p_{r'c'B}^{(t)}})^{1-{\tilde{Y}_{r'c'}}} \bigg\}. \end{align*}
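For a small array, the ratio of numerator to denominator in each of the cases above can be checked numerically by summing over every latent configuration of the array. The Python sketch below does this by brute force. It is an illustration under stated assumptions, not the procedure of the text: the names `array_posterior` and `result_likelihood` are hypothetical, indices are 0-based, the sum runs over all cells of G, and the same S_e and S_p are used for row, column, and individual tests.

```python
from itertools import product


def result_likelihood(observed, truth, se, sp):
    """P(observed result | true status) for a test with sensitivity se and specificity sp."""
    if truth == 1:
        return se if observed == 1 else 1.0 - se
    return 1.0 - sp if observed == 1 else sp


def array_posterior(r, c, prior, rows, cols, retests, se, sp):
    """P(true status of cell (r, c) is 1 | row results, column results, retests).

    prior   : nested list, prior[a][b] is the prior probability for cell (a, b)
    rows    : observed 0/1 result for each row pool
    cols    : observed 0/1 result for each column pool
    retests : dict mapping (a, b) -> observed individual result for cells in Q
    """
    n_rows, n_cols = len(rows), len(cols)
    cells = [(a, b) for a in range(n_rows) for b in range(n_cols)]
    num = den = 0.0
    for bits in product((0, 1), repeat=len(cells)):
        latent = dict(zip(cells, bits))
        weight = 1.0
        for (a, b) in cells:
            p = prior[a][b]
            weight *= p if latent[(a, b)] else 1.0 - p
        # True row and column statuses are the maxima of the true cell statuses.
        for a in range(n_rows):
            true_row = max(latent[(a, b)] for b in range(n_cols))
            weight *= result_likelihood(rows[a], true_row, se, sp)
        for b in range(n_cols):
            true_col = max(latent[(a, b)] for a in range(n_rows))
            weight *= result_likelihood(cols[b], true_col, se, sp)
        for cell, y in retests.items():
            weight *= result_likelihood(y, latent[cell], se, sp)
        den += weight
        if latent[(r, c)] == 1:
            num += weight
    return num / den


# A 2 x 2 array in which row 0 and column 0 test positive, so Q = {(0, 0)}.
prior = [[0.1, 0.1], [0.1, 0.1]]
print(array_posterior(0, 0, prior, rows=[1, 0], cols=[1, 0],
                      retests={(0, 0): 1}, se=0.95, sp=0.98))
```

The cost grows as 2^(R C), so the sketch is only suitable for very small arrays.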
[1] |
Sinnhuber M, Nieder H, Wieters N (2012) Energetic Particle Precipitation and the Chemistry of the Mesosphere/Lower Thermosphere. Surv Geophys 33: 1281–1334. https://doi.org/10.1007/s10712-012-9201-3 doi: 10.1007/s10712-012-9201-3
![]() |
[2] | Krasniqi F (2023) Biosphere Newsletter, 1st issue. Available from: www.euramet-biosphere.eu. |
[3] | Papayannis A, Gidarakou M, Mylonaki M, et al. (2023) Aerosol, Temperature and Water Vapor profiling during the BIOSPHERE Athens Campaign (June-August 2023). 4th European Lidar Conference, Cluj-Napoca, Romania. https://doi.org/10.5281/zenodo.10018507 |
[4] |
Pierrard V, Lopez Rosson G, Borremans K, et al. (2014) The Energetic Particle Telescope: First results. Space Sci Rev 184: 87–106. https://doi.org/10.1007/s11214-014-0097-8 doi: 10.1007/s11214-014-0097-8
![]() |
[5] | Pierrard V (2024) Effects of the Sun on the space environment of the Earth, Presses Universitaires de Louvain, 208. https://i6doc.com/en/book/?gcoi = 28001100628290 |
[6] | Evans D, Greer M (2000) Polar orbiting environmental satellite space environment monitor. NOAA National Geophysical Data Center. |
[7] | Schwartz M, Froidevaux L, Livesey N, et al. (2020) MLS/Aura Level 2 Ozone (O3) Mixing Ratio V005, Greenbelt, MD, USA, Goddard Earth Sciences Data and Information Services Center (GES DISC). |
[8] |
Smith AK, Espy PJ, Lopez-Puertas M, et al. (2018) Spatial and temporal structure of the tertiary ozone maximum in the polar winter mesosphere. J Geophys Res Atmos 123: 4373–4389. https://doi.org/10.1029/2017JD028030 doi: 10.1029/2017JD028030
![]() |
[9] | Winant A, Pierrard V, Botek E (2025) Ozone decrease observed in the upper atmosphere following the May 11th 2024 Mother's Day solar storm. Ann Geophys Discuss. [preprint]. https://doi.org/10.5194/angeo-2024-29 |
[10] |
Banjac S, Herbst K, Heber B (2019) The atmospheric radiation interaction simulator (AtRIS): Description and validation. J Geophys Res Space Phys 124: 50–67. https://doi.org/10.1029/2018JA026042 doi: 10.1029/2018JA026042
![]() |
[11] |
Winant A, Pierrard V, Botek E, et al. (2023) The atmospheric influence on cosmic ray induced ionization and absorbed dose rates. Universe 9: 502. https://doi.org/10.3390/universe9120502 doi: 10.3390/universe9120502
![]() |
[12] |
Bilitza D, Pezzopane M, Truhlik V, et al. (2022) The International Reference Ionosphere model: A review and description of an ionospheric benchmark. Rev Geophys 60: e2022RG000792. https://doi.org/10.1029/2022RG000792 doi: 10.1029/2022RG000792
![]() |
[13] |
Pierrard V, Botek E, Darrouzet F (2021) Improving Predictions of the 3rd Dynamic Model of the Plasmasphere. Front Astron Space Sci 8: 681401. https://doi.org/10.3389/fspas.2021.681401 doi: 10.3389/fspas.2021.681401
![]() |
[14] |
Andersson ME, Verronen PT, Wang S, et al. (2012) Precipitating radiation belt electrons and enhancements of mesospheric hydroxyl during 2004–2009. J Geophys Res Atmos 117. https://doi.org/10.1029/2011JD017246 doi: 10.1029/2011JD017246
![]() |
[15] |
Wolff W, Dogan M, Luna H, et al. (2024) Absolute electron impact ionization cross-sections for CF4: Three dimensional recoil-ion imaging combined with the relative flow technique. Rev Sci Instrum 95: 095103. https://doi.org/10.1063/5.0219527 doi: 10.1063/5.0219527
![]() |
[16] |
Cyamukungu M, Benck S, Borisov S, et al. (2014) The Energetic Particle Telescope (EPT) on board PROBA-V: description of a new science-class instrument for particle detection in space. IEEE Trans Nucl Sci 61: 3667–3681. https://doi.org/10.1109/TNS.2014.2361955 doi: 10.1109/TNS.2014.2361955
![]() |
[17] |
McIlwain CE (1966) Magnetic coordinates. Space Sci Rev 5: 585–598. https://doi.org/10.1007/BF00167327 doi: 10.1007/BF00167327
![]() |
[18] |
Pierrard V, Botek E, Ripoll JF, et al. (2020) Electron dropout events and flux enhancements associated with geomagnetic storms observed by PROBA-V/EPT from 2013 to 2019. J Geophys Res Space Phys 125: e2020JA028487. https://doi.org/10.1029/2020JA028487 doi: 10.1029/2020JA028487
![]() |
[19] |
Pierrard V, Lopez Rosson G (2016) The effects of the big storm events in the first half of 2015 on the radiation belts observed by EPT/PROBA-V. Ann Geophys 34: 75–84. https://doi.org/10.5194/angeo-34-75-2016 doi: 10.5194/angeo-34-75-2016
![]() |
[20] |
Pierrard V, Ripoll J-F, Cunningham G, et al. (2021) Observations and simulations of dropout events and flux enhancements in October 2013: Comparing MEO equatorial with LEO polar orbit. J Geophys Res Space Phys 126: e2020JA028850. https://doi.org/10.1029/2020JA028850 doi: 10.1029/2020JA028850
![]() |
[21] |
Pierrard V, Botek E, Ripoll J-F, et al. (2021) Links of the plasmapause with other boundary layers of the magnetosphere: ionospheric convection, radiation belts boundaries, auroral oval. Front Astron Space Sci 8. https://doi.org/10.3389/fspas.2021.728531 doi: 10.3389/fspas.2021.728531
![]() |
[22] |
Cunningham GS, Botek E, Pierrard V, et al. (2020) Observation of High-Energy Electrons Precipitated by NWC Transmitter from PROBA-V Low-Earth Orbit Satellite. Geophys Res Lett 47: e2020GL089077. https://doi.org/10.1029/2020GL089077 doi: 10.1029/2020GL089077
![]() |
[23] |
Botek E, Pierrard V, Winant A (2023) Prediction of radiation belts electron fluxes at a Low Earth Orbit using neural networks with PROBA-V/EPT data. Space Weather 21: e2023SW003466. https://doi.org/10.1029/2023SW003466 doi: 10.1029/2023SW003466
![]() |
[24] |
Girgis KM, Hada T, Yoshikawa A, et al. (2023) Geomagnetic Storm Effects on the LEO Proton Flux during Solar Energetic Particle Events. Space Weather 21: e2023SW003664. https://doi.org/10.1029/2023SW003664 doi: 10.1029/2023SW003664
![]() |
[25] |
Pierrard V, Benck S, Botek E, et al. (2023) Proton flux variations during Solar Energetic Particle Events, minimum and maximum solar activity and splitting of the proton belt in the South Atlantic Anomaly. J Geophys Res Space Phys 128: e2022JA031202. https://doi.org/10.1029/2022JA031202 doi: 10.1029/2022JA031202
![]() |
[26] |
Pierrard V, Winant A, Botek E, et al. (2024) The Mother's Day solar storm of 11 May 2024 and its effect on Earth's radiation belts. Universe 10: 391. https://doi.org/10.3390/universe10100391 doi: 10.3390/universe10100391
![]() |
[27] |
Poikela T, Plosila J, Westerlund T, et al. (2014) Timepix3: a 65K channel hybrid pixel readout chip with simultaneous ToA/ToT and sparse readout. J Instrum 9: C05013. https://doi.org/10.1088/1748-0221/9/05/C05013 doi: 10.1088/1748-0221/9/05/C05013
![]() |
[28] |
Freiherr von Forstner JL, Dumbović M, Möstl C, et al. (2021) Radial evolution of the April 2020 stealth coronal mass ejection between 0.8 and 1 AU. Comparison of Forbush decreases at Solar Orbiter and near the Earth. Astron Astrophys 656. https://doi.org/10.1051/0004-6361/202039848 doi: 10.1051/0004-6361/202039848
![]() |
[29] | Krasniqi F, Stolzenberg U, Luchkov M, et al. (2022) PTB metrological infrastructure for the environmental dose assessment. PTB-Mitt 132: 21–31. |
[30] |
Chilingarian A, Hovsepyan G, Arakelyan K, et al. (2009) Space environmental viewing and analysis network (SEVAN). Earth Moon Planets 104: 195. https://doi.org/10.1007/s11038-008-9288-1 doi: 10.1007/s11038-008-9288-1
![]() |
[31] |
Chilingarian A, Babayan V, Karapetyan T, et al. (2018) The SEVAN Worldwide network of particle detectors: 10 years of operation. Adv Space Res 61: 2680–2696. https://doi.org/10.1016/j.asr.2018.02.030 doi: 10.1016/j.asr.2018.02.030
![]() |
[32] |
Granja C, Jakubek J, Soukup P, et al. (2022) Spectral tracking of energetic charged particles in wide field-of-view with miniaturized telescope MiniPIX Timepix3 1 × 2 Stack. J Instrum 17: C03028. https://doi.org/10.1088/1748-0221/17/03/C03028 doi: 10.1088/1748-0221/17/03/C03028
![]() |
[33] |
Granja C, Pospisil S (2014) Quantum Dosimetry and Online Visualization of X-ray and Charged Particle Radiation in Aircraft at Operational Flight Altitudes with the Pixel Detector Timepix. Adv Space Res 54: 241–251. https://doi.org/10.1016/j.asr.2014.04.006 doi: 10.1016/j.asr.2014.04.006
![]() |
[34] |
Spogli L, Alberti T, Bagiacchi P, et al. (2024) The effects of the May 2024 Mother's Day superstorm over the Mediterranean sector: from data to public communication. Ann Geophys 67: PA218. https://doi.org/10.4401/ag-9117 doi: 10.4401/ag-9117
![]() |
[35] |
Themens DR, Elvidge S, McCaffrey A, et al. (2024) The high latitude ionospheric response to the major May 2024 geomagnetic storm: A synoptic view. Geophys Res Lett 51: e2024GL111677. https://doi.org/10.1029/2024GL111677 doi: 10.1029/2024GL111677
![]() |
[36] |
Pierrard V, Verhulst TGW, Chevalier J-M, et al. (2025) Effects of the geomagnetic superstorms of 10–11 May 2024 and 7–11 October 2024 on the ionosphere and plasmasphere. Atmosphere 16: 299. https://doi.org/10.3390/atmos16030299 doi: 10.3390/atmos16030299
![]() |
[37] |
Liu X, Xu J, Yue J, et al. (2025) Mesosphere and lower thermosphere temperature responses to the May 2024 Mother's Day storm. Geophys Res Lett 52: e2024GL112179. https://doi.org/10.1029/2024GL112179 doi: 10.1029/2024GL112179
![]() |
[38] |
Richard E, Harber D, Coddington O, et al. (2020) SI-traceable spectral irradiance radiometric characterization and absolute calibration of the TSIS-1 spectral irradiance monitor (SIM). Remote Sens 12: 1818. https://doi.org/10.3390/rs12111818 doi: 10.3390/rs12111818
![]() |
[39] |
DeLand MT, Cebula RP (1993) Composite MgII solar activity index for solar cycles 21 and 22. J Geophys Res Atmos 98: 12809–12823. https://doi.org/10.1029/93JD00421 doi: 10.1029/93JD00421
![]() |
[40] | Bossay S (2015) Impact de la variabilité solaire sur l'ozone de la moyenne atmosphère. PhD Thesis, Université Versailles Saint Quentin en Yvelines, France. https://theses.hal.science/tel-01139519/ |
[41] |
De Bock V, De Backer H, Van Malderen R, et al. (2014) Relations between erythemal UV dose, global solar radiation, total ozone column and aerosol optical depth at Uccle, Belgium. Atmos Chem Phys 14: 12251–12270. https://doi.org/10.5194/acp-14-12251-2014 doi: 10.5194/acp-14-12251-2014
![]() |
[42] |
Rimmer JS, Redondas A, Karppinen T (2018) EuBrewNet—A European Brewer network (COST Action ES1207), an overview. Atmos Chem Phys 18: 10347–10353. https://doi.org/10.5194/acp-18-10347-2018 doi: 10.5194/acp-18-10347-2018
![]() |
[43] | McKinlay AF, Diffey BL (1987) A Reference Action Spectrum for Ultraviolet Induced Erythema in Human Skin. CIE J 6: 17–22. International Organization for Standardization and International Commission on Illumination. Erythema reference action spectrum and standard erythema dose (ISO/CIE 17166: 2019). Available from: https://www.iso.org. |
[44] |
Geronikolou S, Zimeras S, Tsitomeneas S, et al. (2023) Total Solar Irradiance and Stroke Mortality by Neural Networks Modelling. Atmosphere 14: 114. https://doi.org/10.3390/atmos14010114 doi: 10.3390/atmos14010114
![]() |
[45] |
Zhai T, Zilli Vieira CL, Vokonas P, et al. (2024) Annual space weather fluctuations and telomere length dynamics in a longitudinal cohort of older men: the Normative Aging Study. J Expo Sci Environ Epidemiol 34: 1072–1080. https://doi.org/10.1038/s41370-023-00616-z doi: 10.1038/s41370-023-00616-z
![]() |
[46] |
Vencloviene J, Babarskiene RM, Kiznys D (2017) A possible association between space weather conditions and the risk of acute coronary syndrome in patients with diabetes and the metabolic syndrome. Int J Biometeorol 61: 159–167. https://doi.org/10.1007/s00484-016-1200-5 doi: 10.1007/s00484-016-1200-5
![]() |
[47] |
Zilli Vieira CL, Chen K, Garshick E, et al. (2022) Geomagnetic disturbances reduce heart rate variability in the Normative Aging Study. Sci Total Environ 839: 156235. https://doi.org/10.1016/j.scitotenv.2022.156235 doi: 10.1016/j.scitotenv.2022.156235
![]() |
[48] | Halberg F, Cornélissen G, Katinas G, et al. (2007) Chronomics and Genetics. Scr Med (Brno) 80: 133–150. |
[49] | Halberg F, Cornélissen G, Katinas GS, et al. (2010) Cosmic Inheritance Rules: Implications for Health Care and Science. Scr Med (Brno) 83: 5–15 |
[50] |
Geronikolou S, Leontitsis A, Petropoulos V, et al. (2020) Cyclic stroke mortality variations follow sunspot patterns. F1000Research 9: 1088. https://doi.org/10.12688/f1000research.24794.2 doi: 10.12688/f1000research.24794.2
![]() |
[51] |
Persinger MA (1997) Geomagnetic variables and behavior: LXXXIII. Increased geomagnetic activity and group aggression in chronic limbic epileptic male rats. Percept Motor Skill 85(3_suppl): 1376–1378. https://doi.org/10.2466/pms.1997.85.3f.1376 doi: 10.2466/pms.1997.85.3f.1376
![]() |
[52] | Halberg F, Siutkina EV, Cornelissen G (1998) Chronomes render predictable the otherwise-neglected human "physiological range": position paper of the BIOCOS project, BIOsphere and the COSmos. Fiziologiia Chel 24: 14–21. |
[53] | Mulligan B, Koren S (2021) Geopsychology of instrumental aggression: daily concurrence of global terrorism and solar-geomagnetic activity (1970–2018). Adv Soc Sci Res J 8: 487–499. https://doi.org/10.14738/assrj.85.10266 |
[54] | Geronikolou S, Petropoulos V (2005) Stratospheric ozone, density variation solar activity and biological phenomena. HELLASET International Congress Cephalonia. |
[55] | Geronikolou SA (2014) Evaluation of chronomes in the Greek population and their relation to solar and atmospheric activity cycles, The Mediterranean City 2014 Proceedings, Mariolopoulos-Kanaginis Foundation, 16–17. |
[56] | Stoupel E (2015) Considering space weather forces interaction on human health: the equilibrium paradigm in clinical cosmobiology—is it equal? J Basic Clin Physiol Pharmacol 26: 147–151. https://doi.org/10.1515/jbcpp-2014-0059 |
[57] | Davis GE, Lowell WE (2006) Solar cycles and their relationship to human disease and adaptability. Med Hypotheses 67: 447–461. https://doi.org/10.1016/j.mehy.2006.03.011 |
[58] | Papathanasopoulos P, Preka-Papadema P, Gkotsinas A, et al. (2016) The possible effects of the solar and geomagnetic activity on multiple sclerosis. Clin Neurol Neurosurg 146: 82–89. https://doi.org/10.1016/j.clineuro.2016.04.023 |
[59] | Halberg F, Cornélissen G, Otsuka K, et al. (2000) Cross-spectrally coherent ~10.5- and 21-year biological and physical cycles, magnetic storms and myocardial infarctions. Neuro Endocrinol Lett 21: 233–258. |
[60] | Samsonov SN, Manykina VI, Kleimenova NG, et al. (2016) The HELIO-geophysical storminess health effects in the cardio-vascular system of a human in the middle and high latitudes. Wiad Lek 69(3 pt 2): 537–541. |
[61] | Zilli Vieira CL, Alvares D, Blomberg A, et al. (2019) Geomagnetic disturbances driven by solar activity enhance total and cardiovascular mortality risk in 263 U.S. cities. Environ Health 18: 83. https://doi.org/10.1186/s12940-019-0516-0 |
[62] | Geronikolou SA, Pavlopoulou A, Kanaka-Gantenbein C, et al. (2018) Inter-species functional interactome of nuclear steroid receptors (R1). Front Biosci (Elite Ed) 10: 208–228. https://doi.org/10.2741/e818 |
1. | Kaihong Zhao, Existence, Stability and Simulation of a Class of Nonlinear Fractional Langevin Equations Involving Nonsingular Mittag–Leffler Kernel, 2022, 6, 2504-3110, 469, 10.3390/fractalfract6090469 | |
2. | Saeed M. Ali, Mohammed S. Abdo, Abdellatif Ben Makhlouf, Qualitative Analysis for Multiterm Langevin Systems with Generalized Caputo Fractional Operators of Different Orders, 2022, 2022, 1563-5147, 1, 10.1155/2022/1879152 | |
3. | Choukri Derbazi, Hadda Hammouche, Abdelkrim Salim, Mouffak Benchohra, Weak solutions for fractional Langevin equations involving two fractional orders in banach spaces, 2023, 34, 1012-9405, 10.1007/s13370-022-01035-3 | |
4. | Songkran Pleumpreedaporn, Weerawat Sudsutad, Chatthai Thaiprayoon, Juan E. Nápoles, Jutarat Kongson, A Study of ψ-Hilfer Fractional Boundary Value Problem via Nonlinear Integral Conditions Describing Navier Model, 2021, 9, 2227-7390, 3292, 10.3390/math9243292 | |
5. | Mohammed A. Almalahi, Satish K. Panchal, Fahd Jarad, Calogero Vetro, Multipoint BVP for the Langevin Equation under φ -Hilfer Fractional Operator, 2022, 2022, 2314-8888, 1, 10.1155/2022/2798514 | |
6. | Kanoktip Kotsamran, Weerawat Sudsutad, Chatthai Thaiprayoon, Jutarat Kongson, Jehad Alzabut, Analysis of a Nonlinear ψ-Hilfer Fractional Integro-Differential Equation Describing Cantilever Beam Model with Nonlinear Boundary Conditions, 2021, 5, 2504-3110, 177, 10.3390/fractalfract5040177 | |
7. | Abdulkafi M. Saeed, Mohammed A. Almalahi, Mohammed S. Abdo, Explicit iteration and unique solution for \phi -Hilfer type fractional Langevin equations, 2021, 7, 2473-6988, 3456, 10.3934/math.2022192 | |
8. | Houas MOHAMED, Sequential fractional pantograph differential equations with nonlocal boundary conditions: Uniqueness and Ulam-Hyers-Rassias stability, 2022, 5, 2636-7556, 29, 10.53006/rna.928654 | |
9. | Naoufel Hatime, Ali El Mfadel, M.’hamed Elomari, Said Melliani, EXISTENCE AND STABILITY OF SOLUTIONS FOR NONLINEAR FRACTIONAL DIFFERENTIAL EQUATIONS INVOLVING THE GRÖNWALL-FREDHOLM-TYPE INEQUALITY, 2024, 1072-3374, 10.1007/s10958-024-07202-0 | |
10. | Ricardo Almeida, Euler–Lagrange-Type Equations for Functionals Involving Fractional Operators and Antiderivatives, 2023, 11, 2227-7390, 3208, 10.3390/math11143208 | |
11. | Hamid Baghani, Juan J. Nieto, Some New Properties of the Mittag-Leffler Functions and Their Applications to Solvability and Stability of a Class of Fractional Langevin Differential Equations, 2024, 23, 1575-5460, 10.1007/s12346-023-00870-4 | |
12. | Hacen Serrai, Brahim Tellab, Sina Etemad, İbrahim Avcı, Shahram Rezapour, Ψ-Bielecki-type norm inequalities for a generalized Sturm–Liouville–Langevin differential equation involving Ψ-Caputo fractional derivative, 2024, 2024, 1687-2770, 10.1186/s13661-024-01863-1 |
Metric | Implication |
True positive (TP) | Actual positive and predicted positive |
False positive (FP) | Actual negative and predicted positive |
False negative (FN) | Actual positive and predicted negative |
True negative (TN) | Actual negative and predicted negative |
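The four outcomes above are what the TPR and FPR columns in the following tables are usually built from. A minimal sketch, assuming the standard definitions TPR = TP / (TP + FN) and FPR = FP / (FP + TN); the source does not spell these formulas out, and the names `ConfusionCounts` and `count_outcomes` are hypothetical, introduced only for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of the four outcomes defined in the table above."""
    tp: int  # actual positive, predicted positive
    fp: int  # actual negative, predicted positive
    fn: int  # actual positive, predicted negative
    tn: int  # actual negative, predicted negative

    def tpr(self) -> float:
        # Assumed standard definition: share of actual positives that were detected.
        positives = self.tp + self.fn
        return self.tp / positives if positives else float("nan")

    def fpr(self) -> float:
        # Assumed standard definition: share of actual negatives flagged as positive.
        negatives = self.fp + self.tn
        return self.fp / negatives if negatives else float("nan")


def count_outcomes(actual, predicted) -> ConfusionCounts:
    """Tally TP, FP, FN and TN from paired truthy/falsy labels."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if a and p:
            tp += 1
        elif not a and p:
            fp += 1
        elif a and not p:
            fn += 1
        else:
            tn += 1
    return ConfusionCounts(tp, fp, fn, tn)


if __name__ == "__main__":
    counts = count_outcomes([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
    print(counts, counts.tpr(), counts.fpr())
```

Returning NaN when a class is empty is a design choice of this sketch, made so that a degenerate sample does not raise a division error.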
AMAE | AMSE | MSE | ||||||||
Model | Setting | Test | Penalty | TPR | FPR | g(\cdot) | Prob | \boldsymbol{\beta} | \beta_1 | \beta_2 |
Example 4.1 (n=500) | q_n =50 | MPT | MCP | 0.980 | 0.061 | 0.325 | 0.011 | 0.0003 | 0.0022 | 0.0015 |
HT | 0.985 | 0.003 | 0.413 | 0.007 | 0.0002 | 0.0068 | 0.0025 | |||
DT | 0.968 | 0.062 | 0.295 | 0.011 | 0.0003 | 0.0001 | 0.0004 | |||
AT | 0.987 | 0.035 | 0.388 | 0.009 | 0.0004 | 0.0019 | 0.0009 | |||
MPT | SCAD | 0.967 | 0.060 | 0.508 | 0.014 | 0.0003 | 0.0012 | 0.0021 | ||
HT | 0.988 | 0.001 | 0.479 | 0.008 | 0.0001 | 0.0021 | 0.0023 | |||
DT | 0.980 | 0.051 | 0.511 | 0.012 | 0.0003 | 0.0005 | 0.0004 | |||
AT | 0.974 | 0.063 | 0.432 | 0.009 | 0.0003 | 0.0035 | 0.0024 | |||
MPT | LASSO | 0.964 | 0.060 | 0.337 | 0.011 | 0.0003 | 0.0038 | 0.0051 | ||
HT | 0.986 | 0.003 | 0.436 | 0.006 | 0.0001 | 0.0003 | 0.0002 | |||
DT | 1.000 | 0.029 | 0.522 | 0.013 | 0.0001 | 0.0004 | 0.0003 | |||
AT | 0.981 | 0.034 | 0.320 | 0.007 | 0.0001 | 0.0006 | 0.0008 | |||
q_n =100 | MPT | MCP | 0.985 | 0.010 | 0.511 | 0.009 | 0.0001 | 0.0004 | 0.0004 | |
HT | 0.973 | 0.038 | 0.374 | 0.009 | 0.0002 | 0.0022 | 0.0033 | |||
DT | 0.986 | 0.023 | 0.338 | 0.010 | 0.0001 | 0.0004 | 0.0001 | |||
AT | 0.982 | 0.023 | 0.470 | 0.005 | 0.0001 | 0.0004 | 0.0005 | |||
MPT | SCAD | 0.987 | 0.031 | 0.265 | 0.013 | 0.0002 | 0.0002 | 0.0003 | ||
HT | 0.988 | 0.038 | 0.458 | 0.015 | 0.0005 | 0.0017 | 0.0001 | |||
DT | 0.978 | 0.051 | 0.451 | 0.011 | 0.0001 | 0.0008 | 0.0004 | |||
AT | 0.985 | 0.010 | 0.422 | 0.009 | 0.0001 | 0.0058 | 0.0047 | |||
MPT | LASSO | 0.987 | 0.026 | 0.478 | 0.010 | 0.0001 | 0.0008 | 0.0012 | ||
HT | 0.966 | 0.044 | 0.364 | 0.011 | 0.0003 | 0.0029 | 0.0052 | |||
DT | 0.984 | 0.031 | 0.503 | 0.012 | 0.0001 | 0.0001 | 0.0003 | |||
AT | 0.987 | 0.031 | 0.401 | 0.008 | 0.0001 | 0.0016 | 0.0014 |
AMAE | AMSE | MSE | |||||||||
Model | Setting | Test | Penalty | TPR | FPR | g(\cdot) | Prob | \boldsymbol{\beta} | \beta_1 | \beta_2 | \beta_3 |
Example 4.2 (n=1000) | q_n =100 | MPT | MCP | 0.980 | 0.001 | 0.569 | 0.011 | 0.0001 | 0.0006 | 0.0025 | 0.0073 |
HT | 0.974 | 0.001 | 0.626 | 0.012 | 0.0003 | 0.0059 | 0.0167 | 0.0054 | |||
DT | 0.971 | 0.027 | 0.601 | 0.012 | 0.0002 | 0.0035 | 0.0022 | 0.0019 | |||
AT | 0.986 | 0.010 | 0.582 | 0.011 | 0.0001 | 0.0059 | 0.0011 | 0.0032 | |||
MPT | SCAD | 0.970 | 0.019 | 0.551 | 0.010 | \ast | 0.0014 | 0.0006 | 0.0011 | ||
HT | 0.964 | 0.029 | 0.588 | 0.011 | 0.0001 | 0.0021 | 0.0041 | 0.0001 | |||
DT | 0.972 | 0.021 | 0.572 | 0.011 | 0.0001 | 0.0037 | 0.0002 | 0.0034 | |||
AT | 0.971 | 0.021 | 0.575 | 0.011 | 0.0001 | 0.0057 | 0.0005 | 0.0042 | |||
MPT | LASSO | 0.974 | 0.048 | 0.553 | 0.010 | 0.0001 | 0.0000 | 0.0002 | 0.0003 | ||
HT | 0.972 | 0.056 | 0.601 | 0.010 | 0.0001 | 0.0003 | 0.0001 | 0.0006 | |||
DT | 0.982 | 0.021 | 0.574 | 0.010 | 0.0001 | 0.0035 | 0.0001 | 0.0042 | |||
AT | 0.986 | 0.010 | 0.584 | 0.011 | 0.0001 | 0.0041 | 0.0002 | 0.0056 | |||
q_n =500 | MPT | MCP | 0.964 | 0.011 | 0.562 | 0.013 | 0.0001 | 0.0005 | 0.0015 | 0.0042 | |
HT | 0.972 | 0.010 | 0.670 | 0.018 | 0.0001 | 0.0056 | 0.0001 | 0.0115 | |||
DT | 0.987 | 0.011 | 0.567 | 0.012 | \ast | 0.0044 | 0.0003 | 0.0058 | |||
AT | 0.986 | 0.020 | 0.669 | 0.015 | 0.0001 | 0.0022 | 0.0108 | 0.0012 | |||
MPT | SCAD | 0.965 | 0.014 | 0.515 | 0.010 | 0.0001 | 0.0003 | 0.0055 | 0.0045 | ||
HT | 0.968 | 0.018 | 0.547 | 0.015 | 0.0001 | 0.0023 | 0.0112 | 0.0069 | |||
DT | 0.989 | 0.007 | 0.534 | 0.011 | 0.0001 | 0.0048 | 0.0001 | 0.0047 | |||
AT | 0.985 | 0.005 | 0.608 | 0.010 | \ast | 0.0042 | 0.0021 | 0.0007 | |||
MPT | LASSO | 0.978 | 0.006 | 0.536 | 0.012 | 0.0001 | 0.0013 | 0.0132 | 0.0104 | ||
HT | 0.970 | 0.002 | 0.644 | 0.015 | 0.0001 | 0.0000 | 0.0092 | 0.0126 | |||
DT | 0.987 | 0.005 | 0.545 | 0.012 | \ast | 0.0015 | 0.0007 | 0.0019 | |||
AT | 0.981 | 0.002 | 0.526 | 0.012 | \ast | 0.0011 | 0.0093 | 0.0045 | |||
Symbol \ast indicates value smaller than 0.0001. |
AMAE | AMSE | MSE | |||||||||
Model | Setting | Test | Penalty | TPR | FPR | g(\cdot) | Prob | \boldsymbol{\beta} | \beta_1 | \beta_2 | \beta_3 |
Example 4.3 ( q_n =50) | n=500 | MPT | MCP | 0.951 | 0.103 | 0.466 | 0.019 | 0.0003 | 0.0003 | 0.0009 | 0.0011 |
HT | 0.966 | 0.091 | 0.571 | 0.021 | 0.0005 | 0.0007 | 0.0036 | 0.0045 | |||
DT | 0.982 | 0.043 | 0.360 | 0.006 | 0.0001 | 0.0002 | 0.0001 | 0.0001 | |||
AT | 0.981 | 0.021 | 0.464 | 0.012 | 0.0001 | 0.0005 | 0.0009 | 0.0006 | |||
MPT | SCAD | 0.957 | 0.139 | 0.527 | 0.023 | 0.0005 | 0.0001 | 0.0031 | 0.0098 | ||
HT | 0.968 | 0.082 | 0.433 | 0.020 | 0.0004 | 0.0006 | 0.0001 | 0.0003 | |||
DT | 0.954 | 0.140 | 0.411 | 0.013 | 0.0002 | 0.0011 | 0.0018 | 0.0012 | |||
AT | 0.972 | 0.064 | 0.793 | 0.018 | 0.0002 | 0.0038 | 0.0021 | 0.0004 | |||
MPT | LASSO | 0.981 | 0.024 | 0.604 | 0.021 | 0.0003 | 0.0042 | 0.0014 | 0.0019 | ||
HT | 0.983 | 0.021 | 0.432 | 0.026 | 0.0001 | 0.0017 | 0.0005 | 0.0016 | |||
DT | 0.971 | 0.094 | 0.470 | 0.013 | 0.0002 | 0.0004 | 0.0014 | 0.0023 | |||
AT | 0.980 | 0.061 | 0.447 | 0.013 | 0.0002 | 0.0002 | 0.0004 | 0.0015 | |||
n=1000 | MPT | MCP | 0.988 | 0.040 | 0.358 | 0.015 | 0.0002 | 0.0011 | 0.0024 | 0.0042 | |
HT | 0.984 | 0.021 | 0.399 | 0.017 | 0.0006 | 0.0008 | 0.0009 | 0.0013 | |||
DT | 0.989 | 0.000 | 0.583 | 0.014 | 0.0001 | 0.0001 | 0.0019 | 0.0024 | |||
AT | 0.985 | 0.009 | 0.405 | 0.013 | 0.0001 | 0.0017 | 0.0041 | 0.0012 | |||
MPT | SCAD | 0.989 | 0.043 | 0.537 | 0.016 | 0.0002 | 0.0025 | 0.0004 | 0.0038 | ||
HT | 0.987 | 0.003 | 0.512 | 0.012 | 0.0001 | 0.0012 | 0.0032 | 0.0031 | |||
DT | 0.986 | 0.003 | 0.515 | 0.012 | 0.0001 | 0.0001 | 0.0002 | 0.0004 | |||
AT | 1.000 | 0.000 | 0.410 | 0.013 | 0.0001 | 0.0013 | 0.0022 | 0.0013 | |||
MPT | LASSO | 0.988 | 0.004 | 0.441 | 0.011 | 0.0002 | 0.0029 | 0.0012 | 0.0021 | ||
HT | 0.982 | 0.007 | 0.326 | 0.007 | 0.0001 | 0.0002 | 0.0004 | 0.0002 | |||
DT | 0.987 | 0.008 | 0.489 | 0.013 | 0.0001 | 0.0008 | 0.0001 | 0.0032 | |||
AT | 0.977 | 0.043 | 0.283 | 0.007 | 0.0001 | 0.0012 | 0.0024 | 0.0034 |
AMAE | AMSE | MSE | ||||||||||
Model | Setting | Test | Penalty | TPR | FPR | g(\cdot) | Prob | \boldsymbol{\beta} | \beta_1 | \beta_2 | \beta_3 | \beta_4 |
Example 4.4 ( q_n =100) | n=750 | MPT | MCP | 0.979 | 0.053 | 0.744 | 0.019 | 0.0004 | 0.0028 | 0.0015 | 0.0036 | 0.0076 |
HT | 0.959 | 0.100 | 0.970 | 0.027 | 0.0011 | 0.0024 | 0.0005 | 0.0022 | 0.0018 | |||
DT | 0.986 | 0.035 | 0.611 | 0.011 | 0.0001 | 0.0001 | 0.0025 | 0.0034 | 0.0016 | |||
AT | 0.984 | 0.043 | 0.789 | 0.013 | 0.0002 | 0.0012 | 0.0051 | 0.0026 | 0.0012 | |||
MPT | SCAD | 0.966 | 0.059 | 0.723 | 0.014 | 0.0003 | 0.0030 | 0.0078 | 0.0081 | 0.0001 | ||
HT | 0.978 | 0.069 | 0.576 | 0.014 | 0.0002 | 0.0004 | 0.0083 | 0.0052 | 0.0002 | |||
DT | 0.989 | 0.063 | 0.698 | 0.013 | 0.0003 | 0.0034 | 0.0155 | 0.0011 | 0.0051 | |||
AT | 0.981 | 0.052 | 0.671 | 0.022 | 0.0005 | 0.0078 | 0.0023 | 0.0085 | 0.0073 | |||
MPT | LASSO | 0.977 | 0.072 | 0.620 | 0.014 | 0.0003 | 0.0047 | 0.0041 | 0.0141 | 0.0002 | ||
HT | 0.964 | 0.069 | 0.680 | 0.015 | 0.0003 | 0.0018 | 0.0071 | 0.0073 | 0.0007 | |||
DT | 0.986 | 0.041 | 0.581 | 0.016 | 0.0003 | 0.0034 | 0.0090 | 0.0014 | 0.0005 | |||
AT | 0.984 | 0.065 | 0.679 | 0.016 | 0.0003 | 0.0001 | 0.0095 | 0.0065 | 0.0003 | |||
n=1000 | MPT | MCP | 0.967 | 0.029 | 0.706 | 0.015 | 0.0002 | 0.0068 | 0.0097 | 0.0022 | 0.0015 | |
HT | 0.986 | 0.001 | 0.818 | 0.012 | 0.0001 | 0.0035 | 0.0061 | 0.0007 | 0.0001 | |||
DT | 0.987 | 0.032 | 0.872 | 0.012 | 0.0002 | 0.0007 | 0.0074 | 0.0017 | 0.0016 | |||
AT | 0.988 | 0.037 | 0.800 | 0.027 | 0.0002 | 0.0013 | 0.0061 | 0.0002 | 0.0025 | |||
MPT | SCAD | 0.961 | 0.059 | 0.724 | 0.015 | 0.0002 | 0.0081 | 0.0087 | 0.0030 | 0.0006 | ||
HT | 0.974 | 0.010 | 0.779 | 0.013 | 0.0001 | 0.0036 | 0.0066 | 0.0012 | 0.0001 | |||
DT | 0.983 | 0.071 | 0.405 | 0.010 | 0.0001 | 0.0013 | 0.0059 | 0.0008 | 0.0001 | |||
AT | 0.981 | 0.041 | 0.422 | 0.010 | 0.0001 | 0.0003 | 0.0009 | 0.0020 | 0.0011 | |||
MPT | LASSO | 0.977 | 0.029 | 0.819 | 0.017 | 0.0004 | 0.0057 | 0.0012 | 0.0083 | 0.0079 | ||
HT | 0.951 | 0.004 | 0.545 | 0.043 | 0.0001 | 0.0093 | 0.0004 | 0.0004 | 0.0025 | |||
DT | 0.985 | 0.021 | 0.408 | 0.009 | 0.0001 | 0.0002 | 0.0011 | 0.0026 | 0.0007 | |||
AT | 0.989 | 0.008 | 0.581 | 0.010 | 0.0001 | 0.0042 | 0.0003 | 0.0004 | 0.0008 |
AMAE | MEAN | |||||||||
Model | Setting | Group Size | TPR | FPR | g(\cdot) | Prob | \beta_1 | \beta_2 | \beta_3 | \beta_4 |
Example 4 ( q_n =100) | n=750 | 2 | 0.970 | 0.015 | 0.611 | 0.011 | 0.452 | 0.465 | 0.478 | 0.460 |
4 | 0.965 | 0.020 | 0.581 | 0.016 | 0.445 | 0.405 | 0.464 | 0.477 | ||
6 | 0.986 | 0.041 | 0.627 | 0.009 | 0.519 | 0.497 | 0.487 | 0.495 | ||
8 | 0.973 | 0.020 | 0.594 | 0.012 | 0.471 | 0.467 | 0.484 | 0.477 | ||
n=1000 | 2 | 0.974 | 0.014 | 0.447 | 0.009 | 0.468 | 0.484 | 0.473 | 0.485 | |
4 | 0.964 | 0.018 | 0.408 | 0.009 | 0.489 | 0.468 | 0.450 | 0.474 | ||
6 | 0.985 | 0.021 | 0.440 | 0.011 | 0.486 | 0.478 | 0.443 | 0.471 | ||
8 | 0.974 | 0.010 | 0.466 | 0.009 | 0.494 | 0.494 | 0.447 | 0.437 |
Variable | \hat{\beta}^{norm}_{PLR} | \hat{\beta}_{our} | \hat{\beta}^{norm}_{aenet} | Variable | \hat{\beta}^{norm}_{PLR} | \hat{\beta}_{our} | \hat{\beta}^{norm}_{aenet} | Variable | \hat{\beta}^{norm}_{PLR} | \hat{\beta}_{our} | \hat{\beta}^{norm}_{aenet} |
age | 0.280 | 0.307 | -0.085 | Family history | Household income | ||||||
waist circumference | 0.178 | 0.194 | 0.271 | family history1 | 0.000 | 0.000 | 0.000 | household income1 | 0.000 | 0.000 | 0.000 |
BMI | 0.000 | 0.000 | 0.000 | family history2 | -0.492 | -0.567 | -0.466 | household income2 | 0.024 | 0.000 | 0.000 |
height | 0.000 | 0.000 | 0.000 | family history9 | 0.000 | 0.000 | 0.000 | household income3 | 0.000 | 0.000 | 0.000 |
weight | 0.000 | 0.000 | 0.000 | Physical activity | household income4 | 0.000 | -0.069 | 0.000 | |||
smoking age | 0.000 | 0.007 | 0.000 | physical activity1 | 0.000 | 0.056 | 0.000 | household income5 | 0.000 | 0.000 | 0.000 |
alcohol use | 0.009 | 0.013 | 0.000 | physical activity2 | -0.086 | -0.018 | 0.000 | household income6 | 0.000 | 0.000 | 0.000 |
leg length | -0.048 | -0.100 | -0.043 | physical activity3 | -0.134 | -0.039 | 0.000 | household income7 | 0.000 | 0.000 | 0.000 |
total cholesterol | 0.000 | 0.000 | 0.000 | physical activity4 | -0.088 | 0.000 | 0.000 | household income8 | 0.001 | 0.065 | 0.000 |
Hypertension | physical activity9 | 0.000 | 0.000 | 0.000 | household income9 | 0.000 | 0.000 | 0.000 | |||
hypertension1 | 0.000 | 0.000 | 0.000 | Sex | household income10 | 0.000 | 0.000 | 0.000 | |||
hypertension2 | -0.350 | -0.372 | -0.641 | sex1 | -0.010 | 0.000 | 0.000 | household income11 | 0.000 | 0.000 | 0.000 |
Education | sex2 | -0.237 | -0.225 | -0.424 | household income12 | 0.000 | 0.000 | 0.000 | |||
education1 | 0.000 | 0.000 | 0.000 | race | household income13 | 0.000 | 0.000 | 0.000 | |||
education2 | 0.000 | 0.000 | 0.000 | race1 | 0.000 | 0.000 | 0.000 | household income77 | 0.000 | 0.231 | 0.000 |
education3 | 0.000 | 0.000 | 0.000 | race2 | -0.019 | -0.073 | 0.000 | household income99 | 0.000 | 0.000 | 0.000 |
education4 | 0.000 | 0.000 | 0.000 | race3 | -0.399 | -0.380 | -0.330 | ||||
education5 | -0.014 | -0.052 | 0.000 | race4 | 0.000 | 0.000 | 0.000 | ||||
education7 | -0.523 | -0.335 | 0.000 | race5 | 0.000 | 0.124 | 0.000 |
AMAE | AMSE | MSE | |||||||
Model | Setting | (S_e, S_p) | TPR | FPR | g(\cdot) | Prob | \boldsymbol{\beta} | \beta_1 | \beta_2 |
Example 4.1 ( q_n=50 ) | n=500 | (0.98, 0.98) | 1.000 | 0.029 | 0.522 | 0.013 | 0.0001 | 0.0004 | 0.0003 |
(0.95, 0.95) | 0.987 | 0.020 | 0.474 | 0.011 | 0.0001 | 0.0003 | 0.0003 | ||
(0.90, 0.90) | 0.982 | 0.036 | 0.532 | 0.011 | 0.0001 | 0.0006 | 0.0007 | ||
(0.85, 0.85) | 0.984 | 0.040 | 0.578 | 0.016 | 0.0003 | 0.0001 | 0.0002 |
AMAE | AMSE | MSE | ||||||||
Model | Setting | (S_e, S_p) | TPR | FPR | g(\cdot) | Prob | \boldsymbol{\beta} | \beta_1 | \beta_2 | \beta_3 |
Example 4.2 ( q_n=100 ) | n=1000 | (0.98, 0.98) | 0.982 | 0.021 | 0.574 | 0.010 | 0.0001 | 0.0035 | 0.0001 | 0.0042 |
(0.95, 0.95) | 0.975 | 0.030 | 0.612 | 0.011 | 0.0001 | 0.0047 | 0.0001 | 0.0069 | ||
(0.90, 0.90) | 0.978 | 0.020 | 0.556 | 0.012 | 0.0001 | 0.0023 | 0.0002 | 0.0049 | ||
(0.85, 0.85) | 0.965 | 0.020 | 0.717 | 0.016 | 0.0004 | 0.0158 | 0.0002 | 0.0212 |
AMAE | AMSE | MSE | ||||||||
Model | Setting | (S_e, S_p) | TPR | FPR | g(\cdot) | Prob | \boldsymbol{\beta} | \beta_1 | \beta_2 | \beta_3 |
Example 4.3 ( q_n=50 ) | n=1000 | (0.98, 0.98) | 0.987 | 0.008 | 0.489 | 0.013 | 0.0001 | 0.0008 | 0.0001 | 0.0032 |
(0.95, 0.95) | 0.971 | 0.064 | 0.404 | 0.011 | 0.0003 | 0.0005 | 0.0033 | 0.0085 | ||
(0.90, 0.90) | 0.963 | 0.048 | 0.465 | 0.011 | 0.0001 | 0.0002 | 0.0012 | 0.0055 | ||
(0.85, 0.85) | 0.966 | 0.018 | 0.377 | 0.015 | 0.0004 | 0.0016 | 0.0023 | 0.0007 |
AMAE | AMSE | MSE | |||||||||
Model | Setting | (S_e, S_p) | TPR | FPR | g(\cdot) | Prob | \boldsymbol{\beta} | \beta_1 | \beta_2 | \beta_3 | \beta_4 |
Example 4.4 ( q_n=100 ) | n=750 | (0.98, 0.98) | 0.986 | 0.041 | 0.581 | 0.016 | 0.0003 | 0.0034 | 0.0090 | 0.0014 | 0.0005 |
(0.95, 0.95) | 0.981 | 0.026 | 0.534 | 0.018 | 0.0001 | 0.0016 | 0.0045 | 0.0018 | 0.0005 | ||
(0.90, 0.90) | 0.974 | 0.018 | 0.546 | 0.016 | 0.0002 | 0.0004 | 0.0024 | 0.0014 | 0.0028 | ||
(0.85, 0.85) | 0.976 | 0.024 | 0.539 | 0.011 | 0.0002 | 0.0047 | 0.0085 | 0.0004 | 0.0039 |
Variable | Implication | Variable | Implication |
Hypertension circumstance | Family history of diabetes | ||
hypertension1 | Have a history of hypertension | family history1 | Blood relatives with diabetes |
hypertension2 | No history of hypertension | family history2 | Blood relatives do not have diabetes |
Education level | family history9 | Not known if any blood relatives have diabetes | |
education1 | Less Than 9th Grade | Physical activity | |
education2 | 9 - 11th Grade (Includes 12th grade with no diploma) | physical activity1 | Sit during the day and do not walk about very much |
education3 | High School Grad/GED or Equivalent | physical activity2 | Stand or walk about a lot during the day, but do not have to carry or lift things very often |
education4 | Some College or AA degree | physical activity3 | Lift light load or has to climb stairs or hills often |
education5 | College Graduate or above | physical activity4 | Do heavy work or carry heavy loads |
education7 | Refuse to answer about the level of education | physical activity9 | Don't know physical activity level |
Household income | Sex | ||
household income1 | $0 to $4,999 | sex1 | Male |
household income2 | $5,000 to $9,999 | sex2 | Female |
household income3 | $10,000 to $14,999 | Race/Ethnicity | |
household income4 | $15,000 to $19,999 | race1 | Mexican American |
household income5 | $20,000 to $24,999 | race2 | Other Hispanic |
household income6 | $25,000 to $34,999 | race3 | Non-Hispanic White |
household income7 | $35,000 to $44,999 | race4 | Non-Hispanic Black |
household income8 | $45,000 to $54,999 | race5 | Other Race, Including Multi-Racial |
household income9 | $55,000 to $64,999 | ||
household income10 | $65,000 to $74,999 | ||
household income11 | $75,000 and Over | ||
household income12 | Over $20,000 | ||
household income13 | Under $20,000 | ||
household income77 | Refusal to answer about household income | ||
household income99 | Don't know household income |
Metric | Implication |
True positive (TP) | Actual positive and predicted positive |
False positive (FP) | Actual negative and predicted positive |
False negative (FN) | Actual positive and predicted negative |
True negative (TN) | Actual negative and predicted negative |
AMAE | AMSE | MSE | ||||||||
Model | Setting | Test | Penalty | TPR | FPR | g(\cdot) | Prob | \boldsymbol{\beta} | \beta_1 | \beta_2 |
Example 4.1 (n=500) | q_n =50 | MPT | MCP | 0.980 | 0.061 | 0.325 | 0.011 | 0.0003 | 0.0022 | 0.0015 |
HT | 0.985 | 0.003 | 0.413 | 0.007 | 0.0002 | 0.0068 | 0.0025 | |||
DT | 0.968 | 0.062 | 0.295 | 0.011 | 0.0003 | 0.0001 | 0.0004 | |||
AT | 0.987 | 0.035 | 0.388 | 0.009 | 0.0004 | 0.0019 | 0.0009 | |||
MPT | SCAD | 0.967 | 0.060 | 0.508 | 0.014 | 0.0003 | 0.0012 | 0.0021 | ||
HT | 0.988 | 0.001 | 0.479 | 0.008 | 0.0001 | 0.0021 | 0.0023 | |||
DT | 0.980 | 0.051 | 0.511 | 0.012 | 0.0003 | 0.0005 | 0.0004 | |||
AT | 0.974 | 0.063 | 0.432 | 0.009 | 0.0003 | 0.0035 | 0.0024 | |||
MPT | LASSO | 0.964 | 0.060 | 0.337 | 0.011 | 0.0003 | 0.0038 | 0.0051 | ||
HT | 0.986 | 0.003 | 0.436 | 0.006 | 0.0001 | 0.0003 | 0.0002 | |||
DT | 1.000 | 0.029 | 0.522 | 0.013 | 0.0001 | 0.0004 | 0.0003 | |||
AT | 0.981 | 0.034 | 0.320 | 0.007 | 0.0001 | 0.0006 | 0.0008 | |||
q_n =100 | MPT | MCP | 0.985 | 0.010 | 0.511 | 0.009 | 0.0001 | 0.0004 | 0.0004 | |
HT | 0.973 | 0.038 | 0.374 | 0.009 | 0.0002 | 0.0022 | 0.0033 | |||
DT | 0.986 | 0.023 | 0.338 | 0.010 | 0.0001 | 0.0004 | 0.0001 | |||
AT | 0.982 | 0.023 | 0.470 | 0.005 | 0.0001 | 0.0004 | 0.0005 | |||
MPT | SCAD | 0.987 | 0.031 | 0.265 | 0.013 | 0.0002 | 0.0002 | 0.0003 | ||
HT | 0.988 | 0.038 | 0.458 | 0.015 | 0.0005 | 0.0017 | 0.0001 | |||
DT | 0.978 | 0.051 | 0.451 | 0.011 | 0.0001 | 0.0008 | 0.0004 | |||
AT | 0.985 | 0.010 | 0.422 | 0.009 | 0.0001 | 0.0058 | 0.0047 | |||
MPT | LASSO | 0.987 | 0.026 | 0.478 | 0.010 | 0.0001 | 0.0008 | 0.0012 | ||
HT | 0.966 | 0.044 | 0.364 | 0.011 | 0.0003 | 0.0029 | 0.0052 | |||
DT | 0.984 | 0.031 | 0.503 | 0.012 | 0.0001 | 0.0001 | 0.0003 | |||
AT | 0.987 | 0.031 | 0.401 | 0.008 | 0.0001 | 0.0016 | 0.0014 |
AMAE | AMSE | MSE | |||||||||
Model | Setting | Test | Penalty | TPR | FPR | g(\cdot) | Prob | \boldsymbol{\beta} | \beta_1 | \beta_2 | \beta_3 |
Example 4.2 (n=1000) | q_n =100 | MPT | MCP | 0.980 | 0.001 | 0.569 | 0.011 | 0.0001 | 0.0006 | 0.0025 | 0.0073 |
HT | 0.974 | 0.001 | 0.626 | 0.012 | 0.0003 | 0.0059 | 0.0167 | 0.0054 | |||
DT | 0.971 | 0.027 | 0.601 | 0.012 | 0.0002 | 0.0035 | 0.0022 | 0.0019 | |||
AT | 0.986 | 0.010 | 0.582 | 0.011 | 0.0001 | 0.0059 | 0.0011 | 0.0032 | |||
MPT | SCAD | 0.970 | 0.019 | 0.551 | 0.010 | \ast | 0.0014 | 0.0006 | 0.0011 | ||
HT | 0.964 | 0.029 | 0.588 | 0.011 | 0.0001 | 0.0021 | 0.0041 | 0.0001 | |||
DT | 0.972 | 0.021 | 0.572 | 0.011 | 0.0001 | 0.0037 | 0.0002 | 0.0034 | |||
AT | 0.971 | 0.021 | 0.575 | 0.011 | 0.0001 | 0.0057 | 0.0005 | 0.0042 | |||
MPT | LASSO | 0.974 | 0.048 | 0.553 | 0.010 | 0.0001 | 0.0000 | 0.0002 | 0.0003 | ||
HT | 0.972 | 0.056 | 0.601 | 0.010 | 0.0001 | 0.0003 | 0.0001 | 0.0006 | |||
DT | 0.982 | 0.021 | 0.574 | 0.010 | 0.0001 | 0.0035 | 0.0001 | 0.0042 | |||
AT | 0.986 | 0.010 | 0.584 | 0.011 | 0.0001 | 0.0041 | 0.0002 | 0.0056 | |||
q_n =500 | MPT | MCP | 0.964 | 0.011 | 0.562 | 0.013 | 0.0001 | 0.0005 | 0.0015 | 0.0042 | |
HT | 0.972 | 0.010 | 0.670 | 0.018 | 0.0001 | 0.0056 | 0.0001 | 0.0115 | |||
DT | 0.987 | 0.011 | 0.567 | 0.012 | \ast | 0.0044 | 0.0003 | 0.0058 | |||
AT | 0.986 | 0.020 | 0.669 | 0.015 | 0.0001 | 0.0022 | 0.0108 | 0.0012 | |||
MPT | SCAD | 0.965 | 0.014 | 0.515 | 0.010 | 0.0001 | 0.0003 | 0.0055 | 0.0045 | ||
HT | 0.968 | 0.018 | 0.547 | 0.015 | 0.0001 | 0.0023 | 0.0112 | 0.0069 | |||
DT | 0.989 | 0.007 | 0.534 | 0.011 | 0.0001 | 0.0048 | 0.0001 | 0.0047 | |||
AT | 0.985 | 0.005 | 0.608 | 0.010 | \ast | 0.0042 | 0.0021 | 0.0007 | |||
MPT | LASSO | 0.978 | 0.006 | 0.536 | 0.012 | 0.0001 | 0.0013 | 0.0132 | 0.0104 | ||
HT | 0.970 | 0.002 | 0.644 | 0.015 | 0.0001 | 0.0000 | 0.0092 | 0.0126 | |||
DT | 0.987 | 0.005 | 0.545 | 0.012 | \ast | 0.0015 | 0.0007 | 0.0019 | |||
AT | 0.981 | 0.002 | 0.526 | 0.012 | \ast | 0.0011 | 0.0093 | 0.0045 | |||
Symbol \ast indicates value smaller than 0.0001. |
AMAE | AMSE | MSE | |||||||||
Model | Setting | Test | Penalty | TPR | FPR | g(\cdot) | Prob | \boldsymbol{\beta} | \beta_1 | \beta_2 | \beta_3 |
Example 4.3 ( q_n =50) | n=500 | MPT | MCP | 0.951 | 0.103 | 0.466 | 0.019 | 0.0003 | 0.0003 | 0.0009 | 0.0011 |
HT | 0.966 | 0.091 | 0.571 | 0.021 | 0.0005 | 0.0007 | 0.0036 | 0.0045 | |||
DT | 0.982 | 0.043 | 0.360 | 0.006 | 0.0001 | 0.0002 | 0.0001 | 0.0001 | |||
AT | 0.981 | 0.021 | 0.464 | 0.012 | 0.0001 | 0.0005 | 0.0009 | 0.0006 | |||
MPT | SCAD | 0.957 | 0.139 | 0.527 | 0.023 | 0.0005 | 0.0001 | 0.0031 | 0.0098 | ||
HT | 0.968 | 0.082 | 0.433 | 0.020 | 0.0004 | 0.0006 | 0.0001 | 0.0003 | |||
DT | 0.954 | 0.140 | 0.411 | 0.013 | 0.0002 | 0.0011 | 0.0018 | 0.0012 | |||
AT | 0.972 | 0.064 | 0.793 | 0.018 | 0.0002 | 0.0038 | 0.0021 | 0.0004 | |||
MPT | LASSO | 0.981 | 0.024 | 0.604 | 0.021 | 0.0003 | 0.0042 | 0.0014 | 0.0019 | ||
HT | 0.983 | 0.021 | 0.432 | 0.026 | 0.0001 | 0.0017 | 0.0005 | 0.0016 | |||
DT | 0.971 | 0.094 | 0.470 | 0.013 | 0.0002 | 0.0004 | 0.0014 | 0.0023 | |||
AT | 0.980 | 0.061 | 0.447 | 0.013 | 0.0002 | 0.0002 | 0.0004 | 0.0015 | |||
n=1000 | MPT | MCP | 0.988 | 0.040 | 0.358 | 0.015 | 0.0002 | 0.0011 | 0.0024 | 0.0042 | |
HT | 0.984 | 0.021 | 0.399 | 0.017 | 0.0006 | 0.0008 | 0.0009 | 0.0013 | |||
DT | 0.989 | 0.000 | 0.583 | 0.014 | 0.0001 | 0.0001 | 0.0019 | 0.0024 | |||
AT | 0.985 | 0.009 | 0.405 | 0.013 | 0.0001 | 0.0017 | 0.0041 | 0.0012 | |||
MPT | SCAD | 0.989 | 0.043 | 0.537 | 0.016 | 0.0002 | 0.0025 | 0.0004 | 0.0038 | ||
HT | 0.987 | 0.003 | 0.512 | 0.012 | 0.0001 | 0.0012 | 0.0032 | 0.0031 | |||
DT | 0.986 | 0.003 | 0.515 | 0.012 | 0.0001 | 0.0001 | 0.0002 | 0.0004 | |||
AT | 1.000 | 0.000 | 0.410 | 0.013 | 0.0001 | 0.0013 | 0.0022 | 0.0013 | |||
MPT | LASSO | 0.988 | 0.004 | 0.441 | 0.011 | 0.0002 | 0.0029 | 0.0012 | 0.0021 | ||
HT | 0.982 | 0.007 | 0.326 | 0.007 | 0.0001 | 0.0002 | 0.0004 | 0.0002 | |||
DT | 0.987 | 0.008 | 0.489 | 0.013 | 0.0001 | 0.0008 | 0.0001 | 0.0032 | |||
AT | 0.977 | 0.043 | 0.283 | 0.007 | 0.0001 | 0.0012 | 0.0024 | 0.0034 |
AMAE | AMSE | MSE | ||||||||||
Model | Setting | Test | Penalty | TPR | FPR | g(\cdot) | Prob | \boldsymbol{\beta} | \beta_1 | \beta_2 | \beta_3 | \beta_4 |
Example 4.4 ( q_n =100) | n=750 | MPT | MCP | 0.979 | 0.053 | 0.744 | 0.019 | 0.0004 | 0.0028 | 0.0015 | 0.0036 | 0.0076 |
HT | 0.959 | 0.100 | 0.970 | 0.027 | 0.0011 | 0.0024 | 0.0005 | 0.0022 | 0.0018 | |||
DT | 0.986 | 0.035 | 0.611 | 0.011 | 0.0001 | 0.0001 | 0.0025 | 0.0034 | 0.0016 | |||
AT | 0.984 | 0.043 | 0.789 | 0.013 | 0.0002 | 0.0012 | 0.0051 | 0.0026 | 0.0012 | |||
MPT | SCAD | 0.966 | 0.059 | 0.723 | 0.014 | 0.0003 | 0.0030 | 0.0078 | 0.0081 | 0.0001 | ||
HT | 0.978 | 0.069 | 0.576 | 0.014 | 0.0002 | 0.0004 | 0.0083 | 0.0052 | 0.0002 | |||
DT | 0.989 | 0.063 | 0.698 | 0.013 | 0.0003 | 0.0034 | 0.0155 | 0.0011 | 0.0051 | |||
AT | 0.981 | 0.052 | 0.671 | 0.022 | 0.0005 | 0.0078 | 0.0023 | 0.0085 | 0.0073 | |||
MPT | LASSO | 0.977 | 0.072 | 0.620 | 0.014 | 0.0003 | 0.0047 | 0.0041 | 0.0141 | 0.0002 | ||
HT | 0.964 | 0.069 | 0.680 | 0.015 | 0.0003 | 0.0018 | 0.0071 | 0.0073 | 0.0007 | |||
DT | 0.986 | 0.041 | 0.581 | 0.016 | 0.0003 | 0.0034 | 0.0090 | 0.0014 | 0.0005 | |||
AT | 0.984 | 0.065 | 0.679 | 0.016 | 0.0003 | 0.0001 | 0.0095 | 0.0065 | 0.0003 | |||
n=1000 | MPT | MCP | 0.967 | 0.029 | 0.706 | 0.015 | 0.0002 | 0.0068 | 0.0097 | 0.0022 | 0.0015 | |
HT | 0.986 | 0.001 | 0.818 | 0.012 | 0.0001 | 0.0035 | 0.0061 | 0.0007 | 0.0001 | |||
DT | 0.987 | 0.032 | 0.872 | 0.012 | 0.0002 | 0.0007 | 0.0074 | 0.0017 | 0.0016 | |||
AT | 0.988 | 0.037 | 0.800 | 0.027 | 0.0002 | 0.0013 | 0.0061 | 0.0002 | 0.0025 | |||
MPT | SCAD | 0.961 | 0.059 | 0.724 | 0.015 | 0.0002 | 0.0081 | 0.0087 | 0.0030 | 0.0006 | ||
HT | 0.974 | 0.010 | 0.779 | 0.013 | 0.0001 | 0.0036 | 0.0066 | 0.0012 | 0.0001 | |||
DT | 0.983 | 0.071 | 0.405 | 0.010 | 0.0001 | 0.0013 | 0.0059 | 0.0008 | 0.0001 | |||
AT | 0.981 | 0.041 | 0.422 | 0.010 | 0.0001 | 0.0003 | 0.0009 | 0.0020 | 0.0011 | |||
MPT | LASSO | 0.977 | 0.029 | 0.819 | 0.017 | 0.0004 | 0.0057 | 0.0012 | 0.0083 | 0.0079 | ||
HT | 0.951 | 0.004 | 0.545 | 0.043 | 0.0001 | 0.0093 | 0.0004 | 0.0004 | 0.0025 | |||
DT | 0.985 | 0.021 | 0.408 | 0.009 | 0.0001 | 0.0002 | 0.0011 | 0.0026 | 0.0007 | |||
AT | 0.989 | 0.008 | 0.581 | 0.010 | 0.0001 | 0.0042 | 0.0003 | 0.0004 | 0.0008 |
AMAE | MEAN | |||||||||
Model | Setting | Group Size | TPR | FPR | g(\cdot) | Prob | \beta_1 | \beta_2 | \beta_3 | \beta_4 |
Example 4 ( q_n =100) | n=750 | 2 | 0.970 | 0.015 | 0.611 | 0.011 | 0.452 | 0.465 | 0.478 | 0.460 |
4 | 0.965 | 0.020 | 0.581 | 0.016 | 0.445 | 0.405 | 0.464 | 0.477 | ||
6 | 0.986 | 0.041 | 0.627 | 0.009 | 0.519 | 0.497 | 0.487 | 0.495 | ||
8 | 0.973 | 0.020 | 0.594 | 0.012 | 0.471 | 0.467 | 0.484 | 0.477 | ||
n=1000 | 2 | 0.974 | 0.014 | 0.447 | 0.009 | 0.468 | 0.484 | 0.473 | 0.485 | |
4 | 0.964 | 0.018 | 0.408 | 0.009 | 0.489 | 0.468 | 0.450 | 0.474 | ||
6 | 0.985 | 0.021 | 0.440 | 0.011 | 0.486 | 0.478 | 0.443 | 0.471 | ||
8 | 0.974 | 0.010 | 0.466 | 0.009 | 0.494 | 0.494 | 0.447 | 0.437 |
Variable | \hat{\beta}^{norm}_{PLR} | \hat{\beta}_{our} | \hat{\beta}^{norm}_{aenet} | Variable | \hat{\beta}^{norm}_{PLR} | \hat{\beta}_{our} | \hat{\beta}^{norm}_{aenet} | Variable | \hat{\beta}^{norm}_{PLR} | \hat{\beta}_{our} | \hat{\beta}^{norm}_{aenet} |
age | 0.280 | 0.307 | -0.085 | Family history | | | | Household income | | | |
waist circumference | 0.178 | 0.194 | 0.271 | family history1 | 0.000 | 0.000 | 0.000 | household income1 | 0.000 | 0.000 | 0.000 |
BMI | 0.000 | 0.000 | 0.000 | family history2 | -0.492 | -0.567 | -0.466 | household income2 | 0.024 | 0.000 | 0.000 |
height | 0.000 | 0.000 | 0.000 | family history9 | 0.000 | 0.000 | 0.000 | household income3 | 0.000 | 0.000 | 0.000 |
weight | 0.000 | 0.000 | 0.000 | Physical activity | | | | household income4 | 0.000 | -0.069 | 0.000 |
smoking age | 0.000 | 0.007 | 0.000 | physical activity1 | 0.000 | 0.056 | 0.000 | household income5 | 0.000 | 0.000 | 0.000 |
alcohol use | 0.009 | 0.013 | 0.000 | physical activity2 | -0.086 | -0.018 | 0.000 | household income6 | 0.000 | 0.000 | 0.000 |
leg length | -0.048 | -0.100 | -0.043 | physical activity3 | -0.134 | -0.039 | 0.000 | household income7 | 0.000 | 0.000 | 0.000 |
total cholesterol | 0.000 | 0.000 | 0.000 | physical activity4 | -0.088 | 0.000 | 0.000 | household income8 | 0.001 | 0.065 | 0.000 |
Hypertension | | | | physical activity9 | 0.000 | 0.000 | 0.000 | household income9 | 0.000 | 0.000 | 0.000 |
hypertension1 | 0.000 | 0.000 | 0.000 | Sex | | | | household income10 | 0.000 | 0.000 | 0.000 |
hypertension2 | -0.350 | -0.372 | -0.641 | sex1 | -0.010 | 0.000 | 0.000 | household income11 | 0.000 | 0.000 | 0.000 |
Education | | | | sex2 | -0.237 | -0.225 | -0.424 | household income12 | 0.000 | 0.000 | 0.000 |
education1 | 0.000 | 0.000 | 0.000 | Race | | | | household income13 | 0.000 | 0.000 | 0.000 |
education2 | 0.000 | 0.000 | 0.000 | race1 | 0.000 | 0.000 | 0.000 | household income77 | 0.000 | 0.231 | 0.000 |
education3 | 0.000 | 0.000 | 0.000 | race2 | -0.019 | -0.073 | 0.000 | household income99 | 0.000 | 0.000 | 0.000 |
education4 | 0.000 | 0.000 | 0.000 | race3 | -0.399 | -0.380 | -0.330 | | | | |
education5 | -0.014 | -0.052 | 0.000 | race4 | 0.000 | 0.000 | 0.000 | | | | |
education7 | -0.523 | -0.335 | 0.000 | race5 | 0.000 | 0.124 | 0.000 | | | | |
AMAE | AMSE | MSE | |||||||
Model | Setting | (S_e, S_p) | TPR | FPR | g(\cdot) | Prob | \boldsymbol{\beta} | \beta_1 | \beta_2 |
Example 4.1 (q_n=50) | n=500 | (0.98, 0.98) | 1.000 | 0.029 | 0.522 | 0.013 | 0.0001 | 0.0004 | 0.0003 |
Example 4.1 (q_n=50) | n=500 | (0.95, 0.95) | 0.987 | 0.020 | 0.474 | 0.011 | 0.0001 | 0.0003 | 0.0003 |
Example 4.1 (q_n=50) | n=500 | (0.90, 0.90) | 0.982 | 0.036 | 0.532 | 0.011 | 0.0001 | 0.0006 | 0.0007 |
Example 4.1 (q_n=50) | n=500 | (0.85, 0.85) | 0.984 | 0.040 | 0.578 | 0.016 | 0.0003 | 0.0001 | 0.0002 |
AMAE | AMSE | MSE | ||||||||
Model | Setting | (S_e, S_p) | TPR | FPR | g(\cdot) | Prob | \boldsymbol{\beta} | \beta_1 | \beta_2 | \beta_3 |
Example 4.2 (q_n=100) | n=1000 | (0.98, 0.98) | 0.982 | 0.021 | 0.574 | 0.010 | 0.0001 | 0.0035 | 0.0001 | 0.0042 |
Example 4.2 (q_n=100) | n=1000 | (0.95, 0.95) | 0.975 | 0.030 | 0.612 | 0.011 | 0.0001 | 0.0047 | 0.0001 | 0.0069 |
Example 4.2 (q_n=100) | n=1000 | (0.90, 0.90) | 0.978 | 0.020 | 0.556 | 0.012 | 0.0001 | 0.0023 | 0.0002 | 0.0049 |
Example 4.2 (q_n=100) | n=1000 | (0.85, 0.85) | 0.965 | 0.020 | 0.717 | 0.016 | 0.0004 | 0.0158 | 0.0002 | 0.0212 |
AMAE | AMSE | MSE | ||||||||
Model | Setting | (S_e, S_p) | TPR | FPR | g(\cdot) | Prob | \boldsymbol{\beta} | \beta_1 | \beta_2 | \beta_3 |
Example 4.3 (q_n=50) | n=1000 | (0.98, 0.98) | 0.987 | 0.008 | 0.489 | 0.013 | 0.0001 | 0.0008 | 0.0001 | 0.0032 |
Example 4.3 (q_n=50) | n=1000 | (0.95, 0.95) | 0.971 | 0.064 | 0.404 | 0.011 | 0.0003 | 0.0005 | 0.0033 | 0.0085 |
Example 4.3 (q_n=50) | n=1000 | (0.90, 0.90) | 0.963 | 0.048 | 0.465 | 0.011 | 0.0001 | 0.0002 | 0.0012 | 0.0055 |
Example 4.3 (q_n=50) | n=1000 | (0.85, 0.85) | 0.966 | 0.018 | 0.377 | 0.015 | 0.0004 | 0.0016 | 0.0023 | 0.0007 |
AMAE | AMSE | MSE | |||||||||
Model | Setting | (S_e, S_p) | TPR | FPR | g(\cdot) | Prob | \boldsymbol{\beta} | \beta_1 | \beta_2 | \beta_3 | \beta_4 |
Example 4.4 (q_n=100) | n=750 | (0.98, 0.98) | 0.986 | 0.041 | 0.581 | 0.016 | 0.0003 | 0.0034 | 0.0090 | 0.0014 | 0.0005 |
Example 4.4 (q_n=100) | n=750 | (0.95, 0.95) | 0.981 | 0.026 | 0.534 | 0.018 | 0.0001 | 0.0016 | 0.0045 | 0.0018 | 0.0005 |
Example 4.4 (q_n=100) | n=750 | (0.90, 0.90) | 0.974 | 0.018 | 0.546 | 0.016 | 0.0002 | 0.0004 | 0.0024 | 0.0014 | 0.0028 |
Example 4.4 (q_n=100) | n=750 | (0.85, 0.85) | 0.976 | 0.024 | 0.539 | 0.011 | 0.0002 | 0.0047 | 0.0085 | 0.0004 | 0.0039 |
Variable | Implication | Variable | Implication |
Hypertension status | | Family history of diabetes | |
hypertension1 | Have a history of hypertension | family history1 | Blood relatives with diabetes |
hypertension2 | No history of hypertension | family history2 | Blood relatives do not have diabetes |
Education level | | family history9 | Not known if any blood relatives have diabetes |
education1 | Less Than 9th Grade | Physical activity | |
education2 | 9-11th Grade (Includes 12th grade with no diploma) | physical activity1 | Sit during the day and do not walk about very much |
education3 | High School Grad/GED or Equivalent | physical activity2 | Stand or walk about a lot during the day, but do not have to carry or lift things very often |
education4 | Some College or AA degree | physical activity3 | Lift light loads or have to climb stairs or hills often |
education5 | College Graduate or above | physical activity4 | Do heavy work or carry heavy loads |
education7 | Refused to report education level | physical activity9 | Don't know physical activity level |
Household income | | Sex | |
household income1 | $0 to $4,999 | sex1 | Male |
household income2 | $5,000 to $9,999 | sex2 | Female |
household income3 | $10,000 to $14,999 | Race/Ethnicity | |
household income4 | $15,000 to $19,999 | race1 | Mexican American |
household income5 | $20,000 to $24,999 | race2 | Other Hispanic |
household income6 | $25,000 to $34,999 | race3 | Non-Hispanic White |
household income7 | $35,000 to $44,999 | race4 | Non-Hispanic Black |
household income8 | $45,000 to $54,999 | race5 | Other Race - Including Multi-Racial |
household income9 | $55,000 to $64,999 | | |
household income10 | $65,000 to $74,999 | | |
household income11 | $75,000 and Over | | |
household income12 | Over $20,000 | | |
household income13 | Under $20,000 | | |
household income77 | Refused to report household income | | |
household income99 | Don't know household income |