

    The generalized linear model (GLM), a generalization of the linear model with wide applications in many research areas, was proposed by Nelder and Wedderburn [1] in 1972 to handle discrete dependent variables that cannot be dealt with by the ordinary linear regression model. The GLM allows the response variable to follow non-normal distributions, including the binomial, Poisson, gamma, and inverse Gaussian distributions, whose means are linked to the predictors through a link function.

    Nowadays, with the rapid development of science and technology, massive data are ubiquitous in many fields, including medicine, industry, and economics. Extracting effective information from massive data is the core challenge of big data analysis, yet the limited computing power of a single machine makes full-data analysis time-consuming. To deal with this challenge, parallel computing and distributed computing are commonly used, and subsampling techniques have emerged, in which a small number of representative samples are extracted from the massive data. Imberg et al. [2] proposed a theory of optimal design in the context of general data subsampling problems; their framework includes and extends most existing methods, works out optimality conditions, introduces a new class of invariant linear optimality criteria, and offers algorithms for finding optimal subsampling designs. Chao et al. [3] presented an optimal subsampling approach for modal regression with big data, in which the estimators are obtained by a two-step algorithm based on modal expectation maximization when the bandwidth does not depend on the subsample size.

    There has been a great deal of research on subsampling algorithms for specific models. Wang et al. [4] devised a fast subsampling algorithm to approximate the maximum likelihood estimators in the context of logistic regression. Building on this work, Wang [5] presented an enhanced estimation method for logistic regression with higher estimation efficiency. For data stored across multiple distributed sites, Zuo et al. [6] developed a distributed subsampling procedure to effectively approximate the maximum likelihood estimators of logistic regression. Ai et al. [7] studied optimal subsampling under the A-optimality criterion, based on the method developed by Wang et al. [4], to quickly approximate maximum likelihood estimators for generalized linear models with massive data. Yao and Wang [8] examined optimal subsampling methods for various models, including logistic regression models, softmax regression models, generalized linear models, quantile regression models, and quasi-likelihood estimation. Yu et al. [9] proposed an efficient subsampling procedure for online data streams with a multinomial logistic model. Yu et al. [10] studied the subsampling technique for the Akaike information criterion (AIC) and the smoothed AIC model-averaging framework for generalized linear models. Yu et al. [11] reviewed several subsampling methods for massive datasets from the viewpoint of statistical design.

    To the best of our knowledge, all of the methods above assume that the covariates are fully observable. In practice, this assumption is often unrealistic: covariates may be observed inaccurately owing to measurement errors, which leads to biased estimators of the regression coefficients. As a consequence, unimportant variables may be incorrectly declared significant, which in turn affects model selection and interpretation. It is therefore necessary to account for measurement errors. Liang et al. [12], Li and Xue [13], and Liang and Li [14] investigated partial linear measurement error models. Stefanski [15] and Nakamura [16] obtained corrected score functions for GLMs such as linear regression, gamma regression, inverse gamma regression, and Poisson regression. Yang et al. [17] proposed an empirical likelihood method based on the moment identity of the corrected score function to perform statistical inference for a class of generalized linear measurement error models. Fuller [18] estimated the errors-in-variables model by maximum likelihood and studied the corresponding statistical inference. Hu and Cui [19] proposed a corrected error variance method to accurately estimate the error variance, which can effectively reduce the influence of measurement error and spurious correlation at the same time. Carroll et al. [20] summarized measurement errors in linear regression and described some simple and broadly applicable methods of measurement error analysis. Yi et al. [21] presented the regression calibration method, one of the first statistical methods introduced to address measurement errors in covariates, and gave an overview of the conditional score and corrected score approaches for measurement error correction. Extensive research has addressed measurement errors arising in various practical settings, and a variety of methods have been proposed; see [22,23,24,25]. Recently, a class of variable selection procedures has been developed for measurement error models; see [26,27]. More recently, Ju et al. [28] studied the optimal subsampling algorithm and the random perturbation subsampling algorithm for big data linear models with measurement errors. The aim of this paper is to estimate the parameters of a class of generalized linear measurement error models in massive data analysis by means of a subsampling algorithm.

    In this paper, we study a class of GLMs with measurement errors, including logistic regression and Poisson regression models. We combine the corrected score function method with subsampling techniques to develop subsampling algorithms. The consistency and asymptotic normality of the estimators obtained from the general subsampling algorithm are derived. We then optimize the subsampling probabilities under the A-optimality and L-optimality criteria and incorporate a truncation method into the optimal subsampling probabilities to obtain the optimal estimators. In addition, we develop an adaptive two-step algorithm and establish the consistency and asymptotic normality of the resulting subsampling estimators. Finally, the effectiveness of the proposed method is demonstrated through numerical simulations and real data analysis.

    The remainder of this paper is organized as follows: Section 2 introduces the corrected score function under different distributions and derives the general subsampling algorithm and the adaptive two-step algorithm. Sections 3 and 4 verify the effectiveness of the proposed method by generating simulated experimental data and two real data sets, respectively. Section 5 provides conclusions.

    In the GLM, it is assumed that the conditional distribution of the response variable belongs to the exponential family

    $ f(y;\theta ) = \exp \left\{\frac{\theta y - b(\theta )}{a(\phi)} + c(y,\phi) \right\}, $

    where $ a(\cdot), b(\cdot), c(\cdot, \cdot) $ are known functions, $ \theta $ is called the natural parameter, and $ \phi $ is called the dispersion parameter.

    Let $ \left\{ {\left({{{\mathit{\boldsymbol{X}}}_i}, {{{Y}}_i}} \right)} \right\}_{i = 1}^N $ be independent and identically distributed random samples, $ {\mu_i } = E\left({{Y}_i} \mid {\mathit{\boldsymbol{X}}_i}\right), \; V\left({\mu}_i \right) = \text{Var}\left({{Y}_i}\mid {\mathit{\boldsymbol{X}}_i}\right) $, where the covariate $ {\mathit{\boldsymbol{X}}}_i \in{\mathbb{R}{^p}} $ and the response variable $ {{Y}}_i \in{\mathbb{R}} $, $ V(\cdot) $ is a known variance function. The conditional expectation of $ {Y_i} $ given $ {{\mathit{\boldsymbol{X}}}_i} $ is

    $ g(\mu_i) = {\mathit{\boldsymbol{X}}}_i^{\rm T}\mathit{\boldsymbol{\beta}}, $ (2.1)

    where $ g(\cdot) $ is the canonical link function, and $ \mathit{\boldsymbol{\beta}} = {\left({{\beta _1}, \ldots, {\beta _p}} \right)^{\rm T}} $ is a $ p $-dimensional unknown regression parameter.

    In practice, covariates are not always accurately observed, and there are measurement errors that cannot be ignored. Let $ {{\mathit{\boldsymbol{W}}}_i} $ be the error-prone observation of the covariate $ {{\mathit{\boldsymbol{X}}}_i} $. Assume the additive measurement error model

    $ {\mathit{\boldsymbol{W}}}_i = {\mathit{\boldsymbol{X}}}_i + {\mathit{\boldsymbol{U}}}_i, $ (2.2)

    where $ {\mathit{\boldsymbol{U}}_i} \sim N_p({{\mathbf{0}}}, {{\mathit{\boldsymbol{\Sigma}}}_u}) $, and it is independent of $ ({{\mathit{\boldsymbol{X}}}_i}, {Y_i}) $. Combining (2.1) and (2.2) yields a generalized linear model with measurement errors.

    Define the log-likelihood function as $ \ell (\mathit{\boldsymbol{\beta}}) = \sum\limits_{i = 1}^N \ell(\mathit{\boldsymbol{\beta}}; Y_i) $ with $ \ell(\mathit{\boldsymbol{\beta}}; Y_i) = \log f(Y_i; \mathit{\boldsymbol{\beta}}) $. If $ {{\mathit{\boldsymbol{X}}}_i} $ is observable, the score function for $ \mathit{\boldsymbol{\beta}} $ in (2.1) is

    $ \sum\limits_{i = 1}^N \mathit{\boldsymbol{\eta}}_{i} \left(\mathit{\boldsymbol{\beta}} ;{{\mathit{\boldsymbol{X}}}_i},{Y_i} \right) = \sum\limits_{i = 1}^N \frac{\partial \ell (\mathit{\boldsymbol{\beta}}; Y_i)}{\partial \mathit{\boldsymbol{\beta}}} = \sum\limits_{i = 1}^N \frac{Y_i - \mu_i}{V(\mu_i)} \cdot \frac{\partial \mu_i}{\partial \mathit{\boldsymbol{\beta}}}, $

    and satisfies $ E[{\mathit{\boldsymbol{\eta}}}_i \left(\mathit{\boldsymbol{\beta}}; {{\mathit{\boldsymbol{X}}}_i}, {Y_i} \right) \mid {{\mathit{\boldsymbol{X}}}_i}] = {\mathbf{0}} $. However, when $ {{\mathit{\boldsymbol{X}}}_i} $ is measured with error, directly replacing $ {{\mathit{\boldsymbol{X}}}_i} $ with $ {{\mathit{\boldsymbol{W}}}_i} $ in $ {\mathit{\boldsymbol{\eta}}}_i \left(\mathit{\boldsymbol{\beta}}; {{\mathit{\boldsymbol{X}}}_i}, {Y_i} \right) $ causes a bias, i.e., $ E[{\mathit{\boldsymbol{\eta}}}_i \left(\mathit{\boldsymbol{\beta}}; {{\mathit{\boldsymbol{W}}}_i}, {Y_i} \right) \mid {{\mathit{\boldsymbol{X}}}_i}] = {\mathbf{0}} $ will not hold in general, and hence a correction is needed. Following the idea of [16], we define an unbiased score function $ {{\mathit{\boldsymbol{\eta}}}_i^*}({{\mathit{\boldsymbol{\Sigma}}}_u}, \mathit{\boldsymbol{\beta}}; {{\mathit{\boldsymbol{W}}}_i}, {Y_i}) $ for $ \mathit{\boldsymbol{\beta}} $ satisfying $ E[{{\mathit{\boldsymbol{\eta}}}_i^*} ({{\mathit{\boldsymbol{\Sigma}}}_u}, \mathit{\boldsymbol{\beta}}; {{\mathit{\boldsymbol{W}}}_i}, {Y_i}) \mid {{\mathit{\boldsymbol{X}}}_i}] = {\mathbf{0}} $. The maximum likelihood estimator $ {\hat {\mathit{\boldsymbol{\beta}}} _{\text{MLE}}} $ of $ \mathit{\boldsymbol{\beta}} $ is the solution of the estimating equation

    $ \mathit{\boldsymbol{Q}}(\mathit{\boldsymbol{\beta}}) := \sum\limits_{i = 1}^N {\mathit{\boldsymbol{\eta}}}_i^*\left({{\mathit{\boldsymbol{\Sigma}}}_u}, \mathit{\boldsymbol{\beta}}; {{\mathit{\boldsymbol{W}}}_i}, {Y_i}\right) = {\mathbf{0}}. $ (2.3)

    Based on the following moment identities associated with the error model (2.2),

    $ E({{\mathit{\boldsymbol{W}}}_i}\mid {{\mathit{\boldsymbol{X}}}_i}) = {{\mathit{\boldsymbol{X}}}_i}, $
    $ E({{\mathit{\boldsymbol{W}}}_i}{{\mathit{\boldsymbol{W}}}_i^{\rm T}}\mid{{\mathit{\boldsymbol{X}}}_i}) = {{\mathit{\boldsymbol{X}}}_i}{{{\mathit{\boldsymbol{X}}}}_i^{\rm T}} + {{\mathit{\boldsymbol{\Sigma}}}_u}, $
    $ E(\exp ({{\mathit{\boldsymbol{W}}}_i^{\rm T}}\mathit{\boldsymbol{\beta}})\mid{{\mathit{\boldsymbol{X}}}_i}) = \exp \left( {{{\mathit{\boldsymbol{X}}}_i^{\rm T}}\mathit{\boldsymbol{\beta}} + \frac{1}{2}{\mathit{\boldsymbol{\beta}}^{\rm T}}{{\bf{\Sigma }}_u}\mathit{\boldsymbol{\beta}}} \right), $
    $ E\left[ {{{\mathit{\boldsymbol{W}}}_i}\exp \left( {{{\mathit{\boldsymbol{W}}}_i^{\rm T}}\mathit{\boldsymbol{\beta}}} \right)\mid {{\mathit{\boldsymbol{X}}}_i}} \right] = \left( {{{\mathit{\boldsymbol{X}}}_i} + {{\mathit{\boldsymbol{\Sigma}}}_u}\mathit{\boldsymbol{\beta}}} \right)\exp \left( {{{\mathit{\boldsymbol{X}}}_i^{\rm T}}\mathit{\boldsymbol{\beta}} + \frac{1}{2}{\mathit{\boldsymbol{\beta}}^{\rm T}}{{\mathit{\boldsymbol{\Sigma}}}_u}\mathit{\boldsymbol{\beta}}} \right), $
    $ E\left[ {{{\mathit{\boldsymbol{W}}}_i}\exp \left( { - {{{\mathit{\boldsymbol{W}}}}_i^{\rm T}}\mathit{\boldsymbol{\beta}}} \right)\mid {{\mathit{\boldsymbol{X}}}_i}} \right] = \left( {{{\mathit{\boldsymbol{X}}}_i} - {{\mathit{\boldsymbol{\Sigma}}}_u}\mathit{\boldsymbol{\beta}}} \right)\exp \left[ { - {{\mathit{\boldsymbol{X}}}_i^{\rm T}}\mathit{\boldsymbol{\beta}} + \frac{1}{2}{\mathit{\boldsymbol{\beta}}^{\rm T}}{{\mathit{\boldsymbol{\Sigma}}}_u}\mathit{\boldsymbol{\beta}}} \right], $
    $ E\left[ {{{\mathit{\boldsymbol{W}}}_i}\exp \left( { - 2{{\mathit{\boldsymbol{W}}}_i^{\rm T}}\mathit{\boldsymbol{\beta}}} \right) \mid {{\mathit{\boldsymbol{X}}}_i}} \right] = \left( {{{\mathit{\boldsymbol{X}}}_i} - 2{{\mathit{\boldsymbol{\Sigma}}}_u}\mathit{\boldsymbol{\beta}}} \right)\exp \left[ { - 2{{\mathit{\boldsymbol{X}}}_i^{\rm T}}\mathit{\boldsymbol{\beta}} + 2{\mathit{\boldsymbol{\beta}}^{\rm T}}{{\mathit{\boldsymbol{\Sigma}}}_u}\mathit{\boldsymbol{\beta}}} \right], $

    we can construct the unbiased score functions for binary logistic measurement error regression models and Poisson measurement error regression models, both of which are widely used in practice.
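
    These identities can also be checked numerically. The following R snippet (a minimal sketch; the covariate vector, coefficients, and error covariance are hypothetical values chosen only for illustration) verifies the third identity by Monte Carlo, using the MASS package that ships with R:

        # Monte Carlo check of E[exp(W_i^T beta) | X_i] = exp(X_i^T beta + 0.5 beta^T Sigma_u beta).
        # X, beta and Sigma_u below are arbitrary illustrative values, not taken from the paper.
        set.seed(1)
        p       <- 3
        X       <- c(1, -0.5, 2)
        beta    <- c(0.5, -0.6, 0.5)
        Sigma_u <- 0.16 * diag(p)
        U <- MASS::mvrnorm(1e6, mu = rep(0, p), Sigma = Sigma_u)   # U_i ~ N_p(0, Sigma_u)
        W <- sweep(U, 2, X, FUN = "+")                             # W_i = X_i + U_i
        c(monte_carlo = mean(exp(W %*% beta)),
          closed_form = exp(sum(X * beta) + 0.5 * sum(beta * (Sigma_u %*% beta))))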

    (1) Binary logistic measurement error regression models.

    We consider the logistic measurement error regression model

    $ \left\{ \begin{array}{l} P\left(Y_i = 1 \mid {\mathit{\boldsymbol{X}}}_i\right) = \dfrac{1}{1 + \exp\left(-{\mathit{\boldsymbol{X}}}_i^{\rm T}\mathit{\boldsymbol{\beta}}\right)}, \\ {\mathit{\boldsymbol{W}}}_i = {\mathit{\boldsymbol{X}}}_i + {\mathit{\boldsymbol{U}}}_i, \end{array} \right. $

    with mean $ {\mu_i } = {\left[ {1 + \exp \left({ - {{\mathit{\boldsymbol{X}}}_i^{\rm T}}\mathit{\boldsymbol{\beta}}} \right)} \right]^{ - 1}} $ and variance $ \text{Var}\left({Y_i \mid {{\mathit{\boldsymbol{X}}}_i}} \right) = {\mu_i }\left({1 - {\mu_i }} \right) $. Following Huang and Wang [29], the corrected score function is

    $ \mathit{\boldsymbol{\eta}}_i^*\left( {{{\mathit{\boldsymbol{\Sigma}}}_u},\mathit{\boldsymbol{\beta}};{\mathit{\boldsymbol{W}}_i},{Y_i}} \right) = {\mathit{\boldsymbol{W}}_i}{Y_i} + \left( {{\mathit{\boldsymbol{W}}_i} + {{\mathit{\boldsymbol{\Sigma}}}_u}{\mathit{\boldsymbol{\beta}}}} \right)\exp \left( { - {\mathit{\boldsymbol{W}}_i^{\rm T}} {\mathit{\boldsymbol{\beta}}} - \frac{1}{2}{\mathit{\boldsymbol{\beta}}^{\rm T}}{{\mathit{\boldsymbol{\Sigma}}}_u}{\mathit{\boldsymbol{\beta}}}} \right){Y_i} - {\mathit{\boldsymbol{W}}_i}, $

    and its first-order derivative is

    $ {\bf{\Omega }}_i^*\left( {{{\mathit{\boldsymbol{\Sigma}}}_u},\mathit{\boldsymbol{\beta}};{\mathit{\boldsymbol{W}}_i},{Y_i}} \right) = \frac{{\partial \mathit{\boldsymbol{\eta}}_i^*\left( {{{\mathit{\boldsymbol{\Sigma}}}_u},\mathit{\boldsymbol{\beta}};{\mathit{\boldsymbol{W}}_i},{Y_i}} \right)}}{\partial {{\mathit{\boldsymbol{\beta}}}^{\rm T}}} = \left[ {{{\mathit{\boldsymbol{\Sigma}}}_u} - \left( {{\mathit{\boldsymbol{W}}_i} + {{\mathit{\boldsymbol{\Sigma}}}_u}\mathit{\boldsymbol{\beta}}} \right){{\left( {{\mathit{\boldsymbol{W}}_i} + {{\mathit{\boldsymbol{\Sigma}}}_u}\mathit{\boldsymbol{\beta}}} \right)}^{\rm T}}} \right]\exp \left( { - \mathit{\boldsymbol{W}}_i^{\rm T}\mathit{\boldsymbol{\beta}} - \frac{1}{2}{\mathit{\boldsymbol{\beta}}^{\rm T}}{{\mathit{\boldsymbol{\Sigma}}}_u} \mathit{\boldsymbol{\beta}}} \right){Y_i}. $
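
    For concreteness, these two formulas can be transcribed directly into R. The helper names below (eta_star_logit, Omega_star_logit) are our own, operate on a single observation, and are only a sketch, not the authors' code:

        # Corrected score eta_i^* and its derivative Omega_i^* for the logistic
        # measurement error model; W and beta are length-p vectors, Y is 0/1,
        # Sigma_u is the p x p measurement error covariance.
        eta_star_logit <- function(W, Y, beta, Sigma_u) {
          e <- exp(-sum(W * beta) - 0.5 * sum(beta * (Sigma_u %*% beta)))
          v <- W + as.vector(Sigma_u %*% beta)
          W * Y + v * e * Y - W
        }
        Omega_star_logit <- function(W, Y, beta, Sigma_u) {
          e <- exp(-sum(W * beta) - 0.5 * sum(beta * (Sigma_u %*% beta)))
          v <- W + as.vector(Sigma_u %*% beta)
          (Sigma_u - tcrossprod(v)) * e * Y
        }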

    (2) Poisson measurement error regression models.

    Let $ Y_i $ follow the Poisson distribution with mean $ {\mu }_i $, $ \text{Var}\left({Y_i \mid {\mathit{\boldsymbol{X}}}_i} \right) = {\mu }_i $. Consider the log linear measurement error model

    $ \left\{ \begin{array}{l} \log(\mu_i) = {\mathit{\boldsymbol{X}}}_i^{\rm T}\mathit{\boldsymbol{\beta}}, \\ {\mathit{\boldsymbol{W}}}_i = {\mathit{\boldsymbol{X}}}_i + {\mathit{\boldsymbol{U}}}_i, \end{array} \right. $

    then we have the corrected score function

    $ \mathit{\boldsymbol{\eta}}_i^*\left( {{{\mathit{\boldsymbol{\Sigma}}}_u},\mathit{\boldsymbol{\beta}};{\mathit{\boldsymbol{W}}_i},{Y_i}} \right) = {\mathit{\boldsymbol{W}}_i}{Y_i} - \left( {{\mathit{\boldsymbol{W}}_i} - {{\mathit{\boldsymbol{\Sigma}}}_u}{\mathit{\boldsymbol{\beta}}}} \right)\exp \left( {\mathit{\boldsymbol{W}}_i^{\rm T}\mathit{\boldsymbol{\beta}} - \frac{1}{2}{\mathit{\boldsymbol{\beta}}^{\rm T}}{{\mathit{\boldsymbol{\Sigma}}}_u}\mathit{\boldsymbol{\beta}}} \right), $

    and its first-order derivative is

    $ {\bf{\Omega }}_i^*\left( {{{\mathit{\boldsymbol{\Sigma}}}_u},\mathit{\boldsymbol{\beta}};{\mathit{\boldsymbol{W}}_i},{Y_i}} \right) = \frac{{\partial \mathit{\boldsymbol{\eta}}_i^*\left( {{{\mathit{\boldsymbol{\Sigma}}}_u},\mathit{\boldsymbol{\beta}};{\mathit{\boldsymbol{W}}_i},{Y_i}} \right)}}{{\partial {\mathit{\boldsymbol{\beta}}}}^{\rm T}} = \left[ {{{\mathit{\boldsymbol{\Sigma}}}_u} - \left( {{\mathit{\boldsymbol{W}}_i} - {{\mathit{\boldsymbol{\Sigma}}}_u}\mathit{\boldsymbol{\beta}}} \right){{\left( {{\mathit{\boldsymbol{W}}_i} - {{\mathit{\boldsymbol{\Sigma}}}_u}\mathit{\boldsymbol{\beta}}} \right)}^{\rm T}}} \right]\exp \left( {{\mathit{\boldsymbol{W}}}_i^{\rm T} {\mathit{\boldsymbol{\beta}}} - \frac{1}{2}{\mathit{\boldsymbol{\beta}}^{\rm T}}{{\mathit{\boldsymbol{\Sigma}}}_u} \mathit{\boldsymbol{\beta}}} \right). $
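
    The Poisson case admits an analogous R transcription (again a sketch with our own helper names, mirroring the logistic helpers above):

        # Corrected score eta_i^* and its derivative Omega_i^* for the Poisson
        # measurement error model (single observation).
        eta_star_pois <- function(W, Y, beta, Sigma_u) {
          e <- exp(sum(W * beta) - 0.5 * sum(beta * (Sigma_u %*% beta)))
          v <- W - as.vector(Sigma_u %*% beta)
          W * Y - v * e
        }
        Omega_star_pois <- function(W, Y, beta, Sigma_u) {
          e <- exp(sum(W * beta) - 0.5 * sum(beta * (Sigma_u %*% beta)))
          v <- W - as.vector(Sigma_u %*% beta)
          (Sigma_u - tcrossprod(v)) * e
        }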

    It is assumed that $ {\pi_i} $ is the probability of sampling the $ i $-th sample $ \left({{{\mathit{\boldsymbol{W}}}_i}, {{Y}_i}} \right) $, $ i = 1, \ldots, N $. Let $ S $ be the set of the subsamples $ \left({{{\widetilde {\mathit{\boldsymbol{W}}}_i}}, {\widetilde {{Y}_i}}} \right) $ with corresponding sampling probabilities $ \widetilde {\pi_{i}} $, i.e., $ S = \left\{ { {\left({{\widetilde {{\mathit{\boldsymbol{W}}}_i}}, {\widetilde {{Y}_i}}, {\widetilde {\pi_i}}} \right)} } \right\} $ with the subsample size $ r $. The general subsampling algorithm is shown in Algorithm 1.

    Algorithm 1 General subsampling algorithm.
    Step 1. Given the subsampling probabilities $ \pi_i, \; i = 1, \ldots, N $ of all data points.
    Step 2. Perform repeated sampling with replacement $ r $ times to form the subsample set $ S = \left\{ \left(\widetilde{\mathit{\boldsymbol{W}}}_i, \widetilde{Y}_i, \widetilde{\pi}_i \right) \right\} $, where $ {\widetilde {{\boldsymbol{W}}}_i} $, $ {\widetilde {{Y}_i}} $ and $ {\widetilde {\pi_i}} $ represent the covariate, response variable and subsampling probability in the subsample, respectively.
    Step 3. Based on the subsample set $ S $, solve the weighted estimating equation $ \mathit{\boldsymbol{Q}}^{*}(\mathit{\boldsymbol{\beta}}) = {\mathbf{0}} $ to obtain $ \overset{\smile}{\mathit{\boldsymbol{\beta}}} $, where
    $ \mathit{\boldsymbol{Q}}^{*}(\mathit{\boldsymbol{\beta}}) := \frac{1}{r}\sum\limits_{i = 1}^r \frac{1}{\widetilde{\pi}_i}\,\tilde{\mathit{\boldsymbol{\eta}}}_i^*\left(\mathit{\boldsymbol{\Sigma}}_u, \mathit{\boldsymbol{\beta}}; \widetilde{\mathit{\boldsymbol{W}}}_i, \widetilde{Y}_i\right) = {\mathbf{0}}, $              (2.4)
    where $ {\tilde {\mathit{\boldsymbol{\eta}}}_i^*}\left(\mathit{\boldsymbol{\Sigma}}_u, \mathit{\boldsymbol{\beta}}; {\widetilde {{\boldsymbol{W}}}_i}, {\widetilde {{Y}_i}}\right) $ is the unbiased score function of $ i $-th sample point in the subsample and its first order derivative is $ {\widetilde{\bf{\Omega }}_i^*}\left(\mathit{\boldsymbol{\Sigma}}_u, \mathit{\boldsymbol{\beta}}; {\widetilde {{\boldsymbol{W}}}_i}, {\widetilde {{Y}_i}}\right) $.
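
    A possible R implementation of Algorithm 1 is sketched below. It draws $ r $ indices with replacement according to the given probabilities and solves (2.4) by Newton iterations; the per-observation corrected score and its derivative (for instance, the logistic or Poisson helpers sketched above) are passed in as arguments. The function name and interface are our own illustration, not the authors' implementation.

        # Algorithm 1 (sketch): weighted estimating equation on a subsample, solved by Newton's method.
        # eta_fn(W, Y, beta, Sigma_u) returns the corrected score vector for one observation,
        # omega_fn(W, Y, beta, Sigma_u) returns its p x p derivative.
        subsample_fit <- function(W, Y, pi, r, eta_fn, omega_fn, Sigma_u,
                                  beta0 = rep(0, ncol(W)), maxit = 50, tol = 1e-8) {
          idx  <- sample(nrow(W), size = r, replace = TRUE, prob = pi)   # Step 2: sample with replacement
          Ws   <- W[idx, , drop = FALSE]; Ys <- Y[idx]; ps <- pi[idx]
          beta <- beta0
          for (it in seq_len(maxit)) {                                   # Step 3: solve Q*(beta) = 0
            Q <- numeric(length(beta))
            J <- matrix(0, length(beta), length(beta))
            for (i in seq_len(r)) {
              w <- 1 / ps[i]                                             # inverse-probability weight
              Q <- Q + w * eta_fn(Ws[i, ], Ys[i], beta, Sigma_u)
              J <- J + w * omega_fn(Ws[i, ], Ys[i], beta, Sigma_u)
            }
            step <- solve(J, Q)                                          # Newton step J^{-1} Q
            beta <- beta - step
            if (sqrt(sum(step^2)) < tol) break
          }
          beta
        }

    For example, with uniform probabilities pi = rep(1/N, N) and the logistic helpers above, subsample_fit(W, Y, pi, r, eta_star_logit, Omega_star_logit, Sigma_u) returns a subsample estimator of the kind analyzed in Theorems 2.1 and 2.2 below.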

    To obtain the consistency and asymptotic normality of $ \overset{\smile}{\mathit{\boldsymbol{\beta}}} $, the following assumptions should be made. For simplicity, denote $ \mathit{\boldsymbol{\eta}} _i^*({{\mathit{\boldsymbol{\Sigma}}}_u}, \mathit{\boldsymbol{\beta}}; {{\mathit{\boldsymbol{W}}}_i}, {Y_i}) $ and $ {\mathit{\boldsymbol{\Omega}}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u}, \mathit{\boldsymbol{\beta}}; {{\mathit{\boldsymbol{W}}}_i}, {Y_i}) $ as $ \mathit{\boldsymbol{\eta}} _i^*\left({{\mathit{\boldsymbol{\Sigma}}}_u}, \mathit{\boldsymbol{\beta}}\right) $ and $ {\mathit{\boldsymbol{\Omega}}}_i^*\left({{\mathit{\boldsymbol{\Sigma}}}_u}, \mathit{\boldsymbol{\beta}}\right) $.

    A1: It is assumed that $ {{\mathit{\boldsymbol{W}}}_i^{\rm T}} \mathit{\boldsymbol{\beta}} $ almost surely lies in the interior of a closed set $ K \subset \Theta $, where $ \Theta $ is the natural parameter space.

    A2: The regression parameters lie in the ball $ \Lambda = \left\{ {\mathit{\boldsymbol{\beta}} \in {\mathbb{R}{^p}}:{{\left\| \mathit{\boldsymbol{\beta}} \right\|}_1} \le B} \right\} $, where $ B $ is a constant and $ \| \cdot \|_1 $ denotes the $ \ell_1 $-norm. The true parameter $ {{\mathit{\boldsymbol{\beta}}} _t} $ and the maximum likelihood estimator $ {\hat {\mathit{\boldsymbol{\beta}}}_{\text{MLE}}} $ are interior points of $ \Lambda $.

    A3: As $ N \to \infty $, the observed information matrix $ {{\bf{M}}_X} : = \frac{1}{N} \sum\limits_{i = 1}^N {{\bf{\Omega}}_{i}^*} \left(\mathit{\boldsymbol{\Sigma}}_u, {\hat {\mathit{\boldsymbol{\beta}}}_{\text{MLE}}}\right) $ is a positive definite matrix in probability.

    A4: Assume that for all $ \mathit{\boldsymbol{\beta}} \in \Lambda $, $ \frac{1}{N}\sum\limits_{i = 1}^N {{{\left\| {\mathit{\boldsymbol{\eta}}_{_i}^*\left({{\mathit{\boldsymbol{\Sigma}}}_u}, \mathit{\boldsymbol{\beta}}\right)} \right\|}^4}} = {O_P}(1) $, where $ \| \cdot \| $ denotes the Euclidean norm.

    A5: Suppose that the full sample covariates have finite 6th-order moments, i.e., $ E{\left\| {{\mathit{\boldsymbol{W}}_1}} \right\|^6} < \infty $.

    A6: For any $ \delta \ge 0 $, we assume that

    $ \frac{1}{{{N^{2 + \delta }}}}\sum\limits_{i = 1}^N {\frac{{{{\left\| {\mathit{\boldsymbol{\eta}}_{_i}^*({{\mathit{\boldsymbol{\Sigma}}}_u},{{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}})} \right\|}^{2 + \delta }}}}{{\pi _i^{1 + \delta }}}} = {O_P}(1), \; \frac{1}{{{N^{2 + \delta }}}}\sum\limits_{i = 1}^N {\frac{{{{\left| {{\bf{\Omega}}_{_i}^{*(j_1j_2)}({{\mathit{\boldsymbol{\Sigma}}}_u},{{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}})} \right|}^{2 + \delta }}}}{{\pi _i^{1 + \delta }}}} = {O_P}(1), $

    where $ {\bf{\Omega}}_{_i}^{*(j_1j_2)} $ represents the elements of the $ j_1 $-th row and $ j_2 $-th column of the matrix $ {\bf{\Omega}}_{i}^* $.

    A7: Assume that $ \mathit{\boldsymbol{\eta}} _i^*({{\mathit{\boldsymbol{\Sigma}}}_u}, \mathit{\boldsymbol{\beta}}) $ and $ {\mathit{\boldsymbol{\Omega}}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u}, \mathit{\boldsymbol{\beta}}) $ are $ m({{\mathit{\boldsymbol{W}}}_i}) $-Lipschitz continuous. For any $ {\mathit{\boldsymbol{\beta}} _1}, \; {\mathit{\boldsymbol{\beta}} _2} \in \Lambda $, there exist functions $ {m_1}({{\mathit{\boldsymbol{W}}}_i}) $ and $ {m_2}({{\mathit{\boldsymbol{W}}}_i}) $ such that $ \left\| {\mathit{\boldsymbol{\eta}} _i^*({{\mathit{\boldsymbol{\Sigma}}}_u}, {\mathit{\boldsymbol{\beta}} _1}) - \mathit{\boldsymbol{\eta}} _i^*({{\mathit{\boldsymbol{\Sigma}}}_u}, {\mathit{\boldsymbol{\beta}} _2})} \right\| \le {m_1}({{\mathit{\boldsymbol{W}}}_i})\left\| {{\mathit{\boldsymbol{\beta}} _1} - {\mathit{\boldsymbol{\beta}} _2}} \right \| $, $ {{\left\| {{\bf{\Omega}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u}, {\mathit{\boldsymbol{\beta}} _1}) - {\bf{\Omega}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u}, {\mathit{\boldsymbol{\beta}} _2})} \right\|}_S \le {m_2}({{\mathit{\boldsymbol{W}}}_i})\left\| {{\mathit{\boldsymbol{\beta}} _1} - {\mathit{\boldsymbol{\beta}} _2}} \right\|} $, where $ {\| \bf{A} \|}_S $ denotes the spectral norm of matrix $ \bf A $. Further assume that $ E\left\{ {{m_1}({{\mathit{\boldsymbol{W}}}_i})} \right\} < \infty $ and $ E\left\{ {{m_2}({{\mathit{\boldsymbol{W}}}_i})} \right\} < \infty $.

    Assumptions A1 and A2 are also used in Clémencon et al. [30]. The set $ \Lambda $ in Assumption A2 is also known as the admissible set and is a prerequisite for consistent estimation in the GLM with full data [31]. Assumption A3 imposes a condition on the covariates to ensure that the MLE based on the full dataset is consistent. Assumptions A4 and A5 are required in order to obtain the Bahadur representation of the subsampling estimators. Assumption A6 is a moment condition on the subsampling probabilities and is also required for the Lindeberg-Feller central limit theorem. Assumption A7 adds a smoothness restriction, which can be found in [32].

    The following theorems show the consistency and asymptotic normality of the subsampling estimators.

    Theorem 2.1. If Assumptions A1–A7 hold, as $ r \to \infty $ and $ N \to \infty $, $ \overset{\smile}{\mathit{\boldsymbol{\beta}}} $ converges to $ \hat{\mathit{\boldsymbol{\beta}}}_{\mathit{\text{MLE}}} $ in conditional probability given $ \mathcal{F}_N $, and the convergence rate is $ {r^{\frac{1}{2}}} $. That is, for all $ \varepsilon > 0 $, there exist constants $ {\Delta _\varepsilon } $ and $ {r_\varepsilon } $ such that

    $ P\left( \left\| \overset{\smile}{\mathit{\boldsymbol{\beta}}} - \hat{\mathit{\boldsymbol{\beta}}}_{\text{MLE}} \right\| \ge r^{-\frac{1}{2}}\Delta_\varepsilon \mid \mathcal{F}_N \right) < \varepsilon, $ (2.5)

    for all $ r > {r_\varepsilon } $.

    Theorem 2.2. If Assumptions A1–A7 hold, as $ r \to \infty $ and $ N \to \infty $, conditional on $ \mathcal{F}_N $, the estimator $ \overset{\smile}{\mathit{\boldsymbol{\beta}}} $ obtained from Algorithm 1 satisfies

    $ {\bf{V}}^{-\frac{1}{2}}\left( \overset{\smile}{\mathit{\boldsymbol{\beta}}} - \hat{\mathit{\boldsymbol{\beta}}}_{\text{MLE}} \right) \overset{d}{\longrightarrow} N_p\left({\mathbf{0}}, {\bf{I}}\right), $ (2.6)

    where $ {\bf{V}} = {\bf{M}}_X^{ - 1}{{\bf{V}}_{\mathit{\text{C}}}}{\bf{M}}_X^{ - 1} = {O_P}({r^{ - 1}}) $, and

    $ {{\bf{V}}_{\mathit{\text{C}}}} = \frac{1}{{{N^2}r}}\sum\limits_{i = 1}^N {\frac{{\mathit{\boldsymbol{\eta}}_{_i}^*({{\mathit{\boldsymbol{\Sigma}}}_u},\hat{\mathit{\boldsymbol{\beta}}}_{\mathit{\text{MLE}}})\mathit{\boldsymbol{\eta}}{{_{_i}^*}^{\rm T}}({{\mathit{\boldsymbol{\Sigma}}}_u},\hat{\mathit{\boldsymbol{\beta}}}_{\mathit{\text{MLE}}})}}{{{\pi _i}}}}. $

    Remark 1. In order to get the standard error of the corresponding estimator, we estimate the variance-covariance matrix of $ \overset{\smile}{\mathit{\boldsymbol{\beta}}} $ by

    $ {\widehat{\bf{V}}} = \widehat{\bf{ M}}_X^{ - 1}{{\widehat{\bf{ V}}}_{\text {C}}}{\widehat{\bf{ M}}}_X^{ - 1}, $

    where

    $ {{\widehat{\bf{ M}}}_X} = \frac{1}{{Nr}}\sum\limits_{i = 1}^r {\frac{{{\widetilde{\bf{\Omega}}}_{_i}^*({{\mathit{\boldsymbol{\Sigma}}}_u},{{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}})}}{{{{\widetilde \pi }_i}}}}, $
    $ {{\widehat{\bf{ V}}}_{\text {C}}} = \frac{1}{{{N^2}{r^2}}}\sum\limits_{i = 1}^r {\frac{{\tilde{\mathit{\boldsymbol{\eta}}}_{_i}^*({{\mathit{\boldsymbol{\Sigma}}}_u},{{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}})\tilde{\mathit{\boldsymbol{\eta}}}{{_{_i}^*}^{\rm T}}({{\mathit{\boldsymbol{\Sigma}}}_u},{{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}})}}{{\widetilde \pi _i^2}}}. $

    Based on the A-optimality criteria in the optimal design language, the optimal subsampling probabilities are obtained by minimizing the asymptotic mean square error of $ \overset{\smile}{\mathit{\boldsymbol{\beta}}} $ in Theorem 2.2.

    However, $ {\mathit{\boldsymbol{\Sigma}}}_u $ is usually unknown in practice. Therefore, we need to estimate the covariance matrix $ {\mathit{\boldsymbol{\Sigma}}}_u $ as suggested by [12]. We observe that the consistent, unbiased moment estimator of $ {\mathit{\boldsymbol{\Sigma}}}_u $ is

    $ {{\widehat{\bf{ \Sigma }}}_u} = \frac{{\sum\limits_{i = 1}^N {\sum\limits_{j = 1}^{{m_i}} {\left( {{\mathit{\boldsymbol{W}}_{ij}} - {{\overline{\mathit{\boldsymbol{W}}}}_i}} \right){{\left( {{\mathit{\boldsymbol{W}}_{ij}} - {{\overline{\mathit{\boldsymbol{W}}}}_i}} \right)}^{\rm T}}} } }}{{\sum\limits_{i = 1}^N {\left( {{m_i}}-1 \right)} }}, $

    where $ {\overline{\mathit{\boldsymbol{W}}}}_i $ is the sample mean of the replicates, and $ m_i $ is the number of repeated measurements of the $ i $-th individual.
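
    When replicate measurements $ \mathit{\boldsymbol{W}}_{ij} $, $ j = 1, \ldots, m_i $, are available, this moment estimator can be computed directly. A short R sketch follows (it assumes the replicates of subject $ i $ are stored as the rows of the $ i $-th matrix in a list; the function name is ours):

        # Moment estimator of Sigma_u from replicate measurements.
        # W_list[[i]] is an m_i x p matrix whose rows are W_{i1}, ..., W_{i m_i}.
        estimate_Sigma_u <- function(W_list) {
          p   <- ncol(W_list[[1]])
          num <- matrix(0, p, p)
          den <- 0
          for (Wi in W_list) {
            centered <- sweep(Wi, 2, colMeans(Wi))   # W_{ij} - bar(W)_i
            num <- num + crossprod(centered)         # sum_j (W_ij - bar W_i)(W_ij - bar W_i)^T
            den <- den + nrow(Wi) - 1                # contributes m_i - 1 to the denominator
          }
          num / den
        }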

    Theorem 2.3. Define $ g_i^{\mathit{\text{mV}}} = \left\| {{\bf{M}}_X^{-1}}\mathit{\boldsymbol{\eta}}_i^*(\mathit{\boldsymbol{\Sigma}}_u, \hat{\mathit{\boldsymbol{\beta}}}_{\mathit{\text{MLE}}}) \right\|, \; i = 1, \ldots, N $. The subsampling strategy is mV-optimal if the subsampling probability is chosen such that

    $ \pi_i^{\text{mV}} = \frac{g_i^{\text{mV}}}{\sum\limits_{j = 1}^N g_j^{\text{mV}}}, $ (2.7)

    which is obtained by minimizing $ tr({\bf{V}}) $.

    Theorem 2.4. Define $ g_i^{\mathit{\text{mVc}} } = \left\| {\mathit{\boldsymbol{\eta}}}_i^*(\mathit{\boldsymbol{\Sigma}}_u, \hat{\mathit{\boldsymbol{\beta}}}_{\mathit{\text{MLE}}}) \right\|, \; i = 1, \ldots, N $. The subsampling strategy is mVc-optimal if the subsampling probability is chosen such that

    $ \pi_i^{\text{mVc}} = \frac{g_i^{\text{mVc}}}{\sum\limits_{j = 1}^N g_j^{\text{mVc}}}, $ (2.8)

    which is obtained by minimizing $ tr({{\bf{V}}_{\mathit{\text{C}}}}) $.

    Remark 2. $ {{\bf{M}}_X} $ and $ {{\bf{V}}_{\text{C}}} $ are non-negative definite matrices, and $ {\bf{V}} = {\bf{M}}_X^{ - 1}{{\bf{V}}_{\text{C}}}{\bf{M}}_X^{ - 1} $, so $ tr({\bf{V}}) = tr\left({{\bf{M}}_X^{ - 1}{{\bf{V}}_{\text{C}}}{\bf{M}}_X^{ - 1}} \right) \le {\sigma _{\max }}\left({{\bf{M}}_X^{ - 2}} \right)tr\left({{{\bf{V}}_{\text{C}}}} \right) $, where $ {\sigma_{\max}}\left({\bf{A}} \right) $ represents the maximum eigenvalue of a square matrix $ {\bf{A}} $. As $ {\sigma _{\max }}\left({{\bf{M}}_X^{ - 2}} \right) $ does not depend on $ \mathit{\boldsymbol{\pi}} $, minimizing $ tr({{\bf{V}}_C}) $ amounts to minimizing an upper bound of $ tr({\bf{V}}) $. In fact, for two given subsampling probabilities $ {\mathit{\boldsymbol{\pi}}_1} $ and $ {\mathit{\boldsymbol{\pi}}_2} $, $ {\bf{V}}\left({{\mathit{\boldsymbol{\pi}}_1}} \right) \le {\bf{V}}\left({{\mathit{\boldsymbol{\pi}}_2}} \right) $ if and only if $ {{\bf{V}}_{\text{C}}}\left({{\mathit{\boldsymbol{\pi}}_1}} \right) \le {{\bf{V}}_{\text{C}}}\left({{\mathit{\boldsymbol{\pi}}_2}} \right) $. Therefore, minimizing $ tr({{\bf{V}}_{\text{C}}}) $ requires considerably less computation time than minimizing $ tr({\bf{V}}) $, although $ tr({{\bf{V}}_{\text{C}}}) $ does not take the structural information of the data into account.

    The optimal subsampling probabilities are defined as $ \left\{ \pi_i^{\text{op}} \right\}_{i = 1}^N = \left\{ \pi_i^{\text{mV}} \right\}_{i = 1}^N $ or $ \left\{ \pi_i^{\text{mVc}} \right\}_{i = 1}^N $. However, because $ \pi _i^{\text{op}} $ depends on $ {{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}} $, it cannot be used directly in applications. To calculate $ \pi _i^{\text{op}} $, it is necessary to use a prior estimator $ {{\tilde{\mathit{\boldsymbol{\beta}}}}_0} $, which is obtained from a prior subsample of size $ {r_0} $.

    We know that $ \pi _i^{\text{op}} $ is proportional to $ \left\| {{\mathit{\boldsymbol{\eta}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u}, {{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}})}} \right\| $. In practice, however, there may be data points with $ \mathit{\boldsymbol{\eta}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u}, {{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}}) = {\mathbf{0}} $, which will never be included in a subsample, and data points with $ \mathit{\boldsymbol{\eta}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u}, {{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}}) \approx {\mathbf{0}} $, which have very small probabilities of being sampled. If these special data points are excluded, some sample information is lost; but if they are included, the variance of the subsampling estimator may increase because of their extreme inverse-probability weights.

    To prevent Eq (2.4) from being inflated by these special data points, this paper adopts a truncation method, setting a threshold $ \omega $ for $ \left\| \mathit{\boldsymbol{\eta}}_i^*(\mathit{\boldsymbol{\Sigma}}_u, {\hat{\mathit{\boldsymbol{{\beta}}}}}_{\text{MLE}}) \right\| $, that is, replacing $ \left\| \mathit{\boldsymbol{\eta}}_i^*(\mathit{\boldsymbol{\Sigma}}_u, {{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}}) \right\| $ with $ \max \left\{ \left\| \mathit{\boldsymbol{\eta}}_i^*(\mathit{\boldsymbol{\Sigma}}_u, {{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}}) \right\|, \omega \right\} $, where $ \omega $ is a very small positive number, for example, $ 10^{-4} $. In applications, the choice and design of such a truncation weight function, a commonly used technique, are crucial for improving the robustness of the model and optimizing its performance.
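
    In code, this truncation amounts to a single thresholding step. The sketch below (with our own function name) turns a vector of score norms $ \left\| \mathit{\boldsymbol{\eta}}_i^* \right\| $ into truncated mVc-type probabilities:

        # Truncated mVc subsampling probabilities: pi_i proportional to max(||eta_i^*||, omega).
        mVct_probs <- function(eta_norms, omega = 1e-4) {
          g <- pmax(eta_norms, omega)   # floor near-zero score norms at the threshold omega
          g / sum(g)
        }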

    We replace $ \hat{\mathit{\boldsymbol{{\beta}}}}_{\text {MLE}} $ in the matrix $ \bf{V} $ with $ \tilde{\mathit{\boldsymbol{{\beta}}}}_0 $, and denote the result by $ \widetilde{\bf{{V}}} $; then $ tr\left(\widetilde{\bf{{V}}} \right) \le tr\left(\widetilde{\bf{{V}}}^\omega \right) \le tr\left(\widetilde{\bf{{V}}} \right) + \frac{\omega^2}{N^2 r} \sum\limits_{i = 1}^N \frac{{\left\| {\bf{M}}_X^{-1} \right\|}^2}{\pi_i^{\text {op}}} $. Therefore, when $ \omega $ is sufficiently small, $ tr\left(\widetilde{\bf{{V}}}^\omega \right) $ approaches $ tr\left(\widetilde{\bf{{V}}} \right) $. The threshold $ \omega $ is set to make the subsample estimators more robust without excessively sacrificing estimation efficiency. The matrix $ {\widetilde{\bf{{M}}}}_X = \frac{1}{Nr_0} \sum\limits_{i = 1}^{r_0} {\bf{\Omega}}_i^*(\mathit{\boldsymbol{\Sigma}}_u, {\tilde{\mathit{\boldsymbol{\beta}}}}_0) $ based on the prior subsample can be used to approximate $ {\bf{M}}_X $. The two-step algorithm is presented in Algorithm 2.

    Algorithm 2 Optimal subsampling algorithm.
    Step 1. Draw a prior subsample set $ {S_{r_0}} $ of size $ r_0 $ from the full data with uniform subsampling probabilities $ {{\mathit{\boldsymbol{\pi}}}^{\text {UNIF}}} = \left\{ {{\pi _i}: = \frac{{{1}}}{N}} \right\}_{i = 1}^N $. Use Algorithm 1 to obtain a prior estimator $ {\tilde{\mathit{\boldsymbol{\beta}}}_0} $, and replace $ {\hat{\mathit{\boldsymbol{\beta}}}_{\text {MLE}}} $ with $ {\tilde{\mathit{\boldsymbol{\beta}}}_0} $ in Eqs (2.7) and (2.8) to get the optimal subsampling probabilities $ \left\{ {{ \pi }_i^{\text {opt}}}\right\}_{i = 1}^N $.
    Step 2. Use the optimal subsampling probabilities $ \left\{ {{ \pi }_i^{\text {opt}}}\right\}_{i = 1}^N $ computed in Step 1 to draw a subsample of size $ r $ with replacement. Following Algorithm 1, combine it with the subsample from Step 1 and solve the estimating Eq (2.4) to obtain the estimator $ \check {\mathit{\boldsymbol{\beta}}} $ based on a subsample of total size $ r_0+r $.
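
    The flow of Algorithm 2 can be sketched in R by chaining the hypothetical helpers introduced above (subsample_fit and mVct_probs). For simplicity, the sketch below refits on the second-stage draw only, whereas Algorithm 2 combines both subsamples into one estimating equation of total size $ r_0 + r $; it is meant to convey the two steps, not to reproduce the authors' implementation.

        # Algorithm 2 (simplified sketch): uniform pilot step, then optimal subsampling.
        two_step_fit <- function(W, Y, r0, r, eta_fn, omega_fn, Sigma_u, omega = 1e-4) {
          N <- nrow(W)
          # Step 1: pilot estimator from a uniform-probability subsample of size r0.
          pi_unif <- rep(1 / N, N)
          beta0   <- subsample_fit(W, Y, pi_unif, r0, eta_fn, omega_fn, Sigma_u)
          # Corrected score norms at the pilot estimator, then truncated mVc probabilities.
          eta_norms <- vapply(seq_len(N),
                              function(i) sqrt(sum(eta_fn(W[i, ], Y[i], beta0, Sigma_u)^2)),
                              numeric(1))
          pi_opt <- mVct_probs(eta_norms, omega)
          # Step 2: draw r observations with the optimal probabilities and re-solve (2.4),
          # warm-starting the Newton iterations at the pilot estimator.
          subsample_fit(W, Y, pi_opt, r, eta_fn, omega_fn, Sigma_u, beta0 = beta0)
        }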

    Remark 3. In Algorithm 2, $ {\tilde{\mathit{\boldsymbol{\beta}}}}_0 $ in Step 1 satisfies

    $ \mathit{\boldsymbol{Q}}_{\tilde{\mathit{\boldsymbol{\beta}}}_0}^{*0}(\mathit{\boldsymbol{\beta}}) = \frac{1}{r_0} \sum\limits_{i = 1}^{r_0} \frac{\tilde{\mathit{\boldsymbol{{\eta}}}}_{i}^*(\mathit{\boldsymbol{\Sigma}}_u, \mathit{\boldsymbol{\beta}})}{\pi_i^{\text {UNIF}}} = {\mathbf{0}} $

    with the prior subsample set $ S_{r_0} $, and

    $ {\bf{M}}_X^{{\tilde{\mathit{\boldsymbol{\beta}}}}_0} = \frac{1}{Nr_0} \sum\limits_{i = 1}^{r_0} \frac{\widetilde{\bf{{\Omega}}}_{i}^*(\mathit{\boldsymbol{\Sigma}}_u, \tilde{\mathit{\boldsymbol{{\beta}}}}_0)}{\pi_i^{\text {UNIF}}}. $

    In Step 2, the subsampling probabilities are $ \left\{{\pi}_i^{\text{opt}}\right\}_{i = 1}^N = \left\{ \pi_i^{\text{mVt}} \right\}_{i = 1}^N $ or $ \left\{ \pi_i^{\text{mVct}} \right\}_{i = 1}^N $, let

    $ g_i^{\text{mVt}} = \left\{ \begin{array}{ll} \left\| {\bf{M}}_X^{-1}\mathit{\boldsymbol{\eta}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u},{{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}}) \right\|, & \text{if } \left\| \mathit{\boldsymbol{\eta}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u},{{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}}) \right\| > \omega, \\ \omega \left\| {\bf{M}}_X^{-1} \right\|, & \text{if } \left\| \mathit{\boldsymbol{\eta}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u},{{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}}) \right\| \le \omega, \end{array} \right. \; i = 1, \ldots, N,\; $
    $ g_i^{\text{mVct}} = max \left\{ \left\| \mathit{\boldsymbol{\eta}}_i^*(\mathit{\boldsymbol{\Sigma}}_u, {{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}}) \right\|, \omega \right\}, $

    then

    $ \pi_i^{\text{mVt}} = \frac{g_i^{\text{mVt}}}{\sum\limits_{j = 1}^N g_j^{\text{mVt}}} \quad and \quad \pi_i^{\text{mVct}} = \frac{g_i^{\text{mVct}}}{\sum\limits_{j = 1}^N g_j^{\text{mVct}}}. $

    The subsample set is $ S_{r_0} \cup \left\{ \left(\widetilde{\mathit{\boldsymbol{W}}}_i, \widetilde{Y}_i, \widetilde{\pi}_i^{\text{opt}} \right) \mid i = 1, \ldots, r \right\} $ with a subsample size of $ r + r_0 $, and $ \check{\mathit{\boldsymbol{\beta}}} $ is the solution to the corresponding estimating equation

    $ \mathit{\boldsymbol{Q}}_{\tilde{\mathit{\boldsymbol{\beta}}}_0}^{\text{two-step}}(\mathit{\boldsymbol{\beta}}) = \frac{1}{r + r_0}\sum\limits_{i = 1}^{r + r_0} \frac{\tilde{\mathit{\boldsymbol{\eta}}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u}, \mathit{\boldsymbol{\beta}})}{\widetilde{\pi}_i^{\text{opt}}} = \frac{r}{r + r_0}\mathit{\boldsymbol{Q}}_{\tilde{\mathit{\boldsymbol{\beta}}}_0}^{*}(\mathit{\boldsymbol{\beta}}) + \frac{r_0}{r + r_0}\mathit{\boldsymbol{Q}}_{\tilde{\mathit{\boldsymbol{\beta}}}_0}^{*0}(\mathit{\boldsymbol{\beta}}) = {\mathbf{0}}, $

    where

    $ \mathit{\boldsymbol{Q}}_{{{\mathit{\boldsymbol{\tilde \beta}}}_0}}^{*}(\mathit{\boldsymbol{\beta}}) = \frac{1}{r}\sum\limits_{i = 1}^r {\frac{{\tilde{\mathit{\boldsymbol{\eta}}}_{_i}^*({{\mathit{\boldsymbol{\Sigma}}}_u},\mathit{\boldsymbol{\beta}})}}{{\widetilde \pi _i^{\text {opt}}}}}. $

    Theorem 2.5. If Assumptions A1–A7 hold, as $ {r_0}{{r}^{-1}} \to 0 $, $ {r_0} \to \infty, r \to \infty $ and $ N \to \infty $, if $ {\tilde {\mathit{\boldsymbol{\beta}}}}_0 $ exists, then the estimator $ \check{\mathit{\boldsymbol{\beta}}} $ obtained from Algorithm 2 converges to $ \hat{\mathit{\boldsymbol{\beta}}}_{\mathit{\text{MLE}}} $ in conditional probability given $ \mathcal{F}_N $, and its convergence rate is $ {r^{\frac{1}{2}}} $. For all $ \varepsilon > 0 $, there exist finite $ \Delta_\varepsilon $ and $ r_\varepsilon $ such that

    $ P\left( \left\| \check{\mathit{\boldsymbol{\beta}}} - \hat{\mathit{\boldsymbol{\beta}}}_{\text{MLE}} \right\| \ge r^{-\frac{1}{2}}\Delta_\varepsilon \mid \mathcal{F}_N \right) < \varepsilon, $ (2.9)

    for all $ r > r_\varepsilon $.

    Theorem 2.6. If Assumptions A1–A7 hold, as $ {r_0}{{{r}^{-1}}} \to 0 $, $ {r_0} \to \infty, r \to \infty $ and $ N \to \infty $, conditional on $ \mathcal{F}_N $, the estimator $ \check{\mathit{\boldsymbol{\beta}}} $ obtained from Algorithm 2 satisfies

    $ {\bf{V}}_{\text{opt}}^{-\frac{1}{2}}\left( \check{\mathit{\boldsymbol{\beta}}} - \hat{\mathit{\boldsymbol{\beta}}}_{\text{MLE}} \right) \overset{d}{\longrightarrow} N_p\left({\mathbf{0}}, {\bf{I}}\right), $ (2.10)

    where $ {{\bf{V}}_{\mathit{\text{opt}}}} = {\bf{M}}_X^{ - 1}{\bf{V}}_{\mathit{\text{C}}}^{\mathit{\text{opt}}}{\bf{M}}_X^{ - 1} = {O_P}({r^{ - 1}}) $, and

    $ {\bf{V}}_{\mathit{\text{C}}}^{\mathit{\text{opt}}} = \frac{1}{{{N^2}r}}\sum\limits_{i = 1}^N {\frac{{\mathit{\boldsymbol{\eta}}_{_i}^*({{\mathit{\boldsymbol{\Sigma}}}_u},{{\hat{\mathit{\boldsymbol{\beta}}}}_{\mathit{\text{MLE}}}})\mathit{\boldsymbol{\eta}}{{_{_i}^*}^{\rm T}}({{\mathit{\boldsymbol{\Sigma}}}_u},{{\hat{\mathit{\boldsymbol{\beta}}}}_{\mathit{\text{MLE}}}})}}{{\pi _i^{\mathit{\text{opt}}}}}}. $

    Remark 4. We estimate the variance-covariance matrix of $ \check{\mathit{\boldsymbol{\beta}}} $ by

    $ {\widehat{\bf{V}}}_{\text{opt}} = {\widehat{\bf{ M}}}_X^{- 1}{{\widehat{\bf{ V}}}_{\text {C}}^{\text{opt}}}{\widehat{\bf{ M}}}_X^{ - 1}, $

    where

    $ {{\widehat{\bf{ M}}}_X} = \frac{1}{{N\left( {{r_0} + r} \right)}}\left[ {\sum\limits_{i = 1}^{{r_0}} {\frac{{{\widetilde{\bf{\Omega}}}_{_i}^*({{\mathit{\boldsymbol{\Sigma}}}_u},{{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}})}}{{\widetilde \pi _{_i}^{\text {UNIF}}}}} + \sum\limits_{i = 1}^r {\frac{{{\widetilde{\bf{ \Omega}}}_{_i}^*({{\mathit{\boldsymbol{\Sigma}}}_u},{{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}})}}{{\widetilde \pi _{_i}^{\text{opt}}}}} } \right], $
    $ {{\widehat{\bf{ V}}}_{\text {C}}^{\text{opt}}} = \frac{1}{{{N^2}{{\left( {{r_0} + r} \right)}^2}}}\left[ {\sum\limits_{i = 1}^{{r_0}} {\frac{{\tilde{\mathit{\boldsymbol{\eta}}}_{_i}^*({{\mathit{\boldsymbol{\Sigma}}}_u},{{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}}){\tilde {\mathit{\boldsymbol{\eta}}}}{{_{_i}^*}^{\rm T}}({{\mathit{\boldsymbol{\Sigma}}}_u},{{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}})}}{{\widetilde \pi _{_i}^{{\text {UNIF}}^2}}}} + \sum\limits_{i = 1}^r {\frac{{\tilde{\mathit{\boldsymbol{\eta}}}_{_i}^*({{\mathit{\boldsymbol{\Sigma}}}_u},{{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}})\tilde{\mathit{\boldsymbol{\eta}}}{{_{_i}^*}^{\rm T}}({{\mathit{\boldsymbol{\Sigma}}}_u},{{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}})}}{{\widetilde \pi _i^{{\text{opt}}^2}}}} } \right]. $

    In this section, we perform numerical simulations using synthetic data to evaluate the finite sample performance of the proposed method in Algorithm 2 (denoted as mV and mVc). For a fair comparison, we also give the results of the uniform subsampling method with the same subsample size as Algorithm 2. The estimators of the three subsampling methods (uniform: uniform subsampling; mV: mV probability subsampling; mVc: mVc probability subsampling) are compared with the MLE, the maximum likelihood estimator computed from the full data. We conduct simulation experiments under two models: the logistic regression model and the Poisson regression model.

    Set the sample size $ N = 100000 $, the true value $ {\mathit{\boldsymbol{\beta}}}_{t} = (0.5, -0.6, 0.5)^{\rm T} $, the covariate $ {\mathit{\boldsymbol{X}}}_i \sim N_3({{\mathbf{0}}}, {\boldsymbol \Sigma}) $, where $ {\mathit{\boldsymbol{\Sigma}}} = 0.5{\bf I} +0.5 {{\mathbf{1}}}{{\mathbf{1}}}^{\rm T} $, $ {\bf I} $ is an identity matrix. The response $ Y_i $ follows a binomial distribution with $ P \left(Y_i = 1 \mid {\mathit{\boldsymbol{X}}_i} \right) = \left(1+\exp(-{\mathit{\boldsymbol{X}}_i^{\rm T}}{\boldsymbol \beta_t}) \right)^{-1} $. We consider the following three cases to generate the measurement error term $ {\boldsymbol{U}} _i $.

    ● Case 1: $ {\mathit{\boldsymbol{U}}}_i \sim N_3({{\mathbf{0}}}, 0.4^2{\bf I}) $;

    ● Case 2: $ {\mathit{\boldsymbol{U}}}_i \sim N_3({{\mathbf{0}}}, 0.5^2{\bf I}) $;

    ● Case 3: $ {\mathit{\boldsymbol{U}}}_i \sim N_3({{\mathbf{0}}}, 0.6^2{\bf I}) $.
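
    As a point of reference, the following R snippet sketches how a dataset of this kind (Case 1) could be generated; it is our reconstruction of the stated settings, not the authors' simulation code, and the random seed is arbitrary.

        # Generate logistic measurement error data, Case 1:
        # X ~ N_3(0, 0.5 I + 0.5 11^T), U ~ N_3(0, 0.4^2 I), W = X + U.
        set.seed(2024)
        N       <- 100000
        beta_t  <- c(0.5, -0.6, 0.5)
        Sigma_x <- 0.5 * diag(3) + 0.5 * matrix(1, 3, 3)
        Sigma_u <- 0.4^2 * diag(3)
        X   <- MASS::mvrnorm(N, mu = rep(0, 3), Sigma = Sigma_x)
        U   <- MASS::mvrnorm(N, mu = rep(0, 3), Sigma = Sigma_u)
        W   <- X + U                                   # error-prone covariates actually observed
        eta <- as.vector(X %*% beta_t)
        Y   <- rbinom(N, size = 1, prob = 1 / (1 + exp(-eta)))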

    The subsample size in Step 1 of Algorithm 2 is selected as $ r_0 = 400 $. The subsample size $ r $ is set to 500, 1000, 1500, 2000, 2500, and 5000. In order to verify that $ \check{\mathit{\boldsymbol{\beta}}} $ asymptotically approaches $ {\mathit{\boldsymbol{\beta_{t}}}} $, we repeat the procedure $ K = 1000 $ times and calculate $ MSE = \frac{1}{K}{\sum\limits_{k = 1}^K {\left\| {{{\check{\mathit{\boldsymbol{\beta}}}}^{\left(k \right)}} - {\mathit{\boldsymbol{\beta}}_t}} \right\|} ^2} $, where $ {\check{\mathit{\boldsymbol{\beta}}}}^{\left(k \right)} $ is the subsample estimator obtained in the $ k $-th repetition.

    The simulation results are shown in Figure 1, from which it can be seen that both mV and mVc always have smaller MSEs than uniform subsampling. The MSEs of all the subsampling methods decrease as $ r $ increases, which confirms the theoretical consistency of the subsampling estimators. As the variance of the error term increases, the MSEs of uniform, mV, and mVc also increase. The mV method is better than mVc because the subsampling probabilities of mV take the structural information of the data into account. A comparison between the corrected and uncorrected methods shows that the MSEs of the corrected methods are much smaller than those of the uncorrected methods, and the gap between them widens as the error variance increases.

    Figure 1.  MSEs for $ \check{\boldsymbol \beta} $ with different second step subsample size $ r $ and $ r_0 = 400 $. The colorful icons and lines represent the corrected subsampling methods. The gray icons and lines represent the uncorrected subsampling methods.

    Now, we evaluate the statistical inference performance of the optimal subsampling method for different $ r $ and variances of $ {{\boldsymbol{U}}}_i $. The parameter $ {\beta}_1 $ is taken as an example, and a $ 95\% $ confidence interval is constructed. Table 1 reports the empirical coverage probabilities and average lengths of three subsampling methods. It is evident that both mV and mVc have similar performance and consistently outperform the uniform subsampling method. As $ r $ increases, the length of the confidence interval uniformly decreases.

    Table 1.  Empirical coverage probabilities and average lengths of confidence intervals for $ {\beta}_1 $ in the logistic regression models with different $ r $ and $ r_0 = 500 $.
    uniform mV mVc
    Case $ r $ Coverage Length Coverage Length Coverage Length
    Case 1 500 0.958 0.565 0.932 0.331 0.942 0.457
    1000 0.952 0.453 0.925 0.248 0.954 0.333
    1500 0.960 0.387 0.920 0.206 0.964 0.274
    2000 0.932 0.345 0.907 0.180 0.954 0.237
    2500 0.938 0.313 0.910 0.160 0.956 0.211
    5000 0.964 0.302 0.908 0.148 0.937 0.202
    Case 2 500 0.956 0.634 0.946 0.602 0.962 0.613
    1000 0.946 0.621 0.934 0.586 0.946 0.593
    1500 0.927 0.597 0.954 0.551 0.962 0.561
    2000 0.943 0.543 0.956 0.524 0.921 0.518
    2500 0.970 0.475 0.958 0.453 0.944 0.462
    5000 0.963 0.438 0.932 0.417 0.947 0.441
    Case 3 500 0.958 0.706 0.956 0.432 0.968 0.550
    1000 0.946 0.561 0.972 0.399 0.970 0.409
    1500 0.944 0.479 0.968 0.321 0.960 0.329
    2000 0.936 0.425 0.964 0.265 0.958 0.281
    2500 0.926 0.389 0.966 0.249 0.954 0.250
    5000 0.915 0.356 0.947 0.220 0.942 0.236


    Let $ {\mathit{\boldsymbol{\beta}}}_{t} = (0.5, -0.6, 0.5)^{\rm T} $, the covariate $ {{\boldsymbol{X}}}_i \sim N_3({{\mathbf{0}}}, {\boldsymbol \Sigma}) $, where $ {\mathit{\boldsymbol{\Sigma}}} = 0.3{\bf I} +0.5 {{\mathbf{1}}}{{\mathbf{1}}}^{\rm T} $, $ {\bf I} $ is an identity matrix. We consider the following three cases to generate the measurement error term $ {\boldsymbol{U}} _i $.

    ● Case 1: $ {\mathit{\boldsymbol{U}}}_i \sim N_3({{\mathbf{0}}}, 0.3^2{\bf I}) $;

    ● Case 2: $ {\mathit{\boldsymbol{U}}}_i \sim N_3({{\mathbf{0}}}, 0.4^2{\bf I}) $;

    ● Case 3: $ {\mathit{\boldsymbol{U}}}_i \sim N_3({{\mathbf{0}}}, 0.5^2{\bf I}) $.

    We also generate a sample of $ N = 100000 $ from the $ Poisson \left({{\mu _i}} \right) $ distribution, where $ {\mu _i} = \exp \left({\mathit{\boldsymbol{X}}_i^{\rm T}{\mathit{\boldsymbol{\beta}}_t}} \right) $, and summarize the MSEs with the number of simulations $ K = 1000 $ in Figure 2. The other settings are the same as those in the logistic regression example.

    Figure 2.  MSEs for $ \check{\boldsymbol \beta} $ with different second step subsample size $ r $ and $ r_0 = 400 $. The colorful icons and lines represent the corrected subsampling methods. The gray icons and lines represent the uncorrected subsampling methods.

    In Figure 2, it can be seen that the MSEs of both the mV and mVc methods are smaller than those of the uniform subsampling, with the mV method performing best. In addition, the correction is clearly effective, which is consistent with Figure 1. Table 2 reports the empirical coverage probabilities and average lengths of the $ 95\% $ confidence intervals of the parameter $ {\beta}_3 $ for the three subsampling methods. The conclusions from Table 2 are consistent with those from Table 1, but the average lengths of the intervals for Poisson regression are significantly longer than those for logistic regression.

    Table 2.  Empirical coverage probabilities and average lengths of confidence intervals for $ {\beta}_3 $ in the Poisson regression models with different $ r $ and $ r_0 = 500 $.
    uniform mV mVc
    Case $ r $ Coverage Length Coverage Length Coverage Length
    Case 1 500 0.962 0.441 0.962 0.383 0.958 0.399
    1000 0.944 0.352 0.964 0.291 0.964 0.304
    1500 0.932 0.302 0.964 0.241 0.966 0.255
    2000 0.952 0.268 0.930 0.210 0.944 0.223
    2500 0.946 0.244 0.958 0.188 0.974 0.201
    5000 0.952 0.234 0.961 0.173 0.943 0.185
    Case 2 500 0.938 0.127 0.936 0.108 0.948 0.109
    1000 0.936 0.102 0.946 0.082 0.934 0.082
    1500 0.942 0.087 0.934 0.069 0.936 0.068
    2000 0.952 0.078 0.956 0.060 0.952 0.059
    2500 0.946 0.071 0.932 0.053 0.944 0.053
    5000 0.935 0.068 0.965 0.045 0.971 0.047
    Case 3 500 0.940 0.185 0.936 0.153 0.953 0.156
    1000 0.950 0.148 0.954 0.113 0.958 0.118
    1500 0.932 0.127 0.950 0.094 0.958 0.099
    2000 0.946 0.113 0.952 0.082 0.960 0.086
    2500 0.942 0.103 0.932 0.073 0.950 0.077
    5000 0.937 0.096 0.956 0.065 0.964 0.061


    In order to explore the influence of the subsample size allocation in the two-step algorithm, we calculate the MSEs for different proportions of $ r_0 $ while keeping the total subsample size constant. We set the total subsample size to $ r_0 + r = 3000 $, and the results are shown in Figure 3. It can be seen that the accuracy of the two-step algorithm initially improves as $ r_0 $ increases; however, when $ r_0 $ becomes too large, the accuracy begins to decrease. There are two reasons: (1) if $ r_0 $ is too small, the estimator in the first step is biased and its accuracy is hard to guarantee; (2) if $ r_0 $ is too large, the performances of mV and mVc become similar to that of uniform subsampling. When $ r_0/(r_0 + r) $ is around 0.25, the two-step algorithm performs best.

    Figure 3.  MSEs vs proportions of the first step subsample with fixed total subsample size for logistic and Poisson models with Case 1.

    We use the Sys.time() function in R to measure the running time of the three subsampling methods and of the full data analysis. We conduct 1000 repetitions, set $ r_0 = 200 $, and consider different $ r $ values in Case 1. The results are shown in Tables 3 and 4. It is easy to see that the uniform subsampling algorithm requires the least computation time, because there is no need to calculate the subsampling probabilities. In addition, the mV method takes longer than the mVc method, which is consistent with the theoretical analysis in Section 2.

    Table 3.  Computing time (in seconds) for logistic regression with Case 1 for different $ r $ and fixed $ r_0 = 200 $.
    $ r $
    Method 300 500 800 1200 1600 2000
    uniform 0.2993 0.3337 0.4985 0.5632 0.8547 0.5083
    mV 3.5461 3.6485 3.8623 4.1256 4.4325 5.2365
    mVc 3.2852 3.3658 3.5463 3.8562 4.0235 4.4235
    Full 45.9075

    Table 4.  Computing time (in seconds) for Poisson regression with Case 1 for different $ r $ and fixed $ r_0 = 200 $.
    $ r $
    Method 300 500 800 1200 1600 2000
    uniform 0.4213 0.4868 0.5327 0.5932 0.7147 0.8883
    mV 4.6723 4.8963 5.2369 5.6524 6.0128 6.3567
    mVc 4.3521 4.6329 4.9658 5.2156 5.7652 5.9635
    Full 51.2603


    In this section, we apply the proposed method to analyze the 1994 global census data, which covers 42 countries, from the Machine Learning Database [33]. There are 5 covariates in the data: $ x_1 $ represents age; $ x_2 $ represents the population weight value, which is assigned by the Population Division of the Census Bureau and is related to socioeconomic characteristics; $ x_3 $ represents the highest level of education attained since primary school; $ x_4 $ represents capital loss, that is, the loss of income from a bad investment, measured as the difference between the lower selling price and the higher purchase price of an individual's investment; $ x_5 $ represents weekly working hours. If an individual's annual income exceeds 50,000 dollars, we set $ y_i = 1 $, and $ y_i = 0 $ otherwise.

    To verify the effectiveness of the proposed method, we add the measurement errors to the covariates $ x_2 $, $ x_4 $ and $ x_5 $ in this dataset, and the covariance matrix of the measurement error is

    $ {\mathit{\boldsymbol{\Sigma}}}_u = \mathrm{diag}\left(0,\; 0.04,\; 0,\; 0.04,\; 0.04\right). $

    We split the full dataset into a training set of 32561 observations and a test set of 16281 observations in a 2:1 ratio. We apply the proposed method to the training set and evaluate the classification performance with the test set. We calculate $ LEMSE = \log \left(\frac{1}{K}{\sum\limits_{k = 1}^K {\left\| {{{\check{\mathit{\boldsymbol{\beta}}}}^{\left(k \right)}} - {{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}}} \right\|} ^2}\right) $ based on 1000 bootstrap subsample estimators with $ r = 500, 1000, 1500, 2200, 2500, $ and $ r_0 = 500 $. The corrected MLE estimators for the training set are $ {{\hat \beta }_{\text{MLE, 0}}^{\text{err}}} = -1.6121 $, $ {{\hat \beta }_{\text{MLE, 1}}^{\text{err}}} = 1.1992 $, $ {{\hat \beta }_{\text{MLE, 2}}^{\text{err}}} = 0.0103 $, $ {{\hat \beta }_{\text{MLE, 3}}^{\text{err}}} = 0.9142 $, $ {{\hat \beta }_{\text{MLE, 4}}^{\text{err}}} = 0.2617 $, $ {{\hat \beta }_{\text{MLE, 5}}^{\text{err}}} = 0.8694 $.

    Table 5 shows the average estimators and the corresponding standard errors based on the proposed method ($ r_0 = 500 $, $ r = 2000 $). It can be seen that the estimators from three subsampling methods are close to the estimators from the full data. In general, the mV and mVc subsampling methods produce small standard errors.

    Table 5.  Average estimators based on subsamples with measurement error and subsample size $ r = 2000 $. The numbers in parentheses are the standard errors of the average estimators.
    uniform mV mVc
    $ \text{Intercept} $ -1.6084(0.069) -1.5998(0.055) -1.3122(0.052)
    $ {\check{\beta}}_1^{\text{err}} $ 1.2879(0.205) 1.1880(0.103) 1.2038(0.097)
    $ {\check{\beta}}_2^{\text{err}} $ 0.0105(0.106) 0.0104(0.059) 0.0111(0.046)
    $ {\check{\beta}}_3^{\text{err}} $ 1.0033(0.201) 0.9217(0.067) 0.9199(0.054)
    $ {\check{\beta}}_4^{\text{err}} $ 0.2636(0.094) 0.2698(0.054) 0.2555(0.063)
    $ {\check{\beta}}_5^{\text{err}} $ 0.9469(0.229) 0.8741(0.083) 0.8628(0.076)


    All subsampling methods show that each variable has a positive impact on income, with age, highest education level, and weekly working hours having significant impacts. Interestingly, capital losses have a significant positive impact on income, presumably because low-income people rarely invest. However, the population weight value has the smallest impact on income; the likely reason is that it reflects the overall distribution characteristics among groups rather than the specific economic performance of individuals. Income is a highly volatile variable, and the income gap between different groups may be large; even under the same socioeconomic characteristics, the income distribution may have a large variance. This high variability weakens the overall impact of the population weight on income.

    With $ r_0 = 500 $ fixed, Figure 4(a) shows the LEMSEs calculated for the subsample with measurement errors. The LEMSEs of the corrected methods are much smaller than those of the uncorrected methods, and they decrease as $ r $ increases, which indicates that the subsampling estimators are consistent; the mV method performs best. Figure 4(b) shows the proportion of test-set responses that are correctly classified for different subsample sizes. The mV method performs slightly better than the mVc method, and the prediction accuracy of the corrected subsampling methods is slightly higher than that of the corresponding uncorrected methods.
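    The prediction accuracy reported in Figure 4(b) can be read as follows: a subsample estimator is plugged into the logistic link, and a test response is classified as 1 when the fitted probability exceeds 0.5. Below is a minimal Python sketch of this evaluation step; the array names are hypothetical and not taken from the paper.

```python
import numpy as np

def classification_accuracy(beta, X_test, y_test):
    """Proportion of test responses correctly classified by the fitted logistic model.

    beta[0] is the intercept and beta[1:] are the slope coefficients.
    """
    eta = beta[0] + X_test @ beta[1:]      # linear predictor
    prob = 1.0 / (1.0 + np.exp(-eta))      # fitted success probabilities
    return np.mean((prob > 0.5).astype(int) == y_test)
```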

    Figure 4.  LEMSEs and model prediction accuracy (proportion of correctly classified responses) for the subsample with measurement errors. The colored icons and lines represent the corrected subsampling methods; the gray icons and lines represent the uncorrected subsampling methods.

    This subsection applies the corrected subsampling method to the credit card fraud detection dataset from Kaggle*; the dependent variable indicates whether a record is a case of credit card fraud. The dataset contains 284,807 records, of which 492 are fraud cases. Since the data involve sensitive information, the covariates have been transformed by principal component analysis into 28 principal components. Amount represents the transaction amount; Class is the dependent variable, with 1 indicating fraud and 0 indicating a normal record. The first four principal components and the transaction amount are selected as covariates.

    *https://www.kaggle.com/datasets/creepycrap/creditcard-fraud-dataset

    To verify the effectiveness of the proposed method, we add measurement errors to the covariates, with measurement-error covariance matrix $ {\mathit{\boldsymbol{\Sigma}}}_{u} = 0.16{\bf I} $. We split the dataset into a training set and a test set in a 3:1 ratio and summarize the LEMSEs based on $ K = 1000 $ simulations with $ r = 500, 1000, 1500, 2200, 2500, 5000, $ and $ r_0 = 500 $.
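    For both real datasets, the LEMSE curves are obtained by repeating the subsampling procedure $ K = 1000 $ times for every subsample size $ r $. The sketch below shows only this bookkeeping; the callable `estimator` is a placeholder for whichever corrected subsampling method (uniform, mV or mVc) is being evaluated and is not implemented here.

```python
import numpy as np

def lemse_curve(estimator, beta_mle, r_values, r0=500, K=1000):
    """Repeat the subsampling procedure K times per subsample size and record the LEMSE.

    estimator(r, r0) is any callable returning one corrected subsample estimate
    (a 1-D coefficient array); beta_mle is the full-data corrected MLE.
    """
    curve = {}
    for r in r_values:
        estimates = np.array([estimator(r, r0) for _ in range(K)])
        curve[r] = np.log(np.mean(np.sum((estimates - beta_mle) ** 2, axis=1)))
    return curve
```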

    The corrected MLE estimators for the training set are $ {{\hat \beta }_{\text{MLE, 0}}^{\text{err}}} = -8.8016 $, $ {{\hat \beta }_{\text{MLE, 1}}^{\text{err}}} = -0.6070 $, $ {{\hat \beta }_{\text{MLE, 2}}^{\text{err}}} = 0.0737 $, $ {{\hat \beta }_{\text{MLE, 3}}^{\text{err}}} = -0.9056 $, $ {{\hat \beta }_{\text{MLE, 4}}^{\text{err}}} = 1.4553 $, $ {{\hat \beta }_{\text{MLE, 5}}^{\text{err}}} = -0.1329 $. Table 6 shows the average estimators and the corresponding standard errors ($ r_0 = 500 $, $ r = 2000 $). The estimators from the three subsampling methods are close to those from the full data, and the mV and mVc subsampling methods generally produce smaller standard errors. Figure 5 shows results similar to those in Figure 4.

    Table 6.  Average estimators based on subsamples with measurement error and subsample size $ r = 2000 $. The numbers in parentheses are the standard errors of the average estimators.

    | Coefficient | uniform | mV | mVc |
    | --- | --- | --- | --- |
    | $ \text{Intercept} $ | -8.7934(0.0678) | -8.8105(0.0562) | -8.8135(0.0543) |
    | $ {\check{\beta}}_1^{\text{err}} $ | -0.6123(0.341) | -0.6047(0.142) | -0.6035(0.105) |
    | $ {\check{\beta}}_2^{\text{err}} $ | 0.0712(0.125) | 0.0730(0.064) | 0.0798(0.088) |
    | $ {\check{\beta}}_3^{\text{err}} $ | -0.9321(0.245) | -0.9087(0.067) | -0.9123(0.057) |
    | $ {\check{\beta}}_4^{\text{err}} $ | 1.4618(0.198) | 1.4580(0.054) | 1.4603(0.075) |
    | $ {\check{\beta}}_5^{\text{err}} $ | -0.1435(0.531) | -0.1347(0.242) | -0.1408(0.225) |

    Figure 5.  LEMSEs and model prediction accuracy (proportion of correctly classified responses) for the subsample with measurement errors. The colored icons and lines represent the corrected subsampling methods; the gray icons and lines represent the uncorrected subsampling methods.

    In this paper, we not only combine the corrected score method with the subsampling technique, but also theoretically establish the consistency and asymptotic normality of the subsampling estimators. In addition, an adaptive two-step algorithm is developed based on optimal subsampling probabilities derived from the A-optimality and L-optimality criteria and a truncation method. The theoretical results are validated on simulated data and two real datasets, and the experiments demonstrate the effectiveness and good performance of the proposed method.

    This paper only assumes that the covariates are affected by measurement error; in practical applications, the response variable may also be subject to measurement error. Moreover, the optimal subsampling probabilities are obtained by minimizing $ tr(\bf V) $ or $ tr({\bf V}_{\text{C}}) $ following the design ideas of the A-optimality and L-optimality criteria. In the future, other optimality criteria can be considered to develop more efficient subsampling algorithms.

    Ruiyuan Chang: developed the algorithms and numerical results presented in the manuscript and wrote the original draft; Xiuli Wang: provided guidance on the proofs of the theorems and polished the language of the entire manuscript; Mingqiu Wang: provided guidance on the proofs of the theorems and the writing of the code and polished the language of the entire manuscript. All authors have read and consented to the published version of the manuscript.

    The authors declare that they have not used Artificial Intelligence (AI) tools in the creation of this article.

    This research was supported by the National Natural Science Foundation of China (12271294) and the Natural Science Foundation of Shandong Province (ZR2024MA089).

    The authors declare no conflict of interest.

    The proofs of the following lemmas and theorems are primarily based on Wang et al. [5], Ai et al. [7] and Yu et al. [34].

    Lemma 1. If Assumptions A1–A4 and A6 hold, as $ r \to \infty $ and $ N \to \infty $, conditional on $ \mathcal{F}_N $, we have

    $ \overset{\smile}{\bf{M}}_X - {\bf{M}}_X = O_{P\mid\mathcal{F}_N}\left(r^{-\frac{1}{2}}\right), $ (A.1)
    $ \frac{1}{N}\mathit{\boldsymbol{Q}}^{*}({\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}) - \frac{1}{N}\mathit{\boldsymbol{Q}}({\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}) = O_{P\mid\mathcal{F}_N}\left(r^{-\frac{1}{2}}\right), $ (A.2)
    $ \frac{1}{N}{\bf{V}}_{\text{C}}^{-\frac{1}{2}}\mathit{\boldsymbol{Q}}^{*}({\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}) \xrightarrow{d} N_p({\mathbf{0}},{\bf I}), $ (A.3)

    where

    $ {\overset{\smile}{\bf{M}}_X} = \frac{1}{{Nr}}\sum\limits_{i = 1}^r {\frac{{{\widetilde{ {\bf{\Omega}}}}_{_i}^*({{\mathit{\boldsymbol{\Sigma}}}_u},{{\hat{\mathit{\boldsymbol{\beta}}}}_{\mathit{\text{MLE}}}})}}{{{{\widetilde \pi }_i}}}}, $

    and

    $ {{\bf{V}}_{\mathit{\text{C}}}} = \frac{1}{{{N^2}r}}\sum\limits_{i = 1}^N {\frac{\mathit{\boldsymbol{\eta}}_{_i}^*({{\mathit{\boldsymbol{\Sigma}}}_u},{{\hat{\mathit{\boldsymbol{\beta}}}}_{\mathit{\text{MLE}}}})\mathit{\boldsymbol{\eta}}{{_{_i}^*}^{\rm T}}({{\mathit{\boldsymbol{\Sigma}}}_u},{{\hat{\mathit{\boldsymbol{\beta}}}}_{\mathit{\text{MLE}}}})}{{\pi _i}}}. $

    Proof.

    $ E\left(\overset{\smile}{\bf{M}}_X \,\middle\vert\, \mathcal{F}_N\right) = E\left(\frac{1}{Nr}\sum\limits_{i = 1}^r \frac{{\widetilde{\bf{\Omega}}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u},{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}})}{{\widetilde \pi }_i} \,\middle\vert\, \mathcal{F}_N\right) = \frac{1}{Nr}\sum\limits_{i = 1}^r\sum\limits_{j = 1}^N \pi_j\frac{{\bf{\Omega}}_j^*({{\mathit{\boldsymbol{\Sigma}}}_u},{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}})}{\pi_j} = \frac{1}{N}\sum\limits_{i = 1}^N {\bf{\Omega}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u},{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}) = {\bf{M}}_X. $

    By Assumption A6, we have

    $ E\left[\left(\overset{\smile}{\bf{M}}_X^{j_1 j_2} - {\bf{M}}_X^{j_1 j_2}\right)^2 \,\middle\vert\, \mathcal{F}_N\right] = E\left[\left(\frac{1}{Nr}\sum\limits_{i = 1}^r \frac{{\widetilde{\bf{\Omega}}}_i^{*(j_1 j_2)}({{\mathit{\boldsymbol{\Sigma}}}_u},{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}})}{{\widetilde \pi }_i} - \frac{1}{N}\sum\limits_{i = 1}^N {\bf{\Omega}}_i^{*(j_1 j_2)}({{\mathit{\boldsymbol{\Sigma}}}_u},{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}})\right)^2 \,\middle\vert\, \mathcal{F}_N\right] = \frac{1}{r}\sum\limits_{i = 1}^N \pi_i\left(\frac{{\bf{\Omega}}_i^{*(j_1 j_2)}({{\mathit{\boldsymbol{\Sigma}}}_u},{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}})}{N\pi_i} - {\bf{M}}_X^{j_1 j_2}\right)^2 = \frac{1}{r}\sum\limits_{i = 1}^N \pi_i\left(\frac{{\bf{\Omega}}_i^{*(j_1 j_2)}({{\mathit{\boldsymbol{\Sigma}}}_u},{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}})}{N\pi_i}\right)^2 - \frac{1}{r}\left({\bf{M}}_X^{j_1 j_2}\right)^2 \le \frac{1}{r}\sum\limits_{i = 1}^N \pi_i\left(\frac{{\bf{\Omega}}_i^{*(j_1 j_2)}({{\mathit{\boldsymbol{\Sigma}}}_u},{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}})}{N\pi_i}\right)^2 = O_P(r^{-1}). $

    It follows from Chebyshev's inequality that (A.1) holds.

    $ E\left(\frac{1}{N}\mathit{\boldsymbol{Q}}^{*}({\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}) \,\middle\vert\, \mathcal{F}_N\right) = E\left(\frac{1}{N}\frac{1}{r}\sum\limits_{i = 1}^r \frac{{\tilde{\mathit{\boldsymbol{\eta}}}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u},{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}})}{{\widetilde \pi }_i} \,\middle\vert\, \mathcal{F}_N\right) = \frac{1}{Nr}\sum\limits_{i = 1}^r\sum\limits_{j = 1}^N \pi_j\frac{\mathit{\boldsymbol{\eta}}_j^*({{\mathit{\boldsymbol{\Sigma}}}_u},{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}})}{\pi_j} = \frac{1}{N}\sum\limits_{i = 1}^N \mathit{\boldsymbol{\eta}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u},{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}) = {\mathbf{0}}. $

    By Assumption A4, we have

    $ {\text{Var}}\left(\frac{1}{N}\mathit{\boldsymbol{Q}}^{*}({\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}) \,\middle\vert\, \mathcal{F}_N\right) = {\text{Var}}\left(\frac{1}{N}\frac{1}{r}\sum\limits_{i = 1}^r \frac{{\tilde{\mathit{\boldsymbol{\eta}}}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u},{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}})}{{\widetilde \pi }_i} \,\middle\vert\, \mathcal{F}_N\right) = \frac{1}{N^2 r^2}\sum\limits_{i = 1}^r\sum\limits_{j = 1}^N \pi_j\frac{\mathit{\boldsymbol{\eta}}_j^*({{\mathit{\boldsymbol{\Sigma}}}_u},{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}})\mathit{\boldsymbol{\eta}}_j^{*{\rm T}}({{\mathit{\boldsymbol{\Sigma}}}_u},{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}})}{\pi_j^2} = \frac{1}{N^2 r}\sum\limits_{i = 1}^N \frac{\mathit{\boldsymbol{\eta}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u},{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}})\mathit{\boldsymbol{\eta}}_i^{*{\rm T}}({{\mathit{\boldsymbol{\Sigma}}}_u},{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}})}{\pi_i} = O_P(r^{-1}). $

    Now (A.2) follows from Markov's Inequality.

    Let $ \mathit{\boldsymbol{\gamma}} _i^{*} = {\left(N{\pi _i}\right)}^{-1} {{\tilde {\mathit{\boldsymbol{\eta}}} }_{_i}^*({{\mathit{\boldsymbol{\Sigma}}}_u}, {{\hat{\mathit{\boldsymbol{\beta}}}}_{\text {MLE}}})} $, then $ N^{-1}{\mathit{\boldsymbol{Q}}^{*}}({\hat{\mathit{\boldsymbol{\beta}}}_{\text {MLE}}}) = {r}^{-1}\sum\limits_{i = 1}^r {\mathit{\boldsymbol{\gamma}} _i^{*}} $ holds. Based on Assumption A6, for all $ \varepsilon > 0 $, we have

    $ \sum\limits_{i = 1}^r E\left\{\left\|r^{-\frac{1}{2}}\mathit{\boldsymbol{\gamma}}_i^{*}\right\|^2 I\left(\left\|\mathit{\boldsymbol{\gamma}}_i^{*}\right\| > r^{\frac{1}{2}}\varepsilon\right) \,\middle\vert\, \mathcal{F}_N\right\} = \frac{1}{r}\sum\limits_{i = 1}^r E\left\{\left\|\mathit{\boldsymbol{\gamma}}_i^{*}\right\|^2 I\left(\left\|\mathit{\boldsymbol{\gamma}}_i^{*}\right\| > r^{\frac{1}{2}}\varepsilon\right) \,\middle\vert\, \mathcal{F}_N\right\} \le \frac{1}{r^{\frac{3}{2}}\varepsilon}\sum\limits_{i = 1}^r E\left\{\left\|\mathit{\boldsymbol{\gamma}}_i^{*}\right\|^3 \,\middle\vert\, \mathcal{F}_N\right\} = \frac{1}{r^{\frac{1}{2}}\varepsilon}\frac{1}{N^3}\sum\limits_{i = 1}^N \frac{\left\|\mathit{\boldsymbol{\eta}}_i^{*}({{\mathit{\boldsymbol{\Sigma}}}_u},{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}})\right\|^3}{\pi_i^2} = O_P(r^{-\frac{1}{2}}) = o_P(1). $

    This shows that the Lindeberg-Feller conditions are satisfied in probability. Therefore (A.3) is true.

    Lemma 2. If Assumptions A1–A7 hold, as $ r \to \infty $ and $ N \to \infty $, conditional on $ \mathcal{F}_N $, for all $ {{{\boldsymbol{s}}}_r} \to {\mathbf{0}} $, we have

    $ \frac{1}{Nr}\sum\limits_{i = 1}^N \frac{{\widetilde{\bf{\Omega}}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u},{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}} + {{\boldsymbol{s}}}_r)}{{\widetilde \pi }_i} - \frac{1}{N}\sum\limits_{i = 1}^N {\bf{\Omega}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u},{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}) = o_{P\mid\mathcal{F}_N}(1). $ (A.4)

    Proof. The left-hand side of Eq (A.4) can be written as

    $ \frac{1}{Nr}\sum\limits_{i = 1}^N {\frac{{{\widetilde{\bf{\Omega}}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u},{{\hat{\mathit{\boldsymbol{\beta}}}}_{\text {MLE}}} + {{{\boldsymbol{s}}}_r})}}{{{{\widetilde \pi }_i}}}} - \frac{1}{Nr}\sum\limits_{i = 1}^N {\frac{{{\widetilde{\bf{\Omega}}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u},{{\hat{\mathit{\boldsymbol{\beta}}}}_{\text {MLE}}})}}{{{{\widetilde \pi }_i}}}} + \frac{1}{Nr}\sum\limits_{i = 1}^N {\frac{{{\widetilde{\bf{\Omega}}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u},{{\hat{\mathit{\boldsymbol{\beta}}}}_{\text {MLE}}})}}{{{{\widetilde \pi }_i}}}} - \frac{1}{N}\sum\limits_{i = 1}^N {{\bf{\Omega}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u},{{\mathit{\boldsymbol{\hat \beta}}}_{\text {MLE}}})}. $

    Let

    $ {{\mathit{\boldsymbol{\tau}}} _1} : = \frac{1}{Nr}\sum\limits_{i = 1}^N {\frac{{\widetilde{\bf{ \Omega}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u},{{\hat{\mathit{\boldsymbol{\beta}}}}_{\text {MLE}}} + {{{\boldsymbol{s}}}_r})}}{{{{\widetilde \pi }_i}}}} - \frac{1}{Nr}\sum\limits_{i = 1}^N {\frac{{\widetilde{\bf{ \Omega}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u},{{\hat{\mathit{\boldsymbol{\beta}}}}_{\text {MLE}}})}}{{{{\widetilde \pi }_i}}}}, $

    then by Assumption A7, we have

    $ E\left(\left\|{{\mathit{\boldsymbol{\tau}}}_1}\right\|_S \,\middle\vert\, \mathcal{F}_N\right) = E\left\{\frac{1}{Nr}\sum\limits_{i = 1}^N \frac{1}{{\widetilde \pi }_i}\left\|{\widetilde{\bf{\Omega}}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u},{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}} + {{\boldsymbol{s}}}_r) - {\widetilde{\bf{\Omega}}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u},{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}})\right\|_S \,\middle\vert\, \mathcal{F}_N\right\} = \frac{1}{Nr}\sum\limits_{i = 1}^r\sum\limits_{j = 1}^N \pi_j\frac{1}{\pi_j}\left\|{\bf{\Omega}}_j^*({{\mathit{\boldsymbol{\Sigma}}}_u},{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}} + {{\boldsymbol{s}}}_r) - {\bf{\Omega}}_j^*({{\mathit{\boldsymbol{\Sigma}}}_u},{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}})\right\|_S \le \frac{1}{N}\sum\limits_{i = 1}^N m_2({\bf W}_i)\left\|{{\boldsymbol{s}}}_r\right\| = o_P(1). $

    It follows from Markov's inequality that $ {{\mathit{\boldsymbol{\tau}}} _1} = {o_{P\mid\mathcal{F}_N}}(1) $.

    Let

    $ {{\mathit{\boldsymbol{\tau}} }_2} : = \frac{1}{Nr}\sum\limits_{i = 1}^N {\frac{{{\widetilde{\bf{\Omega}}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u},{{\hat{\mathit{\boldsymbol{\beta}}}}_{\text {MLE}}})}}{{\widetilde \pi }_i} - \frac{1}{N}\sum\limits_{i = 1}^N {{\bf{\Omega}}_i^*}({{\mathit{\boldsymbol{\Sigma}}}_u},{{\hat{\mathit{\boldsymbol{\beta}}}}_{\text {MLE}}})}, $

    then

    $ E\left\{ \frac{1}{Nr} \sum\limits_{i = 1}^N \frac{\widetilde{\bf{{\Omega}}}_i^*(\mathit{\boldsymbol{\Sigma}}_u, {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}})}{\widetilde{\pi}_i} \,\middle\vert\, \mathcal{F}_N \right\} = \frac{1}{N} \sum\limits_{i = 1}^N {\bf{\Omega}}_i^*(\mathit{\boldsymbol{\Sigma}}_u, {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}). $

    From the proof of Lemma 1, it follows that

    $ E\left[ \left( \overset{\smile}{\bf{M}}_X^{j_1 j_2} - {\bf{M}}_X^{j_1 j_2} \right)^2 \,\middle\vert\, \mathcal{F}_N \right] = {O_P}({r^{ - 1}}) = {o_P}(1). $

    Therefore $ {{\mathit{\boldsymbol{\tau}}} _2} = {o_{P\mid\mathcal{F}_N}}(1) $, and (A.4) holds.

    Next, we will prove Theorems 2.1 and 2.2.

    Proof of Theorem 2.1. $ \overset{\smile}{\mathit{\boldsymbol{\beta}}} $ is the solution of $ {\mathit{\boldsymbol{Q}}^{*}}(\mathit{\boldsymbol{\beta}}) = \frac{1}{r}\sum\limits_{i = 1}^r {\frac{1}{{{{\widetilde \pi }_i}}}}{{\tilde{\mathit{\boldsymbol{\eta}}}}_{_i}^*({{\mathit{\boldsymbol{\Sigma}}}_u}, \mathit{\boldsymbol{\beta}})} = {\mathbf{0}} $, then

    $ E\left( \frac{1}{N} \mathit{\boldsymbol{Q}}^{*}(\mathit{\boldsymbol{\beta}}) \,\middle\vert\, \mathcal{F}_N \right) = \frac{1}{Nr} \sum\limits_{i = 1}^r \sum\limits_{j = 1}^N \pi_j \frac{{\mathit{\boldsymbol{\eta}}}_{j}^*(\mathit{\boldsymbol{\Sigma}}_u, \mathit{\boldsymbol{\beta}})}{\pi_j} = \frac{1}{N} \sum\limits_{i = 1}^N {\mathit{\boldsymbol{\eta}}}_{i}^*(\mathit{\boldsymbol{\Sigma}}_u, \mathit{\boldsymbol{\beta}}) = \frac{1}{N} \mathit{\boldsymbol{Q}}(\mathit{\boldsymbol{\beta}}). $

    By Assumption A6, we have

    $ {\text{Var}}\left(\frac{1}{N}\mathit{\boldsymbol{Q}}^{*}(\mathit{\boldsymbol{\beta}}) \,\middle\vert\, \mathcal{F}_N\right) = {\text{Var}}\left(\frac{1}{N}\frac{1}{r}\sum\limits_{i = 1}^r \frac{{\tilde{\mathit{\boldsymbol{\eta}}}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u},\mathit{\boldsymbol{\beta}})}{{\widetilde \pi }_i} \,\middle\vert\, \mathcal{F}_N\right) = \frac{1}{N^2 r^2}\sum\limits_{i = 1}^r\sum\limits_{j = 1}^N \pi_j\frac{\mathit{\boldsymbol{\eta}}_j^*({{\mathit{\boldsymbol{\Sigma}}}_u},\mathit{\boldsymbol{\beta}})\mathit{\boldsymbol{\eta}}_j^{*{\rm T}}({{\mathit{\boldsymbol{\Sigma}}}_u},\mathit{\boldsymbol{\beta}})}{\pi_j^2} = \frac{1}{N^2 r}\sum\limits_{i = 1}^N \frac{\mathit{\boldsymbol{\eta}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u},\mathit{\boldsymbol{\beta}})\mathit{\boldsymbol{\eta}}_i^{*{\rm T}}({{\mathit{\boldsymbol{\Sigma}}}_u},\mathit{\boldsymbol{\beta}})}{\pi_i} = O_P(r^{-1}). $

    Therefore, as $ r \to \infty $, $ {N}^{-1} \mathit{\boldsymbol{Q}}^{*}(\mathit{\boldsymbol{\beta}}) - {N}^{-1} \mathit{\boldsymbol{Q}}(\mathit{\boldsymbol{\beta}}) \xrightarrow{} 0 $ for all $ {\boldsymbol \beta} \in \Lambda $ in conditional probability given $ \mathcal{F}_N $. Thus, from Theorem 5.9 in [32], we have $ \left\| \overset{\smile}{\mathit{\boldsymbol{\beta}}} - \hat{\mathit{\boldsymbol{\beta}}}_{\text{MLE}} \right\| = o_{P\mid\mathcal{F}_N}(1) $. By Taylor expansion,

    $ \frac{1}{N}\mathit{\boldsymbol{Q}}^{*}(\overset{\smile}{\mathit{\boldsymbol{\beta}}}) = {\mathbf{0}} = \frac{1}{N}\mathit{\boldsymbol{Q}}^{*}({\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}) + \frac{1}{Nr}\sum\limits_{i = 1}^r \frac{{\widetilde{\bf{\Omega}}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u},{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}} + {{\boldsymbol{s}}}_r)}{{\widetilde \pi }_i}(\overset{\smile}{\mathit{\boldsymbol{\beta}}} - {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}). $

    By Lemma 2, it follows that

    $ \frac{1}{Nr} \sum\limits_{i = 1}^N \frac{\widetilde{\bf{\Omega}}_i^*(\mathit{\boldsymbol{\Sigma}}_u, {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}} + {{\boldsymbol{s}}}_r)}{\widetilde{\pi}_i} - \frac{1}{N} \sum\limits_{i = 1}^N {\bf{\Omega}}_i^*(\mathit{\boldsymbol{\Sigma}}_u, {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}) = o_{P\mid\mathcal{F}_N}(1), $

    then

    $ {\mathbf{0}} = \frac{1}{N} \mathit{\boldsymbol{Q}}^{*}({\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}) + \frac{1}{N} \sum\limits_{i = 1}^N {\bf{\Omega}}_i^*(\mathit{\boldsymbol{\Sigma}}_u, {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}) (\overset{\smile}{\mathit{\boldsymbol{\beta}}} - {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}) + o_{P\mid\mathcal{F}_N}(1)(\overset{\smile}{\mathit{\boldsymbol{\beta}}} - {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}). $

    That is,

    $ \frac{1}{N}{\mathit{\boldsymbol{Q}}^{*}}({\mathit{\boldsymbol{\hat \beta}}_{\text {MLE}}}) + {{\bf{M}}_X}(\overset{\smile}{\mathit{\boldsymbol{\beta}}} - \hat{\mathit{\boldsymbol{\beta}}}_{\text{MLE}}) + {o_{P\mid\mathcal{F}_N}}\left( {\left\| {\overset{\smile}{\mathit{\boldsymbol{\beta}}} - \hat{\mathit{\boldsymbol{\beta}}}_{\text{MLE}}} \right\|} \right) = {\mathbf{0}}, $

    we have

    $ \overset{\smile}{\mathit{\boldsymbol{\beta}}} - {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}} = -{\bf{M}}_X^{-1}\left\{\frac{1}{N}\mathit{\boldsymbol{Q}}^{*}({\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}) + o_{P\mid\mathcal{F}_N}\left(\left\|\overset{\smile}{\mathit{\boldsymbol{\beta}}} - {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}\right\|\right)\right\} = -{\bf{M}}_X^{-1}{\bf{V}}_{\text{C}}^{\frac{1}{2}}{\bf{V}}_{\text{C}}^{-\frac{1}{2}}\frac{1}{N}\mathit{\boldsymbol{Q}}^{*}({\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}) - {\bf{M}}_X^{-1}o_{P\mid\mathcal{F}_N}\left(\left\|\overset{\smile}{\mathit{\boldsymbol{\beta}}} - {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}\right\|\right) = O_{P\mid\mathcal{F}_N}(r^{-\frac{1}{2}}) + o_{P\mid\mathcal{F}_N}\left(\left\|\overset{\smile}{\mathit{\boldsymbol{\beta}}} - {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}\right\|\right). $ (A.5)

    By Lemma 1 and Assumption A3, $ {\bf{M}}_X^{ - 1} = {O_{P\mid\mathcal{F}_N}}\left(1 \right) $, we have $ \overset{\smile}{\mathit{\boldsymbol{\beta}}} - \hat{\mathit{\boldsymbol{\beta}}}_{\text{MLE}} = {O_{P\mid\mathcal{F}_N}}\left({{r^{ - \frac{1}{2}}}} \right) $.

    Proof of Theorem 2.2. By Lemma 1 and (A.5), as $ r \to \infty $, conditional on $ \mathcal{F}_N $, it holds that

    $ {\bf{V}}^{-\frac{1}{2}}(\overset{\smile}{\mathit{\boldsymbol{\beta}}} - {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}) = -{\bf{V}}^{-\frac{1}{2}}{\bf{M}}_X^{-1}{\bf{V}}_{\text{C}}^{\frac{1}{2}}{\bf{V}}_{\text{C}}^{-\frac{1}{2}}\frac{1}{N}\mathit{\boldsymbol{Q}}^{*}({\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}) + o_{P\mid\mathcal{F}_N}(1). $

    By Lemma 1 and Slutsky's theorem, it follows that

    $ {{\bf{V}}^{ - \frac{1}{2}}}(\overset{\smile}{\mathit{\boldsymbol{\beta}}} - {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}})\mathop \to \limits^d N_p({{\mathbf{0}}},{\bf I}). $

    Proof of Theorem 2.3. To minimize the asymptotic variance $ tr({\bf{V}}) $ of $ \overset{\smile}{\mathit{\boldsymbol{\beta}}} $, the optimization problem is

    $ \begin{cases} \min \; tr({\bf{V}}) = \min \; \dfrac{1}{N^2 r}\sum\limits_{i = 1}^N \left[\dfrac{1}{\pi_i}\left\|{\bf{M}}_X^{-1}\mathit{\boldsymbol{\eta}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u},{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}})\right\|^2\right], \\ \text{s.t.}\;\; \sum\limits_{i = 1}^N \pi_i = 1, \quad 0 \le \pi_i \le 1, \quad i = 1,\ldots,N. \end{cases} $ (A.6)

    Define $ g_i^{\text{mV}} = \left\|{{\bf{M}}_X^{-1}} {\mathit{\boldsymbol{\eta}}}_i^*\left(\mathit{\boldsymbol{\Sigma}}_u, \hat{\mathit{\boldsymbol{\beta}}}_{\text {MLE}}\right)\right\|, \; i = 1, \ldots, N $, it follows from Cauchy's inequality that

    $ tr({\bf{V}}) = \frac{1}{N^2 r}\sum\limits_{i = 1}^N \left[\frac{1}{\pi_i}\left\|{\bf{M}}_X^{-1}\mathit{\boldsymbol{\eta}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u},{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}})\right\|^2\right] = \frac{1}{N^2 r}\left(\sum\limits_{i = 1}^N \pi_i\right)\left\{\sum\limits_{i = 1}^N \left[\frac{1}{\pi_i}\left\|{\bf{M}}_X^{-1}\mathit{\boldsymbol{\eta}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u},{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}})\right\|^2\right]\right\} \ge \frac{1}{N^2 r}\left[\sum\limits_{i = 1}^N \left\|{\bf{M}}_X^{-1}\mathit{\boldsymbol{\eta}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u},{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}})\right\|\right]^2 = \frac{1}{N^2 r}\left[\sum\limits_{i = 1}^N g_i^{\text{mV}}\right]^2. $

    The equality sign holds if and only if $ {\pi _i} \propto g_i^{\text{mV}} $, therefore

    $ \pi_i^{\text{mV}} = \frac{g_i^{\text{mV}}}{\sum\limits_{j = 1}^N g_j^{\text{mV}}} $

    is the optimal solution.
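    In practice, this result says the mV probabilities are simply the normalized norms $ g_i^{\text{mV}} $. The following short Python sketch illustrates that reading (the array names `eta`, holding the $ N \times p $ corrected score contributions, and `M_X` are illustrative, not from the paper): it computes $ \pi_i^{\text{mV}} $ and draws a size-$ r $ subsample with replacement; presumably the mVc (L-optimality) version uses $ g_i^{\text{mVc}} = \left\| \mathit{\boldsymbol{\eta}}_i^* \right\| $ in place of $ g_i^{\text{mV}} $.

```python
import numpy as np

def mv_probabilities(eta, M_X):
    """Optimal mV probabilities: pi_i proportional to ||M_X^{-1} eta_i|| (A-optimality)."""
    g = np.linalg.norm(np.linalg.solve(M_X, eta.T), axis=0)   # g_i^mV for each of the N rows
    return g / g.sum()

def draw_subsample(pi, r, seed=None):
    """Draw r row indices with replacement according to the subsampling probabilities pi."""
    rng = np.random.default_rng(seed)
    return rng.choice(len(pi), size=r, replace=True, p=pi)
```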

    The proof of Theorem 2.4 is similar to Theorem 2.3.

    Lemma 3. If Assumptions A1–A4 and A6 hold, as $ r_0 \to \infty $, $ r \to \infty $ and $ N \to \infty $, conditional on $ \mathcal{F}_N $, we have

    $ \overset{\smile}{\bf{M}}_X^{\tilde{\boldsymbol \beta}_0} - {\bf{M}}_X = O_{P\mid\mathcal{F}_N}\left(r^{-\frac{1}{2}}\right), $ (A.7)
    $ {\bf{M}}_X^0 - {\bf{M}}_X = O_{P\mid\mathcal{F}_N}\left(r_0^{-\frac{1}{2}}\right), $ (A.8)
    $ \frac{1}{N}\mathit{\boldsymbol{Q}}_{\tilde{\boldsymbol \beta}_0}^{*}({\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}) = O_{P\mid\mathcal{F}_N}\left(r^{-\frac{1}{2}}\right), $ (A.9)
    $ \frac{1}{N}\mathit{\boldsymbol{Q}}_{\tilde{\boldsymbol \beta}_0}^{*0}({\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}) = O_{P\mid\mathcal{F}_N}\left(r_0^{-\frac{1}{2}}\right), $ (A.10)
    $ \frac{1}{N}\left({\bf{V}}_{\text{C}}^{\text{opt}}\right)^{-\frac{1}{2}}\mathit{\boldsymbol{Q}}_{\tilde{\boldsymbol \beta}_0}^{*}({\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}) \xrightarrow{d} N_p({\mathbf{0}},{\bf I}), $ (A.11)

    where

    $ \overset{\smile}{\bf{M}}_X^{\tilde{\boldsymbol \beta}_0} = \frac{1}{{Nr}}\sum\limits_{i = 1}^r {\frac{{{\widetilde{\bf{\Omega}}}_{_i}^*({{\mathit{\boldsymbol{\Sigma}}}_u},{{\hat{\mathit{\boldsymbol{\beta}}}}_{\mathit{\text{MLE}}}})}}{{\widetilde \pi _i^{\mathit{\text{opt}}}}}}, $
    $ {\bf{M}}_X^0 = \frac{1}{{N{r_0}}}\sum\limits_{i = 1}^{{r_0}} {\frac{{{\widetilde{\bf{ \Omega}}}_{_i}^*({{\mathit{\boldsymbol{\Sigma}}}_u},{{\hat{\mathit{\boldsymbol{\beta}}}}_{\mathit{\text{MLE}}}})}}{{\widetilde \pi _i^{\mathit{\text{UNIF}}}}}}. $

    Proof.

    $ E\left(\overset{\smile}{\bf{M}}_X^{\tilde{\boldsymbol \beta}_0} \,\middle\vert\, \mathcal{F}_N\right) = E_{\tilde{\boldsymbol \beta}_0}\left[E\left(\overset{\smile}{\bf{M}}_X^{\tilde{\boldsymbol \beta}_0} \,\middle\vert\, \mathcal{F}_N,\tilde{\boldsymbol \beta}_0\right)\right] = E_{\tilde{\boldsymbol \beta}_0}\left[E\left(\frac{1}{Nr}\sum\limits_{i = 1}^r \frac{{\widetilde{\bf{\Omega}}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u},{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}})}{\widetilde \pi _i^{\text{opt}}} \,\middle\vert\, \mathcal{F}_N,\tilde{\boldsymbol \beta}_0\right)\right] = E_{\tilde{\boldsymbol \beta}_0}\left[{\bf{M}}_X \,\middle\vert\, \mathcal{F}_N\right] = {\bf{M}}_X. $

    By Assumption A6, we have

    $ E\left[\left(\overset{\smile}{\bf{M}}_X^{\tilde{\boldsymbol \beta}_0, j_1 j_2} - {\bf{M}}_X^{j_1 j_2}\right)^2 \,\middle\vert\, \mathcal{F}_N\right] = E_{\tilde{\boldsymbol \beta}_0}\left\{E\left[\left(\overset{\smile}{\bf{M}}_X^{\tilde{\boldsymbol \beta}_0, j_1 j_2} - {\bf{M}}_X^{j_1 j_2}\right)^2 \,\middle\vert\, \mathcal{F}_N,\tilde{\boldsymbol \beta}_0\right]\right\} = E_{\tilde{\boldsymbol \beta}_0}\left[\frac{1}{r}\sum\limits_{i = 1}^N \pi_i^{\text{opt}}\left(\frac{{\bf{\Omega}}_i^{*(j_1 j_2)}({{\mathit{\boldsymbol{\Sigma}}}_u},{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}})}{N\pi_i^{\text{opt}}} - {\bf{M}}_X^{j_1 j_2}\right)^2 \,\middle\vert\, \mathcal{F}_N,\tilde{\boldsymbol \beta}_0\right] = E_{\tilde{\boldsymbol \beta}_0}\left[\frac{1}{r}\sum\limits_{i = 1}^N \pi_i^{\text{opt}}\left(\frac{{\bf{\Omega}}_i^{*(j_1 j_2)}({{\mathit{\boldsymbol{\Sigma}}}_u},{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}})}{N\pi_i^{\text{opt}}}\right)^2 - \frac{1}{r}\left({\bf{M}}_X^{j_1 j_2}\right)^2 \,\middle\vert\, \mathcal{F}_N,\tilde{\boldsymbol \beta}_0\right] \le E_{\tilde{\boldsymbol \beta}_0}\left[\frac{1}{r}\sum\limits_{i = 1}^N \pi_i^{\text{opt}}\left(\frac{{\bf{\Omega}}_i^{*(j_1 j_2)}({{\mathit{\boldsymbol{\Sigma}}}_u},{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}})}{N\pi_i^{\text{opt}}}\right)^2 \,\middle\vert\, \mathcal{F}_N,\tilde{\boldsymbol \beta}_0\right] = \frac{1}{r}\sum\limits_{i = 1}^N \frac{\left({\bf{\Omega}}_i^{*(j_1 j_2)}({{\mathit{\boldsymbol{\Sigma}}}_u},{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}})\right)^2}{N^2\pi_i^{\text{opt}}} = O_P(r^{-1}). $

    It follows from Chebyshev's inequality that (A.7) holds. Similarly, (A.8) also holds.

    $ E\left( \frac{1}{N} \mathit{\boldsymbol{Q}}_{\tilde{\boldsymbol \beta}_0}^{*}({\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}) \,\middle\vert\, \mathcal{F}_N \right) = E_{\tilde{\boldsymbol \beta}_0} \left[ E\left( \frac{1}{N} \frac{1}{r} \sum\limits_{i = 1}^r \frac{{\tilde{\mathit{\boldsymbol{{\eta}}}}}_{i}^*(\mathit{\boldsymbol{\Sigma}}_u, {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}})}{\widetilde{\pi}_i^{\text{opt}}} \,\middle\vert\, \mathcal{F}_N, {\tilde{\boldsymbol \beta}}_0 \right) \right] = \frac{1}{N} \sum\limits_{i = 1}^N \mathit{\boldsymbol{\eta}}_{i}^*(\mathit{\boldsymbol{\Sigma}}_u, {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}) = {{\mathbf{0}}}. $

    By Assumption A6, we have

    $ {\text{Var}}\left(\frac{1}{N}\mathit{\boldsymbol{Q}}_{\tilde{\boldsymbol \beta}_0}^{*}({\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}) \,\middle\vert\, \mathcal{F}_N\right) = E_{\tilde{\boldsymbol \beta}_0}\left\{{\text{Var}}\left[\frac{1}{N}\frac{1}{r}\sum\limits_{i = 1}^r \frac{{\tilde{\mathit{\boldsymbol{\eta}}}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u},{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}})}{\widetilde \pi _i^{\text{opt}}} \,\middle\vert\, \mathcal{F}_N,\tilde{\boldsymbol \beta}_0\right]\right\} = \frac{1}{N^2 r}\sum\limits_{i = 1}^N \frac{\mathit{\boldsymbol{\eta}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u},{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}})\mathit{\boldsymbol{\eta}}_i^{*{\rm T}}({{\mathit{\boldsymbol{\Sigma}}}_u},{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}})}{\pi_i^{\text{opt}}} = O_P(r^{-1}). $

    Therefore, (A.9) and (A.10) follow from Markov's inequality.

    Let

    $ \mathit{\boldsymbol{\gamma}} _{i,{{\mathit{\boldsymbol{\tilde \beta}}}_0}}^{*} = \frac{{\tilde{\mathit{\boldsymbol{\eta}}}_{_i}^*({{\mathit{\boldsymbol{\Sigma}}}_u},{{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}})}}{{N\widetilde \pi _i^{\text {opt}}}}, $

    then $ {N}^{-1}\mathit{\boldsymbol{Q}}_{{{\tilde{\mathit{\boldsymbol{\beta}}}}_0}}^{*}({{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}}) = {r}^{-1}\sum\limits_{i = 1}^r {\mathit{\boldsymbol{\gamma}} _{i, {{\tilde{\mathit{\boldsymbol{\beta}}}}_0}}^{*}} $ holds. For all $ \varepsilon > 0 $, we have

    $ \sum\limits_{i = 1}^r E_{\tilde{\boldsymbol \beta}_0}\left\{E\left[\left\|r^{-\frac{1}{2}}\mathit{\boldsymbol{\gamma}}_{i,\tilde{\boldsymbol \beta}_0}^{*}\right\|^2 I\left(\left\|\mathit{\boldsymbol{\gamma}}_{i,\tilde{\boldsymbol \beta}_0}^{*}\right\| > r^{\frac{1}{2}}\varepsilon\right) \,\middle\vert\, \mathcal{F}_N,\tilde{\boldsymbol \beta}_0\right]\right\} = \frac{1}{r}\sum\limits_{i = 1}^r E_{\tilde{\boldsymbol \beta}_0}\left\{E\left[\left\|\mathit{\boldsymbol{\gamma}}_{i,\tilde{\boldsymbol \beta}_0}^{*}\right\|^2 I\left(\left\|\mathit{\boldsymbol{\gamma}}_{i,\tilde{\boldsymbol \beta}_0}^{*}\right\| > r^{\frac{1}{2}}\varepsilon\right) \,\middle\vert\, \mathcal{F}_N,\tilde{\boldsymbol \beta}_0\right]\right\} \le \frac{1}{r^{\frac{3}{2}}\varepsilon}\sum\limits_{i = 1}^r E_{\tilde{\boldsymbol \beta}_0}\left[E\left(\left\|\mathit{\boldsymbol{\gamma}}_{i,\tilde{\boldsymbol \beta}_0}^{*}\right\|^3 \,\middle\vert\, \mathcal{F}_N,\tilde{\boldsymbol \beta}_0\right)\right] = \frac{1}{r^{\frac{1}{2}}\varepsilon}\frac{1}{N^3}\sum\limits_{i = 1}^N \frac{\left\|\mathit{\boldsymbol{\eta}}_i^{*}({{\mathit{\boldsymbol{\Sigma}}}_u},{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}})\right\|^3}{\left(\pi_i^{\text{opt}}\right)^2} = O_P(r^{-\frac{1}{2}}) = o_P(1). $

    This shows that the Lindeberg-Feller conditions are satisfied in probability. Therefore (A.11) is true.

    Lemma 4. If Assumptions A1–A7 hold, as $ r_0 \to \infty $, $ r \to \infty $ and $ N \to \infty $, for all $ {{{\boldsymbol{s}}}_{r_0}} \to \boldsymbol0 $ and $ {{{\boldsymbol{s}}}_r} \to {\mathbf{0}} $, conditional on $ \mathcal{F}_N $, we have

    $ \frac{1}{Nr_0}\sum\limits_{i = 1}^{r_0} \frac{{\widetilde{\bf{\Omega}}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u},{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}} + {{\boldsymbol{s}}}_{r_0})}{\widetilde \pi _i^{\text{opt}}} - \frac{1}{N}\sum\limits_{i = 1}^N {\bf{\Omega}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u},{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}) = o_{P\mid\mathcal{F}_N}(1), $ (A.12)
    $ \frac{1}{Nr}\sum\limits_{i = 1}^{r} \frac{{\widetilde{\bf{\Omega}}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u},{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}} + {{\boldsymbol{s}}}_{r})}{\widetilde \pi _i^{\text{opt}}} - \frac{1}{N}\sum\limits_{i = 1}^N {\bf{\Omega}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u},{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}) = o_{P\mid\mathcal{F}_N}(1). $ (A.13)

    Proof. The left-hand side of Eq (A.12) can be written as

    $ \frac{1}{{N{r_0}}}\sum\limits_{i = 1}^{{r_0}} {\frac{{{\widetilde{\bf{\Omega}}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u},{{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}} + {\mathit{\boldsymbol{s}}_{r_0}})}}{{{{\widetilde \pi }_i^{\text {opt}}}}}} - \frac{1}{{N{r_0}}}\sum\limits_{i = 1}^{{r_0}} {\frac{{{\widetilde{\bf{\Omega}}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u},{{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}})}}{{\widetilde \pi _i^{\text{opt}}}}} + \frac{1}{{N{r_0}}}\sum\limits_{i = 1}^{{r_0}} {\frac{{{\widetilde{\bf{\Omega}}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u},{{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}})}}{{\widetilde \pi _i^{\text{opt}}}}} - \frac{1}{N}\sum\limits_{i = 1}^N {{\bf{\Omega}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u},{{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}})}. $

    Let

    $ \mathit{\boldsymbol{\tau}}_1^0 : = \frac{1}{{N{r_0}}}\sum\limits_{i = 1}^{{r_0}} {\frac{{{\widetilde{\bf{\Omega}}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u},{{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}} + {\mathit{\boldsymbol{s}}_{r_0}})}}{{\widetilde \pi _i^{\text{opt}}}}} - \frac{1}{{N{r_0}}}\sum\limits_{i = 1}^{{r_0}} {\frac{{{\widetilde{\bf{ \Omega}}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u},{{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}})}}{{\widetilde \pi _i^{\text{opt}}}}}, $

    then by Assumption A7, we have

    $ E\left(\left\|\mathit{\boldsymbol{\tau}}_1^0\right\|_S \,\middle\vert\, \mathcal{F}_N\right) = E_{\tilde{\boldsymbol \beta}_0}\left\{E\left[\frac{1}{Nr_0}\sum\limits_{i = 1}^{r_0}\frac{1}{\widetilde \pi _i^{\text{opt}}}\left\|{\widetilde{\bf{\Omega}}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u},{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}} + {{\boldsymbol{s}}}_{r_0}) - {\widetilde{\bf{\Omega}}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u},{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}})\right\|_S \,\middle\vert\, \mathcal{F}_N,\tilde{\boldsymbol \beta}_0\right]\right\} = \frac{1}{N}\sum\limits_{i = 1}^N \left\|{\bf{\Omega}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u},{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}} + {{\boldsymbol{s}}}_{r_0}) - {\bf{\Omega}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u},{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}})\right\|_S \le \frac{1}{N}\sum\limits_{i = 1}^N m_2({\bf W}_i)\left\|{{\boldsymbol{s}}}_{r_0}\right\| = o_P(1). $

    It follows from Markov's inequality that $ \mathit{\boldsymbol{\tau}}_1^0 = {o_{P\mid\mathcal{F}_N}}(1) $.

    Let

    $ \mathit{\boldsymbol{\tau}}_2^0 : = \frac{1}{{N{r_0}}}\sum\limits_{i = 1}^{{r_0}} {\frac{{{\widetilde{\bf{\Omega}}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u},{{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}})}}{{\widetilde \pi _i^{\text{opt}}}}} - \frac{1}{N}\sum\limits_{i = 1}^N {{\bf{\Omega}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u},{{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}})}, $

    then

    $ E_{\tilde{\boldsymbol \beta}_0}\left\{E\left[\frac{1}{Nr_0}\sum\limits_{i = 1}^{r_0} \frac{{\widetilde{\bf{\Omega}}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u},{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}})}{\widetilde \pi _i^{\text{opt}}} \,\middle\vert\, \mathcal{F}_N,\tilde{\boldsymbol \beta}_0\right]\right\} = \frac{1}{N}\sum\limits_{i = 1}^N {\bf{\Omega}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u},{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}). $

    By the proof of Lemma 3, it follows that

    $ E\left[ \left(\overset{\smile}{\bf{M}}_X^{\tilde{\boldsymbol \beta}_0, j_1 j_2} - {{\bf{M}}}_X^{j_1 j_2} \right)^2 \,\middle\vert\, \mathcal{F}_N \right] = {O_P}({r_0}^{-1}) = {o_P}(1), $

    we have $ \mathit{\boldsymbol{\tau}}_2^0 = {o_{P\mid\mathcal{F}_N}}(1) $. Therefore (A.12) holds. Similarly, (A.13) is also true.

    Next, we will prove Theorems 2.5 and 2.6.

    Proof of Theorem 2.5.

    $ E\left( \frac{1}{N} \mathit{\boldsymbol{Q}}_{{\tilde{\boldsymbol \beta}}_0}^{*}(\mathit{\boldsymbol{\beta}}) \,\middle\vert\, \mathcal{F}_N \right) = E_{\tilde{\boldsymbol \beta}_0} \left[ E\left( \frac{1}{N} \frac{1}{r} \sum\limits_{i = 1}^r \frac{{\tilde{\mathit{\boldsymbol{\eta}}}}_{i}^*(\mathit{\boldsymbol{\Sigma}}_u, \mathit{\boldsymbol{\beta}})}{\widetilde{\pi}_i^{\text{opt}}} \,\middle\vert\, \mathcal{F}_N, \tilde{\boldsymbol \beta}_0 \right) \right] = \frac{1}{N} \sum\limits_{i = 1}^N \mathit{\boldsymbol{\eta}}_{i}^*(\mathit{\boldsymbol{\Sigma}}_u, \mathit{\boldsymbol{\beta}}) = \frac{1}{N} \mathit{\boldsymbol{Q}}(\mathit{\boldsymbol{\beta}}). $

    By Assumption A6, we have

    $ {\text{Var}}\left( \frac{1}{N} {\mathit{\boldsymbol{Q}}}_{{\tilde{\mathit{\boldsymbol{\beta}}}}_0}^{*}({\mathit{\boldsymbol{\beta}}}) \,\middle\vert\, \mathcal{F}_N \right) = E_{\tilde{\mathit{\boldsymbol{\beta}}}_0} \left\{ \text {Var} \left( \frac{1}{N} \frac{1}{r} \sum\limits_{i = 1}^r \frac{{\tilde{\mathit{\boldsymbol{\eta}}}}_{i}^*(\mathit{\boldsymbol{\Sigma}}_u, \mathit{\boldsymbol{\beta}})}{\widetilde{\pi}_i^{\text{opt}}} \,\middle\vert\, \mathcal{F}_N, \tilde{\mathit{\boldsymbol{\beta}}}_0 \right) \right\} = \frac{1}{N^2 r} \sum\limits_{i = 1}^N \frac{\mathit{\boldsymbol{\eta}}_{i}^*(\mathit{\boldsymbol{\Sigma}}_u, \mathit{\boldsymbol{\beta}}) \mathit{\boldsymbol{\eta}}_{i}^{*{\rm T}}(\mathit{\boldsymbol{\Sigma}}_u, \mathit{\boldsymbol{\beta}})}{\pi_i^{\text{opt}}} = O_P(r^{-1}). $

    Hence, as $ r \to \infty $, $ {N}^{-1} \mathit{\boldsymbol{Q}}_{{\tilde{\mathit{\boldsymbol{\beta}}}}_0}^{*}(\mathit{\boldsymbol{\beta}}) - {N}^{-1} \mathit{\boldsymbol{Q}}(\mathit{\boldsymbol{\beta}})\mathop \to \limits^{} {{\mathbf{0}}} $ for all $ {\boldsymbol \beta} \in \Lambda $ in conditional probability given $ \mathcal{F}_N $.

    Since $ \check {\mathit{\boldsymbol{\beta}}} $ is the solution of $ \mathit{\boldsymbol{Q}}_{{{{\tilde{\boldsymbol \beta}}}_0}}^{\text{two-step}}(\mathit{\boldsymbol{\beta}}) = {\mathbf{0}} $, we have

    $ {\mathbf{0}} = \frac{1}{N}\mathit{\boldsymbol{Q}}_{\tilde{\boldsymbol \beta}_0}^{\text{two-step}}(\check{\mathit{\boldsymbol{\beta}}}) = \frac{r}{r + r_0}\frac{1}{N}\mathit{\boldsymbol{Q}}_{\tilde{\boldsymbol \beta}_0}^{*}(\check{\mathit{\boldsymbol{\beta}}}) + \frac{r_0}{r + r_0}\frac{1}{N}\mathit{\boldsymbol{Q}}_{\tilde{\boldsymbol \beta}_0}^{*0}(\check{\mathit{\boldsymbol{\beta}}}). $ (A.14)

    By Lemma 4, we have

    $ \frac{1}{N r_0} \sum\limits_{i = 1}^{r_0} \frac{{\widetilde{\bf{\Omega}}}_{i}^*(\mathit{\boldsymbol{\Sigma}}_u, {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}} + {{{\boldsymbol{s}}}_{r_0}})}{\widetilde{\pi}_i^{\text{opt}}} = \frac{1}{N} \sum\limits_{i = 1}^N {\bf{\Omega}}_{i}^*(\mathit{\boldsymbol{\Sigma}}_u, {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}) + o_{P|\mathcal{F}_N}(1) = {\bf{M}_X} + {o_{P|\mathcal{F}_N}(1)}, $

    and

    $ \frac{1}{{Nr}}\sum\limits_{i = 1}^r {\frac{{{\widetilde{\bf{\Omega}}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u},{{\hat{\mathit{\boldsymbol{\beta}}}}_{MLE}} + {{{\boldsymbol{s}}}_{r}})}}{{\widetilde \pi _i^{\text {opt}}}}} = {{\bf{M}}_X} + {o_{P|\mathcal{F}_N}(1)}. $

    By Taylor expansion, we have

    $ \frac{1}{N}\mathit{\boldsymbol{Q}}_{\tilde{\boldsymbol \beta}_0}^{*}(\check{\mathit{\boldsymbol{\beta}}}) = \frac{1}{N}\mathit{\boldsymbol{Q}}_{\tilde{\boldsymbol \beta}_0}^{*}({\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}) + \frac{1}{Nr}\sum\limits_{i = 1}^r \frac{{\widetilde{\bf{\Omega}}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u},{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}} + {{\boldsymbol{s}}}_r)}{\widetilde \pi _i^{\text{opt}}}(\check{\mathit{\boldsymbol{\beta}}} - {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}) = \frac{1}{N}\mathit{\boldsymbol{Q}}_{\tilde{\boldsymbol \beta}_0}^{*}({\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}) + {\bf{M}}_X(\check{\mathit{\boldsymbol{\beta}}} - {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}) + o_{P\mid\mathcal{F}_N}(1)(\check{\mathit{\boldsymbol{\beta}}} - {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}). $ (A.15)

    Similarly,

    $ \frac{1}{N}\mathit{\boldsymbol{Q}}_{\tilde{\boldsymbol \beta}_0}^{*0}(\check{\mathit{\boldsymbol{\beta}}}) = \frac{1}{N}\mathit{\boldsymbol{Q}}_{\tilde{\boldsymbol \beta}_0}^{*0}({\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}) + {\bf{M}}_X(\check{\mathit{\boldsymbol{\beta}}} - {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}) + o_{P\mid\mathcal{F}_N}(1)(\check{\mathit{\boldsymbol{\beta}}} - {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}). $ (A.16)

    Since $ {{{r_0}}}{{r}^{-1}} \to 0 $ and $ {N}^{-1} \mathit{\boldsymbol{Q}}_{{{\tilde{\mathit{\boldsymbol{\beta}}}}_0}}^{*0}({{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}}) = {O_{P|\mathcal{F}_N}}\left(r_0^{ - \frac{1}{2}}\right) $, we have

    $ \frac{r_0}{r + r_0} \frac{1}{N} \mathit{\boldsymbol{Q}}_{\tilde{\boldsymbol \beta}_0}^{*0}(\hat{\mathit{\boldsymbol{\beta}}}_{\text{MLE}}) = \frac{r_0}{r + r_0} {O_{P|\mathcal{F}_N}}(r_0^{-\frac{1}{2}}) = {o_{P|\mathcal{F}_N}}(r^{-\frac{1}{2}}). $

    Combining this with (A.14)–(A.16), we have

    $ \check{\mathit{\boldsymbol{\beta}}} - {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}} = O_{P\mid\mathcal{F}_N}(r^{-\frac{1}{2}}) + o_{P\mid\mathcal{F}_N}\left(\left\|\check{\mathit{\boldsymbol{\beta}}} - {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}\right\|\right), $ (A.17)

    which implies that $ {\check {\boldsymbol \beta}} - {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}} = {O_{P|\mathcal{F}_N}} \left(r^{ - \frac{1}{2}} \right) $.

    Proof of Theorem 2.6. By Lemma 3, $ \frac{1}{N}{\bf{V}}_{\text {C}}^{ - \frac{1}{2}}\mathit{\boldsymbol{Q}}_{{{\tilde{\mathit{\boldsymbol{\beta}}}}_0}}^{*}({{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}})\mathop \to \limits^d N({{\mathbf{0}}}, {\bf I}) $, we have

    $ \left\|{\bf{V}}_{\text{C}} - {\bf{V}}_{\text{C}}^{\text{opt}}\right\|_S = \left\|\frac{1}{N^2 r}\sum\limits_{i = 1}^N \frac{\mathit{\boldsymbol{\eta}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u},{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}})\mathit{\boldsymbol{\eta}}_i^{*{\rm T}}({{\mathit{\boldsymbol{\Sigma}}}_u},{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}})}{\widetilde \pi _i^{\text{opt}}} - \frac{1}{N^2 r}\sum\limits_{i = 1}^N \frac{\mathit{\boldsymbol{\eta}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u},{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}})\mathit{\boldsymbol{\eta}}_i^{*{\rm T}}({{\mathit{\boldsymbol{\Sigma}}}_u},{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}})}{\pi_i^{\text{opt}}}\right\|_S \le \frac{1}{N^2 r}\sum\limits_{i = 1}^N \left|\frac{1}{\widetilde \pi _i^{\text{opt}}} - \frac{1}{\pi_i^{\text{opt}}}\right|\left\|\mathit{\boldsymbol{\eta}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u},{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}})\right\|^2 = \frac{1}{r}\sum\limits_{i = 1}^N \left|1 - \frac{\pi_i^{\text{opt}}}{\widetilde \pi _i^{\text{opt}}}\right|\frac{\left\|\mathit{\boldsymbol{\eta}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u},{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}})\right\|^2}{N^2\pi_i^{\text{opt}}}. $

    Taking $ \pi _i^{\text{mVc}} $ as an example, by Assumption A4, the above bound can be further written as

    $ \frac{1}{r}\sum\limits_{i = 1}^N \left|1 - \frac{\pi_i^{\text{mVc}}}{\widetilde \pi _i^{\text{mVc}}}\right|\frac{\left\|\mathit{\boldsymbol{\eta}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u},{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}})\right\|^2 \sum\limits_{j = 1}^N g_j^{\text{mVc}}}{N^2 g_i^{\text{mVc}}} = \frac{1}{r}\sum\limits_{i = 1}^N \left|1 - \frac{\pi_i^{\text{mVc}}}{\widetilde \pi _i^{\text{mVc}}}\right|\frac{\left\|\mathit{\boldsymbol{\eta}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u},{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}})\right\|^2 \sum\limits_{j = 1}^N \left\|\mathit{\boldsymbol{\eta}}_j^*({{\mathit{\boldsymbol{\Sigma}}}_u},{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}})\right\|}{N^2 \left\|\mathit{\boldsymbol{\eta}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u},{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}})\right\|} = \frac{1}{r}\sum\limits_{i = 1}^N \left|1 - \frac{\pi_i^{\text{mVc}}}{\widetilde \pi _i^{\text{mVc}}}\right|\frac{\left\|\mathit{\boldsymbol{\eta}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u},{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}})\right\|}{N}\frac{\sum\limits_{j = 1}^N \left\|\mathit{\boldsymbol{\eta}}_j^*({{\mathit{\boldsymbol{\Sigma}}}_u},{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}})\right\|}{N} \le \frac{1}{r}\left(\frac{1}{N}\sum\limits_{i = 1}^N \left|1 - \frac{\pi_i^{\text{mVc}}}{\widetilde \pi _i^{\text{mVc}}}\right|^2\right)^{\frac{1}{2}}\left(\frac{\sum\limits_{i = 1}^N \left\|\mathit{\boldsymbol{\eta}}_i^*({{\mathit{\boldsymbol{\Sigma}}}_u},{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}})\right\|^2}{N}\right)^{\frac{1}{2}}\left(\frac{\sum\limits_{j = 1}^N \left\|\mathit{\boldsymbol{\eta}}_j^*({{\mathit{\boldsymbol{\Sigma}}}_u},{\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}})\right\|}{N}\right) = o_{P\mid\mathcal{F}_N}(r^{-1}). $

    Therefore $ {\left\| {{{\bf{V}}_{\text {C}}} - {\bf{V}}_{\text {C}}^{opt}} \right\|_S} = {o_{P\mid\mathcal{F}_N}}\left({{r^{ - 1}}} \right) $, and

    $ {\bf{V}}_{\text{opt}}^{-\frac{1}{2}}(\check{\mathit{\boldsymbol{\beta}}} - {\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}) = -{\bf{V}}_{\text{opt}}^{-\frac{1}{2}}{\bf{M}}_X^{-1}\frac{1}{N}\mathit{\boldsymbol{Q}}_{\tilde{\boldsymbol \beta}_0}^{\text{two-step}}({\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}) + o_{P\mid\mathcal{F}_N}(1) = -{\bf{V}}_{\text{opt}}^{-\frac{1}{2}}{\bf{M}}_X^{-1}\left[\frac{r}{r + r_0}\frac{1}{N}\mathit{\boldsymbol{Q}}_{\tilde{\boldsymbol \beta}_0}^{*}({\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}) + \frac{r_0}{r + r_0}\frac{1}{N}\mathit{\boldsymbol{Q}}_{\tilde{\boldsymbol \beta}_0}^{*0}({\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}})\right] + o_{P\mid\mathcal{F}_N}(1) = -{\bf{V}}_{\text{opt}}^{-\frac{1}{2}}{\bf{M}}_X^{-1}\left({\bf{V}}_{\text{C}}^{\text{opt}}\right)^{\frac{1}{2}}\left({\bf{V}}_{\text{C}}^{\text{opt}}\right)^{-\frac{1}{2}}\frac{1}{N}\mathit{\boldsymbol{Q}}_{\tilde{\boldsymbol \beta}_0}^{*}({\hat{\mathit{\boldsymbol{\beta}}}}_{\text{MLE}}) + o_{P\mid\mathcal{F}_N}(1), $

    which implies that

    $ {\bf{V}}_{\text {opt}}^{ - \frac{1}{2}}{\bf{M}}_X^{ - 1}{\left( {{\bf{V}}_{\text {C}}^{\text {opt}}} \right)^{\frac{1}{2}}}{\left\{ {{\bf{V}}_{\text {opt}}^{ - \frac{1}{2}}{\bf{M}}_X^{ - 1}{{\left( {{\bf{V}}_{\text {C}}^{\text {opt}}} \right)}^{\frac{1}{2}}}} \right\}^{\rm T}} = {\bf{V}}_{\text {opt}}^{ - \frac{1}{2}}{\bf{M}}_X^{ - 1}{\left( {{\bf{V}}_{\text {C}}^{\text {opt}}} \right)^{\frac{1}{2}}}{\left( {{\bf{V}}_{\text {C}}^{{\text {opt}}}} \right)^{\frac{1}{2}}}{\bf{M}}_X^{ - 1}{\bf{V}}_{\text {opt}}^{ - \frac{1}{2}} = {\bf{I}}. $

    Therefore

    $ {\bf{V}}_{\text {opt}}^{ - \frac{1}{2}}({\check{\mathit{\boldsymbol{\beta}}}} - {{\mathit{\boldsymbol{\hat {\beta}}}}_{\text{MLE}}})\mathop \to \limits^d N_p({{\mathbf{0}}},{\bf I}). $
