Research article

Estimation of partially linear single-index spatial autoregressive model using B-splines

  • Received: 07 September 2024 Revised: 10 December 2024 Accepted: 18 December 2024 Published: 26 December 2024
  • Possible dependence across spatial units is a relevant issue in many areas in practice. In this paper, we consider the partially linear single-index spatial autoregressive model to analyze the dependence of the spatial units and suggest an estimation method. An algorithm is proposed to estimate the link function of the single index, the single-index parameters, the parameters of the linear component, and the spatial parameter of the model. The nonparametric function is estimated based on B-spline approximation. The Nelder-Mead iteration algorithm is adopted to compute the parametric and nonparametric parts simultaneously in the optimization. The asymptotic properties of the parameter and function estimates are established. Monte Carlo simulation studies are conducted to investigate the performance of the proposed estimation methodology and calculation procedure. Furthermore, we use the proposed method to analyze air quality data and rural household income data in China.

    Citation: Lei Liu, Jun Dai. Estimation of partially linear single-index spatial autoregressive model using B-splines[J]. Electronic Research Archive, 2024, 32(12): 6822-6846. doi: 10.3934/era.2024319

    Related Papers:

    [1] Mingtao Cui, Wang Li, Guang Li, Xiaobo Wang . The asymptotic concentration approach combined with isogeometric analysis for topology optimization of two-dimensional linear elasticity structures. Electronic Research Archive, 2023, 31(7): 3848-3878. doi: 10.3934/era.2023196
    [2] Akeel A. AL-saedi, Jalil Rashidinia . Application of the B-spline Galerkin approach for approximating the time-fractional Burger's equation. Electronic Research Archive, 2023, 31(7): 4248-4265. doi: 10.3934/era.2023216
    [3] Dewang Chen, Xiaoyu Zheng, Ciyang Chen, Wendi Zhao . Remaining useful life prediction of the lithium-ion battery based on CNN-LSTM fusion model and grey relational analysis. Electronic Research Archive, 2023, 31(2): 633-655. doi: 10.3934/era.2023031
    [4] Haijun Wang, Wenli Zheng, Yaowei Wang, Tengfei Yang, Kaibing Zhang, Youlin Shang . Single hyperspectral image super-resolution using a progressive upsampling deep prior network. Electronic Research Archive, 2024, 32(7): 4517-4542. doi: 10.3934/era.2024205
    [5] Zhenyue Wang, Guowu Yuan, Hao Zhou, Yi Ma, Yutang Ma, Dong Chen . Improved YOLOv7 model for insulator defect detection. Electronic Research Archive, 2024, 32(4): 2880-2896. doi: 10.3934/era.2024131
    [6] Yongwei Yang, Yang Yu, Chunyun Xu, Chengye Zou . Passivity analysis of discrete-time genetic regulatory networks with reaction-diffusion coupling and delay-dependent stability criteria. Electronic Research Archive, 2025, 33(5): 3111-3134. doi: 10.3934/era.2025136
    [7] Victor Ginting . An adjoint-based a posteriori analysis of numerical approximation of Richards equation. Electronic Research Archive, 2021, 29(5): 3405-3427. doi: 10.3934/era.2021045
    [8] Wenjun Liu, Yukun Xiao, Xiaoqing Yue . Classification of finite irreducible conformal modules over Lie conformal algebra $ \mathcal{W}(a, b, r) $. Electronic Research Archive, 2021, 29(3): 2445-2456. doi: 10.3934/era.2020123
    [9] Wenjie Li, Zimei Huang . Do different stock indices volatility respond differently to Central bank digital currency signals?. Electronic Research Archive, 2023, 31(9): 5573-5588. doi: 10.3934/era.2023283
    [10] Noelia Bazarra, José R. Fernández, Ramón Quintanilla . Numerical analysis of a problem in micropolar thermoviscoelasticity. Electronic Research Archive, 2022, 30(2): 683-700. doi: 10.3934/era.2022036
  • Possible dependence across spatial units is a relevant issue in many areas in practice. In this paper, we consider the partially linear single-index spatial autoregressive model to analyze the dependence of the spatial units and suggest an estimation method. An algorithm procedure is proposed to estimate the link function for the single index and the parameters in the single index, as well as the parameters in the linear component and the spatial parameter of the model. The nonparametric function is estimated based on B-spline approximation. The Nelder-Mead iteration algorithm is adopted to calculate the parametric and nonparametric parts simultaneously in the optimization. The asymptotic properties of parameter and function estimates are established. Monte Carlo simulation studies are conducted to investigate the performance of the proposed estimation methodology and calculation procedure. Furthermore, we use the proposed method to analyze air quality data and rural household income data in China.



    In empirical economics and statistics, possible dependence across spatial units is a relevant issue in urban, real estate, regional, public, agricultural, and environmental economics, and in industrial organization [1]. Many studies have dealt with this problem by imposing structure on the model to capture spatial dependence or spatial autocorrelation in cross-sectional and/or panel data. In recent years, spatial autoregressive (SAR) models have attracted great interest because of their usefulness in practice. In particular, SAR models build the autocorrelation structure from the spatial pattern of the data. They allow the analysis of spatial dependence and interactions among observations, which makes them particularly useful in fields such as geography, environmental science, and urban planning, where spatial relationships play a crucial role.

    Lee [2] showed that ordinary least squares (OLS) estimators can be inconsistent for an SAR model because the spatially lagged dependent variable is generally correlated with the disturbances. To obtain a consistent estimator, Lee [1] proposed the quasi-maximum likelihood estimator (QMLE) for the SAR model, investigated its asymptotic properties, and showed that its rate of convergence may depend on general features of the spatial weight matrix of the model. However, quasi-maximum likelihood procedures can be computationally demanding for large samples, because no analytical solution is available and the estimates must be obtained numerically. Kelejian and Prucha [3] proposed a generalized spatial two-stage least squares procedure for the SAR model and established the consistency and asymptotic normality of the resulting estimator, which, however, is not asymptotically efficient. Lee [4] proposed best spatial two-stage least squares estimators for the SAR model to improve efficiency. Su [5] extended the SAR model to a semiparametric setting, proposed a GMM estimation method, and showed that the resulting estimator is consistent and asymptotically normal. Xu and Lee [6] proposed the maximum likelihood estimator (MLE) for the SAR Tobit model and studied its asymptotic properties.

    Recently, there has been a growing literature on semiparametric spatial autoregressive models. Exploiting the flexibility of the partially linear model, Su and Jin [7] developed asymptotic theory for the QMLE of partially linear SAR models based on the profile quasi-maximum likelihood approach; in their work, the rate of convergence of the spatial parameter estimator depends on general features of the spatial weight matrix. Du et al. [8] proposed the partially linear additive SAR model. Because this model imposes no interaction among the nonparametric components, the method of Du et al. [8] cannot capture interaction effects of the covariates on the response in spatial data. Moreover, the nonparametric components suffer from the curse of dimensionality and can only accommodate low-dimensional covariates; see Wang [9]. The single-index model is an important tool in multivariate nonparametric regression. By reducing multivariate predictors to a univariate index, single-index models avoid the so-called "curse of dimensionality" while still capturing important features of high-dimensional data. Cheng and Chen [10] introduced the partially linear single-index spatial autoregressive model (PLSISARM), using local linear approximation to estimate the unknown link function together with MLE. In contrast to the partially linear SAR model and the partially linear additive SAR model, PLSISARM allows for covariate interactions within its nonparametric component. This paper proposes a new estimation procedure for PLSISARM. Specifically, we use B-splines to approximate the unknown link function, which yields a smooth estimate, and adopt QMLE as the estimation method. We also introduce a feasible Nelder-Mead iteration algorithm tailored to this procedure, and we establish the asymptotic properties of the QMLE. 
Finally, we apply PLSISARM to two datasets from meteorology and economics, thereby extending the practical utility of the model. Compared with the local linear approximation of Cheng and Chen [10], B-spline approximation is computationally more efficient and yields smoother estimates, which eases interpretation in applications. In addition, this paper uses QMLE rather than MLE, which leads to different asymptotic distributions. Notably, when the error term is not normally distributed, the QMLE framework still provides a valid estimate of the asymptotic variance, whereas the MLE-based one does not.

    This paper is organized as follows. In Section 2, we describe the technical setting and propose the estimation procedure for PLSISARM via QMLE. In Section 3, we establish the asymptotic properties for the proposed procedure. In Section 4, we present the simulation studies to assess the finite sample performance of the proposed procedure. In Section 5, we analyze two real data sets to illustrate the practical usefulness of the proposed model. In Section 6, we conclude the paper. The appendix provides the proofs of the main results.

    We consider the partially linear single-index spatial autoregressive model (PLSISARM)

    $$Y_N = \lambda W_N Y_N + X_N\beta + g(Z_N\alpha) + V_N, \qquad (2.1)$$

    where $N$ is the total number of spatial units; $Y_N$ is the $N$-dimensional vector of observations of the dependent variable; $W_N$ is a specified $N \times N$ constant spatial weight matrix with zero diagonal elements, whose weights may be based on physical distance, social networks, or "economic" distance; $X_N = (x_1, x_2, \ldots, x_N)^T$ and $Z_N = (z_1, z_2, \ldots, z_N)^T$ are the $N \times p$ and $N \times d$ matrices of regressors, respectively; $g$ is an unknown differentiable link function, so that the covariates $z$ affect the response only through the single index $z^T\alpha$; $\alpha \in \mathbb{R}^d$ with $\|\alpha\| = 1$ and, for identifiability, the first nonzero element of $\alpha$ is positive; and $V_N = (\epsilon_1, \epsilon_2, \ldots, \epsilon_N)^T$ is an $N$-dimensional vector of i.i.d. disturbances with zero mean and finite variance $\sigma^2$.

    Remark 1: PLSISARM is flexible enough to cover a variety of situations. When $\lambda = 0$, $\beta = 0$, or $d = 1$, the proposed model reduces to the partially linear single-index model, the single-index spatial autoregressive model, and the partially linear spatial autoregressive model, respectively.

    To obtain the estimators of the parameters $\theta = (\sigma^2, \lambda, \alpha^T, \beta^T)^T$ and the function $g(\cdot)$, we follow Lee [1] and Su and Jin [7] and maximize the log quasi-likelihood function

    $$L_N(\theta, g(\cdot)) = -\frac{N}{2}\log(2\pi) - \frac{N}{2}\log\sigma^2 + \log|M_N(\lambda)| - \frac{1}{2\sigma^2} V_N^T(\theta) V_N(\theta),$$

    where $M_N(\lambda) = I_N - \lambda W_N$ and $V_N(\theta) = Y_N - \lambda W_N Y_N - X_N\beta - g(Z_N\alpha)$. Because the infinite-dimensional parameter $g(\cdot)$ in the objective function $L_N$ is unknown, $L_N$ cannot be maximized directly to obtain the estimator of $\theta$. The B-spline method has been widely used for estimation in nonparametric and semiparametric models [11], and B-splines have many precedents in semiparametric spatial autoregressive models, for example Zhang et al. [12], Du et al. [8], and Tian et al. [13]. In this paper, we use B-splines to approximate the nonparametric function $g(t)$ because of their computational advantages. Let $B(t) = (B_1(t), \ldots, B_{K_N+l+1}(t))^T$ be a set of B-spline basis functions of order $l+1$ with knots $a = t_0 < t_1 < \cdots < t_{K_N} < t_{K_N+1} = b$ for $a < b \in \mathbb{R}$. Approximating $g(t)$ by a linear combination of the normalized B-spline basis functions gives

    $$g(t) \approx \sum_{j=1}^{K_N+l+1} B_j(t)\gamma_j = B(t)^T\gamma,$$

    where $\gamma = (\gamma_1, \ldots, \gamma_{K_N+l+1})^T$ is the unknown spline coefficient vector. Based on this B-spline approximation, model (2.1) can be rewritten as

    $$Y_N \approx \lambda W_N Y_N + X_N\beta + \Pi(Z_N\alpha)\gamma + V_N, \qquad (2.2)$$

    where $\Pi(Z_N\alpha) = (B(z_1^T\alpha), \ldots, B(z_N^T\alpha))^T$.
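A minimal Python sketch of the design matrix $\Pi(Z_N\alpha)$ is given below. It assumes clamped cubic B-splines ($l = 3$) with boundary knots at the extremes of the index values $z_i^T\alpha$ and interior knots at equally spaced sample quantiles, as suggested in the knot-selection discussion of this section; the function name `bspline_design` is hypothetical. Row $i$ holds $B(z_i^T\alpha)^T$, so the matrix has $K_N + l + 1$ columns.

```python
import numpy as np
from scipy.interpolate import BSpline


def bspline_design(index, n_interior=4, degree=3):
    """Return the N x (K_N + l + 1) matrix whose i-th row is B(index_i)^T.

    index      : single-index values z_i^T alpha (length N)
    n_interior : number of interior knots K_N
    degree     : polynomial degree l (spline order l + 1); 3 gives cubic splines
    """
    index = np.asarray(index, dtype=float)
    a, b = index.min(), index.max()
    # Interior knots at equally spaced sample quantiles of the index (assumption).
    probs = np.linspace(0.0, 1.0, n_interior + 2)[1:-1]
    interior = np.quantile(index, probs)
    # Clamped knot vector: boundary knots repeated degree + 1 times.
    knots = np.concatenate([np.full(degree + 1, a), interior, np.full(degree + 1, b)])
    n_basis = len(knots) - degree - 1          # equals n_interior + degree + 1
    # Identity coefficients evaluate every basis function B_j at once.
    return BSpline(knots, np.eye(n_basis), degree)(index)


rng = np.random.default_rng(0)
Z = rng.uniform(size=(200, 3))
alpha = np.ones(3) / np.sqrt(3.0)              # unit norm, first element positive
Pi = bspline_design(Z @ alpha, n_interior=4)
print(Pi.shape)                                 # (200, 8)
```

Because clamped B-splines form a partition of unity on $[a, b]$, each row of `Pi` sums to one; an intercept column in $X_N$ would therefore be collinear with $\Pi(Z_N\alpha)$.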

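    Given $\alpha$, we apply the QMLE method [1] to model (2.2) to obtain the estimator of $\delta = (\sigma^2, \lambda, \beta^T, \gamma^T)^T$, denoted by $\hat\delta(\alpha) = (\hat\sigma^2(\alpha), \hat\lambda(\alpha), \hat\beta(\alpha)^T, \hat\gamma(\alpha)^T)^T$. Next, we consider the estimation of $\alpha$. Substituting $\hat\delta(\alpha)$ back into model (2.2) gives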

    $$Y_N \approx \hat\lambda(\alpha) W_N Y_N + X_N\hat\beta(\alpha) + \Pi(Z_N\alpha)\hat\gamma(\alpha) + V_N. \qquad (2.3)$$

    The estimator of $\alpha$ is obtained by minimizing the objective function

    $$Q_N(\alpha) = \|Y_N - \hat\lambda(\alpha) W_N Y_N - X_N\hat\beta(\alpha) - \Pi(Z_N\alpha)\hat\gamma(\alpha)\|^2.$$

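    Once $\hat\alpha$ is obtained, we compute $\hat\delta(\hat\alpha) = (\hat\sigma^2(\hat\alpha), \hat\lambda(\hat\alpha), \hat\beta(\hat\alpha)^T, \hat\gamma(\hat\alpha)^T)^T$, and the estimator of $g(\cdot)$ is $\hat g(z^T\hat\alpha) = B(z^T\hat\alpha)^T\hat\gamma(\hat\alpha)$.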

    To elaborate the ideas of the estimation methodology, we detail the algorithm as follows.

    Algorithm for stage one: Calculate the initial values of the parameters $\theta = (\sigma^2, \lambda, \beta^T, \alpha^T)^T$ and $\gamma$.

    Step 1. Calculate the initial value of α for iteration.

    Consider the following model:

    $$Y_N = \lambda W_N Y_N + X_N\beta + Z_N\alpha + V_N. \qquad (2.4)$$

    Apply the QMLE of Lee [1] to model (2.4) to estimate its parameters, and denote the resulting estimate of $\alpha$ by $\hat\alpha^{(0)}$. Normalize $\hat\alpha^{(0)}$ so that $\|\hat\alpha^{(0)}\| = 1$ and its first element is positive.
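Step 1 can be sketched as follows. The code concentrates the coefficients and $\sigma^2$ out of the Gaussian quasi-likelihood of the linear SAR model (2.4), maximizes the result over $\lambda$ with a bounded scalar search, and then applies the normalization. The search interval $(-0.99, 0.99)$ assumes a row-normalized $W_N$, and the names `normalize_index` and `initial_alpha` are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize_scalar


def normalize_index(alpha, tol=1e-12):
    """Rescale alpha to unit Euclidean norm with its first nonzero entry positive."""
    alpha = np.asarray(alpha, dtype=float)
    alpha = alpha / np.linalg.norm(alpha)
    first = np.flatnonzero(np.abs(alpha) > tol)[0]
    return -alpha if alpha[first] < 0 else alpha


def initial_alpha(Y, W, X, Z, bounds=(-0.99, 0.99)):
    """QMLE of the linear SAR model (2.4); returns (lambda_hat, normalized alpha_hat^(0)).

    The bounds assume a row-normalized W so that I - lambda W stays invertible.
    """
    N = len(Y)
    D = np.column_stack([X, Z])

    def fit(lam):
        MY = Y - lam * (W @ Y)                          # M(lambda) Y_N
        coef, *_ = np.linalg.lstsq(D, MY, rcond=None)   # profiled (beta, alpha)
        sigma2 = np.mean((MY - D @ coef) ** 2)          # profiled sigma^2
        return coef, sigma2

    def neg_conc_loglik(lam):
        _, sigma2 = fit(lam)
        _, logabsdet = np.linalg.slogdet(np.eye(N) - lam * W)
        return -(logabsdet - 0.5 * N * np.log(sigma2))  # additive constants dropped

    lam_hat = minimize_scalar(neg_conc_loglik, bounds=bounds, method="bounded").x
    coef, _ = fit(lam_hat)
    return lam_hat, normalize_index(coef[X.shape[1]:])
```

For example, `normalize_index([-3.0, 4.0])` returns `[0.6, -0.8]`: the vector is scaled to unit length and its sign flipped so that the first entry is positive.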

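    Step 2. Substitute $\hat\alpha^{(0)}$ into model (2.2) and calculate the QMLE for the parameters in the following model:

    $$Y_N = \lambda W_N Y_N + X_N\beta + \Pi(Z_N\hat\alpha^{(0)})\gamma + V_N.$$

    Let the resulting estimators $\hat\lambda^{(0)}$, $\hat\beta^{(0)}$, and $\hat\gamma^{(0)}$ be the initial values of $\lambda$, $\beta$, and $\gamma$, respectively.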

    Algorithm for stage two: The iteration procedure.

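    Step 3. Update the estimate of $\alpha$.

    Substitute the estimators $\hat\lambda^{(k)}$, $\hat\beta^{(k)}$, and $\hat\gamma^{(k)}$ into model (2.2) to obtain the model

    $$Y_N = \hat\lambda^{(k)} W_N Y_N + X_N\hat\beta^{(k)} + \Pi(Z_N\alpha)\hat\gamma^{(k)} + V_N.$$

    Then minimize the following objective function to obtain the updated estimate $\hat\alpha^{(k+1)}$:

    $$Q(\alpha) = \|Y_N - \hat\lambda^{(k)} W_N Y_N - X_N\hat\beta^{(k)} - \Pi(Z_N\alpha)\hat\gamma^{(k)}\|^2.$$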

    Normalize $\hat\alpha^{(k+1)}$ so that $\|\hat\alpha^{(k+1)}\| = 1$ and its first element is positive. The minimization of $Q(\alpha)$ is carried out with the Nelder-Mead method of the "optim" function in R.
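Step 3 can be sketched in Python, with `scipy.optimize.minimize(method="Nelder-Mead")` standing in for R's `optim`. Two choices go beyond the text and are assumptions: the B-spline knots are held fixed on an interval that covers every possible index value, so that $\hat\gamma^{(k)}$ keeps its meaning while $\alpha$ moves, and $\alpha$ is parametrized as $(1, \phi^T)^T / \|(1, \phi^T)\|$, which builds in the normalization but requires the first element of $\alpha$ to be nonzero. All function names and simulation settings are hypothetical.

```python
import numpy as np
from scipy.interpolate import BSpline
from scipy.optimize import minimize


def make_basis(a, b, n_interior=4, degree=3):
    """Cubic B-spline basis with fixed, equally spaced interior knots on [a, b]."""
    interior = np.linspace(a, b, n_interior + 2)[1:-1]
    knots = np.concatenate([np.full(degree + 1, a), interior, np.full(degree + 1, b)])
    n_basis = len(knots) - degree - 1
    return lambda t: BSpline(knots, np.eye(n_basis), degree)(np.asarray(t, dtype=float))


def alpha_from_free(phi):
    """alpha = (1, phi) / ||(1, phi)||: unit norm with a positive first element."""
    v = np.concatenate([[1.0], np.atleast_1d(phi)])
    return v / np.linalg.norm(v)


def update_alpha(Y, W, X, Z, lam, beta, gamma, basis, alpha_start):
    """Minimize Q(alpha) with lambda^(k), beta^(k), gamma^(k) held fixed."""
    fixed_part = Y - lam * (W @ Y) - X @ beta          # does not depend on alpha

    def Q(phi):
        r = fixed_part - basis(Z @ alpha_from_free(phi)) @ gamma
        return r @ r

    phi_start = alpha_start[1:] / alpha_start[0]       # inverse map; needs alpha_start[0] > 0
    res = minimize(Q, phi_start, method="Nelder-Mead",
                   options={"xatol": 1e-8, "fatol": 1e-10, "maxiter": 5000})
    return alpha_from_free(res.x)


# Usage on data generated exactly from the spline model (2.2), hypothetical settings.
rng = np.random.default_rng(1)
R, m = 10, 30
N = R * m
W = np.kron(np.eye(R), (np.ones((m, m)) - np.eye(m)) / (m - 1))
X, Z = rng.normal(size=(N, 2)), rng.uniform(size=(N, 3))
basis = make_basis(-2.0, 2.0)                           # covers |z_i^T alpha| <= sqrt(3)
grid = np.linspace(-2.0, 2.0, 200)
gamma0, *_ = np.linalg.lstsq(basis(grid), np.exp(grid), rcond=None)
lam0, beta0 = 0.3, np.array([1.0, -1.0])
alpha0 = alpha_from_free([0.5, -0.5])
mean = X @ beta0 + basis(Z @ alpha0) @ gamma0
Y = np.linalg.solve(np.eye(N) - lam0 * W, mean + 0.05 * rng.normal(size=N))
alpha_hat = update_alpha(Y, W, X, Z, lam0, beta0, gamma0, basis, alpha_from_free([0.3, -0.3]))
```

Optimizing over $\phi$ instead of the raw $\alpha$ avoids the flat direction that $Q$ has along rays $c\alpha$ once the index is normalized inside the objective.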

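    Step 4. Update the estimates of $\sigma^2$, $\lambda$, $\beta$, and $\gamma$.

    Substitute $\hat\alpha^{(k+1)}$ into model (2.2) and calculate the QMLE for the parameters in the following model to obtain the updated estimators $\hat\sigma^{2(k+1)}$, $\hat\lambda^{(k+1)}$, $\hat\beta^{(k+1)}$, and $\hat\gamma^{(k+1)}$:

    $$Y_N = \lambda W_N Y_N + X_N\beta + \Pi(Z_N\hat\alpha^{(k+1)})\gamma + V_N.$$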

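    Repeat Steps 3 and 4 until convergence, that is, until

    $$|\hat\lambda^{(k+1)} - \hat\lambda^{(k)}|^2 + \|\hat\alpha^{(k+1)} - \hat\alpha^{(k)}\|^2 + \|\hat\beta^{(k+1)} - \hat\beta^{(k)}\|^2 + \|\hat\gamma^{(k+1)} - \hat\gamma^{(k)}\|^2 < C,$$

    where $C$ is a predetermined constant, set to $C = 10^{-4}$ in the simulation studies. The resulting estimator of $g(\cdot)$ is $B(z^T\hat\alpha^{(k+1)})^T\hat\gamma^{(k+1)}$.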
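The outer iteration and its stopping rule can be sketched generically. Here `step_alpha` and `step_rest` are hypothetical callables standing for Step 3 and Step 4 (for instance, a Nelder-Mead update of $\alpha$ and a QMLE fit for fixed $\alpha$), and the dictionary layout of the parameters is an assumption. The toy usage only exercises the control flow.

```python
import numpy as np


def squared_change(old, new, keys=("lam", "alpha", "beta", "gamma")):
    """Left-hand side of the stopping rule: summed squared changes of the iterates."""
    return sum(float(np.sum((np.asarray(new[k]) - np.asarray(old[k])) ** 2)) for k in keys)


def alternate(params, step_alpha, step_rest, C=1e-4, max_iter=100):
    """Repeat Step 3 (update alpha) and Step 4 (update sigma2, lambda, beta, gamma)
    until the squared change is below C; returns (final params, iterations used)."""
    for it in range(1, max_iter + 1):
        alpha_new = step_alpha(params)          # Step 3, uses lambda^(k), beta^(k), gamma^(k)
        new = dict(step_rest(alpha_new))        # Step 4, refits the rest given alpha^(k+1)
        new["alpha"] = alpha_new
        if squared_change(params, new) < C:
            return new, it
        params = new
    return params, max_iter


# Toy usage: alpha halves at every pass while the other parameters stay fixed.
fixed = {"sigma2": 1.0, "lam": 0.0, "beta": np.zeros(1), "gamma": np.zeros(1)}
start = dict(fixed, alpha=np.array([1.0]))
result, n_iter = alternate(start, lambda p: p["alpha"] / 2, lambda a: fixed)
print(n_iter, result["alpha"])   # 7 [0.0078125]
```

In the toy run the squared change is $(1/2^{k})^2$, which first drops below $10^{-4}$ at the seventh pass.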

    Selection of knots: Selecting the knots, that is, both the number and the positions of the knots used in the B-spline estimation, is important for the proposed method. We use the BIC criterion to select the number of knots. Given the number of knots, the knots can be placed at equally spaced points or at equally spaced sample quantiles of the predictor variable. Higher-order splines are not necessarily better: to avoid overfitting, low-order splines such as quadratic or cubic splines are typically chosen. Since approximation accuracy must also be considered, cubic splines are often selected in practice, which is also a common choice in the literature [8,13]. In this paper, we use cubic splines with knots placed at equally spaced sample quantiles of the predictor variable.
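The knot-number search can be sketched as below. The BIC formula is not spelled out in this section, so the sketch assumes the common form $\mathrm{BIC}(K_N) = N\log(\mathrm{RSS}/N) + (K_N + l + 1)\log N$ for a spline regression of a response $r$ on a fixed index $t$; in the full procedure $r$ might be, for example, $M(\hat\lambda)Y_N - X_N\hat\beta$ at the current estimates, which is also an assumption. The candidate range and function names are hypothetical.

```python
import numpy as np
from scipy.interpolate import BSpline


def quantile_basis(t, n_interior, degree=3):
    """Clamped B-spline basis with interior knots at equally spaced sample quantiles of t."""
    t = np.asarray(t, dtype=float)
    interior = np.quantile(t, np.linspace(0.0, 1.0, n_interior + 2)[1:-1])
    knots = np.concatenate([np.full(degree + 1, t.min()), interior, np.full(degree + 1, t.max())])
    n_basis = len(knots) - degree - 1
    return BSpline(knots, np.eye(n_basis), degree)(t)


def select_knots_bic(t, r, candidates=range(1, 11), degree=3):
    """Pick the number of interior knots K_N that minimizes the assumed BIC of r ~ g(t)."""
    N = len(r)
    bic = {}
    for K in candidates:
        Pi = quantile_basis(t, K, degree)
        coef, *_ = np.linalg.lstsq(Pi, r, rcond=None)
        rss = float(np.sum((r - Pi @ coef) ** 2))
        bic[K] = N * np.log(rss / N) + Pi.shape[1] * np.log(N)   # Pi.shape[1] = K + degree + 1
    return min(bic, key=bic.get), bic


# Usage with a wiggly hypothetical link function.
rng = np.random.default_rng(2)
t = rng.uniform(size=200)
r = np.sin(20 * t) + 0.1 * rng.normal(size=200)
K_best, bic = select_knots_bic(t, r)
```

With only one interior knot a cubic spline cannot follow about three periods of $\sin(20t)$, so the search settles on a larger $K_N$; for a smooth, nearly cubic link it would favor few knots.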

    For the purpose of statistical inference, the large-sample properties of the estimators need to be established. Let $M(\lambda) = I - \lambda W_N$, and for any matrix $A$ of full column rank, denote by $P_A = I - A(A^TA)^{-1}A^T$ the projection onto the orthogonal complement of the column space of $A$. The log quasi-likelihood function of model (2.2) is

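    $$L_N(\theta, \gamma) = -\frac{N}{2}\log(2\pi) - \frac{N}{2}\log\sigma^2 + \log|M(\lambda)| - \frac{1}{2\sigma^2} V_N^T(\theta, \gamma) V_N(\theta, \gamma), \qquad (3.1)$$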

    where $V_N(\theta, \gamma) = M(\lambda)Y_N - X_N\beta - \Pi(Z_N\alpha)\gamma$. We then concentrate out $\gamma$ and, when this causes no ambiguity, write $\Pi(Z_N\alpha)$ simply as $\Pi$. The concentrated log likelihood function of $\theta$ is

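    $$\tilde L_N(\theta) = -\frac{N}{2}\log(2\pi) - \frac{N}{2}\log\sigma^2 + \log|M(\lambda)| - \frac{1}{2\sigma^2}(M(\lambda)Y_N - X_N\beta)^T P_\Pi (M(\lambda)Y_N - X_N\beta). \qquad (3.2)$$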

    Moreover, let $\eta = (\lambda, \alpha^T)^T$. Given $\eta$, the maximizers of (3.2) with respect to $\beta$ and $\sigma^2$ are

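    $$\hat\beta(\eta) = (X_N^T P_\Pi X_N)^{-1} X_N^T P_\Pi M(\lambda) Y_N, \qquad (3.3)$$

    and

    $$\hat\sigma^2(\eta) = \frac{1}{N}[M(\lambda)Y_N]^T P_\Pi^T P_{P_\Pi X_N} P_\Pi M(\lambda) Y_N. \qquad (3.4)$$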
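Formulas (3.3) and (3.4) can be checked numerically. By the Frisch-Waugh-Lovell theorem, $\hat\beta(\eta)$ should coincide with the $X_N$ coefficients from regressing $M(\lambda)Y_N$ jointly on $[X_N, \Pi]$, and $\hat\sigma^2(\eta)$ with the mean squared residual of that regression. The random $\Pi$, data, and weight matrix below are placeholders rather than the paper's design, and the function names are hypothetical.

```python
import numpy as np


def resid_maker(A):
    """P_A = I - A (A^T A)^{-1} A^T, computed via the pseudo-inverse."""
    return np.eye(A.shape[0]) - A @ np.linalg.pinv(A)


def beta_sigma2_hat(Y, W, X, Pi, lam):
    """Evaluate (3.3) and (3.4) for a given eta, i.e. a given lambda and Pi = Pi(Z_N alpha)."""
    N = len(Y)
    MY = Y - lam * (W @ Y)                                     # M(lambda) Y_N
    P_Pi = resid_maker(Pi)
    PX = P_Pi @ X                                              # P_Pi X_N
    beta = np.linalg.solve(X.T @ PX, PX.T @ MY)                # (3.3); P_Pi is symmetric
    sigma2 = MY @ P_Pi.T @ resid_maker(PX) @ P_Pi @ MY / N     # (3.4)
    return beta, sigma2


rng = np.random.default_rng(3)
N, lam = 60, 0.3
W = (np.ones((N, N)) - np.eye(N)) / (N - 1)
X, Pi, Y = rng.normal(size=(N, 2)), rng.normal(size=(N, 4)), rng.normal(size=N)
beta, sigma2 = beta_sigma2_hat(Y, W, X, Pi, lam)

# Joint least squares of M(lambda) Y_N on [X_N, Pi] for comparison.
D = np.column_stack([X, Pi])
MY = Y - lam * (W @ Y)
coef, *_ = np.linalg.lstsq(D, MY, rcond=None)
```

The comparison shows why profiling out $\gamma$ through $P_\Pi$ costs nothing in accuracy: it reproduces the joint fit exactly, up to rounding.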

    Concentrating $\beta$ and $\sigma^2$ out of (3.2) yields the concentrated likelihood

    $$\tilde L_N^c(\eta) = -\frac{N}{2}\log(2\pi) - \frac{N}{2} + \log|M(\lambda)| - \frac{N}{2}\log\hat\sigma^2(\eta). \qquad (3.5)$$
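For a fixed $\alpha$, and hence a fixed $\Pi$, maximizing (3.5) reduces to a one-dimensional search over $\lambda$, which is what Step 4 of the algorithm needs. The sketch evaluates $\hat\sigma^2(\eta)$ through the joint regression on $[X_N, \Pi]$, which gives the same value as (3.4), and searches over $(-0.99, 0.99)$, an interval that assumes a row-normalized $W_N$. The function names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize_scalar


def _profile(lam, Y, W, X, Pi):
    """Profiled coefficients (beta, gamma) and sigma^2 for a given lambda."""
    MY = Y - lam * (W @ Y)
    D = np.column_stack([X, Pi])
    coef, *_ = np.linalg.lstsq(D, MY, rcond=None)
    return coef, np.mean((MY - D @ coef) ** 2)


def conc_loglik(lam, Y, W, X, Pi):
    """Concentrated log-likelihood (3.5) as a function of lambda, Pi = Pi(Z_N alpha) fixed."""
    N = len(Y)
    _, sigma2 = _profile(lam, Y, W, X, Pi)
    _, logabsdet = np.linalg.slogdet(np.eye(N) - lam * W)      # log|M(lambda)|
    return -0.5 * N * np.log(2 * np.pi) - 0.5 * N + logabsdet - 0.5 * N * np.log(sigma2)


def qmle_given_alpha(Y, W, X, Pi, bounds=(-0.99, 0.99)):
    """Step 4 sketch: maximize (3.5) over lambda, then recover beta, gamma and sigma^2."""
    res = minimize_scalar(lambda lam: -conc_loglik(lam, Y, W, X, Pi),
                          bounds=bounds, method="bounded")
    coef, sigma2 = _profile(res.x, Y, W, X, Pi)
    p = X.shape[1]
    return {"lam": res.x, "beta": coef[:p], "gamma": coef[p:], "sigma2": sigma2}
```

`np.linalg.slogdet` is used instead of `log(det(...))` because the determinant of an $N \times N$ matrix can overflow or underflow for moderate $N$.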

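    Let $\theta_0 = (\sigma_0^2, \eta_0^T, \beta_0^T)^T$ with $\eta_0 = (\lambda_0, \alpha_0^T)^T$ be the true parameters. Define $G(\lambda) = W_N M^{-1}(\lambda)$, $M_0 = M(\lambda_0)$, $G_0 = W_N M_0^{-1}$, and $A_0 = (g(z_1^T\alpha_0), \ldots, g(z_N^T\alpha_0))^T$. We then rewrite the model as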

    $$Y_N = X_N\beta_0 + A_0 + \lambda_0 G_0(X_N\beta_0 + A_0) + M_0^{-1}V_N, \qquad (3.6)$$

    which follows from the identity $M_0^{-1} = I + \lambda_0 G_0$. Denote Q(η)=max. Then, the solution of this problem is

    (3.7)

    and

    (3.8)

    where . Consequently,

    (3.9)
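The identity $M_0^{-1} = I + \lambda_0 G_0$ and the reduced form $Y_N = (I + \lambda_0 G_0)(X_N\beta_0 + A_0) + M_0^{-1}V_N$ that it implies can be verified numerically. The random row-normalized weight matrix and the stand-in vector for $A_0$ below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
N, lam0 = 50, 0.4

# Illustrative weight matrix: zero diagonal, rows normalized to sum to one.
W = rng.uniform(size=(N, N))
np.fill_diagonal(W, 0.0)
W /= W.sum(axis=1, keepdims=True)

M0 = np.eye(N) - lam0 * W          # M(lambda_0)
M0_inv = np.linalg.inv(M0)
G0 = W @ M0_inv                    # G_0 = W_N M_0^{-1}

# Identity M_0^{-1} = I + lambda_0 G_0.
identity_gap = np.max(np.abs(M0_inv - (np.eye(N) + lam0 * G0)))

# Reduced form: Y_N = (X_N beta_0 + A_0) + lambda_0 G_0 (X_N beta_0 + A_0) + M_0^{-1} V_N.
X = rng.normal(size=(N, 2))
beta0 = np.array([1.0, -0.5])
A0 = np.sin(rng.uniform(size=N))   # stand-in for (g(z_i^T alpha_0))_i
V = rng.normal(size=N)
mean_part = X @ beta0 + A0
Y_direct = np.linalg.solve(M0, mean_part + V)          # solves M_0 Y_N = X_N beta_0 + A_0 + V_N
Y_reduced = mean_part + lam0 * G0 @ mean_part + M0_inv @ V
reduced_gap = np.max(np.abs(Y_direct - Y_reduced))
print(identity_gap, reduced_gap)   # both at floating-point rounding level
```

Algebraically, $W_N$ commutes with $M_0$, so $M_0(I + \lambda_0 W_N M_0^{-1}) = M_0 + \lambda_0 W_N = I$ whenever $M_0$ is invertible.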

    Let , and . Before establishing the asymptotic properties of the estimator, we need to make the following assumptions:

    A1. (i) The disturbance is i.i.d. with 0 mean and variance, and the moment exists for some .

    (ii) The regressors and are i.i.d. non-stochastic sequences and have bounded support sets on and , respectively.

    (iii) The distribution of is absolutely continuous and its density is bounded away from zero and infinity on .

    (iv) The real-valued function , where .

    A2. (i) The matrix is nonsingular for all .

    (ii) The entries of satisfy , , and for all with .

    (iii) The matrices and are uniformly bounded in both row and column sums. The matrix is uniformly bounded in both row and column sums, uniformly in in a compact parameter space . The true is an interior point of .

    A3. (i) Let be the interior knots of and the bound points of . Letting , there exists a constant such that and .

    (ii) The knots number is assumed to satisfy .

    A4. The limit exists and is nonsingular, where and is the first derivative of .

    Assumption A1 outlines the essential features of the disturbances, regressors, and function for the model. Assumption A2 is parallel to Assumption 2, Assumption 5, and Assumption 7 of Lee [1], which characterize the spatial weight matrix . Assumption A3 constrains the characteristics of the knots in B-splines, in parallel with the work of Du et al. [8] and Tian et al. [13]. Assumption A4 ensures the existence of the limiting distribution of the parameters. Subsequently, we present the following main results.

    Theorem 1. Suppose that Assumptions A1–A4 hold. Then, is globally identifiable and is consistent for .

    Theorem 2. Suppose that Assumptions A1–A4 hold. Then, , where "" denotes convergence in probability.

    Theorem 3. Suppose that Assumptions A1–A4 hold. Then, we have , where "" denotes convergence in distribution, and

    (3.10)

    which are defined in the appendix. Additionally, if follows , then .

    The proofs of all theorems can be found in the Appendix. Theorem 1 and Theorem 2 guarantee the consistency of the parameter and the link function , respectively, while Theorem 3 provides the asymptotic distribution of , which can be used for statistical inference. By substituting the corresponding parts in and with estimators, we can obtain the sample analog of the asymptotic variance.

    In this section, we investigate the finite-sample performance of the proposed estimation procedure through Monte Carlo simulation studies. The data are generated from the following model:

    (4.1)

    where and , with and values generated from , ; and , with , , and values generated from , ; ; ; and . In the simulation, similar to Carroll et al. [14] and Yu et al. [15], suppose , where , . We focus on the spatial scenario of Case [16], in which there are $R$ districts and each district contains $m$ members, so that the sample size is $N = mR$. The weight matrix is $W_N = I_R \otimes B_m$, where $B_m = (l_m l_m^T - I_m)/(m-1)$, $l_m$ is the $m$-dimensional column vector of ones, and $\otimes$ is the Kronecker product. The numerical study examines $R \in \{10, 20, 30\}$ and $m \in \{20, 30, 40\}$. For comparison, three values of $\lambda$ are considered, representing weak, mild, and relatively strong spatial dependence of the responses, respectively.
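The spatial design can be sketched as follows, assuming the standard Case [16] construction $W_N = I_R \otimes B_m$ with $B_m = (l_m l_m^T - I_m)/(m-1)$. Because the specific functions and parameter values of model (4.1) are not restated in this sketch, the link function, $\alpha$, $\beta$, $\lambda$, and $\sigma$ in the usage lines are hypothetical placeholders, and the disturbances are assumed normal.

```python
import numpy as np


def case_weight_matrix(R, m):
    """W_N = I_R kron B_m, B_m = (l_m l_m^T - I_m) / (m - 1): each member of a district
    puts equal weight on the other m - 1 members of the same district."""
    B = (np.ones((m, m)) - np.eye(m)) / (m - 1)
    return np.kron(np.eye(R), B)


def simulate_plsisar(W, X, Z, beta, alpha, g, lam, sigma, rng):
    """Draw Y_N from model (2.1): Y_N = (I - lam W_N)^{-1} (X_N beta + g(Z_N alpha) + V_N)."""
    N = W.shape[0]
    V = sigma * rng.normal(size=N)                     # normal disturbances (assumption)
    return np.linalg.solve(np.eye(N) - lam * W, X @ beta + g(Z @ alpha) + V)


# Hypothetical settings for illustration only.
rng = np.random.default_rng(5)
R, m = 10, 20
W = case_weight_matrix(R, m)                           # N = m R = 200
X = rng.normal(size=(R * m, 2))
Z = rng.uniform(size=(R * m, 3))
alpha = np.array([1.0, 1.0, 1.0]) / np.sqrt(3.0)
beta = np.array([1.0, 1.5])
Y = simulate_plsisar(W, X, Z, beta, alpha, np.sin, lam=0.5, sigma=0.0, rng=rng)
```

Solving the linear system instead of forming $(I - \lambda W_N)^{-1}$ explicitly is cheaper and numerically safer; with `sigma=0.0` the draw satisfies (2.1) exactly, which makes the generator easy to check.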

    We use the empirical bias (Bias) and the empirical standard deviation (SD) to assess the accuracy of the parameter estimates. We also compute the asymptotic standard deviation (AsySD) of the parameter estimates to check the validity of the asymptotic theory. To assess the accuracy of the proposed nonparametric estimator, we use the integrated mean squared error (IMSE), whose empirical version is defined as

    (4.2)

    where is the number of grid points on .
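The summary measures can be sketched as below. The sketch assumes the common empirical definition $\mathrm{IMSE} = n_{\mathrm{grid}}^{-1}\sum_{k}\{\hat g(t_k) - g(t_k)\}^2$, averaged over replications, and uses `ddof=1` for the SD; both choices, and the function names, are assumptions rather than the paper's exact definitions.

```python
import numpy as np


def bias_sd(estimates, truth):
    """Empirical bias and SD over replications (rows = replications, columns = parameters)."""
    est = np.asarray(estimates, dtype=float)
    return est.mean(axis=0) - np.asarray(truth, dtype=float), est.std(axis=0, ddof=1)


def empirical_imse(g_hat_on_grid, g_true, grid):
    """Assumed form of (4.2): squared error averaged over the grid points,
    then averaged over replications (rows of g_hat_on_grid)."""
    g_hat_on_grid = np.atleast_2d(np.asarray(g_hat_on_grid, dtype=float))
    err = g_hat_on_grid - g_true(np.asarray(grid, dtype=float))[None, :]
    return float(np.mean(np.mean(err ** 2, axis=1)))


# Tiny worked example: two replications of one parameter, and two curves on a 2-point grid.
bias, sd = bias_sd([[1.0], [3.0]], truth=[1.5])          # bias 0.5, sd sqrt(2)
imse = empirical_imse([[0.1, 1.1], [0.0, 0.8]], lambda t: t, grid=[0.0, 1.0])   # 0.015
```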

    Tables 1–9 show the simulation results based on 1000 replications. We observe the following. (i) As the sample size increases, the SD of the parameter estimators (except ) and the of decrease, while the bias remains near 0, suggesting that the estimators are consistent. (ii) The performance of improves as increases, but it does not necessarily improve as increases. This is expected, because increasing only can violate Assumption A2, in which case the asymptotic properties no longer hold. (iii) The performance of improves as increases, which is consistent with the asymptotic variance of depending strongly on . (iv) The SD of the estimator for finite-dimensional parameters is positively correlated with . (v) The empirical standard deviations of the parameter estimates are very close to the asymptotic standard deviations, which supports the asymptotic theory in Theorem 3. The performance of the nonparametric estimates for is shown in Figure 1, where the true function and the mean of each estimated over the 1000 replicates are plotted. We only display the figure for and , as the plots for the other settings are similar and are omitted to save space.

    Table 1.  Simulation results for .
    m Est R = 10 R = 20 R = 30
    Bias SD AsySD Bias SD AsySD Bias SD AsySD
    0.1 20 2.068 7.815 7.424 0.561 4.134 4.340 0.357 2.985 3.272
    0.081 2.465 2.713 0.040 1.841 1.910 0.007 1.521 1.593
    0.077 2.452 3.077 0.005 1.825 2.428 0.164 1.510 2.128
    0.168 2.611 2.652 0.131 1.799 1.970 0.095 1.564 1.604
    0.031 2.467 2.245 0.021 1.666 1.559 0.071 1.226 1.276
    0.089 2.151 2.140 0.120 1.668 1.620 0.082 1.392 1.329
    1.740 0.646 0.640
    30 0.964 5.106 5.864 0.544 4.920 5.494 0.070 2.663 2.858
    0.408 2.134 2.121 0.145 1.335 1.503 0.058 1.169 1.202
    0.166 1.893 2.824 0.031 1.327 1.879 0.123 1.294 1.549
    0.133 2.074 2.206 0.225 1.436 1.486 0.221 1.209 1.234
    0.100 1.872 1.865 0.002 1.329 1.311 0.083 1.039 1.056
    0.032 1.795 1.835 0.015 1.269 1.246 0.017 1.150 1.074
    1.115 0.308 0.266
    40 0.992 5.732 5.806 0.595 3.698 4.065 0.407 3.423 3.369
    0.037 1.800 1.744 0.057 1.206 1.328 0.019 1.049 1.110
    0.153 1.799 2.235 0.039 1.222 1.674 0.015 1.076 1.409
    0.201 1.815 1.917 0.021 1.230 1.318 0.033 1.086 1.064
    0.143 1.590 1.577 0.102 1.162 1.127 0.030 0.896 0.919
    0.068 1.506 1.489 0.057 1.164 1.115 0.010 0.849 0.914
    0.935 0.272 0.194

    Table 2.  Simulation results for .
    m Est R = 10 R = 20 R = 30
    Bias SD AsySD Bias SD AsySD Bias SD AsySD
    0.5 20 6.699 15.934 14.490 1.845 8.318 8.602 1.086 5.995 6.552
    0.212 5.748 5.793 0.024 4.215 4.089 0.053 3.354 3.398
    0.388 5.727 6.257 0.312 4.187 4.911 0.482 3.456 4.340
    0.735 6.232 5.636 0.163 4.094 4.202 0.124 3.428 3.450
    0.142 5.544 5.017 0.013 3.727 3.485 0.179 2.742 2.852
    0.107 4.823 4.769 0.309 3.730 3.620 0.162 3.111 2.970
    0.000 13.118 0.000 0.000 3.909 0.000 0.000 3.874 0.000
    30 3.985 11.105 11.979 2.138 9.421 10.243 0.577 5.524 5.866
    0.527 4.884 4.463 0.214 3.026 3.272 0.039 2.545 2.629
    0.060 4.362 5.489 0.107 3.044 3.953 0.018 2.664 3.316
    0.106 4.816 4.547 0.362 3.297 3.235 0.196 2.594 2.698
    0.203 4.197 4.165 0.033 2.971 2.929 0.111 2.318 2.360
    0.104 4.009 4.098 0.041 2.830 2.785 0.009 2.569 2.400
    0.000 6.997 0.000 0.000 1.630 0.000 0.000 1.305 0.000
    40 3.703 12.135 11.881 1.955 7.685 8.209 1.419 6.876 6.748
    0.216 4.136 3.750 0.162 2.749 2.901 0.058 2.351 2.420
    0.161 4.038 4.569 0.017 2.754 3.549 0.003 2.286 2.999
    0.384 4.132 4.049 0.053 2.791 2.879 0.089 2.435 2.328
    0.339 3.555 3.524 0.255 2.599 2.518 0.081 2.003 2.055
    0.137 3.378 3.329 0.136 2.605 2.493 0.020 1.901 2.043
    0.000 5.859 0.000 0.000 1.434 0.000 0.000 0.969 0.000

    Table 3.  Simulation results for .
    m Est R = 10 R = 20 R = 30
    Bias SD AsySD Bias SD AsySD Bias SD AsySD
    1.0 20 9.291 19.214 16.714 2.673 10.363 10.183 1.577 7.620 7.865
    0.128 8.474 7.703 0.143 6.207 5.502 0.060 4.861 4.582
    0.755 8.554 7.853 0.651 6.177 6.215 0.773 5.033 5.582
    1.407 9.367 7.479 0.182 5.976 5.639 0.075 4.936 4.685
    0.242 7.864 7.089 0.003 5.274 4.929 0.263 3.882 4.032
    0.152 6.828 6.718 0.463 5.283 5.117 0.222 4.399 4.197
    0.000 49.203 0.000 0.000 11.433 0.000 0.000 9.957 0.000
    30 6.432 14.909 14.607 3.069 11.441 11.455 1.045 7.182 7.253
    0.812 7.361 5.913 0.201 4.372 4.480 0.020 3.610 3.628
    0.089 6.425 6.731 0.228 4.404 5.217 0.125 3.782 4.457
    0.379 7.204 5.935 0.506 4.767 4.437 0.211 3.714 3.723
    0.306 5.954 5.885 0.052 4.200 4.140 0.132 3.277 3.337
    0.146 5.670 5.789 0.066 3.998 3.937 0.003 3.632 3.392
    0.000 20.222 0.000 0.000 3.543 0.000 0.000 2.762 0.000
    40 5.679 15.696 14.471 2.917 10.009 9.900 2.105 8.628 8.077
    0.460 6.139 5.062 0.301 3.975 3.990 0.124 3.331 3.323
    0.308 5.910 5.829 0.010 3.974 4.718 0.039 3.296 4.002
    0.811 6.150 5.395 0.101 4.001 3.962 0.133 3.497 3.206
    0.489 5.027 4.983 0.370 3.678 3.560 0.117 2.834 2.905
    0.168 4.794 4.707 0.194 3.685 3.525 0.026 2.689 2.888
    0.000 17.027 0.000 0.000 3.131 0.000 0.000 2.042 0.000

    Table 4.  Simulation results for .
    m Est R = 10 R = 20 R = 30
    Bias SD AsySD Bias SD AsySD Bias SD AsySD
    0.1 20 0.800 2.980 3.354 0.368 2.712 3.104 0.343 2.522 2.746
    0.351 2.597 2.622 0.026 1.763 1.924 0.041 1.461 1.529
    0.018 2.478 3.395 0.027 1.718 2.302 0.066 1.497 1.866
    0.204 2.453 2.675 0.082 1.812 1.906 0.029 1.401 1.544
    0.037 2.398 2.291 0.019 1.558 1.516 0.012 1.248 1.291
    0.083 2.486 2.320 0.008 1.684 1.663 0.008 1.347 1.304
    0.000 1.363 0.000 0.000 0.470 0.000 0.000 0.398 0.000
    30 0.455 4.187 4.588 0.859 3.775 3.735 0.240 2.161 2.486
    0.001 2.042 2.119 0.190 1.448 1.547 0.098 1.110 1.249
    0.006 2.059 2.674 0.055 1.355 1.973 0.036 1.130 1.622
    0.100 1.955 2.095 0.083 1.450 1.542 0.100 1.154 1.269
    0.069 2.136 1.978 0.044 1.291 1.280 0.021 1.050 1.094
    0.036 1.846 1.876 0.040 1.314 1.301 0.070 1.112 1.110
    0.000 1.022 0.000 0.000 0.372 0.000 0.000 0.261 0.000
    40 1.456 6.342 6.789 0.285 2.684 2.982 0.258 1.863 2.066
    0.003 1.635 1.765 0.041 1.220 1.357 0.001 0.989 1.060
    0.130 1.687 2.224 0.047 1.219 1.666 0.002 0.980 1.441
    0.056 1.592 1.756 0.047 1.323 1.274 0.028 0.996 1.108
    0.190 1.544 1.563 0.031 1.181 1.154 0.094 0.901 0.883
    0.047 1.622 1.612 0.031 1.123 1.127 0.094 0.867 0.881
    0.000 0.544 0.000 0.000 0.224 0.000 0.000 0.204 0.000

    Table 5.  Simulation results for .
    m Est R = 10 R = 20 R = 30
    Bias SD AsySD Bias SD AsySD Bias SD AsySD
    0.5 20 2.551 6.428 6.921 1.347 5.261 5.408 1.159 5.111 5.512
    1.011 6.023 5.449 0.153 4.006 4.109 0.034 3.360 3.341
    0.246 5.782 6.641 0.082 3.936 4.687 0.135 3.424 3.954
    0.372 5.617 5.599 0.184 4.094 4.088 0.119 3.199 3.370
    0.018 5.346 5.100 0.071 3.490 3.373 0.011 2.795 2.887
    0.141 5.562 5.184 0.021 3.764 3.717 0.023 3.006 2.915
    0.000 9.385 0.000 0.000 2.622 0.000 0.000 2.171 0.000
    30 2.258 8.623 8.838 2.215 7.157 6.847 0.951 4.330 4.794
    0.065 4.623 4.430 0.370 3.192 3.354 0.261 2.457 2.736
    0.185 4.719 5.178 0.084 2.953 4.100 0.096 2.542 3.466
    0.312 4.609 4.362 0.028 3.290 3.333 0.190 2.604 2.780
    0.109 4.793 4.418 0.113 2.886 2.863 0.013 2.347 2.444
    0.074 4.123 4.190 0.116 2.940 2.908 0.157 2.482 2.482
    0.000 6.673 0.000 0.000 1.950 0.000 0.000 1.387 0.000
    40 4.679 12.349 11.693 1.092 5.388 5.820 0.792 3.788 4.170
    0.301 3.719 3.777 0.129 2.753 2.971 0.037 2.256 2.313
    0.104 3.869 4.524 0.089 2.741 3.580 0.015 2.238 3.053
    0.028 3.842 3.765 0.166 2.950 2.797 0.079 2.270 2.409
    0.431 3.446 3.493 0.086 2.642 2.579 0.208 2.015 1.972
    0.134 3.638 3.603 0.062 2.513 2.519 0.200 1.938 1.969
    0.000 2.994 0.000 0.000 1.179 0.000 0.000 1.087 0.000

    Table 6.  Simulation results for .
    m Est R = 10 R = 20 R = 30
    Bias SD AsySD Bias SD AsySD Bias SD AsySD
    1.0 20 3.973 8.684 8.745 2.112 6.863 6.549 1.631 6.150 6.165
    2.137 9.396 7.106 0.358 5.916 5.517 0.081 4.870 4.585
    0.343 8.941 8.030 0.249 5.800 5.972 0.198 4.913 5.202
    0.349 8.572 7.308 0.300 6.003 5.508 0.322 4.639 4.610
    0.012 7.559 7.181 0.107 4.939 4.750 0.012 3.955 4.083
    0.166 7.874 7.326 0.032 5.322 5.256 0.035 4.251 4.123
    0.000 43.795 0.000 0.000 6.492 0.000 0.000 4.961 0.000
    30 3.646 11.145 10.405 2.870 8.577 7.700 1.459 5.481 5.636
    0.124 6.847 5.811 0.485 4.573 4.584 0.422 3.511 3.769
    0.484 7.000 6.294 0.068 4.257 5.353 0.143 3.638 4.612
    0.649 6.991 5.713 0.114 4.712 4.538 0.222 3.732 3.827
    0.140 6.805 6.238 0.159 4.085 4.050 0.009 3.318 3.456
    0.112 5.826 5.920 0.173 4.159 4.112 0.223 3.506 3.509
    0.000 21.085 0.000 0.000 4.331 0.000 0.000 3.008 0.000
    40 6.211 14.879 12.560 1.642 6.765 6.878 1.179 4.884 5.097
    0.538 5.429 5.084 0.218 3.955 4.090 0.075 3.241 3.180
    0.190 5.685 5.754 0.067 3.928 4.791 0.073 3.203 4.054
    0.082 5.625 5.072 0.273 4.225 3.862 0.123 3.257 3.299
    0.598 4.874 4.937 0.126 3.738 3.647 0.295 2.849 2.788
    0.187 5.147 5.094 0.086 3.556 3.562 0.281 2.741 2.783
    0.000 7.083 0.000 0.000 2.533 0.000 0.000 2.357 0.000

    Table 7.  Simulation results for .
    m Est R = 10 R = 20 R = 30
    Bias SD AsySD Bias SD AsySD Bias SD AsySD
    0.1 20 0.434 2.291 3.358 0.129 1.438 1.794 0.175 1.278 1.570
    0.089 2.754 2.642 0.053 1.828 1.936 0.040 1.540 1.564
    0.058 2.484 3.117 0.066 1.782 2.307 0.031 1.426 1.877
    0.213 2.694 2.615 0.098 1.824 1.846 0.128 1.476 1.583
    0.019 2.029 2.065 0.005 1.676 1.621 0.052 1.291 1.310
    0.013 2.145 2.090 0.057 1.613 1.608 0.040 1.258 1.281
    0.000 1.354 0.000 0.000 0.490 0.000 0.000 0.361 0.000
    30 0.388 2.051 2.027 0.264 1.361 1.537 0.058 1.050 1.155
    0.152 1.990 2.034 0.091 1.422 1.488 0.021 1.058 1.231
    0.040 2.004 2.697 0.038 1.453 1.920 0.069 1.078 1.566
    0.014 1.828 2.119 0.075 1.438 1.555 0.079 1.108 1.187
    0.078 1.829 1.762 0.045 1.255 1.294 0.061 1.087 1.081
    0.008 1.670 1.650 0.085 1.300 1.279 0.000 1.043 1.067
    0.000 0.778 0.000 0.000 0.414 0.000 0.000 0.260 0.000
    40 0.475 2.033 2.286 0.179 1.438 1.539 0.107 1.066 1.206
    0.092 1.655 1.718 0.118 1.136 1.324 0.014 0.965 1.148
    0.081 1.661 2.348 0.035 1.185 1.757 0.116 0.933 1.397
    0.061 1.659 1.857 0.117 1.186 1.375 0.154 0.999 1.078
    0.120 1.611 1.557 0.011 1.119 1.120 0.030 0.906 0.903
    0.067 1.624 1.544 0.006 1.083 1.102 0.053 0.929 0.894
    0.000 0.479 0.000 0.000 0.297 0.000 0.000 0.169 0.000

    Table 8.  Simulation results for .
    m Est R = 10 R = 20 R = 30
    Bias SD AsySD Bias SD AsySD Bias SD AsySD
    0.5 20 1.631 4.793 5.106 0.544 2.864 3.065 0.671 2.486 2.616
    0.119 6.104 5.610 0.201 4.160 4.143 0.096 3.417 3.394
    0.226 5.444 6.099 0.074 4.091 4.709 0.023 3.216 3.951
    0.788 5.997 5.498 0.323 4.221 3.958 0.167 3.318 3.427
    0.011 4.555 4.618 0.009 3.744 3.614 0.120 2.886 2.929
    0.085 4.792 4.653 0.136 3.601 3.583 0.086 2.803 2.847
    0.000 9.417 0.000 0.000 2.820 0.000 0.000 1.949 0.000
    30 1.376 4.181 3.938 0.776 2.758 2.882 0.283 2.090 2.195
    0.413 4.540 4.326 0.131 3.248 3.224 0.110 2.393 2.689
    0.016 4.607 5.387 0.140 3.232 3.990 0.132 2.436 3.335
    0.122 4.260 4.469 0.003 3.268 3.362 0.178 2.522 2.594
    0.127 4.098 3.940 0.077 2.805 2.891 0.136 2.432 2.415
    0.013 3.724 3.678 0.184 2.910 2.856 0.000 2.325 2.376
    0.000 4.647 0.000 0.000 2.271 0.000 0.000 1.345 0.000
    40 1.617 4.363 4.359 0.743 2.908 2.922 0.383 2.153 2.295
    0.275 3.756 3.704 0.142 2.699 2.889 0.003 2.211 2.510
    0.120 3.799 4.818 0.019 2.799 3.721 0.208 2.133 2.994
    0.215 3.758 3.983 0.074 2.762 2.988 0.331 2.260 2.363
    0.286 3.601 3.480 0.026 2.500 2.504 0.071 2.024 2.015
    0.156 3.634 3.452 0.004 2.422 2.460 0.122 2.077 1.999
    0.000 2.669 0.000 0.000 1.610 0.000 0.000 0.868 0.000

    Table 9.  Simulation results for .
    m Est R = 10 R = 20 R = 30
    Bias SD AsySD Bias SD AsySD Bias SD AsySD
    1.0 20 2.416 6.098 5.732 0.840 3.611 3.548 0.989 3.058 2.986
    0.292 9.656 7.409 0.471 6.112 5.587 0.218 4.930 4.645
    0.100 8.394 7.384 0.067 6.080 6.011 0.115 4.611 5.217
    1.822 9.440 7.208 0.578 6.223 5.339 0.262 4.807 4.675
    0.028 6.444 6.527 0.011 5.294 5.104 0.163 4.081 4.141
    0.226 6.763 6.569 0.198 5.082 5.060 0.122 3.962 4.014
    0.000 42.319 0.000 0.000 7.594 0.000 0.000 4.378 0.000
    30 2.066 5.395 4.810 1.102 3.510 3.423 0.453 2.651 2.634
    0.715 6.799 5.759 0.207 4.721 4.401 0.201 3.424 3.704
    0.082 7.043 6.681 0.265 4.655 5.213 0.179 3.481 4.460
    0.400 6.480 5.907 0.101 4.718 4.583 0.299 3.626 3.574
    0.159 5.811 5.570 0.106 3.966 4.086 0.191 3.440 3.413
    0.020 5.264 5.190 0.264 4.118 4.034 0.001 3.283 3.351
    0.000 14.225 0.000 0.000 5.203 0.000 0.000 2.863 0.000
    40 2.448 5.830 5.197 1.135 3.682 3.464 0.568 2.735 2.725
    0.488 5.445 5.025 0.175 3.918 3.966 0.035 3.183 3.449
    0.086 5.558 6.191 0.044 4.085 4.933 0.262 3.105 4.008
    0.376 5.397 5.378 0.196 3.996 4.086 0.495 3.292 3.255
    0.409 5.096 4.919 0.036 3.533 3.540 0.103 2.860 2.846
    0.217 5.140 4.880 0.002 3.425 3.474 0.172 2.937 2.825
    0.000 6.278 0.000 0.000 3.606 0.000 0.000 1.831 0.000
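Tables 7-9 summarize each parameter by Bias, SD and AsySD across Monte Carlo replications. The definitions are not restated in this part of the text, so the following Python sketch assumes the usual conventions: Bias as the absolute difference between the Monte Carlo mean and the true value, SD as the empirical standard deviation of the estimates, and AsySD as the average of the estimated asymptotic standard errors. The function name `mc_summary` is hypothetical, and any scaling applied to the tabulated values is not reproduced.

```python
import numpy as np


def mc_summary(estimates, std_errors, true_value):
    """Summarize one parameter over Monte Carlo replications (assumed definitions)."""
    estimates = np.asarray(estimates, dtype=float)
    std_errors = np.asarray(std_errors, dtype=float)
    # Bias: absolute gap between the Monte Carlo mean and the true value.
    bias = abs(estimates.mean() - true_value)
    # SD: empirical standard deviation of the estimates across replications.
    sd = estimates.std(ddof=1)
    # AsySD: average of the plug-in asymptotic standard errors across replications.
    asy_sd = std_errors.mean()
    return bias, sd, asy_sd


# Illustrative use with simulated estimates (not the paper's results).
rng = np.random.default_rng(0)
est = 0.5 + 0.05 * rng.standard_normal(1000)
se = np.full(1000, 0.05)
bias, sd, asy_sd = mc_summary(est, se, true_value=0.5)
```

In rows where SD and AsySD are close, the asymptotic standard errors track the Monte Carlo variability of the estimator.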

    Figure 1.  Curve estimate for a single replication of the model simulation study. The true curve (solid curve), the mean of with B-splines () over 1000 simulations (), and the pointwise confidence interval for are shown.

    In this section, we illustrate the usefulness of the proposed estimation method through the analysis of two real data sets, with particular attention to dependence across spatial units.

    Air quality is receiving increasing attention in China because it is closely related to people's daily lives. Wu and Kuo [17] used statistical models to analyze air quality in Taiwan, China. Chen et al. [18] employed machine learning methods to predict air quality in Hangzhou, China, highlighting the need to account for the influence of neighboring areas. The factors affecting air quality are complex and can be grouped into several categories, such as urban characteristics, meteorological factors, and emission factors. Because air flows between regions, the air quality of a core city is related to that of the surrounding cities, and the degree of influence depends on the spatial distance between them. We therefore use air quality data for cities in China to illustrate the usefulness of the method proposed in Section 2.

    The data set contains information on 28 capital cities in China, including the air quality index, urban characteristics, meteorological factors, and emission factors, from January 1 to April 30, 2014. There are 13 variables (see Table 10).

    Table 10.  Variables in China air quality data.
    Variables Explanation (data sources)
    TEMP Temperature
    WIND Wind
    RAIN Rainfall
    PEOPLE Urban resident population/(10,000 people)
    CAR Civil vehicle ownership/vehicle
    BUILDING Building construction area/(10,000 )
    SO2 Industrial sulfur dioxide emissions/t
    YANCHEN Industrial soot emissions/t
    GREEN Green area/
    GN Dichotomous variable; GN = 1 during the heating period in northern China
    LONG Longitude coordinates
    LATT Latitude coordinates
    AQI Air quality index
    *Data sources: (1) China Weather Network http://www.weather.com.cn/, (2) China Regional Economic Statistics Yearbook, (3) China City Statistical Yearbook, (4) http://www.gpsspg.com/maps.htm.


    The air quality index (AQI) is the dependent variable. Urban resident population (PEOPLE), green area (GREEN), civil vehicle ownership (CAR), industrial sulfur dioxide emissions (SO2), industrial soot emissions (YANCHEN), temperature (TEMP), wind (WIND), rainfall (RAIN), building construction area (BUILDING), and the heating period in northern China (GN) are the independent variables. The model accounts for the influence of the cities' spatial structure on air quality. Based on the research background, we treat log(RAIN), log(PEOPLE), log(CAR), log(BUILDING), and GN as linear covariates, and TEMP, WIND, log(SO2), log(YANCHEN), and log(GREEN) as nonparametric covariates entering through the single index.

    Therefore, we consider fitting the data via the following model:

    (5.1)

    where is AQI; are log(RAIN), log(PEOPLE), log(CAR), log(BUILDING), GN, respectively; and are TEMP, WIND, log(SO2), log(YANCHEN), log(GREEN), respectively. Logarithmic transformations are applied to reduce the problems caused by large gaps in the domains of the variables [19].

    Following Du et al. [8], Su and Yang [20], and Xie et al. [21], we define the weight matrix as

    where represents the spatial distance between the th and th units, computed as the Euclidean distance between the longitude and latitude coordinates of the two cities, and is the threshold distance, chosen as the median of . We choose the weight matrix in this form because the dependent variable AQI is spatially correlated. When two cities are far enough apart, the spatial relation between them is negligible, so a threshold distance is used to set such tiny weights to zero and limit their potential effect on the analysis. Furthermore, we row-normalize the weight matrix so that each row of sums to 1, i.e., . Because it depends only on geographic coordinates, the spatial weight matrix is exogenous. We use the method proposed in Section 2 to analyze the processed data. The estimates of the unknown parametric coefficients are listed in Table 11, and the estimated univariate link function of the nonparametric component is displayed in Figure 2.
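The weight construction just described can be sketched in Python. The exact functional form of the weights did not survive extraction, so the sketch takes the kernel as a parameter and uses inverse distance 1/d below the threshold purely as a placeholder assumption. The Euclidean distance on longitude and latitude, the median threshold, the zeroing of weights beyond it, and the row normalization follow the text. The function name and the randomly generated coordinates are hypothetical.

```python
import numpy as np


def threshold_weight_matrix(lon, lat, kernel=lambda d: 1.0 / d):
    """Row-normalized distance-threshold spatial weights.

    `kernel` is a placeholder assumption; replace it with the paper's weight function.
    Coincident coordinates (d = 0 off the diagonal) would make 1 / d infinite.
    """
    coords = np.column_stack([lon, lat])
    diff = coords[:, None, :] - coords[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))        # Euclidean distance on (lon, lat)
    off_diag = ~np.eye(len(d), dtype=bool)
    d0 = np.median(d[off_diag])                   # threshold: median pairwise distance
    W = np.zeros_like(d)
    keep = off_diag & (d <= d0)                   # weights beyond d0 are set to zero
    W[keep] = kernel(d[keep])
    row_sums = W.sum(axis=1, keepdims=True)
    # Row-normalize; a unit with no neighbor inside d0 keeps an all-zero row.
    W = np.divide(W, row_sums, out=np.zeros_like(W), where=row_sums > 0)
    return W, d0


# Hypothetical coordinates for 28 cities (random, for illustration only).
rng = np.random.default_rng(1)
lon = rng.uniform(100.0, 125.0, size=28)
lat = rng.uniform(20.0, 45.0, size=28)
W, d0 = threshold_weight_matrix(lon, lat)
```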

    Table 11.  Analysis results for China air quality data.
    Parameter log(RAIN) log(PEOPLE) log(CAR) log(BUILDING) GN
    () () () () ()
    Estimation 0.261 0.437 0.351 0.120 0.075 1.232
    SD 0.134 0.259 0.314 0.247 0.166 0.434
    Parameter TEMP WIND log(SO2) log(YANCHEN) log(GREEN)
    () () () () ()
    Estimation 0.023 0.188 0.981 0.027 0.020
    SD 0.296 0.329 0.510 0.617 0.324

    Figure 2.  Curve estimate for China air quality data, with on the x-axis and on the y-axis.

    From the analysis results, the estimate of the spatial parameter is , which indicates a substantial spatial relationship among the AQI values of cities in China. The regression coefficients of are positive, implying that rainfall, urban resident population, civil vehicle ownership, building construction area, and the heating period have positive effects on the AQI of the cities. These results are generally consistent with our expectations. As the population grows, resource consumption increases, resulting in air pollution. Increased car ownership and construction area lead to higher pollutant emissions. The large amount of pollutants emitted during the heating period in northern China leads to a decline in air quality.

    From Figure 2, we observe that the single index has an "S"-shaped nonlinear effect on the AQI of the cities, which indicates interaction among the covariates in the nonparametric component. The estimated single-index coefficients show that temperature, wind, industrial sulfur dioxide emissions, industrial soot emissions, and green area have nonlinear effects on the AQI. The single-index coefficients of temperature, industrial sulfur dioxide emissions, industrial soot emissions, and green area are positive, and that of wind is negative. These results are generally consistent with our expectations. Stronger wind helps disperse pollutants, which improves air quality and lowers the AQI. Higher temperature is not conducive to the decomposition of air pollutants. Increased industrial sulfur dioxide and soot emissions raise the level of air pollutants.

    However, more rainfall should reduce the amount of pollutants in the air, and more green plants should help absorb and degrade air pollutants, which contradicts the positive estimates obtained here (). Because the model has not undergone variable selection, significance cannot be judged from the coefficient values alone. The unexpected signs of these two coefficients may indicate that they are statistically insignificant.

    The development of the rural economy is important to the overall economic development of China. The rural economy currently faces many problems, such as large populations, low incomes, and increasing difficulty in finding employment, and rural household income is affected by multiple factors. We take the rural household income data of Fuping County, China, covering 209 villages, as an example and analyze the factors affecting rural household income with the method proposed in Section 2. The dependent variable is the per capita net income of farmers, and the independent variables are seven factors covering rural natural conditions, village industries, and labor skills, all of which strongly affect rural household income (see Table 12).

    We treat agNum and farm as linear covariates because, in economic terms, they represent agricultural productivity and means of production and thus reflect the scale of agriculture; we therefore expect income to depend linearly on these two factors. The remaining variables, urNum, agFa, area, indusNum, and workNum, are treated as nonparametric covariates. We fit the data with the following model:

    (5.2)
    Table 12.  Variables in rural household income data.
    Variables Explanation
    income Per capita net income of farmers
    agNum Number of laborers engaged in agriculture
    urNum Number of laborers who left farming for urban areas
    farm Farmland
    agFa Facility agriculture
    area Effective irrigation area
    indusNum Number of rural industries
    workNum Number of skilled workers


    where is income; are agNum and farm, respectively; and are urNum, agFa, area, indusNum, and workNum, respectively.

    The adjacency relations between villages are obtained from spatial geographic information, and we define the weight matrix as

    (5.3)

    where indicates that the th and th villages are adjacent; when two villages are not adjacent, the spatial relation between them is negligible, and we set . We choose the weight matrix in this form because the response variable may be spatially correlated. Furthermore, we row-normalize the weight matrix so that each row of sums to 1, i.e., . Because it depends only on the geographic adjacency of the villages, the spatial weight matrix is exogenous. The estimates of the unknown parametric coefficients are listed in Table 13, and the estimated univariate link function of the nonparametric component is displayed in Figure 3. The estimate of the spatial parameter, , indicates a substantial spatial relationship in the model. The regression coefficients of and are negative, which implies that increasing the agricultural labor force and the amount of farmland would not increase rural household income.
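For the village data, the weights come from adjacency rather than distance. A minimal Python sketch, assuming the adjacency information is available as a list of neighboring village pairs (a hypothetical input format), builds the binary matrix, row-normalizes it, and checks a property that matters for the log-determinant in the quasi-likelihood: a row-normalized W has spectral radius 1, so I - ρW is nonsingular for every |ρ| < 1.

```python
import numpy as np


def contiguity_weight_matrix(n, neighbors):
    """Row-normalized binary contiguity matrix from a list of adjacent pairs (i, j)."""
    A = np.zeros((n, n))
    for i, j in neighbors:
        A[i, j] = A[j, i] = 1.0          # adjacency is symmetric
    deg = A.sum(axis=1, keepdims=True)
    # Divide each row by its number of neighbors; isolated units keep a zero row.
    return np.divide(A, deg, out=np.zeros_like(A), where=deg > 0)


# Hypothetical 5-village map with adjacent pairs 0-1, 1-2, 2-3, 3-4 and 0-2.
W = contiguity_weight_matrix(5, [(0, 1), (1, 2), (2, 3), (3, 4), (0, 2)])
# A row-stochastic W has spectral radius 1, so I - rho * W is invertible for |rho| < 1.
spectral_radius = np.abs(np.linalg.eigvals(W)).max()
```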

    Table 13.  Analysis results for rural household income data.
    Parameter agNum farm urNum agFa area indusNum workNum
    () () () () () () ()
    Estimation 0.242 0.044 0.228 0.309 0.349 0.826 0.310 0.072
    SD 0.072 0.073 0.072 0.285 0.189 0.283 0.359 0.367

    Figure 3.  Curve estimate for rural household income data, with on the x-axis and on the y-axis.

    From Figure 3, we observe that the single index has a nonlinear effect on rural household income, indicating interaction among the covariates in the nonparametric component. Through the single-index coefficients , the variables urNum, agFa, area, indusNum, and workNum have nonlinear effects on rural household income.

    Further research is needed to develop a model structure selection method for partially linear single-index spatial autoregressive models, that is, a data-driven way to decide which covariates have linear effects and which have nonlinear effects.

    In this paper, B-spline approximation is employed for the link function in the partially linear single-index spatial autoregressive model, offering a smoother and more interpretable alternative to local linear approximation. A feasible Nelder-Mead iteration algorithm tailored to this model is proposed to compute the quasi-maximum likelihood estimator (QMLE). The parameter and function estimators are shown to be consistent, and the asymptotic distribution of the parameter estimators is derived for general, possibly non-normal, error terms. A series of Monte Carlo simulations is performed to assess the performance of the proposed model and estimation method. Finally, the model and method are applied to the air quality data and the rural household income data, extending the scope of application of the model.
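The estimation strategy summarized above can be illustrated with a compact Python sketch. It is not the authors' code. It assumes the standard form Y = ρWY + Xβ + g(Zᵀα) + ε, approximates g by a cubic B-spline with equally spaced knots over the observed range of the index, profiles out β, the spline coefficients and σ² by least squares, and minimizes the negative concentrated Gaussian quasi-log-likelihood (n/2) log σ̂²(ρ, α) - log|I - ρW| over (ρ, α) with Nelder-Mead. The constraint ‖α‖ = 1 with α₁ > 0 is imposed through the reparameterization α = (1, v)/‖(1, v)‖ and ρ = tanh(r); the knot number, the parameterization, the simulated design, and all function names are illustrative assumptions.

```python
import numpy as np
from scipy.interpolate import BSpline
from scipy.optimize import minimize


def bspline_basis(u, n_interior=5, k=3):
    """Cubic B-spline design matrix with equally spaced knots on [min(u), max(u)]."""
    a, b = u.min(), u.max()
    t = np.r_[[a] * k, np.linspace(a, b, n_interior + 2), [b] * k]  # clamped knots
    n_basis = len(t) - k - 1
    # Identity coefficients give one column per basis function.
    return BSpline(t, np.eye(n_basis), k)(u)


def unpack(params):
    """Map free parameters to rho in (-1, 1) and a unit-norm alpha with alpha[0] > 0."""
    rho = np.tanh(params[0])
    direction = np.r_[1.0, params[1:]]
    return rho, direction / np.linalg.norm(direction)


def profile(params, y, X, Z, W):
    """Least-squares fit of (beta, spline coefficients) for fixed (rho, alpha)."""
    rho, alpha = unpack(params)
    S = np.eye(len(y)) - rho * W
    D = np.hstack([X, bspline_basis(Z @ alpha)])
    Sy = S @ y
    coef, *_ = np.linalg.lstsq(D, Sy, rcond=None)
    sigma2 = np.mean((Sy - D @ coef) ** 2)
    return rho, alpha, coef, sigma2, S


def neg_concentrated_loglik(params, y, X, Z, W):
    """(n/2) log sigma2_hat - log|I - rho W|, additive constants dropped."""
    _, _, _, sigma2, S = profile(params, y, X, Z, W)
    _, logdet = np.linalg.slogdet(S)
    return 0.5 * len(y) * np.log(sigma2) - logdet


def fit(y, X, Z, W):
    """Nelder-Mead over (r, v); returns rho, alpha, beta and sigma^2 estimates."""
    x0 = np.zeros(Z.shape[1])                          # rho = 0, alpha = (1, 0, ..., 0)
    simplex = np.vstack([x0, x0 + 0.5 * np.eye(x0.size)])
    res = minimize(neg_concentrated_loglik, x0, args=(y, X, Z, W), method="Nelder-Mead",
                   options={"initial_simplex": simplex, "xatol": 1e-6, "fatol": 1e-8,
                            "maxiter": 5000})
    rho, alpha, coef, sigma2, _ = profile(res.x, y, X, Z, W)
    return rho, alpha, coef[: X.shape[1]], sigma2


# Simulated example (hypothetical design): circular neighbors, g(u) = sin(2u).
rng = np.random.default_rng(0)
n = 300
idx = np.arange(n)
W = np.zeros((n, n))
W[idx, (idx - 1) % n] = 0.5                            # already row-normalized
W[idx, (idx + 1) % n] = 0.5
X = rng.standard_normal((n, 2))
Z = rng.uniform(-1.0, 1.0, size=(n, 2))
alpha0 = np.array([1.0, 1.0]) / np.sqrt(2.0)
mean = X @ np.array([1.0, -1.0]) + np.sin(2.0 * Z @ alpha0)
y = np.linalg.solve(np.eye(n) - 0.5 * W, mean + 0.3 * rng.standard_normal(n))
rho_hat, alpha_hat, beta_hat, sigma2_hat = fit(y, X, Z, W)
```

Nelder-Mead only locates a local optimum, so in practice several starting values and a data-driven choice of the number of knots would be advisable.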

    The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.

    This research was funded by a grant from the Key Projects of the National Natural Science Foundation of China (Grant No. 11731101).

    The authors declare there are no conflicts of interest.

    Lemma 1. Suppose that assumptions A1-A4 hold. Then, , for .

    Proof. By Corollary 6.21 in Schumaker [22], we have . Using this and arguing as in the proof of Lemma 1 of Tian et al. [13], it suffices to show .
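Lemma 1 rests on the spline approximation bound of Schumaker [22, Corollary 6.21]: for a sufficiently smooth function, the sup-norm error of spline approximation shrinks polynomially in the knot spacing h, at rate h^4 for cubic splines and a function with four bounded derivatives. The Python sketch below checks this numerically for sin on [0, π]; least-squares cubic splines are used as a convenient stand-in for the best approximation, and the test function and knot counts are arbitrary choices.

```python
import numpy as np
from scipy.interpolate import make_lsq_spline


def sup_error(f, n_interior, k=3, a=0.0, b=np.pi):
    """Sup-norm error of a least-squares cubic spline fit on a fine grid."""
    x = np.linspace(a, b, 2001)
    t = np.r_[[a] * k, np.linspace(a, b, n_interior + 2), [b] * k]  # clamped knots
    spline = make_lsq_spline(x, f(x), t, k=k)
    return np.max(np.abs(spline(x) - f(x)))


# 3, 7 and 15 interior knots give 4, 8 and 16 intervals, halving h each time.
errors = np.array([sup_error(np.sin, K) for K in (3, 7, 15)])
# Empirical convergence order; expected to be near 4 for cubic splines.
observed_order = np.log2(errors[:-1] / errors[1:])
```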

    Lemma 2. Suppose that assumptions A1-A4 hold. Then, .

    Proof. The eigenvalues of a projection matrix are all 0 or 1, and the number of eigenvalues equal to 1 equals the rank of the matrix. Since the trace of a matrix equals the sum of its eigenvalues, the trace of a projection matrix equals its rank, and the lemma follows.
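The argument in the proof of Lemma 2 can be checked numerically. The Python sketch below builds an orthogonal projection P = D(DᵀD)⁻¹Dᵀ from an arbitrary full-column-rank matrix D (a random stand-in for the design matrices in the paper) and its complement I - P, so one can confirm that the eigenvalues are 0 or 1 and that tr(P) = rank(D) and tr(I - P) = n - rank(D).

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 6
D = rng.standard_normal((n, p))            # full column rank with probability one
P = D @ np.linalg.solve(D.T @ D, D.T)      # P = D (D'D)^{-1} D'
M = np.eye(n) - P                          # complementary projection
eig = np.linalg.eigvalsh((P + P.T) / 2)    # symmetrize to remove rounding asymmetry
trace_P, trace_M = np.trace(P), np.trace(M)
```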

    Proof of Theorem 1. First, we prove that . Following Su and Jin [7] and Cheng and Chen [10], it suffices to show that

    (A1)

    and

    (A2)

    where is the complement of an open neighborhood of with diameter .

    To prove (A1), it is sufficient to show that

    (A3)

    From (3.4), we have

    (A4)

    Then,

    (A5)

    As in the proof of Theorem 1 in Tian et al. [13], Lemma 1 makes it easy to show that and , so that (A1) holds. Likewise, following Tian et al. [13] and using Lemma 1, (A2) can be established. The main difference is that our cross-sectional setting involves an unknown single-index function, whereas Tian et al. [13] employ a q-dimensional unknown function in a panel data setting.

    Next, we prove that . Since we now have and , the continuity of with respect to implies that . Recall that

    (A6)

    The second equality above relies on the fact that , and on the projection matrix annihilating the term . Then, by Lemmas 1 and 2, we have

    and

    where is the diagonal element of . Then, we have . The consistency proof for is analogous to that for : it suffices to show that its expectation converges to and its variance converges to 0. The details are omitted for brevity.

    Proof of Theorem 2. According to Theorem 1, we have . Then,

    where . As in the proof of Theorem 3 of Tian et al. [13], all the components above can be shown to be , which implies .

    By de Boor [11], and as in the proof of Theorem 3 in Tian et al. [13], the B-splines have the following properties: for , and

    where and are positive constants. By Corollary 6.21 of Schumaker [22], we have . Then,

    so .

    Proof of Theorem 3. Following Theorem 2 of Lee [1] and Theorem 3 of Tian et al. [13], the asymptotic normality of the parameter estimator can be derived from a Taylor expansion of . The first-order Taylor expansion at is

    where lies between and . Then,

    (A7)

    By Slutsky's theorem, the proof of the theorem is complete if

    (A8)

    and

    (A9)

    Recall that and . Combining Lemma 1 and Assumption A1, we have

    (A10)

    and

    (A11)

    Note that the above expressions are linear, quadratic, or cubic in , except for the terms involving . Combining this with , by continuity we obtain

    (A12)

    if . To show this, by the mean value theorem we have

    (A13)

    where lies between and . By Assumption A2, it is easy to see that , so (A12) is proved. To show

    (A14)

    we have

    (A15)

    which follows from Lemma 2. The existence of is guaranteed by Assumptions A2 and A4. By the law of large numbers, (A14) holds, and thus (A9) holds.

    The terms in (A10) are linear or quadratic functions of , and therefore all have mean . By the central limit theorem of Kelejian and Prucha [23], we have

    (A16)

    where

    (A17)

    with

    It is not difficult to observe that when follows , we have , i.e., .



    [1] L. F. Lee, Asymptotic distributions of quasi-maximum likelihood estimators for spatial autoregressive models, Econometrica, 72 (2004), 1899–1925. https://doi.org/10.1111/j.1468-0262.2004.00558.x
    [2] L. F. Lee, Consistency and efficiency of least squares estimation for mixed regressive, spatial autoregressive models, Econometric Theory, 18 (2002), 252–277. https://doi.org/10.1017/S0266466602182028
    [3] H. H. Kelejian, I. R. Prucha, A generalized spatial two-stage least squares procedure for estimating a spatial autoregressive model with autoregressive disturbances, J. Real Estate Finance Econ., 17 (1998), 99–121. https://doi.org/10.1023/A:1007707430416
    [4] L. F. Lee, Best spatial two stage least squares estimators for a spatial autoregressive model with autoregressive disturbances, Econometric Rev., 22 (2003), 307–335. https://doi.org/10.1081/ETC-120025891
    [5] L. J. Su, Semiparametric GMM estimation of spatial autoregressive models, J. Econometrics, 167 (2012), 543–560. https://doi.org/10.1016/j.jeconom.2011.09.034
    [6] X. Xu, L. F. Lee, Maximum likelihood estimation of a spatial autoregressive tobit model, J. Econometrics, 188 (2015), 264–280. https://doi.org/10.1016/j.jeconom.2015.05.004
    [7] L. J. Su, S. N. Jin, Profile quasi-maximum likelihood estimation of partially linear spatial autoregressive models, J. Econometrics, 157 (2010), 18–33. https://doi.org/10.1016/j.jeconom.2009.10.033
    [8] J. Du, X. Q. Sun, R. Y. Cao, Z. Z. Zhang, Statistical inference for partially linear additive spatial autoregressive models, Spat. Stat., 25 (2018), 52–67. https://doi.org/10.1016/j.spasta.2018.04.008
    [9] J. L. Wang, L. G. Xue, L. X. Zhu, Y. S. Chong, Estimation for a partial-linear single-index model, Ann. Stat., 38 (2010), 246–274. https://doi.org/10.1214/09-AOS712
    [10] S. L. Cheng, J. B. Chen, Estimation of partially linear single-index spatial autoregressive model, Stat. Pap., 62 (2021), 495–531. https://doi.org/10.1007/s00362-019-01105-y
    [11] C. de Boor, A Practical Guide to Splines, Springer, 1978.
    [12] Y. Zhang, C. X. Xia, L. R. Zeng, B-spline estimation for varying-coefficient single-index model, Chin. J. Appl. Prob. Stat., 29 (2013), 433–442. https://doi.org/10.3969/j.issn.1001-4268.2013.09.008
    [13] R. Q. Tian, M. J. Xia, D. K. Xu, Profile quasi-maximum likelihood estimation for semiparametric varying-coefficient spatial autoregressive panel models with fixed effects, Stat. Pap., 65 (2024), 5109–5143. https://doi.org/10.1007/s00362-024-01586-6
    [14] R. J. Carroll, J. Q. Fan, I. Gijbels, M. P. Wand, Generalized partially linear single-index models, J. Am. Stat. Assoc., 92 (1997), 477–489. https://doi.org/10.1080/01621459.1997.10474001
    [15] P. Yu, J. Du, Z. Z. Zhang, Single-index partially functional linear regression model, Stat. Pap., 61 (2020), 1107–1123. https://doi.org/10.1007/s00362-018-0980-6
    [16] A. C. Case, Spatial patterns in household demand, Econometrica, 59 (1991), 953–965. https://doi.org/10.2307/2938168
    [17] E. Y. Wu, S. L. Kuo, A study on the use of a statistical analysis model to monitor air pollution status in an air quality total quantity control district, Atmosphere, 4 (2013), 349–364. https://doi.org/10.3390/atmos4040349
    [18] L. Chen, H. Y. Long, J. H. Xu, B. Q. Wu, Z. Hang, T. Xing, et al., Deep citywide multisource data fusion-based air quality estimation, IEEE Trans. Cybern., 54 (2024), 111–122. https://doi.org/10.1109/TCYB.2023.3245618
    [19] J. Wang, L. J. Yang, Efficient and fast spline-backfitted kernel smoothing of additive models, Ann. Inst. Stat. Math., 61 (2009), 663–690. https://doi.org/10.1007/s10463-007-0157-x
    [20] L. J. Su, Z. L. Yang, Instrumental Variable Quantile Estimation of Spatial Autoregressive Models, 2011. Available from: https://ink.library.smu.edu.sg/soe_research/1074
    [21] T. F. Xie, R. Y. Cao, J. Du, Variable selection for spatial autoregressive models with a diverging number of parameters, Stat. Pap., 61 (2020), 1125–1145. https://doi.org/10.1007/s00362-018-0984-2
    [22] L. L. Schumaker, Spline Functions, Wiley, 1981.
    [23] H. H. Kelejian, I. R. Prucha, Specification and estimation of spatial autoregressive models with autoregressive and heteroscedastic disturbances, J. Econometrics, 157 (2010), 53–67. https://doi.org/10.1016/j.jeconom.2009.10.025
    © 2024 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
Corresponding author: Bin Chen, bchen63@163.com
    1. School of Materials Science and Engineering, Shenyang University of Chemical Technology, Shenyang 110142
