1. Introduction
The spatial autoregressive (SAR) model and its derivatives have been widely used in many areas such as economics, political science and public health. There is a large literature on spatial autoregressive models, including Anselin [1], LeSage [19], Anselin and Bera [2], Lee and Yu [20], LeSage and Pace [21], Lee [22] and Dai, Li and Tian [9]. In particular, Lee [22] used the generalized method of moments to make inference about the spatial autoregressive model. Xu and Li [25] investigated the instrumental variable (IV) and maximum likelihood estimators for the spatial autoregressive model using a nonlinear transformation of the dependent variable. However, the spatial autoregressive model may not be flexible enough to capture the nonlinear impact of some covariates because of its parametric structure. To enrich model adaptability and flexibility, several semiparametric spatial autoregressive models have been proposed. For example, Su [31] studied a semiparametric SAR model that includes nonparametric covariates. Su and Jin [32] proposed a partially linear SAR model with both linear covariates and nonparametric explanatory variables. Sun et al. [33] studied a semiparametric spatial dynamic model with a profile likelihood approach. Wei and Sun [36] derived a semiparametric generalized method of moments estimator. Hoshino [35] proposed a semiparametric series generalized method of moments estimator and established its consistency and asymptotic normality.
However, with the development of the economy and of scientific technology, huge amounts of data can now be easily collected and stored. In particular, some types of data are observed at high dimensions and frequencies and contain rich information; such data are usually called functional data. When data of this type enter a model as covariates, it is common to use the functional linear model (FLM). There is a vast literature on estimation and prediction for the FLM (see, for example, Reiss et al. [26], Ramsay and Dalzell [27], Delaigle and Hall [11], Aneiros-Pérez and Vieu [3]). Many methods have been proposed to estimate the slope function, such as Cardot et al. [5], Hall and Horowitz [14], Crambes et al. [8] and Shin [28]. In particular, Hall and Horowitz [14] established minimax convergence rates of estimation. Cai and Hall [6] proposed a functional principal components method, and a reproducing kernel Hilbert space approach was used in Yuan and Cai [7].
In many applications involving spatial data, there are often covariates with nonlinear effects as well as functional explanatory variables. This motivates us to propose a novel functional semiparametric spatial autoregressive model. The model is relatively flexible because it uses the functional linear model to handle the functional covariate and the semiparametric SAR model to allow spatial dependence and a nonlinear effect of a scalar covariate. Recently, some models have considered both functional covariates and spatial dependence. For instance, Pineda-Ríos [24] proposed a functional SAR model and used least squares and maximum likelihood methods to estimate the parameters; that model places the spatial effect on the error term rather than on the response variable. Huang et al. [12] considered a spatial functional linear model and developed an estimation method based on maximum likelihood and functional principal component analysis. Hu et al. [13] developed a generalized method of moments to estimate the parameters of a spatial functional linear model. In this paper, we propose a generalized method of moments estimator that is robust to heteroskedasticity and has an explicit closed form.
The rest of the paper is organized as follows. Section 2 introduces the proposed model and the estimation procedure. The asymptotic properties of the proposed estimators are established in Section 3. Section 4 conducts simulation studies to evaluate the empirical performance of the proposed estimators. Section 5 gives some discussion of the model. All technical proofs are provided in the Appendix.
2. Model and estimation method
2.1. Model and notations
Consider the following novel functional semiparametric spatial autoregressive model,
where \mathit{\boldsymbol{Y}} is the response variable, \rho is an unknown coefficient of the spatial neighboring effect, \mathit{\boldsymbol{W}} is a constant spatial weight matrix with a zero diagonal, \mathit{\boldsymbol{Z}} = (Z_1, ..., Z_p)' is a p-dimensional covariate vector and \boldsymbol{\theta} is its coefficient. \{X(t): t\in[0, 1]\} is a zero-mean, second-order (i.e. E|X(t)|^2 < \infty for all t\in[0, 1]) stochastic process defined on (\Omega, \mathcal{B}, P) with sample paths in L^2[0, 1], the Hilbert space of square integrable functions on [0, 1] with inner product \langle x, y\rangle = \int_{0}^{1}x(t)y(t)dt for all x, y\in L^2[0, 1] and norm \|x\| = \langle x, x\rangle^{1/2}. The slope function \beta(t) is a square integrable function on [0, 1] , \mathit{\boldsymbol{U}} is a random variable, and g(\cdot) is an unknown function whose support is taken to be [0, 1] without loss of generality. We assume E[g(\cdot)] = 0 to ensure the identifiability of the nonparametric function. \boldsymbol{\varepsilon} is a random error with zero mean and finite variance \sigma^2, independent of \mathit{\boldsymbol{Z}}, \mathit{\boldsymbol{U}} and X(t) .
Remark 1. Model (2.1) is flexible and nests several existing models. It generalizes both the semiparametric spatial autoregressive model [32] and the functional partial linear model [28], which correspond to the cases \beta(t) = 0 and \rho = 0 , respectively. The model can be represented as \mathit{\boldsymbol{Y}} = (\mathit{\boldsymbol{I}}-\rho \mathit{\boldsymbol{W}})^{-1}\int^{1}_{0} \mathit{\boldsymbol{X}}(t)\beta(t)dt+(\mathit{\boldsymbol{I}}-\rho \mathit{\boldsymbol{W}})^{-1} \mathit{\boldsymbol{Z}}' \boldsymbol{\theta}+(\mathit{\boldsymbol{I}}-\rho \mathit{\boldsymbol{W}})^{-1}g(\mathit{\boldsymbol{U}})+(\mathit{\boldsymbol{I}}-\rho \mathit{\boldsymbol{W}})^{-1} \boldsymbol{\varepsilon}. We assume that \mathit{\boldsymbol{I}}-\rho \mathit{\boldsymbol{W}} is invertible so that this representation is valid. Thus Y_i is also influenced by its neighbours' covariates X_{j}(t) for j\neq i . The parameter \rho measures the baseline impact of the neighbours: the greater the absolute value of \rho , the more strongly the response variable is affected by its neighbours.
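As a small numerical illustration of this reduced form, the sketch below generates \mathit{\boldsymbol{Y}} by solving the linear system (\mathit{\boldsymbol{I}}-\rho \mathit{\boldsymbol{W}})\mathit{\boldsymbol{Y}} = m + \varepsilon. The weight matrix, sample size and the vector m (standing in for the functional, linear and nonparametric parts) are synthetic assumptions, not data or quantities from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, rho = 5, 0.5

# Row-normalized weight matrix with zero diagonal, as in the model.
W = np.abs(rng.normal(size=(n, n)))
np.fill_diagonal(W, 0.0)
W /= W.sum(axis=1, keepdims=True)

m = rng.normal(size=n)      # stands in for Z'theta + <X, beta> + g(U)
eps = rng.normal(size=n)

A = np.eye(n) - rho * W
# Prefer solving the system to forming the explicit inverse.
Y = np.linalg.solve(A, m + eps)
```

Because A^{-1} is generally a dense matrix, each Y_i depends on every unit's covariates, which is exactly the neighbour spillover described in the remark.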
2.2. Estimation method
In this section, we describe a method to estimate the unknown parameters \rho and \boldsymbol{\theta} , the slope function \beta(\cdot) and the nonparametric function g(\cdot) . We use B-spline basis functions to approximate g(\cdot) and \beta(\cdot) . Let 0 = u_0 < u_1 < ... < u_{k_1+1} = 1 be a partition of the interval [0, 1] . Using the u_i as knots, we have N_1 = k_1+l_1+1 normalized B-spline basis functions of order l_1+1 that form a basis for the spline space. Collect the basis functions in a vector \mathit{\boldsymbol{B}}_1(t) = (B_{11}(t), ..., B_{1N_1}(t))' , so that the slope function \beta(\cdot) is approximated by \mathit{\boldsymbol{B}}'_1(\cdot) \boldsymbol{\gamma} . Similarly, let \mathit{\boldsymbol{B}}_2(u) = (B_{21}(u), ..., B_{2N_2}(u))' be the normalized B-spline basis function vector of order l_2+1 determined by k_2 interior knots in [0, 1] , used to approximate g(\cdot), where N_2 = k_2+l_2+1 . Then it follows that
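The basis construction above is standard and can be reproduced with scipy; the sketch below builds the vector \mathit{\boldsymbol{B}}_1(t) on a grid. The choices of k_1 = 2 interior knots and cubic splines (order 4, so N_1 = k_1 + l_1 + 1 = 6) mirror the simulation section but are otherwise illustrative.

```python
import numpy as np
from scipy.interpolate import BSpline

def bspline_basis(x, n_interior=2, degree=3):
    """Evaluate the normalized B-spline basis on [0, 1] at points x.

    With k interior knots and degree l there are k + l + 1 basis
    functions, matching N = k + l + 1 in the text.
    """
    interior = np.linspace(0, 1, n_interior + 2)[1:-1]
    knots = np.r_[[0.0] * (degree + 1), interior, [1.0] * (degree + 1)]
    n_basis = len(knots) - degree - 1
    # Column j is the j-th basis function B_j evaluated at x.
    return np.column_stack([
        BSpline(knots, np.eye(n_basis)[j], degree, extrapolate=False)(x)
        for j in range(n_basis)
    ])

B = bspline_basis(np.linspace(0, 1, 101))   # 101 x 6 design matrix
```

The rows of B sum to one (partition of unity), which is the "normalized" property used in the text.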
where \boldsymbol{\gamma} = (\gamma_1, ..., \gamma_{N_1})' and \boldsymbol{\zeta} = (\zeta_1, ..., \zeta_{N_2})' .
Let {\bf{D}} = \langle X(t), \mathit{\boldsymbol{B}}_1(t)\rangle = \bigg(\int_{0}^{1}X(t)B_{11}(t)dt, ..., \int_{0}^{1}X(t)B_{1N_1}(t)dt\bigg)', {\bf{D}}_i = \langle X_i(t), \mathit{\boldsymbol{B}}_1(t)\rangle . Then the model can be rewritten as
Let \mathit{\boldsymbol{P}} = \boldsymbol{\Pi}(\boldsymbol{\Pi}' \boldsymbol{\Pi})^{-1} \boldsymbol{\Pi}' denote the projection matrix onto the space spanned by \boldsymbol{\Pi} , where \boldsymbol{\Pi} = ({\bf{D}}', \mathit{\boldsymbol{B}}'_2(U))' . Similar to Zhang and Shen [39], profiling out the functional approximation, we obtain
Let {\bf{Q}} = (\mathit{\boldsymbol{W}} \mathit{\boldsymbol{Y}}, \mathit{\boldsymbol{Z}}) and \boldsymbol{\eta} = (\rho, \boldsymbol{\theta}')' . Applying the two stage least squares procedure proposed by Kelejian and Prucha [17], we propose the following estimator
where \mathit{\boldsymbol{M}} = \mathit{\boldsymbol{H}}(\mathit{\boldsymbol{H}}' \mathit{\boldsymbol{H}})^{-1} \mathit{\boldsymbol{H}}' and \mathit{\boldsymbol{H}} is a matrix of instrumental variables. Moreover,
Consequently, we use \hat{\beta}(t) = \mathit{\boldsymbol{B}}'_1(t)\hat{ \boldsymbol{\gamma}}, \hat{g}(u) = \mathit{\boldsymbol{B}}'_2(u)\hat{ \boldsymbol{\zeta}} as the estimators of \beta(t) and g(u), respectively.
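The profiled two-stage least squares step can be sketched in a few lines. We assume here the standard Kelejian–Prucha-type closed form \hat{\boldsymbol{\eta}} = ({\bf{Q}}'(\mathit{\boldsymbol{I}}-\mathit{\boldsymbol{P}})\mathit{\boldsymbol{M}}(\mathit{\boldsymbol{I}}-\mathit{\boldsymbol{P}}){\bf{Q}})^{-1}{\bf{Q}}'(\mathit{\boldsymbol{I}}-\mathit{\boldsymbol{P}})\mathit{\boldsymbol{M}}(\mathit{\boldsymbol{I}}-\mathit{\boldsymbol{P}})\mathit{\boldsymbol{Y}}, consistent with assumption C4 later in the paper; all matrices below are synthetic placeholders rather than data from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
Pi = rng.normal(size=(n, 6))   # stands in for (D, B_2(U)) basis columns
H = rng.normal(size=(n, 4))    # instrumental variables
Q = rng.normal(size=(n, 3))    # (WY, Z)
Y = rng.normal(size=n)

def proj(A):
    """Projection matrix onto the column space of A."""
    return A @ np.linalg.solve(A.T @ A, A.T)

P, M = proj(Pi), proj(H)
R = np.eye(n) - P              # residual-maker that profiles out Pi
G = R @ Q
eta_hat = np.linalg.solve(G.T @ M @ G, G.T @ M @ (R @ Y))

# Back out the spline coefficients (gamma', zeta')' by regressing the
# residual Y - Q eta_hat on Pi.
coef = np.linalg.solve(Pi.T @ Pi, Pi.T @ (Y - Q @ eta_hat))
```

The key design choice is that the spline part enters only through the projection P, so \hat{\boldsymbol{\eta}} retains an explicit closed form.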
For statistical inference based on \hat{ \boldsymbol{\eta}} , consistent estimators of the asymptotic covariance matrices are needed. Define the following estimator
and
where \|\cdot\| is the L^2 norm for a function or the Euclidean norm for a vector. To make statistical inference about \sigma^2 , we need the value \omega = E[(\varepsilon_1^2-\sigma^2)^2] . Therefore, we use the following estimator \hat{\omega} to estimate \omega
Similar to Zhang and Shen [39], we use an analogous idea for the construction of the instrumental variables. In the first step, the following instrumental variables are obtained: \tilde{ \mathit{\boldsymbol{H}}} = (\mathit{\boldsymbol{W}}(\mathit{\boldsymbol{I}}-\tilde{\rho} \mathit{\boldsymbol{W}})^{-1}(\mathit{\boldsymbol{Z}}, {\bf{D}}'\tilde{ \boldsymbol{\gamma}}, \mathit{\boldsymbol{B}}'_2(U)\tilde{ \boldsymbol{\zeta}}), \mathit{\boldsymbol{Z}}, \boldsymbol{\Pi}), where \tilde{\rho} , \tilde{ \boldsymbol{\gamma}} and \tilde{ \boldsymbol{\zeta}} are obtained by simply regressing \mathit{\boldsymbol{Y}} on the pseudo regressors \mathit{\boldsymbol{W}} \mathit{\boldsymbol{Y}}, \boldsymbol{\Pi}. In the second step, the instrumental variables \tilde{ \mathit{\boldsymbol{H}}} are used to obtain the estimators \bar{ \boldsymbol{\eta}} , \bar{ \boldsymbol{\gamma}} and \bar{ \boldsymbol{\zeta}} , which are needed to construct the following instrumental variables: \mathit{\boldsymbol{H}} = (\mathit{\boldsymbol{W}}(\mathit{\boldsymbol{I}}-\bar{\rho} \mathit{\boldsymbol{W}})^{-1}(\mathit{\boldsymbol{Z}}'\bar{ \boldsymbol{\theta}}+ {\bf{D}}'\bar{ \boldsymbol{\gamma}}+ \mathit{\boldsymbol{B}}'_2(U)\bar{ \boldsymbol{\zeta}}), \mathit{\boldsymbol{Z}}). Finally, we use the instrumental variables \mathit{\boldsymbol{H}} to obtain the final estimators \hat{\rho} and \hat{ \boldsymbol{\theta}} .
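The second-step instrument construction amounts to a few matrix operations once preliminary estimates are available. The sketch below forms \mathit{\boldsymbol{H}} = (\mathit{\boldsymbol{W}}(\mathit{\boldsymbol{I}}-\bar{\rho}\mathit{\boldsymbol{W}})^{-1}(\mathit{\boldsymbol{Z}}'\bar{\boldsymbol{\theta}}+{\bf{D}}'\bar{\boldsymbol{\gamma}}+\mathit{\boldsymbol{B}}'_2(U)\bar{\boldsymbol{\zeta}}), \mathit{\boldsymbol{Z}}); all inputs, including the preliminary estimates, are synthetic placeholders.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
W = np.abs(rng.normal(size=(n, n)))
np.fill_diagonal(W, 0.0)
W /= W.sum(axis=1, keepdims=True)

Z = rng.normal(size=(n, 2))
D = rng.normal(size=(n, 6))     # functional scores <X_i, B_1>
B2U = rng.normal(size=(n, 4))   # B_2 evaluated at U_i

# Placeholder preliminary estimates (in practice from the first step).
rho_bar = 0.3
theta_bar = np.array([1.0, -0.5])
gamma_bar, zeta_bar = rng.normal(size=6), rng.normal(size=4)

fitted_mean = Z @ theta_bar + D @ gamma_bar + B2U @ zeta_bar
S = W @ np.linalg.solve(np.eye(n) - rho_bar * W, np.eye(n))
H = np.column_stack([S @ fitted_mean, Z])   # final instrument matrix
```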
3. Asymptotic theory
In this section, we derive the asymptotic normality and rates of convergence of the estimators defined in the previous section. First, we introduce some notation. For convenience and simplicity, c denotes a generic positive constant, which may take different values at different places. Let \beta_0(\cdot) and g_0(\cdot) be the true values of the functions \beta(\cdot) and g(\cdot) , respectively. K(t, s) = {\rm{Cov}}(X(t), X(s)) denotes the covariance function of X(\cdot). a_n\sim b_n means that a_n/b_n is bounded away from zero and infinity as n\rightarrow \infty. Throughout the paper, we make the following assumptions.
C1 The matrix \mathit{\boldsymbol{I}}-\rho \mathit{\boldsymbol{W}} is nonsingular with |\rho| < 1.
C2 The row and column sums of the matrices \mathit{\boldsymbol{W}} and (\mathit{\boldsymbol{I}}-\rho \mathit{\boldsymbol{W}})^{-1} are bounded uniformly in absolute value for any |\rho| < 1 .
C3 For matrix \mathit{\boldsymbol{S}} = \mathit{\boldsymbol{W}}(\mathit{\boldsymbol{I}}-\rho \mathit{\boldsymbol{W}})^{-1} , there exists a constant \lambda_c such that \lambda_c \mathit{\boldsymbol{I}}- \mathit{\boldsymbol{S}} \mathit{\boldsymbol{S}}' is positive semidefinite for all n .
C4 Matrix \frac{1}{n}\tilde{ {\bf{Q}}}'(\mathit{\boldsymbol{I}}- \mathit{\boldsymbol{P}}) \mathit{\boldsymbol{M}}(\mathit{\boldsymbol{I}}- \mathit{\boldsymbol{P}})\tilde{ {\bf{Q}}}\rightarrow \boldsymbol{\Sigma} in probability for some positive definite matrix, where \tilde{ {\bf{Q}}} = (\mathit{\boldsymbol{W}}(\mathit{\boldsymbol{I}}-\rho \mathit{\boldsymbol{W}})^{-1}(\mathit{\boldsymbol{Z}}' \boldsymbol{\theta}+\int^{1}_{0}X(t)\beta(t)dt+g(\mathit{\boldsymbol{U}})), \mathit{\boldsymbol{Z}}).
C5 For the matrix \tilde{ {\bf{Q}}}, there exists a constant \lambda_c^{*} such that \lambda_c^{*} \mathit{\boldsymbol{I}}-\tilde{ {\bf{Q}}}\tilde{ {\bf{Q}}}' is positive semidefinite for all n .
C6 X(t) has a finite fourth moment, that is, E\|X(t)\|^4\leq c .
C7 K(t, s) is positive definite.
C8 The nonparametric function g(\cdot) has bounded and continuous derivatives up to order r(\geq 2) and the slope function \beta(t)\in C^{r}[0, 1].
C9 The density of \mathit{\boldsymbol{U}} , f_U(u) , is bounded away from 0 and \infty on [0, 1] . Furthermore, we assume that f_U(u) is continuously differentiable on [0, 1] .
C10 For the knots number k_j, (j = 1, 2) , it is assumed k_1\sim k_2\sim k.
Assumptions C1-C3 are standard in the spatial autoregressive model setting (see, for example, Lee [23], Kelejian and Prucha [18], Zhang and Shen [39]); they restrict the spatial weight matrix and the SAR parameter. Assumption C4 (see Du et al. [10]) is used to represent the asymptotic covariance matrix of \hat{ \boldsymbol{\eta}} . Moreover, assumption C4 implicitly requires that the generated regressors \mathit{\boldsymbol{C}} \mathit{\boldsymbol{Z}} and \mathit{\boldsymbol{Z}} , after removing their projections onto \boldsymbol{\Pi} , are not asymptotically multicollinear. Assumption C5 is required to ensure the identifiability of the parameter \boldsymbol{\eta} . Assumptions C6-C7 are commonly assumed in functional linear models [14]: assumption C6 is a mild restriction used to prove the convergence of our estimator, and assumption C7 guarantees the identifiability of \beta(t) . Assumption C8 ensures that \beta(\cdot) and g(\cdot) are sufficiently smooth to be approximated by basis functions in the spline space. Assumption C9 imposes a boundedness condition on the covariate; it is often assumed in asymptotic analyses of nonparametric regression problems (see, for example, [15,37]). Assumption C10 is required to achieve the optimal convergence rate of \hat{\beta}(\cdot) and \hat{g}(\cdot).
Let
where {\bf{V}} = \langle X(t), \beta_0(t)\rangle. The following theorems state the asymptotic properties of the estimators for parameter and nonparametric function.
Theorem 1. Suppose assumptions C1-C10 hold, then
Theorem 2. Assumptions C1-C10 hold and k\sim n^{\frac{1}{2r+1}} , then
Remark 2. Theorem 2 gives the consistency of function estimators. The slope function estimator \hat{\beta}(\cdot) and nonparametric function estimator \hat{g}(\cdot) have the same optimal global convergence rate established by Stone [29].
Theorem 3. Suppose assumptions C1-C10 hold, and E(|\varepsilon_1|^{4+r}) < \infty for some r > 0 , then
Remark 3. From the proof of Theorem 3, if {\rm{trace}}(\mathit{\boldsymbol{S}})/n = o(1) , it can be shown that
Theorem 4. Suppose assumptions C1-C10 hold and n/(k_1^{2r+1}) = n/(k_2^{2r+1}) = o(1) , for any fixed points t, u \in(0, 1) , as n\rightarrow \infty,
where \beta^{*}(t) = \mathit{\boldsymbol{B}}'_1(t) \boldsymbol{\gamma}_0 , g^*(u) = \mathit{\boldsymbol{B}}'_2(u) \boldsymbol{\zeta}_0 , \Xi(t) = \lim_{n\rightarrow \infty}\frac{\sigma^2}{k_1} \mathit{\boldsymbol{B}}'_1(t)\Delta_n \mathit{\boldsymbol{B}}_1(t) , \Lambda(u) = \lim_{n\rightarrow \infty}\frac{\sigma^2}{k_2} \mathit{\boldsymbol{B}}'_2(u)\Omega_n \mathit{\boldsymbol{B}}_2(u), \boldsymbol{\gamma}_0 and \boldsymbol{\zeta}_0 are defined in Lemma 1 of appendix.
Remark 4. The above conclusions are similar to those of Yu et al. [38], who gave the asymptotic normality of spline estimators in the single-index partial functional linear regression model. Note that \hat{\beta}(t)-\beta_0(t) = (\beta^*(t)-\beta_0(t))+(\hat{\beta}(t)-\beta^*(t)). We obtain \beta^*(t)-\beta_0(t) = O(k_1^{-r}) by Lemma 1 in the Appendix, and \hat{\beta}(t)-\beta^*(t) dominates \beta^*(t)-\beta_0(t) . Therefore we can use the asymptotic behavior of \hat{\beta}(t)-\beta^*(t) to describe the asymptotic behavior of \hat{\beta}(t)-\beta_0(t) .
The variances \Xi(t) and \Lambda(u) depend on the basis functions and knots, so different basis functions and knots yield different variance estimators. Moreover, the variance expressions contain unknown quantities, and replacing them by consistent estimators introduces approximation errors. Furthermore, there may be heteroscedasticity in the error term, in which case the estimator \hat{\sigma}^2 is not consistent. Consequently, we propose the following residual-based method to construct pointwise confidence intervals in practice.
It is crucial that the spatial structure be preserved when resampling data in models with spatial dependence [1]. Therefore, we employ a residual-based bootstrap procedure to derive the empirical pointwise standard errors of \hat{\beta}(t) and \hat{g}(\cdot) . The procedure is as follows:
(1) Based on the data set \{ \mathit{\boldsymbol{Y}}, \mathit{\boldsymbol{Z}}, X(t), \mathit{\boldsymbol{U}}\} and the spatial matrix \mathit{\boldsymbol{W}} , fit the proposed model and obtain the residual vector \hat{ \boldsymbol{\varepsilon}}_{1} = (\hat{\varepsilon}_{11}, ..., \hat{\varepsilon}_{n1})' . Then derive the centralized residual vector \hat{ \boldsymbol{\varepsilon}} .
(2) Draw a bootstrap sample \hat{ \boldsymbol{\varepsilon}}^{*} with replacement from the empirical distribution function of \hat{ \boldsymbol{\varepsilon}} and generate \mathit{\boldsymbol{Y}}^{*} = (\mathit{\boldsymbol{I}}-\hat{\rho} \mathit{\boldsymbol{W}})^{-1}(\mathit{\boldsymbol{Z}}'\hat{ \boldsymbol{\theta}}+ {\bf{D}}'\hat{ \boldsymbol{\gamma}}+ \mathit{\boldsymbol{B}}'_2(U)\hat{ \boldsymbol{\zeta}}+\hat{ \boldsymbol{\varepsilon}}^{*}).
(3) Based on the new data set \{ \mathit{\boldsymbol{Y}}^*, \mathit{\boldsymbol{Z}}, X(t), \mathit{\boldsymbol{U}}\} and the spatial matrix \mathit{\boldsymbol{W}} , fit the proposed model again to derive the estimators \hat{\beta}^*(t) and \hat{g}^*(u) . Repeat this process many times. Then, for given t and u , calculate the empirical variances of \hat{\beta}^*(t) and \hat{g}^*(u) , respectively, and use them to construct the confidence intervals.
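Steps (1)-(3) can be sketched end to end on a toy model. Since the full semiparametric fit is lengthy, the hypothetical `fit` function below is a simplified 2SLS stand-in for the paper's estimation procedure; everything else (weight matrix, data, true parameters) is synthetic. The point of the sketch is step (2): resampling only the residuals and regenerating Y through (\mathit{\boldsymbol{I}}-\hat{\rho}\mathit{\boldsymbol{W}})^{-1} preserves the spatial structure.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 60
W = np.abs(rng.normal(size=(n, n)))
np.fill_diagonal(W, 0.0)
W /= W.sum(axis=1, keepdims=True)
X = rng.normal(size=(n, 2))
rho0, beta0 = 0.4, np.array([1.0, -1.0])
Y = np.linalg.solve(np.eye(n) - rho0 * W, X @ beta0 + rng.normal(size=n))

def fit(Y):
    """Stand-in estimator: 2SLS with instruments (WX, X)."""
    H = np.column_stack([W @ X, X])
    Qm = np.column_stack([W @ Y, X])
    M = H @ np.linalg.solve(H.T @ H, H.T)
    coef = np.linalg.solve(Qm.T @ M @ Qm, Qm.T @ M @ Y)
    return coef[0], X @ coef[1:]            # (rho_hat, fitted mean)

rho_hat, mean_hat = fit(Y)
resid = Y - rho_hat * (W @ Y) - mean_hat
resid -= resid.mean()                       # step (1): centre residuals

boot = []
for _ in range(200):
    e_star = rng.choice(resid, size=n, replace=True)        # step (2)
    Y_star = np.linalg.solve(np.eye(n) - rho_hat * W, mean_hat + e_star)
    boot.append(fit(Y_star)[0])             # step (3): refit, keep rho*
se_rho = np.std(boot, ddof=1)               # empirical standard error
```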
4. Simulation studies
In this section, we use simulation examples to study the properties of the proposed estimators. The data is generated from the following model:
where \rho = 0.5, \beta(t) = \sqrt{2}\sin(\pi t/2)+3\sqrt{2}\sin(3\pi t/2) and X(t) = \sum_{j = 1}^{50}\gamma_{j}\phi_{j}(t) , where the \gamma_{j} are independent normal with mean 0 and variance \lambda_{j} = ((j-0.5)\pi)^{-2} , and \phi_{j}(t) = \sqrt{2}\sin((j-0.5)\pi t) . Z_{i1} and Z_{i2} are independent standard normal, \theta_1 = \theta_2 = 1 , U_i\sim U(0, 1) , g(u) = \sin(\frac{\pi(u-A)}{C-A}) with A = \frac{\sqrt{3}}{2}-\frac{1.654}{\sqrt{12}} and C = \frac{\sqrt{3}}{2}+\frac{1.654}{\sqrt{12}} . The spatial weight matrix \mathit{\boldsymbol{W}} = (w_{ij})_{n\times n} is generated by w_{ij} = 0.3^{|i-j|}I(i\neq j), 1\leq i, j \leq n , with w_{ii} = 0, i = 1, ..., n , and is then standardized so that each row sums to one. We consider the following three error distributions: (1) \varepsilon_i\sim N(0, \sigma^2); (2) \varepsilon_i\sim 0.75t(3); (3) \varepsilon_i\sim (1+0.5U_i)N(0, \sigma^2), where \sigma^2 = 1 . To compare different magnitudes of \rho , we also set \rho \in \{0.2, 0.7\} with error term N(0, \sigma^2) . Simulation results are based on 1000 replications.
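The simulation design above can be sketched directly: X(t) is simulated from its truncated Karhunen-Loève expansion, the AR-type weight matrix is row-normalized, and Y is generated through the reduced form. The grid size (201 points) and n = 100 are illustrative choices; the integral \langle X_i, \beta\rangle is approximated by a simple Riemann sum.

```python
import numpy as np

rng = np.random.default_rng(4)
n, J = 100, 50
t = np.linspace(0, 1, 201)
dt = t[1] - t[0]

# X(t) = sum_j gamma_j phi_j(t) with gamma_j ~ N(0, lambda_j).
lam = ((np.arange(1, J + 1) - 0.5) * np.pi) ** (-2.0)
phi = np.sqrt(2) * np.sin(np.outer(np.arange(1, J + 1) - 0.5, np.pi * t))
scores = rng.normal(size=(n, J)) * np.sqrt(lam)
X = scores @ phi                                  # n sample paths on the grid

beta = np.sqrt(2) * np.sin(np.pi * t / 2) + 3 * np.sqrt(2) * np.sin(3 * np.pi * t / 2)
Xbeta = (X * beta).sum(axis=1) * dt               # Riemann sum for <X_i, beta>

Z = rng.normal(size=(n, 2))
theta = np.array([1.0, 1.0])
U = rng.uniform(size=n)
A_ = np.sqrt(3) / 2 - 1.654 / np.sqrt(12)
C_ = np.sqrt(3) / 2 + 1.654 / np.sqrt(12)
g = np.sin(np.pi * (U - A_) / (C_ - A_))

i = np.arange(n)
W = 0.3 ** np.abs(i[:, None] - i[None, :])
np.fill_diagonal(W, 0.0)
W /= W.sum(axis=1, keepdims=True)                 # row-standardize

rho = 0.5
eps = rng.normal(size=n)                          # error design (1)
Y = np.linalg.solve(np.eye(n) - rho * W, Xbeta + Z @ theta + g + eps)
```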
To achieve good numerical performance, the orders l_1 and l_2 of the splines and the numbers of interior knots k_1 and k_2 must be chosen. To reduce the computational burden, we use cubic B-splines with four evenly distributed knots (i.e., k_1 = k_2 = 2 ) for the slope function \beta(\cdot) and the nonparametric function g(\cdot) , respectively. These choices of k_1 and k_2 are small enough to avoid overfitting in typical problems with moderate sample sizes and large enough to flexibly approximate many smooth functions. We use the square root of the average squared errors (RASE) to assess the performance of the estimators \hat{\beta}(\cdot) and \hat{g}(\cdot) , respectively:
where \{t_i, i = 1, ..., n_1\} and \{u_i, i = 1, ..., n_2\} , with n_1 = n_2 = 200 , are equally spaced grid points in the domains of \beta(\cdot) and g(\cdot) , respectively.
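The RASE criterion is simply the root of the average squared error of an estimated curve over the grid. The sketch below uses the true \beta_0(t) from the simulation design and an artificial perturbed "estimate" for illustration; the perturbation is not from the paper.

```python
import numpy as np

def rase(f_hat, f_true, grid):
    """Root of the average squared error over a grid of points."""
    return np.sqrt(np.mean((f_hat(grid) - f_true(grid)) ** 2))

grid = np.linspace(0, 1, 200)                   # n1 = 200 grid points
beta0 = lambda t: np.sqrt(2) * np.sin(np.pi * t / 2) \
    + 3 * np.sqrt(2) * np.sin(3 * np.pi * t / 2)
beta_hat = lambda t: beta0(t) + 0.05 * np.cos(2 * np.pi * t)  # toy estimate

r = rase(beta_hat, beta0, grid)
```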
Tables 1–3 show simulation results for the different error distributions. Table 4 presents results for the different magnitudes of \rho with error term N(0, 1) . The tables report the bias (Bias), standard deviation (SD), standard error (SE) and coverage probability (CP) at the nominal 95\% level for the parametric estimators, and the mean and standard deviation (SD) of RASE _j (j = 1, 2) for \hat{\beta}(\cdot) and \hat{g}(\cdot) . The simulation results can be summarized as follows:
(1) The estimators \hat{\rho}, \hat{\theta}_1, \hat{\theta}_2, \hat{\sigma}^2 are approximately unbiased, and the estimated standard errors are close to the sample standard deviations under the normal error distribution. The empirical coverage probabilities approximate the nominal 95\% level well.
(2) Figure 1 gives an example of the estimated function curves \hat{\beta}(\cdot) and \hat{g}(\cdot) and their empirical 95% confidence intervals with sample size n = 300 for error term N(0, 1) . From the mean and standard deviation (SD) of RASE _j (j = 1, 2) , combined with Figure 1, we conclude that the proposed function estimators \hat{\beta}(\cdot) and \hat{g}(\cdot) perform well.
(3) For error terms 0.75t(3) and (1+0.5U_i)N(0, 1) , the estimators \hat{\rho}, \hat{\theta}_1, \hat{\theta}_2 are approximately unbiased, and the estimated standard errors are close to the sample standard deviations. In addition, the mean and standard deviation of the RASE of the estimated functions \hat{\beta}(\cdot) and \hat{g}(\cdot) decrease as the sample size increases. This indicates that the parametric and nonparametric estimators also perform well under non-normal error terms.
(4) From Tables 1 and 4, as the baseline spatial effect \rho increases, the SE and SD of \hat{\rho} decrease. Across the different magnitudes of \rho , the Bias and SD of the parametric estimators \hat{\theta}_1 and \hat{\theta}_2 , and the mean RASE of \hat{\beta}(\cdot) and \hat{g}(\cdot) , remain stable. This suggests that the magnitude of \rho does not affect the other parametric and nonparametric estimators.
5. Discussion
In this paper, a novel functional semiparametric spatial autoregressive model is proposed, which incorporates functional covariates into the semiparametric spatial autoregressive model. The slope function and the nonparametric function are approximated by B-spline basis functions, and a generalized method of moments is proposed to estimate the parameters. Under mild conditions, we establish the asymptotic properties of the proposed estimators.
To apply our model in practice, the response variable should exhibit spatial dependence, and there should be covariates with nonlinear effects together with functional variables. A problem of practical interest is to extend our model to accommodate functional covariates and a single-index function simultaneously. Moreover, testing for spatial dependence and for the nonlinear effects of covariates is an important issue. These topics are left for future work.
Acknowledgments
We would like to thank the referees for their helpful suggestions and comments, which led to the improvement of this article. Bai's work was supported by the National Natural Science Foundation of China (No. 11771268).
Conflict of interest
No potential conflict of interest was reported by the authors.
A. Appendix
Lemma 1. Assume condition C8 holds for g_0(u) and \beta_0(t) . Then there exist \boldsymbol{\gamma}_0 and \boldsymbol{\zeta}_0 such that
where \boldsymbol{\gamma}_0 = (\gamma_{01}, ..., \gamma_{0N_1})' , \boldsymbol{\zeta}_0 = (\zeta_{01}, ..., \zeta_{0N_2})' and c_1 > 0 , c_2 > 0 depend only on l_1 and l_2 , respectively.
Proof of Lemma 1. This follows from the approximation properties of splines ([4,16,34]).
Proof of Theorem 1. The proof is similar to that of Theorem 1 in [10] and is omitted here.
Proof of Theorem 2. Let \delta = n^{-\frac{r}{2r+1}} , \mathit{\boldsymbol{T}}_1 = \delta^{-1}(\boldsymbol{\gamma}- \boldsymbol{\gamma}_0) , \mathit{\boldsymbol{T}}_2 = \delta^{-1}(\boldsymbol{\zeta}- \boldsymbol{\zeta}_0) and \mathit{\boldsymbol{T}} = (\mathit{\boldsymbol{T}}'_1, \mathit{\boldsymbol{T}}'_2)' . We prove that for any given \epsilon > 0, there exists a sufficiently large constant L = L_{\epsilon} such that
where \boldsymbol{\phi}_0 = (\boldsymbol{\gamma}'_0, \boldsymbol{\zeta}'_0)' , l(\boldsymbol{\gamma}, \boldsymbol{\zeta}) = \sum_{i = 1}^{n}(Y_i- {\bf{Q}}'_i\hat{ \boldsymbol{\eta}}- {\bf{D}}'_i \boldsymbol{\gamma}- \mathit{\boldsymbol{B}}'_2(u_i) \boldsymbol{\zeta})^2 . This implies that, with probability at least 1-\epsilon , there exists a local minimizer in the ball \{ \boldsymbol{\phi}_0+\delta \mathit{\boldsymbol{T}}:\| \mathit{\boldsymbol{T}}\|\leq L\} . By Taylor expansion and simple calculation, it holds that
where R_{1i} = \langle X_i(t), \beta_0(t)- \mathit{\boldsymbol{B}}'_1(t) \boldsymbol{\gamma}_0\rangle, R_{2i} = g_0(u_i)- \mathit{\boldsymbol{B}}'_2(u_i) \boldsymbol{\zeta}_0 , V_i = {\bf{D}}'_i \mathit{\boldsymbol{T}}_1+ \mathit{\boldsymbol{B}}'_2(u_i) \mathit{\boldsymbol{T}}_2 . By assumption C6 and Lemmas 1 and 8 of Stone [30], we derive that \|R_{1i}\|\leq ck_1^{-r}, \|R_{2i}\| = O_p(k_2^{-r}) . Then, by simple calculation, we obtain
Similarly, it holds that \sum_{i = 1}^{n}\varepsilon_iV_i = O_p(\sqrt{n})\| \mathit{\boldsymbol{T}}\|, \sum_{i = 1}^{n}R_{2i}V_i = O_p(nk^{-r})\| \mathit{\boldsymbol{T}}\|, \sum_{i = 1}^{n}V^2_i = O_p(n)\| \mathit{\boldsymbol{T}}\|^2. Similar to the proof of Theorem 2 in Du et al. [10], we get (\boldsymbol{\eta}-\hat{ \boldsymbol{\eta}})' {\bf{Q}}' {\bf{Q}}(\boldsymbol{\eta}-\hat{ \boldsymbol{\eta}}) = O_p(1). Then it holds that \sum_{i = 1}^{n} {\bf{Q}}_i(\boldsymbol{\eta}-\hat{ \boldsymbol{\eta}})V_i = O_p(\sqrt{n})\| \mathit{\boldsymbol{T}}\|. Consequently, we show that A_1 = O_p(n\delta^2)\| \mathit{\boldsymbol{T}}\|, A_2 = O_p(n\delta^2)\| \mathit{\boldsymbol{T}}\|^2. Then, by choosing L sufficiently large, A_2 dominates A_1 uniformly in \| \mathit{\boldsymbol{T}}\| = L. Thus, there exist local minimizers \hat{ \boldsymbol{\gamma}}, \hat{ \boldsymbol{\zeta}} such that \| \boldsymbol{\gamma}-\hat{ \boldsymbol{\gamma}}\| = O_p(\delta), \| \boldsymbol{\zeta}-\hat{ \boldsymbol{\zeta}}\| = O_p(\delta).
Let R_{1k_1}(t) = \beta_0(t)- \mathit{\boldsymbol{B}}'_1(t) \boldsymbol{\gamma}_0 . Then we get
Since \| \boldsymbol{\gamma}-\hat{ \boldsymbol{\gamma}}\| = O_p(\delta) and \|\int_{0}^{1} \mathit{\boldsymbol{B}}_1(t) \mathit{\boldsymbol{B}}'_1(t)dt\| = O(1) , then we have
In addition, by Lemma 1, it holds that \int_{0}^{1} R^2_{1k_1}(t)dt = O_p(\delta^2) . Thus, we obtain \|\hat{\beta}(t)-\beta_0(t)\|^2 = O_p(\delta^2) . Similarly, we get \|\hat{g}(u)-g_0(u)\|^2 = O_p(\delta^2) .
Proof of Theorem 3. The proof is similar to that of Theorem 3 in [10] and is omitted here.
Proof of Theorem 4. By the definition of l(\boldsymbol{\gamma}, \boldsymbol{\zeta}) in the proof of Theorem 2, we have
where \tilde{e}_i = \varepsilon_i+R_{1i}+R_{2i}, R_{1i} = \langle X_i(t), \beta_0(t)- \mathit{\boldsymbol{B}}'_1(t) \boldsymbol{\gamma}_0\rangle, R_{2i} = g_0(u_i)- \mathit{\boldsymbol{B}}'_2(u_i) \boldsymbol{\zeta}_0. The remainder is o_p(1) because \frac{1}{n}\sum_{i = 1}^{n} {\bf{Q}}_i(\hat{ \boldsymbol{\eta}}- \boldsymbol{\eta}) = o_p(1) by Theorem 1. In addition, we have
It follows from (A.2) that
Let
By substituting (A.3) into (A.1), we obtain
Since \hat{\beta}(t)-\beta^*(t) = \mathit{\boldsymbol{B}}'_1(t)(\hat{ \boldsymbol{\gamma}}- \boldsymbol{\gamma}_0) , for any t\in(0, 1), as n\rightarrow \infty , by the law of large numbers, Slutsky's theorem and the properties of the multivariate normal distribution, we obtain
where \Xi(t) = \lim_{n\rightarrow \infty}\frac{\sigma^2}{k_1} \mathit{\boldsymbol{B}}'_1(t)\Delta_n \mathit{\boldsymbol{B}}_1(t) . Similar arguments hold for \hat{g}(u) .