1. Introduction
The spatial autoregressive (SAR) model and its derivatives have been widely used in many areas such as economics, political science and public health. There is an extensive literature on spatial autoregressive models; see, for example, Anselin [1], LeSage [19], Anselin and Bera [2], Lee and Yu [20], LeSage and Pace [21], Lee [22], and Dai, Li and Tian [9]. In particular, Lee [22] utilised the generalized method of moments to make inference about the spatial autoregressive model. Xu and Li [25] investigated the instrumental variable (IV) and maximum likelihood estimators for the spatial autoregressive model using a nonlinear transformation of the dependent variable. However, the spatial autoregressive model may not be flexible enough to capture the nonlinear impact of some covariates because of its parametric structure. To enrich model adaptability and flexibility, several semiparametric spatial autoregressive models have been proposed. For example, Su [31] studied a semiparametric SAR model that includes nonparametric covariates. Su and Jin [32] proposed a partially linear SAR model with both linear covariates and nonparametric explanatory variables. Sun et al. [33] studied a semiparametric spatial dynamic model with a profile likelihood approach. Wei and Sun [36] derived a semiparametric generalized method of moments estimator. Hoshino [35] proposed a semiparametric series generalized method of moments estimator and established the consistency and asymptotic normality of the proposed estimator.
However, with the development of the economy and of science and technology, huge amounts of data can easily be collected and stored. In particular, some types of data are observed in high dimensions and at high frequencies and contain rich information; such data are usually called functional data. When covariates of this type are included in a model, it is common to use the functional linear model (FLM). There is a vast literature on estimation and prediction for the FLM (see, for example, Reiss et al. [26], Ramsay and Dalzell [27], Delaigle and Hall [11], Aneiros-Pérez and Vieu [3]). Many methods have been proposed to estimate the slope function, such as Cardot et al. [5], Hall and Horowitz [14], Crambes et al. [8] and Shin [28]. In particular, Hall and Horowitz [14] established minimax convergence rates of estimation. Cai and Hall [6] proposed a functional principal components method, and a reproducing kernel Hilbert space approach was used in Yuan and Cai [7].
In many applications with spatial data, there are often covariates with nonlinear effects as well as functional explanatory variables. This motivates us to propose an interesting and novel functional semiparametric spatial autoregressive model. The model is relatively flexible because it utilises the functional linear model to deal with the functional covariate and the semiparametric SAR model to allow spatial dependence and a nonlinear effect of a scalar covariate. Recently, some models have considered both functional covariates and spatial dependence. For instance, Pineda-Ríos [24] proposed a functional SAR model and used least squares and maximum likelihood methods to estimate the parameters; that model places the spatial effect on the error term rather than on the response variable. Huang et al. [12] considered a spatial functional linear model and developed an estimation method based on maximum likelihood and functional principal component analysis. Hu et al. [13] developed generalized method of moments estimators for the parameters of a spatial functional linear model. In this paper, we propose a generalized method of moments estimator that is heteroskedasticity robust and has an explicit closed form.
The rest of the paper is organized as follows. Section 2 introduces the proposed model and the estimation procedure. The asymptotic properties of the proposed estimators are established in Section 3. Section 4 conducts simulation studies to evaluate the empirical performance of the proposed estimators. Section 5 gives some discussion of the model. All technical proofs are provided in the Appendix.
2. Model and estimation method
2.1. Model and notations
Consider the following novel functional semiparametric spatial autoregressive model,
where Y is the response variable, ρ is an unknown coefficient of the spatial neighboring effect, W is a constant spatial weight matrix with zero diagonal, Z=(Z1,...,Zp)′ is a p-dimensional covariate vector and θ is its coefficient vector. {X(t):t∈[0,1]} is a zero-mean, second-order (i.e. E|X(t)|²<∞ for all t∈[0,1]) stochastic process defined on (Ω,B,P) with sample paths in L²[0,1], the Hilbert space of square integrable functions on [0,1] equipped with the inner product ⟨x,y⟩=∫₀¹x(t)y(t)dt for x,y∈L²[0,1] and the norm ‖x‖=⟨x,x⟩^{1/2}. The slope function β(t) is a square integrable function on [0,1], U is a random variable, and g(⋅) is an unknown function whose support is taken to be [0,1] without loss of generality. We assume E[g(U)]=0 to ensure the identifiability of the nonparametric function. ε is a random error with zero mean and finite variance σ², independent of Z, U and X(t).
Remark 1. Model (2.1) is flexible and includes several existing models as special cases. It generalizes both the semiparametric spatial autoregressive model [32] and the functional partial linear model [28], which correspond to the cases β(t)=0 and ρ=0, respectively. The model can be represented as Y=(I−ρW)⁻¹∫₀¹X(t)β(t)dt+(I−ρW)⁻¹Z′θ+(I−ρW)⁻¹g(U)+(I−ρW)⁻¹ε, where we assume that I−ρW is invertible so that this representation is valid. Thus Yi is also influenced by its neighbours' covariates Xj(t), j≠i. The parameter ρ measures the baseline impact of the neighbours: the greater the absolute value of ρ, the more strongly the response variable is affected by its neighbours.
2.2. Estimation method
In this section, we give a method to estimate the unknown parameters ρ and θ, the slope function β(⋅) and the nonparametric function g(⋅). We use B-spline basis functions to approximate g(⋅) and β(⋅). Let 0=u0<u1<...<uk1+1=1 be a partition of the interval [0,1]. Using the ui as knots, we have N1=k1+l1+1 normalized B-spline basis functions of order l1+1 that form a basis for the spline space. Collect the basis functions in the vector B1(t)=(B11(t),...,B1N1(t))′; the slope function β(⋅) is then approximated by B′1(⋅)γ. Similarly, let B2(u)=(B21(u),...,B2N2(u))′ be the normalized B-spline basis function vector determined by k2 interior knots in [0,1] and order l2+1, used to approximate g(⋅), where N2=k2+l2+1. Then it follows that
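As a concrete illustration, the normalized B-spline bases B1(⋅) and B2(⋅) can be generated numerically. The sketch below uses SciPy; the function name and default choices are ours, not the paper's:

```python
import numpy as np
from scipy.interpolate import BSpline

def bspline_basis(x, n_interior=2, degree=3):
    """Evaluate all normalized B-spline basis functions on [0, 1] at points x.

    With k = n_interior interior knots and splines of order degree + 1, there
    are N = n_interior + degree + 1 basis functions (N = k + l + 1 in the
    paper's notation).  Returns an array of shape (len(x), N).
    """
    interior = np.linspace(0.0, 1.0, n_interior + 2)[1:-1]
    # Clamped knot vector: boundary knots repeated degree + 1 times.
    knots = np.r_[np.zeros(degree + 1), interior, np.ones(degree + 1)]
    n_basis = len(knots) - degree - 1
    # An identity coefficient matrix evaluates every basis function at once.
    return BSpline(knots, np.eye(n_basis), degree)(np.asarray(x, dtype=float))

x = np.linspace(0.0, 1.0, 101)
B1 = bspline_basis(x, n_interior=2, degree=3)  # cubic splines, N1 = 6 columns
```

Each row of the returned matrix contains the basis functions evaluated at one grid point, so stacking rows over observations gives the design matrix B′2(U) (and, after integrating against Xi(t), the matrix D below).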
where γ=(γ1,...,γN1)′ and ζ=(ζ1,...,ζN2)′.
Let D=⟨X(t),B1(t)⟩=(∫₀¹X(t)B11(t)dt,...,∫₀¹X(t)B1N1(t)dt)′ and Di=⟨Xi(t),B1(t)⟩. Then the model can be rewritten as
Let P=Π(Π′Π)⁻¹Π′ denote the projection matrix onto the space spanned by Π, where Π=(D′,B′2(U))′. Similar to Zhang and Shen [39], profiling out the functional approximation, we obtain
Let Q=(WY,Z) and η=(ρ,θ′)′. Applying the two-stage least squares procedure proposed by Kelejian and Prucha [17], we propose the following estimator
where M=H(H′H)⁻¹H′ and H is a matrix of instrumental variables. Moreover,
Consequently, we use ˆβ(t)=B′1(t)ˆγ and ˆg(u)=B′2(u)ˆζ as the estimators of β(t) and g(u).
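The estimation steps above can be sketched in a few lines of linear algebra. The closed form used below, ˆη=[Q′(I−P)M(I−P)Q]⁻¹Q′(I−P)M(I−P)Y followed by back-substitution for the spline coefficients, is our reading of the profiling and instrumenting steps; treat it as a sketch rather than the paper's exact implementation:

```python
import numpy as np

def proj(A):
    """Projection matrix onto the column space of A."""
    return A @ np.linalg.solve(A.T @ A, A.T)

def profiled_2sls(Y, Q, Pi, H):
    """Sketch of the profiled two-stage least squares estimator.

    Y  : (n,)       response vector
    Q  : (n, 1+p)   regressors (WY, Z)
    Pi : (n, N1+N2) spline design matrix (D, B2(U))
    H  : (n, q)     instrumental variables
    Returns eta_hat = (rho_hat, theta_hat')' and the spline coefficients
    (gamma_hat', zeta_hat')' recovered by back-substitution.
    """
    n = len(Y)
    R = np.eye(n) - proj(Pi)   # profile out the spline part: I - P
    M = proj(H)                # instrument projection matrix
    A = Q.T @ R @ M @ R @ Q
    eta_hat = np.linalg.solve(A, Q.T @ R @ M @ R @ Y)
    coef_hat = np.linalg.solve(Pi.T @ Pi, Pi.T @ (Y - Q @ eta_hat))
    return eta_hat, coef_hat
```

With exact (noise-free) data the routine recovers the true coefficients, which is a convenient sanity check of the algebra.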
For statistical inference based on ˆη, consistent estimators of the asymptotic covariance matrices are needed. Define the following estimator
and
where ‖⋅‖ is the L² norm for a function or the Euclidean norm for a vector. To make statistical inference about σ², we need the quantity ω=E[(ε1²−σ²)²]. Therefore, we use the following estimator ˆω of ω
Similar to Zhang and Shen [39], we use an analogous idea to construct the instrumental variables. In the first step, the instrumental variables ˜H=(W(I−˜ρW)⁻¹(Z,D′˜γ,B′2(U)˜ζ),Z,Π) are obtained, where ˜ρ, ˜γ and ˜ζ come from simply regressing Y on the pseudo regressors (WY,Π). In the second step, the instrumental variables ˜H are used to obtain the estimators ˉη, ˉγ and ˉζ, which are needed to construct the instrumental variables H=(W(I−ˉρW)⁻¹(Z′ˉθ+D′ˉγ+B′2(U)ˉζ),Z). Finally, we use the instrumental variables H to obtain the final estimators ˆρ and ˆθ.
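The first step of this construction can be sketched as follows; the helper name and the naive least squares pilot fit are our illustrative choices:

```python
import numpy as np

def first_step_instruments(Y, W, Z, D, B2U):
    """First-step instruments, mirroring the construction in the text (sketch).

    A naive least squares regression of Y on (WY, Pi) gives pilot values
    rho~, gamma~, zeta~, which are plugged into
    H~ = (W(I - rho~ W)^{-1}(Z, D gamma~, B2(U) zeta~), Z, Pi).
    """
    n = len(Y)
    Pi = np.hstack([D, B2U])
    X = np.hstack([(W @ Y)[:, None], Pi])
    b = np.linalg.lstsq(X, Y, rcond=None)[0]
    rho0 = b[0]
    gam = b[1:1 + D.shape[1]]
    zet = b[1 + D.shape[1]:]
    G = W @ np.linalg.inv(np.eye(n) - rho0 * W)
    return np.hstack([G @ Z,
                      (G @ (D @ gam))[:, None],
                      (G @ (B2U @ zet))[:, None],
                      Z, Pi])
```

The second step repeats the same pattern with ˜H in place of the pseudo regressors, so it is omitted here.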
3. Asymptotic theory
In this section, we derive the asymptotic normality and rates of convergence of the estimators defined in the previous section. Firstly, we introduce some notation. For convenience and simplicity, c denotes a generic positive constant that may take different values at different places. Let β0(⋅) and g0(⋅) be the true values of the functions β(⋅) and g(⋅), respectively. K(t,s)=Cov(X(t),X(s)) denotes the covariance function of X(⋅). an∼bn means that an/bn is bounded away from zero and infinity as n→∞. We make the following assumptions.
C1 The matrix I−ρW is nonsingular with |ρ|<1.
C2 The row and column sums of the matrices W and (I−ρW)−1 are bounded uniformly in absolute value for any |ρ|<1.
C3 For matrix S=W(I−ρW)−1, there exists a constant λc such that λcI−SS′ is positive semidefinite for all n.
C4 The matrix n⁻¹˜Q′(I−P)M(I−P)˜Q→Σ in probability for some positive definite matrix Σ, where ˜Q=(W(I−ρW)⁻¹(Z′θ+∫₀¹X(t)β(t)dt+g(U)),Z).
C5 For the matrix ˜Q, there exists a constant λ∗c such that λ∗cI−˜Q˜Q′ is positive semidefinite for all n.
C6 X(t) has a finite fourth moment, that is, E‖X(t)‖⁴≤c.
C7 K(t,s) is positive definite.
C8 The nonparametric function g(⋅) has bounded and continuous derivatives up to order r(≥2) and the slope function β(t)∈C^r[0,1].
C9 The density of U, fU(u), is bounded away from 0 and ∞ on [0,1]. Furthermore, we assume that fU(u) is continuously differentiable on [0,1].
C10 For the numbers of knots kj (j=1,2), it is assumed that k1∼k2∼k.
Assumptions C1−C3 are required in the setting of the spatial autoregressive model (see, for example, Lee [23], Kelejian and Prucha [18], Zhang and Shen [39]); they restrict the spatial weight matrix and the SAR parameter. Assumption C4 (see Du et al. [10]) is used to represent the asymptotic covariance matrix of ˆη. Moreover, assumption C4 implicitly requires that the generated regressors, after removing their projections onto Π, are not asymptotically multicollinear. Assumption C5 is required to ensure the identifiability of the parameter η. Assumptions C6−C7 are commonly assumed in functional linear models [14]. Assumption C6 is a mild restriction used to prove the convergence of our estimator. Assumption C7 guarantees the identifiability of β(t). Assumption C8 ensures that β(⋅) and g(⋅) are sufficiently smooth and can be approximated by basis functions in the spline space. Assumption C9 imposes a boundedness condition on the covariate U; it is often assumed in asymptotic analyses of nonparametric regression problems (see, for example, [15,37]). Assumption C10 is required to achieve the optimal convergence rates of ˆβ(⋅) and ˆg(⋅).
Let
where V=⟨X(t),β0(t)⟩. The following theorems state the asymptotic properties of the estimators of the parameters and nonparametric functions.
Theorem 1. Suppose assumptions C1-C10 hold, then
Theorem 2. Suppose assumptions C1-C10 hold and k∼n^{1/(2r+1)}; then
Remark 2. Theorem 2 gives the consistency of the function estimators. The slope function estimator ˆβ(⋅) and the nonparametric function estimator ˆg(⋅) attain the same optimal global convergence rate established by Stone [29].
Theorem 3. Suppose assumptions C1-C10 hold, and E(|ε1|^{4+r})<∞ for some r>0, then
Remark 3. From the proof of Theorem 3, if trace(S)/n=o(1), it can be shown that
Theorem 4. Suppose assumptions C1-C10 hold and n/k1^{2r+1}=n/k2^{2r+1}=o(1). For any fixed points t,u∈(0,1), as n→∞,
where β∗(t)=B′1(t)γ0, g∗(u)=B′2(u)ζ0, Ξ(t)=lim_{n→∞}σ²k1B′1(t)ΔnB1(t), Λ(u)=lim_{n→∞}σ²k2B′2(u)ΩnB2(u), and γ0 and ζ0 are defined in Lemma 1 of the Appendix.
Remark 4. The above conclusions are similar to those of Yu et al. [38], who gave the asymptotic normality of spline estimators in the single-index partial functional linear regression model. Note that ˆβ(t)−β0(t)=(β∗(t)−β0(t))+(ˆβ(t)−β∗(t)). By Lemma 1 in the Appendix, β∗(t)−β0(t)=O(k1^{−r}), and ˆβ(t)−β∗(t) dominates β∗(t)−β0(t). Therefore the asymptotic behaviour of ˆβ(t)−β∗(t) describes that of ˆβ(t)−β0(t).
The variances Ξ(t) and Λ(u) depend on the basis functions and knots; different basis functions and knots yield different variance estimators. Moreover, the variance expressions contain unknown quantities, and replacing them by consistent estimators introduces approximation errors. Furthermore, the error term may be heteroscedastic, in which case the estimator ˆσ² is not consistent. Consequently, we propose the following residual-based method to construct pointwise confidence intervals in practice.
It is crucial that the spatial structure be preserved when resampling data in models with spatial dependence [1]. Therefore, we employ a residual-based bootstrap procedure to derive the empirical pointwise standard errors of ˆβ(t) and ˆg(⋅). The procedure is as follows:
(1) Based on the data set {Y,Z,X(t),U} and the spatial weight matrix W, fit the proposed model and obtain the residual vector ˆε1=(ˆε11,...,ˆεn1)′. Then derive the centered residual vector ˆε.
(2) Draw a bootstrap sample ˆε∗ with replacement from the empirical distribution of ˆε and generate Y∗=(I−ˆρW)⁻¹(Z′ˆθ+D′ˆγ+B′2(U)ˆζ+ˆε∗).
(3) Based on the new data set {Y∗,Z,X(t),U} and the spatial weight matrix W, fit the proposed model again to obtain the estimators ˆβ∗(t) and ˆg∗(u). Repeat this process many times and, for given t and u, compute the empirical variances of ˆβ∗(t) and ˆg∗(u). These empirical variances are then used to construct the confidence intervals.
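Steps (1)-(3) can be sketched as a generic routine; the callback refit stands in for refitting the full model and is an assumption of this sketch, as is the routine's name:

```python
import numpy as np

def residual_bootstrap_se(Y, W, rho_hat, mean_hat, refit, n_boot=200, seed=0):
    """Residual-based bootstrap standard errors following steps (1)-(3) (sketch).

    mean_hat : (n,) fitted part Z'theta^ + D'gamma^ + B2(U)'zeta^
    refit    : callable Y* -> statistic(s) of interest, e.g. the spline
               estimates at fixed points t and u (user-supplied).
    """
    rng = np.random.default_rng(seed)
    n = len(Y)
    A = np.eye(n) - rho_hat * W
    eps_hat = A @ Y - mean_hat          # residuals from the fitted model
    eps_hat = eps_hat - eps_hat.mean()  # step (1): center the residuals
    A_inv = np.linalg.inv(A)
    stats = []
    for _ in range(n_boot):
        eps_star = rng.choice(eps_hat, size=n, replace=True)   # step (2)
        Y_star = A_inv @ (mean_hat + eps_star)
        stats.append(refit(Y_star))                            # step (3)
    return np.std(np.asarray(stats), axis=0, ddof=1)
```

Regenerating Y∗ through (I−ˆρW)⁻¹ is what preserves the spatial structure in each bootstrap sample.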
4. Simulation studies
In this section, we use simulation examples to study the properties of the proposed estimators. The data are generated from the following model:
where ρ=0.5, β(t)=√2 sin(πt/2)+3√2 sin(3πt/2) and X(t)=∑_{j=1}^{50} γjϕj(t), where the γj are independent normal with mean 0 and variance λj=((j−0.5)π)^{−2}, and ϕj(t)=√2 sin((j−0.5)πt). Zi1 and Zi2 are independent standard normal, θ1=θ2=1, Ui∼U(0,1), and g(u)=sin(π(u−A)/(C−A)) with A=√3/2−1.654/√12 and C=√3/2+1.654/√12. The spatial weight matrix W=(wij)n×n is generated by wij=0.3^{|i−j|}I(i≠j), 1≤i,j≤n, with wii=0, i=1,...,n; W is then standardized so that its rows sum to one. We consider three kinds of error term: (1) εi∼N(0,σ²); (2) εi∼0.75t(3); (3) εi∼(1+0.5Ui)N(0,σ²), where σ²=1. To compare different magnitudes of ρ, we also set ρ∈{0.2,0.7} with error term N(0,σ²). Simulation results are based on 1000 replications.
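A sketch of this data-generating mechanism is given below; the grid size, the trapezoid quadrature for the functional term and the reading of the weights as wij=0.3^{|i−j|} are our assumptions:

```python
import numpy as np

def simulate_sample(n, rho=0.5, sigma=1.0, n_grid=201, seed=0):
    """Generate one data set from the Section 4 design (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, n_grid)
    j = np.arange(1, 51)
    phi = np.sqrt(2.0) * np.sin((j[:, None] - 0.5) * np.pi * t[None, :])  # (50, n_grid)
    # gamma_j ~ N(0, ((j - 0.5) pi)^{-2})
    scores = rng.normal(size=(n, 50)) / ((j - 0.5) * np.pi)
    X = scores @ phi                                         # X_i(t) on the grid
    beta = np.sqrt(2) * np.sin(np.pi * t / 2) + 3 * np.sqrt(2) * np.sin(3 * np.pi * t / 2)
    # Trapezoid weights for int_0^1 X_i(t) beta(t) dt
    w = np.full(n_grid, t[1] - t[0]); w[[0, -1]] *= 0.5
    fx = (X * beta) @ w
    Z = rng.normal(size=(n, 2)); theta = np.array([1.0, 1.0])
    U = rng.uniform(size=n)
    A_ = np.sqrt(3) / 2 - 1.654 / np.sqrt(12)
    C_ = np.sqrt(3) / 2 + 1.654 / np.sqrt(12)
    g = np.sin(np.pi * (U - A_) / (C_ - A_))
    idx = np.arange(n)
    Wm = 0.3 ** np.abs(idx[:, None] - idx[None, :])
    np.fill_diagonal(Wm, 0.0)
    Wm /= Wm.sum(axis=1, keepdims=True)                      # row-standardize
    eps = rng.normal(scale=sigma, size=n)                    # error design (1)
    Y = np.linalg.solve(np.eye(n) - rho * Wm, fx + Z @ theta + g + eps)
    return Y, Z, X, U, Wm, t
```

Error designs (2) and (3) would replace the last normal draw by 0.75t(3) or (1+0.5Ui)N(0,σ²) variates, respectively.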
To achieve good numerical performance, the orders l1 and l2 of the splines and the numbers of interior knots k1 and k2 must be chosen. To reduce the computational burden, we use cubic B-splines with four evenly distributed knots (i.e., k1=k2=2 interior knots) for the slope function β(⋅) and the nonparametric function g(⋅), respectively. These choices of k1 and k2 are small enough to avoid overfitting in typical problems with moderate sample sizes and large enough to flexibly approximate many smooth functions. We use the square root of the average squared error (RASE) to assess the performance of the estimators ˆβ(⋅) and ˆg(⋅):
where {ti,i=1,...,n1}, {ui,i=1,...,n2} and n1=n2=200 are grid points chosen equally spaced in the domain of β(⋅) and g(⋅) respectively.
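RASE itself is straightforward to compute on a grid; a minimal helper:

```python
import numpy as np

def rase(est_values, true_values):
    """Square root of the average squared error over a grid of evaluation points."""
    est_values = np.asarray(est_values, dtype=float)
    true_values = np.asarray(true_values, dtype=float)
    return np.sqrt(np.mean((est_values - true_values) ** 2))
```

In the simulations it is applied once to (ˆβ(ti), β(ti)) on the t-grid and once to (ˆg(ui), g(ui)) on the u-grid, giving RASE1 and RASE2.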
Tables 1–3 show simulation results for the different kinds of error terms. Table 4 presents results for different magnitudes of ρ with error term N(0,1). The tables report the bias (Bias), standard deviation (SD), standard error (SE) and coverage probability (CP) at the nominal 95% level for the parametric estimators, together with the mean and standard deviation (SD) of RASEj (j=1,2) for ˆβ(⋅) and ˆg(⋅). The simulation results can be summarized as follows:
(1) The estimators ˆρ,ˆθ1,ˆθ2,ˆσ² are approximately unbiased and the estimated standard errors are close to the sample standard deviations under the normal error distribution. The empirical coverage probabilities approximate the nominal 95% level well.
(2) Figure 1 gives an example of the estimated function curves ˆβ(⋅) and ˆg(⋅) and their empirical 95% confidence intervals with sample size n=300 for the error term N(0,1). From the mean and standard deviation (SD) of RASEj (j=1,2), combined with Figure 1, we conclude that the proposed function estimators ˆβ(⋅) and ˆg(⋅) perform well.
(3) For the error terms 0.75t(3) and (1+0.5Ui)N(0,1), the estimators ˆρ,ˆθ1,ˆθ2 are approximately unbiased and the estimated standard errors are close to the sample standard deviations. In addition, the mean and standard deviation of the RASE of the estimated functions ˆβ(⋅) and ˆg(⋅) decrease as the sample size increases. This indicates that the parametric and nonparametric estimators perform well under non-normal error terms.
(4) From Tables 1 and 4, as the baseline spatial effect ρ increases, the SE and SD of ˆρ decrease. Across the different magnitudes of ρ, the Bias and SD of the parametric estimators ˆθ1 and ˆθ2 and the mean RASE of ˆβ(⋅) and ˆg(⋅) remain stable. This means that the magnitude of ρ does not affect the other parametric and nonparametric estimators.
5. Discussion
In this paper, an interesting and novel functional semiparametric spatial autoregressive model is proposed. The model incorporates functional covariates into the semiparametric spatial autoregressive model. The slope function and the nonparametric function are approximated by B-spline basis functions, and a generalized method of moments approach is proposed to estimate the parameters. Under mild conditions, we establish the asymptotic properties of the proposed estimators.
To apply our model in practice, the response variable should exhibit spatial dependence, and there should be covariates with nonlinear effects as well as functional variables. A problem of practical interest is to extend the model to accommodate functional covariates and a single-index function simultaneously. Moreover, testing for spatial dependence and for nonlinear covariate effects is an important issue. These topics are left for future work.
Acknowledgments
We would like to thank the referees for their helpful suggestions and comments, which led to the improvement of this article. Bai's work was supported by the National Natural Science Foundation of China (No. 11771268).
Conflict of interest
No potential conflict of interest was reported by the authors.
A. Appendix
Lemma 1. Assume condition C8 holds for g0(u) and β0(t). Then there exist γ0 and ζ0 such that
where γ0=(γ01,...,γ0N1)′, ζ0=(ζ01,...,ζ0N2)′ and c1>0, c2>0 depend only on l1 and l2, respectively.
Proof of Lemma 1. It follows from the approximation properties of splines ([4,16,34]).
Proof of Theorem 1. The proof is similar to that of Theorem 1 in [10] and is omitted here.
Proof of Theorem 2. Let δ=n^{−r/(2r+1)}, T1=δ⁻¹(γ−γ0), T2=δ⁻¹(ζ−ζ0) and T=(T′1,T′2)′. We then prove that, for any given ϵ>0, there exists a sufficiently large constant L=Lϵ such that
where ϕ0=(γ′0,ζ′0)′ and l(γ,ζ)=∑_{i=1}^n(Yi−Q′iˆη−D′iγ−B′2(ui)ζ)². This implies that, with probability at least 1−ϵ, there exists a local minimizer in the ball {ϕ0+δT:‖T‖≤L}. By a Taylor expansion and simple calculation, it holds that
where R1i=⟨Xi(t),β0(t)−B′1(t)γ0⟩, R2i=g0(ui)−B′2(ui)ζ0 and Vi=D′iT1+B′2(ui)T2. By assumption C6 and Lemmas 1 and 8 of Stone [30], we derive that ‖R1i‖≤ck1^{−r} and ‖R2i‖=Op(k2^{−r}). Then, by simple calculation, we obtain that
Similarly, it holds that ∑_{i=1}^n εiVi=Op(√n)‖T‖, ∑_{i=1}^n R2iVi=Op(nk^{−r})‖T‖ and ∑_{i=1}^n Vi²=Op(n)‖T‖². Similar to the proof of Theorem 2 in Du et al. [10], we get (η−ˆη)′Q′Q(η−ˆη)=Op(1). Then it holds that ∑_{i=1}^n Qi(η−ˆη)Vi=Op(√n)‖T‖. Consequently, we show that A1=Op(nδ²)‖T‖ and A2=Op(nδ²)‖T‖². By choosing a sufficiently large L, A2 dominates A1 uniformly in ‖T‖=L. Thus, there exist local minimizers ˆγ,ˆζ such that ‖γ−ˆγ‖=Op(δ) and ‖ζ−ˆζ‖=Op(δ).
Let R1k1(t)=β0(t)−B′1(t)γ0. Then we get
Since ‖γ−ˆγ‖=Op(δ) and ‖∫₀¹B1(t)B′1(t)dt‖=O(1), we have
In addition, by Lemma 1, it holds that ∫₀¹R1k1(t)²dt=Op(δ²). Thus, we obtain ‖ˆβ(t)−β0(t)‖²=Op(δ²). Similarly, we get ‖ˆg(u)−g0(u)‖²=Op(δ²).
Proof of Theorem 3. The proof is similar to that of Theorem 3 in [10] and is omitted here.
Proof of Theorem 4. By the definition of l(γ,ζ) in the proof of Theorem 2, we have
where ˜ei=εi+R1i+R2i, with R1i=⟨Xi(t),β0(t)−B′1(t)γ0⟩ and R2i=g0(ui)−B′2(ui)ζ0. The remainder is op(1) because (1/n)∑_{i=1}^n Qi(ˆη−η)=op(1) by Theorem 1. In addition, we have
It follows from (A.2) that
Let
By substituting (A.3) into (A.1), we obtain
Since ˆβ(t)−β∗(t)=B′1(t)(ˆγ−γ0), for any t∈(0,1), as n→∞, by the law of large numbers, Slutsky's theorem and properties of the multivariate normal distribution, we obtain that
where Ξ(t)=lim_{n→∞}σ²k1B′1(t)ΔnB1(t). Similar arguments hold for ˆg(u).