1.
Introduction
In empirical practice of economics and statistics, possible dependence across spatial units is a relevant issue in urban, real estate, regional, public, agricultural, environmental economics, and industrial organization [1]. Many studies have proposed approaches and techniques to deal with these problems by imposing some structures on the model in order to capture spatial dependence or spatial autocorrelation in cross-sectional and/or panel data. In recent years, spatial autoregressive (SAR) models have attracted great interest due to their usefulness in practice. For instance, SAR models have the property of using autocorrelation structures according to spatial patterns on data. These models allow for the analysis of spatial dependencies and interactions among observations, making them particularly useful in fields such as geography, environmental science, and urban planning where spatial relationships play crucial roles.
Lee [2] explained that ordinary least squares (OLS) estimators can be inconsistent for estimating an SAR model due to the fact that the spatial lagged variables can be correlated with the disturbances in a spatial model. In order to obtain a consistent estimator, Lee [1] proposed the quasi-maximum likelihood estimator (QMLE) for the SAR model, investigated the asymptotic properties of QMLE, and found out that the rates of convergence of QMLE may depend on some general features of the spatial weight matrix of the model. However, the quasi-maximum likelihood procedures often pose computational challenges when dealing with large sample sizes, as analytical solutions are not available and numerical methods must be employed to obtain numerical solutions. Kelejian and Prucha [3] proposed the generalized spatial two-stage least squares procedure for estimating the SAR model in which they demonstrated consistency and asymptotic normality of the resulting estimator. However, the estimator is not asymptotically effective. Lee [4] proposed best spatial two stage least squares estimators for the SAR model to improve the efficiency of the estimator. Su [5] extended the SAR to a semi-parametric setting and proposed the GMM estimation method, and showed that the resulting estimator is consistent and asymptotically normal. Xu and Lee [6] proposed the maximum likelihood estimator (MLE) for the SAR Tobit model and studied the asymptotic properties.
Recently, there has been a growing literature on semiparametric spatial autoregressive models. Invoking the flexibility of partial linear model, Su and Jin [7] developed asymptotic theories for the QMLE of partial linear SAR models based on the profile quasi-maximum likelihood theme. In Su and Jin [7], the rate of convergence of the spatial parameter estimator depends on some general features of the spatial weight matrix of the model. Du et al. [8] proposed the partial linear additive SAR model. Obviously, the model proposed by Du et al. [8] did not impose any interaction within nonparametric components in spatial data. Therefore, the proposed method by Du et al. [8] cannot solve the problem of interaction effects on response variables in spatial data. Moreover, the nonparametric components are subject to the curse of dimensionality and can only accommodate low-dimensional covariates; see Wang [9]. The single-index model is an important tool in multivariate nonparametric regression. By reducing the dimensionality from multivariate predictors to a univariate index, single-index models avoid the so-called "curse of dimensionality" while still capturing important features in high-dimensional data. Cheng and Chen [10] introduced the partially linear single-index spatial autoregressive model (PLSISARM), employing local linear approximation to estimate unknown link functions and utilizing MLE. In contrast to the partial linear SAR model and the partial linear additive SAR model, PLSISARM considers covariate interactions within its nonparametric component. This paper proposes a new estimation procedure for PLSISARM. Specifically, we utilize B-splines to approximate the unknown link function, enhancing smoothness, and adopt QMLE as the estimation method. We also introduce a feasible Nelder-Mead iteration algorithm tailored for this new estimation procedure. Furthermore, we establish the asymptotic properties for the QMLE. Finally, we demonstrate the application of PLSISARM on two distinct datasets from meteorology and economics, thereby extending the practical utility of the model. Compared to Cheng and Chen's work [10], B-spline approximation is computationally more efficient and smoother than local linear approximation, facilitating easier interpretation in practical applications. Additionally, this paper employs QMLE instead of MLE, resulting in differences in asymptotic distributions. Notably, when the error term does not satisfy the normal distribution, QMLE can still estimate the asymptotic variance, while MLE fails to do so.
This paper is organized as follows. In Section 2, we describe the technical setting and propose the estimation procedure for PLSISARM via QMLE. In Section 3, we establish the asymptotic properties for the proposed procedure. In Section 4, we present the simulation studies to assess the finite sample performance of the proposed procedure. In Section 5, we analyze two real data sets to illustrate the practical usefulness of the proposed model. In Section 6, we conclude the paper. The appendix provides the proofs of the main results.
2.
The model and estimation procedures
We consider the partially linear single-index spatial autoregressive model (PLSISARM)
where N is the total number of spatial units; YN is the N-dimensional vector of observations of the dependent variable; WN is an N×N specified constant spatial weight matrix with zero diagonal elements, the weights obtained based on physical distance, social networks, or "economic" distance; XN=(x1,x2,…,xN)T and ZN=(z1,z2,…,zN)T are the N×p and N×d matrices of regressors, respectively; g is an unknown differentiable function, the nonparametric components assume that the influence of the covariates z can be collapsed to a single index, zTα, via a nonparametric link function g; α∈Rd,‖α‖=1, and the first nonzero element of α is positive for identifiability; and VN=(ϵ1,ϵ2,…,ϵN)T is an N-dimensional vector of i.i.d. disturbances with zero mean and finite variance σ2.
Remark 1: PLSISARM is flexible enough to cover a variety of situations. When λ=0, β=0, and d=1, the proposed model is reduced to the partial linear single-index model, the single-index spatial autoregressive model, and the partial linear spatial autoregressive model, respectively.
In order to obtain the estimator of the parameters θ=(σ2,λ,αT,βT)T and the function g(⋅), similar to Lee [1] and Su and Jin [7], we maximize the following log quasi-likelihood function to obtain the estimator
where SN(λ)=IN−λWN, and VN(θ)=YN−λWNYN−XNβ−g(ZNα). Since the infinite parameter g(⋅) in the objective function LN(θ) is unknown, we are unable to maximize LN(θ) to obtain the estimator of θ directly. The B-spline method has been widely used in estimation for nonparametric models and semi-parametric models [11]. B-splines also have numerous precedences for their application in semi-parametric spatial autoregressive models, as demonstrated in Zhang et al. [12], Du et al. [8], and Tian et al. [13]. In this paper, we use B-splines to approximate the nonparametric function g(t) due to the computational advantages. Let B(t)=(B1(t),...,BKN+l+1(t))T be a set of B-spline basis functions of order l+1 with knots a=t0<t1<…<tkN<tKN+1=b for a<b∈R. Therefore, approximating g(t) by a linear combination of normalized B-spline basis functions, it has
where γ=(γ1,…,γ(KN+l+1))T is the unknown spline coefficient vector. Based on the above B-spline approximation, model (2.1) can be rewritten as
where Π(ZNα)=(B(zT1α),…,B(zTNα))T.
Given α, apply the QMLE method [1] to model (2.2) to obtain the estimator of δ=(σ2,λ,βT,γT)T, denoted as ˆδ(α)=(σ2(α),ˆλ(α),ˆβ(α)T,ˆγ(α)T)T. Next, we consider the estimation of the parameter α. Substitute ˆδ(α) back into model (2.2). Then we have
Minimize the following objective function to achieve the estimator of α:
Once we get the estimator ˆα, we can further obtain ˆδ(ˆα)=(ˆσ2(ˆα),ˆλ(ˆα),ˆβ(ˆα)T,ˆγ(ˆα)T)T. Then, we can achieve the estimator of g(⋅) as ˆg=B(zTˆα)Tˆγ.
To elaborate the ideas of the estimation methodology, we detail the algorithm as follows.
Algorithm for stage one: Calculate the initial value of the parameters θ=(σ2,λ,βT,αT)T and γ.
Step 1. Calculate the initial value of α for iteration.
Consider the following model:
Apply the QMLE of Lee (2004) to model (2.4) to get the estimator of the parameters. Denote the initial value of α as ˆα(0). Normalize ˆα(0) so that ‖ˆα(0)‖=1 and the first element of ˆα(0), ˆα(0)i>0.
Step 2. Substitute ˆα(0) into model (2.2) and calculate the QMLE for the parameters in the following model:
Let the resulting estimators ˆλ(0), ˆβ(0) and ˆγ(0) be the initial values of λ, β and γ, respectively.
Algorithm for stage two: The iteration procedure.
Step 3. Update the estimation for α.
Substitute the estimators ˆλ(k), ˆβ(k), and ˆγ(k) into model (2.2) to obtain the following model:
Then, minimize the following objective function to get the update estimation of α, ˆα(k+1),
Normalize ˆα(k+1) so that ‖ˆα(k+1)‖=1 and the first element of ˆα(k+1), ˆα(k+1)i>0. The optimization for Q(α) is completed by the Nelder-Mead method of the "optim" function in R.
Step 4. Update the estimation for σ2,λ, β, and γ.
Substitute ˆα(k+1) into model (2.2) and calculate the QMLE for the parameters in the following model to get the update estimators ˆσ2(k+1)ˆλ(k+1), ˆβ(k+1), and ˆγ(k+1):
Repeat Step 3 and Step 4 until the estimator converges, which is
where C is a predetermined constant which is C=10−4 in the simulation studies. Thus, the estimator for g(⋅) is B(zTˆα(k+1))Tˆγ(k+1).
Selection of knots: It is very important to select the knots (including the number of knots and the position of knots for B-spline estimation) in this paper. We use the BIC criterion to select the number of knots. Given the number of knots, it is suggested that the position of the knots could be equally spaced or equally spaced sample quantiles of the predictor variables. The order of splines is not necessarily better when it is higher. To avoid overfitting, lower-order splines, such as quadratic and cubic splines, are typically chosen. However, the approximation accuracy also needs to be considered, so cubic splines are often selected in practice, which is also a common choice in literature [8,13]. In this paper, we use cubic splines with equally spaced sample quantiles of the predictor variable knots to estimate the B-spline basis.
3.
Asymptotic properties
For the purpose of statistical inference, the large-sample properties of the estimates need to be established. Let M(λ)=I−λWN, and for any matrix M, denote the projection matrix PM=I−M(MTM)−1MT. The log likelihood function of model (2.2) is
where VN(θ,γ)=MN(λ)YN−XNβ−Π(ZNα)γ. Then, we concentrate out γ, and, when it does not cause ambiguity, denote Π(ZNα) simply as Π. The concentrated log likelihood function of θ is
Moreover, let η=(λ,α). Given η, one can derive the solutions for β and σ2 from (3.2), which are
and
By concentrating β and σ2 from (3.2), we derive the concentrated likelihood
Let θ0=(σ20,ηT0,βT0)T with η0=(λ0,αT0)T be the true parameters. Define G(λ)=WNM−1(λ), M0=M(λ0),G0=WNM−10, and A0=(g(zT1α0),…,g(zTNα0))T. We then rewrite the model as
which is derived from the equation M−10=I+λ0G0. Denote Q(η)=max. Then, the solution of this problem is
and
where . Consequently,
Let , and . Before establishing the asymptotic properties of the estimator, we need to make the following assumptions:
A1. (i) The disturbance is i.i.d. with 0 mean and variance, and the moment exists for some .
(ii) The regressors and are i.i.d. non-stochastic sequences and have bounded support sets on and , respectively.
(iii) The distribution of is absolutely continuous and its density is bounded away from zero and infinity on .
(iv) The real-valued function , where .
A2. (i) The matrix is nonsingular for all .
(ii) The entries of satisfies that , , and for all with .
(iii) The matrices and are uniformly bounded in both row and column sums. The matrix is uniformly bounded in both row and column sums, uniformly in in a compact parameter space . The true is an interior point of .
A3. (i) Let be the interior knots of and the bound points of . Letting , there exists a constant such that and .
(ii) The knots number is assumed to satisfy .
A4. The limit exists and is nonsingular, where and is the first derivative of .
Assumption A1 outlines the essential features of the disturbances, regressors, and function for the model. Assumption A2 is parallel to Assumption 2, Assumption 5, and Assumption 7 of Lee [1], which characterize the spatial weight matrix . Assumption A3 constrains the characteristics of the knots in B-splines, in parallel with the work of Du et al. [8] and Tian et al. [13]. Assumption A4 ensures the existence of the limiting distribution of the parameters. Subsequently, we present the following main results.
Theorem 1. Suppose that Assumptions A1–A4 hold. Then, is globally identifiable and is consistent for .
Theorem 2. Suppose that Assumptions A1–A4 hold. Then, , where "" denotes convergence in probability.
Theorem 3. Suppose that Assumptions A1–A4 hold. Then, we have , where "" denotes convergence in distribution, and
which are defined in the appendix. Additionally, if follows , then .
The proofs of all theorems can be found in the Appendix. Theorem 1 and Theorem 2 guarantee the consistency of the parameter and the link function , respectively, while Theorem 3 provides the asymptotic distribution of , which can be used for statistical inference. By substituting the corresponding parts in and with estimators, we can obtain the sample analog of the asymptotic variance.
4.
Simulation studies
In this section, we investigate the finite sample performance of the proposed estimation procedures via Monte Carlo simulation studies. The data is generated from the following model:
where and , with and values generated from , ; and , with , , and values generated from , ; ; ; and . In the simulation, similar to Carroll et al. [14] and Yu et al. [15], suppose , where , . We focus on the spatial scenario in Case [16] where there are districts and each district contains members. The sample size is . The weight matrix is , where , is the -dimensional column vector of ones, and is the Kronecker product. The numerical study examined different values of from , and , and from , and . For comparison, three different values are considered for , that is, , which represents from weak to strong spatial dependence of the responses. represents weak spatial dependence, represents mild spatial dependence, and represents relatively strong spatial dependence.
We use the empirical bias (denoted as Bias) and the empirical standard deviation (denoted as SD) to measure the effectiveness of the parameter estimates. Additionally, we have calculated the asymptotic standard deviation (denoted as AsySD) of the parameter estimates to verify the validity of the asymptotic theory. To assess the accuracy of the proposed nonparametric estimator, we choose the integrated mean squared error as the criterion for judgment, and its empirical version is defined as
where is the number of grid points on .
Tables 1–9 show the simulation results based on 1000 replications. We can observe the following: (i) As the sample size increases, the SD of the parameter estimators (except ) and the of decreases, with the bias remaining near 0, suggesting the consistency of the estimators. (ii) The performance of improves as increases, but it does not necessarily improve as increases. This phenomenon is expected, as only increasing can violate Assumption A2, leading to the unavailability of asymptotic properties. (iii) The performance of improves as increases. This can be understood through the knowledge obtained from the asymptotic variance, as the asymptotic variance of strongly depends on . (iv) The SD of the estimator for finite-dimensional parameters is positively correlated with . (v) The empirical standard deviation of the parameter estimates is very close to the asymptotic standard deviation, validating the asymptotic theory presented in Theorem 3. The performance of the nonparametric estimates for is demonstrated in Figure 1. The true function and the mean of each estimated over the 1000 replicates are plotted. We only display the figure for and , as the plots for other occasions are similar. To save spaces, we omit them.
5.
Real data analysis
In this section, we illustrate the usefulness of the proposed estimation method via analysis of real data, especially for the dependence across spatial units.
5.1. China air quality data
Air quality is gaining more and more attention in China and is closely related to people's daily life. Wu and Kuo [17] utilized statistical models to analyze air quality in Taiwan, China. Chen et al. [18] employed machine learning methods to predict air quality in Hangzhou, China, highlighting the need to consider the impact of neighboring areas on air quality. The factors affecting air quality are complex and can be divided into several categories, such as urban characteristics, meteorological factors, emission factors, and so on. According to the fluidity of the air, the air quality of the core city is related to the surrounding cities and the degree of influence is related to the spatial distance. Therefore, we select the air quality data of cities in China to illustrate the usefulness of the proposed method in Section 2.
The data set contains information of 28 different capital cities in China including air quality index, urban characteristics, meteorological factors, and emission factors from January 1st to April 30th, 2014. There are 13 variables (see Table 10).
Air quality index (AQI) is the dependent variable. Urban permanent population (PEOPLE), green area (GREEN), civil vehicle ownership (CAR), industrial sulfur dioxide emissions (SO2), industrial soot emissions (YANCHEN), temperature (TEMP), wind (WIND), rainfall (RAIN), building construction area (BUILDING), and the heating problem in northern China (GN) are the independent variables. The model considers the influence of the spatial structure factors of the city on the air quality. According to the research background, we consider the variables log(RAIN), log(PEOPLE), log(CAR), log(BUILDING), GN as the linear independent variables and the variables TEMP, WIND, log(SO2), log(YANCHEN), log(GREEN) as the non-parametric independent variables.
Therefore, we consider fitting the data via the following model:
where is AQI; are log(RAIN), log(PEOPLE), log(CAR), log(BUILDING), GN, respectively; and are TEMP, WIND, log(SO2), log(YANCHEN), log(GREEN), respectively. Logarithmic transformation for variables is taken to ease off the trouble caused by big gaps in the domain [19].
According to the practice in Du et al. [8], Su and Yang [20], and Xie et al. [21], let the weight matrix be
where represents the spatial distance of the th and th units, the Euclidean distance is calculated in terms of the longitude and latitude coordinates of any two cities, and is the threshold distance chosen to median of . We choose the weight matrix in this form as the dependent variable AQI is spatially correlated. That is, when the distance between two cities is far enough, there is tiny spatial relation between them, and a threshold distance is chosen to set the tiny weight to be zero to alleviate the potential effect of tiny weights on the data analysis. Furthermore, we normalize the weight matrix so that the sum of rows of is equal to 1, i.e., . Obviously, the spatial weight matrix is exogenous. We use the proposed method in Section 2 to analyze the processed data. The analysis results for the estimation of unknown parametric coefficients are listed in Table 11. For the nonparametric component, the univariate function estimates are displayed in Figure 2.
From the analysis results, the estimation of the spatial parameter is , which indicates a substantial spatial relationship exists among the AQI of the cities in China. The regression coefficients of are positive, implying that the factors of rainfall, urban resident population, civil vehicle ownership/vehicle, building construction area, and the heating period have positive effects on AQI of the cites. The results are generally consistent with our judgment and expectations. As population grows, resource consumption increases, resulting in air pollution. Increased car ownership and construction area lead to increased emissions of pollutants. Increased and large amounts of pollutants during the heating period in the northern area lead to a drop of air quality.
From Figure 2, in the nonparametric component of the model, we observe that the single-index has an "S" shape nonlinear effect on the AQI of the city. It indicates that there exists interaction among the covariates in the nonparametric components. From the results of the single-index coefficients of , the variables temperature, wind, industrial sulfur dioxide emissions, industrial soot emissions, and green area have non-linear effects on the AQI. The single-index coefficients of temperature, industrial sulfur dioxide emissions, industrial soot emissions, green area are positive, and the single-index coefficient of wind are negative. The results are generally consistent with our judgment and expectations. Increased wind power helps the diffusion of pollutants in the air, which improves air quality and leads to a decline of AQI. Increased temperature is not conducive to the decomposition of air pollutants. Increased industrial sulfur dioxide and industrial soot emissions lead to an increase of air pollutants.
However, more rainfall is conducive to reducing the presence of pollutants in the air and increased green plants are conducive to absorption and degradation of air pollutants, which contradicts our data analysis results (). The model has not undergone variable selection, therefore significance cannot be solely determined by coefficient values. The abnormality of these two coefficients may indicate that they are statistically insignificant.
5.2. Rural household income data
The development of the rural economy is significant to the overall economic development in China. Recently, it is facing many problems, such as large populations, low incomes and increasing difficulty in employment. Among them, rural household income is affected by multiple factors. We take the rural household income data of Fuping County in China as an example, including 209 villages, to analyze the factors of rural household income by the proposed method in Section 1. In this model, the dependent variable is the per capita net income of farmers, and the independent variables are 7 factors, including rural natural conditions, village industries, and labor skills, which have great impact on rural household income (see Table 12).
We consider the variables agNum and farm as the linear independent variables because they represent agriculture-related productivity and means of production in economics, which signify the scale of agriculture. We anticipate that 'income' will exhibit a linear relationship with respect to these two factors. The other variables urNum, agFa, area, indusNum, workNum are the non-parametric independent variables. Next, we fit the data via the following model:
where is income; are agNum and farm, respectively; and are urNum, agFa, area, indusNum, and workNum, respectively.
Obtained from spatial geographic information, we can get the neighboring situation of the villages. Therefore,
where represents adjacent of the th and th villages; when two villages are not adjacent, there is tiny spatial relation between them, then . As the response variable could be spatially correlated, we choose the weight matrix in this form. Furthermore, we normalize the weight matrix so that the sum of rows of is equal to 1, i.e., . Obviously, the spatial weight matrix is exogenous. The analysis results for the estimation of unknown parametric coefficients are listed in Table 13. For the nonparametric components, the univariate function estimate is displayed in Figure 3. According to the result, the estimation of the spatial parameter is indicates a substantial spatial relationship exists in the model. The regression coefficients of and are negative, which implies that the amount of labor engaging in agriculture and farmland would not increase rural household income.
From Figure 3, in the nonparametric parts of the model, we observe that single-index has a nonlinear effect on rural household income. There exists interaction among the covariates in the nonparametric components. The single-index coefficients of (variables urNum, agFa, area, indusNum, workNum) have significant non-linear effects on rural household income.
Further research is needed to develop a model structure selection method for partially linear single-index spatial autoregressive models. Variable selection decides which covariates have linear effects, and which have nonlinear effects for data analysis.
6.
Conclusions
In this paper, B-spline approximation is employed for the link function in the partially linear single-index spatial autoregressive model, offering a smoother and more interpretable alternative compared to local linear approximation. A feasible Nelder-Mead iteration algorithm tailored for this model is proposed to solve the QMLE. The parameter estimation and function estimation have been proven to be consistent, and the asymptotic distribution of parameter estimates for non-normal, general error terms is provided. A series of Monte Carlo simulations are performed to verify the effectiveness of the proposed model and estimation method. Finally, the model and method are applied to air quality datasets and rural income datasets, thereby extending the application scope of the model.
Use of AI tools declaration
The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.
Acknowledgments
This research was funded by the grants from the Key Projects of the National Natural Science Foundation of China (Grant No.: 11731101).
Conflict of interest
The authors declare there is no conflicts of interest.
Appendix
Lemma 1. Suppose that assumptions A1-A4 hold. Then, , for .
Proof. According to Corollary 6.21 in Schumaker [22], we find that . Using this, similar to Lemma 1 of Tian et al. [13], it suffices to show . □
Lemma 2. Suppose that assumptions A1-A4 hold. Then, .
Proof. Given that the eigenvalues of a projection matrix consist solely of 0s and 1s, with the number of 1s equaling the rank of the matrix, and recognizing that the trace of a matrix is equivalent to the sum of its eigenvalues, we can readily prove the lemma. □
Proof of Theorem 1. First, we prove that . Similar to Su and Jin [7] and Cheng and Chen [10], it suffices to show that
and
where is the complement of an open neighborhood of with diameter .
To prove (A1), it is sufficient to show that
From (3.4), we have
Then,
Similar to the proof of Theorem 1 in Tian et al. [13], by using Lemma 1, it is easy to prove that , and such that (A1) holds. Furthermore, referring to Tian et al. [13], based on Lemma 1, we can also prove the validity of (A2). The primary difference lies in the use of an unknown single-index function in our cross-sectional data context, whereas Tian et al. [13] employed a q-dimensional unknown function in their panel data setting.
Next, we prove that . Given that we now have and , due to the continuity of in terms of , it follows that . Recall that
The second equality in the above equation relies on the fact that , and the projection matrix eliminates the term . Then, by Lemma 1 and Lemma 2, we have
and
where is the diagonal element of . Then, we have . The proof of consistency for is analogous to that of : it suffices to demonstrate that its expectation converges to and its variance converges to 0. Further elaboration on this point is omitted here for brevity. □
Proof of Theorem 2. According to Theorem 1, we have . Then,
where . Similar to Theorem 3 of Tian et al. [13], we can prove all the components above are , which indicates .
According to de Boor [11], similar to Theorem 3 in Tian et al. [13], B-splines have the following properties: for , and
where and are positive constants. With Cororllary 6.21 of Schumaker [22], we have . Then,
so . □
Proof of Theorem 3. By Theorem 2 of Lee [1] and Theorem 3 of Tian et al. [13], the asymptotic normality of the parameter can be derived from the Taylor expansion of . The first-order Taylor expansion at is
where lies between and . Then,
The proof of the theorem can be accomplished using Slusky's theorem, if it holds that
and
Recall and . Comining Lemma 1 and Assumption A1, we have
and
Notice that the above equation is linear, quadratic, or cubic in terms of , except for terms involving . Combining , by continuity we can obtain
if . To show this, by the mean value theorem we have
where lies between and . According to Assumption A2, it easy to know that , so (A12) is proved. To show
we have
which is the result of Lemma 2. The existence of is guaranteed by Assumptions A2 and A4. According to the law of large numbers, (A14) holds, and thus (A9) holds.
It is easy to see that the terms of (A10) are linear or quadratic functions of , and therefore their means are all . By the central limit theorem of Kelejian and Prucha [23], we have
where
with
It is not difficult to observe that when follows , we have , i.e., . □