We introduce a wild multiplicative bootstrap for M and GMM estimators in nonlinear models when the autocorrelation structure of the moment functions is unknown. The implementation of the bootstrap algorithm does not require any parametric assumptions on the data generating process. After proving its validity, we also investigate the accuracy of our procedure through Monte Carlo simulations. The wild bootstrap algorithm always outperforms inference based on standard first-order asymptotic theory. Moreover, in most cases the accuracy of our procedure is also better and more stable than that of block bootstrap methods. Finally, we apply the wild bootstrap approach to study the ability of variance risk premia to predict future stock returns, using US equity data from 1990 to 2010. For the period under investigation, our procedure provides significant evidence in favor of predictability. By contrast, the block bootstrap implies ambiguous conclusions that heavily depend on the selection of the block size.
1. Introduction
Extremum estimators, such as M and generalized method of moments (GMM) estimators, have attained widespread applicability in various statistics and econometrics problems; see, e.g., Huber (1964) and Hansen (1982). The GMM provides a powerful tool for introducing statistical inference in several economic and financial models that are specified by some moment conditions; see, e.g., Hall (2005) for a review of the GMM. Unfortunately, recent research indicates that there are considerable issues with M and GMM estimators, in particular in their finite sample performance. More precisely, the asymptotic theory may provide very poor approximations of the sampling distribution of M and GMM estimators and related test statistics; see, e.g., the special issue of the Journal of Business and Economic Statistics (Volume 14 (3), 1996).
To overcome this problem, a common approach consists of applying bootstrap methods. In time series settings, in the absence of parametric assumptions on the data generating process, the standard approach to bootstrapping is the block bootstrap; see, e.g., Hall (1985), Carlstein (1986), and Künsch (1989). Under strong regularity conditions on the data generating process and the general estimating functions, the block bootstrap may provide asymptotic refinements relative to standard first-order asymptotic theory; see, e.g., Hall and Horowitz (1996), Götze and Künsch (1996), Lahiri (1996), Andrews (2002), and Inoue and Shintani (2006). However, the magnitude of these improvements is not as large as that of the iid bootstrap or the parametric bootstrap. A main issue is that the independence of the blocks does not correctly mimic the structure of the true data generating process. Moreover, from a practical point of view, to ensure accurate approximations, the definition of the block bootstrap also requires an appropriate selection of the block size. The bootstrap literature proposes several ways of selecting this tuning parameter; see, e.g., Hall et al. (1995). Unfortunately, many of these approaches rely on asymptotic arguments, and the practical implementation in finite samples remains unclear.
In this paper, we introduce a wild multiplicative bootstrap for time series settings with an unknown autocorrelation structure of the moment functions; it does not require the selection of block sizes, but depends on a different lag truncation tuning parameter. Unlike conventional bootstrap procedures proposed in the literature, in our algorithm we do not construct random samples by resampling from the observations. Rather, we propose to perturb the general estimating functions using correlated innovations. More precisely, to generate the covariance matrix of these innovations, we apply the same kernel function principle adopted for the computation of the heteroskedasticity and autocorrelation consistent (HAC) covariance matrix in the efficient GMM estimation criterion; see, e.g., Newey and West (1987) and Andrews (1991) for seminal works on HAC estimation, and Müller (2014) and Lazarus et al. (2018) for more recent studies on heteroskedasticity and autocorrelation robust (HAR) inference. By introducing this time series dependence, our approach is able to properly capture the autocorrelation of the true moments. Similar multiplicative bootstrap procedures have also been proposed in Minnier et al. (2011), Kline and Santos (2012), and Chernozhukov et al. (2014) in iid settings. Furthermore, dependent wild bootstrap methods for time series are also developed in Politis and Romano (1992), Shao (2010), Zhu and Li (2015) and Bücher and Kojadinovic (2016), among others. In contrast to these studies, instead of generating new random bootstrap observations by introducing correlated error terms, our bootstrap algorithm fixes the original observations and perturbs the (nonlinear) general estimating functions of M and GMM estimators.
In the Monte Carlo analysis, our bootstrap method always outperforms inference based on standard first-order asymptotic theory. Furthermore, the accuracy of our procedure is in general superior to that of block bootstrap methods, and less sensitive to the selection of tuning parameters. Finally, we also consider a real data application. Using the wild multiplicative bootstrap and the block bootstrap, we study the ability of variance risk premia to predict future returns. We consider US equity data from 1990 to 2010 from Shiller (2000) and Bollerslev et al. (2009). For the period under investigation, the wild multiplicative bootstrap provides significant evidence in favor of predictability. By contrast, the block bootstrap implies ambiguous conclusions that heavily depend on the selection of the block size. The reason for these divergent conclusions could be related to the lack of robustness of the block bootstrap in the presence of anomalous observations; see, e.g., Singh (1998), Salibian-Barrera and Zamar (2002) and Camponovo et al. (2012, 2015) for more details on the robustness properties of resampling methods. Indeed, the period under investigation is characterized by several unusual observations, linked to the recent credit crisis, that may easily corrupt inference based on block bootstrap procedures.
The rest of the paper is organized as follows. In Section 2, we introduce M and GMM estimators. In Section 3, we present the wild bootstrap algorithm and prove its validity. In Section 4, we study the accuracy of our approach and of block bootstrap procedures through Monte Carlo simulations. In Section 5, we consider the real data application. Finally, Section 6 concludes. The assumptions and the proof of the main theorem on bootstrap validity stated in Section 3 are collected in the Appendix.
2. Extremum estimators
In this section, we introduce M and GMM estimators. As noted in Andrews (2002), M estimators can be written as GMM estimators. However, because of the different identification conditions, we prefer to introduce these two classes of estimators separately.
2.1. M estimators
Let (X1,…,Xn) be a sample from a process X={Xt,t∈Z} defined on the probability space (Ω,F,P), where Xt∈Rdx. Furthermore, let θ∈Θ⊂Rdθ be an unknown parameter. We consider M estimators ˆθn of θ defined as
where ρ:Rdx×Rdθ→R is a known smooth function. Examples of M estimators include maximum likelihood, quasi-maximum likelihood, and least squares estimators, among others; see, e.g., Andrews (2002).
Let θ0 denote the true value of the unknown parameter θ. Then, under some regularity conditions, √n(ˆθn−θ0) converges weakly to a normally distributed random vector with mean 0 and covariance matrix V0=D−10Ω0D−10, where D0=limn→∞E[1n∑nt=1∂2∂θ∂θ′ρ(Xt,θ0)], and Ω0=limn→∞E[1n∑ni=1∑nj=1∂∂θρ(Xi,θ0)∂∂θρ(Xj,θ0)′]. Therefore, the normal distribution provides valid approximations of the sampling distribution of √n(ˆθn−θ0). Unfortunately, the asymptotic distribution may work poorly in finite samples. To overcome this problem, in Section 3 we analyze bootstrap approximations.
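In display form, the limit result just stated reads (this only restates the quantities defined above in standard notation):

```latex
\sqrt{n}\,(\hat{\theta}_n-\theta_0)\ \xrightarrow{\ d\ }\ N(0,V_0),\qquad
V_0=D_0^{-1}\,\Omega_0\,D_0^{-1},\\[4pt]
D_0=\lim_{n\to\infty}E\Big[\frac{1}{n}\sum_{t=1}^{n}\frac{\partial^{2}}{\partial\theta\,\partial\theta'}\,\rho(X_t,\theta_0)\Big],\qquad
\Omega_0=\lim_{n\to\infty}E\Big[\frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{n}\frac{\partial}{\partial\theta}\rho(X_i,\theta_0)\,\frac{\partial}{\partial\theta}\rho(X_j,\theta_0)'\Big].
```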
2.2. GMM estimators
For simplicity, we adopt the same notation introduced in the previous section. Let (X1,…,Xn) be a sample from a process X={Xt,t∈Z} defined on the probability space (Ω,F,P), where Xt∈Rdx. Consider the moment condition E[g(Xt,θ0)]=0, where g(⋅,⋅) is an Rdg-valued function with dg≥dθ, and θ0 denotes the true value of the unknown parameter θ∈Θ⊂Rdθ. We focus on GMM estimators ˆθn of θ0 defined as
where Wn is a positive-definite symmetric matrix. An important example of Wn is the efficient weighting matrix Wn=(Ωn(ˉθn))−1, where ˉθn is a preliminary estimator of θ0,
k(⋅) is a kernel function, and h is the lag truncation.
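The display defining Ωn(ˉθn) is not reproduced above. As a minimal numerical sketch of a kernel-weighted HAC estimate of this type, consider the following Python code; the Bartlett kernel, the centering of the moments, and the function names are our own illustrative choices and may differ in detail from the paper's exact formula.

```python
import numpy as np

def bartlett_kernel(x):
    """Bartlett kernel k(x) = max(1 - |x|, 0), used here for illustration."""
    x = np.abs(x)
    return np.where(x <= 1.0, 1.0 - x, 0.0)

def hac_covariance(moments, h, kernel=bartlett_kernel):
    """Kernel-weighted HAC estimate of the long-run covariance of the moment
    functions.  `moments` is an (n, dg) array whose rows are g(X_t, theta_bar);
    `h` is the lag truncation."""
    n, _ = moments.shape
    u = moments - moments.mean(axis=0)       # centered moments
    omega = u.T @ u / n                      # lag-0 term
    for i in range(1, n):
        w = float(kernel(i / h))
        if w == 0.0:
            break
        gamma_i = u[:-i].T @ u[i:] / n       # lag-i sample autocovariance
        omega += w * (gamma_i + gamma_i.T)
    return omega

# Efficient GMM weighting matrix of the form W_n = Omega_n(theta_bar)^(-1):
# W_n = np.linalg.inv(hac_covariance(g_values, h=5))
```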
Suppose that Wn converges in probability to a non-random positive-definite symmetric matrix W0. Then, under some further regularity conditions, the GMM statistic √n(ˆθn−θ0) converges weakly to a normally distributed random vector with mean 0 and covariance matrix V0=(D′0W0D0)−1D′0W0Ω0W0D0(D′0W0D0)−1, where D0=limn→∞E[1n∑nt=1∂∂θg(Xt,θ0)], and Ω0=limn→∞E[1n∑ni=1∑nj=1g(Xi,θ0)g(Xj,θ0)′]. Therefore, in this case as well the normal distribution provides valid approximations of the sampling distribution of √n(ˆθn−θ0). Alternatively, in the next section we analyze bootstrap approximations.
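Analogously, in display form the GMM limit result reads:

```latex
\sqrt{n}\,(\hat{\theta}_n-\theta_0)\ \xrightarrow{\ d\ }\ N(0,V_0),\qquad
V_0=(D_0'W_0D_0)^{-1}D_0'W_0\,\Omega_0\,W_0D_0\,(D_0'W_0D_0)^{-1},\\[4pt]
D_0=\lim_{n\to\infty}E\Big[\frac{1}{n}\sum_{t=1}^{n}\frac{\partial}{\partial\theta}g(X_t,\theta_0)\Big],\qquad
\Omega_0=\lim_{n\to\infty}E\Big[\frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{n}g(X_i,\theta_0)\,g(X_j,\theta_0)'\Big].
```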
3. Bootstrap approximations
In Section 3.1, we briefly present the block bootstrap approach, while in Section 3.2 we introduce our wild multiplicative bootstrap procedure.
3.1. Block bootstrap
Since in our setting we do not have parametric information on the data generating process, the standard approach to bootstrapping is the block bootstrap; see, e.g., Carlstein (1986). More precisely, given the observation sample (X1,…,Xn), consider the non-overlapping blocks (Xim+1,…,X(i+1)m), i=0,…,n/m−1, of size m, where for simplicity we assume n/m=b∈N. The non-overlapping block bootstrap constructs random samples (X⋆1,…,X⋆n) by selecting b non-overlapping blocks with replacement. Let ˆθ⋆n be the bootstrap M or GMM estimator solution of (1) or (2), respectively, based on the bootstrap sample (X⋆1,…,X⋆n). Then, the non-overlapping block bootstrap approximates the sampling distribution of √n(ˆθn−θ0) with the conditional distribution of √n(ˆθ⋆n−ˆθn) given the observations (X1,…,Xn); see also Künsch (1989) for the definition of block bootstrap approximations based on overlapping blocks.
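For concreteness, a minimal sketch of one non-overlapping block bootstrap draw might look as follows; the function name and array layout are our own illustrative choices.

```python
import numpy as np

def nonoverlapping_block_bootstrap(X, m, rng=None):
    """Draw one bootstrap sample by resampling b = n/m non-overlapping blocks
    of size m with replacement.  X is an (n, d) array; n is assumed to be a
    multiple of m, as in the text."""
    rng = np.random.default_rng() if rng is None else rng
    n = X.shape[0]
    b = n // m
    blocks = X[: b * m].reshape(b, m, -1)        # split the sample into blocks
    idx = rng.integers(0, b, size=b)             # blocks drawn with replacement
    return blocks[idx].reshape(b * m, -1)

# Example use: X_star = nonoverlapping_block_bootstrap(X, m=10); the bootstrap
# estimator is then obtained by re-solving (1) or (2) on X_star, and the draws of
# sqrt(n) * (theta_star - theta_hat) approximate those of sqrt(n) * (theta_hat - theta_0).
```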
Under strong regularity conditions on the data generating process and on the general estimating functions, the block bootstrap may provide asymptotic refinements relative to standard first-order asymptotic theory; see, e.g., Inoue and Shintani (2006). However, to ensure accurate approximations of the sampling distribution of √n(ˆθn−θ0), the definition of the block bootstrap also requires an appropriate selection of the block size. The bootstrap literature proposes several ways of selecting m; see, e.g., Hall et al. (1995). Unfortunately, many of these approaches rely on asymptotic arguments, and the practical implementation in finite samples remains unclear. In the next section, we introduce a wild multiplicative bootstrap approach that does not require the selection of block sizes.
3.2. Wild multiplicative bootstrap
First, we introduce the wild multiplicative bootstrap algorithm. In a second step, we clarify the key rationale of our approach. Finally, we prove the validity of the wild bootstrap approximation.
Algorithm 1. Wild Multiplicative Bootstrap.
(ⅰ) Compute either the M or the GMM estimator ˆθn defined in (1) or (2), respectively.
(ⅱ) Generate a random sample (e1,…,en) of positively correlated observations with the following properties: E[et|(X1,…,Xn)]=1 and Cov(et,et+i|(X1,…,Xn))=k(i/h), where k(⋅) is an appropriate kernel function and h is the lag truncation parameter; a simulation sketch is given below. For t=1,…,n, let
(ⅲ) Compute either the wild multiplicative bootstrap M or GMM estimators ˆθ∗n defined as, respectively,
(ⅳ) Repeat steps (ⅱ)-(ⅲ) B times, where B is a large number. The empirical distribution of √n(ˆθ∗n−ˆθn) approximates the sampling distribution of √n(ˆθn−θ0).
Unlike conventional bootstrap procedures proposed in the literature, in our approach we do not construct random samples by resampling from the observations. Rather, in step (ⅱ) of Algorithm 1, we perturb the general estimating functions using correlated innovations. By introducing this time series dependence, our bootstrap method is able to properly capture the autocorrelation of the true moments. In equation (8), we compute the wild multiplicative bootstrap GMM estimator. To this end, as in Hall and Horowitz (1996) and Andrews (2002), we recenter the bootstrap moment by subtracting off 1n∑ni=1g(Xi,ˆθn). The recentering ensures that the bootstrap moment E∗[1n∑nt=1g∗(Xt,θ)]=0 when θ=ˆθn. In the next theorem, we prove the validity of our bootstrap algorithm.
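As a minimal sketch of steps (ⅱ)–(ⅳ) for a GMM estimator, consider the following Python code. It assumes jointly Gaussian multipliers (one possible choice satisfying the mean and covariance requirements of Assumption 6.3, although it does not enforce positivity of the et), uses the Bartlett kernel for illustration, and applies one reading of the recentering described above; the exact criterion in equation (8) is not reproduced here, and all function names are our own.

```python
import numpy as np
from scipy.optimize import minimize

def bartlett_kernel(x):
    """Bartlett kernel k(x) = max(1 - |x|, 0)."""
    x = np.abs(x)
    return np.where(x <= 1.0, 1.0 - x, 0.0)

def draw_multipliers(n, h, rng):
    """Draw (e_1, ..., e_n) with E[e_t] = 1 and Cov(e_t, e_{t+i}) = k(i/h).
    Jointly Gaussian multipliers are only one possible choice and do not
    enforce positivity of the e_t."""
    lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    cov = bartlett_kernel(lags / h)               # Toeplitz matrix k(|t-s|/h)
    chol = np.linalg.cholesky(cov + 1e-10 * np.eye(n))
    return 1.0 + chol @ rng.standard_normal(n)

def wild_gmm_bootstrap(X, g, theta_hat, W, h, B=999, seed=0):
    """Wild multiplicative bootstrap for a GMM estimator (sketch).

    X         : (n, dx) array of observations, held fixed across draws
    g         : callable, g(X, theta) -> (n, dg) array of moment functions
    theta_hat : original GMM estimate (1-d array)
    W         : (dg, dg) weighting matrix
    Returns a (B, dtheta) array of draws of sqrt(n) * (theta_star - theta_hat)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    g_bar_hat = g(X, theta_hat).mean(axis=0)      # recentering term

    def objective(theta, e):
        # recentered moments, perturbed multiplicatively by the e_t
        g_star = (g(X, theta) - g_bar_hat) * e[:, None]
        m = g_star.mean(axis=0)
        return float(m @ W @ m)

    draws = np.empty((B, theta_hat.size))
    for b in range(B):
        e = draw_multipliers(n, h, rng)               # step (ii)
        res = minimize(objective, theta_hat, args=(e,), method="Nelder-Mead")
        draws[b] = np.sqrt(n) * (res.x - theta_hat)   # step (iii)
    return draws                                      # step (iv)
```

The empirical distribution of the returned draws is then used in place of the sampling distribution of √n(ˆθn−θ0), for instance to form percentile confidence intervals.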
Theorem 3.1. Let Assumptions 6.1-6.3 in the Appendix hold. Then,
(i) For M estimators, the conditional law of √n(ˆθ∗n−ˆθn) converges weakly to a normal distribution with mean 0 and covariance matrix V0=D−10Ω0D−10, as n→∞.
(ii) For GMM estimators, the conditional law of √n(ˆθ∗n−ˆθn) converges weakly to a normal distribution with mean 0 and covariance matrix V0=(D′0W0D0)−1D′0W0Ω0W0D0(D′0W0D0)−1, as n→∞.
Theorem 3.1 shows that both for M and GMM estimators, the wild multiplicative bootstrap algorithm provides a valid method for approximating the sampling distribution of √n(ˆθn−θ0).
Remark 1. To verify the validity of the wild bootstrap approximation, in the proof of Theorem 3.1 first we show that √n(ˆθ∗n−ˆθn) minimizes a particular random process. Then, we compute the limit of this random process. To this end, we consider the conditional probability given the sample (X1,…,Xn), and compute the limit by successively conditioning on a sequence of samples, as n→∞. Suppose that 1n∑nt=1g(Xt,ˆθn)=0. Then, note that
where Γi(θ)=1n∑n−it=1g(Xt,θ)g(Xt+i,θ)′, which converges in probability to Ω0 under Assumptions 6.1-6.3. Finally, we compute the limit, and apply results in Geyer (1994).
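The display referred to in Remark 1 is not reproduced above; under the stated simplification, and using the multiplier covariance of Assumption 6.3, the conditional variance computation can be sketched as follows (our reconstruction, consistent with the definition of Γi(θ) given in the remark):

```latex
\operatorname{Var}^{*}\!\left(\frac{1}{\sqrt{n}}\sum_{t=1}^{n} g(X_t,\hat{\theta}_n)\,e_t \,\middle|\, X_1,\ldots,X_n\right)
=\Gamma_0(\hat{\theta}_n)+\sum_{i=1}^{n-1} k\!\left(\frac{i}{h}\right)\left[\Gamma_i(\hat{\theta}_n)+\Gamma_i(\hat{\theta}_n)'\right],
```

which, under Assumptions 6.1–6.3, converges in probability to Ω0.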
Remark 2. In Algorithm 1, we can observe that the definition of the wild multiplicative bootstrap does not require the selection of block sizes m. However, the multiplicative bootstrap still requires the selection of the lag truncation tuning parameter h. As highlighted in our Monte Carlo analysis in Section 4, the wild multiplicative bootstrap is less sensitive to the selection of the tuning parameter h than is the block bootstrap to the selection of the block size m, yielding more stable results; see, e.g., Shao (2010) for similar empirical findings.
Remark 3. Suppose that in equation (2) we adopt the optimal weighting matrix Wn=(Ωn(ˉθn))−1. Then, the natural selection of the weighting matrix in equation (8) in the wild bootstrap algorithm is given by Wn=(Ω∗n(ˉθ∗n))−1, where
and ˉθ∗n is a preliminary bootstrap GMM estimator. Note that since E[et]=1, in equation (12) we replace g∗(Xt,θ) with ˉg∗(Xt,θ)=(g(Xt,θ)−1n∑ni=1g(Xi,ˆθn))(et−1).
Remark 4. Using similar arguments adopted in the proof of Theorem 3.1, we can show that Ω∗n(ˉθ∗n) converges in conditional probability to Ω0, as n→∞. Similarly, we can easily introduce consistent bootstrap estimators of D0. These results indicate that the wild multiplicative bootstrap may also provide valid approximations of the sampling distribution of asymptotically pivotal statistics such as t-statistics or J-statistics.
Remark 5. As correctly pointed out by a Referee, in step (ⅲ) of Algorithm 1, instead of re-estimating the unknown parameter of interest, we could simply perturb the estimating equations and use them directly for the construction of confidence intervals. However, this approach is not investigated in the Monte Carlo analysis, and is left for future research.
Remark 6. Our wild multiplicative bootstrap has some analogies with the (multiplier) bootstrap methods proposed in Minnier et al. (2011), Kline and Santos (2012), Chernozhukov et al. (2014), Politis and Romano (1992), Shao (2010), Zhu and Li (2015), Zhu and Ling (2015), Bücher and Kojadinovic (2016), and Zhu (2016, 2019). However, it is important to highlight that our approach is conceptually different from the procedures developed in these studies, and in particular from the dependent wild bootstrap introduced in Shao (2010). Specifically, Shao (2010) proposes to generate new random bootstrap observations by introducing correlated error terms. In our bootstrap algorithm, by contrast, we fix the original observations and perturb the general estimating functions in a multiplicative way using correlated innovations. Therefore, the dependent wild bootstrap proposed in Shao (2010) cannot be applied to the simplified version of an asset pricing model considered in our Section 4.2, whereas our approach works in this setting as well.
Remark 7. Inference provided by conventional (block) bootstrap procedures may be easily inflated by a small fraction of anomalous observations in the data; see, e.g., Singh (1998), Salibian-Barrera and Zamar (2002), and Camponovo et al. (2012, 2015). Intuitively, this feature is explained by the fact that conventional block bootstrap procedures often generate samples containing a higher fraction of anomalous data than the original sample. On the other hand, since the wild multiplicative bootstrap does not construct random samples by resampling from the observations, our procedure ensures a desirable accuracy and stability even in the presence of contaminated data. Indeed, preliminary Monte Carlo simulations in the predictive regression setting of Section 4.1 with a small fraction of additive outlying observations confirm the better stability of the wild bootstrap with respect to the block bootstrap. We conjecture that, using the breakdown point theory developed in Camponovo et al. (2015), it is possible to establish the superior robustness properties of the wild multiplicative bootstrap with respect to the moving block bootstrap. A complete analysis of the robustness properties of the wild bootstrap is left for future research.
4. Monte Carlo simulations
In this section, we study through Monte Carlo simulations the accuracy of our wild bootstrap approach. In Section 4.1, we present the results for a predictive regression model with different forms of heteroskedasticity. Subsequently, in Section 4.2, we consider the simplified version of an asset pricing model analyzed in Hall and Horowitz (1996). Finally, in Section 4.3, we analyze a regression model with a time series structure as proposed in Inoue and Shintani (2006).
We use the Parzen kernel to construct the covariance matrix of the correlated innovations in step (ⅱ) of the wild multiplicative bootstrap algorithm. As in other contexts, the choice of the kernel has only a negligible impact on the accuracy of the results. The number of bootstrap replications is B=999 and the nominal coverage probability is 90%. Unreported Monte Carlo simulations for other coverage probabilities, e.g., 95%, produced similar results and confirmed the findings illustrated in the next subsections. For simplicity, in Sections 4.2 and 4.3 we focus on GMM estimators with the identity matrix as the weighting matrix. Furthermore, we construct confidence intervals for the unknown parameter of interest θ0 using approximations of the sampling distribution of the non-studentized statistic √n(ˆθn−θ0). Unreported empirical results with the optimal weighting matrix and based on studentized statistics are qualitatively very similar. However, in this case the wild multiplicative bootstrap seems to be slightly more sensitive to the selection of the tuning parameter h. The source of this instability may be related to the estimation of the optimal weighting matrix; see, e.g., Altonji and Segal (1996) for similar computational issues.
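For reference, the Parzen kernel mentioned above can be implemented as follows; this is the standard textbook definition, written as a small Python sketch rather than code from the paper.

```python
import numpy as np

def parzen_kernel(x):
    """Parzen kernel: 1 - 6x^2 + 6|x|^3 for |x| <= 1/2,
    2(1 - |x|)^3 for 1/2 < |x| <= 1, and 0 otherwise."""
    x = np.abs(x)
    return np.where(x <= 0.5, 1.0 - 6.0 * x**2 + 6.0 * x**3,
                    np.where(x <= 1.0, 2.0 * (1.0 - x)**3, 0.0))
```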
Finally, for brevity, we report results only for our bootstrap approach and the non-overlapping block bootstrap. Monte Carlo investigations with alternative block bootstrap procedures, such as the stationary block bootstrap and the stationary block-of-blocks bootstrap (which resamples the estimating functions in blocks), produce results similar to those shown for the non-overlapping block bootstrap; these robustness checks are available from the authors upon request. The findings are not too surprising, since stationary block bootstrap methods cannot avoid breaking up the dependence structure either, and since the block-of-blocks bootstrap mitigates the problem only at the break points of the subsamples.
4.1. Predictive regression model
We consider the predictive regression model,
where, for t=1,…,n, Yt denotes the dependent variable at time t, and Zt−1 is assumed to predict Yt. The parameters α∈R and μ∈R are the unknown intercepts of the linear regression model and the autoregressive model, respectively, θ∈R is the unknown parameter of interest, ρ∈R is the unknown autoregressive coefficient, and Ut∈R, Vt∈R are error terms.
In the first exercise, we generate 5000 Monte Carlo samples of size n=180 according to model (13)–(14), with Ut∼N(0,1), Vt∼N(0,1), α0=μ0=0, ρ0=0.3,0.5,0.7, and θ0=0. We estimate the unknown parameter of interest through the least squares estimators,
We construct 90% confidence intervals for θ0 using the block bootstrap with block sizes m=2,5,10,15,20, and the wild multiplicative bootstrap with lag truncation h=2,5,10,15,20. Table 1 reports the empirical coverages.
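A minimal sketch of this first exercise in Python is given below. The displays for model (13)–(14) and for the least squares estimator are not reproduced above, so the code assumes the standard predictive regression form Yt=α+θZt−1+Ut and Zt=μ+ρZt−1+Vt, consistent with the description of the variables; function names are our own.

```python
import numpy as np

def simulate_predictive_regression(n=180, alpha=0.0, mu=0.0, theta=0.0, rho=0.3,
                                   burn=100, rng=None):
    """Simulate the predictive regression DGP (standard form assumed):
       Y_t = alpha + theta * Z_{t-1} + U_t,   Z_t = mu + rho * Z_{t-1} + V_t,
    with U_t, V_t iid N(0, 1)."""
    rng = np.random.default_rng() if rng is None else rng
    z = np.zeros(n + burn + 1)
    for t in range(1, n + burn + 1):
        z[t] = mu + rho * z[t - 1] + rng.standard_normal()
    z = z[burn:]                        # drop burn-in; length n + 1
    u = rng.standard_normal(n)
    y = alpha + theta * z[:-1] + u      # Y_t regressed on Z_{t-1}
    return y, z

def ols_slope(y, z):
    """Least squares estimate of theta in the regression of Y_t on Z_{t-1}."""
    X = np.column_stack([np.ones_like(y), z[:-1]])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]

# Example: y, z = simulate_predictive_regression(rho=0.5); theta_hat = ols_slope(y, z)
```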
In Table 1, we can observe that both bootstrap procedures provide empirical coverages quite close to the nominal coverage probability 90%. However, the wild multiplicative bootstrap seems to be less sensitive to the selection of the tuning parameter h than the block bootstrap is to the selection of the block size m. For instance, when ρ0=0.3, the empirical coverages of the wild bootstrap range from 90.6 to 91.4 for h=5 and h=20, respectively. On the other hand, in the same setting the empirical coverages of the block bootstrap range from 91.3 to 87.2 for m=5 and m=20, respectively. In particular, in the lines "Variation Block" and "Variation Wild" we report the maximal difference between empirical coverages implied by the block bootstrap and the wild bootstrap for different values of the block size and the lag truncation tuning parameter, respectively. We can observe that the variation for the block bootstrap is always larger than 4.5%. On the other hand, for the wild bootstrap the difference is below 2.0%.
In the second exercise, we consider the same parameter selection as in the previous study. However, in this case the error terms are heteroskedastic and correlated. More precisely, let σ2t=1t−1∑t−1i=1Z2i. Then, for the distribution of the error terms we consider the following model,
where Et∼N(0,1). In Table 2, we report the empirical coverages using the block bootstrap and the wild multiplicative bootstrap. Also in this case, we can observe that both bootstrap procedures provide empirical coverages quite close to the nominal coverage probability 90%. However, the wild multiplicative bootstrap is again less sensitive to the selection of the tuning parameter h than the block bootstrap is to the selection of the block size m. Indeed, in the lines "Variation Block" and "Variation Wild" we note that the maximal variation for the block bootstrap is always larger than 5%. On the other hand, for the wild bootstrap the difference is always below 2.0%.
In the last exercise, we study the power properties of the bootstrap procedures. To this end, we generate 5000 Monte Carlo samples of size n=180 according to model (13)–(14), with Ut∼N(0,1), Vt∼N(0,1), α0=μ0=0, ρ0=0.3,0.5,0.7, and θ0∈[0,3/√n]. Finally, using the block and the wild bootstrap, we test the null hypothesis H0:θ0=0 versus H1:θ0≠0, for θ0∈[0,3/√n]. Figure 1 reports the power curves for different selections of the block size m and the lag truncation h.
In Figure 1, we can observe that both bootstrap procedures have quite similar power properties. When θ0=0, the empirical rejection frequencies of the null hypothesis are very close to the significance level 10%. As expected, when θ0≠0, the empirical rejection frequencies increase. However, in this case as well the wild multiplicative bootstrap seems to be less sensitive to the selection of the tuning parameter h than the block bootstrap is to the selection of the block size m. Given that the power results in the next two settings are perfectly in line with those presented for predictive regressions, for the sake of brevity we do not report them in detail.
4.2. Asset pricing model
We consider the example of Hall and Horowitz (1996), who introduce a simplified version of an asset pricing model defined by the moment conditions
where X=(X1,X2)′, θ0=3 is the parameter of interest, μ is a known normalization constant, and X1, X2 are independent random scalars. In particular, we consider the case where X1∼N(0,0.22) and X2 follows a strictly stationary AR(1) process with no intercept, first-order serial correlation coefficient ρ, and standard normal innovations.
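A minimal sketch of this data generating process in Python is given below; the moment functions themselves (the display above) are not reproduced here, and the function name is our own.

```python
import numpy as np

def simulate_hall_horowitz(n=96, rho=0.5, burn=100, rng=None):
    """Simulate the DGP of the asset pricing example: X1 ~ N(0, 0.2^2) iid, and
    X2 a zero-mean stationary AR(1) with coefficient rho and standard normal
    innovations, independent of X1."""
    rng = np.random.default_rng() if rng is None else rng
    x1 = 0.2 * rng.standard_normal(n)
    x2 = np.zeros(n + burn)
    for t in range(1, n + burn):
        x2[t] = rho * x2[t - 1] + rng.standard_normal()
    return x1, x2[burn:]
```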
In Table 3, we report empirical coverage probabilities of 90% confidence intervals for parameter θ0 based on 5000 Monte Carlo samples of size n=48,96, and 256. For the first-order serial correlation coefficient in the data generating process, we consider the cases ρ0=0.3,0.5,0.7. We construct the confidence intervals using first-order asymptotic theory and bootstrap approximations. More precisely, for the wild bootstrap and the block bootstrap, we consider as lag truncation and block sizes h=m=2,4,6,8,10,12, h=m=4,8,12,16,20 and h=m=4,8,16,24,32, for n=48, n=96, and n=256, respectively. The values we consider are similar to those in Hall and Horowitz (1996) who focused on block sizes m=5,10,20 for n=50,100.
The results for ρ0=0.3,0.5,0.7 are qualitatively very similar. The first observation we make is that the wild multiplicative bootstrap significantly outperforms inference based on standard first-order asymptotic theory for all values of h we consider. The second observation is that the accuracy of both the wild bootstrap and the block bootstrap depends on the choice of the parameters h and m, respectively. Furthermore, for the same values of h and m, we see that the wild bootstrap is closer to the nominal coverage probability 90% for most of the settings. Finally, when comparing the wild and block bootstrap, we also observe that the wild bootstrap is much less sensitive to the choice of h than the block bootstrap is to the choice of m. Indeed, in the lines "Variation Block" and "Variation Wild" we note that the maximal difference between empirical coverages implied by the block bootstrap is always larger than 5%. On the other hand, the maximal difference for the wild bootstrap is around 1%. As mentioned above, there is no clear method to determine the block size in finite samples, which makes this dependence problematic in practice. The higher stability of the wild bootstrap with respect to the lag truncation h is therefore a major advantage in practice, as the procedure is quite accurate for a wide range of values, unlike the block bootstrap.
4.3. Regression model with time series structure
In this section, we consider the linear regression model
where Yt∈R, θ∈R, and the disturbance and the regressors are generated according to the following autoregressive processes with common ρ,
with Vt=(V1t,V2t)′∼N(0,I2). We generate 5000 samples according to this model with θ0=0, ρ0=0.3,0.5,0.7, and n=48,96,256. Note that in this setting, the unknown parameter of interest satisfies the moment conditions
where Xt=(Yt,Zt,Zt−1,Zt−2)′. Again, we construct 90% confidence intervals using first-order asymptotic theory and bootstrap approximations. More precisely, for the wild bootstrap and the block bootstrap, we consider as lag truncation and block sizes h=m=2,4,6,8,10,12, h=m=4,8,12,16,20 and h=m=4,8,16,24,32, for n=48, n=96, and n=256, respectively. The empirical coverage probabilities are summarized in Table 4.
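A minimal sketch of this exercise in Python follows. The displays for the regression equation, the autoregressive processes, and the moment conditions are not reproduced above; the code therefore assumes the forms Yt=θZt+Ut, with Ut and Zt AR(1) processes sharing the coefficient ρ, and the instruments (Zt,Zt−1,Zt−2), which is our reading of the description and may differ in detail from the paper.

```python
import numpy as np

def simulate_regression_ar(n=96, theta=0.0, rho=0.5, burn=100, rng=None):
    """Linear regression with AR(1) disturbance and AR(1) regressor sharing
    the coefficient rho (assumed form: Y_t = theta * Z_t + U_t)."""
    rng = np.random.default_rng() if rng is None else rng
    v = rng.standard_normal((n + burn, 2))      # V_t = (V1_t, V2_t)' ~ N(0, I_2)
    u = np.zeros(n + burn)
    z = np.zeros(n + burn)
    for t in range(1, n + burn):
        u[t] = rho * u[t - 1] + v[t, 0]
        z[t] = rho * z[t - 1] + v[t, 1]
    u, z = u[burn:], z[burn:]
    return theta * z + u, z

def moments(y, z, theta):
    """Assumed moment functions using (Z_t, Z_{t-1}, Z_{t-2}) as instruments."""
    resid = y[2:] - theta * z[2:]
    instruments = np.column_stack([z[2:], z[1:-1], z[:-2]])
    return resid[:, None] * instruments         # (n-2, 3) array of g(X_t, theta)
```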
In this setting as well, the wild multiplicative bootstrap clearly outperforms inference based on standard first-order asymptotic theory, regardless of the choice of the lag truncation. The higher precision of the wild bootstrap with respect to that of the block bootstrap when using the same parameter values is even more evident than in the previous setting. Moreover, results again show that the accuracy of the block bootstrap is much more sensitive to the block size parameter than is that of the wild bootstrap with respect to the lag truncation parameter, even for quite large samples and low persistence.
5. Real data application
In this section, we study the ability of variance risk premia to predict future stock returns. Recently, a large number of studies have investigated whether stock returns can be predicted by economic variables such as the price-dividend ratio, the interest rate or the variance risk premium; see, e.g., Rozeff (1984), Fama and French (1988), Campbell and Shiller (1988), Nelson and Kim (1993), Campbell and Yogo (2006), Jansson and Moreira (2006), Polk et al. (2006), and Bollerslev et al. (2009).
In this empirical analysis, we consider monthly S&P 500 index data (1871–2010) from Shiller (2000). We define the one-period real total return as
where Pt is the end-of-month real stock price and dt is the real dividends paid during month t. Finally, we consider the predictive regression model,
where ln(Rt+k,t):=ln(Rt+1)+⋯+ln(Rt+k) and the variance risk premium VRPt:=IVt−RVt is defined as the difference between the S&P 500 index option-implied volatility at time t, for one-month maturity options, and the ex-post realized return variation over the period [t−1,t]. Bollerslev et al. (2009) show that the variance risk premium is the most significant predictive variable of market returns over a quarterly horizon. Therefore, we test the predictive regression model (24) for k=3.
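A minimal sketch of the construction of the regression variables in Python follows. The display defining the one-period return is not reproduced above, so the code assumes the standard total return Rt=(Pt+dt)/Pt−1; variable names are our own.

```python
import numpy as np

def k_period_log_returns(prices, dividends, k=3):
    """One-period real total returns R_t = (P_t + d_t) / P_{t-1} (standard
    definition assumed) and the k-period log return
    ln(R_{t+k,t}) = ln R_{t+1} + ... + ln R_{t+k}."""
    r = (prices[1:] + dividends[1:]) / prices[:-1]
    log_r = np.log(r)
    # rolling sum of k consecutive one-period log returns
    return np.convolve(log_r, np.ones(k), mode="valid")

def variance_risk_premium(implied_var, realized_var):
    """VRP_t = IV_t - RV_t."""
    return np.asarray(implied_var) - np.asarray(realized_var)

# The predictive regression of ln(R_{t+k,t}) on VRP_t can then be estimated by
# least squares, and 90% confidence intervals for theta_0 obtained with the
# wild multiplicative bootstrap or the block bootstrap.
```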
We estimate the unknown parameter of interest through the least squares estimators
We are interested in testing the hypothesis of no predictability H0:θ0=0. To this end, using the block bootstrap and the wild multiplicative bootstrap, we construct 90% confidence intervals for the unknown parameter of interest θ0. More precisely, we apply the procedures under investigation to the period 1990–2010, consisting of 240 observations. Table 5 reports our empirical results. For the period under investigation, our wild bootstrap procedure always provides significant evidence in favor of predictability. Similarly, inference based on standard first-order asymptotic theory also rejects the null hypothesis. By contrast, the block bootstrap implies larger and less stable confidence intervals that lead to ambiguous conclusions depending on the selection of the block size. For instance, for m=5,10,15 the block bootstrap also rejects the hypothesis of no predictability. However, for m=20, the block bootstrap does not reject H0.
A possible source of the divergent conclusions could be related to the lack of robustness of the block bootstrap in the presence of anomalous observations. Indeed, the year 2008 is characterized by several unusual observations linked to the recent credit crisis. As shown in Camponovo et al. (2015), inference provided by block bootstrap procedures may be easily inflated by a small fraction of anomalous observations in the data. Intuitively, this feature is explained by the excessively high fraction of anomalous data that is often simulated by conventional block bootstrap procedures, when compared to the actual fraction of anomalous observations in the original data. On the other hand, since the wild multiplicative bootstrap does not construct random samples by resampling from the observations, our procedure may preserve a desirable accuracy even in the presence of anomalous observations.
6. Conclusions
In time series models, in the absence of parametric assumptions on the data generating process, the standard approach to bootstrapping is the block bootstrap. After splitting the original sample into (non-)overlapping blocks, the block bootstrap constructs random samples by selecting the (non-)overlapping blocks with replacement. Under strong regularity conditions on the data generating process and on the estimating functions, the block bootstrap may provide asymptotic refinements relative to standard first-order asymptotic theory. However, to achieve this objective, the definition of the block bootstrap and the selection of the block size require some care.
In this paper, we introduce a wild multiplicative bootstrap procedure that does not require the selection of block sizes, but instead depends on a less sensitive lag truncation parameter. Unlike conventional bootstrap procedures proposed in the literature, in our algorithm we do not construct random samples by resampling from the observations. Instead, we propose perturbing the general estimating functions using correlated innovations. By introducing this time series dependence, our bootstrap method is able to properly capture the autocorrelation of the true moments. Moreover, unlike conventional bootstrap methods, the wild bootstrap may preserve a desirable accuracy and stability even in the presence of anomalous observations. We prove the validity of our bootstrap procedure and, in a Monte Carlo analysis, show that our approach always outperforms inference based on standard first-order asymptotic theory. Furthermore, the wild multiplicative bootstrap we propose also compares favorably with block bootstrap procedures for values of the block size typically suggested in the literature.
Finally, in a real data application related to the large literature on stock return predictability, we show the advantages of the proposed procedure for obtaining clear results that are not influenced by the presence of possible anomalous observations in the data.
Acknowledgments
We thank the editor, the associate editor and four anonymous referees for useful comments.
Conflict of interest
We have no conflict of interest to declare.
Appendix: Assumptions and Proofs
Before proving Theorem 3.1, let us introduce a set of assumptions in line with Goncalves and White (2004) and Allen et al. (2010), for M and GMM estimators, respectively.
Assumption 6.1.
(a) Let (Ω,F,P) be a complete probability space. The observed data are a realization of a stochastic process Xt:Ω→Rdx, dx∈N, with Xt(ω)=Wt(…,Vt−1(ω),Vt(ω),Vt+1(ω),…), Vt:Ω→Rv, v∈N, and Wt:∏∞τ=−∞Rv→Rdx is such that Xt is measurable for all t.
(b) Either for M estimators, the function ρ:Rdx×Θ→R is such that ρ(⋅,θ) is measurable for each θ∈Θ, a compact subset of Rp, and ρ(Xt,⋅) is continuous on Θ a.s. for all t; or for GMM estimators, the function g:Rdx×Θ→Rdg is such that g(⋅,θ) is measurable for each θ∈Θ, a compact subset of Rdθ, and g(Xt,⋅) is continuous on Θ a.s. for all t.
(c) Either for M estimators: (i) θ0 is the unique minimum of E[1n∑nt=1ρ(Xt,θ)] over θ∈Θ. (ii) θ0 is an interior point of Θ; or for GMM estimators: (i) θ0 is the unique solution of E[g(Xt,θ)]=0, θ∈Θ. (ii) θ0 is an interior point of Θ.
(d) Either for M estimators: (i) ρ(Xt,θ) is Lipschitz continuous on Θ, i.e., |ρ(Xt,θ1)−ρ(Xt,θ2)|≤Lt|θ1−θ2| a.s. for all θ1,θ2∈Θ, where 1n∑nt=1E[Lt]=O(1). (ii) ∂2∂θ∂θ′ρ(Xt,θ) is Lipschitz continuous on Θ; or for GMM estimators: (i) g(Xt,θ) is Lipschitz continuous on Θ, i.e., ‖g(Xt,θ1)−g(Xt,θ2)‖≤Lt‖θ1−θ2‖ a.s. for θ1,θ2∈Θ, where 1n∑nt=1E[Lt]=O(1). (ii) ∂∂θg(Xt,θ) is Lipschitz continuous on Θ.
(e) For some r>2, either for M estimators: (i) ρ(Xt,θ) is r-dominated on Θ uniformly in t, i.e., there exists Dt such that |ρ(Xt,θ)|≤Dt for all θ∈Θ, and Dt is measurable such that E[|Dt|r]<∞ for all t. (ii) ∂∂θρ(Xt,θ) is r-dominated on Θ uniformly in t. (iii)∂2∂θ∂θ′ρ(Xt,θ) is r-dominated on Θ uniformly in t; or for GMM estimators: (i) g(Xt,θ) is r-dominated on Θ uniformly in t, i.e., there exists Dt such that ‖g(Xt,θ)‖≤Dt for all θ∈Θ, and Dt is measurable such that E[‖Dt‖r]<∞ for all t. (ii) ∂∂θg(Xt,θ) is r-dominated on Θ uniformly in t.
(f) {Vt} is an α-mixing sequence of size −2r/(r−2), with r>2.
(g) Either for M estimators: the elements of (i) ρ(Xt,θ) are near epoch dependent on {Vt} of size −1/2. (ii) ∂∂θρ(Xt,θ) are near epoch dependent on {Vt} of size −1 uniformly on (Θ,f), where f is any convenient norm on Rp. (iii) ∂2∂θ∂θ′ρ(Xt,θ) are near epoch dependent on {Vt} of size −1/2 uniformly on (Θ,f); or for GMM estimators: the elements of (i) g(Xt,θ) are near epoch dependent on {Vt} of size −1 uniformly on (Θ,f), where f is any convenient norm on Rdg. (ii) ∂∂θg(Xt,θ) are near epoch dependent on {Vt} of size −1 uniformly on (Θ,f).
(h) Either for M estimators: (i) ‖1n∑ni=1∑nj=1E[∂∂θρ(Xi,θ0)∂∂θρ(Xj,θ0)′]−Ω0‖→0, for some positive definite matrix Ω0. (ii) ‖1n∑nt=1E[∂2∂θ∂θ′ρ(Xt,θ0)]−D0‖→0, where D0 is of full rank; or for GMM estimators: (i) ‖1n∑ni=1∑nj=1E[g(Xi,θ0)g(Xj,θ0)′]−Ω0‖2→0, for some positive definite matrix Ω0. (ii) ‖1n∑nt=1E[∂∂θg(Xt,θ0)]−D0‖2→0, where D0 is of full rank. (iii) Wn converges in probability to a non-random positive-definite symmetric matrix W0.
(l) (i) The kernel function k(⋅) is continuous, k(0)=1, k(x)=k(−x), and ∫∞−∞|k(x)|dx<∞. (ii) Let K(λ)=12π∫∞−∞k(x)e−ixλdx; then ∫∞−∞|K(λ)|dλ<∞. (iii) The lag truncation h satisfies 1h+h√n→0, as n→∞.
Assumption 6.2.
(a) For some r>2, either for M estimators: ∂∂θρ(Xt,θ) is 3r-dominated on Θ uniformly in t; or for GMM estimators: g(Xt,θ) is 3r-dominated on Θ uniformly in t.
(b) Either for M estimators: (i) For small δ>0 and some r>2, the elements of ∂∂θρ(Xt,θ) are L2+δ near epoch dependent on {Vt} of size −2(r−1)/(r−2) uniformly on (Θ,f). (ii) {Vt} is α-mixing of size −r(2+δ)/(r−2); or for GMM estimators: (i) For small δ>0 and some r>2, the elements of g(Xt,θ) are L2+δ near epoch dependent on {Vt} of size −2(r−1)/(r−2) uniformly on (Θ,f). (ii) {Vt} is α-mixing of size −r(2+δ)/(r−2).
Assumption 6.3.
(a) Let (e1,…,en) be a sample from a stationary process of positively correlated observations with E[et|(X1,…,Xn)]=1, Cov(et,et+i|(X1,…,Xn))=k(i/h), and E[e4t|(X1,…,Xn)]<∞, where k(⋅) is an appropriate kernel function, and h is the lag truncation parameter.
Assumptions 6.1 and 6.2 are mild conditions typically required for the validity of bootstrap approximations and are satisfied in several time series settings. In particular, Assumption 6.1 provides a set of conditions that are typically required for the consistency and asymptotic normality of M and GMM estimators, whereas in Assumption 6.2, in line with Goncalves and White (2004) and Allen et al. (2010), we add conditions necessary for the consistency of the bootstrap approximation. Finally, in Assumption 6.3, we add conditions on the error terms used in the construction of the wild bootstrap approximation. Unfortunately, these assumptions do not cover unknown parameters defined through non-differentiable estimating functions.
Proof of Theorem 3.1: First, we consider the M estimator case, and prove statement (i). To this end, consider the random process
Note that Rn(u) is minimized at √n(ˆθ∗n−ˆθn). By considering a Taylor expansion of ρ∗(Xt,ˆθn+u/√n) around ˆθn we have
Therefore, we can rewrite the random process Rn(u) as
First, consider the second factor 12n∑nt=1u′(∂2∂θ∂θ′ρ∗(Xt,ˆθn))u in the above expansion. By Theorem 20.21 in Davidson (1994), the term 1n∑nt=1∂2∂θ∂θ′ρ∗(Xt,ˆθn) converges in conditional probability to D0. Furthermore, consider now the first factor 1√n∑nt=1u′(∂∂θρ∗(Xt,ˆθn)). By De Jong and Davidson (2000), and Corollary 24.7 in Davidson (1994), the conditional law of 1√n∑nt=1(∂∂θρ∗(Xt,ˆθn)) converges weakly to a normal distribution with mean 0 and covariance matrix Ω0.
Therefore, the limit R(u) of Rn(u) is given by
where v0∼N(0,Ω0). Note that the unique minimum of R(u) is −D−10v0, which is normally distributed with mean 0 and covariance matrix D−10Ω0D−10. It then follows from the results in Geyer (1994) that the conditional law of √n(ˆθ∗n−ˆθn) also converges weakly to a normal distribution with mean 0 and the same covariance matrix.
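The display defining R(u) is not reproduced above; consistent with the two limits just derived and with the stated minimizer, the limit process takes the quadratic form (our reconstruction):

```latex
R(u)\;=\;u'v_0\;+\;\tfrac{1}{2}\,u'D_0\,u,\qquad v_0\sim N(0,\Omega_0),
```

whose first-order condition v0+D0u=0 yields the minimizer −D−10v0 stated above.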
To prove statement (ⅱ), we follow the same approach as in the proof of statement (ⅰ). More precisely, consider the random process
Note that Sn(u) is minimized at √n(ˆθ∗n−ˆθn). By considering a Taylor expansion of the term 1√n∑nt=1g∗(Xt,ˆθn+u/√n) around ˆθn we have,
It turns out that using (32) we can rewrite Sn(u) as
Therefore, by Theorem 20.12 in Davidson (1994), De Jong and Davidson (2000), and Corollary 24.7 in Davidson (1994), the limit S(u) of Sn(u) is given by
where v0∼N(0,Ω0). Note that the unique minimum of S(u) is −(D′0W0D0)−1D′0W0v0, which is normally distributed with mean 0 and covariance matrix (D′0W0D0)−1D′0W0Ω0W0D0(D′0W0D0)−1. By the use of the results in Geyer (1994), the conditional law of √n(ˆθ∗n−ˆθn) also converges weakly to a normal distribution with mean 0 and the same covariance matrix.
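Analogously, the display defining S(u) is not reproduced above; consistent with the stated minimizer, a quadratic form of the following type is implied (our reconstruction):

```latex
S(u)\;=\;(v_0+D_0u)'\,W_0\,(v_0+D_0u),\qquad v_0\sim N(0,\Omega_0),
```

whose first-order condition D0′W0(v0+D0u)=0 yields the minimizer −(D0′W0D0)−1D0′W0v0 and, hence, the covariance matrix stated in the theorem.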