Keywords: Structural VAR, long-run identification, non-parametric estimation, spectral factorization
Abstract:
In addition, when combining VAR coefficients with non-parametric estimates of the spectral density, care needs to be taken to consistently account for information embedded in the non-parametric estimates about serial correlation in VAR residuals. This paper uses a spectral factorization to ensure a correct representation of the data's variance. But this cannot overcome the fundamental problems of estimating the long-run dynamics of macroeconomic data in samples of typical length.
JEL Classification: C32, C51, E17, E32
VARs have been criticized for failures in estimating the responses to long-run shocks. A crucial element for long-run identification is the spectral density at frequency zero, also known as the "long-run variance". But OLS estimates of VAR coefficients are geared towards minimizing the forecast error variance, not towards estimating the long-run variance. This has motivated Christiano, Eichenbaum, and Vigfusson (2006a; 2006b), henceforth "CEV", to propose a new way of estimating structural VARs using a combination of OLS and a non-parametric estimator. They argue that their estimator virtually eliminates the bias associated with the standard OLS estimator.
This paper shows that non-parametric estimates of the spectral density, henceforth called "spectral estimators", are no panacea for the implementation of long-run restrictions in finite sample. Macroeconomic time series display a fair amount of persistence, posing two serious challenges for long-run identification. First, an accurate representation of the true model typically requires a VAR with a high lag order, much higher than what is affordable in a sample of typical length, resulting in a sizable truncation bias (Chari, Kehoe, and McGrattan 2008, henceforth "CKM"). Second, there is the small sample bias in estimated coefficients known from Hurwicz (1950), which becomes more severe the smaller the sample and the more persistent the data. As will be shown, both issues affect not only VARs in the time domain, but also spectral estimators in the frequency domain.
The conventional VAR technique as well as different combinations with spectral estimators are evaluated in the context of a simple two-shock RBC model, which has also been used by CEV and CKM. When using the various procedures to estimate the response of hours to technology, or to decompose the variance of fluctuations in output or hours, none of the procedures clearly dominates the others.
Furthermore, CEV do not consider some conceptual pitfalls in combining VAR coefficients with spectral estimates. Non-parametric estimates of the spectral density allow for serially correlated residuals in the finite-order VAR, which is good since the underlying model is likely of infinite order. In what may be called "mixing and matching", the CEV approach plugs these estimates into the standard VAR formula alongside coefficients from the finite-order VAR. This approach uses the extra information about omitted lags in the VAR to compute the long-run responses of variables to shocks--but not when mapping these back into impact responses. To retain a consistent representation of the data, that would however be necessary. Otherwise, the total variance of the data is misrepresented. In the simulations reported here, this misrepresentation is quantitatively relevant. As a related issue, when the relationship between forecast errors and structural shocks is inverted with the CEV coefficients, one obtains a time series which is identical to the shock estimates from OLS up to a scale factor. All in all, this is of concern for any researcher wanting to adopt the CEV strategy.
The CEV framework is amended here by recognizing that the non-parametric estimate contains information about omitted lags in the VAR. This misspecification has been stressed by CKM, Erceg, Guerrieri, and Gust (2005), Ravenna (2007) and Cooley and Dwyer (1998). The adjusted procedure retains the OLS estimates and fills in the omitted lags with a spectral factorization of the spectral density's non-parametric estimate. By construction, this adjusted SVAR--in fact an SVARMA--matches the sample variance of the data just as OLS does. Overall, this corrected procedure suffers from the same basic problems as the other long-run identification methods: truncation and small sample bias.
The remainder of this paper is structured as follows: Section 2 describes the model economy against which the various estimation routines will be evaluated. Section 3 describes the various SVAR methods, including a new spectral factorization procedure. Section 4 presents the Monte Carlo results and Section 5 concludes the paper.
This section describes a simple model economy, which will be used to illustrate and quantify the issues associated with various long-run identification schemes. None of the conceptual concerns related to spectral estimates raised in Section 3 will be specific to this model. The model is identical to the two-shock economy used by CKM and CEV.
The model is a common one-sector RBC economy driven by two shocks: first, a unit root shock to technology, which is the permanent shock to be estimated by the VAR; second, a transitory non-technology shock, which distorts the private household's consumption-labor decision.
The representative household maximizes lifetime utility over per-capita consumption and labor services.
The non-technology shock is an exogenous labor tax. As discussed by CKM, it need not be interpreted literally as a tax levy, but stands in for the effects of a variety of non-technology shocks introduced into second-generation RBC models. Mechanically, it distorts the first-order condition for consumption and labor. It works similarly to a stochastic preference shock to the Frisch elasticity of labor supply. Chari, Kehoe, and McGrattan (2007) show how this labor "wedge" can be understood more generally as the reduced form process for more elaborate distortions, such as sticky wages.
The production function is constant returns to scale with labor-augmenting technological progress. Firms are static and maximize profits. Per-capita output equals production, and the economy's resource constraint equates consumption plus investment to output. The exogenous drivers follow linear stochastic processes driven by standard-normal innovations: the technology shock and the labor shock, respectively. Log-technology follows a unit root with drift, while the labor tax is a transitory process whose persistence parameter and scale, together with the scale of the technology shock, determine the two shocks' relative importance in the model. The average tax rate pins down the level around which the labor tax fluctuates.
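The two driving processes described above can be written compactly as follows; the symbols ($z_t$ for technology, $\tau_{l,t}$ for the labor tax, $\mu$, $\rho$, $\bar\tau_l$, $\sigma_z$, $\sigma_l$) are notation introduced here for illustration rather than taken from the original text:
$$ \log z_t = \mu + \log z_{t-1} + \sigma_z\,\varepsilon_{z,t}, \qquad \tau_{l,t} = (1-\rho)\,\bar\tau_l + \rho\,\tau_{l,t-1} + \sigma_l\,\varepsilon_{l,t}, \qquad \varepsilon_{z,t},\ \varepsilon_{l,t} \sim \text{i.i.d. } N(0,1). $$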
The calibration is identical to the baseline model of CEV, which uses parameter values familiar from the business cycle literature. Utility is specified to be consistent with balanced growth, with the labor preference parameter set accordingly, and the production function is Cobb-Douglas. On an annualized basis, the calibration sets the depreciation rate to 6%, the rate of time preference to 2% and population growth to 1%. Following CEV, the transitory shock is calibrated as a persistent AR(1). Except for a small number of parameter values, this calibration is identical to the one used by CKM.
The model economy is calibrated over different ratios of the variance of transitory to permanent shocks, which translate into different assumptions about the share of output fluctuations explained by technology shocks. As a benchmark, maximum-likelihood estimates of CEV obtained from fitting the model to U.S. post-war data imply that around two-thirds of the bandpass-filtered variance in output can be attributed to technology shocks. The bandpass filter employed throughout this paper considers only fluctuations with durations between two-and-a-half and eight years, which is consistent with the NBER definitions of Burns and Mitchell (1946).
Data is simulated for samples of length T = 180, corresponding to 45 years of quarterly data, identical to the simulations of CKM and CEV. Following CEV and CKM, bivariate VARs are estimated using simulated data on the (log) growth rate of labor productivity and hours worked. For each simulated sample, the lag length p of the VAR(p) is chosen by minimizing the Schwarz Information Criterion (SIC), typically picking small values close to one. When computing population moments, a VAR(1) is used. For each calibration, 1,000 samples are simulated.
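As an illustration of the lag-selection step, the sketch below fits VAR(p) models by OLS for several candidate orders and picks the one minimizing the SIC. The function names and the exact SIC normalization are choices made here for illustration, not taken from the paper.

```python
import numpy as np

def ols_var(Y, p):
    """OLS estimates of a VAR(p) with intercept; returns coefficients and residual covariance."""
    T, n = Y.shape
    X = np.hstack([np.ones((T - p, 1))] +
                  [Y[p - j - 1:T - j - 1] for j in range(p)])   # [1, Y_{t-1}, ..., Y_{t-p}]
    Ydep = Y[p:]
    coef, *_ = np.linalg.lstsq(X, Ydep, rcond=None)
    resid = Ydep - X @ coef
    Sigma = resid.T @ resid / (T - p)
    return coef, Sigma

def sic_lag_order(Y, p_max=8):
    """Pick the VAR lag order by minimizing the Schwarz Information Criterion."""
    T, n = Y.shape
    best_p, best_sic = 1, np.inf
    for p in range(1, p_max + 1):
        _, Sigma = ols_var(Y, p)
        k = n * (n * p + 1)                                     # number of estimated coefficients
        sic = np.log(np.linalg.det(Sigma)) + k * np.log(T) / T
        if sic < best_sic:
            best_p, best_sic = p, sic
    return best_p
```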
When looking at data simulated from this model, two statistics are of particular interest for this paper: How do hours worked respond to a technology shock? What is the share of fluctuations due to technology shocks? These questions are typically asked by empirical researchers trying to evaluate predictions from business cycle models with SVARs, such as Gali (1999) or Christiano, Eichenbaum, and Vigfusson (2004).
The linearized solution to the model described in the previous section is only one example from a wider class of linear dynamic models to which the SVAR methods discussed here can be applied. None of the issues discussed in this section will be specific to the model described above. An economic model from this class is assumed to deliver a VAR representation for a stationary vector of observable variables $Y_t$:
$$ A(L)\,Y_t = u_t, \qquad A(L) = I - A_1 L - A_2 L^2 - \dots \qquad(1) $$
where $A(L)$ is a polynomial in the lag operator $L$ whose roots all lie outside the unit circle and the innovations $u_t$ are white noise with $E[u_t u_t'] = \Sigma$.
In principle, the model prescribes an infinite order VAR. When $A_j = 0$ for all $j$ larger than some finite $p$, this nests the case of a finite order VAR. But as noted by Cooley and Dwyer (1998), many interesting models have only infinite order VAR representations. In the remainder of this paper the true VAR representation is always assumed to be of infinite order. The linearized solution to the model described in Section 2 has such an infinite order VAR representation; details are shown in Appendix B.
For the identification of structural shocks, there has to be an invertible one-to-one mapping from innovations to the structural shocks driving the underlying model--such as technology, monetary policy errors, exogenous government spending etc.:
$$ u_t = B_0\, e_t, \qquad(2) $$
where $B_0$ is square and $E[e_t e_t'] = I$. Fernández-Villaverde et al. (2007) derive conditions under which a linear dynamic model has an invertible VAR representation. (These are summarized in Appendix B.) This paper considers only cases where these conditions are satisfied, though possibly only in an infinite order VAR representation. The same applies to the situations studied by CKM, CEV as well as Erceg, Guerrieri, and Gust (2005).
Excluding the complications arising from non-invertibilities makes it possible to focus on problems owing to small sample bias and the finite order approximations of the VAR.
It will be handy to introduce the notation $B(L) \equiv A(L)^{-1}$ for the non-structural moving average (VMA) coefficients of $Y_t$,
$$ Y_t = B(L)\, u_t. \qquad(3) $$
The structural moving average representation for $Y_t$ is then
$$ Y_t = C(L)\, e_t \qquad \text{with} \qquad C(L) = B(L)\, B_0. $$
In the spirit of CEV and CKM, only one of the structural shocks will be identified. For concreteness, let it be the first one, denoted $e_{1,t}$, and call it the "technology shock". Think of the first element of $Y_t$ as being a growth rate (a difference in logs), like the change in labor productivity (Gali 1999) or output growth (Blanchard and Quah 1989). The identifying assumption is then that only the technology shock has a permanent effect on the level of the first element of $Y_t$. This restricts the matrix of long-run coefficients, $C(1)$:
$$ \big[C(1)\big]_{1j} = 0 \qquad \text{for } j > 1. \qquad(4) $$
This restriction holds exactly in the linearized solution to the model described in Section 2.
A key object for implementing this constraint is the spectral density of $Y_t$. The spectral density at frequency $\omega$ is defined here as
$$ S_Y(\omega) = \sum_{k=-\infty}^{\infty} \Gamma_Y(k)\, e^{-i\omega k}, \qquad \Gamma_Y(k) = E[Y_t Y_{t-k}'], $$
where $i$ is the imaginary unit (the conventional $1/2\pi$ normalization is omitted, so that $S_Y(0)$ equals the long-run variance). $C(1)$ factors the spectral density of $Y_t$ at frequency zero:
$$ S_Y(0) = C(1)\, C(1)'. \qquad(5) $$
One way to compute the first column of $C(1)$ is by recovering it from the Cholesky decomposition of $S_Y(0)$. (This is the unique lower triangular factorization of a positive definite matrix.)
CEV show that the restriction in (4) uniquely pins down the first column of $C(1)$, and the Cholesky factorization is one possible implementation.
The long-run coefficients can then be mapped into the matrix of impact responses using the VAR dynamics encoded in the polynomial of lag coefficients $A(L)$:
$$ B_0 = A(1)\, C(1). \qquad(6) $$
Since the VAR innovations in (1) are assumed to be white noise, they satisfy the OLS normal equations
$$ E[u_t\, Y_{t-j}'] = 0 \qquad \text{for all } j \ge 1. \qquad(8) $$
In principle, the coefficients could be estimated from least squares projections of $Y_t$ on its infinite past. In practice, an empirical implementation can only work with a finite lag length. Henceforth $A^p(L)$ denotes a lag polynomial of finite order $p$:
$$ A^p(L) = I - A^p_1 L - \dots - A^p_p L^p. $$
The associated VMA is $B^p(L) = \big[A^p(L)\big]^{-1}$. Only stable VARs are considered; formally this requires all roots of $\det A^p(z)$ to lie outside the unit circle.
The standard procedure is to assume uncorrelated residuals, $E[u^p_t\, u^{p\prime}_{t-k}] = 0$ for $k \neq 0$. Following Blanchard and Quah (1989), the long-run restriction (4) is implemented based on an estimate of the spectral density at frequency zero constructed from the OLS estimates $\hat A(L)$ and $\hat\Sigma$,
$$ \hat S(0) = \hat A(1)^{-1}\, \hat\Sigma\, \big[\hat A(1)^{-1}\big]', \qquad \hat C(1) = \operatorname{chol}\big(\hat S(0)\big). \qquad(9) $$
Impact coefficients are computed by plugging these estimates into (6):
$$ \hat B_0 = \hat A(1)\, \hat C(1). \qquad(10) $$
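As an illustration of equations (9) and (10), the following minimal sketch computes the long-run variance implied by the OLS estimates, its Cholesky factor, and the impact coefficients. Variable and function names are chosen here for illustration only.

```python
import numpy as np

def long_run_impact_ols(A_lags, Sigma):
    """Standard long-run identification from OLS VAR estimates.

    A_lags : list of (n x n) VAR coefficient matrices A_1, ..., A_p
    Sigma  : (n x n) covariance matrix of the VAR residuals
    Returns the (n x n) impact matrix B0 such that u_t = B0 e_t and only
    the first shock has a long-run effect on the first variable.
    """
    n = Sigma.shape[0]
    A1 = np.eye(n) - sum(A_lags)       # A(1) = I - A_1 - ... - A_p
    B1 = np.linalg.inv(A1)             # long-run MA matrix B(1) = A(1)^{-1}
    S0 = B1 @ Sigma @ B1.T             # OLS-implied long-run variance, eq. (9)
    C1 = np.linalg.cholesky(S0)        # lower-triangular C(1): zero restriction (4)
    B0 = A1 @ C1                       # impact coefficients, eq. (10)
    return B0
```

By construction, `B0 @ B0.T` reproduces `Sigma`; this is the internal-consistency property that the CEV combination discussed below gives up.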
Using a finite-order VAR when the data has been generated from an infinite order process induces a truncation bias into the estimates. In this case, the OLS assumption of uncorrelated forecast errors is violated, which is an example of what Cooley and Dwyer criticized as an "auxiliary" (but not innocuous) assumption. This truncation bias arises even when the true population moments of the data generating process are known. Applied to data generated from a business cycle model, the truncation bias in SVARs can be substantial, as shown by Cooley and Dwyer (1998), Erceg, Guerrieri, and Gust (2005), Ravenna (2007) or CKM. The truncation bias is also sizable for data from the model described in Section 2, as will be seen in Figure 3 below.
CEV propose an alternative estimator for the matrix of impact coefficients. This new estimator uses a mixture of the OLS estimates of $A^p(L)$ and a non-parametric estimator of $S_Y(0)$. The procedure is motivated by observing that OLS projections construct $\hat A(L)$ not necessarily with regard to the long-run variance but in order to minimize the forecast error variance. Following Sims (1972), the least-squares objective seeks OLS coefficients which minimize the average distance between themselves and the true $A(L)$, weighted by the spectral density of $Y_t$, which may or may not be large at the zero frequency:
$$ A^p \;=\; \arg\min_{\tilde A(L)\,:\,\text{order } p}\ \int_{-\pi}^{\pi} \operatorname{tr}\Big\{ \big[A(e^{-i\omega}) - \tilde A(e^{-i\omega})\big]\, S_Y(\omega)\, \big[A(e^{-i\omega}) - \tilde A(e^{-i\omega})\big]^{*} \Big\}\, d\omega. \qquad(11) $$
Accordingly, OLS will try to set $\hat A(1)$ close to $A(1)$ only if the data's spectrum is high at the zero frequency, and $\hat A(1)^{-1}\, \hat\Sigma\, \big[\hat A(1)^{-1}\big]'$ need not be the best possible estimate of the spectral density at frequency zero.
Instead of using the OLS-implied estimate $\hat A(1)^{-1}\hat\Sigma\big[\hat A(1)^{-1}\big]'$, CEV employ a spectral estimator of $S_Y(0)$ to construct $\hat C(1)$. In Christiano, Eichenbaum, and Vigfusson (2006a), they consider two estimators, one based on Newey and West (1987) and the other on Andrews and Monahan (1992). Both are based on truncated sums of autocovariance matrices. To ensure positive definiteness, these are weighted by a Bartlett kernel. Whereas Newey-West sums over the (sample) autocovariances of $Y_t$,
$$ \hat S^{NW}(0) = \hat\Gamma_Y(0) + \sum_{k=1}^{B}\Big(1-\frac{k}{B+1}\Big)\Big(\hat\Gamma_Y(k) + \hat\Gamma_Y(k)'\Big), \qquad(13) $$
Andrews-Monahan first uses the VAR to prewhiten the data and then sums over the residual autocovariances:
$$ \hat S^{AM}(0) = \hat A(1)^{-1}\Big[\hat\Gamma_{\hat u}(0) + \sum_{k=1}^{B}\Big(1-\frac{k}{B+1}\Big)\Big(\hat\Gamma_{\hat u}(k) + \hat\Gamma_{\hat u}(k)'\Big)\Big]\big[\hat A(1)^{-1}\big]'. \qquad(14) $$
$B$ is a truncation parameter, also known as the "bandwidth", which will be discussed in more detail below. The Andrews-Monahan estimator nests the OLS case when $B = 0$.
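A minimal sketch of the two spectral estimators, under the normalization of the long-run variance used above (no $1/2\pi$ factor); the bandwidth argument and function names are illustrative.

```python
import numpy as np

def sample_autocov(X, k):
    """Sample autocovariance Gamma_hat(k) = (1/T) sum_t x_t x_{t-k}' of demeaned data."""
    Xd = X - X.mean(axis=0)
    T = Xd.shape[0]
    return Xd[k:].T @ Xd[:T - k] / T

def newey_west_S0(X, bandwidth):
    """Bartlett-weighted long-run variance of the series in X, as in eq. (13)."""
    S = sample_autocov(X, 0)
    for k in range(1, bandwidth + 1):
        w = 1.0 - k / (bandwidth + 1.0)
        G = sample_autocov(X, k)
        S += w * (G + G.T)
    return S

def andrews_monahan_S0(resid, A1_hat, bandwidth):
    """Prewhitened estimator, as in eq. (14): Newey-West applied to the VAR residuals, recolored by A(1)^{-1}."""
    B1_hat = np.linalg.inv(A1_hat)
    return B1_hat @ newey_west_S0(resid, bandwidth) @ B1_hat.T
```

With `bandwidth = 0`, `andrews_monahan_S0` collapses to the OLS-implied estimate, in line with the nesting result above.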
The new CEV estimator computes the long-run coefficients $\hat C(1)$ from the non-parametric density estimate. Combined with the OLS lag coefficients, CEV obtain their impact coefficients as
$$ \hat B_0^{CEV\text{-}AM} = \hat A(1)\, \hat C^{AM}(1), \qquad \hat C^{AM}(1) \equiv \operatorname{chol}\big(\hat S^{AM}(0)\big). $$
Impulse responses are $\hat A(L)^{-1}\hat B_0^{CEV\text{-}AM}$. Using the Newey-West estimator, impact coefficients are computed as $\hat B_0^{CEV\text{-}NW} = \hat A(1)\,\hat C^{NW}(1)$ with $\hat C^{NW}(1) \equiv \operatorname{chol}\big(\hat S^{NW}(0)\big)$. For brevity, the remainder of this section will mostly refer to the Andrews-Monahan estimator, with similar arguments holding for the Newey-West estimator. Section 4 presents simulations using both estimators.
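Combining the two previous sketches, the CEV-style impact coefficients replace the OLS-implied long-run variance with a non-parametric one while keeping the OLS value of $\hat A(1)$. Again a hedged sketch, not CEV's own code.

```python
import numpy as np

def cev_impact(A_lags, S0_nonparam):
    """CEV-style impact coefficients: OLS A(1) combined with a spectral estimate of S(0)."""
    n = S0_nonparam.shape[0]
    A1 = np.eye(n) - sum(A_lags)
    C1 = np.linalg.cholesky(S0_nonparam)   # long-run coefficients from the spectral estimate
    return A1 @ C1                          # differs from eq. (10) only through S(0)
```

Note that `cev_impact(A_lags, S0)` multiplied by its transpose equals $\hat A(1)\hat S(0)\hat A(1)'$, which reproduces $\hat\Sigma$ only if $\hat S(0)$ coincides with the OLS-implied long-run variance; this is the inconsistency discussed in the next subsection.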
The bandwidth choice is critical in estimating spectra, akin to choosing the lag order of a VAR. Bandwidth choice has been shown to be even more important when using weighting schemes other than the Bartlett kernel (Newey and West 1994). For a consistent estimator, the bandwidth $B$ can grow with the sample size, but at a smaller rate. CEV use a fixed and fairly large value of $B = 150$ in a sample of 180 observations.
Theoretically, the prewhitening of Andrews-Monahan is appealing since it removes spikes from the spectral density of the data which make spectral estimation difficult (Priestley 1981, Chapter 7). Andrews and Monahan (1992) and Newey and West (1994) find the prewhitened estimator to fare better in Monte Carlo studies than the original Newey-West estimator. Christiano, Eichenbaum, and Vigfusson (2006a) find no clearly superior choice between the two and Christiano, Eichenbaum, and Vigfusson (2006b) proceed to use only the Newey-West estimator.
To minimize the mean-squared error (MSE) of spectral estimates, the bandwidth selection schemes of Andrews (1991) and Newey and West (1994) can be used. However, constructing an MSE-optimal estimator of the spectrum does not necessarily translate into an MSE-optimal estimate of coefficients like $\hat C(1)$ or $\hat B_0$. Their MSE depends not solely on the MSE of $\hat S(0)$ but--amongst others--on the bias and standard error of the spectrum in ways which are specific to the data generating process. Hence, the bandwidth selection scheme of Newey and West (1994) may serve as a useful starting point for bandwidth choice, but it is not necessarily optimal for the purpose of estimating impulse responses or variance shares.
The simulations reported below use both the optimal bandwidth selection scheme of Newey and West (1994) and the large bandwidth choice of CEV. The former tends to pick fairly small bandwidths. For the various calibrations of the model economy considered here, the optimal bandwidth of the Newey-West estimator is typically close to ten, and the average for the Andrews-Monahan estimator is about four. In simulations not reported here, spectral estimates with intermediate bandwidth choices displayed performance characteristics in between what is shown here for these two choices.
The CEV procedure is motivated by dissatisfaction with the OLS estimate $\hat A(1)$. In conventional SVAR implementations, this estimate is needed for two purposes: first, to construct the long-run responses as in (9), and second, to map these back into impact responses as in (10). CEV replace the information contained in $\hat A(1)$ with a spectral estimate in the first step, but not in the second. This creates a non-negligible inconsistency in representing the overall dynamics of the VAR.
By plugging a spectral estimate into their SVAR computations, CEV have weakened the OLS assumption of uncorrelated residuals without fully accounting for its consequences. As a result, the impact coefficients of CEV will in general not reproduce the forecast error variance of the VAR, which is at the heart of variance computations. These and other consequences are illustrated here. The next sub-section shows how a spectral factorization could be used to incorporate spectral estimates into a VAR model while retaining an internally consistent model of the data.
The spectral estimates embody information about correlation in the VAR residuals $\hat u_t$. As can be seen from (14), the Andrews-Monahan estimator is constructed from autocovariances of the VAR residuals. Obviously, a bandwidth $B > 0$ expresses a concern about serially correlated residuals. The Newey-West estimator also embodies concerns about serially correlated VAR residuals since it implies a spectrum for the VAR residuals which is generally not constant across frequencies.
Under the premise that the true model has only an infinite-order representation, it is indeed very plausible that the residuals from a VAR($p$) will be correlated. In the spirit of Andrews and Monahan (1992), the VAR could then be viewed as having merely prewhitened the data. But typically, researchers fit the lag length of their VARs until the point where estimated residuals do not display any significant correlation. Employing a spectral estimate like (14) beyond this point implies a belief that there is still useful information to be gleaned from the estimated residuals--or in other words, it implies a distrust of the lag-selection criteria chosen for the VAR. By allowing for residual dynamics, a researcher risks overfitting the data, which may still reduce bias in the estimated spectra, but at the expense of a higher standard error. Against this backdrop, the assertion of Christiano, Eichenbaum, and Vigfusson (2006a) that the impulse responses computed from their procedure have "smaller bias, smaller means square error" appears even more striking--and as will be seen, these properties extend neither to the wider set of calibrations studied below nor to other SVAR statistics like variance decompositions.
A researcher adopting the CEV strategy wants $\hat S^{AM}(0) \neq \hat A(1)^{-1}\hat\Sigma\big[\hat A(1)^{-1}\big]'$ and thus $\hat B_0^{CEV\text{-}AM}\big(\hat B_0^{CEV\text{-}AM}\big)' \neq \hat\Sigma$. As a direct implication, the impact coefficients of CEV do not reproduce the forecast error variance of the VAR, $\hat\Sigma$. When computing the total variance of the data by summing over the conditional variances implied by the SVAR, the CEV procedure does not match the unconditional variance of the data either. This mismatch between the unconditional variance of the VAR and what is implied by the impulse responses of CEV occurs both in population and in small sample. A similar argument applies to the Newey-West variant of the CEV procedure, where a researcher seeks $\hat S^{NW}(0) \neq \hat A(1)^{-1}\hat\Sigma\big[\hat A(1)^{-1}\big]'$ and thus $\hat B_0^{CEV\text{-}NW}\big(\hat B_0^{CEV\text{-}NW}\big)' \neq \hat\Sigma$.
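Under the notation used above, the mismatch can be stated in one line:
$$ \hat B_0^{CEV\text{-}AM}\big(\hat B_0^{CEV\text{-}AM}\big)' \;=\; \hat A(1)\,\hat S^{AM}(0)\,\hat A(1)' \;\neq\; \hat A(1)\,\Big[\hat A(1)^{-1}\hat\Sigma\,\big(\hat A(1)^{-1}\big)'\Big]\hat A(1)' \;=\; \hat\Sigma, $$
with equality holding only in the case $\hat S^{AM}(0) = \hat A(1)^{-1}\hat\Sigma\big(\hat A(1)^{-1}\big)'$ that the CEV procedure is designed to avoid.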
Note: Percentage errors of the variance in output growth, relative to true moments (left panel, "Population") or OLS sample moments (right panel, "Simulated"). Errors are computed for different calibrations of the importance of technology shocks in the data generating process. "Technology share" on the x-axis is the percentage of output variability due to technology shocks at business cycle frequencies (cycles with durations between two-and-a-half and eight years) in the data generating process. Variances denoted "CEV-NW" and "CEV-AM" are computed from the impulse responses of CEV's SVAR procedure using the Newey-West or Andrews-Monahan estimators, respectively. The left panel, labeled "Population", uses variances computed from applying the SVAR procedures to the true population moments of the model. This panel documents mismatches arising from finite VAR order and spectral bandwidth. The right panel, labeled "Simulations", reports sample means computed from 1,000 simulations. (The variances are not bandpass-filtered.)
For the model economy described in Section 2, Figure 1 illustrates the mismatch in the variance of output growth. In small sample, the variances implied by the CEV procedure are only about half as big as the OLS sample moments. As can be seen in the right panel of Figure 1, this occurs both when using the Newey-West or the Andrews-Monahan variant of the procedure and regardless of the share of fluctuations explained by technology shocks. As depicted in the left panel of the figure, the mismatch is qualitatively different, but also sizable, when applying the procedure to the population moments of the model while using a VAR(1) and a finite spectral bandwidth.
The CEV procedure is motivated by concerns about the ability of the OLS estimates to correctly capture the low-frequency dynamics of the data. But implicitly, differences between spectra estimated from OLS and from the non-parametric methods are not attributed to misspecified dynamics, but rather to the VAR's forecast error variance. However, the accuracy of estimating $\hat\Sigma$ has so far not been doubted. In fact, getting a good estimate of the forecast error variance is precisely the objective of OLS projections--see (11) above. Still, the CEV procedure deviates from previous contributions to the SVAR literature, where identification is defined as a search over the space of impact matrices satisfying $B_0 B_0' = \hat\Sigma$.
Finally, a researcher might want to re-construct structural shocks based on (2) as $\hat e_t = \big(\hat B_0^{CEV\text{-}AM}\big)^{-1}\hat u_t$ and compare them against the OLS-based shocks $\hat e_t^{\,OLS} = \big(\hat B_0^{\,OLS}\big)^{-1}\hat u_t$. She will be troubled to notice that the estimated technology shocks are perfectly correlated:
$$ \operatorname{corr}\big(\hat e_{1,t},\ \hat e_{1,t}^{\,OLS}\big) = 1. $$
This holds for any pair of matrices $\hat C(1)$ and $\hat B_0$ constructed from (4) and (6) using the OLS estimate $\hat A(1)$ and any positive definite estimate of $S_Y(0)$, with $\hat C(1)$ satisfying the zero restrictions (4). Under those conditions, the top row of $\hat B_0^{-1}$ is proportional to the top row of $\hat A(1)^{-1}$, so the top rows of the OLS and CEV inverse impact matrices are identical up to a scaling.
Since CEV were only concerned with impulse responses, the problem does not show up in their analysis. The construction of estimated shocks is however often used by researchers, for instance in order to plot historical decompositions or when identifying several shocks (see for example Altig et al. 2004).
To overcome the problems with the CEV procedure discussed above, it is necessary to parse out the dynamics of the residuals implied by the spectral estimates. Even when the true model has an infinite order VAR representation, OLS projections of $Y_t$ on a finite number of lags are well defined in the sense of satisfying the projection equations (8) for $j = 1, \dots, p$, but the residuals $u^p_t$ are not white noise. In general, the residuals follow a moving average representation:
$$ u^p_t = \Theta(L)\, w_t, \qquad \Theta(0) = I, \qquad E[w_t w_t'] = \Sigma_w, $$
with spectral density
$$ S_u(\omega) = \Theta(e^{-i\omega})\,\Sigma_w\,\Theta(e^{-i\omega})^{*}. \qquad(17) $$
Note: Each panel shows an element of the cumulated moving-average coefficients $\sum_{j \le k}\Theta_j$ for different lags $k$. $\Theta(L)$ is computed from population moments of a VAR(1) applied to the model economy of Section 2 for three different percentages of the bandpass-filtered variability in output explained by technology shocks. A technology share of 67.5% (solid line) corresponds to the maximum likelihood estimates of CEV.
CKM and CEV discuss a truncation bias which is hard to detect based on VAR lag-length selection procedures. In terms of the moving average $\Theta(L)$, their results can be read as finding individual coefficients $\Theta_k$ close to zero but a cumulated sum $\Theta(1)$ far from the identity matrix. This can also be illustrated in the model economy described in Section 2. Figure 2 plots the population values of the cumulated sums $\sum_{j\le k}\Theta_j$ when $p = 1$ for different calibrations of the share of fluctuations in output explained by technology shocks. (Results are similar for other values of $p$.) At each lag, the increments are small and close to zero, but summing over many lags leads to $\Theta(1) \neq I$.
Many moving average representations can be consistent with a given spectrum. But only one of them is invertible. As will be shown next, $\Theta(L)$ is invertible and can be uniquely identified with a spectral factorization of $S_u(\omega)$.
It is straightforward to recover $\Theta(L)$ and $\Sigma_w$ from $S_u(\omega)$ with a spectral factorization. The "canonical spectral factorization" is a classic theorem in linear quadratic control, assuring existence and uniqueness of an invertible $\Theta(L)$ and a positive definite $\Sigma_w$ consistent with (17).
The theorem factors a spectrum constructed from a finite number of autocovariances into a finite-order MA. For an empirical application, a finite order $q$ has of course to be chosen. But when applying the spectral factorization to the population objects of the true model (1), it remains to consider that $u^p_t$ is in general an MA($\infty$). However, since the processes for $Y_t$ and $u^p_t$ are stationary, their autocovariances and MA coefficients vanish for large lags (Hamilton 1994, Chapter 3.A). Analogous to Sims (1972), a spectral factorization with an arbitrarily large but finite $q$ can approximate the true spectrum and the true $\Theta(L)$ arbitrarily well. Alternatively, the true $\Theta(L)$ can be thought of as the limit of applying Theorem 3.4 to an ever increasing sequence of $q$'s.
For a correct identification of the structural shocks, the true impact coefficients (6) can be written in terms of $\Theta(1)$ and $A^p(1)$ as
$$ C(1)\,C(1)' \;=\; S_Y(0) \;=\; \big[A^p(1)\big]^{-1}\,\Theta(1)\,\Sigma_w\,\Theta(1)'\,\big[A^p(1)^{-1}\big]', \qquad(19) $$
$$ B_0 \;=\; \Theta(1)^{-1}\,A^p(1)\,C(1). \qquad(20) $$
CEV construct $\hat C(1)$ according to (19) while using the spectral estimate $\hat S^{AM}(0)$. But they ignore the residual dynamics captured by $\Theta(1)$ in (20) when mapping $\hat C(1)$ back into the impact coefficients. As illustrated in Figure 2, $\Theta(1)$ is typically not a diagonal matrix in the model economy, far from equal to the identity matrix. Ignoring the residual dynamics captured by $\Theta(L)$ is the source of the variance misrepresentation discussed in the previous subsection.
To combine VAR coefficients and spectral estimates in an internally consistent fashion, a spectral factorization must be used. The spectral factorization of the Bartlett-weighted residual spectrum yields a unique and invertible MA($q$), denoted $\Theta^{SF\text{-}AM}(L)$, and an innovation variance matrix $\Sigma^{SF\text{-}AM}$; the superscript "SF-AM" indicates that these are calculated from the residual spectrum employed by the Andrews-Monahan estimator. Following (20), impact coefficients are then
$$ \hat B_0^{SF\text{-}AM} \;=\; \big[\Theta^{SF\text{-}AM}(1)\big]^{-1}\,\hat A(1)\,\hat C^{AM}(1). $$
Sayed and Kailath (2001) survey a number of different algorithms for performing spectral factorizations. The computations reported here use a reliable and efficient algorithm from Li (2005), based on a state space representation of the moving average process of the residuals. (Details are given in Appendix A.)
In contrast to the CEV procedure, the spectral factorization is consistent with the variance of the data, in sample as well as in population.
A spectral factorization can also be applied directly to the Newey-West estimate of the data's spectrum, yielding coefficients for the VMA of $Y_t$, denoted $\Theta^{SF\text{-}NW}(L)$, and an innovation variance $\Sigma^{SF\text{-}NW}$. Following (6), impact coefficients and impulse responses can then be computed as
$$ \hat B_0^{SF\text{-}NW} \;=\; \big[\Theta^{SF\text{-}NW}(1)\big]^{-1}\,\hat C^{NW}(1), \qquad \text{impulse responses: } \Theta^{SF\text{-}NW}(L)\,\hat B_0^{SF\text{-}NW}. $$
These impulse responses do not involve any VAR coefficients. Analogously to Proposition 3.4, their construction preserves the variance of the data.
The previous section described several schemes for imposing the long-run restriction (4) on the data. The conventional method, going back to Blanchard and Quah (1989), uses OLS estimates of a VAR. The recently proposed procedure of CEV combines this with a non-parametric estimate of the spectral density at frequency zero. This procedure has been criticized above for its lack of internal consistency. Finally, this paper proposed a new method, combining OLS estimates and spectral estimators in an internally consistent way. This method relies on a spectral factorization ("SF") to uncover the dynamics implied by the non-parametric spectral estimators.
These procedures are applied here to data simulated from the model economy described in Section 2. The same data generating process has also been used by CEV and CKM. For the CEV and SF methods, there are two variants depending on whether the spectral estimators of Newey and West (1987) or Andrews and Monahan (1992) are used. This section reports results for both.
Mimicking conditions faced by empirical researchers, "small" samples with 180 observations are simulated. In small sample, two distinct issues arise. First, there is truncation bias in VARs and spectral estimators arising from the need to specify a finite lag length $p$ and a finite bandwidth $B$, respectively. As discussed in Section 2, the lag length is determined individually for each draw with an information criterion and the spectral bandwidth is fixed at 150. In addition, alternative results are reported using the bandwidth selection procedure of Newey and West (1994) for Newey-West spectra. (See Section 3.2 for further discussion of bandwidth selection.)
Second, there is the small sample bias in estimated parameters known from Hurwicz (1950). To isolate the pure truncation effects from the Hurwicz bias, the identification procedures are not only applied to simulated data, but also to VARs and spectral estimates constructed from the model's true population moments.
The procedures are evaluated in terms of their ability to uncover two statistics typically of interest to applied researchers. Following CKM and CEV, estimates of the response of hours to a technology shock are computed. For brevity, the discussion is limited to the impact coefficients $\hat B_0$, since all methods compute impulse responses from $\hat A(L)$ and their respective estimates of $\hat B_0$ (except when factorizing the Newey-West spectrum of the data directly). In addition, the share of fluctuations in output and hours due to technology shocks is estimated. As is typical in the business cycle literature, these shares are computed after filtering out any fluctuations which do not correspond to cycles with a duration between two-and-a-half and eight years. Two criteria are reported to assess the goodness of the estimates: bias and root mean square error (RMSE), both expressed as percentages relative to the true value known from the model.
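A hedged sketch of how such a variance share can be computed by integrating the structural spectrum over business-cycle frequencies (periods of 10 to 32 quarters); the discretization and function name are illustrative.

```python
import numpy as np

def technology_share(Psi, var_index=0, shock_index=0, n_freq=2000):
    """Share of the band-pass variance of one variable due to one shock.

    Psi : array (H, n, n) of structural MA coefficients, Psi[h][i, k] being the
          response of variable i to shock k after h periods (unit-variance shocks).
    The variance over a frequency band is proportional to the integral of the
    spectrum over that band, so the share is a ratio of two such integrals.
    """
    lo, hi = 2 * np.pi / 32, 2 * np.pi / 10      # cycles of 8 years down to 2.5 years
    omegas = np.linspace(lo, hi, n_freq)
    H = Psi.shape[0]
    share_num, share_den = 0.0, 0.0
    for w in omegas:
        # transfer function C(e^{-iw}) = sum_h Psi_h e^{-iwh}
        C = sum(Psi[h] * np.exp(-1j * w * h) for h in range(H))
        spec = np.abs(C[var_index, :]) ** 2      # contribution of each shock at frequency w
        share_num += spec[shock_index]
        share_den += spec.sum()
    return share_num / share_den
```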
The results show that all procedures are subject to substantial truncation and small sample biases and that none works like a panacea. Different methods display different strengths and weaknesses. The claims by Christiano, Eichenbaum, and Vigfusson (2006a) of "smaller bias, smaller means square error" associated with their procedure neither generalize to a wider range of model calibrations nor extend from the estimation of impact responses to variance shares.
Note: Percentage points relative to the model's true impact response of hours to a technology shock. Top row based on the Andrews-Monahan estimator for CEV and SF, bottom row using Newey-West estimators; both with a fixed bandwidth of 150. "Technology share" on the x-axis is the percentage of output variability due to technology shocks at business cycle frequencies (cycles with durations between two-and-a-half and eight years) in the data generating process.
Effects from the truncation and the small sample bias can offset each other. This is the case when estimating the impact of technology on hours. The left column in Figure 3 shows how impact responses are overestimated in population whereas the simulated bias shown in the middle column of the figure is lower (more negative). This simulated bias displays the total effect from truncation and Hurwicz bias. The OLS method has the largest population bias and it is only partially offset by the Hurwicz bias. The two spectral methods suffer from substantially smaller truncation bias, and depending on the simulated importance of technology shocks, the total bias can be either negative or positive. Coincidentally, the upwards bias in SF-AM and SF-NW is exactly offset around technology shares of about two thirds, corresponding to the range of MLE estimates of CEV and CKM for U.S. data. (Similarly for CEV-AM, but not CEV-NW.) However, results are different for other calibrations of the technology share, which cautions strongly against extrapolating from a particular result to different data sets and different applications.
Unless the true share of technology shocks is very large, the RMSEs of estimated impact coefficients are very large, often surpassing 100% of the true value. Interestingly, the RMSEs do not differ much across the different methods, as can be seen in the right-most column of Figure 3. If anything, SF-NW outperforms CEV-NW on bias, at the expense of a worse RMSE. This is likely due to an overfitting of the residual dynamics by SF-NW.
Note: Percentage points relative to the model's true technology share. Top row reports bias and RMSE for the variance decomposition of output, bottom row for hours. Andrews-Monahan estimators of the spectral density are used for CEV and SF with a fixed bandwidth of 150. (Similar results are reported for Newey-West in Figure A.1.) "Technology share" on the x-axis is the percentage of output variability due to technology shocks at business cycle frequencies (cycles with durations between two-and-a-half and eight years) in the data generating process.
Turning to the estimated variance shares of output and hours shown in Figure 4, the relative performance of the various methods looks quite different. The panels in the top row of the figure show bias and RMSE for variance decompositions of output, the bottom row for variance decompositions of hours. For this figure, spectral densities have been estimated with the Andrews-Monahan estimator. Results are broadly similar when using the Newey-West estimator (see Figure A.1 in the separate appendix with additional results.)
Strikingly, for technology shares in output, bias and RMSE are very similar when using either OLS or CEV. The mismatch in total variance discussed in Section 3.3 does not seem to distort the computation of relative variance measures in this case. But the two methods differ when decomposing the variance of hours. Bias and RMSE in the variance decomposition of hours are an order of magnitude larger than for output, cautioning very strongly against neglecting small sample issues when comparing SVAR estimates against model predictions. Moreover, the variance decompositions of hours provide a useful counterexample against disregarding OLS methods altogether, since OLS dominates the spectral methods both in terms of simulated bias and RMSE for all calibrations considered here. All in all, these results underline how truncation and Hurwicz bias interact with the different methods in ways which are hard to anticipate for an empirical researcher who does not know the true dynamics of the data.
The results presented in Figures 3 and 4 are based on a large and fixed spectral bandwidth of 150. A separate appendix with additional results shows that the results are similar for Newey-West spectra when their bandwidth has been chosen by the automatic bandwidth selection procedure of Newey and West (1994). Compared to the case of a large and fixed bandwidth, only two differences stand out. Estimating technology shares from a direct factorization of the Newey-West spectrum performs worse than in the large bandwidth case, both in terms of bias and RMSE, unless technology accounts for less than two thirds of business cycle fluctuations in output (see Figure A.1 in the appendix). Furthermore, the RMSE of impact coefficients estimated with CEV-NW is almost flat at around two thirds of the true value, independently of the true technology share (Figure A.2). Applying the automatic bandwidth selection to the residual spectra of the Andrews-Monahan estimator yields bandwidths close to zero, such that the results are mostly indistinguishable from the OLS estimates (Figure A.3).
In finite sample, truncation bias and Hurwicz bias pose fundamental problems when identifying structural shocks from restrictions on the long-run behavior of the data. These issues are present in the time domain when working with a VAR, as well as in the frequency domain when working with spectral estimators. Basically, the same estimates of the data's autocovariances are employed for constructing non-parametric estimates of the spectrum as well as for computing OLS coefficients. In both cases, truncation bias arises since there are only as many sample autocovariances as there are data points. And due to the Hurwicz bias, variance estimates tend to be biased downwards the smaller the sample and the larger the persistence of the data--again affecting both OLS estimates of VAR coefficients as well as non-parametric estimates of the spectral density.
Thus, spectral estimates offer no panacea against the truncation and small sample problems known from OLS. At best, by allowing for additional dynamics, they might improve upon OLS in terms of bias, but the attendant overfitting of the data comes at the expense of a higher RMSE.
The performance of the different estimators appears to be very specific to the underlying model and its calibration, making it hard to predict which procedure would do well in future applications using new data. Even for a given calibration, when a method performs better in terms of one model statistic, say impact coefficients, this does not necessarily translate into better performance for another statistic, like a variance share. Going forward, it would be more suitable to compare SVAR estimates (from any procedure) against the small sample predictions, not the true moments, of a specific model, as in Cogley and Nason (1995), Kehoe (2006), Dupaigne, Feve, and Matheron (2007) and Dupaigne and Feve (2009).
Spectral factorization has a long tradition in the fields of linear quadratic control and robust estimation and control, as surveyed for example by Whittle (1996). Theorem 3.4 has been adapted from Hannan (1970, p. 66). The original theorem allows for unit roots in the moving average polynomial. The version stated above has been slightly strengthened by excluding the case of zero power in the spectral density at zero frequency, to ensure the invertibility of the MA($q$).
In the context of this paper, the spectrum to be factorized will be the spectral density of either the VAR residuals (for SF-AM) or the data itself (for SF-NW). We will be using non-parametric estimates of these spectra based on weighted sums of the sample autocovariance function, as described in Section 3.2.
Theorem 3.4 requires the spectral density to be non-singular. This can be understood as requiring that the autocovariances decay sufficiently fast in relation to the number of MA lags. For example, in the scalar case with $q = 1$, the first-order autocorrelation to be matched with an MA(1) cannot be larger than 0.5 in absolute value.
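The scalar bound can be verified directly. For a scalar MA(1), $u_t = w_t + \theta w_{t-1}$ with $\operatorname{Var}(w_t) = \sigma^2$,
$$ \rho_1 \;=\; \frac{\gamma(1)}{\gamma(0)} \;=\; \frac{\theta\sigma^2}{(1+\theta^2)\sigma^2} \;=\; \frac{\theta}{1+\theta^2}, $$
and since $|\theta|/(1+\theta^2) \le 1/2$ with equality only at $\theta = \pm 1$ (a root on the unit circle, excluded by the strengthened theorem), any autocovariance sequence with $|\rho_1| > 0.5$ cannot be matched by an MA(1).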
Algorithms for implementing the factorization go back to Whittle (1963) and have recently been surveyed by Sayed and Kailath (2001). The simulations reported here use the algorithm of Li (2005), which is based on a state space representation of the moving average process and performed very reliably. The remainder of this appendix describes the algorithm in more detail.
Suppose the process to be factorized follows an MA($q$) as above. To represent it in a state space system, Li (2005) defines a state vector built from conditional expectations of the process given its entire history of realizations up to time $t$, and constructs a state space system whose transition and loading matrices consist of identity and zero blocks. What is needed is a mapping from the autocovariances of the process to the objects of this state space: the matrix containing the stacked MA coefficients as well as the variance of the innovation process. To obtain this mapping, the autocovariances are stacked into a matrix. Li (2005, Theorem 2) shows that the variance-covariance matrix of the states solves a Riccati equation, from whose solution the MA($q$) coefficients and the innovation variance can be recovered. As shown by Li (2005), this Riccati equation can be solved recursively.
At the end of each factorization computed for this paper, it has been verified that the factorization produces an invertible MA($q$) polynomial which matches the original spectral density. In all simulations, this held up to machine accuracy.
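The Riccati-based recursion of Li (2005) is not reproduced here. As a self-contained illustration of the same factorization problem, the sketch below uses the innovations algorithm (Brockwell and Davis 1991) in the scalar case, whose coefficients converge to the invertible MA($q$) coefficients and innovation variance implied by a given autocovariance sequence; it is a substitute technique chosen for illustration, not the algorithm used in the paper.

```python
import numpy as np

def innovations_ma(gamma, q, n_iter=200):
    """Scalar spectral factorization via the innovations algorithm.

    gamma : autocovariances gamma[0], ..., gamma[q] of an MA(q) process
            (gamma[k] = 0 for k > q is assumed implicitly).
    Returns (theta, sigma2) with u_t = w_t + theta[0] w_{t-1} + ... + theta[q-1] w_{t-q}
    invertible and Var(w_t) = sigma2.
    """
    def g(k):
        return gamma[k] if k <= q else 0.0

    v = np.zeros(n_iter + 1)
    th = np.zeros((n_iter + 1, n_iter + 1))   # th[n, j] approximates theta_j for large n
    v[0] = g(0)
    for n in range(1, n_iter + 1):
        for k in range(n):
            s = sum(th[k, k - j] * th[n, n - j] * v[j] for j in range(k))
            th[n, n - k] = (g(n - k) - s) / v[k]
        v[n] = g(0) - sum(th[n, n - j] ** 2 * v[j] for j in range(n))
    return th[n_iter, 1:q + 1], v[n_iter]
```

For instance, for the invertible scalar MA(1) $u_t = w_t + 0.5\,w_{t-1}$ with unit innovation variance, the autocovariances are $\gamma(0) = 1.25$ and $\gamma(1) = 0.5$, and `innovations_ma([1.25, 0.5], 1)` should return values close to $(0.5, 1.0)$.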
This section outlines how to derive the following: first, the values implied by the lab economy for true VAR objects like the impact coefficients, the long-run coefficients, the forecast error variance and the autocovariances of $Y_t$; second, the population coefficients of finite-order VARs implied by the lab economy.
The linearized solution to the lab economy described in Section 2 yields a state space model for labor productivity growth and hours,
$$ x_t = A\,x_{t-1} + B\,\varepsilon_t, \qquad Y_t = C\,x_{t-1} + D\,\varepsilon_t. \qquad(21) $$
The state vector $x_t$ contains the log-deviation of detrended capital from its steady state, the labor wedge and the growth rate in technology, while the shock vector $\varepsilon_t$ collects the innovations to the labor wedge and to technology. (The state vector also includes lagged variables due to the presence of labor productivity growth in $Y_t$.) The computation of the matrices $A$, $B$, $C$ and $D$ is straightforward; see CKM for a detailed presentation.
True VAR objects. The decomposition in Section 4 uses the following objects of the true process: the impact coefficients $B_0$, the long-run coefficients $C(1)$, the forecast error variance $\Sigma$, as well as the autocovariances of $Y_t$. Their computation from the state space is straightforward, since the true impulse responses are $\Psi_0 = D$ and $\Psi_k = C A^{k-1} B$ for $k \ge 1$, and the true spectrum is $S_Y(\omega) = \big[D + e^{-i\omega} C (I - e^{-i\omega}A)^{-1} B\big]\big[D + e^{-i\omega} C (I - e^{-i\omega}A)^{-1} B\big]^{*}$. The impact coefficients are apparent from (21): $B_0 = D$. Recalling equation (2), this also pins down the covariance matrix of the forecast errors, $\Sigma = B_0 B_0' = D D'$.
In order to map forecast errors into structural shocks, $D$ must obviously be square and invertible. Furthermore, Fernandez-Villaverde, Rubio-Ramirez, and Sargent (2005) show that invertibility requires the eigenvalues of $A - B D^{-1} C$ to be strictly less than one in modulus, which is satisfied for all calibrations considered here.
The non-structural moving average representation of $Y_t$ is $Y_t = B(L)\,u_t$ with $B(L) = \big[D + C(I - AL)^{-1}B\,L\big]D^{-1}$. From (3), the coefficients of the non-structural VAR($\infty$) representation of the model can be obtained by inverting this moving average, yielding $A(L) = B(L)^{-1}$.
The autocovariances of $Y_t$ can be computed directly from the state space model. The covariance matrix of the states is obtained as the solution to a discrete Lyapunov equation, $\Sigma_x = A\,\Sigma_x\,A' + B\,B'$, and the autocovariances of $Y_t$ then follow from $\Sigma_x$ and the state space matrices.
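A hedged sketch of these computations, assuming the state space form $Y_t = C x_{t-1} + D\varepsilon_t$, $x_t = A x_{t-1} + B\varepsilon_t$ used above and that SciPy is available; `solve_discrete_lyapunov` is the SciPy routine for equations of this type.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def autocovariances(A, B, C, D, max_lag):
    """Autocovariances of Y_t = C x_{t-1} + D eps_t with x_t = A x_{t-1} + B eps_t, eps ~ N(0, I)."""
    Sigma_x = solve_discrete_lyapunov(A, B @ B.T)      # Sigma_x = A Sigma_x A' + B B'
    gammas = [C @ Sigma_x @ C.T + D @ D.T]             # Gamma_Y(0)
    for k in range(1, max_lag + 1):
        Ak1 = np.linalg.matrix_power(A, k - 1)
        gammas.append(C @ Ak1 @ (A @ Sigma_x @ C.T + B @ D.T))   # Gamma_Y(k), k >= 1
    return gammas
```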
VAR($p$) coefficients in population. Finite-order VAR($p$) coefficients can be computed as projections of $Y_t$ on a finite number of its past values, $Y_{t-1}, \dots, Y_{t-p}$. In line with the notation of the main text, population coefficients of a VAR($p$) are denoted with a superscript "$p$". The coefficients of the lag polynomial $A^p(L)$ solve the OLS normal equations
$$ \Gamma_Y(j) = \sum_{i=1}^{p} A^p_i\, \Gamma_Y(j-i), \qquad j = 1, \dots, p, $$
which are evaluated using the autocovariance matrices of $Y_t$ whose computation is described in the preceding paragraph. For instance, if $p = 1$, $A^1_1 = \Gamma_Y(1)\,\Gamma_Y(0)^{-1}$. Detailed formulas for higher-order VARs can be found in Fernandez-Villaverde, Rubio-Ramirez, and Sargent (2005).
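A sketch of solving these normal equations for the population VAR($p$) coefficients given the autocovariances computed above; the block-matrix construction is a standard multivariate Yule-Walker step, with names chosen here for illustration.

```python
import numpy as np

def population_var(gammas, p):
    """Population VAR(p) coefficients [A_1, ..., A_p] from autocovariances Gamma(0..p).

    Solves Gamma(j) = sum_i A_i Gamma(j-i), j = 1..p, with Gamma(-k) = Gamma(k)'.
    """
    n = gammas[0].shape[0]

    def G(k):
        return gammas[k] if k >= 0 else gammas[-k].T

    # Stack [Gamma(1) ... Gamma(p)] = [A_1 ... A_p] @ M, with block (i, j) of M equal to Gamma(j - i)
    RHS = np.hstack([G(j) for j in range(1, p + 1)])                   # n x (n p)
    M = np.vstack([np.hstack([G(j - i) for j in range(1, p + 1)])
                   for i in range(1, p + 1)])                          # (n p) x (n p)
    A_stacked = RHS @ np.linalg.inv(M)                                 # n x (n p)
    return [A_stacked[:, k * n:(k + 1) * n] for k in range(p)]
```

For $p = 1$ this reduces to $A^1_1 = \Gamma_Y(1)\,\Gamma_Y(0)^{-1}$, matching the formula in the text.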
Chari, Kehoe, and McGrattan (2005), Proposition 1, show that the VAR representation of $Y_t$ in the model is of infinite order and that the residuals from a VAR($p$) will not be martingale differences. By construction, the projection residuals are orthogonal to $Y_{t-1}, \dots, Y_{t-p}$, but they are not orthogonal to the complete history of $Y_t$. The moving average representation of the forecast errors is easily constructed as $u^p_t = A^p(L)\,Y_t = A^p(L)\,C(L)\,e_t$.
Variance equation. Even though the VAR($p$) residuals $u^p_t$ are not white noise, the usual variance equation is still applicable. For notational convenience, take the case of a VAR(1), $Y_t = A_1 Y_{t-1} + u_t$. The normal equations imply
$$ \Gamma_Y(0) \;=\; A_1\,\Gamma_Y(0)\,A_1' + \Sigma_u \;=\; \sum_{j=0}^{\infty} A_1^{\,j}\,\Sigma_u\,\big(A_1^{\,j}\big)' \;=\; \sum_{j=0}^{\infty} B_j\,\Sigma_u\,B_j'. $$
The first equality uses the normal equations, $E[u_t Y_{t-1}'] = 0$, to eliminate the cross terms; the second is obtained by recursive substitution of $\Gamma_Y(0)$; and the third follows from the construction of the moving-average coefficients of a VAR(1), $B_j = A_1^{\,j}$. The argument is easily extended to VARs with higher lag orders by using their companion form.
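A quick numeric check of the variance equation for a VAR(1), using placeholder values for the coefficient matrix and the residual covariance:

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

A1 = np.array([[0.5, 0.1],
               [0.0, 0.8]])            # placeholder VAR(1) coefficient matrix
Sigma_u = np.array([[1.0, 0.2],
                    [0.2, 0.5]])       # placeholder residual covariance

# Gamma(0) = A_1 Gamma(0) A_1' + Sigma_u, equivalently sum_j A_1^j Sigma_u (A_1^j)'
Gamma0 = solve_discrete_lyapunov(A1, Sigma_u)
Gamma0_sum = sum(np.linalg.matrix_power(A1, j) @ Sigma_u @ np.linalg.matrix_power(A1, j).T
                 for j in range(200))
assert np.allclose(Gamma0, Gamma0_sum, atol=1e-8)
```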
ADDITIONAL RESULTS
Note: Percentage points relative to the model's true technology share. Top row reports bias and RMSE for the variance decomposition of output, bottom row for hours. Newey-West estimators of the spectral density are used for CEV and SF with a fixed bandwidth of 150. "Technology share" on the x-axis is the percentage of output variability due to technology shocks at business cycle frequencies (cycles with durations between two-and-a-half and eight years) in the data generating process.
Note: Estimated impact responses of hours to technology (top row) and technology share (bottom row) when using the Newey-West estimator with automatic bandwidth selection (Newey and West, 1994). "Technology share" on the x-axis is the percentage of output variability due to technology shocks at business cycle frequencies (cycles with durations between two-and-a-half and eight years) in the data generating process.
Note: Estimated impact responses of hours to technology (top row) and technology share (bottom row) when using the Andrews-Monahan estimator with automatic bandwidth selection (Newey and West, 1994). "Technology share" on the x-axis is the percentage of output variability due to technology shocks at business cycle frequencies (cycles with durations between two-and-a-half and eight years) in the data generating process.