Data‐Driven Identification Constraints for DSGE Models

We propose imposing data-driven identification constraints to alleviate the multimodality problem arising in the estimation of poorly identified dynamic stochastic general equilibrium models under non-informative prior distributions. We also devise an iterative procedure based on the posterior density of the parameters for finding these constraints. An empirical application to the Smets and Wouters (2007) model demonstrates the properties of the estimation method, and shows how the problem of multimodal posterior distributions caused by parameter redundancy is eliminated by identification constraints. Out-of-sample forecast comparisons as well as Bayes factors lend support to the constrained model.


I. Introduction
Advances in Bayesian simulation methods have recently facilitated the estimation of relatively large-scale dynamic stochastic general equilibrium (DSGE) models. However, when the commonly employed random walk Metropolis-Hastings (RWMH) algorithm is used, relatively tight prior distributions typically have to be assumed to tackle the flat and multimodal posterior distributions arising from weak identification in these models (see e.g. Koop, Pesaran and Smith, 2013 and the references therein). This has the unfortunate consequence that the resulting posterior distributions may not have much to say about how well the structural model fits the data; instead, the priors are likely to be driving the results, which precludes us from learning about the parameters of the model from the data.
Under less informative priors, one potential solution to the problem of weak identification is offered by the so-called data-driven identifiability constraints put forth in the statistics literature (see Frühwirth-Schnatter, 2001) but, to the best of our knowledge, not previously applied to DSGE models. Such constraints can be found by inspection of the output of the posterior distribution.* Subsequently, the restricted and unrestricted models may be compared to check the validity of the constraints. For instance, if two parameters seem to be weakly identified and always take an equal value with high probability, a model where their equality is imposed might be preferable. In practice, the constraints are set in an iterative procedure, where at each stage the posterior distribution of the parameters is inspected to find additional constraints, whose validity is then assessed by means of, say, Bayes factors and improvements in estimation accuracy. The iteration continues until no further acceptable constraints or signs of weak identification can be found.

JEL Classification numbers: C11, C32, C52, D58.

*We would like to thank Francesco Zanetti (the Editor) and an anonymous referee for useful comments. Financial support from the Academy of Finland (grants 268454 and 308628) is gratefully acknowledged. The first author also acknowledges financial support from CREATES (DNRF78), funded by the Danish National Research Foundation, while the second author is grateful for financial support from the Research Funds of the University of Helsinki. Part of this research was done while the second author was visiting the Bank of Finland, whose hospitality is gratefully acknowledged.
It is important to distinguish our approach from specification searches (see e.g. Leamer, 1978), where the goal is to find the 'true' model, or to simplify or improve the current model. While our constraints may also lead to a simpler and more easily interpretable model, the ultimate objective is to alleviate problems arising from lack of identifiability. In other words, we take a DSGE model as given, but acknowledge the fact that these models tend to be poorly identified and, therefore, try to find constraints respecting the geometry of the posteriors to facilitate identification. In addition to improvements in estimation accuracy and probabilistic forecasts due to improved identification, data-driven identifiability constraints are also likely to facilitate interpretation.
Our approach calls for an efficient estimation method that is capable of handling the multimodality likely to be encountered at least in the unrestricted DSGE model. Such a procedure has recently been suggested by Herbst and Schorfheide (2014), who employed an adaptive sequential Monte Carlo (SMC) algorithm to estimate the Smets and Wouters (2007) model (SW model hereafter) based on relatively loose priors (see also Creal, 2007; Chib and Ramamurthy, 2010). Once the final model has been obtained, it is important to ensure that it has been accurately estimated (and, in case of obvious inaccuracy, some of the constraints may be relaxed and alternative constraints entertained). To that end, Herbst and Schorfheide suggested running the SMC algorithm multiple times to obtain an approximation of the asymptotic variances of the parameter estimates. This is, unfortunately, computationally very costly in the case of a complex high-dimensional DSGE model, and, therefore, in practice only a few runs (20 in Herbst and Schorfheide, 2014) are feasible, yielding a very imprecise measure of estimation accuracy.
To facilitate assessment of estimation accuracy, we propose to augment the SMC algorithm with a non-sequential importance sampling (IS) step, which has the advantage that numerical standard errors can be readily calculated without burdensome simulations. Moreover, convergence results for non-sequential IS are available (see Geweke, 2005, Theorem 4.2.2), while the asymptotic properties of the adaptive SMC algorithm are not necessarily known. Hence, in addition to being computationally feasible in assessing the accuracy of the estimates, our procedure is theoretically well motivated. Of course, these two approaches are not substitutes. In particular, it may be a good idea to run the SMC algorithm a few times before the IS step to ensure that the SMC algorithm has visited the entire posterior. This is important because the IS step relies on the SMC approximation.
We estimate the SW model on the same data set as Smets and Wouters (2007) and with diffuse priors that are slightly different from those assumed by Herbst and Schorfheide (2014). Our augmented SMC method yields very accurate estimates similar to those of Herbst and Schorfheide (2014) based on both the RWMH and SMC methods and diffuse priors, but very different from those of Smets and Wouters (2007) based on tight prior distributions. We also estimate the model with the tight priors assumed by Smets and Wouters (2007), and find accurate posterior estimates very close to theirs. Comparison of the estimates based on the informative and uninformative priors reveals that the results of Smets and Wouters (2007) are indeed strongly driven by their informative prior distributions.
Closer inspection of the posterior densities of the parameters suggests redundancy in the parametrization of the SW model. In particular, restricting the ARMA wage mark-up shock to a white noise process leads to an improved model, and we are able to conclude that the persistence of wages is driven by the endogenous rather than the exogenous dynamics. In other words, by imposing these constraints, which are accepted by the data, we can rule out one of the two modes, implying different wage dynamics, that were also detected by Herbst and Schorfheide (2014), who found their relative importance difficult to assess. Moreover, imposing the data-driven identification constraints improves estimation accuracy, and forecast comparisons also lend support to the restricted model.
In the next section, we discuss the SMC algorithm and explain how any such algorithm can be augmented with an IS step. The estimation results for the SW model are reported in section 'Estimation results', where they are also compared with the corresponding results in Smets and Wouters (2007) and Herbst and Schorfheide (2014). The data-driven identification constraints and the procedure for finding them, as well as the related empirical results, are the topics of section III. Forecasting with DSGE models and computing log predictive densities that can be used to rank competing models are discussed in section IV, while in section 'Forecast comparison' we report the results of the forecast comparison. Finally, section V concludes.

II. Sequential Monte Carlo estimation
The likelihood function of any DSGE model is a complicated nonlinear function of the vector of its structural parameters θ and, hence, potentially badly behaved. This complicates the estimation of the posterior distribution of the parameters and, if not properly handled, makes commonly used approaches such as the RWMH algorithm ill-suited for this purpose. One solution is to assume tight prior distributions but, as discussed in the Introduction, this may be problematic for a number of reasons. Therefore, it might be advisable to employ less informative priors, but this, of course, calls for an appropriate estimation method.
Recently, Herbst and Schorfheide (2014) have proposed using an SMC algorithm for estimating DSGE models under non-informative priors (see also Creal, 2007 for estimating DSGE models by SMC methods). By virtue of their construction, these methods have been found to perform well for badly behaved posterior distributions (for further evidence, see e.g. Del Moral, Doucet and Jasra, 2006; Jasra, Stephens and Holmes, 2007; Durham and Geweke, 2014).
While SMC methods are well suited for estimating DSGE models, assessing the accuracy of the resulting estimates is difficult because of the complexity of the asymptotic variances. One solution, applied by Herbst and Schorfheide (2014), among others, is to estimate the asymptotic variance from multiple runs of the SMC algorithm. While the time cost of this approach may be relatively small for simple univariate models, whose likelihoods can be readily calculated in parallel using graphics processing units, it can be unreasonably high for complex high-dimensional models such as the SW model. For this reason, numerical standard errors, needed to assess the quality of the posterior densities on which we base our search for identification constraints, are in practice based on a relatively small number of runs and are thus likely to be very inaccurate (for instance, Herbst and Schorfheide, 2014 only ran the algorithm 20 times).

Importance sampling step
As discussed in the Introduction, in order to facilitate convenient computation of the numerical standard errors of the estimates, we propose to conclude the SMC run by non-sequential IS. In other words, given the estimates produced by the SMC algorithm, a final IS step is performed to obtain an approximation of the asymptotic variance of these estimates. The idea is to approximate the posterior distribution by a mixture of Student's t-distributions, which is then used as the IS density. The procedure used in the final IS step closely resembles that of Hoogerheide, Opschoor and van Dijk (2012), and it delivers the efficient mixture components by minimizing the Kullback-Leibler divergence from the target posterior, which is approximated by the simulated particles. As IS algorithms are highly parallelizable, the time cost of the proposed procedure is marginal compared with a single SMC run. It is important to notice that we propose to use non-sequential IS as a complement to SMC methods mainly because assessing the numerical accuracy of SMC estimates is challenging. Indeed, it is our experience that adaptive SMC algorithms are able to deliver reliable approximations of even very pathological posterior distributions, including cases where no formal asymptotic convergence result is available.
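To make the role of the IS step concrete, the following small numpy sketch (our illustration, not the authors' code) computes a self-normalized importance sampling estimate of a posterior mean and its numerical standard error; the Gaussian target and Student-t proposal below are toy stand-ins for the DSGE posterior and the fitted mixture density:

```python
import math
import numpy as np

rng = np.random.default_rng(0)
NU, SCALE = 5.0, 1.5                      # Student-t proposal: heavier tails than the target

def log_target(x):
    """Unnormalized log density of the toy 'posterior' N(1, 0.5^2)."""
    return -0.5 * ((x - 1.0) / 0.5) ** 2

def log_proposal(x):
    """Log density of the scaled t_5 importance distribution."""
    z = x / SCALE
    return (math.lgamma((NU + 1) / 2) - math.lgamma(NU / 2)
            - 0.5 * math.log(NU * math.pi) - math.log(SCALE)
            - 0.5 * (NU + 1) * np.log1p(z ** 2 / NU))

draws = rng.standard_t(NU, size=400_000) * SCALE
logw = log_target(draws) - log_proposal(draws)
w = np.exp(logw - logw.max())
w /= w.sum()                              # self-normalized importance weights

post_mean = np.sum(w * draws)             # IS estimate of the posterior mean
nse = np.sqrt(np.sum(w ** 2 * (draws - post_mean) ** 2))   # numerical standard error
print(post_mean, nse)                     # mean close to 1, very small standard error
```

The numerical standard error comes almost for free from the same weighted sample, which is exactly the convenience the IS step buys over repeated SMC runs.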
Any SMC algorithm can be augmented by an IS step. For a description of the SMC algorithm as applied to a DSGE model, including the SW model, we refer the interested reader to the Appendix, or, e.g. Herbst and Schorfheide (2014). Here, we only explain how to construct a mixture of Student's t-distributions approximation to the target posterior distribution π_n(θ) (for data up to any period n ∈ {1,…, T}) from its particle approximation {θ^i}_{i=1}^N. This mixture density is then used to calculate the IS estimates of the posterior quantities of interest. The proposed procedure closely resembles that of Hoogerheide et al. (2012), and we refer to their paper for a more detailed discussion (see also Cappé et al., 2008).
The posterior distribution of the parameters of interest is approximated by a mixture of J multivariate t distributions:

f(θ | δ) = Σ_{j=1}^J η_j t_k(θ | μ_j, V_j; ν_j),    (1)

where t_k(θ | μ_j, V_j; ν_j) is the density function of the k-variate t distribution with mode μ_j, (positive definite) scale matrix V_j, and degrees of freedom ν_j, δ = (μ_1′,…, μ_J′, vech(V_1)′,…, vech(V_J)′, ν_1,…, ν_J, η_1,…, η_J)′, and the mixing probabilities η_j sum to unity.
To obtain an efficient IS density that enables us to accurately approximate the posterior, we minimize the Kullback-Leibler divergence

∫ π_n(θ) log[π_n(θ)/f(θ | δ)] dθ

between the target posterior distribution π_n(θ) and f(θ | δ) in equation (1) with respect to δ. Because the elements of the vector δ do not enter the posterior density π_n(θ), this is equivalent to maximizing

E[log f(θ | δ)],    (2)

where E denotes the expectation with respect to the posterior distribution π_n(θ). A simulation-consistent estimate of expression (2) is given by

(1/N) Σ_{i=1}^N log f(θ^i | δ),    (3)

where the particle approximation {θ^i}_{i=1}^N is taken as a sample from the posterior distribution π_n(θ). Following Hoogerheide et al. (2012), we use the expectation-maximization (EM) algorithm to maximize (3) with respect to the parameters of the mixture distribution in equation (1). Once the IS density has been obtained, it can be used to estimate any posterior quantity of interest, along with the associated numerical standard errors (these are estimated by the standard formula). Of course, standard asymptotic results apply to these estimators (see e.g. Geweke, 2005, p. 114). Hoogerheide et al. (2012) estimate the parameters in equation (1) by maximizing a weighted variant of equation (3) (where the weights are standard importance weights) in their bottom-up procedure, which iteratively adds components to the mixture (1), starting from a single multivariate t distribution. Conversely, we start with a reasonably large number of components and remove the (nearly) singular ones (i.e. those with (nearly) singular scale matrices and very small probability weights). This can be done because the particle approximation {θ^i}_{i=1}^N provides a very accurate description of the posterior distribution. In other words, in our case it is sufficient to represent the information in {θ^i}_{i=1}^N in terms of the mixture distribution in equation (1). This also means that we do not have to calculate importance weights when estimating the parameters in equation (1).
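To illustrate the EM step concretely, here is a self-contained numpy sketch (our illustration, not the authors' code) that fits a two-component mixture of multivariate t distributions to a particle cloud by maximizing the sample objective (3); as a simplification, the degrees of freedom are fixed at ν = 5 rather than estimated:

```python
import math
import numpy as np

NU = 5.0  # fixed degrees of freedom (a simplification; the nu_j are estimated in the paper)

def log_t_pdf(X, mu, V):
    """Log density of the k-variate t with mode mu and scale V; also returns Mahalanobis terms."""
    d = X.shape[1]
    L = np.linalg.cholesky(V)
    z = np.linalg.solve(L, (X - mu).T)              # whitened residuals, shape (d, n)
    maha = np.sum(z ** 2, axis=0)
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    const = (math.lgamma((NU + d) / 2) - math.lgamma(NU / 2)
             - 0.5 * d * math.log(NU * math.pi) - 0.5 * logdet)
    return const - 0.5 * (NU + d) * np.log1p(maha / NU), maha

def em_t_mixture(X, mus, Vs, etas, iters=100):
    """EM updates for a mixture of t distributions with fixed degrees of freedom."""
    for _ in range(iters):
        # E-step: responsibilities r[j, i] of component j for particle i
        logp = np.stack([log_t_pdf(X, mus[j], Vs[j])[0] + np.log(etas[j])
                         for j in range(len(mus))])
        r = np.exp(logp - logp.max(axis=0))
        r /= r.sum(axis=0)
        # M-step: weighted location/scale updates with the usual t-EM weights u
        for j in range(len(mus)):
            maha = log_t_pdf(X, mus[j], Vs[j])[1]
            u = (NU + X.shape[1]) / (NU + maha)
            w = r[j] * u
            mus[j] = w @ X / w.sum()
            Z = X - mus[j]
            Vs[j] = (Z * w[:, None]).T @ Z / r[j].sum()
            etas[j] = r[j].mean()
    return mus, Vs, etas

# Synthetic "particle approximation": two well-separated clusters
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-3.0, 1.0, (500, 2)), rng.normal(3.0, 1.0, (500, 2))])
mus, Vs, etas = em_t_mixture(
    X,
    mus=[np.array([-1.0, -1.0]), np.array([1.0, 1.0])],   # crude initial values
    Vs=[4.0 * np.eye(2), 4.0 * np.eye(2)],
    etas=[0.5, 0.5],
)
print(np.round(mus[0], 1), np.round(mus[1], 1))           # close to the true cluster centres
```

With well-separated modes, the fitted component locations settle near the cluster centres and the mixing probabilities near their true shares, which is the behaviour the IS density construction relies on.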
In our experience, a good approximation even for a pathological posterior can be obtained by setting the number of components J in the mixture distribution (1) sufficiently large. However, the estimates of the scale matrices V_j may become imprecise for some j if J is unreasonably large. In the empirical application in section 'Estimation results', we set J = 20.
It is worth noting that the EM algorithm may be sensitive to the starting values of δ. To solve this problem, we partition the particle approximation {θ^i}_{i=1}^N into J clusters by an agglomerative hierarchical clustering algorithm (see e.g. Everitt et al., 2011), and then use the sample mean and covariance matrix of the particles in the jth cluster as initial values for μ_j and V_j (j ∈ {1,…, J}). Prior to clustering, the particle approximation {θ^i}_{i=1}^N is normalized and orthogonalized such that the elements of θ have zero means and unit variances, and are uncorrelated. The distance between two normalized and orthogonalized particles is measured by the Euclidean distance between them, and the distance between two clusters by Ward's measure. A good initial value for the mixing probability η_j can be obtained by dividing the number of particles in the jth cluster by N. This procedure tends to be quick and very reliable.
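The normalization and orthogonalization of the particles can be sketched in a few lines of numpy; this is our illustrative whitening transform, not the authors' code, with the subsequent clustering step omitted:

```python
import numpy as np

def whiten(particles):
    """Normalize and orthogonalize: zero means, unit variances, uncorrelated elements."""
    centered = particles - particles.mean(axis=0)
    L = np.linalg.cholesky(np.cov(centered, rowvar=False))
    return np.linalg.solve(L, centered.T).T      # sample covariance becomes the identity

rng = np.random.default_rng(2)
raw = rng.multivariate_normal([2.0, -1.0], [[4.0, 1.2], [1.2, 1.0]], size=5_000)
white = whiten(raw)
print(np.round(np.cov(white, rowvar=False), 2))  # ~ identity matrix
```

The whitened particles could then be clustered with, e.g., `scipy.cluster.hierarchy.linkage(white, method='ward')`, matching the Euclidean-distance and Ward's-measure choices described above.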

Estimation results
We estimate the SW model on the same quarterly data set (1966:1-2004:4) as Smets and Wouters (2007) using an SMC algorithm augmented with a non-sequential IS step, and compare the resulting estimates with those of Smets and Wouters (2007) and the RWMH and SMC estimates of Herbst and Schorfheide (2014). The SMC algorithm is described in the Appendix, where information concerning its tuning parameters is also provided. To check whether the SMC algorithm has visited the entire posterior, we ran it for each DSGE model several times before the IS step. As already discussed in section 'Importance sampling step', the SMC method indeed turned out to deliver reliable approximations. While Smets and Wouters (2007) assumed tight prior distributions, we entertain less informative priors in the SMC estimation. In particular, for the parameters that are defined on the unit interval, we consider logit transformations and assume univariate normal prior distributions for the transformed parameters. We operate on transformed parameters in order to enhance the performance of the sampler. In contrast, Herbst and Schorfheide (2014) operate on the original parameters and assume uniform prior distributions for the parameters on the unit interval. In line with Smets and Wouters (2007), we assume inverse-gamma prior distributions for the standard deviations of the innovations of the structural shocks. However, we set the prior hyperparameters such that the prior means and variances equal 0.5 and 1, respectively, instead of the values 0.1 and 4 that they considered. The differences between the priors are clearly visible in the density plots depicted in Figure 1. More than 90% of the probability mass of the prior distribution of Smets and Wouters (2007) lies below 0.2, which suggests rather small standard deviations for the innovations. It is also worth noting that this prior distribution has a relatively large variance, 4, because of its very long but thin right tail.
Our prior has a smaller variance, 1, but it is considerably less leptokurtic than theirs. As far as the other parameters of the model are concerned, we follow Herbst and Schorfheide (2014) in scaling the associated prior variances of Smets and Wouters (2007) by a factor of three. Our diffuse priors are described in detail in Table 1. In Table 2, we report the estimation results based on our procedure with diffuse priors. Here, as well as in all subsequent estimations, we have run the SMC algorithm several times before the IS step to make sure that it has visited the entire posterior. The posterior means, their standard deviations and the 5th and 95th percentiles of the posterior densities of all parameters lie very close to those that Herbst and Schorfheide (2014) obtained by both the RWMH and SMC methods assuming their diffuse priors, but differ from those of Smets and Wouters (2007) based on tight prior distributions. The numerical standard errors of the posterior means of the parameters reported in the rightmost column of Table 2 are remarkably small, indicating very accurate estimation, which is reassuring from the viewpoint of searching for the identification constraints. They are also much smaller than those reported by Herbst and Schorfheide (2014), which (given the similarity of their and our posterior distributions) most likely reflects the poor quality of their asymptotic variance estimates based on only 20 simulations. Hence, our results suggest that concluding the SMC run by non-sequential IS is worthwhile as far as assessing the accuracy of posterior estimates is concerned. For comparison, we also estimated the SW model using our procedure augmented with the IS step assuming the informative priors of Smets and Wouters (2007). The posterior estimates are very close to those in Smets and Wouters (2007), with very small standard errors, indicating great estimation accuracy (see Table A1 in the online appendix). The marginal posteriors of Smets and Wouters (2007) also seemed to be strongly driven by their informative prior distributions, such that the posteriors tended to very closely resemble the priors.

Notes to Table 2: The reported columns contain the mean, the 5th and 95th percentiles, and the standard deviation of the posterior distribution, and the numerical standard error of the mean (SD(mean)) of the respective parameter. Following Herbst and Schorfheide (2014), we use 12,000 particles (i.e. N = 12,000). The number of components J in equation (1) is set to 20, which should ensure that the resulting importance distribution provides an accurate approximation to the target posterior. The parameters of equation (1) are obtained using 500 iteration rounds in the EM algorithm. The IS results are based on 400,000 draws from equation (1).

III. Identification
Dynamic stochastic general equilibrium models (including the SW model considered in this paper) tend to be poorly identified (see Koop et al., 2013 and the references therein). When taking them to the data, the Monte Carlo (MC) output of the posterior distribution should, therefore, be systematically analysed for information concerning the identification of the parameters of the underlying structural model. To this end, we recommend first visually inspecting the marginal posterior densities of the parameters for bad behaviour indicating lack of identification (such as multimodality, or flatness over a wide range of relevant parameter values even when there is only one mode). Once the badly behaved parameters have been singled out, the next step is to analyse the bivariate posterior density plots of all pairs of these parameters. In our experience, an efficient estimation algorithm produces dense areas in the density plots that may reveal to what extent the associated structural parameters are identified. As will be seen, this information can then be useful in deriving so-called data-driven identifiability constraints on the parameters of the DSGE model (see Frühwirth-Schnatter, 2001).
Because the constraints respect the geometry of the posterior distribution, they help us to learn about the parameters from the data, and are likely to improve the quality of probabilistic forecasts. Moreover, they may be useful in alleviating a potential multimodality problem,¹ and they are also likely to improve estimation accuracy. After estimating the restricted model, the procedure may be repeated until the data suggest no further identification constraints. Finally, the numerical standard errors are computed by the non-sequential IS step. For example, in the case of a multimodal posterior density, we may be able to rule out some of the modes (subspaces) by comparing different restricted and unrestricted models using, say, Bayes factors. This procedure is demonstrated below.
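The iterative search just described can be summarized in a minimal Python sketch; `estimate_posterior`, `find_candidate_constraint` and `constraint_supported` are hypothetical placeholders standing in for the SMC + IS estimation, the inspection of marginal and bivariate posteriors, and the Bayes-factor validity check, and are not part of any actual codebase:

```python
def estimate_posterior(model, constraints):
    """Placeholder for the SMC + IS estimation under the given constraints."""
    return {"constraints": tuple(constraints)}

def find_candidate_constraint(posterior):
    """Placeholder for posterior inspection; returns one candidate constraint or None."""
    suggestions = {(): "rho_w = mu_w = 0", ("rho_w = mu_w = 0",): "mu_p < 0.19"}
    return suggestions.get(posterior["constraints"])

def constraint_supported(posterior, candidate):
    """Placeholder for the validity check (Bayes factors, estimation accuracy)."""
    return True

def search_constraints(model):
    constraints = []
    while True:
        posterior = estimate_posterior(model, constraints)
        candidate = find_candidate_constraint(posterior)
        if candidate is None or not constraint_supported(posterior, candidate):
            return constraints
        constraints.append(candidate)     # impose one constraint per iteration

print(search_constraints("SW"))           # -> ['rho_w = mu_w = 0', 'mu_p < 0.19']
```

The dummy suggestions mirror the constraints found for the SW model below; in practice each call to `estimate_posterior` is a full re-estimation, and each candidate is vetted before the loop continues.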
Inspection of the marginal posterior densities of the parameters reveals six badly behaved parameters that govern wage (ξ_w, ι_w, ρ_w, μ_w) and price (ρ_p, μ_p) stickiness (see Figure 2; the marginal posterior densities of the remaining parameters are depicted in the online appendix).¹ Herbst and Schorfheide (2014) also found multimodal features in the posterior distributions of these parameters. Next, the bivariate density plots of all combinations of the marginal posteriors of these parameters are inspected to find out about their identification. These plots are depicted in Figures A3-A5 in the online appendix, and in Figures 3 and 4. All of these plots indicate weak identification. As outlined in the Introduction, we continue by iteratively imposing identifiability constraints, one at a time, until no further acceptable constraints or signs of weak identification can be found. The order in which the constraints are imposed may matter for the choice of the final model, and in some cases it may be a good idea to experiment with a number of different orderings. However, it is advisable to confine oneself to constraints that have a clear interpretation, simplify the model, and can be feasibly imposed in practice. Among the constraints suggested by the bivariate plots, the one involving the ARMA parameters ρ_w and μ_w of the wage mark-up shock (Figure 3) satisfies the criteria listed above. In particular, these parameters take almost equal values with high probability, giving rise to potential for redundant parameterization. Given the properties of the ARMA model, with ρ_w ≈ μ_w, the wage mark-up shock might be better described as a white noise process (with ρ_w and μ_w restricted to zero).

¹Lack of identification may give rise to multimodal posterior densities of the parameters. This may, for instance, be the case if the data are unable to distinguish between the roles of a DSGE model's external and internal propagation mechanisms, as pointed out by Herbst and Schorfheide (2015, Ch. 4).
In addition, we consider two alternative constraints, namely, ρ_w = 0 and μ_w = 0, separately. The Bayes factors based on the one-step-ahead log predictive densities in Table 4 slightly favour the model with only one of the two parameters restricted to zero over the model with ρ_w = μ_w = 0.² However, because the estimate of the remaining free parameter lies very close to zero in the former model, we proceed with the model involving both restrictions, which affords a clearer interpretation (see the discussion at the end of this section).
It is worth noting that all three constraints on ρ_w and μ_w produce well-behaved unimodal marginal posteriors for the parameters of the model, with the exception of ρ_p and μ_p in the process of the price mark-up shock. Their bivariate posterior density plot for the unrestricted model in Figure 4 reveals two separate modes. The dominating mode (ρ_p ≈ 0.85, μ_p ≈ 0.14) lies close to the one in Smets and Wouters (2007), while the other mode (with a greater value of μ_p) suggests that the moving average part of the price mark-up shock is almost non-invertible (μ_p has about 23% of its probability mass very close to unity). In order to restrict attention only to the neighbourhood of the dominating mode, we may place different constraints on ρ_p or μ_p. For instance, imposing μ_p < 0.19 results in a well-behaved joint posterior distribution for ρ_p and μ_p (see Figure 5). The posterior means and standard deviations of the parameters of the restricted model (with ρ_w = μ_w = 0 and μ_p < 0.19) are presented in Table 3. Imposing the constraints clearly improves estimation accuracy; all the numerical standard errors are smaller than those of the unrestricted SW model in Table 2, in some cases even substantially so. None of the plots of the marginal posterior distributions (see Figure A6 in the online appendix) suggests remaining bad behaviour, and hence there seems to be no need for further data-driven identification constraints. In particular, no signs of the multimodality problem discussed above can be seen. Of the two modes found by Herbst and Schorfheide (2014), the one implying the endogenous dynamics (related to ξ_w and ι_w) as the driver of the persistence of wages (their Mode 2) remains the only mode by construction, since the parameters ρ_w and μ_w related to the exogenous wage dynamics are constrained to zero.³
Moreover, in the model where only one of ρ_w and μ_w is restricted to zero, the posterior mean of the other is small (0.13 with standard error 0.06), which also lends support to the endogenous dynamics as the driver of the persistence of wages. In section IV below, we provide further evidence in favour of the restricted SW model specification.

IV. Forecasting
There is by now a large literature on forecasting with DSGE models (for a recent survey, see Del Negro and Schorfheide, 2013). While forecasting is not the main focus of this paper, we present a number of density forecasting results in order to provide further information on model fit and the usefulness of the data-driven identification constraints imposed on the SW model in section III.
Following Adolfson, Lindé and Villani (2007) and Geweke and Amisano (2011, 2012), among many others, we tackle the problem of assessing the quality of the probabilistic forecasts for a random vector Y_n, given its realized value y_n, using scoring rules. Scoring rules are carefully reviewed in Gneiting and Raftery (2007), and we refer to their paper for a more detailed discussion of the topic. In the following, we briefly describe the scoring rule used in this paper.
Let p_n denote a forecaster's predictive distribution. A scoring rule S(y_n, p_n) can be considered a reward that the forecaster seeks to maximize. It is said to be strictly proper if the expected score under the distribution of Y_n, q_n, is maximized by the choice p_n = q_n. It is further termed local if it depends on the forecaster's predictive distribution p_n only through the value of its density at the realized y_n. The logarithmic score S(y_n, p_n) = log p_n(y_n) is known to be the only scoring rule with these desirable properties, and, therefore, we use it to assess the quality of the probabilistic forecasts (see also Parry, Dawid and Lauritzen, 2012 for a discussion of the so-called order-m proper local scoring rules). In particular, we rank the competing models by the sum of the h-step-ahead log predictive densities³

LS_h = Σ_{n=S}^{T−h} log p(y_{n+h} | y_{1:n}),    (4)

where h ≥ 1 is the forecasting horizon, S + 1 is the starting date of the forecast evaluation period, p(y_{n+h} | y_{1:n}) is the h-step-ahead predictive likelihood evaluated at the observed y_{n+h}, and y_{1:n} = (y_1,…, y_n). The close connection of LS_h to the marginal likelihood when h = 1 facilitates the interpretation of the forecasting results (see e.g. Kass and Raftery, 1995). To see the connection, write equation (4) for h = 1 as

LS_1 = log p(y_{S+1:T} | y_{1:S}) = log ∫ p(y_{S+1:T} | y_{1:S}, θ) p(θ | y_{1:S}) dθ.    (5)

It is easy to see that quantity (5) has the same interpretation as the marginal likelihood if y_{1:S} is interpreted as a training sample, that is, if p(θ | y_{1:S}) is taken as the prior distribution of θ (see e.g. Adolfson, Lindé and Villani, 2007; Geweke and Amisano, 2010 for a detailed discussion).

³It can also be seen from Tables 2 and 3 that the estimates of some key parameters in the restricted model are different from those in the weakly identified model. For instance, in the weakly identified model, the posterior probability that consumption and hours worked are complements is one (i.e. Pr(σ_c > 1 | y) = 1; see Smets and Wouters, 2007, p. 589). This result also remains intact irrespective of the priors used (see Tables 2 and A1). However, when the identification constraints are imposed, the data lend strong support to log utility in consumption (i.e. σ_c = 1). In particular, under the constraints, the posterior mean of σ_c in Table 3 is relatively close to one (0.94) and 95% of the posterior mass of σ_c lies between 0.55 and 1.35.
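As a toy illustration of the logarithmic score (with made-up numbers, not results from the paper), consider a Gaussian predictive density evaluated at the realized value:

```python
import math

def log_score_gaussian(y, mean, var):
    """Logarithmic score log p_n(y_n) for a Gaussian predictive density."""
    return -0.5 * (math.log(2 * math.pi * var) + (y - mean) ** 2 / var)

# A sharp, well-calibrated predictive earns a higher reward than a diffuse one:
tight = log_score_gaussian(0.1, 0.0, 0.25)
diffuse = log_score_gaussian(0.1, 0.0, 4.0)
print(tight, diffuse)
```

Because the score is a reward, summing such terms over the evaluation period as in (4) favours models whose predictive densities concentrate mass near the realizations.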
To rank the competing forecasting models by LS_h, we need to evaluate the h-step-ahead predictive likelihoods p(y_{n+h} | y_{1:n}) for each model. Following Warne, Coenen and Christoffel (2017), we calculate these quantities using the IS estimator

p̂(y_{n+h} | y_{1:n}) = Σ_{i=1}^N w̃^i p(y_{n+h} | y_{1:n}, θ^i),  w̃^i ∝ π_n(θ^i)/f(θ^i | δ),    (6)

where the θ^i are draws from the importance density f(θ | δ), the self-normalized weights w̃^i sum to unity, and, under Gaussianity, the conditional likelihood p(y_{n+h} | y_{1:n}, θ) is given by

p(y_{n+h} | y_{1:n}, θ) = |Σ_{n+h|n}|^{−1/2} (2π)^{−p/2} exp(−½ ε′_{n+h|n} Σ^{−1}_{n+h|n} ε_{n+h|n}),

where ε_{n+h|n} = y_{n+h} − ŷ_{n+h|n} is the h-step-ahead forecast error, and Σ_{n+h|n} is the h-step-ahead mean squared error matrix of the forecast. The h-period-ahead forecasts ŷ_{n+h|n}, together with Σ_{n+h|n}, are calculated in a standard fashion from the filter estimates of the state variable and the associated state-variable covariance matrix, both based on the data y_{1:n} (see e.g. Hamilton, 1994, p. 385).
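The estimator (6) can be sketched for a toy scalar model in which the posterior of θ is N(0.5, 0.2²) and y_{n+h} | θ ~ N(θ, 1), so the exact predictive density is available for comparison; taking f equal to the posterior makes all weights equal, mirroring the choice f(θ^i) = π_n(θ^i) discussed at the end of this section (all numbers here are hypothetical):

```python
import math
import numpy as np

rng = np.random.default_rng(3)
M, S = 0.5, 0.2                           # toy posterior: theta | y_{1:n} ~ N(0.5, 0.2^2)

theta = rng.normal(M, S, size=200_000)    # f = posterior, so all weights are equal
w = np.full(theta.size, 1.0 / theta.size)

y_obs = 0.3                               # hypothetical realized y_{n+h}
cond_lik = np.exp(-0.5 * (y_obs - theta) ** 2) / math.sqrt(2 * math.pi)
pred_lik = np.sum(w * cond_lik)           # IS estimate of p(y_{n+h} | y_{1:n})

# Exact predictive for this toy model: y_{n+h} ~ N(0.5, 1 + 0.2^2)
exact = (math.exp(-0.5 * (y_obs - M) ** 2 / (1 + S ** 2))
         / math.sqrt(2 * math.pi * (1 + S ** 2)))
print(pred_lik, exact)                    # the two values agree closely
```

In the DSGE application, the conditional likelihood is the multivariate Gaussian density above, evaluated from the Kalman filter output, but the weighted-average structure of the estimator is the same.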
The importance densities f(θ | δ) can be obtained by the procedure described in section 'Importance sampling step'. However, if we are not interested in the numerical standard errors of the estimates of p(y_{n+h} | y_{1:n}) (these are not typically reported), we may also evaluate (6) using the SMC approximation of the posterior distribution of the parameters by setting f(θ^i) = π_n(θ^i). At least for the SW model, the SMC and IS approximations of the posterior density of the parameters lie very close to each other. Thus, it may be reasonable to save computing time by estimating (6) using π_n(θ) as the importance distribution.

Forecast comparison
In order to gauge density forecasting performance, in particular the benefits of allowing for diffuse prior distributions with and without data-driven identification constraints, we compute pseudo-out-of-sample forecasts from a number of models for the period 1970:1 to 2004:4. The forecasts are computed recursively using an expanding data window starting at 1966:1. We consider forecast horizons of 1, 4, 8 and 12 quarters. As discussed in section IV, we rank the models using the LS_h criterion (4). Recall that for h = 1, LS_h is very closely connected to the log of the marginal likelihood (see the discussion preceding (5)), and hence the results can be interpreted using Bayes factors (see e.g. Kass and Raftery, 1995 for a detailed discussion of the Bayes factor).
The results are presented in Table 4. At all forecast horizons, the model estimated under the informative priors of Smets and Wouters (2007) performs the worst, while the model with the diffuse priors and the three identification constraints considered in section III is the clear winner. It is worth pointing out that, of the two wage markup shock restrictions ρ_w = 0 and μ_w = 0, one gets the least support among the identification constraints, while the other leads to the second-best outcome at three out of the four forecast horizons considered. These findings thus lend further support to the identification constraints imposed in section III.⁴ At the one-quarter forecast horizon, twice the logarithmic Bayes factor of the model with the diffuse priors against the model estimated under the informative priors of Smets and Wouters (2007) is 62, providing very strong evidence in favour of the model with diffuse priors. According to this measure of model fit, there is also strong evidence in favour of the data-driven identification constraints, with twice the logarithmic Bayes factor of the model with ρ_w = μ_w = 0 and ρ_p < 0.19 against the unrestricted model with the diffuse priors around 6.6.⁵
Our main motivation for conducting the forecast analysis is to ensure the feasibility of the data-driven identifiability constraints, and therefore we only reported the pseudo-out-of-sample forecasts covering the estimation period 1970:1-2004:4 above. However, it might be interesting to extend the forecast comparison outside the estimation period to truly examine the forecast performance of the different models estimated under the various priors, and to that end, in Table 5 we report results for the period 2005:1-2016:3. When interpreting them, it must be borne in mind that the forecast period is very different from the estimation period, involving, among other things, the recent financial crisis and a period with very low interest rates. Nevertheless, the identifiability constraints turn out quite useful in forecasting. In contrast to the pseudo-out-of-sample results in Table 4, the model involving all three constraints is the winner only at the 8- and 12-quarter horizons, while the constraints ρ_w = μ_w = 0 lead to the most accurate forecasts at the one-quarter horizon. It is only at the one-year horizon that the model estimated with diffuse priors and no constraints performs the best, but, most importantly, the model estimated with the informative priors performs the worst by a clear margin at all horizons. All in all, the forecast results provide supporting evidence in favour of the SW model estimated using the diffuse priors. Moreover, the data-driven identification constraints are, in general, seen to considerably improve forecast accuracy over and above the model with the diffuse priors alone.
⁴ Estimation results for inflation and GDP growth are reported in Tables A3 and A4 in the online appendix, respectively. The model involving all three identification constraints clearly outperforms all other models in forecasting GDP growth. For inflation, the model with only one constraint imposed on the wage markup shock process is the winner at all but the 12-quarter horizon, where, somewhat surprisingly, the unconstrained model estimated under the informative prior produces the most accurate forecasts.
⁵ We also computed forecasts for a somewhat longer period, extending up to 2016:3. The results, reported in Table A2 in the online appendix, are in line with those in Table 4. It is only at the one-quarter horizon that the model involving the constraint ρ_w = μ_w = 0 outperforms the model with the three identification constraints.

V. Conclusion
In this paper, we have proposed the imposition of data-driven identifiability constraints to alleviate the multimodality problem encountered in estimating poorly identified DSGE models. Under diffuse prior distributions, such constraints facilitate interpretation, and improve estimation and forecast accuracy. They can be found by examining the posterior distribution of the parameters of the DSGE model being estimated in an iterative procedure, where constraints accepted by the data are repeatedly imposed until no further reasonable constraints can be found. The constraints are at each step validated by a number of measures, including estimation accuracy and Bayes factors.
Because finding the identifiability constraints is based on inspection of the posterior distribution of the parameters, that distribution should be accurately estimated, and because estimation accuracy is a central measure in validating identifiability constraints, the estimation algorithm should facilitate convenient assessment of estimation accuracy. While the SMC algorithm of Herbst and Schorfheide (2014) is useful for estimating DSGE models under non-informative priors, assessing estimation accuracy with it is computationally very burdensome. Therefore, to facilitate the use of data-driven identifiability constraints, we propose augmenting it (or any SMC algorithm) with an IS step, which allows for efficient assessment of estimation accuracy, and whose asymptotic properties are well known.
In an empirical analysis of the Smets and Wouters (2007) model on the same data set that they considered, we found their results to be strongly driven by their informative prior distributions. Assuming diffuse priors, we obtained results similar to those of Herbst and Schorfheide (2014) and were able to show that our estimates are very accurate. Inspection of the posterior distributions gave rise to identification constraints that were strongly supported by the data. In particular, we were able to restrict the wage markup shock to a white noise process. Under this constraint, we could conclude that the persistence of wages is driven by the endogenous dynamics.
The out-of-sample forecast results were encouraging. While we have in this paper concentrated on estimation and used forecast comparisons mostly as a device in model selection, in future work it might be interesting to examine more systematically the usefulness of data-driven identification constraints in forecasting. In particular, comparisons to a wider set of competing models, including the so-called DSGE-VAR models found superior to DSGE models (see Del Negro and Schorfheide, 2013), would be of interest.
parameters of the p.d.f. p(·), involving the distinct elements of the covariance matrix of the Metropolis-Hastings (MH) proposal distribution. The elements of ϑ t are obtained online from the current population of particles.
This algorithm is discussed in Remark 1 of Del Moral et al. (2006), and in Durham and Geweke (2014). The convergence results for the particle system produced by the algorithm are established in Del Moral et al. (2006), where it is assumed that the sequences {τ_t}_{t∈{1,…,L}} and {ϑ_t}_{t∈{1,…,L}} are known. It is important to notice that the particles are mutated after the selection step; one advantage of this ordering is that the mutation step rejuvenates the particles duplicated in resampling, improving the accuracy of the particle approximation. In applications, the integers τ_t in equation (A3) must be set in such a way that the importance distributions π_{τ_{t−1}}(θ) approximate the target posteriors π_{τ_t}(θ) reasonably well for all t ∈ {1,…, L} (see the discussion preceding (A4)). Otherwise, the particle approximation may be degenerate (see Del Moral et al., 2006). Obviously, when τ_t = t (i.e. τ_1 = 1, …, τ_L = T), i.e. resampling is performed T times, the successive posterior distributions are as close to each other as possible, resulting in potentially good importance distributions. However, this approach is not usually ideal in terms of precision of the particle approximation, because resampling increases the variance of the estimates and reduces the number of distinct particles (see Chopin, 2004; Del Moral, Doucet and Jasra, 2012). Therefore, resampling should be used only when necessary for preventing degeneracy of the particles. Hence, we use the following (adaptive) recursive procedure of Durham and Geweke (2014) to produce τ_1, …, τ_L. At each cycle t ∈ {1,…, L}, conditional on the previous cycles, the posterior density kernel k(θ | y_{1:τ_t}) ∝ π(θ | y_{1:τ_t}) is obtained by introducing new data one observation at a time into k(θ | y_{1:τ_{t−1}}), until a stopping criterion, based on, say, the effective sample size (ESS) is met. Degeneracy of the particles is usually monitored by the ESS

ESS = (Σ_{j=1}^{N} w̃(θ^j))² / Σ_{j=1}^{N} w̃(θ^j)²,  (A4)

where w̃(θ^j) denotes the unnormalized importance weight of particle θ^j.
As the ESS takes small values for a degenerate particle approximation, we introduce new data one observation at a time until the ESS drops below a particular threshold, such as N/2 (which is also used in the empirical application). The convergence results presented in Del Moral et al. (2012) suggest that equation (A2) holds almost surely also for the particle system generated by this adaptive algorithm.
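The adaptive choice of the break dates can be sketched as follows (Python; `log_incr_lik` is a hypothetical helper returning, for each of the N particles, the incremental log conditional likelihood of observation n, and the function names are ours):

```python
import numpy as np

def effective_sample_size(log_w):
    """ESS of importance weights, (sum w)^2 / sum w^2,
    computed from log-weights for numerical stability."""
    lw = np.asarray(log_w, dtype=float)
    lw = lw - lw.max()
    w = np.exp(lw)
    return w.sum() ** 2 / np.sum(w ** 2)

def next_break_point(t_prev, T, log_incr_lik, N, threshold=0.5):
    """Adaptive choice of the next resampling date tau_t: add observations
    one at a time, accumulating the incremental log-likelihood weights,
    until the ESS drops below threshold * N (N/2 in the application)."""
    log_w = np.zeros(N)
    n = t_prev
    while n < T:
        n += 1
        log_w = log_w + log_incr_lik(n)  # one more observation enters the kernel
        if effective_sample_size(log_w) < threshold * N:
            break
    return n, log_w
```

When the weights stay balanced all the way to T, the loop simply returns T, so the final cycle uses the full sample as required (τ_L = T).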
As to the mutation step, the sampling is performed by the randomized block Metropolis-Hastings (MH) method of Chib and Ramamurthy (2010), where at each MCMC iteration m ∈ {1,…, M}, the parameters θ_t^j (j ∈ {1,…, N}, t ∈ {1,…, L}) are first randomly clustered into an arbitrary number of blocks, and then simulated one block at a time using an MH step, carried out with a Gaussian random walk proposal distribution for each block of θ_t^j.⁷
⁷ We follow the procedure of Chib and Ramamurthy (2010) to obtain the random blocks: the parameters are first randomly reshuffled, and the first parameter in the reshuffled order forms the initial block. Each subsequent parameter in turn is included in the current block with probability (tuning parameter) p (in our empirical application, we set p to 0.85), and used to start a new block with probability (1 − p). The procedure is repeated until each reshuffled parameter is included in one of the blocks.
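A minimal sketch of the randomized blocking step, in the spirit of Chib and Ramamurthy (2010) (the function name and interface are ours):

```python
import numpy as np

def random_blocks(n_params, p_block, rng):
    """Randomly cluster parameter indices into blocks: shuffle the indices,
    then let each subsequent index join the current block with probability
    p_block and open a new block with probability 1 - p_block."""
    order = rng.permutation(n_params)
    blocks = [[order[0]]]
    for idx in order[1:]:
        if rng.random() < p_block:
            blocks[-1].append(idx)   # stay in the current block
        else:
            blocks.append([idx])     # start a new block
    return blocks
```

With p_block close to 1 (0.85 in the application) most parameters end up in a few large blocks, while occasionally the draw produces many small blocks, which is what makes the blocking "randomized".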
The covariance matrices of the proposal distributions are constructed from the associated elements of the sample covariance matrix of the current population of particles, V_{t,m}. The matrix V_{t,m} is further multiplied by an adaptive tuning parameter 0.1 ≤ c_{t,m} ≤ 1, whose role is to keep the MH acceptance rate close to 0.25 (see Durham and Geweke, 2014; Herbst and Schorfheide, 2014). In particular, c_{t,m} is set at c_{t,m−1} + 0.01 if the acceptance rate is greater than 0.25, and at c_{t,m−1} − 0.01 otherwise. This procedure is repeated independently for each particle θ_t^j until the particles are clearly distinct. Following Durham and Geweke (2014), we use the relative numerical efficiency (RNE) as a measure of particle divergence (see Geweke, 2005, p. 276). We calculate RNEs from predictive likelihoods, and stop mutating particles when the RNE exceeds a given threshold. The maximum number of MCMC iterations M_max is set at 50.
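The scale adaptation described above amounts to the following simple rule (a sketch; the step size, target acceptance rate and clamping bounds are those given in the text, the function name is ours):

```python
def update_scale(c_prev, acc_rate, target=0.25, step=0.01, lo=0.1, hi=1.0):
    """Adaptive tuning of the proposal-scaling constant c_{t,m}: raise it by
    `step` when the MH acceptance rate exceeds the 0.25 target, lower it
    otherwise, and keep it inside the interval [0.1, 1]."""
    c = c_prev + step if acc_rate > target else c_prev - step
    return min(max(c, lo), hi)
```

The proposal covariance for block b at iteration m is then c_{t,m} times the corresponding submatrix of V_{t,m}.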
In order to assess the quality of the probabilistic forecasts h periods ahead of the structural models of interest, we need p(y_{n+h} | y_{1:n}) in equation (6) for all n ∈ {S + 1,…, T}, while the SMC algorithm described above produces a sequence of distributions π_{τ_t}(θ) = π(θ | y_{1:τ_t}) for t ∈ {1,…, L}, where the integers τ_1, …, τ_{L−1} (τ_L = T) are computed online to optimize the performance of the sampler. To that end, given the sequence of the estimated posteriors π_{τ_t}(θ) (t ∈ {1,…, L − 1}), we propose to simulate the posteriors π_n(θ) (n ∈ {τ_t + 1,…, τ_{t+1} − 1}) by the sequential importance resampling algorithm of Rubin (1988). In particular, using π_{τ_t}(θ) as the importance density, we obtain the following importance weight functions:

w_n(θ) ∝ w̃_n(θ) = ∏_{l=τ_t+1}^{n} |Σ_{l|l−1}|^{−1/2} (2π)^{−p/2} exp(−(1/2) ε′_{l|l−1} Σ_{l|l−1}^{−1} ε_{l|l−1}),  (n ∈ {τ_t + 1,…, τ_{t+1} − 1})

for each t ∈ {1,…, L − 1} (see the discussion preceding (A4)). We then simulate the posterior distributions π_n(θ) from the particle approximation {θ_{τ_t}^j}_{j∈{1,…,N}} of π_{τ_t}(θ), using these weights w̃_n(θ^j). The resulting posterior estimates of π_n(θ) for n ∈ {S + 1,…, T} can also be used to sample the joint predictive distributions p(Y_{n+1},…, Y_{n+h}), as described in Adolfson et al. (2007). In this so-called sampling-the-future algorithm, for each of the N draws from the posterior distribution of the parameters π_n(θ), M (in the empirical application, M = 8) future paths of Y_{n+1},…, Y_{n+h} are simulated. This sample of N × M draws from the posterior predictive distribution can then be used to calculate the posterior quantities of interest, such as point forecasts.
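A minimal sketch of the sequential importance resampling step (Python; `log_incr_liks` stands for the precomputed incremental log conditional likelihoods, a hypothetical input, and the function name is ours):

```python
import numpy as np

def sir_intermediate_posterior(particles, log_incr_liks, rng, n_draws=None):
    """Sequential importance resampling (Rubin, 1988) sketch: reweight the
    particle approximation of pi_{tau_t}(theta) towards pi_n(theta) using
    the product of the intermediate one-step conditional likelihoods, then
    resample.  `log_incr_liks` is assumed to be an (n - tau_t, N) array whose
    rows hold log p(y_l | y_{1:l-1}, theta_j) for l = tau_t + 1, ..., n."""
    log_w = np.sum(np.asarray(log_incr_liks, dtype=float), axis=0)
    log_w = log_w - log_w.max()       # stabilize before exponentiating
    w = np.exp(log_w)
    w = w / w.sum()                   # normalized importance weights
    n_draws = len(particles) if n_draws is None else n_draws
    idx = rng.choice(len(particles), size=n_draws, p=w, replace=True)
    return [particles[i] for i in idx]
```

The resampled particles approximate π_n(θ) and can be fed directly into the sampling-the-future step, simulating M future paths per draw.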

Supporting Information
Additional supporting information may be found in the online version of this article:
Table A1. Augmented SMC estimation results of the unrestricted SW model with the informative prior distributions of Smets and Wouters (2007).
Table A2. Density forecasting results for 1970:1-2016:3.
Table A3. Density forecasting results of inflation for 1970:1-2004:4.
Table A4. Density forecasting results of GDP growth for 1970:1-2004:4.
Figure A1. Marginal posterior densities of a , b , g , i , r , w , a , b , g , i , r , p , , c and h under the diffuse priors in the unconstrained model.
Figure A2. Marginal posterior densities of l , p , p , , , r , , r y , r y , , 100(1/ − 1), l, , ga , and under the diffuse priors in the unconstrained model.
Figure A3. Joint posterior densities of ( w , w ), ( w , p ), ( w , w ) and ( w , w ).
Figure A4. Joint posterior densities of ( w , w ), ( p , w ), ( p , w ) and ( w , w ).
Figure A5. Joint posterior densities of ( p , w ), ( p , w ), ( w , w ), ( w , p ) and ( p , w ).
Figure A6. Marginal posterior densities of w , w , p and p under the diffuse priors in the restricted model.