Can a small New Keynesian model of the world economy with risk‐pooling match the facts?

Correspondence: Zhirong Ou, Economics Section, Cardiff Business School, Cardiff University, Cardiff, UK. Email: ouz@cardiff.ac.uk

Abstract
We ask whether a model of the US and Europe trading with the rest of the world can match the facts of world behaviour in a powerful indirect inference test. One version has uncovered interest parity (UIP), the other risk-pooling. Both pass the test but the most probable is risk-pooling. This is consistent with risk-pooling failing a number of single-equation tests, as has been found in past work; we show that these tests will typically reject risk-pooling when it in fact prevails. World economic behaviour under risk-pooling shows much stronger spillovers than under UIP with opposite monetary responses to the exchange rate. We argue that the risk-pooling model therefore demands more attention from policy-makers.


| INTRODUCTION
In this paper, we have a twofold empirical aim: to discover whether a three-country New Keynesian model of the world economy can match world data behaviour and, as part of that endeavour, whether a risk-pooling variant of that model can also do so. We do this using a testing and estimation approach, indirect inference, that has been found in recent work (see Xu, 2016 and Xu, 2018 for comprehensive surveys of this work) to heavily dominate other non-Bayesian methods in the small samples we typically have to deal with in open-economy macroeconomics. When, as here, there are fundamental questions of what modelling assumptions are appropriate, and the assumptions are the very things we wish to test, Bayesian methods, which rely on generally agreed priors, cannot be used. To anticipate our results, we find that they overturn much of the conventional wisdom drawn from previous empirical findings on the issues here. It is therefore important for readers to be thoroughly aware of the power of the indirect inference methods we use, even though they are not yet widely familiar among open-economy macroeconomists.
A number of efforts have been made to create a DSGE New Keynesian model of the world economy with several countries, usually the US, Europe and the rest of the world. These models however so far have not been shown to be able to match data behaviour according to the powerful indirect inference test, for example, Chari, Kehoe, and McGrattan (2002), Kollmann et al. (2016) and Le, Meenagh, Minford, and Wickens (2010). These models have been estimated in various ways but the general consensus has been that while some moments can be matched a general matching of such a model to the facts of the world economy is not possible. Yet much success has been reported in matching single economy models to these economies' data behaviour; for example, DSGE models of the United States (e.g.,  and China (e.g., Le, Matthews, Meenagh, Minford, & Xiao, 2014) separately successfully match those economies' facts. We review this previous empirical work on multi-country models below.
On the particular issue of consumer risk-pooling across borders it is generally agreed according to a variety of direct empirical tests that there is no evidence of it or even of a weaker version of it in the form of uncovered interest parity (UIP). Examples are for UIP Delcoure, Barkoulas, Baum, and Chakraborty (2003) and Isard (2006), and for consumption risk-pooling Obstfeld (1989), Backus and Smith (1993), Canova and Ravn (1996), Crucini (1999), Hess and Shin (2000), Razzak (2013), Burnside (2019). The empirical testing in this work has been via predictive tests on the exchange rate based on the single-equation relationship for UIP or regression of the single equation for risk-pooling; co-integration tests are also used. However, there are considerable difficulties with these approaches, which we deal with carefully below. In this paper, we embed these relationships in a full DSGE model and test the model as a whole. Non-rejection implies success of the model in all its parts.
In this paper, we attempt to find a simple New Keynesian model of the world economy: in essence we take the three-equation set-up of Clarida et al. (1999) and extend it to embrace three economies, the United States, Europe and the rest of the world (the last is included only for its trade and not as a complete model, since our focus here is on the behaviour of the major developed countries when linked together). We then use our powerful indirect inference test of its ability to match the data behaviour in the two economies. It turns out that because the cross-equation restrictions on a three-country model are dense, the empirical tests we are using may have been set at too demanding a level, which has misled some previous researchers, including some of us ourselves, into premature dismissal of these models.
We focus in particular on the capital movement relationships in these models: UIP and consumer risk-pooling. With highly sophisticated financial markets capable of providing insurance it has seemed a puzzle that the evidence noted above does not favour either UIP or risk-pooling. However, one of the problems in assessing this evidence has been that all the variables in these hypotheses are endogenous, creating difficult econometric issues.
Given the financial crisis and the upheavals it has caused in monetary and regulatory policy, we have had to approach monetary and related policy issues in a way that would not complicate our simple set-up by creating non-linear regime switches to the zero bound and the accompanying adoption of Quantitative Easing (QE, aggressive open market operations) and stringent bank regulation. These are important issues, tackled in recent work for the US by Le, Meenagh, Minford, Wickens, and Xu (2016), Le, Meenagh, and Minford (2018) in the context of the closed continental US economy. Instead of focusing on these issues by such means, we assume that the relevant interest rate in these models is the corporate bond rate (AAA rated corporate bond yield for US, and equally weighted average of France and Germany Corp bond yield rate for European Area [EA]). This rate did not hit the zero bound, unlike the rate on government bonds. By implication of this choice of reference interest rate, we think of monetary policy as influencing it by various policy means, including bank regulation, QE and direct changes in central bank lending/deposit rates for banks (which of course have gone negative at times). Thus, our Taylor rule relates to this commercial credit rate according to this interpretation.
In the rest of this paper, we first describe the model, in Section 2, in both its standard version with uncovered interest parity (UIP) and non-contingent bonds and also its risk-pooling version with fully contingent bonds. In Section 3, we go on to test the two versions, after estimation, against the data behaviour of the two countries: we carefully discuss the way our testing method works and the power of the test we use. In Section 4 we compare and contrast the two versions, in their responses to shocks. Section 5 concludes.

| The standard model with non-contingent bonds
We model a world economy comprising two countries (US and EA) and the rest of the world. The US and EA share the same model structure, while the rest of the world is included to pick up trade happening indirectly between the US and EA economies. To save space, we present below the basic model structure from the point of view of the US, which we refer to as the "home" country (denoted with subscript H). Unless necessary, we omit the EA economy, which is "foreign" (denoted with subscript F) to the US and has its variables denoted with an asterisk. In Appendix A we provide the full listing of the log-linearized model.

| Households
The representative household's preference is given by: where $\beta$ is the discount factor, $\epsilon_t$ is the time-preference shock, $\sigma$ is the inverse of consumption elasticity, $\varphi$ is the inverse of labour elasticity, and $C_t$ is the aggregate consumption index defined as: where $C_{H,t}$ is the CES index of goods produced in the home country, and $C_{F,t}$ is that of goods produced in the foreign country. $\eta > 0$ is the degree of substitution between domestic and foreign goods (Armington, 1969). $\alpha$ is the degree of openness (and we assume $\alpha^* = \alpha$). $\gamma, \gamma^* > 0$ are the price elasticities of differentiated goods.
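As a sketch only, a standard specification consistent with the parameter definitions above would take the following form. The exact functional forms, including the way $\epsilon_t$ enters and the indices $h$ and $f$ for individual home and foreign goods, are assumptions made for illustration and need not match the authors' equations.

```latex
\begin{align}
U_0 &= E_0 \sum_{t=0}^{\infty} \beta^{t} \epsilon_t
       \left[ \frac{C_t^{1-\sigma}}{1-\sigma} - \frac{N_t^{1+\varphi}}{1+\varphi} \right], \\
C_t &= \left[ (1-\alpha)^{1/\eta} C_{H,t}^{(\eta-1)/\eta}
       + \alpha^{1/\eta} C_{F,t}^{(\eta-1)/\eta} \right]^{\eta/(\eta-1)}, \\
C_{H,t} &= \left( \int_0^1 C_t(h)^{(\gamma-1)/\gamma}\, dh \right)^{\gamma/(\gamma-1)},
\qquad
C_{F,t} = \left( \int_0^1 C_t(f)^{(\gamma^*-1)/\gamma^*}\, df \right)^{\gamma^*/(\gamma^*-1)}.
\end{align}
```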
We assume complete financial markets in both the home and foreign economies. Households, who invest in both domestic and foreign bonds, $B_t$ and $B^*_t$, have the budget constraint: where (Gali & Monacelli, 2005) $S_t$ is the nominal exchange rate defined as units of home currency per unit of foreign currency ($/€), $R_t$ and $R^*_t$ are the home and foreign nominal interest rates, $W_t$ is the nominal wage, and $TR_t$ is the lump-sum transfer.
The optimization problem of households is to maximize (1), subject to (3), by choosing $C_t$, $N_t$, $B_{t+1}$ and $B^*_{t+1}$. The optimal conditions with respect to $C_t$ and $B_{t+1}$ lead to the Euler equation: which can be log-linearized to be: where $c_t = \ln C_t$, $E_t\pi_{t+1} = E_t \ln P_{t+1} - \ln P_t = E_t p_{t+1} - p_t$ is the expected CPI inflation, and $r = \beta^{-1} - 1$ is the steady-state real interest rate. The optimal conditions with respect to $B_{t+1}$ and $B^*_{t+1}$ imply: which can be log-linearized to find the UIP: where $s_t = \ln S_t$. Let the real exchange rate be $Q_t = S_t P^*_t / P_t$, and therefore $q_t = s_t + p^*_t - p_t$ in log-linearized form. The UIP condition can be re-written in real terms as:
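The step from nominal to real UIP can be written out as follows. This is a sketch under assumed notation: $r_t$ and $r^*_t$ denote the log nominal interest rates, and the nominal UIP condition is taken in its textbook form.

```latex
\begin{align}
r_t - r^*_t &= E_t s_{t+1} - s_t
  && \text{(nominal UIP, assumed textbook form)} \\
E_t q_{t+1} - q_t &= (E_t s_{t+1} - s_t) + (E_t p^*_{t+1} - p^*_t) - (E_t p_{t+1} - p_t)
  && \text{(from } q_t = s_t + p^*_t - p_t\text{)} \\
&= (r_t - E_t \pi_{t+1}) - (r^*_t - E_t \pi^*_{t+1})
  && \text{(real UIP)}
\end{align}
```

The expected real depreciation of the home currency therefore equals the real interest differential in favour of the home country.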

| Firms
Following Calvo (1983), we let a fraction $(1-\theta)$ of firms re-optimize prices $P_t(h)$ in each period, while the remaining fraction $\theta$ keep theirs. Firms resetting prices maximize: by choosing $P_t(h)$, subject to demand: where $MC_{t+k}$ is the real marginal cost at $t+k$. The first order condition implies the optimal reset price $P_t(h)$, which can be log-linearized and combined with the price index of domestic goods, log-linearized around a zero-inflation steady state as usual, to find the New Keynesian Phillips curve for domestic inflation: where a hat ('^') denotes the percentage deviation of a variable from the steady-state level.
In an open economy, the general price level also reflects imported products. Since the general price index in log-linearized form is: the CPI inflation can be shown as: which can be further simplified, using the real exchange rate equation ($q_t = s_t + p^*_t - p_t$), to: Substituting (14) into (11) yields the open-economy New Keynesian Phillips curve: To find the real marginal cost in (15), let the production function of the whole economy be: where $A_t$ is productivity and $N_t$ is labour input. The unit cost of output is $W_t/Y_t$, where $W_t$ is the nominal wage rate. The real marginal cost per unit of output is therefore: which can be log-linearized to: where $w_t = \ln W_t$ and $a_t = \ln A_t$.
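The link between CPI inflation and the real exchange rate can be sketched as below. Two assumptions are made for illustration and may differ from the authors' derivation: the log CPI is the openness-weighted average of domestic and imported goods prices, and import prices obey the law of one price, $p_{F,t} = s_t + p^*_t$.

```latex
\begin{align}
p_t &= (1-\alpha)\, p_{H,t} + \alpha\, p_{F,t}
     = (1-\alpha)\, p_{H,t} + \alpha\,(s_t + p^*_t) \\
    &= (1-\alpha)\, p_{H,t} + \alpha\,(q_t + p_t)
     && \text{(using } q_t = s_t + p^*_t - p_t\text{)} \\
\Rightarrow\quad p_t &= p_{H,t} + \frac{\alpha}{1-\alpha}\, q_t,
\qquad
\pi_t = \pi_{H,t} + \frac{\alpha}{1-\alpha}\, \Delta q_t .
\end{align}
```

Under these assumptions a real depreciation ($\Delta q_t > 0$) raises CPI inflation relative to domestic goods inflation, more strongly the more open the economy.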

| The IS-PC-Taylor rule model
The above model can be condensed to the well-known IS-Phillips curves model, where for our "world model" variant here we also assume the following trade equations (all expressed in logs; again, seeing the US as the home economy): Home import from the foreign country: Home import from the rest of the world: Trade balance of the world economy: where Ξ and Ϝ are the steady-state import/export ratios, and the LHS of the equation can be seen as the output of the rest of the world: The world's relative demand for US and EA products is set by: Assume that the home economy clears at $Y_t = C_t + NX_t$, where $NX_t$ is net exports. The national income identity can be log-linearized and combined with the Euler equation and the trade equations to find the IS curve of the home economy: where $c$ and $x$ are the steady-state consumption and export ratios, $\Theta$, $z_1$ and $z_3$ are combinations of structural parameters, and $\varepsilon^{IS}_t$ is the equation error, which can be interpreted as the demand shock (see details of the derivation in Appendix B).
The Phillips curve can be re-written to reflect the relationship between CPI inflation and the "output gap" by combining (15) and (18), using the national income identity, to be: where $\kappa_a = \lambda\left(\sigma \frac{1}{c}\Theta^{-1} + \varphi\right)$, and $\varepsilon^{PP}_t$ is the supply shock. In particular, we assume that "potential output" $y^p_t$ follows a random walk process with drift (in log form) as: to reflect the permanent impact of the productivity shock ($\varepsilon^{yp}_t$). $\Gamma_{y^p}$ in (25) is the deterministic trend of potential output, and $\delta < 0$ ensures that the process is trend stationary.
The model can be closed by setting a rule for monetary policy, which we let follow a Taylor rule: where $\rho$ measures the inertia of policy, $\phi_\pi$ and $\phi_y$ are the responses to inflation and output, and $\varepsilon^R_t$ is the policy error. Here we allow for international monetary cooperation such that monetary policy also responds to fluctuations of the real exchange rate. On this occasion, the home interest rate rises if the home currency depreciates; the responsiveness is measured by $\phi_q$.
Thus, Equations (19)-(26), together with the UIP condition (8) and the 'foreign' equations omitted for the EA, constitute a simple "world" model that we list in full in Appendix A and treat as the benchmark model.

| The risk-pooling and UIP variants of the model
As was noted by Chari et al. (2002), given that this model has bond UIP via non-contingent nominal bonds, it produces real UIP which generates expected risk-pooling from an initial position. That is to say that from wherever the real exchange rate is today, it is expected that future consumption in the two countries will move together adjusting for movement in the real exchange rate. This comes about because the real interest rate differential is equal to the expected change in the real exchange rate (due to UIP) and also to the expected change in the consumption differential (adjusted for the risk-aversion parameter) due to the two Euler equations. Therefore, expected consumption will move together in the two countries apart from the effect of the changing real exchange rate. This risk-pooling is "dynamic" because it is disturbed by shocks to consumption preferences (there is no shock to UIP in the model because second moments are all constant). Viewed over time from some initial date, risk-pooling is close to being delivered (exactly if utility is logarithmic in consumption). But there is no insurance against preference shocks. If however consumers have access to contingent nominal bonds, full risk-pooling occurs, insuring against all shocks, so that the real exchange rate is deterministically related to the ratio of foreign to home consumption; this case also implies UIP.
This can be shown formally as follows, following Chari et al. (2002): a) Full risk-pooling via state-contingent nominal bonds: let the price at time $t = 0$ (when the state was $x_0$) of a home nominal state-contingent bond paying 1 (home currency) in state $x_t$ be: where $\beta$ is time-preference and $f(x_t, x_0)$ is the probability of $x_t$ occurring given $x_0$ has occurred. Now note that foreign consumers can also buy this bond freely via the foreign exchange market (where $S$ is home currency per foreign currency as above) and its value as set by them will be: Here they are equating the expected marginal utility of acquiring this dollar bond with foreign currency, with the marginal utility of a unit of foreign currency at time 0. Plainly, the price paid by the foreign consumer must be equal by arbitrage to the price paid by the home consumer. Equating these two equations yields: Now we note that the terms for the period $t = 0$ are the same for all $x_t$ so that for all $t$ from $t = 0$ onwards: Let $q_t = p^*_t - p_t + s_t$ be the real exchange rate (where a rise is a US, Home, depreciation) as in our notation elsewhere; $\epsilon$ is the shock to time-preference. Then this yields the risk-pooling condition: ignoring the constant; $v$ is the difference between the logs of the two countries' time-preference errors (these errors will also form part of the two IS shocks). To see that this implies the UIP relationship, use the Euler equations for consumption (e.g., for home consumption: where $B^{-1}$ is the forward operator keeping the date of expectations constant). Substituting for consumption into the risk-pooling equation gives us UIP: When there are only non-contingent bonds then arbitrage forces UIP. When this is substituted back into the Euler equations it yields: Hence now the risk-pooling condition occurs in expected form from where it currently is. But any shocks may disturb it in the future.
Thus with full risk-pooling under state-contingent bonds relative consumption is exactly correlated with the real exchange rate and time-preference shocks. But under non-contingent bonds it is subject to all shocks: it is only expected to be correlated exactly from where it currently is.
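The arbitrage argument for full risk-pooling can be sketched in symbols as follows. The sketch assumes CRRA marginal utility $\epsilon_t C_t^{-\sigma}$, writes $\Lambda_0$ for a hypothetical label collecting the date-0 terms, and defines $v_t = \ln\epsilon^*_t - \ln\epsilon_t$; the sign convention for $v_t$ is an assumption.

```latex
\begin{align}
\text{home valuation:}\quad
  & \beta^{t} f(x_t, x_0)\,
    \frac{\epsilon_t C_t^{-\sigma} / P_t}{\epsilon_0 C_0^{-\sigma} / P_0} \\
\text{foreign valuation:}\quad
  & \beta^{t} f(x_t, x_0)\,
    \frac{\epsilon^*_t C_t^{*\,-\sigma} / (S_t P^*_t)}{\epsilon^*_0 C_0^{*\,-\sigma} / (S_0 P^*_0)} \\
\text{equating, for all } x_t:\quad
  & \left( \frac{C_t}{C^*_t} \right)^{\sigma}
    = \Lambda_0\, \frac{S_t P^*_t}{P_t}\, \frac{\epsilon_t}{\epsilon^*_t} \\
\text{in logs, ignoring the constant:}\quad
  & \sigma\,(c_t - c^*_t) = q_t - v_t .
\end{align}
```

Relative consumption is thus tied exactly to the real exchange rate and the relative time-preference shock, which is the sense in which the two are "exactly correlated" under state-contingent bonds.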
We continue with the model under both these variants: our "default" variant contains UIP, and we consider the risk-pooling variant as an explicit alternative where the UIP Equation (8) is replaced with the risk-pooling Equation (31).

| Single-equation tests of UIP
As noted in the introduction, there is a large empirical literature testing UIP and risk-pooling by single-equation methods.
Begin with the UIP equation: Here the usual single-equation test is a predictive test, to see whether the actual future real exchange rate obeys the rational expectation prediction of the equation. Thus $q_{t+1} = E_t q_{t+1} + e_{t+1}$; and so: Here $T$ is the sample size; we then compute the rejection rate across these samples, treating each as a single-equation test of the null hypothesis. We find that the test is heavily biased towards rejection, with a rejection rate of 16.9% (instead of 5%), as can be seen from Table 1. What this means is that the test is rejecting the (true) zero mean forecasting error of UIP more than three times too often, 16.9% against the 5% the unbiased test would give. This is equivalent to applying a normal deviate of 1.37 as the rejection threshold compared with the appropriate 1.96. Equivalently the t-values of the test are over-stated by about 40% (1.96/1.37 ≈ 1.43). What is the reason for this bias in the test? It is small sample bias. In a small sample, the tails of the population distribution will be under-represented in a number of samples, so that their standard deviation is smaller than the population standard deviation. Such samples will reject the zero mean hypothesis too often. There will also be samples in which the tails are over-represented, to compensate, keeping the average standard deviation across all samples in line with the population standard deviation. These will under-reject the zero mean hypothesis. But there are fewer of these samples than of the former because the probability of getting tail draws in a sample is low compared with that of not getting them. Hence the observed rise in the number of rejections across all samples above the 5% nominal rate.
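The correspondence between a 16.9% rejection rate and a threshold of about 1.37 can be checked with the standard normal distribution. The sketch below assumes a two-sided test with a normally distributed statistic; the function name is hypothetical.

```python
from statistics import NormalDist


def implied_threshold(rejection_rate: float) -> float:
    """Two-sided standard normal critical value that rejects a true null
    with the given probability."""
    return NormalDist().inv_cdf(1.0 - rejection_rate / 2.0)


nominal = implied_threshold(0.05)    # the textbook value, close to 1.96
observed = implied_threshold(0.169)  # close to the 1.37 quoted in the text

print(f"nominal threshold:  {nominal:.3f}")
print(f"implied threshold:  {observed:.3f}")
print(f"ratio of the two:   {nominal / observed:.2f}")
```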
A more widely used test of UIP is to estimate the UIP regression, and then test a = 0 and b = 1. To examine the extent of the bias in this regression we use the same Monte Carlo experiment in which we assume the model with UIP is true, creating 1,000 samples from this model. We then run the OLS regression on each sample and see how often the null hypothesis of a = 0 and b = 1 is rejected at the usual 5% threshold using the F test. We find, as reported in Table 2, a rejection rate of over 70%, whereas with an unbiased test it would be just 5%: again, a massive bias towards rejection.
What is the source of this bias? Again it is small sample bias, arising from the small samples on which the regression is being run. The true relationship across the whole population is a = 0 and b = 1. But in any one sample the relationship can vary according to the data drawn in that sample from the model bootstraps; these data draws have high variance because all the model's shocks are drawn to create both the interest differential and the real exchange rate movement. Thus here the sample variation causes wide variations in the OLS estimates: notice that in our test above of forecasting accuracy, we imposed a = 0, b = 1, so preventing this OLS source of bias, and merely getting the bias due to tail draws. We can illustrate this OLS bias by showing 10 samples side by side, each with its OLS regression line, and comparing them with the regression line for all samples pooled together, creating a large sample. It is plain, as we show in Figure 1, how variable the 10 sample slopes are, whereas the large sample pooled regression has approximately a = 0 and b = 1, showing a slope of unity passing through the (0, 0) intercept as assumed in the true model.

TABLE 1 Monte Carlo forecasting test of UIP. Reject rate of H0: e = 0 at the 5% level: 16.9%
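The mechanics of such a Monte Carlo rejection-rate calculation can be sketched as below. The data-generating process here is a toy i.i.d. regression, an assumption standing in for bootstrap samples from the full UIP model, so it will not reproduce the 70% figure; it only shows how the rate is computed. A Wald test with the asymptotic chi-square critical value is used in place of the F test, and the function names are hypothetical.

```python
import numpy as np

CHI2_2DF_95 = 5.991  # 5% critical value of a chi-square with 2 degrees of freedom


def wald_a0_b1(y, x):
    """Wald statistic for H0: a = 0, b = 1 in the OLS regression y = a + b*x + e."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (len(y) - 2)          # residual variance
    diff = beta - np.array([0.0, 1.0])         # distance from the null values
    return float(diff @ (X.T @ X) @ diff / s2)


def rejection_rate(n_samples=1000, T=168, seed=0):
    """Share of samples, drawn with the null true, in which the null is rejected."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_samples):
        x = rng.normal(size=T)                 # toy regressor
        y = x + rng.normal(size=T)             # null holds: a = 0, b = 1
        rejections += wald_a0_b1(y, x) > CHI2_2DF_95
    return rejections / n_samples


rate = rejection_rate()
print(f"rejection rate under a true null: {rate:.3f}")
```

With this well-behaved toy process the rate stays close to the nominal 5%. Replacing the two toy lines with samples simulated from a structural model is what would reveal any over-rejection of the kind reported in the text.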

| Single-equation tests of risk-pooling
If we turn now to the single-equation time-series tests of the risk-pooling equation, we see that they examine two time series, the consumption differential and the real exchange rate, allowing for a random i.i.d. error: Here the question is whether the error created from the difference between the two time series behaves in line with the risk-pooling hypothesis. Because both the consumption ratio and the real exchange rate are non-stationary, one may here carry out a co-integration test, to see whether the two series vary together as the risk-pooling hypothesis states: this test checks whether the error from an OLS regression is stationary or not, using the ADF test. Typically studies find a lack of co-integration, rejecting the hypothesis.
The problem however is that the risk-pooling equation includes the relative shock to consumers' time-preference, v t , as derived in (31). This is an exogenous variable, not an i.i.d. shock. It could be recovered from the two countries' Euler equations; but this is not usually done and if done would need to respect the rational expectations restrictions on expected future consumption in the Euler equation coming from the whole model solution. It is plainly an important time-series shock, which is included in the macro model. Simply leaving it out of the regression creates omitted variable bias for the estimated equation-a serious and possibly fatal specification error.
To find out what this problem might do to these co-integration tests, we again run a Monte Carlo experiment as above; but on this occasion, we generate bootstrap samples for the consumption differential and the real exchange rate from the risk-pooling variant of the model, in which there is co-integration by construction. We then compute how frequently co-integration is rejected, by testing the stationarity of the residual of the risk-pooling regression (34) for each sample (co-integration is rejected if the ADF test fails to reject the null hypothesis of a unit root). Table 3 shows that, depending on the exact lags and trend assumptions used in the test, the rejection rate of co-integration at the 5% level lies between 70% and 93%. Thus, the test is very strongly biased against co-integration: the risk-pooling model from which these errors come implies co-integration on the true equation, but the general lack of co-integration comes from the omitted relative shock to consumers' time-preference ($v_t$) in the risk-pooling regression. Effectively it is this omitted shock that ensures co-integration.
Another widely used test of the risk-pooling hypothesis tests the estimates of a and b of the risk-pooling regression, in a similar way to the F test applied to the UIP regression reviewed above. The difference is that on this occasion the null hypothesis changes to H0: a = 0, b = 1/σ, as implied by the macro model. We can use the same Monte Carlo experiment as above to examine the bias of the test at the 5% threshold. Table 4 shows the mean OLS estimate of a to be −1.99 against a true value of zero, and that of b to be 0.23 against a true value of 0.63. Clearly, these estimates are highly biased. The average estimate of b has an average t-value of only 1.2, so these regressions will typically find an insignificant coefficient for b. The rejection rate of the null hypothesis of a = 0 and b = 1/σ is near 85%, which is massively over-sized compared to the 5% level.
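The omitted-variable mechanism can be illustrated with a small simulation. Everything in this sketch is a toy assumption rather than the authors' experiment: the relation is written as c − c* = (q − v)/σ, the series are stationary draws, σ = 1.59 is chosen only so that 1/σ is near 0.63, and v is made correlated with q through a common component.

```python
import numpy as np


def ols(y, *regressors):
    """OLS coefficients of y on a constant and the given regressors."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


rng = np.random.default_rng(1)
T, sigma = 5000, 1.59                              # toy sample size and risk aversion

common = rng.normal(size=T)                        # shock moving both q and v
q = common + rng.normal(size=T)                    # toy real exchange rate
v = 0.8 * common + 0.5 * rng.normal(size=T)        # toy relative preference shock
c_diff = (q - v) / sigma                           # risk-pooling holds exactly

b_omitted = ols(c_diff, q)[1]                      # v left out of the regression
b_full = ols(c_diff, q, v)[1]                      # v included

print(f"true b = 1/sigma:      {1 / sigma:.3f}")
print(f"b with v omitted:      {b_omitted:.3f}")
print(f"b with v included:     {b_full:.3f}")
```

Although risk-pooling holds exactly in the simulated data, the regression that omits v recovers a slope well below 1/σ, while including v recovers it. The direction and size of the bias depend on how v co-moves with q, which in the full model is determined by all the structural shocks.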
What we have found therefore is that if the risk-pooling model is correct the regressions performed on sample data generated from it will find an insignificant relationship between the real exchange rate and relative consumption and also a lack of co-integration because the key error in this relationship is omitted-an important mis-specification. In addition small sample bias will occur here as for UIP, with data variation in the small samples high relative to that in the population. This strong bias towards rejection of risk-pooling in single-equation tests is thus coming from omitted variable bias

| Indirect inference
We now turn to an indirect inference test of full models with either UIP or risk-pooling embedded in them. The idea is that any major fault in these models should lead to their rejection with high likelihood; this power of our test is something we establish below with Monte Carlo simulations. Plainly either UIP or risk-pooling are key relationships in the model that if wrong should produce rejection: both have strong implications for the behaviour of all the model's variables.
Indirect Inference is a relatively unfamiliar method of estimation and testing. We use it here because we need a method that will powerfully reject a mis-specified model in the small sample that we have (around 168 quarterly observations). The two main alternatives today are Bayesian estimation with strong priors or Maximum Likelihood (equivalent to Bayesian estimation with flat priors).
The former is an appropriate method when much is already known about the issue at hand, so that priors can be set out that command general assent; often the case in the physical sciences and indeed in some parts of the social sciences. However, this condition does not apply here: the macroeconomics of the world economy is not much explored and remains controversial.
Maximum Likelihood estimation is based on minimizing the model's now-casting prediction errors and its associated test is based on the likelihood implied by these  errors. The two main difficulties of this method are first that it exhibits high estimation bias in small samples and second that the power of the test in small samples is also rather limited and in particular its power to reject a misspecified model is close to zero, because such a model can be fitted closely to the data, so creating small errors. Le, Meenagh, Minford, Wickens, and Xu (2016) carried out a Monte Carlo comparison of this method with indirect inference, treating the widely used Smets and Wouters (2007) model of the US as the true model, and concluded that, while indeed ML methods suffered from these problems, by contrast indirect inference offered very low bias and potentially large power. The method involves first describing the data behaviour in the sample by an "auxiliary model," for which we use a VAR; and then simulating the DSGE model by bootstrapping its innovations to create many parallel samples (or histories) from each of which implied auxiliary model coefficients are estimated, generating a distribution of these coefficients according to the DSGE model. We then ask whether the VAR coefficients found in the actual data sample (actual history) came from this distribution with a high enough probability to pass the Wald test (where we put the test threshold at 5%). Notice that when we bootstrap these shocks we do so by time vector, that is to say we draw all the innovations for one period together when we randomly select shocks. This preserves any simultaneous correlation between them which may well be important because a single event source can trigger shocks all over the economy-as in the recent financial crisis.
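Bootstrapping "by time vector" can be sketched as resampling whole rows of the innovation matrix, so that the shocks of a given period are always drawn together. The array layout (periods in rows, shocks in columns), the toy innovations, and the function name are assumptions for illustration.

```python
import numpy as np


def bootstrap_by_time_vector(innovations, rng):
    """Resample periods (rows) with replacement, keeping each period's
    shocks together so their contemporaneous correlation is preserved."""
    T = innovations.shape[0]
    idx = rng.integers(0, T, size=T)    # one random period index per draw
    return innovations[idx]


rng = np.random.default_rng(0)

# Toy innovations: 168 periods, 3 shocks sharing a strong common component.
common = rng.normal(size=(168, 1))
innovations = common + 0.1 * rng.normal(size=(168, 3))

sample = bootstrap_by_time_vector(innovations, rng)
corr = np.corrcoef(sample[:, 0], sample[:, 1])[0, 1]
print(f"correlation between shocks 1 and 2 in the bootstrap sample: {corr:.2f}")
```

Drawing each shock series independently would instead destroy this cross-shock correlation, which is the feature the text wants to keep.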

| The auxiliary model
The state-space representation of a log-linearized DSGE model in general has a restricted VARMA representation for the endogenous variables or a finite order VAR model. However, if the observed data are non-stationary, following Meenagh et al. (2018), an unrestricted version of the VECM can be used as an auxiliary model when errors are stationary. The VECM is an approximation of the reduced form of the DSGE model and can be represented as a cointegrated VAR with exogenous variables (VARX) model. Suppose the structural model can be written in log-linearized form as: where $y_t$ is a vector of endogenous variables with dimension $p \times 1$ and $x_t$ is a vector of exogenous variables with dimension $q \times 1$. We assume $x_t$ is non-stationary and follows a unit root process: The disturbances $e_t$ and $\epsilon_t$ are both vectors of i.i.d. error processes with zero means. $L$ denotes the lag operator and $A(L)$, $B(L)$, $a(L)$, $c(L)$ are polynomial functions having roots lying outside the unit circle.
The general solution of $y_t$ is given by: where $f$ is a vector of constants and polynomial functions in the lag operator. Since $y_t$ and $x_t$ are both non-stationary, the solution has $p$ cointegrating relationships such that: where $\Pi$ is a $p \times p$ matrix with rank $0 \le r < p$, with $r$ being the number of linearly independent cointegrating vectors. In the long run, the solution to the model is given by: where $\bar y_t$ and $\bar x_t$ are the long-run solutions to $y_t$ and $x_t$ respectively. The generic solution of $\bar x_t$ can be decomposed into a deterministic trend $\bar x^d_t = [1 - a(1)]^{-1} d t$ and a stochastic trend. The solution of $y_t$ in Equation (37) can be re-written as the cointegrated VECM with a mixed moving average process $\omega_t$: The VECM can be approximated by: where $\zeta_t$ is an i.i.d. process with zero mean. Since $g = \bar y_{t-1} - \Pi \bar x_{t-1}$, the VECM can also be written as: Either of Equation (43) or (44) can serve as the auxiliary model. In particular, Equation (44) distinguishes between the effect of the trend component of $x_t$ and the temporary deviation of $x_t$ from trend; it can be re-written as a VARX(1) in levels: where $x_{t-1}$ contains the stochastic trends in the exogenous variables, $\eta_t$ is included to pick up the deterministic trends in $y_t$, and $v_t$ is a vector of the error terms. For the Wald test, we calculate the Wald statistic using the VAR coefficients of the lagged endogenous variables $(I - K)$ and the variances of the VAR errors $\mathrm{Var}(v_t)$, which we take as descriptors of the data. We are not interested in matching the time trends and the coefficients of the exogenous variables (the two potential outputs on this occasion), and we assume that the model coefficients yielding these balanced growth paths and effects of trend productivity on the steady state are chosen accurately.
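The Wald calculation can be sketched as a distance between the auxiliary parameters estimated on the actual data and the distribution of the same parameters across model simulations. The function names and the toy input (1,000 simulated vectors of six auxiliary parameters, for example four VAR coefficients and two error variances) are assumptions for illustration, not the authors' code.

```python
import numpy as np


def wald_stat(theta, sims):
    """Squared distance of theta from the mean of the simulated auxiliary
    parameters, weighted by the inverse of their covariance matrix."""
    mean = sims.mean(axis=0)
    cov = np.atleast_2d(np.cov(sims, rowvar=False))
    diff = theta - mean
    return float(diff @ np.linalg.solve(cov, diff))


def wald_percentile(theta_data, sims):
    """Percentage of simulated Wald statistics lying below the data's Wald
    statistic; the model passes a 5% test if this is below 95."""
    w_data = wald_stat(theta_data, sims)
    w_sims = np.array([wald_stat(s, sims) for s in sims])
    return 100.0 * float(np.mean(w_sims < w_data))


rng = np.random.default_rng(0)
sims = rng.normal(size=(1000, 6))           # toy simulated auxiliary parameters

close = wald_percentile(sims.mean(axis=0), sims)        # data at the model mean
far = wald_percentile(sims.mean(axis=0) + 10.0, sims)   # data far in the tail
print(close, far)
```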

| Choosing the variables to be matched by indirect inference
A central question to be addressed in testing a model by indirect inference is choosing the power of the test. In practice, this is equivalent to choosing which variables to put in the auxiliary model, which here we put in the form of a VAR. Other forms of auxiliary model could be used instead, such as moments or IRFs, with similar results, as discussed in Le, Meenagh, Minford, Wickens, and Xu (2016). Le, Meenagh, Minford, Wickens, and Xu (2016) show that, as the number of variables and the order of the VAR rise, the power of the test increases up to the point where the full reduced form VAR of the model is reached. For example in Smets and Wouters (2007), the full reduced form is a VAR(4) in seven variables, implying some 200 VAR coefficients in all. Plainly each of these carries additional information about the implications of the model for the data.
A policy-maker using a macro model (or any other user) would like to find a model that passes the test, in order to make progress in assessing the effects of policy and also the accuracy of the assessment. A model that does not pass the test cannot be of any use in this respect. On the other hand the test needs also to have considerable power in order to discriminate between good and bad models and to ensure that the model chosen is reasonably accurate. Thus the most powerful II test will reject any model that is as little as 1% inaccurate, effectively only admitting a model that "is the real world." The least powerful may admit models of considerable inaccuracy.
To assess how many variables should be included and what order of VAR requires us to examine the power of various combinations on the type of model we are investigating. This can be done by Monte Carlo simulation. After some experimentation with different variables and VAR orders we found that just two key variables in a VAR (1)-the two outputs from each country-provide substantial power in testing this two-country model. We include these two variables and also the variances of their residuals in our auxiliary model.
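As a rough illustration of how the size of the auxiliary model grows, the short sketch below counts the descriptors of an n-variable VAR(p). It assumes, as in the endnote on test power, that only the coefficients on lagged endogenous variables and the residual variances are counted; the function name `var_descriptor_count` is hypothetical and is introduced here only for illustration.

```python
def var_descriptor_count(n_vars: int, order: int, include_variances: bool = True) -> int:
    """Count auxiliary-model descriptors for an n-variable VAR(p).

    Counts the n * n * p coefficients on lagged endogenous variables and,
    optionally, the n residual variances. Constants, trends and coefficients
    on exogenous variables are left out (an assumption of this sketch).
    """
    coefficients = n_vars * n_vars * order
    variances = n_vars if include_variances else 0
    return coefficients + variances


# Two-output VAR(1): 4 coefficients plus 2 error variances.
print(var_descriptor_count(2, 1))  # 6

# Seven-variable VAR(4), coefficients only: the "some 200" quoted in the text.
print(var_descriptor_count(7, 4, include_variances=False))  # 196
```

The contrast between 6 and 196 descriptors gives a sense of why a full reduced-form VAR yields a far more demanding test than the two-output VAR(1) used here.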

| The power of the test for a two-variable VARX(1): Some Monte Carlo experiments
In this section, we examine the power of the II test against alternative false models. We do so by first estimating both the UIP and risk-pooling models, which we treat as the "true" models, and then using them to generate 1,000 sets of simulated data by bootstrapping the structural shocks identified over the sample period. A VARX(1) is then fitted to each simulated data set, which yields a distribution of the VAR parameters. This in turn gives us a distribution of the Wald statistics of the true models; by construction, 5% of the true-model simulations lie above its 95th percentile, so that percentile is the critical value of the Wald test at the 5% significance level. To evaluate the power of the test against the two models, in the next step we falsify each of them by biasing their parameters by a given percentage and generate false simulations with the biased models. We then find the distribution of the Wald statistics just as before, but in this case we calculate the rejection rate using the 5% critical values found with the true models. We try different degrees of falseness.
The Monte Carlo experiment results for each of the models are reported in Table 5. It can be seen that the power of the test with just the two outputs is very high. When all parameters of the UIP model are falsified by only 5%, the model is rejected 100% of the time; with only 3% falsification it is rejected 81% of the time. The test power for the risk-pooling model is similar. If we add just one variable to the two outputs, the real exchange rate, the power rises sharply for both models. 2
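The logic of the power experiment described in this section can be sketched on a toy example. The sketch below is not the authors' code: it replaces the structural model with a univariate AR(1), replaces the bootstrapped structural shocks with Gaussian draws, and uses the AR(1) coefficient and residual variance as the two descriptors. The "true" parameter values (0.8 and 1.0), the 500 replications and all function names are assumptions chosen for illustration; the sample length of 168 corresponds to the quarters between 1970Q1 and 2011Q4.

```python
import random
import statistics


def simulate_ar1(rho, sigma, n_obs, rng):
    """Simulate a zero-mean AR(1) series (toy stand-in for a structural model)."""
    value, series = 0.0, []
    for _ in range(n_obs):
        value = rho * value + rng.gauss(0.0, sigma)
        series.append(value)
    return series


def descriptors(series):
    """Auxiliary-model descriptors: OLS AR(1) coefficient and residual variance."""
    lagged, current = series[:-1], series[1:]
    slope = sum(a * b for a, b in zip(lagged, current)) / sum(a * a for a in lagged)
    residuals = [b - slope * a for a, b in zip(lagged, current)]
    return slope, sum(e * e for e in residuals) / len(residuals)


def wald(desc, mean, cov_inv):
    """Quadratic form (d - mean)' cov^{-1} (d - mean) for two descriptors."""
    d0, d1 = desc[0] - mean[0], desc[1] - mean[1]
    return (d0 * d0 * cov_inv[0][0] + 2.0 * d0 * d1 * cov_inv[0][1]
            + d1 * d1 * cov_inv[1][1])


def rejection_rate(bias, n_sims=500, n_obs=168, seed=0):
    """Share of biased-model simulations rejected at the true model's 5% critical value."""
    rng = random.Random(seed)
    rho_true, sigma_true = 0.8, 1.0  # hypothetical "true" parameters

    # Step 1: distribution of descriptors and Wald statistics under the true model.
    true_desc = [descriptors(simulate_ar1(rho_true, sigma_true, n_obs, rng))
                 for _ in range(n_sims)]
    m0 = statistics.fmean([d[0] for d in true_desc])
    m1 = statistics.fmean([d[1] for d in true_desc])
    s00 = sum((d[0] - m0) ** 2 for d in true_desc) / (n_sims - 1)
    s11 = sum((d[1] - m1) ** 2 for d in true_desc) / (n_sims - 1)
    s01 = sum((d[0] - m0) * (d[1] - m1) for d in true_desc) / (n_sims - 1)
    det = s00 * s11 - s01 * s01
    cov_inv = [[s11 / det, -s01 / det], [-s01 / det, s00 / det]]
    true_wald = sorted(wald(d, (m0, m1), cov_inv) for d in true_desc)
    critical = true_wald[int(0.95 * n_sims)]  # 95th percentile of the true Wald values

    # Step 2: falsify every parameter by the same proportion and re-simulate.
    rho_false, sigma_false = rho_true * (1 + bias), sigma_true * (1 + bias)
    rejected = 0
    for _ in range(n_sims):
        d = descriptors(simulate_ar1(rho_false, sigma_false, n_obs, rng))
        rejected += wald(d, (m0, m1), cov_inv) > critical
    return rejected / n_sims


for bias in (0.0, 0.05, 0.10, 0.20):
    print(f"bias {bias:.0%}: rejection rate {rejection_rate(bias):.2f}")
```

With zero bias the rejection rate should sit near the nominal 5%, and it should rise as the bias grows. The bias must stay below 25% in this toy so that the falsified AR(1) coefficient remains below one.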

| Data and calibration
We now confront the model described above with quarterly US and euro area data between 1970Q1 and 2011Q4, which we plot in Figure 2. Certain parameters are fixed throughout; others are calibrated to begin with and then re-estimated by indirect inference.
Of the fixed parameters, we set the discount factor (β) for both economies to 0.99 to imply a steady-state annual real interest rate of 4%. The steady-state consumption-to-output ratio (c) is set to 0.66 for the US and 0.55 for the EA, while the export ratios (x) are 0.12 and 0.30, respectively. Other fixed parameters/steady-state ratios are detailed in Table 6.
For the parameters that are to be re-estimated later, we mainly follow Smets and Wouters (2007) in setting the starting values: the inter-temporal elasticity of substitution σ and the elasticity of labour supply φ are set to 1.38 and 1.83, respectively; the Calvo non-adjusting probability is 0.66, which suggests nominal prices are on average adjusted every three quarters. The persistence of the nominal interest rate is set to 0.81, while the response to inflation is 2.04 and that to the output gap is 0.12. In our specification, we let the nominal interest rate respond also to changes in the real exchange rate, with a response of 0.5. For the euro area, we follow Smets and Wouters (2003): thus σ and φ are 1.39 and 2.50, respectively; the Calvo parameter is 0.9; the Taylor rule coefficients are 0.96 (persistence), 1.69 (inflation response) and 0.12 (output response), and we let the real exchange rate response be the same as that of the US. These calibrated values are listed in Table 7 in comparison to the estimated values.
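Two of the calibrated values above have implications that are easy to check by hand. The sketch below assumes the standard textbook relations, namely that a quarterly discount factor β implies an annual real rate of (1/β)^4 − 1 and that a Calvo non-adjustment probability θ implies an expected price duration of 1/(1 − θ) quarters; the function names are hypothetical.

```python
def annual_real_rate(beta: float) -> float:
    """Steady-state annual real interest rate implied by a quarterly discount factor."""
    return (1.0 / beta) ** 4 - 1.0


def calvo_duration(theta: float) -> float:
    """Expected number of quarters between price changes when a share theta does not adjust."""
    return 1.0 / (1.0 - theta)


print(round(100 * annual_real_rate(0.99), 2))  # about 4.1 (% per year)
print(round(calvo_duration(0.66), 2))          # about 2.94 quarters (US starting value)
print(round(calvo_duration(0.9), 2))           # 10.0 quarters (EA starting value)
```

The first two figures agree with the 4% annual rate and the three-quarter price duration quoted in the text; the third shows that the euro area starting value implies much stickier prices.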

| The models' performance
In Table 7, we report the test results for the two models according to indirect inference. Not surprisingly, the calibrated models are severely rejected. However, after re-estimation, the UIP model can jointly match the behaviour of the two outputs with a t-statistic of 1.4 and a Wald percentile of 92.6, thus a p-value of 0.074. This result is in line with the empirical finding of Le, Meenagh, Minford, and Ou (2013) that a large UIP-based world model of the US and the EA, essentially following the full Smets-Wouters specification in both continents, matched a VAR using the subset of the two outputs. What is an entirely new finding is that a model with risk-pooling, a stronger hypothesis than UIP, will also jointly match the same behaviour. Furthermore, it does so with a considerably higher probability, with a t-statistic of 0.8 and a Wald percentile of 80.4, thus a p-value of 0.196, more than two and a half times that of UIP.
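The link between the reported Wald percentiles and p-values is simple arithmetic: the p-value is one minus the percentile expressed as a fraction. The sketch below reproduces the two figures quoted above; the function name is hypothetical.

```python
def p_value_from_percentile(wald_percentile: float) -> float:
    """Convert a Wald percentile (0-100) into the corresponding p-value."""
    return 1.0 - wald_percentile / 100.0


p_uip = p_value_from_percentile(92.6)
p_rp = p_value_from_percentile(80.4)

print(round(p_uip, 3))         # 0.074
print(round(p_rp, 3))          # 0.196
print(round(p_rp / p_uip, 2))  # 2.65
```

Both models therefore clear the 5% threshold, with the risk-pooling model doing so by the wider margin.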

| How accurate are these estimates? robustness considerations
The error in the R-P equation is shown in Figure 4 (as detailed in the next section): it is created by the consumption preference errors, observed here as the residual of the R-P equation. Consumption itself is substituted out of the model into the IS curve for total demand. This error process is only observed from the R-P equation itself, which is imposed in the model; to derive it, we assume, as in the model, CRRA consumption preferences, which give the additive error, $v_t$, and the parameter $1/\sigma$ on $q_t$ in the equation. CRRA utility is standard across these macro models, partly because this utility form ensures a balanced growth path and partly because it is generally found to fit the consumption data, and $\sigma$ is the estimated parameter of risk-aversion. Given that the model as a whole is estimated as the most probable based on data behaviour, what we can say about robustness (that is, "how confident can we be about the truth of the model?") emerges from our Monte Carlo experiment on power. This experiment asks how far the model parameters must be from the true ones for the model to be rejected 100% of the time by our test. We establish through this that the true model parameters, including consumer risk-aversion, cannot in general lie more than 5% from the estimated ones, since the latter passed our test. We also know from other work on similar macro models (Le, Meenagh, Minford, Wickens, & Xu, 2016; Meenagh et al., 2018) that model mis-specification is rejected 100% of the time; so we can be confident that entirely different specifications (including of consumption utility) cannot be correct. To put these results another way, we can give assurance, and be robust in our belief, that the true model, including in its consumption aspects, lies fairly close to the model and parameters we have estimated.
Notice that here we have not, as is often done with robustness tests, tested many different variants of the model for whether they would pass our test, since we are rather confident they would not, due to our having estimated the model tightly via indirect inference, a procedure with very low small sample bias (Meenagh et al., 2018).

| Previous empirical work on multi-country modelling
The principal other works on multi-country models (usually consisting of the US, the EA [the euro-zone], and the RoW) of which we are aware are Chari et al. (2002), reviewed in detail by Le et al. (2010), and Kollmann et al. (2016). For testing procedures, Chari et al. (2002) used an informal moments-matching method to examine a calibrated model of the US-EA world economy, while Le et al. used formal indirect inference methods to estimate and test their model against moments and VAR coefficients, which Meenagh et al. (2018) show are equivalent methods. Kollmann et al. (2016) by contrast use Bayesian estimation, without any formal overall model test.
All these models assume UIP, though Chari et al. also look at an R-P specification. All of them share a New Keynesian specification, where they mainly differ in how the models are calibrated/estimated and tested (or not).
What emerges from these different approaches is that the informal testing method of Chari et al. does not give us a statistically based test. Le et al. discuss this point in detail, showing that an estimated multi-country model similar to that of Chari et al. is rejected formally when several groups of variables are selected to be matched by the model. This can however be explained by the Monte Carlo experiment as in our paper here. This shows that with three or more variables being chosen for matching of either their VAR coefficients or equivalently their moments, the test power is excessive, such that rejection will certainly occur at very small levels of inaccuracy, implying that reasonable models will be universally rejected. It is necessary to use only two variables-here we use the two countries' outputs only-to get high but not excessive power to test these multi-country models. As can be seen the model here passes this rather powerful test.
Comparison of our model parameters with those from other models is complicated by the fact that they have different structures, as well as being largely calibrated, rather than estimated by indirect inference as ours are. The exception is Le et al., which used indirect inference, like us, to estimate their model; however, comparison here is impossible because Le et al. used a full structural model as against our three-equation small open-economy model. For the other authors' models, in general one can say that in our modelling we initiated estimation generally from calibrated values for parameters that we felt followed the existing consensus-that is, we started from broadly similar values to other authors. However the final estimates are entirely dictated by estimation, whereas of these other authors, for example Chari et al. used calibration throughout and Kollmann et al. used Bayesian methods that allow the initial prior (calibrated) parameters to dominate the posterior estimates. Thus our parameter estimates are largely not comparable to those of these other authors, being determined by the data solely, whereas others' are effectively calibrated.
Because these other modelling approaches do not apply the statistical tests we apply here to our model, they cannot establish the validity of UIP or R-P when embedded in their multi-country model. Our paper tests these hypotheses via indirect inference in a full multi-country model; it should be noted that we have updated the tests of Le et al. by using strictly a two-variable auxiliary model, which has ideal power as explained in our work. As noted above, it is the first time the R-P hypothesis has been tested in this way.

| The shock processes
We can extract the structural shocks of both models from the unfiltered data and fit each of them to a time-series process to check their properties. (For productivity of the two countries, we simply use the potential output data that we extracted from the time series of outputs using the HP filter.) We plot these processes in Figures 3 and 4. For each of them, we test their stationarity using the ADF test. Table 8 shows the two productivity processes are I(1) processes, which supports our specification for them (as in Equation 25). The two demand shocks and the two monetary policy shocks are trend stationary, while the two supply shocks are stationary.
We fit all the shock series to an AR(1) process (while the two productivities are kept as ARIMA(1,1,0) processes, as assumed). We estimate the persistence of all these processes using the Limited Information Maximum Likelihood method (McCallum, 1976; Wickens, 1982) and report the estimates in the same table.
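To make the persistence estimates concrete, the sketch below fits an AR(1) coefficient to a stationary series, and to the first difference of an I(1) series as an ARIMA(1,1,0) specification would require. It uses ordinary least squares as a simple stand-in and is not the Limited Information Maximum Likelihood estimator used in the paper. The simulated series, their parameters (0.7 and 0.3) and the function names are hypothetical and serve only to show the mechanics.

```python
import random


def ar1_persistence(series):
    """OLS slope from regressing x_t on x_{t-1} with an intercept."""
    lagged, current = series[:-1], series[1:]
    n = len(lagged)
    mean_lag = sum(lagged) / n
    mean_cur = sum(current) / n
    cov = sum((a - mean_lag) * (b - mean_cur) for a, b in zip(lagged, current))
    var = sum((a - mean_lag) ** 2 for a in lagged)
    return cov / var


def first_difference(series):
    """Return x_t - x_{t-1}, used before fitting an ARIMA(1,1,0)."""
    return [b - a for a, b in zip(series[:-1], series[1:])]


# Hypothetical data: a stationary AR(1) shock and an I(1) productivity level
# whose growth rate follows an AR(1).
rng = random.Random(1)
shock, growth, level = 0.0, 0.0, 0.0
stationary, productivity = [], []
for _ in range(2000):
    shock = 0.7 * shock + rng.gauss(0.0, 1.0)
    stationary.append(shock)
    growth = 0.3 * growth + rng.gauss(0.0, 1.0)
    level += growth
    productivity.append(level)

print(ar1_persistence(stationary))                      # close to 0.7
print(ar1_persistence(first_difference(productivity)))  # close to 0.3
```

Differencing the productivity series first matters: fitting an AR(1) directly to the level of an I(1) process would return a coefficient close to one and say little about the persistence of its growth rate.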

| Impulse response functions

| The standard UIP model
What emerges from our results for the standard UIP model is that the two continents are essentially self-contained. Spillovers on real variables are small, as can be seen from the UIP variance decomposition over any length of period (as detailed below in Table 9, and Tables D1 and D3 in Appendix D). They are frustrated partly by movements in interest rates and in the real exchange rate. Both central banks respond to the real exchange rate with interest rate changes, while the real exchange rate in turn responds to real interest rate differentials. This pattern conforms to a standard model of the open economy under floating, where the floating exchange rate allows interest rate movements in each country to dampen its own shocks as well as any spillovers from shocks abroad. Central banks in effect control cross-continent integration by pursuing their own objectives and forcing the exchange rate to adjust. Thus home shocks dominate the home economy real variables; foreign shocks are largely neutralized.
This pattern can be seen in the IRFs for individual shocks, where in all cases spillovers to foreign output and consumption are small. Thus for example a US demand shock (see Figure 5) raises US output on impact by 1.4%, raises EA inflation and interest rates and so lowers EA output by 0.2%, while an EA demand shock (Figure 6) raises EA output by 1.5%, raises US real interest rates and so lowers US output by 0.4%.

| The risk-pooling model
The risk-pooling model in effect opens up a direct channel of insurance between consumers in different continents, removing power from central banks to separate the economies. One can think of this risk-pooling mechanism as enabling foreign consumers to transfer resources directly to home consumers hit by a downturn; these resources are then spent by home consumers who thereby bid for foreign supplies, their own being short. This raises the relative price of foreign supplies, causing a real depreciation in the home exchange rate. What matters here for the real exchange rate reaction is the elasticity of foreign supply, which in this New Keynesian model is dictated by the Calvo stickiness parameter. The estimated stickiness parameters of the US and EA are similar enough to imply that EA and US output supply are similarly inelastic.
The risk-pooling model therefore creates much greater integration of the two economies. Both US and EA shocks now spill over into the other continent. Take the demand shock as an example. The US demand shock raises US output by 1.4% and EA output by 0.6%, while the US real exchange rate depreciates by nearly 2%. Central banks react to the home effects of the home shocks in a familiar way, in this case raising interest rates; but the foreign central banks, while reacting normally to the spillovers, react mainly to the sharp real exchange rate movements which push them in a direction opposite to the familiar one. Thus on the US demand shock EA interest rates fall in the attempt to dampen the real appreciation of the euro, while on the EA demand shock US interest rates fall to dampen the real appreciation of the dollar. In effect, central banks are being forced to help the spillover process by dampening the real exchange rate reaction coming from supply inelasticity. Hence whereas central banks largely frustrate the ability of consumers to profit from spillovers under UIP, under risk-pooling consumers make free use of spillovers and force central banks to help the process along.
Figure 11. Historical decomposition for US output (risk-pooling)
Figure 12. Historical decomposition for EA output (risk-pooling)
To save space we report the impulse responses of the other shocks in Appendix C (Figures C1-C7).

| Variance decomposition
If we now turn to the variance decomposition of the two models, we find that the real spillovers under risk-pooling are substantially larger than under UIP, as emerged from our impulse response functions. Taking the short run (the two-year case) as an example, when there are demand shocks under UIP the variance share of the output spillover is 0.7% of that of the home output for the US demand shock and 7% for the EA demand shock (see Table 9). The corresponding percentages under risk-pooling are 6% and 35% (Table 10).
In Appendix D we also report the decompositions for longer horizons (10 years and 40 years). As one would expect, the longer the time horizon, the more the variances are dominated by productivity shocks. Here too we find that there are spillovers under risk-pooling, but none to speak of under UIP (Tables D1-D4).

| Historical decomposition
When we compare the historical decompositions of the two models, we see that the striking feature of the UIP model is the lack of spillovers (all in yellow) into the output of either continent from the other (Figures 7 and 8). We also see that interest rates in each continent are a key instrument by which these spillovers are frustrated, since the other continent's shocks (in yellow) bulk large in each continent's monetary responses (Figures 9 and 10). When we turn to the case of the risk-pooling model, we see larger output spillovers (again in yellow) in both directions (Figures 11 and 12). As for interest rates, again we see how in each continent interest rates respond to foreign shocks (Figures 13 and 14); here, because of their effects on the exchange rate, the response pattern is quite different.

| CONCLUSION
In this paper, our first aim was to find a world model of two continents plus a Rest of World sector that could match selected data in a powerful indirect inference test. Our second aim was to discover whether there was risk-pooling in such a model. We did find such a model, both in a UIP version in which there was "dynamic" risk-pooling from non-contingent bonds tradeable across borders, and in a version with full risk-pooling provided by state-contingent bonds, a stronger hypothesis than UIP which implies it, but is not implied by it. Of these two versions, the risk-pooling one was considerably more probable than the UIP version, but both passed our tests.
In the UIP version of this model, we found rather familiar features: each continent, US and EA, responds almost entirely to its own non-stationary productivity shocks, with stationary demand/supply shocks having limited spillover effects. Each economy is largely insulated from the other by the floating exchange rate, with monetary policy largely unresponsive to it. By contrast, in the risk-pooling version, behaviour turned out to be materially different. The two continents were in this version closely integrated by private insurance markets achieved through contingent assets. Shocks in one continent cause consumers to acquire resources in the other where they spend them, driving up relative prices (the real exchange rate) to generate supply. These real exchange rate movements then force monetary policy to lean against them and so boost these spillovers.
Previous statistical tests of both UIP and risk-pooling have used single-equation methods, which we explain are likely to reject the model spuriously; and we confirm this from Monte Carlo experiments. This is to our knowledge the first time that a powerful statistical test has been performed on a full world model embodying both these hypotheses with their distinctive effects on the behaviour of all variables. The fact that the full risk-pooling hypothesis has passed this test with a high p-value suggests that it deserves serious attention from policy-makers looking for a relevant model with which to discuss international monetary and other business cycle policy.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are openly available in:
1: FRED Economic Data at https://fred.stlouisfed.org/
2: The AWM database at https://eabcn.org/page/area-wide-model

ORCID
Zhirong Ou https://orcid.org/0000-0002-4610-7183

ENDNOTES
1 We assume, for simplicity, that imports from the RoW are affected only by domestic income.
2 In recent work on indirect inference in small samples, it has been found that the test power tends to rise with the number of variables in the auxiliary VAR, as we find here. However, the test power is rather insensitive to which variables are included in the auxiliary VAR; thus here we would expect similar test power with any other two variables, such as the two consumptions, or the two interest rates. The test power is also fairly insensitive to whether one uses a VAR for the two variables or a set of moments or a set of impulse response functions (IRFs), provided the number of each in the auxiliary model is similar. Thus a two-variable VAR(1) implies four VAR coefficients plus the two VAR error variances, six "descriptors" in all. Around the same number of moments or IRFs should be selected for similar power.
3 This is found by imposing the long-run restriction of trade balance (thus, nx_t = 0) on the US net export equation and solving for the real exchange rate.

APPENDIX C: Other impulse response functions