Bayesian state-space modeling for analyzing heterogeneous network effects of US monetary policy

Understanding disaggregate channels in the transmission of monetary policy to the real and financial sectors is of crucial importance for effectively implementing policy measures. We extend the empirical econometric literature on the role of production networks in the propagation of shocks along two dimensions. First, we set forth a Bayesian spatial panel state-space model that assumes time variation in the spatial dependence parameter, and apply the framework to measure network effects of US monetary policy at the industry level. Second, we account for cross-sectional heterogeneity and cluster impacts of monetary policy shocks to production industries via a sparse finite Gaussian mixture model. The results suggest substantial heterogeneities in the responses of industries to surprise monetary policy shocks. Moreover, we find that the role of network effects varies strongly over time. In particular, US recessions tend to coincide with periods where between 40 and 60 percent of the overall effects can be attributed to network effects; expansionary economic episodes show muted network effects with magnitudes of roughly 20 to 30 percent.


INTRODUCTION
A growing number of papers explore how shocks at the micro and macro level propagate through economic networks and how such shocks relate to aggregate fluctuations. Most articles provide substantial evidence for the importance of network effects (see, for instance, Gabaix, 2011; Acemoglu et al., 2012; Elliott et al., 2014; Acemoglu et al., 2015; Ozdagli and Weber, 2017). Recent empirical analyses, however, suffer from a set of shortcomings: they mainly rely on constant parameter specifications and either focus on aggregate data or neglect heterogeneities among cross-sectional observations.
In this article, we address these issues and extend the literature on spatial panel data models (see, for instance, Elhorst, 2014; Aquaro et al., 2015; LeSage and Chih, 2016), linking them to the vast literature on Bayesian state-space modeling (see Kim and Nelson, 1999). As a topical application, we focus on the transmission of monetary policy shocks through the US production network. Our approach is closely related to Ozdagli and Weber (2017), who generalize the setup proposed in Bernanke and Kuttner (2005) and Gürkaynak et al. (2005) for analyzing the impact of changes in monetary policy on equity prices. While Bernanke and Kuttner (2005) and Gürkaynak et al. (2005) focus mostly on aggregate data like the S&P 500 index, Ozdagli and Weber (2017) find substantial evidence for higher-order effects of monetary policy on stock market returns using disaggregate data at the industry level, and attribute between 60 and 80 percent of the total effects to spillovers between industries.
In the empirical application, however, Ozdagli and Weber (2017) neglect industry-specific idiosyncrasies and disregard time variation in the strength of network dependencies. This is problematic for two reasons. First, pooling information across industries may conceal important underlying structural relationships, and potentially distorts the estimated importance of some industries relative to others in the disaggregate transmission of monetary policy shocks (see also Bernanke and Kuttner, 2005). Second, structural breaks in macroeconomic and financial series are increasingly drawing interest in the related literature (see, for instance, Cogley and Sargent, 2005; Primiceri, 2005; Sims and Zha, 2006). Time-varying parameter models are a popular tool for alleviating concerns of misspecification arising from nonlinear dynamics in small-scale models (Feldkircher et al., 2017).
To address these empirical shortcomings and circumvent concerns of biases, we develop a Bayesian state-space model for analyzing network effects of US monetary policy, allowing for heterogeneity both over time and across the cross-section. Our contributions are thus both methodological and empirical. First, we assume the spatial dependence parameter to vary over time by imposing a random walk state equation, and provide Bayesian prior and posterior distributions alongside a sampling algorithm for inference. Second, we address how to efficiently exploit cross-sectional information for obtaining precise inference, but allow for heterogeneous effects across units in a stochastic fashion. Most of the established spatial methods for panel data analysis rely on deterministic data transformations such as fixed effects. By contrast, we take a fully Bayesian stance by imposing a hierarchical shrinkage prior on the regression coefficients and the residual variances. In particular, we base our prior setup on sparse finite Gaussian mixtures (see Malsiner-Walli et al., 2016), providing a link to the literature on random coefficient and heterogeneity models (Verbeke and Lesaffre, 1996; Allenby et al., 1998; Frühwirth-Schnatter et al., 2004).
From an empirical perspective, two main findings are worth noting. First, there is substantial evidence pointing towards the necessity of addressing industry-specific reactions to monetary surprises. Estimated effects are much smaller when disregarding the notion that some industries are more sensitive to Fed policy changes than others, by a magnitude of up to roughly one percentage point. In particular, we find that a one percentage point surprise increase in federal funds futures translates to a median stock market response across all industries of approximately 1.9 percentage points, with industry-specific estimates of up to five percentage points. Second, we find substantial evidence for time variation in the strength of network dependency structures. In particular, US recessions tend to coincide with periods where between 40 and 60 percent of the overall effects can be attributed to network effects; expansionary economic episodes show muted network effects with magnitudes of around 20 to 30 percent.
The remainder of the paper is structured as follows. In Section 2, we set forth the spatial panel model. Section 3 discusses the Bayesian prior setup and provides a Gibbs sampling algorithm for inference. We apply the model in a study of the network effects of US monetary policy in Section 5. Section 6 concludes.

A TIME-VARYING SPATIAL DEPENDENCE PANEL MODEL
The baseline model is in the spirit of spatial panel specifications and can be written for observation $i = 1, \dots, N$ as

$$y_{it} = \rho_t \sum_{j=1}^{N} w_{ij} y_{jt} + x_{it}' \beta_i + \varepsilon_{it}, \quad \varepsilon_{it} \sim N(0, \sigma_i^2), \qquad (1)$$

where $y_{it}$ is the response variable at time $t = 1, \dots, T$. We include $K$ exogenous covariates in the $K \times 1$ vector $x_{it}$ with associated observation-specific parameter vector $\beta_i$ of size $K \times 1$, and $\varepsilon_{it}$ is a Gaussian error term with zero mean and variance $\sigma_i^2$. Information on the cross-sectional dependency structure is incorporated using weighted averages of the "foreign" quantities $y_{jt}$ ($j = 1, \dots, N$), with exogenous weights $w_{ij}$ denoting the elements of an $N \times N$ weighting matrix $W$ subject to the typical restrictions $w_{ii} = 0$, $w_{ij} \geq 0$ and $\sum_{j=1}^{N} w_{ij} = 1$ that guarantee the stability of the model. The first novelty proposed in this paper is that the scalar parameter $\rho_t$ features time variation. The state equation for the spatial dependence parameter $\rho_t$ is a random walk process:

$$\rho_t = \rho_{t-1} + \nu_t, \quad \nu_t \sim N(0, \varsigma^2). \qquad (2)$$

Established econometric methods typically rely on constant spatial dependency structures. Note that $\varsigma = 0$ implies that the model collapses to a constant parameter model, and testing whether the data suggest time-varying spatial dependence is thus a variance selection problem, as discussed in the context of a standard time-varying parameter model by Frühwirth-Schnatter and Wagner (2010). We exploit this fact below by imposing a suitable shrinkage prior on this variance that pushes the model towards the constant parameter specification if suggested by likelihood information.
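To make the data-generating process concrete, the following is a minimal simulation sketch of the observation equation and the random walk state equation, using the reduced form y_t = (I_N − ρ_t W)^{-1}(X_t β + ε_t). The function names `row_normalize` and `simulate_tv_sar`, the array layout, and the clipping of ρ_t to (−0.95, 0.95) are illustrative assumptions and not part of the model described above.

```python
import numpy as np

def row_normalize(A):
    """Build a weighting matrix with zero diagonal and rows summing to one."""
    A = np.asarray(A, dtype=float).copy()
    np.fill_diagonal(A, 0.0)                      # w_ii = 0
    return A / A.sum(axis=1, keepdims=True)       # sum_j w_ij = 1

def simulate_tv_sar(W, X, beta, sigma, rho0, varsigma, rng):
    """Simulate y_t = rho_t W y_t + X_t beta + eps_t with a random-walk rho_t.

    X has shape (T, N, K), beta has shape (N, K), sigma has shape (N,).
    """
    T, N, _ = X.shape
    rho = np.empty(T)
    y = np.empty((T, N))
    prev = rho0
    for t in range(T):
        # State equation: rho_t = rho_{t-1} + nu_t, nu_t ~ N(0, varsigma^2).
        # The clipping is an illustrative safeguard (assumption), not in the model.
        prev = float(np.clip(prev + varsigma * rng.standard_normal(), -0.95, 0.95))
        rho[t] = prev
        mean = np.einsum("nk,nk->n", X[t], beta)  # x_it' beta_i for each unit i
        eps = sigma * rng.standard_normal(N)
        # Reduced form: y_t = (I - rho_t W)^{-1} (X_t beta + eps_t)
        y[t] = np.linalg.solve(np.eye(N) - prev * W, mean + eps)
    return y, rho

# Hypothetical usage with made-up dimensions and parameter values
rng = np.random.default_rng(42)
W_demo = row_normalize(rng.random((5, 5)))
X_demo = rng.standard_normal((20, 5, 1))
y_demo, rho_demo = simulate_tv_sar(
    W_demo, X_demo, beta=-np.ones((5, 1)), sigma=0.5 * np.ones(5),
    rho0=0.3, varsigma=0.05, rng=rng,
)
```

Setting `varsigma=0.0` reproduces the constant parameter special case discussed in the text.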

Interpreting the model coefficients
The approach to modeling spatial dependence pursued in this paper establishes a large system of simultaneous equations with specific parametric restrictions. Consequently, standard interpretations for linear regressions have to be adapted to account for cross-sectional dependencies. Here, we follow LeSage and Chih (2016) and derive the impact matrix for the $k$th coefficient ($k = 1, \dots, K$) with respect to a change in the $k$th exogenous covariate $x_{kt} = (x_{1kt}, \dots, x_{Nkt})'$ for all cross-sectional units as

$$S_{kt} = (I_N - \rho_t W)^{-1} B_k.$$

Here, $B_k = \mathrm{diag}(\beta_{1k}, \dots, \beta_{Nk})$ with $\beta_{ik}$ referring to the $k$th coefficient of observation $i$. Following LeSage and Pace (2009), it is conventional to define $N^{-1} \mathrm{tr}(S_{kt})$ as the average direct effect, $N^{-1} \iota_N' S_{kt} \iota_N$ as the average total effect, where $\iota_N$ is an $N$-vector of ones, and the difference between the two as the average indirect, or network, effect. Assuming time-varying spatial dependence yields a separate impact matrix $S_{kt}$ for each $t = 1, \dots, T$.
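The effect decomposition above can be sketched in a few lines. This is an illustration under the stated definitions rather than the authors' code; the function names `impact_summaries` and `network_share` are hypothetical.

```python
import numpy as np

def impact_summaries(rho_t, W, beta_k):
    """Average direct, indirect and total effects for one covariate at time t."""
    N = W.shape[0]
    # Impact matrix S_kt = (I_N - rho_t W)^{-1} B_k with B_k = diag(beta_1k, ..., beta_Nk)
    S = np.linalg.solve(np.eye(N) - rho_t * W, np.diag(beta_k))
    direct = np.trace(S) / N        # average of the diagonal elements
    total = S.sum() / N             # iota' S iota / N
    indirect = total - direct       # network effect
    return direct, indirect, total

def network_share(rho_t, W, beta_k):
    """Share of the average total effect attributable to network effects."""
    _, indirect, total = impact_summaries(rho_t, W, beta_k)
    return indirect / total
```

Evaluating `network_share` for every posterior draw of ρ_t and β would give a posterior distribution of the network share at each point in time. With ρ_t = 0 the impact matrix reduces to B_k, so the indirect effect is zero and the total effect equals the average regression coefficient.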
It is worth mentioning that the Bayesian approach we set forth allows for adequate quantification of uncertainty surrounding all model parameters and functions thereof. Besides full posterior distributions of the total, direct and indirect effects, we obtain credible bounds for the overall strength of the network effects over time.

PRIOR SPECIFICATION
We estimate the proposed model using Bayesian methods. This involves selecting suitable prior distributions for all parameters and combining them with the likelihood of the data. We first discuss the prior setup for the time-varying spatial dependence parameter. Conditional on a draw of the full history of this parameter $\{\rho_t\}_{t=1}^{T}$, inference for the other model parameters is mostly standard, and we subsequently discuss the prior setup for the regression coefficients and the error variances.

A spike-and-slab prior testing for time variation
For the spatial dependence parameter, we propose a prior setup that stochastically determines whether time variation is required for adequately capturing observed dynamics, nesting conventional constant parameter specifications as a special case of our framework. We adapt a variant of the well-known stochastic search variable selection prior of George and McCulloch (1993) to the context of the time-varying parameter model (for a related approach, see Frühwirth-Schnatter and Wagner, 2010).
We impose a mixture of two gamma priors on the innovation variance, with one component placing substantial mass close to zero, suppressing time variation, and the other loose enough to allow for time-varying spatial dependence. In particular, we specify a mixture of two Gaussians on the signed square root of the innovation variance in Eq. (2),

$$\pm\sqrt{\varsigma^2} \mid \delta \sim \delta \, N(0, B_1) + (1 - \delta) \, N(0, B_0).$$

The latent binary indicator $\delta$ dictates which one of the two components is active. Given $\delta = 1$, the prior on $\varsigma^2$ is rather loose based on larger values of $B_1$ and reflects time variation in the spatial dependence parameter by allowing for non-zero variances of the error term in the state equation. For $\delta = 0$, the second component with variance $B_0$ close to zero is active, pushing the signed square root of the innovation variance towards zero, effectively ruling out time variation.
As a byproduct, this specification yields a posterior probability measure of whether time variation in this parameter is necessary to adequately reflect the data generating process. The binary indicator $\delta$ is assigned a Bernoulli distribution $\delta \sim \mathrm{BER}(p)$ with prior inclusion probability $p = 0.5$. This establishes a prior that assumes constant and time-varying spatial dependence to be equally likely.
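To illustrate how the two components behave, the sketch below draws the innovation variance from the spike-and-slab prior by sampling the signed square root from the active Gaussian component and squaring it. The function name `draw_innovation_variance` and the values chosen for B_0 and B_1 are hypothetical and serve only as an example; the text does not report the values used.

```python
import numpy as np

def draw_innovation_variance(B0, B1, p, size, rng):
    """Prior draws of varsigma^2 under the two-component Gaussian mixture."""
    delta = rng.random(size) < p                        # delta ~ BER(p)
    scale = np.where(delta, np.sqrt(B1), np.sqrt(B0))   # std. dev. of the active component
    signed_root = scale * rng.standard_normal(size)     # +/- sqrt(varsigma^2) ~ N(0, B_delta)
    return signed_root**2, delta

# Hypothetical hyperparameters: a tight spike (B0) and a loose slab (B1)
rng = np.random.default_rng(1)
variances, delta = draw_innovation_variance(B0=1e-4, B1=0.1, p=0.5, size=10_000, rng=rng)
spike_mean = variances[~delta].mean()   # concentrates near B0
slab_mean = variances[delta].mean()     # concentrates near B1
```

Because the squared draw of a zero-mean Gaussian with variance B has expectation B, the spike component keeps the innovation variance close to B_0 while the slab allows values on the order of B_1.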

Sparse finite mixtures to pool coefficients
There are multiple possibilities to estimate observation-specific parameters $\theta_i$ for $i = 1, \dots, N$, with two extreme cases: either one decides to pool information over the cross-section, restricting $\theta_i$ to be equal for all units, or one introduces truly observation-specific parameters. The first variant, especially in the empirical application of this paper, is likely to be overly restrictive and may mask important structural dynamics. The second variant, however, implies estimating a large number of parameters, and may thus result in imprecise estimates and overfitting issues.
In this paper, we allow for heterogeneous parameters per unit $i$, but introduce a hierarchical prior that exploits cross-sectional information for more precise inference and pushes clusters of similar observations towards estimated cluster-specific common means. We follow Malsiner-Walli et al. (2016) and introduce a sparse finite mixture of Gaussians prior for the observation-specific regression coefficients $\beta_i$, resembling a random effects specification (Verbeke and Lesaffre, 1996; Allenby et al., 1998; Frühwirth-Schnatter et al., 2004). The prior is given by

$$p(\beta_i) = \sum_{m=1}^{M} \omega_m f_N(\beta_i \mid \mu_m, V), \qquad (3)$$

where $f_N$ denotes the Gaussian probability density function, $\{\omega_m\}_{m=1}^{M}$ are mixture weights and $\{\mu_m\}_{m=1}^{M}$ refer to group-specific means for a pre-determined number of $M$ clusters. By introducing an auxiliary variable $\eta_i$, Eq. (3) can be rewritten as

$$\beta_i \mid \eta_i = m \sim N(\mu_m, V),$$

with $\eta_i = m$ denoting an integer indicating that $\beta_i$ belongs to the $m$th cluster. Here, $V = \mathrm{diag}(v_1, \dots, v_K)$ denotes a common $K$-dimensional diagonal prior covariance matrix. We select independent inverse gamma priors for the diagonal elements $v_k$ ($k = 1, \dots, K$) of the prior covariance matrix, and we specify a shrinkage prior on the mixture component weights. To achieve further parsimony, we combine the shrinkage prior on the weights with shrinkage on the mixture-specific means (Yau and Holmes, 2011; Malsiner-Walli et al., 2016):

$$\mu_m \sim N(\mu_0, V_0),$$

with $\mu_0$ referring to a common mean and $V_0 = LRL$ denoting the prior covariance matrix. Here, $L$ is a diagonal matrix governed by the coefficient-specific shrinkage parameters $l_j$ for $j = 1, \dots, K$, and $R = \mathrm{diag}(R_1^2, \dots, R_K^2)$ refers to a $K$-dimensional diagonal matrix with the $j$th element $R_j^2$ given by the range of $(\mu_{1j}, \dots, \mu_{Mj})$. In what follows, we specify a normal gamma shrinkage prior (Griffin and Brown, 2010) on $\mu_m$, assuming that $l_j$ ($j = 1, \dots, K$) is gamma distributed, $l_j \sim G(e_0, e_1)$.
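The role of the auxiliary indicator can be illustrated with a short sketch of the allocation probabilities implied by the mixture prior alone, Pr(η_i = m | β_i) ∝ ω_m f_N(β_i | μ_m, V). This is a simplification: a full sampler would typically also account for the cluster-specific error variances in the likelihood, and the function name `allocation_probs` is hypothetical.

```python
import numpy as np

def allocation_probs(beta_i, omega, mu, v):
    """Pr(eta_i = m | beta_i) implied by the mixture prior with V = diag(v).

    beta_i: (K,) coefficients of unit i; omega: (M,) strictly positive weights;
    mu: (M, K) cluster means; v: (K,) diagonal of the common covariance V.
    """
    # Log Gaussian kernel with diagonal covariance; terms common to all m cancel.
    log_dens = -0.5 * np.sum((beta_i - mu) ** 2 / v, axis=1)
    log_post = np.log(omega) + log_dens
    log_post -= log_post.max()          # stabilise before exponentiating
    probs = np.exp(log_post)
    return probs / probs.sum()

# Hypothetical example with M = 2 clusters and K = 2 coefficients
probs = allocation_probs(
    beta_i=np.array([-1.2, 0.1]),
    omega=np.array([0.7, 0.3]),
    mu=np.array([[-1.0, 0.0], [-3.0, 0.0]]),
    v=np.array([0.5, 0.5]),
)
```

When all cluster means coincide, the probabilities collapse to the mixture weights, which is how shrinking the means towards a common value renders superfluous clusters uninformative.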
In the empirical application, we specify $e_0 = e_1 = 0.1$ (see footnote 5). To complete the setup for the regression coefficients, we specify an improper Gaussian prior on the common mean, $\mu_0 \sim N(0, Q)$, centered on zero and with precision $Q^{-1} = 0_K$.

So far, we have remained silent on how we specify the priors for the unit-specific error variances $\sigma_i^2$. Here, we choose to cluster variances for the $M$ groups and use a conjugate inverse gamma hierarchical prior (Frühwirth-Schnatter, 2006), with hyperparameters specified as $\xi = 2.5 + (T - 1)/2$, $\psi = 0.5 + (T - 1)/2$ and $\Psi = 100\psi/(\xi R_y^2)$, where $R_y$ denotes the range of the dependent variable. The hierarchical structure again implies that the group variances $\sigma_m^2$ arise from a common distribution (Malsiner-Walli et al., 2016). The cluster-specific variance $\sigma_m^2$ is assigned to all observations $i$ that are associated with the $m$th cluster.

Producing draws for the full history of the time-varying spatial dependence parameter, however, is novel to the literature. In the following, we propose a sampling algorithm for the time-varying spatial dependence parameter. Due to the non-Gaussian setup, Kalman-filter based methods (Carter and Kohn, 1994; Frühwirth-Schnatter, 1994) are inapplicable. Simulation from the posterior distribution can be carried out using a Metropolis-Hastings algorithm. We index the current state of the respective quantity by the superscript $(s-1)$, while $(s)$ refers to a proposal from the candidate density. The procedure is similar to the algorithm proposed in the context of Bayesian stochastic volatility models in Jacquier et al. (2002). We rely on three proposal densities:

1. For all points in time other than the first and last observation, a draw $\rho_t^{(s)}$ is generated from the proposal distribution $\rho_t^{(s)} \sim N(\mu_t, S_t)$ with $\mu_t = (\rho_{t-1}^{(s)} + \rho_{t+1}^{(s-1)})/2$ and $S_t = \varsigma^2/2$.

2. Since no initial value $\rho_0$ is available, we rely on Jacquier et al. (2002), who show that this quantity can be obtained by drawing from a Gaussian distribution $\rho_0 \sim N(\mu_0, S_0)$.

3. A similar problem arises for the final value at $t = T$, since no $\rho_{T+1}$ is available. Jacquier et al. (2002) suggest drawing from the modified candidate density $\rho_T^{(s)} \sim N(\mu_T, S_T)$ with $\mu_T = \rho_{T-1}^{(s)}$ and $S_T = \varsigma^2$.

5 As suggested by Malsiner-Walli et al. (2016), $e_0 < 1$ is important to strongly push $\mu_m$ towards a common mean to avoid overlapping component-specific densities. This specification contrasts with Yau and Holmes (2011), who choose a Lasso prior on $l_j$ with $e_0 = 1$ (see also Park and Casella, 2008).
6 Identification issues in mixture models arising from label switching may be resolved by implementing a random permutation sampler and ex post clustering of the posterior draws, or by using economic theory to impose restrictions on the component means or variances (see Frühwirth-Schnatter, 2001).

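The three candidate densities listed above can be collected in one helper. This is a minimal sketch under an assumed indexing convention: the array `rho` holds ρ_0, ..., ρ_T at their most recent values, so that entries before t have already been updated in the current sweep. The function names and the arguments `mu0` and `S0` (the moments of the initial-state distribution) are illustrative.

```python
import numpy as np

def proposal_moments(t, rho, varsigma2, mu0, S0):
    """Mean and variance of the Gaussian candidate density for rho_t.

    rho is indexed 0..T, with rho[0] the initial state (assumed convention).
    """
    T = len(rho) - 1
    if t == 0:
        # Initial state: drawn from N(mu0, S0)
        return mu0, S0
    if t == T:
        # Final state: no rho_{T+1} available, condition on rho_{T-1} only
        return rho[T - 1], varsigma2
    # Interior states: average of both neighbours, half the innovation variance
    return 0.5 * (rho[t - 1] + rho[t + 1]), 0.5 * varsigma2

def draw_proposal(t, rho, varsigma2, mu0, S0, rng):
    """Draw a candidate value for rho_t from its proposal density."""
    mean, var = proposal_moments(t, rho, varsigma2, mu0, S0)
    return rng.normal(mean, np.sqrt(var))
```

The interior moments are the conditional mean and variance of a random walk state given its two neighbours, which is why the proposal variance is half the innovation variance.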
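For each point in time, we generate a proposal $\rho_t^{(s)}$ that can be used to calculate the acceptance probability of the Metropolis-Hastings algorithm. To simplify notation, we define $\tilde{y}_t^{(s)} = \rho_t^{(s)} W y_t$ as the vector of spatial lags depending on the current value of $\rho_t^{(s)}$, with typical element $\tilde{y}_{it}^{(s)}$, and set $\tilde{\varepsilon}_{it}^{(s)} = (y_{it} - \tilde{y}_{it}^{(s)} - x_{it}' \beta_i)/\sigma_i$, with $\sigma_i^2$ referring to the clustered error variance assigned to industry $i$. Then the acceptance probability $\zeta$ of the proposal $\rho_t^{(s)}$ implied by the likelihood is

$$\zeta = \min\left\{1, \frac{|I_N - \rho_t^{(s)} W| \exp\left(-\frac{1}{2} \sum_{i=1}^{N} (\tilde{\varepsilon}_{it}^{(s)})^2\right)}{|I_N - \rho_t^{(s-1)} W| \exp\left(-\frac{1}{2} \sum_{i=1}^{N} (\tilde{\varepsilon}_{it}^{(s-1)})^2\right)}\right\}.$$

The candidate draw $\rho_t^{(s)}$ is accepted with probability $\zeta$, while in the opposite case, we retain the previous draw $\rho_t^{(s-1)}$.

After obtaining the full history for $\rho_t$, it is straightforward to simulate the variance $\varsigma^2$ and the latent binary indicator $\delta$. The conditional posterior of $\delta$ is given by

$$\delta = 1 \mid \varsigma^2 \sim \mathrm{BER}\left(u_1/(u_0 + u_1)\right),$$

where $u_1 = p \, f_N(\pm\sqrt{\varsigma^2} \mid 0, B_1)$ and $u_0 = (1 - p) \, f_N(\pm\sqrt{\varsigma^2} \mid 0, B_0)$. Conditional on the latent binary indicator $\delta$ and the full history of the spatial dependence parameter, it can be shown that the conditional posterior distribution of $\varsigma^2$ is a generalized inverse Gaussian distribution, from which the parameter can be drawn directly.

This completes the section on model estimation. We proceed by applying the proposed econometric framework to a study of time-varying effects in the transmission of US monetary policy shocks through the production network.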
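The accept/reject step and the update of the indicator δ can be sketched as follows. The sketch assumes the standard Gaussian spatial autoregressive likelihood for a single period, including the log-determinant of I_N − ρ_t W, and evaluates the inclusion probability by applying Bayes' rule to the two-component prior. All function names are hypothetical, and the code is an illustration rather than the authors' implementation.

```python
import numpy as np

def log_lik_t(rho_t, y_t, mean_t, sigma, W):
    """Period-t log-likelihood kernel as a function of rho_t (other terms cancel)."""
    N = len(y_t)
    sign, logdet = np.linalg.slogdet(np.eye(N) - rho_t * W)
    if sign <= 0:
        return -np.inf                                   # outside the admissible region
    resid = (y_t - rho_t * (W @ y_t) - mean_t) / sigma   # standardised residuals
    return logdet - 0.5 * np.sum(resid**2)

def mh_step(rho_old, rho_prop, y_t, mean_t, sigma, W, rng):
    """Accept the proposal with probability zeta = min(1, likelihood ratio)."""
    ll_prop = log_lik_t(rho_prop, y_t, mean_t, sigma, W)
    if not np.isfinite(ll_prop):
        return rho_old, 0.0
    log_ratio = ll_prop - log_lik_t(rho_old, y_t, mean_t, sigma, W)
    zeta = float(np.exp(min(0.0, log_ratio)))
    accepted = rng.random() < zeta
    return (rho_prop if accepted else rho_old), zeta

def inclusion_prob(varsigma2, B0, B1, p):
    """Pr(delta = 1 | varsigma^2) = u1 / (u0 + u1), computed on the log scale."""
    log_u1 = np.log(p) - 0.5 * np.log(B1) - 0.5 * varsigma2 / B1
    log_u0 = np.log1p(-p) - 0.5 * np.log(B0) - 0.5 * varsigma2 / B0
    return float(np.exp(log_u1 - np.logaddexp(log_u0, log_u1)))
```

Here `mean_t` stands for the vector with elements x_it'β_i, and `sigma` for the vector of error standard deviations σ_i. Working with log-determinants and log-densities avoids numerical underflow when N is large or when B_0 is very small.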

Data and model specification
For the sake of brevity, we only provide a brief overview of the data and refer to Ozdagli and Weber (2017) for more details. The event returns for industries used as dependent variables y it are constructed based on returns for all common stocks trading on the NYSE, Amex or Nasdaq around press releases by the Federal Open Market Committee (FOMC). In particular, the dependent variable is defined as the difference between the last trade observation before and the first observation after the event window.
To establish the cross-sectional dependency structure via the weighting matrix $W$, we follow Ozdagli and Weber (2017). The vector $x_{it}$ in Eq. (1) collapses to a scalar $x_t$, the monetary policy shock, that is common to all $i$, while $\beta_i$ is the associated observation-specific parameter capturing the sensitivity of industry $i$ to the monetary policy shock. Moreover, we include an industry-specific intercept term $\alpha_i$. The information set includes data on FOMC announcements between early 1994 and late 2008, that is, $T = 121$.

Empirical results
First, we assess the importance of allowing for cross-sectional heterogeneities across industries.
For this purpose, we estimate restricted versions of the general model proposed in Eq. (1) reflecting the empirical approaches of Bernanke and Kuttner (2005), Gürkaynak et al. (2005) and Ozdagli and Weber (2017). Second, we provide a discussion of the main findings of this paper resulting from relying on a time-varying spatial dependence specification.
7 Concerns that central bank information shocks accompanying the monetary policy announcement bias the effects caused by pure monetary policy shocks (see Nakamura and Steinsson, 2018; Jarociński and Karadi, 2019) can be neglected for the employed dataset (for details, see Ozdagli and Weber, 2017).

Notes to Table 1: The models refer to Bernanke and Kuttner (2005) and Gürkaynak et al. (2005), abbreviated by BK2005 and GSS2005 respectively, and Ozdagli and Weber (2017). "Homogeneous" refers to pooling information deterministically across industries, while "heterogeneous" indicates industry-specific estimates. For the models featuring heterogeneous coefficients, we take the arithmetic mean over all industries per iteration of the algorithm and report the resulting posterior percentiles.

Table 1 displays the results for restricted versions of our model. In particular, the columns labeled BK2005/GSS2005 correspond to the econometric frameworks of Bernanke and Kuttner (2005) and Gürkaynak et al. (2005), disregarding cross-sectional dependency structures and network effects (that is, $\rho_1 = \dots = \rho_T = 0$). The columns labeled Ozdagli and Weber (2017) feature spatial econometric models without time variation (that is, $\rho_1 = \dots = \rho_T$). A further distinction is provided by estimating the model with homogeneous and heterogeneous coefficients.
For the models featuring heterogeneous coefficients, we take the arithmetic mean over all industries per iteration of the algorithm and report the resulting posterior percentiles (the posterior median, the 1st and 99th percentile), providing a measure of the average impact of monetary policy shocks on heterogeneous industry returns.
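The summary statistic described here can be computed directly from the stored posterior draws. The sketch below assumes the draws of the industry-specific effects are held in an array with one row per iteration of the sampler and one column per industry; the function name `average_effect_percentiles` and this layout are assumptions.

```python
import numpy as np

def average_effect_percentiles(draws, q=(1, 50, 99)):
    """Posterior percentiles of the cross-industry average effect.

    draws: array of shape (S, N), S sampler iterations by N industries.
    """
    per_iteration_mean = draws.mean(axis=1)   # arithmetic mean over industries, per draw
    return np.percentile(per_iteration_mean, q)
```

Averaging within each iteration before taking percentiles preserves the posterior dependence between the industry-specific coefficients, which would be lost if percentiles were computed per industry and then averaged.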
Negative coefficients β imply stock market responses in line with standard economic theory.
Monetary tightening induces a reduction of future expected dividends, and by basic asset pricing theory, higher interest rates increase the discount rate of future dividends, resulting in stock market declines. Considering the first column of Tab. 1, a one percentage point surprise increase of the federal funds rate translates to a decline in stock market returns of about 1.2 percentage points, with the 98 percent credible set ranging from approximately −0.9 to −1.5 percentage points.
This example also serves to illustrate the correspondence between the interpretation of spatial econometric models and standard linear regressions, with the obtained direct, indirect and total effects directly reflecting the regression coefficient due to the assumption of independent and identically distributed error terms. Compared to the findings of Bernanke and Kuttner (2005) and Gürkaynak et al. (2005), our estimates are rather small. Note, however, that the empirical findings are not directly comparable, due to their focus on the aggregate S&P 500 rather than industry-specific returns, and a different sample period. Relaxing the assumption of parameter homogeneity, we find that the effects are roughly twice as large: a one percentage point surprise in the federal funds futures causes stock returns to decline by roughly 2.4 percentage points, on average across industries. We focus on heterogeneities over the cross-sectional dimension in the next section.
Turning to the analysis of network effects, we find that the estimated effects for the spatial econometric specification with pooled coefficients are smaller than those obtained by Ozdagli and Weber (2017), for two reasons. First, our proposed framework directly imputes missing values using Bayesian techniques, and thus accounts for selection bias and allows for adequate uncertainty quantification. Second, in contrast to Ozdagli and Weber (2017), we impose the restriction $w_{ii} = 0$ to guarantee the stability of the model. The estimated total effects for the homogeneous specification are roughly in line with the effects estimated from a non-spatial model. Higher-order spillover dynamics explain roughly 33 percent of the total effects, with the posterior credible set ranging from 22 to 41 percent.
An interesting finding is that when allowing for heterogeneous coefficients across industries, the obtained effects are roughly in line with the non-spatial specification, with negative effects of around 2.4 percentage points, while the estimated share of network effect contributions lies between 0.1 and 9.1 percent, with a posterior median of 1.6 percent. This finding suggests that disregarding heterogeneous effects across industries by pooling coefficients is consequential: the parameter ρ adjusts to reflect idiosyncrasies in direct effects, which biases the estimates of network effects.
Allowing for time-varying spatial dependence and cross-sectional heterogeneity
In the following, we discuss our findings for the full model featuring time-varying spatial dependence and cross-sectional heterogeneities. Given the importance of industry-specific idiosyncrasies identified in Tab. 1, we begin by discussing our findings for the heterogeneous regression coefficients. The mixture model provides substantial support for one common cluster; however, in roughly 25 percent of the draws we find evidence for two clusters. Differences mainly originate from idiosyncrasies in the sensitivity of industries to the monetary policy shocks, while the intercept terms $\alpha_i$ are pushed more strongly towards their common mean. The error variances $\sigma_m^2$ are heavily shrunk towards homogeneity. Turning to the industry-specific impacts of monetary policy surprises captured by $\beta_i$, we find that the median across industries is approximately −1.3 percentage points in response to a one percentage point increase in the instrument. This is roughly in line with our findings for the non-spatial specification featuring homogeneous coefficients in Tab. 1. As discussed in the context of how to interpret spatial models, however, the regression coefficients cannot be interpreted directly. For this purpose, we calculate the total effect per industry, which corresponds to the row sums of $S_{kt}$, depicted in the bottom panel of Fig. 1. For simplicity, we consider the average effect over time and refer to the following paragraphs for information on time variation of the estimates. The median impact across industries and over time of a one percentage point surprise increase in federal funds futures around Fed announcements is about −1.9 percentage points, with half of the industries showing declines in stock returns between −1.3 and −2.4 percentage points.
We find that effects for all industries are negative, with a substantial number exhibiting effects stronger than −2.5 and up to more than −5 percentage points. Considering hypothetical responses to a surprise interest rate hike of 25 basis points, this implies that a number of industries show stock market return declines exceeding 0.75 percent.
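The rescaling from a one percentage point shock to a 25 basis point shock is linear in the estimated total effect, as the short sketch below shows. The function name `scaled_response` is hypothetical; the input values are the effect sizes quoted in the text.

```python
def scaled_response(total_effect_pp, shock_bp=25.0):
    """Return response (in percent) to a surprise of `shock_bp` basis points,
    given the total effect per one percentage point (100 bp) surprise."""
    return total_effect_pp * shock_bp / 100.0

median_industry = scaled_response(-1.9)   # median total effect reported in the text
strong_industry = scaled_response(-5.0)   # upper end of the industry-specific effects
threshold_case = scaled_response(-3.0)    # effect size that maps to a 0.75 percent decline
```

Under this linear scaling, a decline exceeding 0.75 percent after a 25 basis point surprise corresponds to a total effect stronger than −3 percentage points per one percentage point surprise.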
We proceed with discussing our findings for time-varying spatial dependence. Figure 2 shows the evolution of ρ t over time. The solid black line indicates the posterior median, alongside the 98 and 90 percent credible sets in shaded blue. FOMC announcement dates are indicated by the black vertical lines. Inference on the binary indicator dictating time variation shows that likelihood information strongly suggests time-varying spatial dependence, with δ = 1 for all iterations of the sampling algorithm, translating into substantial differences in the parameter over time. It remains to quantify the overall importance of the network effects over time. Figure 3 shows

CLOSING REMARKS
This paper studies the importance of spillover effects in the transmission of monetary policy shocks through the US production network. We propose a novel Bayesian spatial panel state-space model to capture time variation in the magnitude of network effects. Moreover, we address industry-specific heterogeneities via a sparse finite Gaussian mixture prior on the model coefficients. Our results suggest substantial differences in industry responses, and identify recessionary episodes as periods where network effects play a crucial role.