Forecast Uncertainty, Disagreement, and the Linear Pool

The linear pool is the most popular method for combining density forecasts. We analyze the linear pool's implications concerning forecast uncertainty in a new theoretical framework that focuses on the mean and variance of each density forecast to be combined. Our results show that, if the variance predictions of the individual forecasts are unbiased, the well-known 'disagreement' component of the linear pool exacerbates the upward bias of the linear pool's variance prediction. Moreover, we find that disagreement has no predictive content for ex-post forecast uncertainty under conditions which can be empirically relevant. These findings suggest the removal of the disagreement component from the linear pool. The resulting centered linear pool outperforms the linear pool in simulations and in empirical applications to inflation and stock returns.


Introduction
There is a growing recognition that measuring forecast uncertainty matters for economic policy. For example, many central banks have followed the Bank of England's lead in publishing probabilistic forecasts of inflation and related variables; see Franta, Baruník, Horváth, and Šmídková (2014, Table 1). Similarly, Manski (2015) calls for systematic measurement and communication of uncertainty in official statistics. In statistical terms, confronting uncertainty about future developments means issuing density forecasts, rather than traditional point forecasts. An immediate question is how to make 'good' density forecasts. In light of the many available forecasting methods and data sources, it is often a combination of several individual forecasts, rather than a single forecast, that is considered for this purpose.
While various combination methods have been proposed, in a recent comprehensive survey Aastveit, Mitchell, Ravazzolo, and van Dijk (2018, p. 20) argue that "[..] most applications still focus on the linear opinion pool [..]". Given a set of n individual density forecasts f_1, ..., f_n, the linear opinion pool, or simply linear pool (LP), is calculated as f_lp = Σ_{i=1}^n ω_i f_i, where ω_1, ..., ω_n are the combination weights (Stone, 1961). The concept of the LP is, for instance, employed to produce aggregate probability distributions in the Surveys of Professional Forecasters (SPF) conducted by the European Central Bank and the Federal Reserve Bank of Philadelphia.
In the present paper, we analyze the LP's implications concerning forecast uncertainty. For this purpose, we develop a novel mean-variance prediction space framework for the joint distribution of mean forecasts, variance forecasts, and the target variable in terms of their first two moments. This setup allows us to derive several new results. We focus on the LP's 'disagreement' component which quantifies differences between the mean forecasts of the individual densities. While disagreement has received considerable attention as a potential proxy for economic uncertainty (e.g. Dovern, Fritsche, and Slacalek, 2012), its role turns out to be problematic in the context of the LP.
First, we show that if the individual density forecasts are variance-unbiased, the LP's forecast is underconfident (i.e. the LP's variance is too large), and the expected disagreement is one component of this upward bias. This result sharpens the findings by Gneiting and Ranjan (2013) who characterize the LP's underconfidence in terms of its Probability Integral Transform (PIT). Second, under a set of conditions including joint normality, we show that, within the LP, disagreement has no predictive content for squared forecast errors, thereby starkly violating a desideratum of good uncertainty forecasts. Third, we show that choosing combination weights for the LP entails a trade-off since the weights affect both the mean forecast and the variance forecast of the LP. Weights that are optimal for the mean are usually not optimal for the corresponding variance.
The first two results indicate that disagreement harms the LP's variance forecasts, which suggests that a variance specification without disagreement should be considered. We therefore propose the centered linear pool (CLP), a trivial modification of the LP which achieves this goal and alleviates the trade-off between mean-optimal and variance-optimal weights. We illustrate our results and investigate the performance of the CLP in simulations and empirical examples of inflation and stock return forecasts. In both empirical examples, the CLP outperforms the LP. We conjecture that the CLP may be a better starting point than the LP for considering sophisticated specifications of the combination weights as proposed, e.g. by Billio, Casarin, Ravazzolo, and van Dijk (2013) and Del Negro, Hasegawa, and Schorfheide (2016).
The remainder of this paper is structured as follows: Section 2 derives simple yet important properties of an optimal variance forecast. These properties form a benchmark for evaluating any variance forecast, including that of the LP. Section 3 presents a baseline example which motivates our analysis of the LP and previews our main results. Section 4 presents a general result on bias in the LP's variance forecast, which is based on the sole assumption that the individual variance forecasts are unbiased. Section 5 turns to a prediction space framework which prescribes a joint model for mean forecasts, variance forecasts, and the variable to be predicted. We introduce this more specific setup in order to derive more specific results, and to identify determinants of the LP's performance. Sections 6 and 7 contain results of Monte Carlo simulations and empirical applications, and Section 8 concludes.
2 Properties of an optimal variance forecast

As a first step in our analysis, we derive two simple yet crucial properties of an optimal variance forecast. As a measure of forecast accuracy, we consider the Dawid and Sebastiani (1999) scoring rule, which depends only on the mean and variance of a forecast distribution, in line with the focus of our analyses. Specifically, the Dawid-Sebastiani score (DSS) equals the negative logarithmic score of a Gaussian forecast density f_N with mean m and variance v, i.e.

DSS(m, v, y) = (1/2) log(2πv) + (y − m)² / (2v),   (1)
where y denotes the realization of the target variable.^1 Note that a smaller score corresponds to a better forecast. Consider forecasting the parameters m and v of a random variable Y, conditional on some information set I. The expected score is minimized by the choices

m = E[Y | I],   (2)
v = E[S | I],   (3)

where S = (Y − m)² denotes the squared forecast error. Hence the Dawid-Sebastiani score rewards forecast densities f (that need not be Gaussian) with a correctly specified conditional mean forecast m and conditional variance forecast v, where the latter depends on the former. Note that the Dawid-Sebastiani score focuses on the first two moments of f exclusively. As a consequence, a density forecast with misspecified higher-order moments may perform equally well as the correct density forecast, but not strictly better. In the terminology of Gneiting and Raftery (2007), the Dawid-Sebastiani score is a proper but not a strictly proper scoring rule.

^1 In the literature, the equivalent score log(v) + (y − m)²/v is usually used. However, we stick to the variant in Equation (1) for better comparability with the logarithmic score in our Monte Carlo simulations.
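As a quick numerical sketch of the score and its propriety (the distribution and parameter values below are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def dss(m, v, y):
    # Dawid-Sebastiani score: negative log density of a N(m, v)
    # forecast evaluated at the realization y; smaller is better.
    return 0.5 * np.log(2 * np.pi * v) + (y - m) ** 2 / (2 * v)

# Propriety check: for Y ~ N(0, 4), the correct pair (m, v) = (0, 4)
# should beat an overdispersed forecast (0, 8) on average.
rng = np.random.default_rng(0)
y = rng.normal(0.0, 2.0, size=100_000)
score_true = dss(0.0, 4.0, y).mean()
score_wide = dss(0.0, 8.0, y).mean()
```

Since the score depends on the forecast density only through m and v, any density with the correct first two moments attains the same expected score, which is why the rule is proper but not strictly proper.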
In a multi-observation setup, we treat the mean and variance forecasts as random variables M and V. Variation in M and V may be informative (resulting from variation in the conditioning information set I) or not. The optimality condition in Equation (3) implies two properties that serve as desiderata below: first, E[V] = E[S], i.e. the variance forecast matches the squared forecast error in expectation; second, Cov(V, S) = V[E[S | I]], i.e. the variance forecast covaries with the squared error as strongly as the conditioning information permits.

The linear pool's variance forecast: Baseline example
We next provide a simple example which shows that the LP is likely to violate at least one of the above-mentioned desiderata of an optimal variance forecast, and can easily violate both. Consider a variable Y determined by

Y = X_1 + X_2 + U,

where X_1, X_2 and U are jointly Gaussian, X_1 and X_2 each have mean zero, variance σ²_X and correlation ρ, and U is independent of (X_1, X_2) with mean zero and variance σ²_U. Forecaster 1 only observes X_1, and forecaster 2 only observes X_2. Both forecasters aim to predict the distribution of Y and state the correct forecast distribution given their information sets. Each forecaster i ∈ {1, 2} thus issues a Gaussian forecast density with mean M_i and variance V_i. Table 1 lists the formulas for M_i and V_i, as well as all other relevant formulas for this example. The LP of the two forecasts is given by

f_lp = ω_1 f_1 + (1 − ω_1) f_2,

where f_lp is the density of the combined forecast, f_1 and f_2 are the individual densities, and 0 < ω_1 < 1 is the weight on the first forecast. Here and throughout the paper, we take the combination weights to be fixed, non-stochastic quantities. We denote the mean and variance of this combined density by M_lp and V_lp, respectively.
As shown in Table 1, both forecasters fulfill the requirements mentioned in Section 2. First, their variance forecasts and squared forecast errors are identical in expectation. Second, the covariance between each variance forecast and the corresponding squared error is equal to the variance of the expected (conditional) squared forecast error. Due to the homoskedasticity in this simple example, the latter two terms are in fact equal to zero.
Table 1: Formulas for the baseline example. Moments of the linear pool follow from aggregating the individual forecast densities according to f_lp = ω_1 f_1 + (1 − ω_1) f_2. MSFE denotes the mean squared forecast error.

The LP's variance forecast is of the form V_lp = a + D, where a = ω_1 V_1 + (1 − ω_1) V_2 is a constant and

D = ω_1 (1 − ω_1)(M_1 − M_2)²

is the well-known measure of disagreement between the two point forecasts.^2 Strikingly, the LP's variance V_lp fails at least one of the requirements mentioned in Section 2, and it fails both in a special but important case. First, the LP's expected variance E[V_lp] exceeds its MSFE, E[S], for all admissible values of ω_1. The LP can therefore be labeled underconfident. The disagreement term D, which is positive-valued, contributes to the LP's underconfidence. Second, the LP's variance V_lp has no predictive content for its squared forecast error in the important case ω_1 = 0.5. With this value of ω_1, the covariance between both quantities equals zero, and it can also be shown that D is independent of S (see Appendix A). Note that ω_1 = 0.5 is a popular default choice in practice, and minimizes the MSFE of the combined mean forecast in the present example. For other choices of ω_1, the relation between D and S depends on ρ, σ²_X and σ²_U, but often implies weak correlation between D and S. In the case ω_1 = 0.5, disagreement can thus be regarded as a noise term which deteriorates the LP's variance forecast.
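Both failures can be verified by simulation. The sketch below uses illustrative parameter values (σ²_X = σ²_U = 1, ρ = 0.3; these numbers are assumptions, not taken from the paper) and equal weights:

```python
import numpy as np

# Monte Carlo check of the baseline example: Y = X1 + X2 + U, each
# forecaster reports the correct conditional distribution given her
# own signal. Parameter values are illustrative assumptions.
rng = np.random.default_rng(1)
n = 500_000
s2x, s2u, rho = 1.0, 1.0, 0.3
cov = [[s2x, rho * s2x], [rho * s2x, s2x]]
x1, x2 = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
u = rng.normal(0.0, np.sqrt(s2u), size=n)
y = x1 + x2 + u

# M_i = E[Y | X_i] = (1 + rho) X_i; V_i = V[Y | X_i], constant here
m1, m2 = (1 + rho) * x1, (1 + rho) * x2
v_i = s2x * (1 - rho ** 2) + s2u

w = 0.5                                   # equal weights
m_lp = w * m1 + (1 - w) * m2
d = w * (1 - w) * (m1 - m2) ** 2          # disagreement
v_lp = v_i + d                            # LP variance forecast
s = (y - m_lp) ** 2                       # squared forecast error

bias = v_lp.mean() - s.mean()             # underconfidence: positive
corr_ds = np.corrcoef(d, s)[0, 1]         # near zero at w = 0.5
```

The positive `bias` illustrates the first failure, and the near-zero correlation between disagreement and squared errors illustrates the second.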

Bias in the linear pool's variance forecast: General case
We now consider the more general situation where n density forecasts f_i with corresponding mean and variance forecasts {m_i, v_i}, i = 1, ..., n, are available, where the index i denotes an individual forecast. The LP determines the combined density as f_lp = Σ_{i=1}^n ω_i f_i, implying the mean forecast

m_lp = Σ_{i=1}^n ω_i m_i   (4)

and the variance forecast

v_lp = Σ_{i=1}^n ω_i v_i + Σ_{i=1}^n ω_i (m_i − m_lp)².   (5)

In what follows, we will mostly take the mean specification in (4) as given, and investigate the properties of the variance specification in (5) conditional on (4). Moreover, we will consider a simple modification, the centered linear pool (CLP), with m_clp = m_lp and

v_clp = Σ_{i=1}^n ω_i v_i.   (6)

Hence, the CLP has the same mean forecast as the LP, but its variance forecast does not contain the disagreement term. Denoting a density f_i with mean m_i and variance v_i by f_i(m_i, v_i), the CLP is constructed as f_clp = Σ_{i=1}^n ω_i f_i(m_clp, v_i). Thus, each individual density is simply relocated such that its mean equals m_clp = m_lp instead of m_i before being combined.
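The moment formulas for the LP and the CLP can be sketched in a few lines (a minimal implementation of Equations (4) to (6); the numerical inputs are illustrative):

```python
import numpy as np

def lp_moments(w, m, v):
    # Linear pool: mean is the weighted average of means; variance is
    # the weighted average variance plus the disagreement term.
    w, m, v = map(np.asarray, (w, m, v))
    m_lp = w @ m
    disagreement = w @ (m - m_lp) ** 2
    return m_lp, w @ v + disagreement

def clp_moments(w, m, v):
    # Centered linear pool: same mean, disagreement term dropped.
    w, m, v = map(np.asarray, (w, m, v))
    return w @ m, w @ v

m_lp, v_lp = lp_moments([0.5, 0.5], [1.0, 3.0], [2.0, 2.0])
m_clp, v_clp = clp_moments([0.5, 0.5], [1.0, 3.0], [2.0, 2.0])
```

With these inputs the two pools share the mean 2.0, while the LP's variance exceeds the CLP's by the disagreement term.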
Equations (4) to (6) are formulated in terms of given mean and variance forecasts {m_i, v_i}, i = 1, ..., n. From an ex-ante perspective, these objects are random variables which we denote by {M_i, V_i}, i = 1, ..., n. Randomness in the forecasts may reflect both conditioning information and noise. To evaluate the performance of a combination method, it is necessary to make assumptions about the joint distribution of the underlying individual forecasts and the target variable Y. We first consider a situation in which the individual forecasts imply a correct assessment of their own uncertainty, as formulated in the following assumption.

A1 The individual variance forecasts are unbiased, i.e. E[V_i] = E[(Y − M_i)²] for i = 1, ..., n.
As mentioned earlier, assumption A1 is implied by the optimality condition of the Dawid-Sebastiani scoring rule in Equation (3). However, A1 imposes only an unconditional notion of unbiasedness, and is hence much weaker than Equation (3). Denoting by S the squared error of the combined mean forecast Σ_{i=1}^n ω_i M_i, we have the following first result.
Proposition 4.1. Consider any joint distribution of M, V and Y. Under A1 and assuming that the combination weights ω_i are positive and sum to one, it holds that

E[V_lp] ≥ E[V_clp] ≥ E[S],  with  E[V_lp] − E[V_clp] = E[D] ≥ 0,

i.e. the variance of both the LP and the CLP is upward biased, with the bias being larger for the LP.
Proof. Under A1, E[V_clp] = Σ_i ω_i E[V_i] = Σ_i ω_i E[(Y − M_i)²] ≥ E[(Y − Σ_i ω_i M_i)²] = E[S], where the key inequality in the first line follows from the convexity of the square function. The statement for the LP then follows from V_lp = V_clp + D with D ≥ 0.
The result shows that the variance forecasts of the LP and the CLP are upward biased, even though the individual variance forecasts are unbiased. This finding is similar in spirit to Gneiting and Ranjan (2013, Theorem 3.1(c)) who show that an LP of 'neutrally dispersed' forecast densities is underconfident. In contrast to our focus on the mean and variance, Gneiting and Ranjan (2013) define 'neutral dispersion' and underconfidence in terms of the PIT which depends on the entire forecast density. Of course, considering the entire density is appealing in principle. Without further restrictions, however, it can be hard to identify what drives a density's dispersion as measured by the PIT. Our approach of defining a density's uncertainty in terms of its variance allows us to sharpen the result of Gneiting and Ranjan (2013) by identifying disagreement as a variance-bias augmenting term.^3 Finally, note that the statement of Proposition 4.1 remains true if A1 is replaced by the following, weaker assumption: E[V_i] ≥ E[(Y − M_i)²] for i = 1, ..., n. Hence if the individual densities are underconfident, the LP is underconfident as well.

Properties of the linear pool in a prediction space model
We next impose more structure on the joint distribution of the forecasts and realizations in order to derive more specific results and identify drivers of the LP's forecast performance.

Prediction space model for forecasts and realizations
The joint distribution of forecasts and realizations has already been considered as an important analytical tool in the contributions by Bates and Granger (1969) and Murphy and Winkler (1987). Gneiting and Ranjan (2013) provide a formalization as a 'prediction space' for full density forecasts. Here we consider a simplified variant where each forecast is characterized by a mean and variance only; see Ehm, Gneiting, Jordan, and Krüger (2016, Section 3.1) for further discussion.
Consider a vector of mean forecasts M ∈ R^n, a vector of variance forecasts V ∈ R^n_+, an error term U ∈ R (see below) and a positive-valued common factor η ∈ R_+ such that

(η^{−1/2} M′, η^{−1/2} U, η^{−1} V′)′ ~ [ (μ_M′, 0, μ_V′)′, diag(Σ_M, σ²_U, Σ_V) ],   (7)

where the column on the right-hand side denotes the expectations of the variables, and the matrix denotes the variance-covariance matrix. The random variable η is assumed to be independent of the variables on the left-hand side, i.e. of (η^{−1/2} M, η^{−1/2} U, η^{−1} V).^4 Without loss of generality, we assume that the expectation of the target variable Y conditional on η is zero. The error term U is defined as the residual of the projection of the target variable Y on the bias-corrected forecasts M − η^{1/2} μ_M, i.e.

Y = γ′(M − η^{1/2} μ_M) + U,   (8)

where γ is the vector of coefficients resulting from this projection. The conditional distribution of the original variables thus is given by

E[M | η] = η^{1/2} μ_M,  V[M | η] = η Σ_M,
E[U | η] = 0,  V[U | η] = η σ²_U,   (9)
E[V | η] = η μ_V,  V[V | η] = η² Σ_V.

Note that η acts as a common factor which scales the mean vectors of M and V, as well as the model's covariance and variance terms (except the ones that are restricted to zero). Albeit simple, such a common factor specification of time-varying uncertainty is in line with the empirical results of Carriero, Clark, and Marcellino (2016) for many macroeconomic and financial variables. Without loss of generality, we assume that E[η] = 1. Note that we impose few restrictions on the process that generates η. For example, η could be a discrete variable that takes values of 0.5 and 1.5 with probability one half each. As another example, η could follow an autocorrelated time series process. The only requirement we impose is that E[η] = 1 and V[η] ≡ σ²_η exist; hence, the process generating η must be stationary. That said, none of our subsequent results depends on any properties of this process other than σ²_η. By varying the biases in μ_M, the weight vector γ and the variance of U, the framework can accommodate a wide range of scenarios, with mean forecasts M ranging from poor to precise.
Finally, we assume that the variance forecasts V are conditionally uncorrelated with M and U . Hence, our setup does not cover situations where changes in uncertainty affect the mean of the target variable, like, for instance, in GARCH-in-mean models.
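A minimal sampler for this prediction space, using the two-point common factor mentioned in the text and otherwise illustrative parameter values (all numbers below are assumptions for the sketch, not taken from the paper):

```python
import numpy as np

# Sketch of the mean-variance prediction space: conditional on eta,
# M has mean sqrt(eta) * mu_M and variance eta * Sigma_M, U has
# variance eta * s2u, and V has mean eta * mu_V (Sigma_V = 0 here).
rng = np.random.default_rng(2)
T = 200_000
mu_m = np.array([0.0, 0.0])              # unbiased mean forecasts
sigma_m = np.array([[1.0, 0.3], [0.3, 1.0]])
mu_v = np.array([1.5, 1.5])              # expected variance forecasts
gamma = np.array([0.6, 0.6])             # projection coefficients
s2u = 0.8                                # variance of U given eta = 1

# Two-point common factor with E[eta] = 1, as in the text's example.
eta = rng.choice([0.5, 1.5], size=T)

m = np.sqrt(eta)[:, None] * (mu_m + rng.multivariate_normal(np.zeros(2), sigma_m, size=T))
u = np.sqrt(eta * s2u) * rng.normal(size=T)
v = eta[:, None] * mu_v                  # no noise in V: Sigma_V = 0
y = (m - np.sqrt(eta)[:, None] * mu_m) @ gamma + u

cond_mean_y = y.mean()                   # close to zero by construction
```

Since E[η] = 1, the simulated variance forecasts are unconditionally unbiased for their targets, consistent with assumption A1.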

Bias in the linear pool's variance forecast
The following result uses the prediction space framework to derive a specific characterization of the bias in the LP's variance forecast.
Hence the LP and the CLP systematically deviate from S, with the LP's expected deviation exceeding the CLP's by E[D].

Proof. See Appendix C.
Proposition 5.1 quantifies the relationship between the LP's expected squared forecast error, E[S], and the LP's variance forecast, E[V lp ]. As shown in Section 2, both quantities should be equal in optimum. The proposition thus provides a precise assessment of the LP's underconfidence. 5 Propositions 4.1 and 5.1 have the same qualitative interpretation: If the individual variance forecasts are unbiased, then both the LP and the CLP are underconfident. However, Proposition 5.1 exploits the prediction space framework to make a precise quantitative statement on the pools' underconfidence.

Disagreement encompassed by weighted average variance forecasts
We next derive a result that speaks to the desideratum that a variance forecast should be highly correlated with the squared errors of the mean forecast (see Section 2). Specifically, consider the following linear regressions:

S = a_0 + a_1 D + a_2 ω′V + ε   (10)

and

S = b_1 D + b_2 ω′V + ε,   (11)

where (11) is aimed at the case of constant variance forecasts ω′V, such that (10) cannot be employed. The LP entails the assumptions that a_0 = 0, a_1 = a_2 = 1 or, alternatively, that b_1 = b_2 = 1. We are particularly interested in the coefficients a_1 and b_1 which quantify the contribution of disagreement to the forecast of the squared error. The population coefficient a_1 equals

a_1 = Cov(D − α − β ω′V, S) / V[D − α − β ω′V],

where α and β are the population coefficients from a linear regression of D on a constant and ω′V. A similar expression holds for b_1.^6 The following result states conditions under which a_1 = b_1 = 0 holds, corresponding to a particularly stark violation of the LP's implicit assumption.
A2 The biases of the individual mean forecasts are identical, i.e. μ_M = κι for some constant κ, where ι denotes a vector of ones.

A3 The combination weights minimize E[S] subject to the constraint of adding up to one (Bates and Granger, 1969).
A4 The joint distribution of M and U conditional on η is normal.
Proposition 5.2. Suppose assumptions A2 to A5 hold. Then a_1 = b_1 = 0, i.e. ω′V encompasses disagreement in the prediction of S.
Proof. See Appendix C.
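The encompassing result can be illustrated numerically via regression (11). The sketch below uses the baseline example of Section 3 with equal weights, so that ω′V is constant; parameter values are illustrative assumptions:

```python
import numpy as np

# Regress S on D and the (constant) average variance without an
# intercept; under the proposition's conditions, the disagreement
# coefficient b1 is zero in population.
rng = np.random.default_rng(3)
n = 400_000
x1, x2 = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.3], [0.3, 1.0]], size=n).T
u = rng.normal(size=n)
y = x1 + x2 + u

m1, m2 = 1.3 * x1, 1.3 * x2                      # E[Y | X_i] with rho = 0.3
v_bar = np.full(n, 1.0 * (1 - 0.3 ** 2) + 1.0)   # omega'V, constant here
m_c = 0.5 * (m1 + m2)
d = 0.25 * (m1 - m2) ** 2                        # disagreement
s = (y - m_c) ** 2                               # squared forecast error

X = np.column_stack([d, v_bar])
b1, b2 = np.linalg.lstsq(X, s, rcond=None)[0]
# The LP implicitly assumes b1 = b2 = 1; here b1 is close to zero
# and b2 falls below one, reflecting the pool's underconfidence.
```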
Assumption A2 restricts the biases in the mean forecasts M to be equal, with unbiased mean forecasts (κ = 0 in A2) being an important special case. From an empirical perspective, A2 can be motivated by the fact that different model-based forecasting methods often rely on a similar sample of data for parameter estimation. Assumption A3 restricts the way in which a researcher derives the combination weights ω. In the context of density forecasting, the restriction of adding up to one is necessary to guarantee that the combined object is a density.^7 Since these weights minimize the MSFE, we refer to them as the MSFE-optimal weights ω*. Note that ω* can have negative elements which could, in principle, lead to a negative variance prediction by the LP. We disregard the latter case which seems of minor applied relevance. We also note that ω* is optimal in terms of the Dawid-Sebastiani score (given the prior adding-up constraint), and is hence consistent with our setup.^8 Assumptions A4 and A5 impose restrictions on the prediction space, conditional on the common factor η. However, the prediction space retains considerable flexibility by specifying alternative distributions of η.^9 While the assumptions underlying Proposition 5.2 are restrictive, they do not appear implausible from an empirical point of view. Furthermore, note that the case covered by the proposition is a fairly drastic one, in that disagreement plays no role whatsoever, which is a particularly stark contrast to the LP's assumption. Below we present numerical results which show that similar results (disagreement having limited but nonzero predictive content) hold under a broader set of conditions.

Choice of combination weights
The magnitude of the LP's variance bias (discussed in Propositions 4.1 and 5.1) and the correlation between D and S (discussed in Proposition 5.2) depend on ω, the vector of combination weights. Hence it seems tempting to choose ω in a way that limits the drawbacks of the LP mentioned above. Unfortunately, this approach is not feasible in general: ω also affects the LP's mean forecast, and a given choice of ω can generate good mean forecasts but poor variance forecasts or vice versa. The following result describes such a situation.

^7 In combinations of point forecasts, the same restriction is also commonly employed.

^8 As noted in Section 2, minimizing the expected Dawid-Sebastiani score requires the researcher to first select an MSFE-optimal point forecast, and then select the corresponding variance. The Bates and Granger (1969) minimization problem solves the first step, given the prior constraint that the weights add up to one.

^9 Moreover, A4 could be relaxed slightly at the cost of a more demanding exposition. In fact, U can be non-normal if certain moments of U are uncorrelated with certain moments of M conditional on η. Normality of M conditional on η continues to be required.
A6 The individual mean forecasts have identical mean squared errors, i.e. E[(Y − M_i)²] is the same for all i.

Proposition 5.3. Suppose A6 holds. Then the MSFE-optimal combination weights ω* maximize the upward bias of the LP's variance.
Proof. See Appendix C.
Proposition 5.3 presents a simple but empirically relevant scenario in which the weights that minimize the MSFE of the LP's mean forecast also maximize the upward bias of the LP's variance forecast. Assumption A6 requires that the mean forecasts to be combined have the same MSFE, which holds approximately in many applications. An interesting special case of A6 arises when all forecasts are unbiased and all pairwise correlations between the forecast errors are identical, i.e. Cor(Y − M_i, Y − M_j) = ρ for all pairs of forecasts i, j with i ≠ j. In this case, equal weights are optimal in terms of MSFE (Timmermann, 2006, Section 2.4). Motivated by the good performance of equal combination weights in applications, the latter case has received much attention in the literature (e.g. Smith and Wallis, 2009; Elliott, 2011; Claeskens, Magnus, Vasnev, and Wang, 2016).
If the assumptions of Propositions 5.1 and 5.2 apply, it is clear that the MSFE-optimal weights ω* are not optimal for the LP's variance because the disagreement component is a bias-augmenting noise term. Since disagreement and, consequently, the bias equal zero if one forecast receives a weight of one, the LP faces a trade-off between accurate mean forecasts achieved by using ω* and accurate variance forecasts achieved by using a weight vector ι^[i] that places a weight of one on a single forecast i ∈ {1, ..., n}, and a weight of zero on all other forecasts. The DSS-optimal weights for the LP will differ from ω* if, for some forecast i, the gain in variance forecast accuracy obtained by moving from ω* towards ι^[i] exceeds the corresponding loss in mean forecast accuracy. The variance forecast will become more accurate in this case because its bias is reduced and its disagreement component becomes correlated with the squared forecast error. The CLP faces a similar trade-off. However, moving from ω* towards ι^[i] will yield smaller gains in variance forecast accuracy for the CLP, because the initial bias is smaller and, hence, the bias reduction will be smaller. Moreover, there is no disagreement component which becomes correlated with the squared forecast error. Therefore, the DSS-optimal weights of the LP can be expected to differ more strongly from the MSFE-optimal weights ω* than the DSS-optimal weights of the CLP.
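The trade-off can be made concrete with a grid search over weights, comparing each pool's DSS-optimal weight to the MSFE-optimal one. The sketch mirrors the asymmetric simulation design of Section 6; the values σ²_X2 = 1.5 and σ²_U = 1 are illustrative assumptions:

```python
import numpy as np

# Forecast 2 is more informative, so the MSFE-optimal weight on
# forecast 1 is below 0.5; the LP's DSS-optimal weight moves further
# away from it than the CLP's.
rng = np.random.default_rng(4)
n = 300_000
x1 = rng.normal(0.0, 1.0, n)
x2 = rng.normal(0.0, np.sqrt(1.5), n)
u = rng.normal(0.0, 1.0, n)
y = x1 + x2 + u
v1, v2 = 1.5 + 1.0, 1.0 + 1.0            # V[Y | X_i] = s2_Xj + s2_U

def dss(m, v, y):
    return 0.5 * np.log(2 * np.pi * v) + (y - m) ** 2 / (2 * v)

grid = np.linspace(0.01, 0.99, 99)
lp_scores, clp_scores, msfes = [], [], []
for w in grid:
    m_c = w * x1 + (1 - w) * x2
    d = w * (1 - w) * (x1 - x2) ** 2      # disagreement
    v_bar = w * v1 + (1 - w) * v2
    lp_scores.append(dss(m_c, v_bar + d, y).mean())
    clp_scores.append(dss(m_c, v_bar, y).mean())
    msfes.append(((y - m_c) ** 2).mean())

w_lp = grid[np.argmin(lp_scores)]         # DSS-optimal for the LP
w_clp = grid[np.argmin(clp_scores)]       # DSS-optimal for the CLP
w_mse = grid[np.argmin(msfes)]            # MSFE-optimal, near 0.4
```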

Disagreement as a proxy for weighted average variance
Our results up to this point present conditions under which disagreement is detrimental to the LP's performance. These results refer to the situation where both disagreement D and the weighted average variance ω′V are available, as is clearly the case when combining various density forecasts. In the distinct situation where only mean forecasts are available, however, disagreement is sometimes considered as a simple proxy for predictive uncertainty. The economic policy uncertainty index provided by Baker, Bloom, and Davis (2019), which considers disagreement among participants of the Federal Reserve Bank of Philadelphia's SPF, is a prominent example. The following result states conditions under which disagreement is a viable proxy for the weighted average variance in our theoretical framework.
A8 Σ_M is such that its diagonal elements are all equal to σ², and its off-diagonal elements are all equal to θ.
Proposition 5.4. Suppose A2′, A7 and A8 hold. Then, as the number of forecasters n goes to infinity, it holds that Cor(D, S)/Cor(ω′V, S) → 1 and that Cor(D, ω′V) → 1.
Proof. See Appendix C.
The result shows that under certain conditions, disagreement and the weighted average variance are equivalent for large n, in the sense that their correlation converges to one and they become equally correlated with S, the squared forecast error of the combination. From a practical perspective, the proposition thus provides a justification for using D as a proxy for ω′V when the latter is not available and the number of forecasters is large enough. Figure 1 illustrates the proposition by plotting the correlation between D and ω′V. The correlation increases in n and in σ²_η, the variance of the common factor η. Conversely, the figure shows that the correlation of D and ω′V tends to be low if only a few forecasts are available and if σ²_η is small, such that fluctuations in disagreement reflect noise rather than fluctuations in η. The result that σ²_η needs to be large enough in order to render disagreement a good proxy for uncertainty is supported by the empirical findings of Boero, Smith, and Wallis (2015, p. 1044) who state that "[..] the joint results from the US and UK surveys suggest the encompassing conclusion that disagreement is a useful proxy for uncertainty when it exhibits large fluctuations [..]".
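The large-n limit can be sketched in simulation with symmetric forecasters and a common uncertainty factor (the two-point η and the value σ²_η = 0.25 are illustrative assumptions):

```python
import numpy as np

# Correlation between disagreement and the weighted average variance
# for a small and a large panel of symmetric forecasters.
rng = np.random.default_rng(5)
T = 50_000

def cor_d_v(n, s2_eta=0.25):
    # two-point common factor with mean 1 and variance s2_eta
    eta = 1 + np.sqrt(s2_eta) * rng.choice([-1.0, 1.0], size=T)
    eps = rng.normal(size=(T, n))            # idiosyncratic mean noise
    m = np.sqrt(eta)[:, None] * eps          # mean forecasts, mu_M = 0
    v = eta[:, None] * np.ones(n)            # variance forecasts
    w = np.full(n, 1 / n)                    # equal weights
    m_bar = m @ w
    d = ((m - m_bar[:, None]) ** 2) @ w      # disagreement
    return np.corrcoef(d, v @ w)[0, 1]

c_small, c_large = cor_d_v(3), cor_d_v(50)
```

With few forecasters, disagreement is dominated by cross-sectional noise; as n grows, it tracks the common factor and hence the average variance.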
While certainly restrictive, A2', A7 and A8 may be useful assumptions to make when using a survey of forecasters like the SPF. In such a survey, n may be relatively large, and making simplifying assumptions along the lines of A2', A7 and A8 is often a sensible strategy in order to reduce estimation noise. In particular, Capistrán and Timmermann (2009) show that entry and exit of forecasters makes it challenging to identify more refined correlation structures among the individual point forecasts.

Monte Carlo simulations
Here we illustrate our results via simulation examples. For simplicity, we fix the factor η in Equation (9) at a value of one, which corresponds to the limiting case that V[η] → 0.^10 Specifically, we simulate variants of the baseline example in Section 3, where the target variable is given by

Y = X_1 + X_2 + U.

In all simulations, U is normally distributed. Using the notation of Section 5, the mean forecast of forecaster i is given by M_i = X_i such that Equation (8) holds with γ = (1, 1)′ and μ_M = (0, 0)′. Moreover, with X_1 and X_2 independent, Σ_M equals

Σ_M = diag(σ²_X1, σ²_X2) = diag(1, σ²_X2).

The corresponding variance forecasts equal

V_1 = σ²_X2 + σ²_U,  V_2 = σ²_X1 + σ²_U.

Thus, the conditional moments of V are described by (9) with μ_V = (V_1, V_2)′, Σ_V = 0, and η = 1. Hence, each forecaster produces mean and variance forecasts that are ideal conditional on her information set, thereby fulfilling assumption A1 of unconditional unbiasedness. The combined mean forecast of each combination scheme to be considered equals

M_c = ω_1 M_1 + (1 − ω_1) M_2,

where ω_1 is the weight for the first forecast.
In the Monte Carlo simulations, we employ three combination schemes: the linear pool (LP), the centered linear pool (CLP) and, additionally, a variance-unbiased linear pool (VULP). The latter is difficult to apply in practice, but it is useful to illustrate some of the theoretical results. We denote the ith density forecast by f_i(M_i, V_i), where M_i denotes the mean and V_i the variance of f_i. The density f_i does not need to be normally distributed, but we suppress the potential dependence on additional parameters in our notation. The density of the LP is given by

f_lp = ω_1 f_1(M_1, V_1) + (1 − ω_1) f_2(M_2, V_2),

whereas the density of the CLP equals

f_clp = ω_1 f_1(M_c, V_1) + (1 − ω_1) f_2(M_c, V_2).

Thus, the CLP relocates both individual density forecasts at M_c, and then combines the relocated densities linearly. Finally, the density of the VULP is

f_vulp = ω_1 f_1(M_c, V_1 − E[D]) + (1 − ω_1) f_2(M_c, V_2 − E[D]).

Hence, in addition to relocating, the VULP rescales both densities such that the individual variance forecasts are reduced by E[D]. Each of these pools implies a mean and a variance forecast. While all three combined densities have the same mean forecast (i.e. M_lp = M_clp = M_vulp = M_c), the respective variance forecasts differ. The LP produces the variance forecast

V_lp = ω_1 V_1 + (1 − ω_1) V_2 + D.

The CLP yields

V_clp = ω_1 V_1 + (1 − ω_1) V_2.

Using the VULP results in

V_vulp = ω_1 V_1 + (1 − ω_1) V_2 − E[D],

with E[D] = ω_1(1 − ω_1)(σ²_X2 + 1). Note that V_clp and V_vulp are constant, whereas V_lp contains a stochastic component.
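For Gaussian components, the three pool densities can be sketched as mixtures; a numerical integration then recovers the pooled moments (the weight and moment values below are illustrative assumptions):

```python
import numpy as np

def npdf(x, m, v):
    # Gaussian density with mean m and variance v
    return np.exp(-(x - m) ** 2 / (2 * v)) / np.sqrt(2 * np.pi * v)

def lp_pdf(x, w, m1, m2, v1, v2):
    # linear pool: mixture of the original component densities
    return w * npdf(x, m1, v1) + (1 - w) * npdf(x, m2, v2)

def clp_pdf(x, w, m1, m2, v1, v2):
    # centered linear pool: both components relocated to the pooled mean
    m_c = w * m1 + (1 - w) * m2
    return w * npdf(x, m_c, v1) + (1 - w) * npdf(x, m_c, v2)

def vulp_pdf(x, w, m1, m2, v1, v2, e_d):
    # variance-unbiased pool: relocated, each variance reduced by E[D]
    m_c = w * m1 + (1 - w) * m2
    return w * npdf(x, m_c, v1 - e_d) + (1 - w) * npdf(x, m_c, v2 - e_d)

# Check with w = 0.5, (M1, M2) = (1, -1), (V1, V2) = (2, 2):
# disagreement D = 0.25 * (M1 - M2)^2 = 1, so the LP variance is 3
# while the CLP variance is the average variance, 2.
x = np.linspace(-30.0, 30.0, 200_001)
dx = x[1] - x[0]
f_lp = lp_pdf(x, 0.5, 1.0, -1.0, 2.0, 2.0)
f_clp = clp_pdf(x, 0.5, 1.0, -1.0, 2.0, 2.0)
mean_lp = (x * f_lp).sum() * dx
var_lp = ((x - mean_lp) ** 2 * f_lp).sum() * dx
var_clp = (x ** 2 * f_clp).sum() * dx    # CLP mean is 0 here
```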
In the first case considered, we employ σ²_X2 = 1.^11 The weight ω_1 for the first forecast ranges from 0 to 1, and the MSFE-optimal weight is ω*_1 = 0.5. As shown in the lower left panel of Figure 2, the variance forecasts have the property described in Proposition 5.1. The MSFE of the combined forecast equals E[S] = V_vulp. The CLP's constant variance, V_clp, lies halfway between the constant optimal variance V_vulp and the LP's expected variance E[V_lp]. The difference between the LP's expected variance and the CLP's variance is given by E[D]. In line with Proposition 5.3, E[D] reaches its maximum at ω*_1, implying that the largest bias in the LP's variance forecasts occurs when MSFE-optimal weights are used.
The regression coefficients displayed in the lower right panel at ω*_1 illustrate the result of Proposition 5.2. The coefficient for disagreement equals zero, because disagreement D has no explanatory power for the squared error S beyond what is contained in the combined variance forecast ω′V, which is constant here. Since ω′V is larger than E[S], its coefficient is smaller than one. A coefficient equal to one for D, as is implicitly used by the LP, is only observed for weights which differ considerably from ω*_1. The regression coefficient for ω′V is almost constant across all values of ω_1.
The scoring rules displayed in the two upper panels of Figure 2 show that all pools perform best when using the weight ω*_1. The LP performs worse than the CLP and the VULP at ω*_1 and for a wide range of weights around ω*_1. The VULP outperforms the two other pools for most values of ω_1. Concerning the Dawid-Sebastiani score and the weight choice ω_1 = ω*_1, the superiority of the VULP follows from the fact that its mean and variance forecasts satisfy the optimality restrictions in Equations (2) and (3). For values of ω_1 close to zero or one, the differences in the scores tend to be small. With the current setting, the CLP and the VULP yield normal densities, leading to the equality of the Dawid-Sebastiani score and the logarithmic score. The LP produces non-normal densities.
The second case we consider is identical to the first except for the value of σ²_X2, which now equals 1.5. Thus, f_2 has a lower variance than f_1, and M_2 produces a lower MSFE than M_1. Therefore, the weighted average variance forecast ωV of the CLP displayed in Figure 3 increases with ω_1. The variance forecast of the VULP is minimal at ω*_1 = 0.4, which is the MSFE-optimal weight. The bias term E[D] continues to have its maximum at ω_1 = 0.5. As implied by Proposition 5.2, the regression coefficient for D equals zero at ω*_1. The regression coefficient for ωV slowly decreases with ω_1. The optimal weights differ for each pool. For the VULP, the minimal Dawid-Sebastiani score is attained at ω*_1 = 0.4. For the other pools, however, it is optimal to reduce the bias of their variance forecasts at the cost of lower accuracy of their mean forecasts. For the CLP, the smallest Dawid-Sebastiani score is reached at ω_1 = 0.37. For the LP, which has a larger variance bias than the CLP, a considerably smaller weight of ω_1 = 0.24 turns out to be optimal. Similar observations apply to the logarithmic score, where the optimal weights for the VULP and the CLP are virtually the same as for the Dawid-Sebastiani score, although the forecast densities are non-normal. For the LP, the optimal weight equals ω_1 = 0.3. Since the LP's scores are flatter due to its higher variance bias, these simulation results indicate that finding optimal combination weights for the LP is likely to be more difficult than for the CLP in empirical applications. As in the first case, with respect to both scores the LP performs worst for a wide range of weights around ω*_1, and the VULP performs best.
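Under the same assumed DGP (Y = X_1 + X_2 + U, σ²_U = 1), the score comparison in this second case can be sketched directly: with σ²_X2 = 1.5, the MSFE-optimal weight is ω*_1 = 0.4, and at that weight the LP's Dawid-Sebastiani score exceeds the CLP's because of the disagreement-induced variance bias.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

# Assumed DGP for case 2: Y = X1 + X2 + U, sigma2_X2 = 1.5, sigma2_U = 1.
x1 = rng.normal(0.0, 1.0, n)
x2 = rng.normal(0.0, np.sqrt(1.5), n)
u = rng.normal(0.0, 1.0, n)
y = x1 + x2 + u

v1, v2 = 1.5 + 1.0, 1.0 + 1.0   # ideal variance forecasts of f1 and f2

def ds_scores(w):
    """Average Dawid-Sebastiani scores of the CLP and the LP at weight w."""
    mu = w * x1 + (1 - w) * x2
    s = (y - mu) ** 2
    wV = w * v1 + (1 - w) * v2                 # CLP variance
    v_lp = wV + w * (1 - w) * (x1 - x2) ** 2   # LP variance adds disagreement
    ds_clp = np.mean(np.log(wV) + s / wV)
    ds_lp = np.mean(np.log(v_lp) + s / v_lp)
    return ds_clp, ds_lp

ds_clp, ds_lp = ds_scores(0.4)   # MSFE-optimal weight omega*_1 = 0.4
```

At ω*_1 = 0.4 the combined forecast error is uncorrelated with X_1 − X_2 in this setup, so disagreement carries no information about S, and adding it to the variance can only worsen the score.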
In the third case, X_1 and X_2 both follow t-distributions with 5 degrees of freedom, rescaled such that σ²_X1 = σ²_X2 = 1. Each individual forecast continues to be ideal, conditional on the respective information set, in terms of its mean and variance prediction. However, the forecast densities f_i are misspecified in terms of their functional form: they are rescaled t-distributions, while the correct density would be that of M_j + U, i.e. the sum of a (rescaled) t-distributed and a normal random variable. Concerning the weighted average variance forecasts displayed in Figure 4, the results are identical to those from the first case. Yet the regression coefficients displayed in the lower right panel differ because the assumption of joint normality of M_1, M_2, and U required by Proposition 5.2 is violated. As a consequence, at ω*_1, D contains information beyond that contained in ωV. However, the coefficient for D still has its minimum at ω*_1. Again, the coefficient for ωV is relatively stable across all values of ω_1.
For each pool, the Dawid-Sebastiani score and the logarithmic score differ because all pooled densities are non-normal. All pools attain their lowest values at ω*_1, and the LP performs worse than the CLP and the VULP at ω*_1 and for a certain range of weights around ω*_1. This range is narrower than in the first case, but still covers more than the central 50% of all weights considered. The VULP outperforms the CLP with respect to the Dawid-Sebastiani score. For the logarithmic score, the VULP and the CLP attain similar values, with the CLP performing marginally better.
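The two scoring rules used above can be computed as follows; the Dawid-Sebastiani score depends only on the predictive mean and variance, while the logarithmic score of an LP density evaluates the full Gaussian mixture. For a single normal density the two scores agree up to an affine transformation (a minimal sketch):

```python
import numpy as np

def dawid_sebastiani(mu, var, y):
    """Negatively oriented DS score: uses only predictive mean and variance."""
    return np.log(var) + (y - mu) ** 2 / var

def log_score_mixture(weights, means, variances, y):
    """Negatively oriented log score of a Gaussian mixture (the LP density)."""
    dens = sum(w * np.exp(-0.5 * (y - m) ** 2 / v) / np.sqrt(2.0 * np.pi * v)
               for w, m, v in zip(weights, means, variances))
    return -np.log(dens)

# For a single normal density: logS = 0.5 * DS + 0.5 * log(2 * pi).
y_obs, mu, var = 0.3, 0.0, 2.0
ds = dawid_sebastiani(mu, var, y_obs)
ls = log_score_mixture([1.0], [mu], [var], y_obs)
```

For a genuine mixture (several components with distinct means), the two scores diverge, which is why the LP's non-normal densities are evaluated differently by the two rules.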

Empirical case studies
We next investigate the properties of the LP and CLP in two case studies from macroeconomics and finance. In both case studies, we construct measures of forecast uncertainty from time series models with stochastic volatility.

Forecasting inflation
Here we employ inflation forecasts from the univariate unobserved component model with stochastic volatility (UCSV) by Chan (2013) and from the bivariate unobserved component model with trends and cycles (biUC) by Chan, Koop, and Potter (2016). The UCSV model is closely related to the model in Stock and Watson (2007), but the stochastic volatilities evolve as AR(1)-processes instead of random walks. Trend inflation in the biUC model is bounded between 0% and 5%, and the non-accelerating inflation rate of unemployment (NAIRU) is bounded between 4% and 7%. See Chan (2013) and Chan et al. (2016) for all details on the Bayesian estimation methodology and prior choices. The models are estimated recursively using the annualized quarterly growth rate of the GDP deflator and the unemployment rate of the US from 1948Q1 to 2018Q2. The evaluation sample starts in 1964Q4. We investigate forecasts for one and two quarters ahead. We denote the weight for the UCSV model by ω_1.

Figure 2: Monte Carlo simulations, case 1 (Gaussian forecasts, σ²_X2 = 1). The top row shows the Dawid-Sebastiani and logarithmic scores (log scores) for the linear pool (LP), the centered linear pool (CLP), and the variance-unbiased linear pool (VULP), plotted against the combination weight ω_1. A lower score indicates a more accurate forecast. The vertical lines indicate the optimal weights for each pool considered. The optimal weight for the VULP (shown in blue) is also optimal in terms of MSFE. The bottom row shows the forecast variance and the regression coefficients in Equation (10), again plotted against ω_1. The vertical lines indicate the VULP- and MSFE-optimal weight. All results are based on 10,000 simulations and 10,000 observations for Y in each simulation.

Figure 5 presents the forecasts for the mean and the variance of inflation.
The forecasts of both models are more strongly correlated for h = 1 than for h = 2, and the biUC model tends to forecast larger variances, especially around 1980. Figure 6 displays results of the LP and the CLP for all positive weights. The middle row of the figure summarizes the pools' variance forecasts and their MSFE. The results for the corner weights ω_1 = 0 and ω_1 = 1 reveal that the variance forecasts of both models exceed their respective MSFEs; this upward bias is more pronounced for the biUC model. Furthermore, the MSFEs of the pools' mean forecasts attain their smallest values at ω*_1 = 0.48 for h = 1 and at ω*_1 = 0.32 for h = 2.
The regression coefficients of Equation (10) displayed in the lower row of Figure 6 exhibit a dependence on the weights which is similar to that in the Monte Carlo simulations. The coefficient of ωV is positive, smaller than 1, and relatively stable. It decreases with ω_1 due to the stronger variance bias of the biUC model. The coefficient of D reaches values around 1 only for relatively extreme values of ω_1. For weights around ω*_1, the coefficient is comparatively small and stable. In contrast to the Monte Carlo simulations considered, the coefficient of D becomes negative for a wide range of weights around ω*_1. For both horizons, the Dawid-Sebastiani score of the CLP shown in the top row of Figure 6 is generally lower than that of the LP. The opposite holds for extreme weights only, where the scores of both pools are almost identical. As in the Monte Carlo simulations with unequal MSFEs, the optimal weights of both pools result from the implicit trade-off between mean and variance forecast accuracy. Relative to ω*_1, both pools prefer to put more weight on the model with the lower MSFE, leading to a lower (and thus less biased) variance forecast. The trade-off between mean and variance accuracy is particularly acute for the LP, for which disagreement enters the variance equation. As a consequence, the optimal weight for the LP implies a more pronounced deterioration of its mean forecast accuracy. While the optimal weights of the CLP for h = 1 and h = 2 equal 0.45 and 0.23, respectively, the corresponding optimal weights of the LP are given by 0.40 and 0.12.
Since equal weights are ubiquitously used in practice, we report more detailed results for this case. As shown in Table 2, the CLP's weighted average variance forecasts exceed the respective MSFEs by roughly 50% for both horizons. Disagreement further aggravates this bias for the LP, accounting for 8% of the LP's variance at h = 1 and for 11% at h = 2. The superior predictive accuracy of the CLP over the LP is not statistically significant at h = 1, but it is at h = 2. Table 3 reports on regressions of S on a constant, D, and ωV. For both horizons, neither the positive constants nor the negative coefficients of D are statistically different from 0, whereas the positive coefficients of ωV are. These results indicate that D is encompassed by ωV in the prediction of the squared forecast errors S. The pairwise correlations of D, S, and ωV are all positive, but the correlations of D and S are clearly smaller than those of ωV and S.

Notes to Table 3: Newey and West (1987) standard errors, with truncation lag chosen according to Andrews (1991) and Zeileis (2004). See Table 2 for details. On the right, correlation coefficients between disagreement, equally weighted average variance forecasts, and squared forecast errors.
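The encompassing regression behind Table 3 can be sketched with synthetic stand-in data (not the paper's forecasts): ωV is constructed as a correctly calibrated predictor of S, while D is uninformative noise, so the OLS coefficient of ωV should be near one and that of D near zero. Plain OLS is used here, whereas the paper's inference relies on Newey-West standard errors.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 50_000

# Synthetic stand-ins: wV is correctly calibrated, E[S | wV, D] = wV,
# and D is pure noise with no link to S (hypothetical data, for illustration).
wV = 1.0 + 0.5 * rng.random(T)
D = 0.1 * rng.random(T)
S = wV * rng.chisquare(1, T)   # chi-square(1) has mean one

# OLS regression of S on a constant, D, and wV, in the spirit of Equation (10).
X = np.column_stack([np.ones(T), D, wV])
beta, *_ = np.linalg.lstsq(X, S, rcond=None)
const, coef_D, coef_wV = beta
```

A coefficient of ωV near one with a D coefficient near zero is exactly the encompassing pattern reported in Table 3.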

Forecasting monthly excess stock returns
As a second case study, we consider predicting monthly excess returns of the S&P 500 via models of the form

R_t = β_0 + β_1 X_{t-1} + ε_t,   (12)

where R_t is the monthly excess return, X_t is one out of 15 predictors considered by Welch and Goyal (2008), and ε_t is an error term with stochastic volatility. We follow Rapach, Strauss, and Zhou (2010) in combining the 15 univariate models of the form (12). Each model is estimated via Bayesian methods; see Appendix D for details and prior choices. The models are estimated recursively, with observations ranging back to January 1970. The evaluation period is from January 1990 to December 2015. We only consider the case of equal combination weights due to the larger number of models involved. Figure 7 presents the return forecasts. In contrast to the previous case study, we find that disagreement D is dwarfed by the average variance forecast ωV, with the latter exceeding disagreement by a factor of around four hundred on average (see Table 2). This result can be explained by the low predictability of the mean excess return, which leads to similar mean predictions of the 15 individual models. Since D is very small in magnitude, it hardly hampers the predictive accuracy of the LP, which is statistically indistinguishable from its CLP counterpart. Yet, Table 3 shows that, as in the previous case study, ωV encompasses D in the prediction of the squared forecast errors S. Relative to the correlation of ωV and S, the correlation of D and S is larger than in the inflation case study. Based on Proposition 5.4, this result might be explained by the fact that more forecasts are pooled here, rendering D a less noisy predictor of S relative to ωV.
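The equal-weight pool quantities follow directly from the individual means and variances. The numbers below are hypothetical, loosely mimicking the low mean predictability of excess returns, and illustrate how near-identical mean predictions render D tiny relative to ωV:

```python
import numpy as np

rng = np.random.default_rng(2)
n_models, T = 15, 300

# Hypothetical forecasts: nearly identical means (low return predictability),
# sizeable variance forecasts. All magnitudes are illustrative assumptions.
mus = 0.005 + 0.001 * rng.standard_normal((n_models, T))
vars_ = 0.002 * (1.0 + 0.1 * rng.random((n_models, T)))

mu_bar = mus.mean(axis=0)                  # equal-weight combined mean
wV = vars_.mean(axis=0)                    # average variance forecast
D = ((mus - mu_bar) ** 2).mean(axis=0)     # disagreement
ratio = wV.mean() / D.mean()               # wV dwarfs D when means agree
```

With mean predictions this similar, the LP's variance wV + D is nearly identical to the CLP's wV, which is why the two pools are statistically indistinguishable in this case study.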
Procedures which aim at modifying the aggregated density provided by the linear pool are unable to target the noisy disagreement term only. Therefore, the consideration of optimized time-varying weights as surveyed in Aastveit et al. (2018, Section 3), or the spread-adjusted or beta-transformed linear pools proposed in Gneiting and Ranjan (2013, Sections 3.2 and 3.3) can be expected to yield only limited gains in accuracy unless the disagreement term is relatively small. However, the same methods can be applied to the centered linear pool, and the corresponding gains could actually be larger in that case, for instance because the likelihood function, like the scores considered, becomes less flat with respect to the weights.
Finally, our analysis rests on the assumption that a better proxy of forecast uncertainty (based on the individual models' variance forecasts) than disagreement is available. This assumption is plausible in the context of forecast density combinations. In situations where only point forecasts are available, our Proposition 5.4 formalizes the notion that disagreement may be a reasonable 'second-best', provided that it is calculated from a sufficiently large cross-section of point forecasts. The latter assessment is in line with the approach used in many empirical studies based on macroeconomic survey forecasts.
We can write W as W = CZ, where C is the lower-triangular Cholesky factor of Ω, and Z is a trivariate vector of independent standard normals. Simple algebra yields representations D = Z′C′ACZ and S = Z′C′BCZ for suitable symmetric matrices A and B. A standard result (Craig, 1943) states that D and S are independent if C′AC C′BC = 0. Simple but tedious algebra shows that this condition is satisfied for ω = 0.5.

B Basic formulas
The following formulas are used repeatedly in the proofs of Appendix C and are listed here for easier reference. Formulas involving third and higher moments of Gaussian quadratic forms require cumbersome yet well-known calculations, which are available in Theorems 1.6 and 1.7 of Schmidt (2013).
The result that R n → 1 as n → ∞ then follows from noting that the first summand in the denominator vanishes as n → ∞. The proof for the second statement is very similar and thus omitted.

D Details on the SV model
This appendix provides details on the model used in Section 7.2, which is of the form

Y_t = X_{t-1}′β + ε_t,

where the vector X_t = (1, Z_t)′ contains an intercept and a scalar regressor Z_t, and ε_t is an error term with stochastic volatility. We denote the sample period by t = 1, …, T, and let X = (X_0, X_1, …, X_{T-1})′ and Y = (Y_1, …, Y_T)′.
We use n_G = 20,000 Gibbs sampler draws, which are preceded by a burn-in period of 5,000 draws. We compute the forecast mean as the average of the per-iteration forecast means, and the forecast variance as the average of the per-iteration forecast variances plus the sample variance of the per-iteration means; these formulas follow from the usual view that the forecast distribution is an equally weighted mixture of the n_G forecast distributions obtained at the individual Gibbs iterations.
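These mixture formulas can be sketched as follows: if Gibbs iteration g delivers a forecast mean μ_g and variance σ²_g, the mixture mean is the average of the μ_g, and the mixture variance adds the between-draw variance of the μ_g (law of total variance).

```python
import numpy as np

def mixture_moments(means, variances):
    """Mean and variance of an equally weighted mixture of the per-iteration
    forecast distributions, via the law of total variance."""
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    m = means.mean()
    v = variances.mean() + means.var()   # within- plus between-draw variance
    return m, v

# Example: two draws with means 0 and 2, each with variance 1.
m, v = mixture_moments([0.0, 2.0], [1.0, 1.0])
# m = 1.0, v = 1.0 + 1.0 = 2.0
```

Note that the between-draw term means the combined variance always exceeds the average of the per-draw variances whenever the per-draw means disagree.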
Priors. Table 5 summarizes the prior parameters. We use loose priors for β and a diffuse prior for the initial log variance h_0. Our choices of a_ν and b_ν allow for a considerable amount of stochastic volatility, as shown in the right panel of Figure 7.

Figure 6: Inflation case study. Top row: Dawid-Sebastiani score plotted against the combination weight ω_1 placed on the UCSV model. A smaller score is better. The vertical lines mark the ex-post optimal weight for the pools (black and orange) and in terms of MSFE (blue). Middle row: forecast variance plotted against combination weight; the blue curve indicates the MSFE. Bottom row: regression coefficients in Equation (10), plotted against combination weight.

Figure 7: Return case study. Predicted mean (left) and variance (right) of the 15 models, plotted over time.