Local Prediction Pools

We propose local prediction pools as a method for combining the predictive distributions of a set of experts conditional on a set of variables believed to be related to the predictive accuracy of the experts. This is done in a two-step process where we first estimate the conditional predictive accuracy of each expert given a vector of covariates (or pooling variables) and then combine the predictive distributions of the experts conditional on this local predictive accuracy. To estimate the local predictive accuracy of each expert, we introduce the simple, fast, and interpretable caliper method. Expert pooling weights from the local prediction pool approach the equal weight solution whenever there is little data on local predictive performance, making the pools robust and adaptive. We also propose a local version of the widely used optimal prediction pools. Local prediction pools are shown to outperform the widely used optimal linear pools in a macroeconomic forecasting evaluation, and in predicting daily bike usage for a bike rental company.


Introduction
Forecast combination has a long history in statistics and related areas (Clemen, 1989; Winkler, 1981) and is widely used in forecasting and policy making (Adolfson et al., 2007). Early approaches focus on aggregating point forecasts (Bates and Granger, 1969), whereas a more recent strand of the literature is more concerned with combining forecast distributions (Hall and Mitchell, 2007; Geweke and Amisano, 2011; Billio et al., 2013; Kapetanios et al., 2015; Yao et al., 2018; McAlinn et al., 2020; McAlinn, 2021; Casarin et al., 2023). These combined predictive distributions may come from statistical models learned from data, or be elicited directly from experts without explicit probabilistic models, or be a mix of the two types.
An example of forecast combination is macroeconomic forecasting and policy making at central banks where predictive distributions from dynamic stochastic general equilibrium (DSGE) and vector autoregressive (VAR) models are combined with forecast distributions elicited from internal experts (Kjellberg and Villani, 2010). We will use the terms expert and expert distribution irrespective of whether the predictive distribution comes from statistical models or from elicitation of expert opinions.
The optimal way to linearly combine statistical predictive models is to jointly estimate the model parameters in all models and the pooling weights in the combined prediction in a mixture model (Frühwirth-Schnatter, 2006). This ideal is often unattainable in practice, however, either because the set of predictive distributions includes informally elicited expert opinion or because the models are too complex to be estimated jointly as a mixture. An example of the former is when large forecasting institutions use dedicated teams that work on models in isolation, using their own software implementations, which makes it practically impossible to re-estimate as a single mixture model. Del Negro et al. (2014) call this informational friction.
A common approach to the combination of expert distributions in the literature is the linear prediction pool (Lindley et al., 1979; Hall and Mitchell, 2007; Geweke and Amisano, 2011) where the combined distribution is a linear, often convex, combination of the expert distributions. Such linear pools have been shown to be optimal from a Bayesian perspective under certain specific assumptions (Genest and Zidek, 1986; West, 1992). The expert weights in linear pools are usually chosen to maximize the out-of-sample predictive performance with respect to some scoring rule, most often the logarithmic scoring rule; such optimized pools have been termed optimal prediction pools. A related set of aggregation methods is called stacking in the machine learning literature (Wolpert, 1992) and has more recently also been further developed in the statistical literature (Yao et al., 2018). Geweke and Amisano (2011) show that optimal prediction pools will typically converge to a solution that puts non-zero weight on more than one model in large samples; this is in contrast to Bayesian model averaging where the posterior model probabilities will asymptotically concentrate entirely on one of the models, even when the data generating process is outside the set of compared models (Berk, 1966).
The original linear and optimal prediction pools use a single time-invariant weight for each model. We will term such a weighting scheme a global pool. Global pools implicitly make the strong assumption that the predictive ability of the experts is the same over time and for all possible values of any conditioning variables used as explanatory variables in the models. Some recent work has proposed using time-varying weights in optimal prediction pools to allow models to be up- or down-weighted during certain time periods, see e.g. Del Negro et al. (2014) and Billio et al. (2013). Li et al. (2022) have recently proposed a generalization of the optimal prediction pools in Geweke and Amisano (2011) where the weights are allowed to depend on a set of covariates through a softmax function. Similarly, Yao et al. (2021) extend the stacking method of Yao et al. (2018) by allowing the model weights to vary as a function of the data.
In this paper, we take a general perspective similar to that in Yao et al. (2021) and allow the expert weights to vary with respect to a general set of pooling variables, which are variables that are believed to affect the predictive ability of the experts. These pooling variables may include time, giving us time-varying expert weights, but also other variables that may be related to expert performance. The pooling variables may be part of the information set of some of the experts, but can equally well be completely external variables not used by any of the experts, for example a business cycle indicator aggregated from survey expectations or sentiments extracted from social media. We call such weighting schemes local prediction pools to emphasize that they are determined by the local predictive performance of the experts.
The main challenge with local prediction pools is the need to learn the local predictive performance of all the experts. The learned local performance must allow for robust interpolation and extrapolation across the space of the pooling variables for it to be useful when constructing pools for predicting new data. This is a challenging problem, particularly when the number of pooling variables is large and limited data is available on the prediction performance of the experts.
Given historical measures of predictive ability for each expert and data on the pooling variables, learning local prediction performance is a problem of surface estimation. The pooling surface for each of the experts can be estimated using a multitude of smoothing techniques where the pooling surface is estimated by averaging locally around the point of interest in the space of pooling variables. With this perspective, a global prediction pool is an extreme special case where all observations are used equally to estimate global performance and to construct a single weight on each model, regardless of the state of the local pooling variables. We propose an easily implemented nonparametric method for estimating the pooling surface that automatically adapts the degree of locality to the local concentration of data in the pooling space and the differing historical local performance of the models in the pool. The expert weights from this estimator approach equal weights locally as the number of past local predictions decreases. We also introduce a local version of the optimal prediction pool in Geweke and Amisano (2011).
To allow us to interpret local prediction pools in subjectivist Bayesian terms we take the decision maker perspective of Lindley et al. (1979), where expert predictions are treated as data used by a decision maker to update her predictive beliefs. We formalize our local prediction pools using an extension of the Bayesian synthesis framework in Johnson and West (2018).
The rest of the paper proceeds as follows: Section 2 develops the local prediction pools framework; Section 3 introduces the caliper method for estimating local predictive ability together with an illustrative theoretical example; Section 4 contains two applications, the first using local prediction pools to make better quarterly forecasts of key macroeconomic variables and the second predicting daily bike usage for a bike rental service; Section 5 concludes.

The local pooling framework
This section establishes a theoretical framework for local prediction pools in which a decision maker (DM) wants to create a combined, or pooled, predictive distribution for a variable of interest, $y_t$, based on the predictive distributions of K experts. The experts may be formal statistical models or opinionated humans. To help accomplish this, the decision maker uses historical data in the form of a sequence of predictions made by the experts. Furthermore, the DM also has access to a vector of pooling variables, $z_t \in Z$, over which she believes that the predictive ability of the experts varies. The aim of the decision maker is to pool the experts' forecasts based on their local predictive ability at the current $z_t$.
To achieve her aim, the decision maker needs to i) set up a pooling space Z, ii) estimate the local predictive ability of each expert over Z, and iii) use a pooling function to synthesize the predictions of the experts, conditional on their local predictive ability. This section goes through these steps in turn, and positions local pools within the Bayesian predictive synthesis (BPS) framework of Johnson and West (2018).
Any scoring function can be used to measure the predictive ability in step ii), but we will use the logarithmic scoring rule in the form of the log predictive density. The logarithmic scoring rule has the unique advantage of being both local and proper (Bernardo and Smith, 1994), and is commonly used in model selection. Further, the expected log predictive density (ELPD) of a predictive distribution equals, up to an additive constant that does not depend on the expert, the negative Kullback-Leibler divergence with regard to the data-generating process (Hall and Mitchell, 2007). The linear pooling function in step iii) is motivated by the Bayesian predictive synthesis framework (Johnson and West, 2018).
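The link between the ELPD and the Kullback-Leibler divergence can be written out in one line. As a sketch, assume the data-generating process has a density $f$ (a symbol introduced here purely for illustration) and let $p_k$ denote the predictive density of expert k. Then

$$\mathbb{E}_f\left[\log p_k(y)\right] = \int f(y) \log p_k(y) \, dy = -\mathrm{KL}(f \,\|\, p_k) + \int f(y) \log f(y) \, dy.$$

The last term is the negative entropy of $f$ and is the same for every expert, so under these assumptions the expert with the highest ELPD is also the expert closest to the data-generating process in Kullback-Leibler divergence.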

Setting up the pooling space
The pooling space Z should include all variables that the DM believes co-vary with the experts' predictive ability. This can include (transformations of) covariates used by the experts, as well as variables that none of the experts use. While the DM will often include some, or even all, of the covariates used by the experts, this does not have to be the case. In theory, it is possible to set up a pooling space without even knowing which variables the experts used when they produced their forecasts. The space Z should be constructed using variables that the DM perceives as determinants of local predictive ability.

Estimating local predictive ability over Z
The purpose of using a local pool is to exploit variations in predictive ability over Z. We conceptualize this variation as a hypersurface in Z for each expert. An intuitive measure of predictive ability of a model is the expected log predictive density (ELPD) for a new data point (Gelman et al., 2014). The ELPD of expert k, trained on a sample $(y_1, \ldots, y_T)$, for a new single observation from the data-generating process is given by
$$\mathrm{ELPD}_k = \int \log p_k(\tilde{y}_{T+1} \mid y_1, \ldots, y_T) \, dF(\tilde{y}_{T+1}),$$
where $p_k(\tilde{y}_{T+1} \mid y_1, \ldots, y_T)$ is the predictive distribution of expert k and $F(\tilde{y}_{T+1})$ is the cdf of the data-generating process. We denote the local expected log predictive density of a model k for a specific point $z_{T+1}$ in Z by
$$\mathrm{ELPD}_k(z_{T+1}) = \int \log p_k(\tilde{y}_{T+1} \mid y_1, \ldots, y_T) \, dF(\tilde{y}_{T+1} \mid z_{T+1}).$$
Estimating local predictive ability is a challenging problem since the predictions of the experts are typically sparse in Z, especially when Z is high-dimensional. To tackle this estimation problem, the DM is free to use whatever parametric or non-parametric model she thinks best captures how the predictive abilities of the experts change over Z. This can mean simple parametric regression models, more elaborate modeling of smoothness using Gaussian processes, or non-parametric techniques like k-nearest neighbors.
Local predictive ability does not have to be modeled in the same way for all experts. This allows the decision maker to incorporate beliefs about generalizability that differ between experts. For example, the predictive ability of a complex model might vary more quickly over Z, and the decision maker may therefore be less certain about the predictive performance for regions in Z that the model has not visited in the past.

Synthesizing predictive distributions
The final step in forming the local prediction pool is the combination of the predictions made by the experts, conditional on their (estimated) local predictive ability. Exactly how this combination is to be done is ultimately up to the DM, but we will limit the scope of this paper by only considering linear pools of the form
$$p(y_{t+1} \mid H) = \sum_{k=1}^{K} w_k \, p_k(y_{t+1}),$$
where H denotes the set of historical predictive distributions supplied by the K experts, $p_k$ is the predictive distribution of expert k for $y_{t+1}$, and $w_k$ is the weight given to that same expert. A reasonable constraint to put on the weights is to have them be non-negative and sum to one, as in the optimal linear prediction pools of Hall and Mitchell (2007) and Geweke and Amisano (2011), where the weights of the experts are selected to maximize the historical performance of the pool. Linear pools are simple yet powerful, and have the additional advantage of allowing us to reframe the third step in subjectivist Bayesian terms as Jeffrey's updating (Johnson and West, 2018).
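As a concrete illustration of a convex linear pool, the following minimal sketch evaluates the pooled density when every expert supplies a Gaussian predictive distribution. The Gaussian form and the function names `normal_pdf` and `linear_pool_pdf` are assumptions made for this example only; the pooling formula itself does not depend on them.

```python
import numpy as np


def normal_pdf(y, mean, sd):
    """Density of N(mean, sd^2) evaluated at y."""
    return np.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


def linear_pool_pdf(y, weights, means, sds):
    """Density at y of a convex linear pool of Gaussian expert densities.

    Illustrative assumption: expert k predicts N(means[k], sds[k]^2).
    """
    weights = np.asarray(weights, dtype=float)
    # Enforce the convexity constraint: non-negative weights summing to one.
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to one")
    expert_pdfs = np.array([normal_pdf(y, m, s) for m, s in zip(means, sds)])
    return float(weights @ expert_pdfs)


# Two experts with different predictive means, weighted 70/30.
pooled_density = linear_pool_pdf(0.5, [0.7, 0.3], [0.0, 1.0], [1.0, 1.0])
pooled_log_score = np.log(pooled_density)
```

The log of the pooled density at the realized outcome is the log score that the pools in this paper are evaluated on.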
The problem of how to combine conflicting probability assessments, such as predictive distributions, has a long history (Lindley et al., 1979). One solution is the decision maker approach where the predictive distributions are treated as data to be used by a decision maker (Genest and Zidek, 1986). Once the distributions are taken as data points, it becomes fairly straightforward to think in conventional Bayesian terms of prior to posterior updating. Johnson and West (2018) show that the use of linear pools can be justified from a subjective Bayesian perspective through a framework they call Bayesian predictive synthesis (BPS). BPS uses a synthesis function that specifies the posterior conditional on the predictions of the experts,
$$p(y \mid H) = \int \alpha(y \mid x) \prod_{k=1}^{K} h_k(x_k) \, dx,$$
where $H = \{h_1(\cdot), \ldots, h_K(\cdot)\}$ is the set of predictive densities supplied by the experts and $\alpha(y \mid x)$ is the synthesis function. This updating does not obtain the posterior through the application of Bayes' theorem but rather through Jeffrey's updating (Diaconis and Zabell, 1982). Johnson and West (2018) derive a linear pool version of BPS,
$$\alpha(y \mid x) = \sum_{k=1}^{K} w_k \, \delta_{x_k}(y), \tag{5}$$
where $\delta_x(y)$ is the Dirac delta function. We can easily extend (5) to a local pool by letting the weights depend on a vector of pooling variables z,
$$p(y \mid H, z) = \int \sum_{k=1}^{K} w_k(z) \, \delta_{x_k}(y) \prod_{k=1}^{K} h_k(x_k) \, dx = \sum_{k=1}^{K} w_k(z) \, h_k(y),$$
where H is the set containing the K predictive distributions supplied by the experts. This extension allows us to position local prediction pools within the Bayesian predictive synthesis framework.

The caliper method for learning local predictive performance
In this section we propose the caliper method as a simple, interpretable way of modeling local predictive ability and combining expert forecasts in a linear pool. We use a simulated example to illustrate the method.

The caliper method
The caliper method estimates ELPD(z) by averaging all historical log predictive scores that occurred within a given distance (caliper width) from z. Formally, the decision maker estimates the local ELPD(z) for expert k by
$$\widehat{\mathrm{ELPD}}_k(z) = \frac{1}{n_\rho(z)} \sum_{i \in I_\rho(z)} \log p_k(y_i \mid y_1, \ldots, y_{i-1}),$$
where $I_\rho(z)$ is the set of $n_\rho(z)$ observations that lie within a caliper of width ρ centered at z. We will use the Euclidean distance on standardized pooling variables in the applications, but any distance measure can be used to define the caliper. When $n_\rho(z) = 0$, i.e. when there are no historical observations within the caliper, the ELPD(z) estimate is set to zero for each expert, leading to equal weights when combining predictions.
The caliper method is similar to k-nearest neighbors (kNN). However, there are two important differences:
a) kNN will always base its estimate on the k nearest observations, regardless of distance. If all observations are far away, the kNN estimate can therefore be based on data of dubious relevance. The caliper method, on the other hand, will only include observations it regards as close enough, and will default to equal weights when there is no relevant data.
b) kNN will use exactly k observations, even when there are many more observations close by. The caliper method, on the other hand, is capable of exploiting variation in the data density in Z.
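The caliper estimate described above can be sketched in a few lines. This is a minimal illustration rather than a reference implementation: the function name `caliper_elpd` and its argument layout are assumptions, the pooling variables are assumed to be standardized already, and an observation is treated as inside the caliper when its Euclidean distance to z is at most ρ.

```python
import numpy as np


def caliper_elpd(z, z_hist, log_scores, rho):
    """Caliper estimate of the local ELPD of each expert at the point z.

    z_hist     : (n, d) array of standardized pooling variables at past predictions.
    log_scores : (n, K) array of log predictive densities, one column per expert.
    rho        : caliper width, used here as a Euclidean radius (assumption).
    Returns (estimates of shape (K,), number of observations inside the caliper).
    """
    z_hist = np.asarray(z_hist, dtype=float)
    log_scores = np.asarray(log_scores, dtype=float)
    dist = np.linalg.norm(z_hist - np.asarray(z, dtype=float), axis=1)
    inside = dist <= rho
    n_inside = int(inside.sum())
    if n_inside == 0:
        # No relevant history: estimates of zero lead to equal weights downstream.
        return np.zeros(log_scores.shape[1]), 0
    return log_scores[inside].mean(axis=0), n_inside


# Three past predictions from two experts; only the first two are near the origin.
z_hist = [[0.0, 0.0], [0.1, 0.0], [3.0, 3.0]]
log_scores = [[-1.0, -2.0], [-3.0, -4.0], [-10.0, -10.0]]
elpd_near, n_near = caliper_elpd([0.0, 0.0], z_hist, log_scores, rho=0.5)
elpd_far, n_far = caliper_elpd([10.0, 10.0], z_hist, log_scores, rho=0.5)
```

Unlike kNN, the number of observations entering the average is an output of the procedure rather than an input, which is what lets the estimate fall back to zero when nothing relevant has been observed.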
Once the decision maker has access to estimates of local predictive ability for each expert, she needs to combine the experts' predictions in some way. The caliper method combines predictive distributions using a local linear pool where the weight of expert k is calculated by feeding the estimates of local predictive ability, $\widehat{\mathrm{ELPD}}_k(z)$, through a softmax transformation:
$$w_k(z) = \frac{\exp\left(n_\rho(z) \, \widehat{\mathrm{ELPD}}_k(z)\right)}{\sum_{j=1}^{K} \exp\left(n_\rho(z) \, \widehat{\mathrm{ELPD}}_j(z)\right)}. \tag{9}$$
The weights in (9) use what we will refer to as natural scaling, where the local ELPD estimates are scaled by the number of observations, $n_\rho(z)$, used in forming the estimate.
Natural scaling will lead to model weights that discriminate more sharply between models locally when there is more data available; Bayesian model averaging has the same behavior, but globally. The caliper width, ρ, determines how close in Z a previous prediction has to be in order to be deemed relevant for the local estimates; the caliper width should therefore match how quickly the DM thinks ELPD changes over Z. Selecting the caliper width is a question of bias-variance tradeoff: a smaller width will better capture the local part of ELPD(z), but this will come at the expense of basing the estimate on fewer observations, thereby increasing variance. How small a caliper width the DM can afford will depend on the sample size and the dimension of Z.
Natural scaling introduces a tension between the locality of experts' performance and the degree of discrimination between experts: increasing the caliper width ρ not only affects the bias-variance trade-off in the locality of the estimate, but also changes the degree of discrimination between models. This means that the caliper width that gives best predictive performance may have little to do with how quickly predictive ability varies in Z. To break this tension, we allow for departures from natural scaling by introducing a separate scaling factor τ in the softmax weights
$$w_k(z) = \frac{\exp\left(\tau \, \widehat{\mathrm{ELPD}}_k(z)\right)}{\sum_{j=1}^{K} \exp\left(\tau \, \widehat{\mathrm{ELPD}}_j(z)\right)}.$$
The scaling factor determines how sharply we discriminate between models with differing estimated predictive ability; it allows us to modify the behavior of the synthesizing step from equal weights (τ = 0) to turning the synthesis into model selection (τ → ∞).
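The two weighting schemes differ only in the factor that multiplies the local ELPD estimates before the softmax. The sketch below makes that explicit; the function name `caliper_weights` and the convention that `tau=None` selects natural scaling are assumptions made for this illustration.

```python
import numpy as np


def caliper_weights(elpd_hat, n_obs, tau=None):
    """Softmax pooling weights from local ELPD estimates.

    elpd_hat : (K,) local ELPD estimates, one per expert.
    n_obs    : number of observations inside the caliper.
    tau      : scaling factor; None means natural scaling (scale = n_obs).
    """
    elpd_hat = np.asarray(elpd_hat, dtype=float)
    scale = float(n_obs) if tau is None else float(tau)
    a = scale * elpd_hat
    a = a - a.max()  # subtract the maximum for numerical stability
    w = np.exp(a)
    return w / w.sum()


w_natural = caliper_weights([-1.0, -2.0], n_obs=2)             # natural scaling
w_empty = caliper_weights([0.0, 0.0], n_obs=0)                 # empty caliper
w_flat = caliper_weights([-1.0, -2.0], n_obs=2, tau=0.0)       # tau = 0
w_select = caliper_weights([-1.0, -2.0], n_obs=2, tau=1000.0)  # large tau
```

With natural scaling the same difference in estimated ELPD produces sharper weights as more observations fall inside the caliper, whereas a fixed τ decouples sharpness from the amount of local data.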
To use the caliper method with discrimination, the decision maker must specify two hyperparameters: i) the caliper width ρ and ii) the scaling factor τ. The DM could in principle put a prior on ρ and τ, but for the sake of simplicity we treat them as fixed hyperparameters for the decision maker to select. Alternatively, if the decision maker has no strong preferences for these hyperparameters, they can be determined by optimization. This would mean generating pooled predictions for a grid of values of (ρ, τ) and at time t selecting the hyperparameters that gave best predictive performance in time periods before time t.
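The grid-based selection just described only needs the historical log scores of the pooled prediction under each candidate pair. A minimal sketch follows, assuming those scores have already been computed and stored in a matrix; the function name `select_hyperparameters` and the tie-breaking rule (the first grid entry wins, including at the first time point, when no history exists) are assumptions of this example.

```python
import numpy as np


def select_hyperparameters(pool_log_scores, grid):
    """Pick, for every time t, the grid entry with the best past performance.

    pool_log_scores : (T, G) array; entry [t, g] is the log score of the pooled
                      prediction at time t under hyperparameters grid[g].
    grid            : list of G candidate (rho, tau) pairs.
    Only scores from periods strictly before t are used when selecting at t.
    """
    scores = np.asarray(pool_log_scores, dtype=float)
    n_grid = scores.shape[1]
    # Row t of cum_before holds the cumulative scores over periods 0, ..., t-1.
    cum_before = np.vstack([np.zeros(n_grid), np.cumsum(scores, axis=0)[:-1]])
    best = cum_before.argmax(axis=1)  # ties resolve to the first candidate
    return [grid[g] for g in best]


grid = [(0.5, 1.0), (1.0, 1.0)]
scores = [[-1.0, -2.0], [-3.0, -1.0], [-1.0, -1.0]]
chosen = select_hyperparameters(scores, grid)
```

Because the selection at time t never looks at the score realized at t, the resulting pooled predictions remain genuinely out-of-sample.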

Illustrative example
To illustrate the process of using local predictive pools and the caliper method, we work through an example in which the decision maker has access to predictions from two experts, each in the form of a regression model with a single covariate: Expert 1 uses $x_1$ and Expert 2 uses $x_2$. Each expert uses a diffuse normal-inverse-gamma (NIG) prior for the parameters to produce a Bayesian predictive distribution. In order to be able to generate example data, as well as to derive theoretical quantities like local predictive ability, we need to assume a specific data-generating process. We use a simple linear model in $x_1$ and $x_2$, referred to below as (13), where $\epsilon \sim N(0, 1)$ and new observations from the DGP are generated by independently drawing values of $x_1$ and $x_2$ from the N(0, 1) distribution.
The first step in creating a local prediction pool is for the DM to set up the pooling space. She decides to include the covariates of both experts in Z, so that $z_T = (x_{1,T}, x_{2,T})$. If she wanted to expand Z she could include, for example, an interaction effect ($x_3 = x_1 \times x_2$), higher order terms ($x_4 = x_1^2$, $x_5 = x_2^2$), or a variable that neither expert uses.
The second step in creating a local prediction pool is estimating the local predictive ability of each expert. As we have access to the data-generating process we can visualize how the predictive ability of the experts varies over Z. The ELPD(z)-surfaces of the experts can be found in Figure 1 a)-b). Each expert (unknowingly) omits one of the covariates in the DGP, and so the predictive ability of each expert deteriorates with the absolute value of this omitted covariate. Figure 1 c) illustrates that there are regions of Z where the predictive ability of one expert dominates. Using local pooling, we aim to capture this variation in predictive ability as a function of the pooling variables. In most applications, the decision maker will not have access to the data-generating process, making it impossible to directly calculate how the predictive ability of each expert varies over Z, and it therefore has to be estimated. In this example, the decision maker will use the caliper method with natural scaling, described in the previous section.
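A simplified simulation in the spirit of this example can be sketched as follows. Several choices are assumptions made only to keep the sketch short and runnable: unit coefficients on both covariates in the data-generating process, least-squares plug-in Gaussian predictive densities standing in for the NIG-based Bayesian predictive distributions, experts fitted once on a training sample instead of being updated over time, and a caliper width of 0.5.

```python
import numpy as np

rng = np.random.default_rng(0)


def simulate(n):
    """Draw n observations; unit coefficients are an assumption of this sketch."""
    x = rng.standard_normal((n, 2))
    y = x[:, 0] + x[:, 1] + rng.standard_normal(n)
    return x, y


def fit_expert(x_col, y):
    """Least squares through the origin with a plug-in residual variance."""
    beta = (x_col @ y) / (x_col @ x_col)
    sigma2 = np.mean((y - beta * x_col) ** 2)
    return beta, sigma2


def log_score(beta, sigma2, x_col, y):
    """Log density of y under the plug-in predictive N(beta * x, sigma2)."""
    return -0.5 * np.log(2.0 * np.pi * sigma2) - 0.5 * (y - beta * x_col) ** 2 / sigma2


x_train, y_train = simulate(1000)  # used to fit the experts
x_hist, y_hist = simulate(1000)    # used to record out-of-sample log scores

# Expert 1 only sees x1 (column 0); Expert 2 only sees x2 (column 1).
experts = [fit_expert(x_train[:, k], y_train) for k in range(2)]
scores = np.column_stack(
    [log_score(b, s2, x_hist[:, k], y_hist) for k, (b, s2) in enumerate(experts)]
)


def local_weights(z, rho):
    """Caliper weights with natural scaling at the point z."""
    inside = np.linalg.norm(x_hist - np.asarray(z, dtype=float), axis=1) <= rho
    n_inside = int(inside.sum())
    if n_inside == 0:
        return np.full(2, 0.5)
    a = n_inside * scores[inside].mean(axis=0)  # equals the sum of local log scores
    a = a - a.max()
    w = np.exp(a)
    return w / w.sum()


w_center = local_weights((0.0, 0.0), rho=0.5)
w_edge = local_weights((2.0, 0.0), rho=0.5)
```

Under these assumptions, the weight on Expert 1 at z = (2, 0) should be close to one, since Expert 2 has no way of accounting for the large value of the covariate it omits, while the weights at the origin depend on small differences in local log scores that natural scaling can amplify.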
The caliper method requires selecting a caliper width to control the inherent bias-variance tradeoff in estimating local predictive performance. Figure 2 shows the sampling distribution of the error in the estimate of ELPD(z) for Expert 2 as a function of the caliper width. The figure was constructed by repeatedly sampling realizations of size N = 2000 from the data generating process in (13) with the last 1000 observations being used to estimate the predictive ability of the model. Since the predictive distribution of the expert depends on the realized data, each realization has its own true ELPD(z).
Figure 2 a) illustrates the performance of Expert 2 at the point z = (0, 0) where this expert fits the data well. Increasing the caliper width will lead to reduced variance, but also an increasing negative bias in the ELPD(z) estimate. This is because as we move further away from the point (0, 0) the caliper will cover areas where the model has worse fit than at (0, 0).
Figure 2 b) shows the performance of Expert 2 at the point $z = (x_1, x_2) = (2, 0)$. At this point Expert 2 fits the data poorly since it omits $x_1$. Increasing the caliper width again leads to reduced variance, but now the bias in the ELPD(z) estimate will be increasingly positive. This is because the majority of new observations captured by the increasing caliper width will be from areas in Z where the model has better fit than at the current point.
The third step in creating a local prediction pool is aggregating the predictive distributions of the experts based on their (estimated) local predictive ability. In our example, the DM wants to make predictions at the two points z = (0, 0) and z = (2, 0). We compare the performance of the DM with two reference methods: a pool with equal weights, and the linear prediction pool of Geweke and Amisano (2011). At z = (0, 0) both models are equally misspecified, and make almost identical predictions. As each expert makes more or less identical predictions, any linear combination of their predictions will also be more or less identical. At z = (2, 0) Expert 1, which includes $x_1$, greatly outperforms Expert 2. The caliper method captures this, which translates into markedly better predictions for a range of caliper widths.
If we keep increasing the caliper width we will eventually arrive at an estimate of local predictive ability that is no longer local in any meaningful sense. For example, using the data-generating process in this simulation, a caliper width of ρ = 50 will almost always include all previous observations. When this is the case and both of the experts have the same global predictive ability, we observe the same polarizing behavior as that of Bayesian posterior probabilities described in Yang and Zhu (2018). Since both models are equally misspecified globally over Z, the difference in estimated predictive ability follows a random walk and will not converge to zero. As the sample size increases for any given sample, one of the models will therefore completely dominate the pool. Note that this is only the case for natural scaling.
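The polarizing behavior under natural scaling can be illustrated with a small Monte Carlo sketch. It assumes, purely for illustration, that the per-observation differences in log score between two experts are independent standard normal draws, so that the two experts have the same predictive ability on average; the function names are likewise assumptions.

```python
import numpy as np


def natural_scaling_weight(log_score_diffs):
    """Weight on expert 1 in a two-expert pool under natural scaling.

    log_score_diffs holds per-observation differences (expert 1 minus expert 2)
    along the last axis. With two experts, the softmax of the summed log scores
    reduces to a logistic function of the summed differences.
    """
    total = np.sum(log_score_diffs, axis=-1)
    return 1.0 / (1.0 + np.exp(-total))


def share_polarized(n_obs, n_reps=1000, tol=0.01, seed=1):
    """Share of simulated samples in which one expert holds at least 1 - tol."""
    rng = np.random.default_rng(seed)
    diffs = rng.standard_normal((n_reps, n_obs))  # assumed mean-zero differences
    w = natural_scaling_weight(diffs)
    return float(np.mean((w < tol) | (w > 1.0 - tol)))


share_small = share_polarized(10)
share_large = share_polarized(1000)
```

Because the summed difference behaves like a random walk whose typical magnitude grows with the square root of the number of observations, the share of samples with a nearly degenerate pool increases with the sample size even though neither expert is better on average.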

Applications
In the applications we will refer to the local pooling method described in Section 2 as a local decision maker pool (local DM for short). We will also consider a pool that assigns equal weights to all predictive distributions (equal weights) and the linear pool of Geweke and Amisano (2011), which we will refer to as a global optimization-based pool (global opt.), since it obtains its weights by optimizing the historical log scores over all of Z.
The caliper method works by subsetting the data set based on variables over which the decision maker believes the predictive ability of the experts may vary. This suggests that we could extend the global optimization-based linear pool into a local pool in a similar manner. To this end we introduce the local optimization-based linear pool (local opt.), which works exactly as the pool in Geweke and Amisano (2011), except that when optimizing the weights at time t, it only includes past predictions made within a given caliper width of $z_t$. If there are no past predictions over which to optimize, each expert is given the same weight.
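A local optimization-based pool of this kind can be sketched by restricting the log score objective to the past predictions inside the caliper. The optimizer used below is a simple EM-style fixed-point iteration for mixture weights with fixed components, chosen here because it needs only NumPy and keeps the weights on the simplex; it is a stand-in, not necessarily the optimizer used in the paper. The function names, the iteration count, and the use of the caliper width as a Euclidean radius are assumptions of this example.

```python
import numpy as np


def optimal_pool_weights(dens, n_iter=500):
    """Weights maximizing sum_t log(sum_k w_k * dens[t, k]) over the simplex.

    dens : (n, K) array of strictly positive predictive densities evaluated
           at the realized outcomes, one column per expert.
    """
    dens = np.asarray(dens, dtype=float)
    n_experts = dens.shape[1]
    w = np.full(n_experts, 1.0 / n_experts)
    for _ in range(n_iter):
        mix = dens @ w                               # pooled density per observation
        w = w * (dens / mix[:, None]).mean(axis=0)   # EM update; weights still sum to one
    return w


def local_opt_weights(z, z_hist, dens, rho):
    """Optimize the pool using only past predictions within distance rho of z."""
    z_hist = np.asarray(z_hist, dtype=float)
    dens = np.asarray(dens, dtype=float)
    inside = np.linalg.norm(z_hist - np.asarray(z, dtype=float), axis=1) <= rho
    if not inside.any():
        # Nothing to optimize over: fall back to equal weights.
        return np.full(dens.shape[1], 1.0 / dens.shape[1])
    return optimal_pool_weights(dens[inside])


# Expert 1 is better near the origin, Expert 2 is better near (3, 3).
z_hist = [[0.0, 0.0], [0.1, 0.1], [3.0, 3.0], [3.1, 2.9]]
dens = [[1.0, 0.1], [1.0, 0.1], [0.1, 1.0], [0.1, 1.0]]
w_origin = local_opt_weights([0.0, 0.0], z_hist, dens, rho=0.5)
w_corner = local_opt_weights([3.0, 3.0], z_hist, dens, rho=0.5)
w_nowhere = local_opt_weights([10.0, 10.0], z_hist, dens, rho=0.5)
```

With a caliper wide enough to contain every past prediction, the same routine returns the weights of the global optimization-based pool.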

US macroeconomic forecasting
In our first applied example we use the framework developed in the previous sections to forecast key macroeconomic time series in the US. The dataset used by the experts consists of the seven US macroeconomic variables in Smets and Wouters (2007): quarterly real GDP growth (gdp), quarterly inflation rate (tcpi), the federal funds rate (fed), quarterly real consumption growth, quarterly real investment growth, hours worked, and real compensation per hour. These time series are transformed in accordance with Gustafsson et al. (2023).
The decision maker is interested in predicting the three variables gdp, fed, and tcpi. To aid her in this, the decision maker has access to experts in the form of predictive distributions from a set of models: i) a Bayesian homoscedastic VAR(1) model estimated on all seven variables, ii) a Bayesian VAR(1) model with stochastic volatility estimated on all variables, iii) Bayesian Additive Regression Tree models (BART, Chipman et al. (2010)) for each of gdp, fed, and tcpi as univariate response variables with one lag of all seven macro variables as explanatory variables, and iv) a Bayesian VAR(1) model with time-varying parameters and stochastic volatility for the three-dimensional response vector with gdp, tcpi and fed. For each model class, we obtain the univariate one-step-ahead predictive distribution for gdp, fed, and tcpi; this allows us to explore differences in the local weighting schemes across the three variables.
When setting up her pooling space Z, the decision maker has access to all the variables used by the experts. In addition, the decision maker has access to an additional pooling variable in the form of ISM's manufacturing purchasing managers' index (pmi), which is not used by any of the experts (Lahiri and Monokroussos, 2013). The data set includes 218 observations, 72 of which are used in the initial estimation of the experts' models. Using all eight variables to form Z is therefore not a good idea, as the DM would be estimating a hypersurface in an eight-dimensional space based on roughly 150 observations. The decision maker therefore only uses GDP growth, inflation, the federal funds rate, and pmi to construct Z.
The decision maker uses the caliper method with natural scaling to estimate local predictive ability in Z; for a version using the caliper method with discrimination, see Appendix A. She continually updates the caliper width, ρ, at each time step by maximizing the historical log predictive density score over all previous aggregate predictions. The same method is used to dynamically select the caliper width of the local optimization-based pool. The dynamically selected caliper widths are shown in Figure 4. Table 1 displays the sum of out-of-sample log predictive scores for all the methods, and Figure 5 shows the development of these log scores over time relative to the equal weights method. All methods outperform equal weights by roughly the same amount when predicting gdp and fed. For tcpi, the local DM pool performs the best, outperforming the local optimization-based pool by some margin. All aggregation methods outperform the best individual experts. See Villani et al. (2009) for a discussion of how differences in the log predictive scores can be loosely interpreted using Jeffreys' scale of evidence for log Bayes factors.
Figure 6 explains why the two local pools outperform the globally optimized pool for tcpi by displaying the log predictive density evaluations over time for the four pooling schemes and the individual experts. The figure shows that while the globally optimized pool relies almost exclusively on the TVPSV model, which has the best performance over the whole data set, the local pools correctly put greater weight on the BART model when it performs well, and opt for a more equally weighted pool when BART predicts poorly. It is important to emphasize that the time variation in the weights comes from being at different locations in Z over time.

Bike rental prediction
In our second application we make one-step-ahead daily predictions of bike rentals using the bike sharing data in Fanaee-T and Gama (2014).To help construct our predictions we use three experts: i) a Bayesian linear regression model, ii) a BART model (Bayesian additive regression trees), and iii) a Bayesian linear regression model with stochastic volatility, as well as a set of variables to construct a pooling space.
The bike sharing data includes the daily number of rentals, our main variable of interest, from January 1, 2011 to December 31, 2012.The experts use several covariates related to the weather, an indicator for season, and the number of bike rentals the previous day.They also use indicators for workday and holiday, the latter being based on a list of official US holidays.
As pooling variables we use humidity, wind-speed, and temperature from Fanaee-T and Gama (2014), as well as a decision-maker specific variable which we will call family holiday.The family holiday variable takes the value 1 on Thanksgiving and Christmas (Eve and Day), and is included to represent the decision maker's belief that there are certain holidays that Americans spend with family, and so we would expect that bike rentals follow a different pattern on these days.This variable is not included in the original dataset Fanaee-T and Gama (2014) and is therefore not typically used in predictive models for this dataset.The idea is that the DM believes that this variable can affect the local relative predictive performance of the models and therefore wants to use it as an additional pooling variable.
Accounting for the missed observations from taking lags, we have a total of 730 observations as 2012 was a leap year.We split these 730 observations into three batches.The first batch, consisting of 200 observations, is used as training data for the experts without any recording of predictions.The experts' predictions on the subsequent batch of 200 observations are then used to get initial estimates of the experts' local predictive abilities.We use the third and final batch of 330 observations for evaluating the aggregate prediction from the local prediction pool, always updating the experts and the pool weights as time progresses.
The decision maker uses the caliper method with natural scaling. Since the DM does not have a strong a priori opinion about which caliper width to select, she runs through a selection of values that she thinks are reasonable and then, at each time point, selects the caliper width that has historically yielded the best predictions for the local pool, as shown in Figure 7. The same approach is used to select caliper widths for the local optimization-based pool.
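The dynamic selection of the caliper width can be sketched as follows. We assume, for illustration only, that "best historical predictions" is measured by the cumulative log score of the local pool under each candidate width, and that ties are broken in favor of the smallest width; the function name and input layout are hypothetical.

```python
def select_widths_over_time(log_scores: dict) -> list:
    """Choose a caliper width at each time point from past performance only.

    log_scores[w][t] is the log score at time t of the local pool that uses
    candidate width w. The width chosen at time t depends only on scores
    observed strictly before t.
    """
    widths = sorted(log_scores)
    n_periods = len(log_scores[widths[0]])
    cumulative = {w: 0.0 for w in widths}
    chosen = []
    for t in range(n_periods):
        # max() returns the first maximizer, i.e. the smallest width on ties.
        chosen.append(max(widths, key=lambda w: cumulative[w]))
        # Only after choosing do we reveal the time-t scores.
        for w in widths:
            cumulative[w] += log_scores[w][t]
    return chosen
```

Because the choice at time t uses only scores from earlier periods, the selection does not use any information from the observation being predicted.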
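Figure 8 shows cumulative log scores of the one-day-ahead predictions for all methods relative to the equal weights method. The global optimization-based pool initially performs similarly to the local pools, but as a greater number of past local predictions become available the local pools start to outperform the global pool. The equal weights scheme performs poorly. The totals from Figure 8 can be found in Table 2.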
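For concreteness, a cumulative log score relative to equal weights can be computed as the running sum of log score differences. The sketch below is a minimal illustration with hypothetical names and made-up numbers, not the evaluation code used for the figure.

```python
from itertools import accumulate


def relative_cumulative_log_score(method_scores, equal_weight_scores):
    """Running sum of (method log score - equal-weights log score)."""
    diffs = [m - e for m, e in zip(method_scores, equal_weight_scores)]
    return list(accumulate(diffs))


# Hypothetical log scores for three days.
local_pool = [-1.0, -2.0, -1.5]
equal_weights = [-1.5, -2.0, -2.5]
curve = relative_cumulative_log_score(local_pool, equal_weights)
```

A curve above zero means that the method has accumulated a higher log score than the equal weights pool up to that point, and the final value of the curve is the difference between the two totals.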

Conclusions
We have presented a framework for local prediction pools based on the Bayesian predictive synthesis approach of Johnson and West (2018). The framework combines expert predictive distributions locally by weighting experts based on their estimated past performance under conditions similar to those at the present prediction, i.e., for similar values of the pooling variables. Viewing expert predictions as data (Lindley et al., 1979), our framework can be viewed as an extension that allows us to incorporate, in a flexible manner, the belief that the relevance of expert data points can change depending on the conditions under which we are making our predictions (Savage, 1971).
We propose the caliper method as a simple, easy-to-interpret estimator of local predictive performance. The workings of local pools and the caliper method are illustrated by a simulated example, together with two empirical applications. The proposed local pools are shown to outperform a pool with equal weights and the popular globally optimized linear pool (Geweke and Amisano, 2011) in both applications.
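The caliper method averages past predictive performance within a caliper around the current point in the pooling space. The following is a rough sketch of that idea only: the Euclidean distance, the use of log scores as the performance measure, and the handling of an empty caliper are assumptions made for illustration, and the function name is hypothetical.

```python
import math


def caliper_estimate(z_new, past_z, past_log_scores, width):
    """Average past log scores recorded within distance `width` of z_new.

    Returns (mean, count). The mean is None when no past observation falls
    inside the caliper; how to proceed in that case is left to the caller.
    """
    inside = [
        score
        for z, score in zip(past_z, past_log_scores)
        if math.dist(z, z_new) <= width  # assumption: Euclidean distance
    ]
    if not inside:
        return None, 0
    return sum(inside) / len(inside), len(inside)


# Hypothetical past pooling-variable values and log scores for one expert.
past_z = [(0.0, 0.0), (1.0, 0.0), (3.0, 4.0)]
past_scores = [-1.0, -2.0, -10.0]
estimate, n_used = caliper_estimate((0.0, 0.0), past_z, past_scores, width=1.5)
```

Returning the number of observations inside the caliper alongside the mean makes it easy to see how much local evidence each estimate rests on.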
Although our local prediction pools are shown to work well in both applications, we would like to raise two points. First, as was noted by Savage (1971, p. 797), when we subset data to only include observations that are relevant according to some criterion, the amount of data needed will increase rapidly with the complexity of that criterion; as the dimension of Z grows, so does the number of predictions we need from each expert. To reduce the number of predictions needed and to get more robust local estimates, the caliper method imposes a certain amount of smoothness over the pooling space by averaging past predictive performance within the caliper. Second, the parameters of formal model experts are estimated globally using all data, which may corrupt what would otherwise be a locally accurate expert. This is something that the decision maker's local weights can only partially correct for.
This last point will always be a problem when the parameters of the expert models cannot feasibly be estimated jointly with the weights in the mixture, as is often the case in applied work. However, as estimating the model parameters and the mixing weights jointly will result in more powerful pools, an interesting extension could be the intermediate case where some experts are taken as fixed while some have parameters that may be estimated jointly with the pooling weights. This would apply, for example, when combining human expert predictions with predictions from simple statistical models.
The decision maker framework combined with the modeling of predictive ability as something that varies over a pooling space opens the door to several extensions, such as exploring different models for estimating local predictive ability and methods for pooling conditional on local predictive ability estimates. Further, there is nothing that requires us to estimate predictive ability using the same model for each expert. Using different models for the experts would, for example, let us express beliefs that one expert's predictive ability varies more quickly over the pooling space.

Figure 1: Hypersurfaces of the predictive ability of the two experts in Z.

Figure 2: Sampling distribution of the errors in the estimate of local predictive ability of Expert 2 by the caliper method at two points in Z.

Figure 3: Expected log predictive density for a new observation at z = (2, 0) for the caliper method together with reference methods. The expectation is taken with respect to the data-generating process in (13). (See the main text for details.)

Figure 4: Dynamically selected caliper widths for the macroeconomic data.

Figure 5: Cumulative log scores relative to equal weights of one-step-ahead quarterly forecasts.

Figure 8: Cumulative log scores relative to equal weights for daily one-step-ahead predictions of bike rentals.

Figure 9: Cumulative log scores relative to equal weights of one-step-ahead quarterly forecasts.

Table 1: Comparison of different pooling schemes. Sum of log predictive densities for one-step-ahead quarterly forecasts of the three variables fed, gdp, and tcpi, for the period 2010:Q1 to 2019:Q1. Bold numbers indicate the best method for each variable.

Table 2: Comparison of different pooling schemes. Sum of log predictive densities for one-step-ahead daily forecasts of bike rentals from February 1 to December 31.