Bounds for the weight of external data in shrinkage estimation

Shrinkage estimation in a meta-analysis framework may be used to facilitate dynamic borrowing of information. This framework may be used to analyze a new study in the light of previous data, which may differ in their design (e.g., a randomized controlled trial (RCT) and a clinical registry). We show how the common study weights arise in effect and shrinkage estimation, and how these may be generalized to the case of Bayesian meta-analysis. We then develop simple ways to compute bounds on the weights, so that the contribution of the external evidence may be assessed a priori. These considerations are illustrated and discussed using numerical examples, including applications in the treatment of Creutzfeldt-Jakob disease and in fetal monitoring to prevent the occurrence of metabolic acidosis. The target study's contribution to the resulting estimate is shown to be bounded below. Therefore, concerns of evidence being easily overwhelmed by external data are largely unwarranted.


Introduction
In some situations it is useful to support an estimate using additional external evidence: for example, a small study in the context of a rare disease may be supplemented with data from a clinical registry or electronic health records, or the result from a meta-analysis may be backed by an analysis from a similar field, e.g., a related but somewhat different population. The involved data contributions then take on different roles, namely, that of a source (the external data) and a target (the data of primary interest). Dynamic borrowing refers to the class of approaches where the apparent, empirical similarity or compatibility of source and target is taken into account when judging to what degree the two should be lumped together. Such approaches may be implemented, e.g., via hierarchical models or informative priors; the two are in fact equivalent to some degree in the context of the normal-normal hierarchical model (NNHM) (Schmidli et al., 2014). Similarly, closely related (or partly equivalent) approaches are given by the bias allowance framework (Welton et al., 2012) and the power prior framework (Ibrahim and Chen, 2000). A recent example of such an approach is given by the EARLY PRO-TECT trial in paediatric Alport disease, where data from a randomized controlled trial (RCT) were supported by source data from an open-label arm and a clinical registry (Gross et al., 2020).
In the context of dynamic borrowing within the NNHM framework, the flow of information is quite commonly illustrated by quoting weights of the data sources as these are combined into a joint estimate. As the eventual estimate may be expressed as a weighted average of the input data, the corresponding weights are a useful means of quantifying the studies' contributions to, or influence on, the eventual result (Hedges and Olkin, 1985; Hartung et al., 2008). Analogous weights arise for shrinkage estimates (Raudenbush and Bryk, 1985; Robinson, 1991; Viechtbauer, 2010), and, as we will show below, also in the Bayesian paradigm with prior distributions on effect and heterogeneity parameters.
When combining originally separate data sets in a meta-analysis or using shrinkage estimation, there sometimes is concern that evidence from the target data may be overwhelmed by a much larger set of source data, e.g., when combining a small RCT with a large clinical registry or routine data (e.g. electronic health records) (Weber et al., 2018). In such cases it is instructive to explicate the notion of study contributions by considering their weights. Again, we can see the dynamic nature of the approach in the changing weight of external data with varying data compatibility or discrepancy. It turns out that within the Bayesian framework we can determine the minimum weight of the target study (the RCT in the above example) a priori for a given analysis, and with that we are able to provide more insights into the general behaviour of the meta-analysis procedure. The derived formulas show shrinkage estimation to behave reasonably and also predictably.
In the following, we will review the NNHM and show how "study weights" arise in effect and shrinkage estimation, and how the concept may be extended to the Bayesian framework. We then take a closer look at the weights' properties and show how these may be bounded across possible prior settings and/or data realisations. The arguments are illustrated by a numerical study, and the ideas are employed in two example applications involving the joint analysis of a "small" target and a "large" source study, as well as of two equally-sized studies. Due to the few-studies setup (Friede et al., 2017b), we will focus on Bayesian methods and only occasionally point out connections to analogous frequentist results. We close with a discussion of the findings and their practical implications.

The normal-normal hierarchical model (NNHM)
The NNHM models a set of k estimates y_i and their standard errors σ_i as

\[ y_i \,|\, \theta_i \;\sim\; \mathrm{N}(\theta_i,\, \sigma_i^2), \]

where θ_i are the study-specific effects. The θ_i are not necessarily identical for all studies, but are themselves associated with a certain amount of variation, expressed as

\[ \theta_i \,|\, \mu, \tau \;\sim\; \mathrm{N}(\mu,\, \tau^2). \]

The mean parameter µ is the overall mean effect, while τ denotes the between-study variability (heterogeneity). As noted elsewhere (Hedges and Olkin, 1985; Hartung et al., 2008; Röver, 2020), marginalizing over the parameters θ_i, the model may be written as

\[ y_i \,|\, \mu, \tau \;\sim\; \mathrm{N}(\mu,\, \sigma_i^2 + \tau^2). \]

The NNHM is a random-effects (RE) model, which in the special case of τ = 0 reduces to a fixed-effect (FE) (or common-effect) model. It provides a good approximation for many types of effect measures for which measurement uncertainty and between-study variability may be assumed to be (approximately) normally distributed (Jackson and White, 2018). Data analysis may then aim at estimating the overall effect µ or the study-specific effects θ_i ("shrinkage estimation"); in the present investigation, we will mostly be concerned with the latter.
In the following, we will denote vectors of effect estimates (y 1 , . . . , y k ) and their standard errors (σ 1 , . . . , σ k ) by y and σ, respectively. Furthermore, we will be mostly concerned with the special case of only two studies (k = 2) and a non-informative (improper) uniform prior for the overall effect (p(µ) ∝ 1).

Study weights

Conditional weights
Assuming an (improper) uniform prior for the overall effect µ, the conditional posterior distribution of µ (given τ) is normal with mean

\[ \tilde{\mu}(\tau) \;=\; \sum_{j=1}^{k} w_j(\tau)\, y_j, \]

where the inverse-variance (IV) weights w_j(τ) are given by

\[ w_j(\tau) \;=\; \frac{(\sigma_j^2 + \tau^2)^{-1}}{\sum_{i=1}^{k} (\sigma_i^2 + \tau^2)^{-1}}, \]

as in the frequentist framework (Hedges and Olkin, 1985; Hartung et al., 2008; Friede et al., 2017a). A similar formula also applies for a normal prior (Röver, 2020). These two (conditionally conjugate) priors are computationally simple and readily motivated, and for these reasons probably also the most commonly used priors for µ in this model. Informative priors might, for example, be motivated by general plausibility considerations or empirical data (Günhan et al., 2020), or determined through expert elicitation (Hampson et al., 2014; Best et al., 2020).
The conditional posterior of the study-specific effect θ_j (the shrinkage estimate) is also normal, with mean θ̃_j(τ) depending on y_j and µ̃(τ), namely

\[ \tilde{\theta}_j(\tau) \;=\; b_j(\tau)\, y_j \,+\, \bigl(1 - b_j(\tau)\bigr)\, \tilde{\mu}(\tau), \]

where the corresponding weight (Röver, 2020; Wandel et al., 2017) is

\[ b_j(\tau) \;=\; \frac{\tau^2}{\sigma_j^2 + \tau^2}. \]

This formulation shows to which degree the estimate is shrunk towards the common overall mean µ̃(τ) (depending on the amount of heterogeneity). It may be re-written as

\[ \tilde{\theta}_j(\tau) \;=\; \sum_{i=1}^{k} c_{ij}(\tau)\, y_i, \]

so that the actual shrinkage weights c_ij(τ) (the weight of the ith study in the estimate of the jth study) become more explicit. In the special case of only two studies (k = 2), the coefficients c_ij(τ) simplify to

\[ c_{11}(\tau) \;=\; \frac{\sigma_2^2 + 2\tau^2}{\sigma_1^2 + \sigma_2^2 + 2\tau^2}, \]

and analogously for c_22 and c_21. The conditional mean µ̃(τ) commonly also arises in frequentist approaches as an overall effect estimator, where usually a heterogeneity estimate τ̂ is plugged in for τ (Hedges and Olkin, 1985; Hartung et al., 2008). Similarly, plug-in estimates of θ̃_j(τ) are widely used and commonly known as "best linear unbiased predictions (BLUPs)" (Raudenbush and Bryk, 1985; Robinson, 1991; Viechtbauer, 2010). The weights (w_j(τ) or c_ij(τ)) are then often quoted along with the results in order to illustrate the individual studies' contributions to the overall result. Note that while weights may be appealing, they still constitute an ultimately somewhat heuristic notion of a study's contribution, as they relate only to the posterior expectation. A more complete picture might be obtained by considering the corresponding meta-analytic-predictive (MAP) prior (Schmidli et al., 2014), which comprehensively describes the information conveyed by the source study.
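For the two-study case, the behaviour of the closed-form weight c_11(τ) can be checked numerically. The following minimal sketch (with arbitrarily chosen illustrative standard errors) confirms that the weight starts at the fixed-effect weight for τ = 0 and increases towards 1 as τ grows:

```python
import numpy as np

def c11(tau, s1, s2):
    """Conditional shrinkage weight of study 1 in its own shrinkage
    estimate for k = 2 studies, using the closed-form expression above."""
    return (s2**2 + 2 * tau**2) / (s1**2 + s2**2 + 2 * tau**2)

s1, s2 = 0.8, 0.2                    # illustrative standard errors
fe_weight = s2**2 / (s1**2 + s2**2)  # fixed-effect weight, equals c11(0)

taus = np.linspace(0.0, 10.0, 1001)
weights = c11(taus, s1, s2)          # moves from fe_weight towards 1
```
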

Marginal weights
In a Bayesian multiparameter model, the conditional expectations (of effects µ or θ_j) as derived above are commonly of limited interest; what is usually of more interest are the marginal posterior expectations, as these refer to the posterior distribution integrated over other parameters, such as the heterogeneity τ in the considered model. Marginal posterior expectations here result from the conditional expectations as expected values with respect to the heterogeneity's marginal posterior distribution p(τ | y, σ), i.e.,

\[ \mathrm{E}[\mu \,|\, y, \sigma] \;=\; \int \tilde{\mu}(\tau)\; p(\tau \,|\, y, \sigma)\, \mathrm{d}\tau \qquad\text{and}\qquad \mathrm{E}[\theta_j \,|\, y, \sigma] \;=\; \int \tilde{\theta}_j(\tau)\; p(\tau \,|\, y, \sigma)\, \mathrm{d}\tau. \]

In both cases, the conditional expectations are convex combinations of the form Σ_i α_i(τ) y_i (see the expressions for µ̃(τ) and θ̃_j(τ) above). For convex (or, more generally, linear) combinations, we may re-write the expectations as

\[ \mathrm{E}\Bigl[\,\sum_i \alpha_i(\tau)\, y_i \,\Big|\, y, \sigma\Bigr] \;=\; \sum_i \mathrm{E}\bigl[\alpha_i(\tau) \,\big|\, y, \sigma\bigr]\; y_i, \]

so that it becomes apparent that the marginal expectation may again be expressed as a weighted average of the effects y_i, where the study weights now arise as the posterior expected weights. These constitute straightforward generalizations of the common conditional weights to the Bayesian context. The weights can be obtained from one-dimensional integrals (expectations) involving the heterogeneity's marginal posterior distribution and may easily be computed numerically; they are returned by default by the bayesmeta R package (Röver, 2015, 2020).

Properties
For τ = 0 the NNHM reduces to the FE model, in which all study effects θ_i coincide with the overall mean µ. As τ is varied between the two extremes of τ = 0 and τ → ∞, several effects may be observed for the conditional weights:
• The IV weights w_j(τ) move (not necessarily monotonically) from the "fixed-effect" weights

\[ w_j(0) \;=\; \frac{1/\sigma_j^2}{\sum_i 1/\sigma_i^2}, \]

which depend on the studies' precisions, towards "average" weights w_j(∞) = 1/k, where all studies have the same weight.
• The shrinkage weights c_jj(τ) (the contribution of the jth study to its own shrinkage estimate) increase monotonically from the FE weight towards 1.
For the conditional expectations, this implies:
• The conditional effect estimate µ̃(τ) moves from the FE estimate towards an unweighted average.
• The conditional shrinkage estimates θ̃_j(τ) move from the FE estimate towards the "un-pooled" original estimates y_j.
These effects are also summarized in Table 1. Posterior expectations of the weights depend on the heterogeneity's posterior distribution p(τ | y, σ). For a uniform effect prior, a given heterogeneity prior p(τ) and standard errors σ_i, the posterior density is given by

\[ p(\tau \,|\, y, \sigma) \;\propto\; p(\tau)\; f_\sigma(\tau)\; g_y(\tau) \qquad\text{with}\qquad g_y(\tau) \;=\; \exp\!\Bigl(-\frac{(y_2 - y_1)^2}{2\,(\sigma_1^2 + \sigma_2^2 + 2\tau^2)}\Bigr) \]

(see, e.g., Eqn. (11) in Röver (2020)), where p(τ) is the heterogeneity's prior density, and f_σ(τ) is a lengthier term involving only τ and σ. From this expression one can see that the heterogeneity's posterior depends on the data (y_1, y_2) only via the absolute difference |y_2 − y_1|, which in a sense constitutes the "empirical" or "observed" amount of heterogeneity, entering through the exponential term g_y(τ).
A closer look at g y (τ ) shows that it always remains between zero and one (0 < g y (τ ) ≤ 1). For y 2 = y 1 , it is constant at g y (τ ) = 1. For a given difference |y 2 − y 1 | > 0, it takes its minimum at τ = 0 and then increases monotonically with τ . For any given τ it decreases monotonically in |y 2 − y 1 |. One might think of g y (τ ) as "ruling out" smaller τ values in the heterogeneity posterior and pushing the posterior mode towards higher τ values as |y 2 − y 1 | increases.
The functional form of the posterior implies that for increasing |y_2 − y_1| the resulting marginal heterogeneity posterior becomes stochastically larger (Shaked and Shanthikumar, 2007); see also the appendix for a derivation. When varying the prior distribution p(τ), we may to some extent also predict the effect on the heterogeneity posterior: in particular, choosing a stochastically larger heterogeneity prior will imply a stochastically larger posterior as well (see also the appendix).

Table 1: Effects on several expressions when varying the heterogeneity τ between its extremes.

The above conditions imply that we can derive bounds for the shrinkage weights. As mentioned previously, concerns are sometimes raised that the target estimates may be overwhelmed by the source data, i.e., that certain weights may become too small (Weber et al., 2018). In the following, we will describe the conditions under which we can derive lower bounds on weights, i.e., under which we can make sure that weights remain above a certain minimum. Important consequences for the weights, valid quite generally or for certain heterogeneity priors p(τ), are derived below. Note that while we assume the standard errors σ_i to be given (a common assumption in meta-analysis or study design considerations), the data (estimates y_i) or the prior (p(τ)) may be varied.
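The stochastic-ordering argument can be illustrated numerically. The sketch below evaluates the (unnormalized) heterogeneity posterior p(τ) f_σ(τ) g_y(τ) for k = 2 on a grid and checks that a larger difference |y_2 − y_1| pushes the posterior CDF down everywhere, i.e., makes the posterior stochastically larger. The explicit form f_σ(τ) = (σ_1² + σ_2² + 2τ²)^(−1/2) used here is the k = 2 simplification and an assumption of this sketch, as is the half-normal heterogeneity prior chosen for concreteness:

```python
import numpy as np

def tau_posterior(tau, d, s1, s2, prior_scale=0.5):
    """Unnormalized heterogeneity posterior for k = 2, uniform effect prior
    and a half-normal(prior_scale) heterogeneity prior; d = |y2 - y1|.
    The factorization p(tau) * f_sigma(tau) * g_y(tau) follows the text;
    the explicit k = 2 form of f_sigma is an assumption of this sketch."""
    v = s1**2 + s2**2 + 2 * tau**2
    prior = np.exp(-tau**2 / (2 * prior_scale**2))  # half-normal kernel
    f_sigma = v**-0.5
    g_y = np.exp(-d**2 / (2 * v))
    return prior * f_sigma * g_y

tau = np.linspace(0.0, 5.0, 2001)
cdf = {}
for d in (0.0, 0.5, 1.0):
    dens = tau_posterior(tau, d, s1=0.8, s2=0.2)
    cdf[d] = np.cumsum(dens) / dens.sum()
# larger |y2 - y1|  =>  stochastically larger posterior (CDF lies below)
```
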

A study's minimum contribution to its own shrinkage estimate: the "FE weight"
The (conditional) shrinkage weight c_jj(0), i.e., the jth study's contribution to its own shrinkage estimate evaluated at τ = 0, constitutes a lower bound for the posterior mean weights: any heterogeneity prior p(τ) can only attach probability to τ values of zero or larger, for which the weights only increase. These "FE weights" may simply be computed as the common study weights in a fixed-effect meta-analysis. This property holds independently of the actual data (y_i) and of the heterogeneity prior (p(τ)).

Minimum posterior mean shrinkage weight: the "coincidence weight"
For any prior distribution p(τ), the coincidence case y_1 = y_2 is the data realisation yielding the lowest possible posterior mean shrinkage weight: any data with |y_2 − y_1| > 0 imply a stochastically larger heterogeneity posterior, which (due to the monotonicity of the weights c_jj(τ) as a function of τ) leads to larger posterior mean shrinkage weights. The coincidence weight may simply be computed by performing the meta-analysis with the data (y_1 and y_2) substituted by two identical numbers. This property holds for any given heterogeneity prior (p(τ)), independently of the data (y_i).

Stochastically ordered priors and their posterior mean weights
Considering stochastically ordered families of heterogeneity priors allows one to vary the posterior mean shrinkage weight. For suitably chosen stochastically smaller priors, the posterior mean weight may approach the FE weight, while for stochastically larger priors it may approach 100%. An obvious, simple way to obtain a stochastically ordered family of prior distributions for the heterogeneity is to use (or introduce) a scale parameter (Mood et al., 1974, Sec. VII.6.2); an example would be given by varying the scale parameter within the half-normal family of prior distributions. This property holds for given data (y_i) and a stochastically ordered family of heterogeneity priors (p(τ)).

Numerical illustration
In order to demonstrate the shrinkage weights' properties, we consider an illustrative case motivated by a scenario involving a log-odds-ratio (log-OR) endpoint, analogous to a previously discussed simulation scenario. For a study of size n_i featuring two treatment arms and a binary endpoint, the results may be summarized in a 2×2 contingency table. Assuming an even distribution of events and non-events across table cells implies a log-OR estimate with a standard error of approximately 4/√n_i (Röver, 2020). Considering a combination of a "small" and a "large" study with sizes n_1 = 25 and n_2 = 400 then leads to standard errors of σ_1 = 0.8 and σ_2 = 0.2, respectively. We will then derive the small RCT's shrinkage estimate (for the study-specific effect θ_1), which is of course primarily informed by y_1, but supported by the external data y_2. The present case of σ_1 ≫ σ_2 is the kind of setting in which we expect to see larger gains from shrinkage estimation, and this is exactly where using historical data is most attractive in practice.
For the analysis, we choose a half-normal heterogeneity prior with scale 0.5 (HN(0.5)), which constitutes a conservative choice for the present scenario (Friede et al., 2017a). For illustration purposes, we also utilize a (stochastically larger) HN(1.0) prior. We then fix the target y_1 (arbitrarily) at zero and vary the source y_2 in order to investigate the effect on the resulting shrinkage estimates and weights. Fig. 1 illustrates the estimates' and weights' dependence on the difference between the estimates y_1 and y_2. The top row of forest plots shows three example cases of (a) coinciding target and source estimates, (b) some moderate and (c) larger discrepancy between the two; the resulting shrinkage estimate for the target is shown in blue. The second row shows the posterior means of θ_1 (solid lines) and the corresponding 95% CIs (dashed lines) across the continuum of source data values. At the top of the plot the three cases (a)-(c) are marked, and the blue lines correspond to the estimates also shown above. The red lines show analogous estimates, but corresponding to the (stochastically larger) HN(1.0) prior. Note that "large" |y_2 − y_1| values (here, e.g., |y_2 − y_1| > 1.96 (σ_1 + σ_2) = 1.96) would imply non-overlapping CIs for source and target studies (as in case (c)), which in reality may mean that estimates would not actually be pooled at all. The practically most relevant part of the plot is hence in the neighbourhood of zero.

Figure 1: Effect of varying the difference between quoted estimates (y_2 − y_1) on the first shrinkage estimate (for θ_1). The top row shows three data examples of (a) coinciding and (b)-(c) increasingly diverging estimates, along with the resulting shrinkage estimate for the target study. The second row illustrates the estimates across the continuum of increasing y_2 values relative to the "plain" interval (y_1 ± 1.96 σ_1). The bottom panel shows the posterior mean shrinkage weight E[c_11(τ) | y, σ] for the first study, based on two different priors and for varying y_2 − y_1. Note that y_2 − y_1 = 0 constitutes the "coincidence case".
Finally, the bottom plot shows the posterior expected weights to illustrate the first (target) study's contribution to its own shrinkage estimate. The minimum (for both heterogeneity priors) is attained in the "coincidence case" (a) of y_2 − y_1 = 0; e.g., for the HN(0.5) prior the coincidence weight is at 29%. Increasing the observed effect difference |y_2 − y_1| (i.e., the "observed heterogeneity") then yields increasing weights for y_1, implying less borrowing from the source. In cases (b) and (c), the shrinkage weight amounts to 38% and 63%, respectively. Also, the choice of a stochastically larger prior, here realized by a larger scale parameter within the same family of distributions, leads to larger weights for y_1, for any |y_2 − y_1|, including the minimum at |y_2 − y_1| = 0. The first study's absolute minimum shrinkage weight, the "FE weight", in this case is at c_11(0) = σ_2² / (σ_1² + σ_2²) = 1/17 ≈ 5.9%. Note that while y_1 = y_2 constitutes a "worst case" in a certain sense (leading to the lowest shrinkage weight), it also still is the most desirable case, in the sense that this is when the data are in agreement and one would expect to learn the most from the source study.
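The quoted FE and coincidence weights can be reproduced by one-dimensional numerical integration. The sketch below (again assuming the k = 2 form of the marginal likelihood, f_σ(τ) = (σ_1² + σ_2² + 2τ²)^(−1/2), and a half-normal heterogeneity prior) recovers the 1/17 ≈ 5.9% FE weight, a coincidence weight of roughly 29% for the HN(0.5) prior, and a larger coincidence weight for HN(1.0):

```python
import numpy as np

def coincidence_weight(s1, s2, prior_scale):
    """Posterior mean shrinkage weight E[c11(tau) | y] in the coincidence
    case y1 = y2 (where g_y(tau) = 1), via trapezoidal integration over tau.
    Assumes the k = 2 marginal-likelihood term f_sigma(tau) = v^(-1/2)."""
    tau = np.linspace(0.0, 10.0 * prior_scale, 20001)
    v = s1**2 + s2**2 + 2 * tau**2
    post = np.exp(-tau**2 / (2 * prior_scale**2)) * v**-0.5
    c11 = (s2**2 + 2 * tau**2) / v
    dx = tau[1] - tau[0]
    def trap(f):  # composite trapezoid rule on the uniform grid
        return dx * (f.sum() - 0.5 * (f[0] + f[-1]))
    return trap(c11 * post) / trap(post)

fe_weight = 0.2**2 / (0.8**2 + 0.2**2)      # = 1/17, about 5.9 %
w_hn05 = coincidence_weight(0.8, 0.2, 0.5)  # roughly 0.29, as in the text
w_hn10 = coincidence_weight(0.8, 0.2, 1.0)  # larger, for the HN(1.0) prior
```
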

Creutzfeldt-Jakob example
A small randomized controlled trial (RCT) was conducted in order to investigate the effect of doxycycline on survival in patients suffering from Creutzfeldt-Jakob disease (CJD). In this ultra-rare condition, only 12 patients could be recruited, and so data from an observational study were considered as complementing evidence (Varges et al., 2017). Both studies quote estimated hazard ratios (HRs), and these estimates along with their standard errors are jointly analyzed in a meta-analysis; the data are shown in Table 2. With the focus being on the evidence from the RCT, a shrinkage estimate for this study is derived. Both studies are in agreement, suggesting a beneficial treatment effect, while the absolute effect magnitude is larger for the observational data.
Since the larger observational study provides a much more precise estimate (smaller standard error), one might fear that the randomized evidence will be overwhelmed by the external data in a joint analysis. The FE weight in this case amounts to c_22(0) = σ_1² / (σ_1² + σ_2²) = 13.5%; this would be the RCT's weight in an FE analysis, and it constitutes a lower bound on the RCT's weight for any data realization (y_1, y_2) and any heterogeneity prior (p(τ)).
For a log-HR, we may then assume a half-normal prior with scale 0.5 (HN(0.5)) for the heterogeneity (Friede et al., 2017a). For this prior, we get a minimum posterior mean weight (coincidence weight) for the randomized study of 38.9%, which may already be considered reassuringly large, in view of the sample sizes involved and compared to the FE weight. Any data realization (y_1, y_2) will hence yield an eventual weight ≥ 38.9% for the RCT. Also, a larger scale of the heterogeneity prior (i.e., a larger expected amount of heterogeneity) will increase the minimum weight for y_2; e.g., an HN(1.0) prior would yield a minimum expected shrinkage weight of 52.1%. For the actual data (Table 2), we then get a weight of 39.5% for the RCT, slightly above the minimum. Table 3 shows weights and estimates corresponding to the two different heterogeneity priors. In both cases, the actual weights are not far from their minimum values, and for both analyses there is a sizeable gain in precision for the shrinkage estimate when compared to the original estimate (y_2, σ_2) alone.

Metabolic acidosis example
A gynaecological RCT investigated whether fetal monitoring using cardiotocography (CTG) combined with ECG ST-segment analysis (ST) reduced the occurrence of metabolic acidosis, compared to CTG alone (Westerhuis et al., 2007). Here the relative risk (RR) of metabolic acidosis comparing the two treatment groups is of interest. When analyzing the data, evidence from an earlier, similar RCT (Amer-Wåhlin et al., 2001) may be utilized to support parameter estimation. This example data set was originally investigated by Rietbergen et al. (2011); the corresponding data are shown in Table 4.
Primary interest focuses on the more recent target study by Westerhuis et al. (2007) and on a shrinkage estimate of its study-specific effect θ_2. The two trials are of roughly comparable size (5667 vs. 4238 participants), and from the "FE weight" of c_22(0) = σ_1² / (σ_1² + σ_2²) = 54.3% one can already see that the second study will contribute the majority of the weight when estimating its own effect θ_2.
For a log-RR, we may again use a half-normal prior with scale 0.5 for the heterogeneity (Friede et al., 2017a); this yields a minimum (coincidence) mean shrinkage weight of 72.5%. A larger heterogeneity prior scale again leads to an increased shrinkage weight; e.g., for an HN(1.0) prior, the minimum weight is at 78.7%. Table 5 shows the corresponding weights and estimates. Compared to the previous example, the precision gain is not quite as large here.

Conclusions
Bayesian meta-analysis provides a transparent means for extrapolation or borrowing of strength from external data. Also within a Bayesian inference framework, study weights for overall and shrinkage effect estimates may be derived, namely as posterior expected weights, for any number of studies k. The FE weights (conditional on τ = 0) constitute the absolute minimum shrinkage weights across all heterogeneity priors and data realizations. In the case of k = 2 studies, the heterogeneity posterior depends on the data only via the absolute difference between the two estimates, |y_2 − y_1|. A larger difference leads to a stochastically larger heterogeneity posterior. When the estimates coincide, i.e., y_2 = y_1, the smallest possible shrinkage weight for a given heterogeneity prior (across all possible data realizations) is obtained.
Concerning the choice of heterogeneity prior, a stochastically larger prior leads to a stochastically larger posterior, and with that to increased (minimum and actual) shrinkage weights.
The above findings have important implications for the weightings that may occur within a meta-analysis. The shrinkage weight is bounded below (irrespective of prior and data) by the FE weight. For any particular prior, the (posterior mean) shrinkage weight is also bounded below across possible data realisations by the "coincidence weight". Having a bound on the weight effectively means bounding the "leverage" of the external data on the shrinkage estimate. A lower bound of, say, 50% means that the resulting shrinkage estimate will not move more than halfway from the effect y_i towards the external data in case of near-concordant evidence. For greater discrepancies, the target study's weight will be even larger, or, conversely, the influence of the source data will be smaller.
The FDA guidance on "Leveraging existing clinical data for extrapolation to pediatric uses of medical devices" (U.S. Department of Health and Human Services (HHS), Food and Drug Administration (FDA), 2016), for example, elaborates on issues commonly encountered in extrapolation endeavours. One concern raised there is the exchangeability assumption commonly made in hierarchical models. In the common case of only k = 2 studies, however, the same model (as far as shrinkage estimation is concerned) may alternatively be motivated via the reference model. This is similar to the bias allowance model framework (Welton et al., 2012), where the target study estimates the parameter of interest "directly", while the source is associated with a potential bias term of unknown direction and magnitude. Moreover, the advantages of using (informative) priors on the heterogeneity parameter are acknowledged in the guidance document, in particular as this facilitates dynamic borrowing based on the empirically observed compatibility of source and target data.
We would like to encourage the consideration of minimum weights as a diagnostic tool for the constitution of the evidence and for the implications of prior settings in a given or anticipated data scenario. The study of weights should, however, not be used for guiding the selection of the heterogeneity prior; the choice of prior should primarily be driven by considerations of prior information on between-study variability. Different amounts of heterogeneity might, however, be anticipated in different contexts; e.g., in the two examples discussed above, greater heterogeneity may be plausible between observational and randomized data than between two RCTs.
A closely related approach to investigating the contributions of data sources to a joint analysis works via the consideration of effective sample sizes (ESS) (Neuenschwander et al., 2020). Target and source data may be assessed based on their "share" of the total sample size, and the effect of heterogeneity on the resulting MAP prior may also be quantified via prior maximum sample sizes (Neuenschwander et al., 2010). Robustification of a MAP prior may be achieved by implementing a mixture of vague and informative prior components (Schmidli et al., 2014). In case the source data's weight is considered too large, a simple remedy might be to artificially inflate its associated standard error, similarly to the idea behind a power prior approach (Ibrahim and Chen, 2000). Alternatively, the target sample size might also be increased, in case that is an option. For an overview of approaches to downweighting external data, see also Viele et al. (2014).
The above considerations, which provide some insight into the inner workings of shrinkage estimation, facilitate diagnostics even before considering actual data. A requirement is that the standard errors (σ_i) need to be known beforehand. Quite often these may be approximated based on a unit information standard deviation (UISD) and the sample size (Kass and Wasserman, 1995). The need to make assumptions about anticipated standard errors is a common issue in similar design-of-experiment contexts (Cohen, 1988). Fears of external evidence "overruling" the target data (Weber et al., 2018) may be unwarranted, or may be checked before carrying out the target study, as the NNHM behaves predictably and reasonably within a Bayesian analysis. Potential problems arise, or are amplified, when using frequentist methods: the concerningly common occurrence of zero heterogeneity estimates means that analyses may fall back to an FE approach, which here is the least cautious or least conservative analysis. For the case of few studies, the probability of obtaining a zero heterogeneity estimate is alarmingly high, approaching 50% even for moderate amounts of heterogeneity (Friede et al., 2017a), which may actually render frequentist heterogeneity estimation for small k a somewhat questionable exercise. Within a Bayesian framework, marginalisation over the plausible range of heterogeneity values will lead to more conservative behaviour. In summary, with the target study's contribution to the resulting Bayesian shrinkage estimate being bounded below, concerns of evidence being easily overwhelmed by external source data can be addressed a priori, and may be shown to be largely unwarranted.

A Appendix
A.1 Stochastic ordering of heterogeneity posteriors

Consider two parameter sets y_a and y_b for which 0 ≤ |y_a;2 − y_a;1| < |y_b;2 − y_b;1|. Then the ratio of the heterogeneity's marginal posterior densities is given by

\[ \frac{p(\tau \,|\, y_b, \sigma)}{p(\tau \,|\, y_a, \sigma)} \;=\; \frac{c_{y_b}}{c_{y_a}} \, \frac{g_{y_b}(\tau)}{g_{y_a}(\tau)}, \]

where c_{y_a} and c_{y_b} are the densities' normalizing constants, and the latter ratio of g_y(τ) terms is monotonically increasing in τ. With that, condition (C) in Lehmann (1955) is fulfilled, and the posterior corresponding to y_b is stochastically larger than the one associated with y_a.

A.2 Stochastic ordering of posteriors for different priors
Consider two heterogeneity priors with densities p_1(τ) and p_2(τ), where p_2 is stochastically larger than p_1. A posterior distribution constitutes a special case of a "weighted distribution" (Mȩczarski, 2015). For the posterior distributions corresponding to p_1 and p_2, it follows that they inherit the same stochastic ordering (Bartoszewicz and Skolimowska, 2006).