As highlighted in Burgman (2005), expert information is often perceived as carrying honour and prestige, and it is not uncommon for it to go unchallenged. The truth is that experts are invariably subject to bias and, depending on the nature of that bias, their opinion may influence models and the decision-making process. It is therefore important to be aware of the impact that priors can have on models, as this may influence the style of elicitation and the choice of experts. In this section, we explore the influential nature of expert information in the context of a Bayesian model, where empirical data are captured through a likelihood, expert opinion is represented through a prior, and the posterior distribution is the result of combining the likelihood with the prior to obtain inferences about the model parameters. Note that in all prior specifications we work in terms of the precision, τ = 1/σ², to be consistent with the notation used in standard Bayesian texts (Gelman et al. 2003).
There are several scenarios that can arise when combining the likelihood with priors generated from expert opinion. The amount of data, the mean, the precision and the way in which the prior mean and precision are captured and incorporated into a model can all influence the posterior estimate. In situations where data are limited, the expert’s opinion has the potential to drive model predictions. The facilitator therefore needs to be aware of the issues that can lead to bias and ensure that expert biases are minimized.
As more data become available, the likelihood is moderated with the prior. However, in situations where the prior directly specifies the mean and precision, an informative prior can lead to a very informative posterior distribution, irrespective of the empirical data and how much data are collected (Lele & Allen 2006). If priors are incorporated into the model as an adjustment to an overall mean and precision, depending on their specification, the posterior estimates can be conservative. Here, the term adjustment refers to a shift in the mean or a rescaling of the precision, where the mean and precision are also considered random variables with appropriate priors attached.
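The trade-off between prior precision and sample size can be made concrete with a minimal sketch. It assumes a much simpler setting than the model used in this section: a normal mean with known data precision and a conjugate normal prior, in which the posterior precision is the prior precision plus n times the data precision. The function name, the data precision of 1 and the data mean of 0 are illustrative assumptions, not quantities from the study.

```python
def normal_posterior(prior_mean, prior_prec, data_mean, data_prec, n):
    """Conjugate update for a normal mean with known data precision.

    Simplified illustration only; this is NOT the negative binomial GLM
    fitted in the study.
    """
    # Precisions add: one contribution from the prior, n from the data.
    post_prec = prior_prec + n * data_prec
    # The posterior mean is a precision-weighted average of prior and data means.
    post_mean = (prior_prec * prior_mean + n * data_prec * data_mean) / post_prec
    # Share of the posterior precision contributed by the prior.
    prior_weight = prior_prec / post_prec
    return post_mean, post_prec, prior_weight


# Hypothetical values: prior mean 2, data mean 0, data precision 1.
for p in (0.5, 5, 50):
    for n in (12, 48, 192):
        mean, prec, w = normal_posterior(2.0, p, 0.0, 1.0, n)
        print(f"p={p:>4}  n={n:>3}  posterior mean={mean:.3f}  prior weight={w:.3f}")
```

In this simplified setting the prior's weight shrinks as n grows, but a prior that is precise relative to a single observation still contributes a large share of the posterior precision at small and moderate sample sizes.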
To illustrate these concepts, we conducted two investigations using the ecological example described in Martin et al. (2005). We focus here on one bird species, the noisy miner (Manorina melanocephala) and consider two studies. The first investigation is a simulation study that examines the variation in the mean posterior estimates of abundance under three livestock grazing regimes when we alter the sample size (low, moderate or high), prior mean and precision and the type of prior (indirect or direct) used in the model. The second study took the raw data collected by Martin et al. (2005) and investigated changes to the posterior estimates for different means, precisions and sample sizes as outlined in Table 4 and compared them with the actual priors used in Martin et al. (2005).
Results of Martin et al. (2005) showed an increase in noisy miner abundance under high grazing and a decrease under low levels of grazing. In general, the expert data confirmed these predictions and tightened up credible intervals around estimates for both the moderate and high grazing regimes. Under low grazing, expert information did not alter the posterior estimate substantially.
We used the empirical data collected for the noisy miner to simulate scenarios of abundance (yi) under low (grazeLi), moderate (grazeMi) and high (grazeHi) grazing levels collected at site i in a eucalypt woodland. Data were generated using a negative binomial (NB) distribution with a mean, θi and overdispersion parameter, ϕ that reflected estimates from Martin et al. (2005). The negative binomial density can be expressed as
where the overdispersion parameter, ϕ allows the variance of the distribution to exceed the mean.
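As a minimal sketch of this data-generating step, the following draws negative binomial counts with a given mean. It assumes one common parameterization, in which the variance is θ + θ²/ϕ, so that smaller ϕ means more overdispersion; the parameterization used in the original analysis may differ. The function name and the values of θ and ϕ are hypothetical and are not the estimates from Martin et al. (2005).

```python
import numpy as np


def simulate_nb_counts(theta, phi, size, rng):
    """Draw negative binomial counts with mean theta.

    Assumed parameterization: variance = theta + theta**2 / phi.
    NumPy's sampler uses (n, p); n = phi and p = phi / (phi + theta)
    give a distribution with mean theta.
    """
    p = phi / (phi + theta)
    return rng.negative_binomial(n=phi, p=p, size=size)


rng = np.random.default_rng(1)
# Hypothetical mean abundances for the three grazing regimes.
theta_by_regime = {"low": 0.5, "moderate": 2.0, "high": 4.0}
phi = 1.5  # hypothetical overdispersion parameter
sites_per_regime = 4  # e.g. n = 12 split evenly over three regimes

simulated = {
    regime: simulate_nb_counts(theta, phi, sites_per_regime, rng)
    for regime, theta in theta_by_regime.items()
}
print(simulated)
```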
We fit a Bayesian generalized linear model (GLM) (eqn 1) to 100 simulated datasets and estimated the grazing parameters, βL, βM and βH, for low (L), moderate (M) and high (H) grazing respectively. Prior information for the low grazing parameter for both models was based on the scenarios in Table 4. Prior information for the moderate and high grazing parameters (i.e. mM, pM and mH, pH respectively) was taken from Martin et al. (2005) and represents expert summaries from a subset of the 20 experts (19 moderate and 18 high). We chose a gamma (Ga) prior for the overdispersion parameter and Normal (N) priors for the grazing parameters (eqn 2).
Depending on the specification of the prior information for the mean (m) and precision (p) for each grazing level (e.g. mL, mM, mH, pL, pM, pH), the priors placed on the parameters may take one of two forms. The first form, as shown in eqn 2, is the indirect specification (Method 8, Table 1), a relative measure that shifts the overall grazing mean, μ, and rescales the precision, τ.
The second form is shown in eqn 3 and represents a direct specification where the experts provide information about the mean and precision for each grazing parameter (Method 3, Table 1). Note, although the specification outlined in eqn 2 can be populated using information elicited directly, we used an indirect style of elicitation adopted by Martin et al. (2005) for the purpose of this simulation study.
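The practical difference between the two forms can be sketched by drawing from each prior. The code below is an assumption-laden illustration, not the specification in eqns 2 and 3: it takes the direct prior to be Normal with mean m and precision p, and the indirect prior to be Normal with mean μ + m and precision τ × p, with hypothetical hyperpriors on μ (Normal, low precision) and τ (gamma). All function names and hyperparameter values are invented for the example.

```python
import numpy as np


def draw_direct_prior(m, p, size, rng):
    """Direct form (assumed): beta ~ Normal(mean = m, precision = p)."""
    return rng.normal(loc=m, scale=1.0 / np.sqrt(p), size=size)


def draw_indirect_prior(m, p, size, rng, mu_prec=0.01, tau_shape=1.0, tau_rate=1.0):
    """Indirect form (assumed): beta ~ Normal(mean = mu + m, precision = tau * p).

    mu and tau are themselves random; their hyperpriors here are hypothetical.
    """
    mu = rng.normal(loc=0.0, scale=1.0 / np.sqrt(mu_prec), size=size)
    tau = rng.gamma(shape=tau_shape, scale=1.0 / tau_rate, size=size)
    return rng.normal(loc=mu + m, scale=1.0 / np.sqrt(tau * p))


rng = np.random.default_rng(2)
direct = draw_direct_prior(m=2.0, p=5.0, size=10_000, rng=rng)
indirect = draw_indirect_prior(m=2.0, p=5.0, size=10_000, rng=rng)

# Compare central 95% ranges of the two implied priors on the grazing parameter.
print("direct  :", np.percentile(direct, [2.5, 97.5]))
print("indirect:", np.percentile(indirect, [2.5, 97.5]))
```

Under these assumptions the indirect form spreads the elicited information over the uncertainty in μ and τ, so the implied prior on the grazing parameter is far more diffuse than the direct form with the same m and p.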
Both models were fit using the R2WinBUGS package (Sturtz et al. 2005) in R (Ihaka & Gentleman 1996), and each simulation was run with a burn-in of 10 000 iterations followed by 10 000 monitored iterations (run lengths were determined using standard diagnostic measures). We examined changes in the mean (mL = ± 2, ± 10), precision (pL = 0.5, 5, 50), sample size (low: n = 12, moderate: n = 48 and high: n = 192) and prior specification (indirect or direct) (Table 4) for each simulation and stored the parameter estimates for each level of grazing, the precision, standard deviation and 95% credible intervals. We also investigated whether there were significant differences between low and moderate, and low and high grazed sites. Significance was concluded if the credible interval for the difference (log-scale) between the respective grazing regimes did not include zero.
Figures 2 and 3 show, respectively, changes in the standard deviation and in the proportion of times a significant difference is observed between estimates of low and moderate grazing. Across the 100 simulations, the variability in the estimate of βL decreases as the sample size and precision increase when an indirect prior is assumed and a mean of ± 2 is elicited (Figs 2a and 3a). The proportion of significant results also increased, but only slightly. Results for the direct prior (mean of ± 2) (Figs 2b and 3b) are much more dramatic. Figure 2b shows that even with moderate or high amounts of data, the direct specification of the prior can be quite informative, even at low precision. This highlights the importance of the elicitation process and of ensuring accuracy in the prior specification.
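The significance rule can be sketched as follows. The code assumes posterior draws for two grazing parameters are available as arrays and uses an equal-tailed interval from sample percentiles; the function names and the synthetic draws are illustrative and do not come from the fitted models.

```python
import numpy as np


def difference_is_significant(draws_a, draws_b, level=0.95):
    """True if the equal-tailed credible interval for (a - b) excludes zero."""
    diff = np.asarray(draws_a) - np.asarray(draws_b)
    alpha = 1.0 - level
    lower, upper = np.percentile(diff, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return bool(lower > 0 or upper < 0)


def proportion_significant(draw_pairs, level=0.95):
    """Proportion of simulated datasets whose difference is significant."""
    flags = [difference_is_significant(a, b, level) for a, b in draw_pairs]
    return float(np.mean(flags))


# Synthetic stand-ins for posterior draws from 100 simulated datasets.
rng = np.random.default_rng(3)
pairs = [
    (rng.normal(-1.0, 0.6, 10_000), rng.normal(0.5, 0.6, 10_000))
    for _ in range(100)
]
print(proportion_significant(pairs))
```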
Figure 2. Plots showing the variation in the mean estimate of abundance under low grazing across different prior scenarios for (a) an indirect prior and (b) a direct prior specification. Here, the standard deviation is used as a measure of variation, mL and pL represent the prior mean and precision respectively at low grazing and n represents the number of observations. The darker the shading, the higher the standard deviation.
Figure 3. Plots showing the proportion of times the estimate of abundance under low grazing across different prior scenarios was significantly different from that in moderately grazed areas for (a) an indirect prior and (b) a direct prior specification. An estimate was ‘significant’ if the 95% credible interval for the difference between coefficients did not include zero. The darker the shading, the higher the proportion of significant results.
We also repeated this simulation using prior means of ± 10 and found more striking results (not shown). In these scenarios, we found that the variation around the mean was much higher as the prior mean was chosen well outside the range of the data. Of more concern was the proportion of times a significant result was achieved. Virtually every scenario produced significant estimates nearly 100% of the time, indicating the strength of the prior, irrespective of the amount of data.
Exploration of the real data and prior information
In addition to the simulation study, we explored changes in the results from introducing different prior scenarios for the noisy miner (Table 4). Datasets of varying size were generated, low (n = 12), moderate (n = 48) and high (n = 192), by taking samples (with replacement) from the empirical data. Sampling was stratified to ensure balance across all three grazing regimes. The model was then fit to all three datasets using the same prior means and precisions that were explored in the simulation study and specified in Table 4.
The results from the resampling exercise led to conclusions similar to those of the simulation study. With the introduction of an indirect prior, we observed that as the sample size increased from 12 to 192, the credible intervals around the posterior estimates for the low grazing parameter did not change considerably (Fig. 4) and were more in line with the estimate produced in Martin et al. (2005). Credible intervals narrowed with the incorporation of a direct prior, suggesting that the estimates are strongly influenced by a directly specified prior. This effect was exacerbated when a mean of ± 10 was explored with precisions of 0.5, 5 and 50 (Fig. 5).
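A minimal sketch of this stratified resampling step is given below. It assumes the empirical counts are held per grazing regime and that the total sample size is split evenly across regimes, which is one way to achieve the balance described; the function name and the example counts are hypothetical and are not the noisy miner data.

```python
import numpy as np


def stratified_resample(counts_by_regime, n_total, rng):
    """Resample with replacement, drawing the same number of sites per regime."""
    regimes = list(counts_by_regime)
    if n_total % len(regimes) != 0:
        raise ValueError("n_total must divide evenly across regimes")
    per_regime = n_total // len(regimes)
    return {
        regime: rng.choice(
            np.asarray(counts_by_regime[regime]), size=per_regime, replace=True
        )
        for regime in regimes
    }


# Hypothetical abundance counts per grazing regime.
empirical = {
    "low": [0, 0, 1, 0, 2],
    "moderate": [1, 3, 0, 2, 4],
    "high": [5, 2, 6, 3, 4],
}

rng = np.random.default_rng(4)
datasets = {n: stratified_resample(empirical, n, rng) for n in (12, 48, 192)}
for n, sample in datasets.items():
    print(n, {regime: len(values) for regime, values in sample.items()})
```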
Figure 4. Posterior estimates from fitting the Bayesian model to the re-sampled empirical dataset and corresponding 95% credible intervals for the low grazing parameter in the Bayesian model of bird abundance for a sample size of (a) n = 12, (b) n = 48 and (c) n = 192. A prior mean of ± 2 is investigated, where the prior is either indirect (solid line) or direct (dotted line), where I = indirect and D = direct method of incorporating prior into model and Actual is the estimate calculated by Martin et al. (2005). Note, the scale on the y-axis on each plot is different.
Figure 5. Posterior estimates from fitting the Bayesian model to the re-sampled empirical dataset and corresponding 95% credible intervals for the low grazing parameter in the Bayesian model of bird abundance for a sample size of (a) n = 12, (b) n = 48 and (c) n = 192. A prior mean of ± 10 is investigated, where the prior is either indirect (solid line) or direct (dotted line), where I = indirect and D = direct method of incorporating prior into model and Actual is the estimate calculated by Martin et al. (2005). Note, the scale on the y-axis on each plot is different.